Part of Human Scale Natural Language Processing.
The intention of this exercise is to get you thinking about textual
composition as a kind of collage, and how the affordances of computation
affect the process of collage and textual composition in general.
The exercise
Here’s the exercise:
- Gather a corpus, maybe fifty to one hundred words. This can be your
writing (copied and pasted, or free-written in the moment), or writing
that you found on the internet.
- Put this text into a document that can be shared and edited by
others (e.g., a Google Doc).
- Devise a means of splitting the text up into parts, and split the
text into those parts. (Letter by letter? Word by word? Something
else? See the sketch after this list for one possible approach.)
- Send your document to everyone in your breakout room. At the end,
you’ll have access to your document plus the documents of everyone
else in your group. (You might decide to put all of your text units
into the same document.)
- Create a short text only by re-arranging the units of text
that others in your group have sent you.
- Show off your creation and discuss.
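If you want a computational starting point for the splitting and re-arranging steps, here’s a minimal Python sketch. The word-by-word split, the shuffle, and the sample text are all assumptions of mine—one crude approach among many, not a prescribed method:

```python
import random

# A minimal sketch of the splitting and re-arranging steps, assuming
# word-by-word units; use list(text) instead for letter-by-letter.
text = "paste your fifty to one hundred words of gathered writing here"
units = text.split()            # split the text into units
random.shuffle(units)           # one crude way to re-arrange
print(" ".join(units[:10]))     # a short text built only from the units
```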
I’m allocating about fifteen minutes for steps (1)–(3), and twenty
minutes for steps (4)–(6). (Don’t worry about creating something
polished or even interesting in this period—first thought best thought.)
We’ll reconvene as a class afterwards and discuss the following
questions:
- What kinds of writing are facilitated by this technique? What
principles of composition did you use to create your text?
- What was difficult to do with this technique?
- What specifically computational approaches did you exploit
(if any)?
- How does it differ from other forms of composition?
- “Authorship”
is a fake idea but let’s talk about it!
Some
underlying (questionable) assumptions of natural language
processing
Forms of text analysis based on the following assumptions pre-date
computation (see, e.g., ancient stichometry or
the idea of a concordance).
Nevertheless, these assumptions seem to dovetail perfectly with the
affordances of computation.
- Language consists of a one-dimensional array of tokens. (“Token” in
the sense of an indivisible unit that represents some stretch of
language, such as a word or a character.)
- Two or more tokens can belong to the same “type,” i.e., they can be
“identical.” In other words, it’s possible to say “the same thing” more
than once, and tokens retain this identity regardless of context.
- It’s possible to determine (or at least infer) the semantics (i.e.,
meaning) of a stretch of language through an examination of the
arrangements and statistical properties of tokens. A simple example of
this is the word cloud, in which the typographical size of each word
corresponds to the number of times tokens of that type occur in the
text under analysis, communicating the relative prevalence of topics
related to those words. (A small sketch of this counting follows the
list below.)
More sophisticated examples include automated text summarization and word
vectors.
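To make the first two assumptions concrete, here’s a minimal Python sketch of tokens and types, and of the counting that a word cloud visualizes. The whitespace tokenizer and the lowercasing are simplifying assumptions, not how any particular NLP system works:

```python
from collections import Counter

text = "the cat sat on the mat"
tokens = text.lower().split()  # assumption 1: a one-dimensional array of tokens
print(tokens)                  # ['the', 'cat', 'sat', 'on', 'the', 'mat']

types = Counter(tokens)        # assumption 2: two tokens share the type 'the'
print(types)                   # Counter({'the': 2, 'cat': 1, 'sat': 1, ...})
# assumption 3: a word cloud scales each type's display size by its count here
```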
Some affordances of
computation
Likewise, forms of text composition that use rules and procedures
predate the digital computer. (I’d include Tzara’s “How to make a
Dadaist poem” in this category, along with Su Hui’s “Xuánjī Tú”, and
Jackson Mac Low’s rule-driven
poetry among many others.) However, computation opens up new
possibilities in this style of composition, owing to a few of its
affordances, e.g.:
- Computers can manipulate symbols quickly, performing
searches and arrangements in milliseconds that would take an unaided
person months or years;
- Computers can work on large amounts of data, and procedures
that work with a small dataset can often be applied, unchanged, to
much larger datasets (see the sketch below);
- Computers can and must be programmed, which both requires
that rules and procedures be expressed unambiguously and allows for
those rules and procedures to be repeated automatically, without
variation;
- Computers facilitate lossless exchange of information through
digital copies, which can easily be exchanged and catalogued (via
various forms of networking).
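As a small illustration of the first two affordances, the same few lines of Python tally word frequencies in a short sentence or in a novel-length file without modification, and in milliseconds where hand-tallying would take far longer (the filename below is hypothetical):

```python
from collections import Counter

def most_common_words(text, n=5):
    # An unambiguously expressed procedure that runs identically,
    # and automatically, at any scale of input.
    return Counter(text.lower().split()).most_common(n)

print(most_common_words("a rose is a rose is a rose"))

# The identical call on a much larger corpus (hypothetical file):
# with open("corpus.txt") as f:
#     print(most_common_words(f.read()))
```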