Part of Human Scale Natural Language Processing.
The intention of this exercise is to get you thinking about textual
composition as a kind of collage, and how the affordances of computation
affect the process of collage and textual composition in general.
The exercise
Here’s the exercise:
- Gather a corpus, maybe fifty to one hundred words. This can be your
writing (copied and pasted, or free-written in the moment), or writing
that you found on the internet.
- Put this text into a document that can be shared and edited by
others (e.g., a Google Doc).
- Devise a means of splitting the text up into parts, and split the
text into those parts. (Letter by letter? Word by word? Something
else? See the sketch after this list for one possible approach.)
- Send your document to everyone in your breakout room. At the end,
you’ll have access to your document plus the documents of everyone
else in your group. (You might decide to put all of your text units
into the same document.)
- Create a short text only by re-arranging the units of text
that others in your group have sent you.
- Show off your creation and discuss.
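If you want a computational starting point for the splitting and re-arranging steps, here’s a minimal Python sketch. The word-by-word split, the shuffle, and the sample text are all assumptions of mine—one crude approach among many, not a prescribed method:

```python
import random

# A minimal sketch of the splitting and re-arranging steps, assuming
# word-by-word units; use list(text) instead for letter-by-letter.
text = "paste your fifty to one hundred words of gathered writing here"
units = text.split()            # split the text into units
random.shuffle(units)           # one crude way to re-arrange
print(" ".join(units[:10]))     # a short text built only from the units
```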
I’m allocating about fifteen minutes for steps (1)–(3), and twenty
minutes for steps (4)–(6). (Don’t worry about creating something
polished or even interesting in this period—first thought best thought.)
We’ll reconvene as a class afterwards and discuss the following
questions:
- What kinds of writing are facilitated by this technique? What
principles of composition did you use to create your text?
- What was difficult to do with this technique?
- What specifically computational approaches did you exploit
(if any)?
- How does it differ from other forms of composition?
- “Authorship”
is a fake idea but let’s talk about it!
Some
underlying (questionable) assumptions of natural language
processing
Forms of text analysis based on the following assumptions pre-date
computation (see, e.g., ancient stichometry or
the idea of a concordance).
Nevertheless, these assumptions seem to dovetail perfectly with the
affordances of computation.
- Language consists of a one-dimensional array of tokens. (“Token” in
the sense of an indivisible unit that represents some stretch of
language, such as a word or a character.)
- Two or more tokens can belong to the same “type,” i.e., they can be
“identical.” In other words, it’s possible to say “the same thing” more
than once, and tokens retain this identity regardless of context.
- It’s possible to determine (or at least infer) the semantics (i.e.,
meaning) of a stretch of language through an examination of the
arrangements and statistical properties of tokens. A simple example of
this is the word cloud, in which the typographical size of each word
corresponds to the number of times tokens of that type occur in the
text under analysis, communicating the relative prevalence of topics
related to those words. (A small sketch of this counting follows the
list below.)
More sophisticated examples include automated text summarization and word
vectors.
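To make the first two assumptions concrete, here’s a minimal Python sketch of tokens and types, and of the counting that a word cloud visualizes. The whitespace tokenizer and the lowercasing are simplifying assumptions, not how any particular NLP system works:

```python
from collections import Counter

text = "the cat sat on the mat"
tokens = text.lower().split()  # assumption 1: a one-dimensional array of tokens
print(tokens)                  # ['the', 'cat', 'sat', 'on', 'the', 'mat']

types = Counter(tokens)        # assumption 2: two tokens share the type 'the'
print(types)                   # Counter({'the': 2, 'cat': 1, 'sat': 1, ...})
# assumption 3: a word cloud scales each type's display size by its count here
```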
Some affordances of
computation
Likewise, forms of text composition that use rules and procedures
predate the digital computer. (I’d include Tzara’s “How to make a
Dadaist poem” in this category, along with Su Hui’s “Xuánjī Tú”, and
Jackson Mac Low’s rule-driven
poetry among many others.) However, computation opens up new
possibilities in this style of composition, owing to a few of its
affordances, e.g.:
- Computers can manipulate symbols quickly, performing
searches and arrangements in milliseconds that would take an unaided
person months or years;
- Computers can work on large amounts of data, and procedures
that work with a small dataset can often be applied, unchanged, to
much larger datasets (see the sketch below);
- Computers can and must be programmed, which both requires
that rules and procedures be expressed unambiguously and allows for
those rules and procedures to be repeated automatically, without
variation;
- Computers facilitate lossless exchange of information through
digital copies, which can easily be exchanged and catalogued (via
various forms of networking).
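As a small illustration of the first two affordances, the same few lines of Python tally word frequencies in a short sentence or in a novel-length file without modification, and in milliseconds where hand-tallying would take far longer (the filename below is hypothetical):

```python
from collections import Counter

def most_common_words(text, n=5):
    # An unambiguously expressed procedure that runs identically,
    # and automatically, at any scale of input.
    return Counter(text.lower().split()).most_common(n)

print(most_common_words("a rose is a rose is a rose"))

# The identical call on a much larger corpus (hypothetical file):
# with open("corpus.txt") as f:
#     print(most_common_words(f.read()))
```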