Human Scale Natural Language Processing

Essential information

SFPC, online, summer 2024. Instructor: Allison Parrish. Send me e-mail.

Important links: Google Drive folders for section 1 and section 2.

Description

Natural Language Processing (NLP) is a subfield of AI that drives pervasive technologies like spell check, search, bots, and content moderation. This multi-billion-dollar, energy-intensive industry increasingly dictates the shape of everyday language while perpetuating harmful biases. We will practice “human-scale” natural language processing by forgoing pre-existing datasets and models in favor of communally written texts. This includes exercises in which participants invent new textual categories and hand-tag each other’s writing. Participants will learn the basics of text processing, analysis, and generation in Python, including parsing, regular expressions, Markov chains, and vector similarity.

Many exercises will also be performed with analog media (cut-ups, free-writes, and the like). In addition, we will prioritize technical approaches that run well on low-end hardware rather than relying on carbon-intensive computation.
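To give a taste of one technique on the syllabus, here is a minimal Markov-chain text generator in Python. It is an illustrative sketch, not code from the class itself: the tiny corpus and the function names are invented for this example. The idea is human-scale by design: it needs only a short, hand-written text and the standard library.

```python
import random
from collections import defaultdict

def build_model(words, order=1):
    """Map each sequence of `order` words to the words that follow it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, length=12):
    """Random-walk through the model to produce new text."""
    key = random.choice(list(model.keys()))
    out = list(key)
    for _ in range(length):
        followers = model.get(tuple(out[-len(key):]))
        if not followers:
            break  # dead end: this sequence only appears at the corpus's end
        out.append(random.choice(followers))
    return " ".join(out)

# A communally written corpus would go here; this one is a placeholder.
corpus = "the sun sets and the moon rises and the stars appear".split()
print(generate(build_model(corpus)))
```

Because the model is just a dictionary of observed word sequences, you can inspect it directly and see exactly why the generator made each choice, which is harder to do with large pre-trained models.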

Course objectives

Ethos and methodology

This is a hands-on class, meaning that you will be writing code. Novice programmers will find plenty of code to re-use and re-assemble in the example notebooks. Programmers with more experience are encouraged to experiment with and build upon the material presented in class. The quality of student work is correlated most closely with curiosity and creative concepts, not with technical proficiency.

We are using the Python programming language. Python is widely used in many areas of computational practice, from academia to entrepreneurship; it runs on both supercomputer clusters and microcontrollers. Python is free and open source and has a vibrant community of contributors and enthusiasts. I think Python is a versatile and powerful language that is nonetheless friendly for beginners. If you’re interested in supplementing your Python instruction beyond the content of this class, I’ve linked to a number of resources below.

I strongly discourage the use of LLM-based programming assistants. The purpose of writing a computer program is to produce an unambiguous statement of your intent; you can’t do this unless you understand what you’re writing. Furthermore, evidence suggests that the use of LLM programming assistants is detrimental to both students of programming and software engineers. See, e.g., Vaithilingam et al., whose study shows that LLM-based code generation tools do not “improve the task completion time or success rate,” but do lead to “difficulties in understanding, editing, and debugging” that “significantly hinder” programmers’ “task-solving effectiveness.” Bring your questions to me, a co-teacher, or a fellow student before bringing them to ChatGPT.

We’ll be using Google’s collaboration tools in class (e.g., Google Drive, Google Docs, etc.). As such, you’ll need a Google account. Let me know if this is a problem for you, and we’ll work something out.

Finally, I will be conducting this class in the English language, and many of the code examples will refer to grammatical and linguistic properties of English specifically. Likewise, student contributions to the collective corpus should be in English. This is unfortunate but necessary for both collaboration and for the deep dive into linguistic structure that we’ll be doing in the class. On the final day of class, we’ll have a discussion about the wisdom of an “English-only” rule, and what in the class would have to change for it to be compatible with languages other than English, or many languages at once.

Schedule

Session 1: Text as material

Section 1: 2024-06-13; section 2: 2024-06-15.

Session 2: Text and procedure

Section 1: 2024-06-20; section 2: 2024-06-22.

Session 3: Language models

Section 1: 2024-06-27; section 2: 2024-06-29.

Session 4: Syntax

Section 1: 2024-07-11; section 2: 2024-07-13.

Session 5: Semantics

Section 1: 2024-07-18; section 2: 2024-07-20.

Resources for learning Python

We’re going to be thorough with the basics, but we’re also going to move fast. Fortunately, there are many resources out there for learning Python. You might benefit from going through some of them. I recommend:

Reading list

Due to time constraints, this class does not incorporate required readings or reading discussions. I’ve included below a brief bibliography of papers and articles that are relevant to the class. I’d be happy to discuss the content of any of these papers with you individually, or to lead a small extracurricular reading discussion group. (I’m also happy to provide PDFs of anything you don’t otherwise have access to.) Get in touch if you need more recommendations!

Please also check the online syllabi for Reading and writing electronic text, Computational letterforms and layout and Computational approaches to narrative, three related classes that I teach at NYU ITP/IMA.

Papers and articles

Baraka, Amiri. “Technology & Ethos.” Raise, Race, Rays, Raze; Essays since 1965, Random House, 1972, pp. 155–58.

Booten, Kyle, and Lillian-Yvonne Bertram. “Unbreathed Words: A Conversation with Lillian-Yvonne Bertram.” ASAP/Journal, vol. 7, no. 2, 2022, pp. 261–72.

Drucker, Johanna. “Why Distant Reading Isn’t.” PMLA, vol. 132, no. 3, May 2017, pp. 628–35. https://doi.org/10.1632/pmla.2017.132.3.628.

Giles, Harry Josephine. “Some Strategies of Bot Poetics.” Harry Josephine Giles, 6 Apr. 2016, https://harrygiles.org/2016/04/06/some-strategies-of-bot-poetics/.

Golumbia, David. “ChatGPT Should Not Exist.” Medium, 14 Dec. 2022, https://davidgolumbia.medium.com/chatgpt-should-not-exist-aab0867abace.

Hovy, Dirk, and Shannon L. Spruit. “The Social Impact of Natural Language Processing.” Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, 2016, pp. 591–98. ACLWeb, https://doi.org/10.18653/v1/P16-2096.

Long, Karawynn. “Language Is a Poor Heuristic for Intelligence.” Nine Lives, 26 June 2023, https://buttondown.email/ninelives/archive/language-is-a-poor-heuristic-for-intelligence/.

McQuillan, Dan. “Predicted Benefits, Proven Harms: How AI’s Algorithmic Violence Emerged from Our Own Social Matrix.” The Sociological Review Magazine, June 2023. https://doi.org/10.51428/tsr.ekpj9730.

Morris, John. “How to Write Poems with a Computer.” Michigan Quarterly Review, vol. 6, no. 1, 1967, pp. 17–20.

Pipkin, Everest. “A Long History of Generated Poetics: Cutups from Dickinson to Melitzah.” Medium, 20 Sept. 2016, https://everestpipkin.medium.com/a-long-history-of-generated-poetics-cutups-from-dickinson-to-melitzah-fce498083233.

Soria, Claudia. “Decolonizing Minority Language Technology.” State of the Internet’s Languages Report, 1 Jan. 2020, https://internetlanguages.org/en/stories/decolonizing-minority-language/.

Trettien, Whitney Anne. Computers, Cut-Ups and Combinatory Volvelles: An Archaeology of Text-Generating Mechanisms. 2009. MIT, http://whitneyannetrettien.com/thesis/.

Whalen, Zach. “The Many Authors of The Several Houses of Brian, Spencer, Liam, Victoria, Brayden, Vincent, and Alex: Authorship, Agency, and Appropriation.” Journal of Creative Writing Studies, vol. 4, no. 1, 2019, p. 45.

Books

Bertram, Lillian-Yvonne. Travesty Generator. Noemi Press, 2019.

Funkhouser, Chris. Prehistoric Digital Poetry: An Archaeology of Forms, 1959-1995. University of Alabama Press, 2007.

Hartman, Charles O. Virtual Muse: Experiments in Computer Poetry. Wesleyan University Press, 1996.

Mac Low, Jackson, and Anne Tardos. Thing of Beauty: New and Selected Works. University of California Press, 2007.

Anything and everything from Counterpath’s Using Electricity series.