Toward a Minor Tech:Foerster

Small Talk: About the Size of Language Models

Since the end of the 2010s, a trend towards the development of increasingly large, complex and capable machine learning models can be observed (Dean 2022). In February 2020, Microsoft released Turing-NLG (T-NLG), the largest language model to date with 17 billion parameters (Rosset 2020). Just three months later, it was far surpassed by OpenAI's GPT-3 with 175 billion parameters (Brown et al. 2020). Large language models are based on the promise that the larger the underlying data set, the better the performance. Indeed, the generated text often seems so coherent and convincing that it can be difficult to distinguish from human language.

However, the big data paradigm comes at a high cost: both the size and the architecture of large language models have been associated with various risks and dangers, including a heavy environmental footprint, the reproduction of societal biases and the creation of deceitful language (Bender et al. 2021; Luitse and Denkena 2021). The infrastructure needed to train these models requires huge amounts of computing power. The training is extremely expensive and time-consuming and thereby promotes the monopolization of the field, with a few big tech companies such as Google, Microsoft, Meta and OpenAI dominating the market.

In what follows, however, I will focus particularly on the knowledge that is implicitly inscribed in large language models. Many critical researchers have pointed out the problematic composition of the training data, which results in the aforementioned bias. Crawled from the Internet, the data and thus the generated language mainly represent hegemonic, i.e., mostly white, privileged, young, male and anglophone perspectives that have discriminatory effects on marginalized positions (West et al. 2019; Benjamin 2019). Other authors have pointed to the difference between form and content (Bunz 2019; Bender et al. 2021). The language may be formally correct and convincing, but the meaning of the statements is merely calculated and may be fundamentally wrong. Depending on where the models are used, e.g., in journalism or the medical field, this characteristic of large language models might pose another danger. On the grounds that the risk of malicious use was too high, OpenAI initially restricted access to GPT-2 in 2019 and thereby further increased attention and the program's appeal. It was assumed that the generated text appeared indistinguishable from that of human writers. Various forms of fakes, such as misleading news articles, automated fake content or the impersonation of others, were considered a risk that OpenAI wanted to avoid (OpenAI 2019). Moreover, the knowledge incorporated in the language models is static: since the models cannot learn after training is completed, they increasingly produce outdated and factually incorrect statements.

Taking up this criticism, there are debates and frequent calls to develop fundamentally smaller language models. Among others, David Chapman, who together with Phil Agre developed alternatives to prevailing planning approaches in artificial intelligence in the late 1980s (Agre and Chapman 1990), recently called for the development of the smallest language models possible: "AI labs, instead of competing to make their LMs bigger, should compete to make them smaller, while maintaining performance. Smaller LMs will know less (this is good!), will be less expensive to train and run, and will be easier to understand and validate." (Chapman 2022). More precisely, language models should "'know' as little as possible-and retrieve 'knowledge' from a defined text database instead." (ibid.). In August 2022, Meta AI released Atlas, a comparatively small language model extended with an external database, which outperformed significantly larger models with a fraction of their parameters (Izacard et al. 2022).

Small language models are seen as an opportunity to reduce the ecological and economic costs of training (e.g., Schick and Schütze 2021). In addition, they are expected to be more transparent and thus easier to control and adapt. The previously dominant politics of scale, I want to argue, is now joined by a politics of architecture. This shift might change the existing market structure and make the development of language models accessible to smaller and more diverse players.

Linking large language models with external databases as sources of up-to-date knowledge and information is already a common practice and is currently discussed under the terms knowledge grounding and retrieval augmentation (e.g., Lewis et al. 2020).
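
To make the mechanism concrete, the following Python sketch illustrates the basic retrieval-augmentation loop. It is a minimal illustration under stated assumptions, not any of the cited systems: the database entries, the toy bag-of-words retriever and the small_lm_generate stub are hypothetical stand-ins; in practice the retriever would be a trained neural retriever and the generator a (small) language model.

```python
# A minimal sketch of retrieval augmentation: retrieve entries from an
# external text database and condition the generator on them.
# Database entries, retriever and model stub are hypothetical placeholders.
from collections import Counter
import math

# The "external database": a set of interchangeable text entries.
DATABASE = [
    "GPT-3 is a large language model with 175 billion parameters.",
    "Atlas is a retrieval-augmented language model released by Meta AI in 2022.",
    "Retrieval augmentation grounds generated text in an external text database.",
]

def bag_of_words(text):
    """Turn a text into a term-frequency vector (a toy stand-in for a retriever)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, database, k=2):
    """Return the k database entries most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(database, key=lambda doc: cosine_similarity(q, bag_of_words(doc)),
                    reverse=True)
    return ranked[:k]

def small_lm_generate(prompt):
    """Stub standing in for a small language model; echoes the prompt so the sketch runs."""
    return "[small model output conditioned on]\n" + prompt

def answer(query, database):
    """Ground the generation: retrieve passages, then condition the model on them."""
    passages = retrieve(query, database)
    prompt = "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + query + "\nAnswer:"
    return small_lm_generate(prompt)

print(answer("How many parameters does GPT-3 have?", DATABASE))
```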

While the knowledge in large language models is implicit and shows up only performatively in the generated language, the knowledge in the databases is explicit, accessible and adaptable. The models are trained to extract information from database articles in a structured way and to insert it casually into a text or conversation without sounding like an encyclopedia entry themselves. The articles and database entries can be exchanged, so the process can scale as needed.
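
Continuing the sketch above (all names hypothetical), exchanging or adding database entries is enough to change what the system can draw on; the language model itself is not retrained:

```python
# Exchanging or adding entries updates the accessible "knowledge"
# without touching the language model itself.
updated_database = DATABASE + [
    "The German-language Wikipedia contained roughly 800,000 biographies in 2022.",
]
print(answer("How many biographies does the German-language Wikipedia contain?",
             updated_database))
```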

I thus want to argue that the linking of small language models with external databases represents a transversal shift in scale: while the size of the language models is downscaled, the linking with databases implies an upscaling.

However, the databases might be another source of bias and potential misinformation. Wikipedia is frequently selected as a knowledge base in order to ground the conversation in broad and up-to-date information (Dinan et al. 2019). Yet Wikipedia is a contested space: more than 90 percent of its authors are male and come primarily from Europe and the USA (Meta contributors 2018). Of the roughly 800,000 biographies in the German-language version, only 16 percent cover women (Wikipedia 2022). Moreover, Wikipedia articles are not necessarily factually correct.

This case nevertheless opens up a discussion about alternative, sustainable and potentially non-discriminatory language model architectures and their training data, as well as fundamental questions such as: how should machines speak, and who has the authority and the means to build and adapt these programs?

References

Agre, Philip E. and David Chapman. 1990. "What Are Plans for?" In Pattie Maes, ed. Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back. Cambridge: MIT Press, 17-34.

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Virtual Event Canada: ACM.

Benjamin, Ruha. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity.

Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” arXiv.

Bunz, Mercedes. 2019. “The Calculation of Meaning: On the Misunderstanding of New Artificial Intelligence as Culture.” Culture, Theory and Critique 60 (3–4): 264–78.

Chapman, David [Meaningness]. (2022, October 1). AI labs should compete to build the smallest possible language models … [Tweet]. Twitter. https://twitter.com/Meaningness/status/1576195630891819008

Dean, Jeffrey. 2022. “A Golden Decade of Deep Learning: Computing Systems & Applications.” Daedalus 151 (2): 58–74.

Dinan, Emily, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. 2019. “Wizard of Wikipedia: Knowledge-Powered Conversational Agents.” arXiv. http://arxiv.org/abs/1811.01241.

Izacard, Gautier, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022. “Few-Shot Learning with Retrieval Augmented Language Models.” arXiv.

Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al. 2020. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Luitse, Dieuwertje, and Wiebke Denkena. 2021. “The Great Transformer: Examining the Role of Large Language Models in the Political Economy of AI.” Big Data & Society 8 (2): 1–14.

Meta contributors. 2018. "Community Insights/2018 Report/Contributors," Meta, discussion about Wikimedia projects, https://meta.wikimedia.org/w/index.php?title=Community_Insights/2018_Report/Contributors&oldid=19625522 (last retrieved December 15, 2022).

OpenAI. 2019. “Better Language Models and Their Implications.” February 14, 2019. https://openai.com/blog/better-language-models/. (Last retrieved: December 15, 2022)

Rosset, Corby. 2020. “Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft.” Microsoft Research Blog (blog). February 10, 2020. https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/.

Schick, Timo, and Hinrich Schütze. 2021. “It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.” arXiv. April 12, 2021.

West, Sarah Myers, Meredith Whittaker, and Kate Crawford. 2019. Discriminating Systems: Gender, Race, and Power in AI. AI Now Institute.

Wikipedia. 2022. “Portal:Frauen/Biografien/Statistiken.” https://de.wikipedia.org/wiki/Portal:Frauen/Biografien/Statistiken (last retrieved December 15, 2022).