Small Talk: About the Size of Language Models
Susanne Förster
Large language models are based on the promise that the larger the underlying data set, the better the performance. This development is closely tied to the Transformer network architecture, which was introduced by Google researchers in 2017 and underlies GPT-3 and other leading large language models.
Many critical researchers have pointed out how the composition of training data has resulted in the reproduction of societal biases. Crawled from the Internet, the data, and thus the generated language, mainly represent hegemonic identities whilst discriminating against marginalized ones (Benjamin 2019). Other authors have observed that the models' output can be semantically correct but factually wrong, which may pose additional dangers if adopted in journalism or medicine (Bender et al. 2021). Furthermore, the knowledge incorporated in the language models is static, implicit and thus inaccessible: since the models cannot learn after training is completed, they inevitably produce outdated and factually incorrect statements.
In response, there have been demands to scale the models down – i.e., to train them with the smallest possible number of parameters. The technology entrepreneur David Chapman recently tweeted: "AI labs […] should compete to make [language models] smaller, while maintaining performance. Smaller LMs will know less (this is good!), will be less expensive to train and run, and will be easier to understand and validate" (Chapman 2022). Moreover, such models should "retrieve 'knowledge' from a defined text database instead" (ibid.).
Linking external databases such as Wikipedia to large language models is already a common practice. The databases are considered sources of comprehensive and up-to-date knowledge. The models are trained to extract information from database articles and to insert it casually into a text or conversation, without themselves sounding like an encyclopedia entry, thereby appearing semantically and factually correct. Since the articles and database entries are accessible and interchangeable, the process can scale as needed.
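What such a retrieval-based setup might look like can be illustrated with a minimal sketch. The database entries, the toy bag-of-words "embedding" and the prompt template below are hypothetical placeholders standing in for a learned retriever and a generative model; they do not reproduce any particular system, only the retrieve-then-generate pattern described above.

```python
# Minimal sketch of the retrieve-then-generate pattern: the model is not
# expected to "know" facts itself; instead, a relevant passage is fetched
# from an external database and prepended to the prompt. All names and
# entries here are illustrative assumptions.

import math
from collections import Counter

# Stand-in "external database" of encyclopedia-style entries.
DATABASE = {
    "transformer": "The Transformer is a neural network architecture introduced in 2017.",
    "wikipedia": "Wikipedia is a collaboratively edited online encyclopedia.",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """Return the database entry most similar to the query."""
    q = embed(query)
    return max(DATABASE.values(), key=lambda doc: cosine(q, embed(doc)))

def build_prompt(query: str) -> str:
    """Prepend the retrieved passage so a (small) language model can paraphrase it."""
    context = retrieve(query)
    return f"Context: {context}\nQuestion: {query}\nAnswer conversationally:"

print(build_prompt("When was the Transformer architecture introduced?"))
```

In such a setup, exchanging or updating the entries in the database is what makes the encapsulated knowledge accessible and interchangeable, without retraining the language model itself.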
With the imagining of small models as “free of knowledge”, the focus changes: not only size and scale are considered markers of performance, but also the infrastructural and relational linking of language models to external databases. This linking thus represents a transversal shift in scale: while the size of the language models is scaled down, the coupling with databases implies a simultaneous upscaling. By linking language models to databases and archives, the world once more appears computable and thereby knowable. In this regard, the architecture follows a colonial logic. At the same time, it has subversive potential, as it has been opened up to a variety of actors outside of Big Tech and thus might be considered a Minor Tech – or a Minor Tech in waiting.