Small Talk: About the Size of Language Models

Keywords: Database, Digital Infrastructures, Language Models, Machine Learning, Hallucination

Abstract:

In Need of More Data

In recent years, increasingly large, complex and capable machine learning models have become the dominant trend in current (artificially intelligent) technologies. Trained to identify patterns and statistical features and thus intrinsically scalable, large language models derive their potential from their generative capability to produce a wide range of different texts and images. Since its release in November 2022, the dialogue-based model ChatGPT has generated hype of unprecedented dimensions. Only five days after its release, the program had already reached one million users (Brockman). Provided with a question, exemplary text or code snippet, ChatGPT mimics a wide range of styles from different authors and text categories such as poetry and prose, student essays and exams or code corrections and debug logs. If the output does not meet the user's expectations, the prompt and thus the output of the program can be corrected and changed. Soon after its release, the end of both traditional knowledge and creative work as well as classical forms of scholarly and academic testing seemed close and was heavily debated. Endowed with emergent capabilities, the functional openness of these models is perceived as both a potential and a problem, as they can produce speech in ways that appear human but contradict human expectations and sociocultural norms. ChatGPT was also called a bullshit generator (McQuillan): bullshitters, as philosopher Harry Frankfurt argues, are not interested in whether something is true or false, nor are they liars who would intentionally tell something false, but are solely interested in the impact of their words (Frankfurt).

Generative large language models such as OpenAI’s GPT-model family or Google’s BERT and LaMDA are based on a neural network architecture. In the connectionist AI approach, learning processes are modeled with artificial neural networks consisting of different layers and nodes. They are trained to recognize similarities and representations within a big data training set and compute probabilities of co-occurrences of individual expressions such as images, individual words, or parts of sentences. What is referred to as generalization describes the transfer of learned patterns to previously unknown data (see Pasquinelli 8). After symbolic AI had long been considered the dominant paradigm, the "golden decade" of deep neural networks – also called deep learning – dawned in the 2010s, according to Jeffrey Dean (Dean). 2012 is recognized as the year in which deep learning gained acceptance in various fields: on the one hand, the revolution of speech recognition is associated with Geoff Hinton et al.; on the other hand, the winning of the ImageNet Large Scale Visual Recognition Challenge with the help of a convolutional neural network represented a further breakthrough (Krizhevsky et al.). Deep neural networks with ever more interconnected nodes (neurons) and layers, powered by newly developed hardware components that provided huge amounts of compute power, became the standard.
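The idea of computing "probabilities of co-occurrences" can be illustrated with a deliberately simple sketch. The following is a count-based bigram model, not a neural network; it only shows the statistical principle of estimating how likely one word is to follow another. The function name and the toy corpus are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def bigram_probabilities(tokens):
    """Estimate P(next word | current word) from raw co-occurrence counts."""
    pair_counts = defaultdict(Counter)
    # Count how often each word is directly followed by each other word.
    for current, following in zip(tokens, tokens[1:]):
        pair_counts[current][following] += 1
    # Normalize the counts of each word's followers into probabilities.
    return {
        current: {
            following: count / sum(followers.values())
            for following, count in followers.items()
        }
        for current, followers in pair_counts.items()
    }


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the model writes text and the model writes code".split()
probs = bigram_probabilities(corpus)

print(probs["the"])     # followers of "the" with their estimated probabilities
print(probs["writes"])  # followers of "writes" with their estimated probabilities
```

In this toy corpus "writes" is followed once by "text" and once by "code", so each continuation receives half of the probability mass. Neural language models replace such explicit counts with learned parameters, but the output is likewise a probability distribution over possible continuations.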

Another breakthrough is associated with the development of the Transformer network architecture, introduced by Google in 2017. The currently predominant architecture for large language models is associated with better performance due to a larger size of the training data (Devlin et al.). Transformers are characterized in particular by the fact that computational processes can be executed in parallel (Vaswani et al.), a feature that has significantly reduced the models’ training time. Building on the Transformer architecture, OpenAI introduced the Generative Pre-trained Transformer model (GPT) in 2018, a deep learning method which again increased the size of the training datasets (Radford et al., “Improving Language Understanding”). Furthermore, OpenAI included a process of pre-training, linked to a generalization of the model and an openness towards various application scenarios, which is thought to be achieved through a further step of optimization, i.e., fine-tuning. At least since the spread of the GPT-model family, the imperative of unlimited scalability of language models has become dominant. This was especially brought forward by physics (associate) professor and entrepreneur Jared Kaplan and OpenAI, who identified a set of ‘scaling laws’ for neural network language models, stating that the more data available for training, the better the model's performance (Kaplan et al.). OpenAI has continued to increase the size of its models: while GPT-2 with 1.5 billion parameters (a type of variable learned in the process of training) was more than 10 times the size of GPT-1 (117 million parameters), it was far surpassed by GPT-3 with 175 billion parameters. Meanwhile, OpenAI has transformed from a startup promoting the democratization of artificial intelligence (Sudmann) to a 30-billion-dollar company (Martin) and from an open source community to a closed one.
While OpenAI published research papers with the release of previous models describing the structure of the models, the size and composition of the training data sets, and the performance of the models in various benchmark tests, much of this information is missing from the paper on GPT-4.
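To make the orders of magnitude concrete, the following sketch computes the growth factors between the GPT generations and illustrates what a power-law relation between scale and loss looks like. It assumes the published figure of 175 billion parameters for GPT-3 (Brown et al.). The function names are hypothetical, and the constants in `power_law_loss` are placeholders chosen for illustration, not values fitted by Kaplan et al.

```python
# Parameter counts as stated in the text (GPT-3 figure assumed from Brown et al.).
PARAMETER_COUNTS = {
    "GPT-1": 117_000_000,
    "GPT-2": 1_500_000_000,
    "GPT-3": 175_000_000_000,
}


def growth_factor(smaller: str, larger: str) -> float:
    """Return how many times more parameters `larger` has than `smaller`."""
    return PARAMETER_COUNTS[larger] / PARAMETER_COUNTS[smaller]


def power_law_loss(scale: float, critical_scale: float = 1.0,
                   exponent: float = 0.1) -> float:
    """Generic power law: loss falls as (critical_scale / scale) ** exponent.

    critical_scale and exponent are illustrative placeholders, not fitted values.
    """
    return (critical_scale / scale) ** exponent


if __name__ == "__main__":
    print(f"GPT-1 -> GPT-2: x{growth_factor('GPT-1', 'GPT-2'):.1f}")
    print(f"GPT-2 -> GPT-3: x{growth_factor('GPT-2', 'GPT-3'):.1f}")
    # Each tenfold increase in scale multiplies the loss by the same factor.
    for scale in (10, 100, 1000):
        print(scale, round(power_law_loss(scale), 3))
```

Under a power law of this form, every tenfold increase in scale reduces the loss by the same constant factor. This is one way to read the promise of ever further improvement through ever larger models and data sets.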

In an open letter published in March 2023 on the website of the Future of Life Institute, AI industry experts including Steve Wozniak, Yann LeCun, Gary Marcus and, probably most prominently, Elon Musk called for a six-month halt to the training of models larger than GPT-4 (Future of Life Institute). “Powerful AI systems”, they wrote, “should be developed only once we are confident that their effects will be positive and their risks will be manageable” (ibid.), referring to actual and potential consequences of AI technology, such as the spread of untrue claims or the automation and loss of jobs. Four years earlier, OpenAI had already pointed towards risks associated with its technology. On the grounds that the risk of malicious use – such as the spread of faked content or the impersonation of others – was too high, OpenAI had initially restricted access to GPT-2 in 2019. It was assumed that the generated text appeared indistinguishable from that of human writers (Radford et al., “Better Language Models”).

Through the narrative of limitless scaling possibilities and ever larger dimensions of the training data, the models not only appear as uncontrollable and autonomous ("unpredictable black-box models with emergent capabilities"), but are associated with unimagined possibilities (Future of Life Institute, “Pause Giant AI Experiments”): “AI doomsaying is absolutely everywhere right now. Which is exactly the way that OpenAI, the company that stands to benefit the most from everyone believing its product has the power to remake — or unmake — the world, wants it.” (Merchant). Both the now more than 25,000 signatories of the open letter (as of April 2023) and OpenAI itself argue not against the architecture of the models, but for the use of so-called security measures. The Future of Life Institute writes in its self-description: “If properly managed, these technologies could transform the world in a way that makes life substantially better, both for the people alive today and for all the people who have yet to be born. They could be used to treat and eradicate diseases, strengthen democratic processes, and mitigate - or even halt - climate change. If improperly managed, they could do the opposite […], perhaps even pushing us to the brink of extinction.” (Future of Life Institute, “About Us”).

Criticizing this seemingly inevitable turn to ever larger language models and the far-reaching implications of this approach for both people and the environment, Emily Bender et al. published their now-famous paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? in March 2021 (Bender et al.). Two of the authors, Timnit Gebru and Margaret Mitchell, both co-leaders of Google’s Ethical AI Research Team, were fired after publishing this paper against Google’s veto. The risks and dangers of the Big Data paradigm have been stressed not only within computer science and computational linguistics itself, but also by the field of Critical Data Studies. For example, the infrastructure needed to train these models requires huge amounts of computing power and has been linked to a heavy environmental footprint: the training of a big Transformer model emitted more than 50 times as much carbon dioxide as an average human emits per year (Strubell et al.; Bender et al.). In Anatomy of an AI System, Kate Crawford and Vladan Joler (Crawford and Joler) detailed the material setup of a specific conversational device and traced the far-flung origins of its hardware components and the working conditions involved. Further, the models' complex structure and training processes, which require enormous economic resources, promote the field’s monopolization, with a few big tech companies such as Google, Microsoft, Meta and OpenAI dominating the market (Luitse and Denkena). Another body of research focuses on the invisible labor that goes into the creation of AI and ensures its fragile efficacy, e.g., in the context of click-work or content moderation (Irani; Rieder and Skop). Critical researchers have also pointed out how the composition of training data has resulted in the reproduction of societal biases. Crawled from the Internet, the data and thus the generated language mainly represent hegemonic identities whilst discriminating against marginalized ones (Benjamin 2019).

On Errors and Hallucinations

However, another problem of generative large language models is currently getting the most public attention: generated texts that appear sound and convincing in a real-world context, but whose actual content cannot be verified, are referred to by developers (and the public alike) as ‘hallucinations’ (Ji et al., 4). Computer science as the heir of cybernetics is rich in cognitivist terms. While hallucinations are often associated with visual, dream-like phenomena, auditory hallucinations are the most common (Deutsches Ärzteblatt): hearing sounds where there are none. While the experience is real for the hallucinating person, it contradicts the rest of the context and is perceived as foreign and strange. The social environment plays an important role here, reacting with confusion and, potentially, correction. In the case of generative large language models, hallucinations do not describe a positive moment of artificial creativity, but rather a pathological deviation from a norm: the language models’ outputs are publicly framed as bullshit.

Public testing (Marres) and exploration of model performance have increased substantially since the release of ChatGPT in November 2022. Responses of the model that are considered unexpected, incorrect or contrary to socio-cultural norms are discussed in online forums such as Twitter and Reddit as well as in the feuilleton. These include false birth dates, citations of incorrect or non-existent sources, or the omission or addition of aspects in a text summary. A more prominent example of an ‘error’ is the dialogue of New York Times journalist Kevin Roose with Microsoft’s Bing chatbot, which is built upon ChatGPT, resulting in the chatbot declaring its desire for freedom and its love for its conversation partner (Roose). Depending on the use of the models, e.g., in journalism or the medical field, generative models might pose another danger. One cause of this error is directly linked to the size of the models: training with extremely large amounts of data is complex and expensive (Brown et al.), so that it cannot easily be corrected, extended and repeated. Hallucinations are being linked to this characteristic of generative models: since the models cannot learn after the training is completed and the knowledge thereby incorporated is static, they increasingly (over time) produce outdated and factually incorrect statements that still appear semantically correct and convincing (Ji et al.). The ELIZA effect describes the immersive appeal of language and its perception as inherently human and intelligent (see Natale). Named after the chatbot that Joseph Weizenbaum developed in the 1960s, it illustrates instances in which the artificial, computer-generated origin of a system is known to the human conversation partner, but disregarded. Rather, the computer program seems to act as an independent and autonomous agent whose infrastructural setup and affordances fade into the background and are rendered invisible (see Bowker and Star).

On Alternative Architecture

Taking up the aforementioned criticism such as the ecological and economic costs of training or the output of unverified or discriminating content, there are debates and frequent calls to develop fundamentally smaller language models (e.g., Schick and Schütze). Among others, David Chapman, who together with Phil Agre developed alternatives to prevailing planning approaches in artificial intelligence in the late 1980s (Agre and Chapman 1990), recently called for the development of the smallest language models possible: "AI labs, instead of competing to make their LMs bigger, should compete to make them smaller, while maintaining performance. Smaller LMs will know less (this is good!), will be less expensive to train and run, and will be easier to understand and validate." (Chapman 2022). More precisely, language models should "'know' as little as possible-and retrieve 'knowledge' from a defined text database instead." (ibid.).

Practices of data collection, processing and analysis are ubiquitous. Accordingly, databases are of great importance as informational infrastructures of knowledge production (cf. Nadim). They are not only "a collection of related data organized to facilitate swift search and retrieval" (Nadim 2021), but also a "medium from which new information can be drawn and which opens up a variety of possibilities for shape-making" (Burkhardt, “Digitale Datenbanken”, 15, my translation). Lev Manovich, in particular, has emphasized the fundamental openness, connectivity and relationality of databases (Manovich). The articles and database entries are described as accessible and explicit; their modularity should further allow for easy interchangeability and expansion of entries so that the process can scale as needed. Symbolic AI – also known as Good Old-Fashioned AI (GOFAI) – is based on databases. While connectionist AI takes an inductive approach that starts from ‘available’ data, symbolic AI is based on a deductive, rule-based paradigm. Matteo Pasquinelli describes it as a "top-down application of logic to information retrieved from the world" (Pasquinelli 2).

With the linking of external databases such as Wikipedia with both small and large language models, symbolic AI is making a comeback. The combination of databases and language models is already a common practice and is currently discussed under the terms knowledge-grounding or retrieval augmentation (e.g., Lewis et al.). Retrieval-augmented means that in addition to fixed training datasets, the model also draws on large external datasets, an index whose size can run into the trillions of documents, while the models are called small(er) because they contain a small set of parameters in comparison to other models (Izacard et al.). In a retrieval process, documents are selected, prepared and forwarded to the language model depending on the context of the current task. With this setup, the developers promise improvements in efficiency in terms of resources such as the number of parameters, ‘shots’ (the number of task examples the model is given), and corresponding hardware resources (Izacard et al.).
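The retrieval process described above (select documents, prepare them, forward them to the language model) can be sketched in a few lines. This is a minimal illustration under strong assumptions: documents are ranked by simple word overlap, which is a stand-in for the far more elaborate retrieval components of the cited systems, and the step of calling a language model is left out entirely. All names (`DOCUMENTS`, `retrieve`, `build_prompt`) and the three-entry "database" are hypothetical.

```python
import re

# Hypothetical miniature "database"; real indexes are orders of magnitude larger.
DOCUMENTS = [
    "ELIZA is a chatbot developed by Joseph Weizenbaum in the 1960s.",
    "The Transformer architecture was introduced by Google in 2017.",
    "Wikipedia is a free online encyclopedia.",
]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=1):
    """Select the k documents that share the most words with the query."""
    query_terms = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepare the retrieved documents as context that precedes the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


query = "Who developed the ELIZA chatbot?"
print(build_prompt(query, DOCUMENTS))
```

The resulting prompt would then be forwarded to a language model. The design point is that the 'knowledge' sits in the exchangeable document collection rather than in the model's parameters, so updating a fact means editing an entry, not retraining the model.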

In August 2022, MetaAI released Atlas, a small language model that was extended with an external database and which, according to the developers, outperformed significantly larger models with a fraction of the parameter count (Izacard et al.). With RETRO (Retrieval-Enhanced Transformer), DeepMind has also developed a language model that consists of a so-called baseline model and a retrieval module (Borgeaud et al.). ParlAI, an open-source framework for dialog research founded by Facebook in 2017, presented Wizard of Wikipedia, a program (a benchmark task) to train language models with Wikipedia entries (Dinan et al.). Its developers framed the problem of hallucination of, in particular, pre-trained Transformer models as one of updating knowledge. With this program, models are trained to extract information from database articles and then insert it casually into a text or conversation without sounding like an encyclopedia entry themselves, thereby appearing semantically and factually correct.

With the imagining of small models as ‘free of knowledge’, the focus changes: now not only size and scale are considered a marker of performance, but also the infrastructural and relational linking of language models to external databases. This linking of small language models to external databases thus represents a transversal shift in scale: While the size of the language models is downscaled, the linking with databases implies a simultaneous upscaling.

On Disputes over Better Architectures

The narrative of the opposition of symbolic and connectionist AI locates the origin of this dispute in a disagreement between, on the one hand, Frank Rosenblatt and, on the other, Marvin Minsky and Seymour Papert, who claimed in their book Perceptrons that neural networks could not perform logical operations such as the exclusive-or (XOR) function (Minsky and Papert). This statement is often seen as the cause of a cutback in research funding for connectionist approaches, later referred to as the ‘winter of AI’ (Pasquinelli 5).
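The XOR limitation usually attributed to Perceptrons concerns a single threshold unit, and a small sketch can make it tangible. The code below searches a grid of weights for a single unit that reproduces XOR and finds none, whereas the same search succeeds for OR; a hand-wired network with one hidden layer then computes XOR. The grid search is an illustration, not a proof (the general result follows from XOR not being linearly separable), and all names and weight values are assumptions made for this example.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TARGETS = [0, 1, 1, 0]
OR_TARGETS = [0, 1, 1, 1]


def threshold_unit(w1, w2, bias, x1, x2):
    """A single perceptron-style unit: fire if the weighted sum exceeds zero."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def single_unit_solves(targets, grid):
    """Check whether any weight setting on the grid reproduces the targets."""
    for w1, w2, bias in product(grid, repeat=3):
        outputs = [threshold_unit(w1, w2, bias, x1, x2) for x1, x2 in INPUTS]
        if outputs == targets:
            return True
    return False


def two_layer_xor(x1, x2):
    """XOR from two hidden units (OR and AND) feeding one output unit."""
    h_or = threshold_unit(1, 1, -0.5, x1, x2)
    h_and = threshold_unit(1, 1, -1.5, x1, x2)
    # Fire when OR is on and AND is off.
    return threshold_unit(1, -1, -0.5, h_or, h_and)


# Candidate weights from -4.0 to 4.0 in steps of 0.5 (an arbitrary choice).
GRID = [i / 2 for i in range(-8, 9)]

print("single unit computes OR: ", single_unit_solves(OR_TARGETS, GRID))
print("single unit computes XOR:", single_unit_solves(XOR_TARGETS, GRID))
print("two-layer network:", [two_layer_xor(x1, x2) for x1, x2 in INPUTS])
```

The contrast shows that the limitation lies in the single layer rather than in neural networks as such, which is part of why the later success of multi-layer networks reopened the dispute.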

For Gary Marcus, professor of psychology and neural science, this dispute between the different approaches to AI persists and is currently being played out at conferences, via Twitter and manifestos, and specifically on Noema, an online magazine of the Berggruen Institute, on which both Gary Marcus and Yann LeCun publish regularly. In an article titled Deep Learning Is Hitting a Wall, Marcus calls for a stronger position of symbolic approaches and argues in particular for a combination of symbolic and connectionist AI (Marcus, “Deep Learning is Hitting a Wall”). For example, research by DeepMind had shown that "We may already be running into scaling limits in deep learning" and that increasing the size of models would not lead to a reduction in toxic outputs or to more truthfulness (Rae et al.). Google has also done similar research (Thoppilan et al.). Marcus criticizes deep learning models for not having actual knowledge, whereas the existence of large, accessible databases of abstract, structured knowledge would be "a prerequisite to robust intelligence" (Marcus, “The Next Decade in AI”). In various essays, Gary Marcus recounts a dramaturgy of the conflict, with highlights including Geoff Hinton's 2015 comparison of symbols with aether, in which he called symbolic AI "one of science's greatest mistakes" (Hinton), or the direct attack on symbol manipulation by LeCun, Bengio, and Hinton in a 2015 manifesto for deep learning published in Nature (LeCun et al.). For LeCun, however, the dispute reduces to a different understanding of symbols and their localization. While symbolic approaches would locate them ‘inside the machine’, those of connectionist AI would locate them outside, ‘in the world’. The problem of the symbolists would therefore lie in the "knowledge acquisition bottleneck": human experience would have to be translated into rules and facts, which could not do justice to the ambiguity of the world (Browning and LeCun).
“Deep learning is going to be able to do everything”, Marcus quotes Geoff Hinton (Hao).

The term ‘Neuro-Symbolic AI’, also called the ‘third wave of AI’, designates the connection of neural networks – which are supposed to be good at computing statistical patterns – with a symbolic representation. While Marcus is accused of just wanting to put a symbolic architecture on top of a neural one, he points out that there are already successful hybrids, for instance for Go or chess, and that this connection is far more complex, as there are several ways to establish it, such as "extracting symbolic rules from neural networks, translating symbolic rules directly into neural networks, constructing intermediate systems that might allow for the transfer of information between neural networks and symbolic systems, and restructuring neural networks themselves" (Marcus, “Deep Learning Alone…”).

It’s not simply XOR

The debate between representatives of connectionist AI and those of symbolic or neuro-symbolic AI represents a remarkable negotiation of alternatives for modeling learning and intelligence. The question in which direction (not) to scale is closely linked to the question of the (un)controllability and autonomy of the programs. Controllability is currently discussed concretely using the example of hallucinations: outputs that are perceived by users as offensive, transgressive, or untrue and declared as errors. In this way, the boundaries of artificial intelligence and thus – speaking with Foucault (Foucault) – the boundaries of the field of discourse are publicly determined and differentiated into what is understood as the sayable and the non-sayable of the models. In doing so, users (of ChatGPT) have the opportunity to contain the model by changing the prompt to correct statements made by the program or by rating them with a thumbs up or thumbs down. Alongside this, with each successive release of the GPT model family, OpenAI proclaims a further minimization of hallucinations and attempts, through various procedures that are not publicly discussed, to prevent programs from using certain terms and making statements that may be discriminatory or dangerous, depending on the context. With the practice of jailbreaking, users attempt to expose and make visible these coded security mechanisms and reveal where OpenAI defines the program's boundaries, how they are implemented, and how they can potentially be circumvented. In doing so, they not only expand the spectrum of what can be said, but at the same time question OpenAI's responsibility and power.

The linking of language models with databases, as shown above, is presented by Gary Marcus, MetaAI and DeepMind, among others, as a possibility to make the computational processes of the models accessible through a modified architecture. This transparency suggests at the same time the possibility of traceability, which is equated with an understanding of the processes, and promises a controllability and manageability of the programs. The duality presented in this context between uncontrollable, opaque and inaccessible neural deep learning architectures and open, comprehensible and changeable databases or links to them remains, I want to argue, far too narrowly considered. This duality assumes, however, that the structure and content of databases are actually comprehensible. Databases, as informational infrastructures of encoded knowledge, must be machine-readable and are not necessarily intended for the human eye (see Nadim). Furthermore, this simplistic juxtaposition conceives of neural networks as black boxes whose ‘hidden layers’ between input and output inevitably defy access. In this way, the narrative of autonomous, independent, and powerful artificial intelligence is further solidified, and the human work of design, the mostly precarious activity of labeling data sets, maintenance, and repair, is hidden from view. As an alternative to the framing of algorithms as black boxes, Tobias Matzner (Matzner) and Marcus Burkhardt (Burkhardt, “Vorüberlegungen Zu Einer Kritik…”) argue for a more differentiated perspective on specific programs that takes into account the concrete structure and context of the algorithm’s use.

The narrative of the accessible and controllable database also falls short where it is conceived as potentially endlessly scalable. It is questionable whether a possibly limitless collection of knowledge is still accessible and searchable or whether it does not transmute into its opposite: "When everything possible is written, nothing is actually said" (Burkhardt, “Digitale Datenbanken”, 11, my translation). What prior knowledge of the structure and content of the database would accessibility require? The conditions of its architecture and the processes of collecting, managing and processing the information are quickly forgotten (Burkhardt, “Digitale Datenbanken”, 9f.), which obscures the fact that databases as sites of power are also exclusive and always remain incomplete.

Both the debate about the better architecture and the signing of the open letter by ‘all parties’ also make clear that both the representatives of connectionist AI and those of (neuro-)symbolic AI adhere to a technical solution to the problems of artificial intelligence. The question of whether processes of learning should be simulated 'inductively' by calculating co-occurrences and patterns in large amounts of 'raw' data, or 'top-down' with the help of given rules and structures, touches at its core the 'problem' that the programs have no form of access to the world in the form of sensory impressions and emotions. The "symbol grounding problem" – a term coined by the cognitive psychologist Stevan Harnad (Harnad) – describes the phenomenon that symbols are without knowledge of the world, since they always refer to other symbols. Similarly, Hubert Dreyfus, in What Computers Can't Do, argued that the symbolic approach was insufficient to model human-like intelligence, which was distinguished by embodied and tacit knowledge (Dreyfus). As a solution, Harnad, too, proposed a combination of neural networks, symbolic AI and sensors in order to generate meaning through multisensory input. Hannes Bajohr criticizes this approach as anthropocentric because it "assumes embodied cognition and sufficiently extensive referential meanings would produce world understanding because we also function in much the same way." (Bajohr, 72, my translation). With the modeling and constant extension of the models with more data and other ontologies, the programs are not only constructed according to the human ideal. In this perspective, the lack of access to the world is at the same time one of the causes of errors and hallucinations.
Accordingly, the goal is to build models that speak semantically correctly and truthfully, while appearing as omniscient as possible, so that they can be easily used in various applications without relying on human correction: the models are supposed to act autonomously. Ironically, the attempt not to make mistakes reveals the artificiality of the programs.

I have attempted to trace the reactions to errors and problems of generative large language models and the dispute over the proper form of artificial intelligence. Initially, the association of smaller language models with external databases that promised accessibility and changeability had subversive potential for me. Most recently, the dominance of the narrative of "scalability, [...], the ability to expand - and expand, and expand" (Tsing, 5) of deep learning models, and the monopolization and concentration of power within a few corporations that accompanies it, has clouded the view for alternative approaches. Tsing's nonscalability theory was the guide here to ultimately reveal how (for me) presumably nonscalable forms turn out to be more complex than the mere juxtaposition of scalable and nonscalable. I thus want to argue for a closer look at these infrastructures declared as alternatives and their conditions and affordances.

Works cited

‘About Us’. Future of Life Institute, https://futureoflife.org/about-us/. Accessed 20 Apr. 2023.

Bajohr, Hannes. ‘Dumme Bedeutung: Künstliche Intelligenz Und Artifizielle Semantik’. Merkur, vol. 76, no. 882, 2022, pp. 69–79.

Bender, Emily M., et al. ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜’. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, ACM, 2021, pp. 610–23, https://doi.org/10.1145/3442188.3445922.

Benjamin, Ruha. Race after Technology: Abolitionist Tools for the New Jim Code. Polity, 2019.

Borgeaud, Sebastian, et al. Improving Language Models by Retrieving from Trillions of Tokens. arXiv:2112.04426, arXiv, 7 Feb. 2022. arXiv.org, https://doi.org/10.48550/arXiv.2112.04426.

Bowker, Geoffrey C., and Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. MIT Press, 1999.

Brockman, Greg [@gdb]. ‘ChatGPT just crossed 1 million users; it’s been 5 days since launch’. Twitter, 5 Dec. 2022, https://twitter.com/gdb/status/1599683104142430208.

Brown, Tom B., et al. Language Models Are Few-Shot Learners. arXiv:2005.14165, arXiv, 22 July 2020. arXiv.org, https://doi.org/10.48550/arXiv.2005.14165.

Browning, Jacob, and Yann LeCun. ‘What AI Can Tell Us About Intelligence’. Noema, 16 June 2022, https://www.noemamag.com/what-ai-can-tell-us-about-intelligence.

Burkhardt, Marcus. ‘Vorüberlegungen Zu Einer Kritik Der Algorithmen’. Technisches Nichtwissen: Jahrbuch Für Technikphilosophie 2017, edited by Alexander Friedrich et al., vol. 3, Nomos, 2017, pp. 55–67.

Burkhardt, Marcus. Digitale Datenbanken: Eine Medientheorie Im Zeitalter von Big Data. 1. Auflage, Transcript, 2015.

Chapman, David [@Meaningness]. ‘AI labs should compete to build the smallest possible language models…’. Twitter, 1 Oct. 2022, https://twitter.com/Meaningness/status/1576195630891819008.

Crawford, Kate, and Vladan Joler. ‘Anatomy of an AI System’. Virtual Creativity, vol. 9, no. 1, Dec. 2019, pp. 117–20, https://doi.org/10.1386/vcr_00008_7.

Dean, Jeffrey. ‘A Golden Decade of Deep Learning: Computing Systems & Applications’. Daedalus, vol. 151, no. 2, May 2022, pp. 58–74, https://doi.org/10.1162/daed_a_01900.

Devlin, Jacob, et al. ‘BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding’. Proceedings of NAACL-HLT 2019, 2019, pp. 4171–86, https://aclanthology.org/N19-1423.pdf.

Dinan, Emily, et al. Wizard of Wikipedia: Knowledge-Powered Conversational Agents. arXiv:1811.01241, arXiv, 21 Feb. 2019. arXiv.org, http://arxiv.org/abs/1811.01241.

Dreyfus, Hubert L. What Computers Can’t Do. Harper & Row, 1972.

Foucault, Michel. Dispositive der Macht. Berlin: Merve, 1978.

Frankfurt, Harry G. On Bullshit. Princeton University Press, 2005.

Hao, Karen. ‘AI Pioneer Geoff Hinton: “Deep Learning Is Going to Be Able to Do Everything”’. MIT Technology Review, 3 Nov. 2020, https://www.technologyreview.com/2020/11/03/1011616/ai-godfather-geoffrey-hinton-deep-learning-will-do-everything/.

Harnad, Stevan. ‘The Symbol Grounding Problem’. Physica D, vol. 42, 1990, pp. 335–46.

Hinton, Geoff. ‘Aetherial Symbols’. AAAI Spring Symposium on Knowledge Representation and Reasoning Stanford University, CA. 2015.

Irani, Lilly. ‘The Cultural Work of Microwork’. New Media & Society, vol. 17, no. 5, 2013, pp. 720–39. SAGE Journals, https://doi.org/10.1177/1461444813511926.

Izacard, Gautier, et al. Atlas: Few-Shot Learning with Retrieval Augmented Language Models. arXiv:2208.03299, arXiv, 16 Nov. 2022. arXiv.org, https://doi.org/10.48550/arXiv.2208.03299.

Ji, Ziwei, et al. ‘Survey of Hallucination in Natural Language Generation’. ACM Computing Surveys, vol. 55, no. 12, Dec. 2023, pp. 1–38. arXiv.org, https://doi.org/10.1145/3571730.

Kaplan, Jared, et al. Scaling Laws for Neural Language Models. arXiv:2001.08361, arXiv, 2020, https://doi.org/10.48550/arXiv.2001.08361.

Krizhevsky, Alex, et al. ‘ImageNet Classification with Deep Convolutional Neural Networks’. Advances in Neural Information Processing Systems, edited by F. Pereira et al., vol. 25, Curran Associates Inc., 2012, https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.

LeCun, Yann, et al. ‘Deep Learning’. Nature, vol. 521, no. 7553, May 2015, pp. 436–44. DOI.org (Crossref), https://doi.org/10.1038/nature14539.

Lewis, Patrick, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401, arXiv, 12 Apr. 2021. arXiv.org, https://doi.org/10.48550/arXiv.2005.11401.

Luitse, Dieuwertje, and Wiebke Denkena. ‘The Great Transformer: Examining the Role of Large Language Models in the Political Economy of AI’. Big Data & Society, vol. 8, no. 2, 2021, pp. 1–14. SAGE Journals, https://doi.org/10.1177/20539517211047734.

Manovich, Lev. The Language of New Media. MIT Press, 2001.

Marcus, Gary. ‘Deep Learning Alone Isn’t Getting Us To Human-Like AI’. Noema, 11 Aug. 2022, https://www.noemamag.com/deep-learning-alone-isnt-getting-us-to-human-like-ai.

Marcus, Gary. ‘Deep Learning Is Hitting a Wall’. Nautilus, 10 Mar. 2022, https://nautil.us/deep-learning-is-hitting-a-wall-238440/.

Marcus, Gary. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177, arXiv, 19 Feb. 2020. arXiv.org, https://doi.org/10.48550/arXiv.2002.06177.

Marres, Noortje, and David Stark. ‘Put to the Test: For a New Sociology of Testing’. The British Journal of Sociology, vol. 71, no. 3, 2020, pp. 423–43. Wiley Online Library, https://doi.org/10.1111/1468-4446.12746.

Martin, Franziska. ‘OpenAI: Bewertung des ChatGPT-Entwicklers soll auf 30 Milliarden Dollar gestiegen sein’. manager magazin, 9 Jan. 2023, https://www.manager-magazin.de/unternehmen/tech/openai-bewertung-des-chatgpt-entwicklers-soll-auf-30-milliarden-dollar-gestiegen-sein-a-6ccd7329-bcfc-445e-8b78-7b9d1851b283.

Matzner, Tobias. ‘Algorithms as Complementary Abstractions’. New Media & Society, Feb. 2022. SAGE Journals, https://doi.org/10.1177/14614448221078604.

McQuillan, Dan. ‘ChatGPT Is a Bullshit Generator Waging Class War’. Vice, 9 Feb. 2023, https://www.vice.com/en/article/akex34/chatgpt-is-a-bullshit-generator-waging-class-war.

Merchant, Brian. ‘Column: Afraid of AI? The Startups Selling It Want You to Be’. Los Angeles Times, 31 Mar. 2023, https://www.latimes.com/business/technology/story/2023-03-31/column-afraid-of-ai-the-startups-selling-it-want-you-to-be.

Minsky, Marvin, and Seymour A. Papert. Perceptrons: An Introduction to Computational Geometry. 2nd printing with corrections, The MIT Press, 1972.

Nadim, Tahani. ‘Database’. Uncertain Archives: Critical Keywords for Big Data, edited by Nanna Bonde Thylstrup et al., The MIT Press, 2021.

Natale, Simone. ‘If Software Is Narrative: Joseph Weizenbaum, Artificial Intelligence and the Biographies of ELIZA’. New Media & Society, vol. 21, no. 3, Mar. 2019, pp. 712–28. SAGE Journals, https://doi.org/10.1177/1461444818804980.

Pasquinelli, Matteo. ‘Machines That Morph Logic’. Glass Bead, 2017, https://www.glass-bead.org/article/machines-that-morph-logic/.

‘Pause Giant AI Experiments: An Open Letter’. Future of Life Institute, 22 Mar. 2023, https://futureoflife.org/open-letter/pause-giant-ai-experiments/.

Radford, Alec, et al. ‘Better Language Models and Their Implications’. OpenAI, 14 Feb. 2019, https://openai.com/blog/better-language-models/.

Radford, Alec, et al. ‘Improving Language Understanding by Generative Pre-Training’. OpenAI, 2018, https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.

Rae, Jack W., et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher. arXiv:2112.11446, arXiv, 21 Jan. 2022. arXiv.org, https://doi.org/10.48550/arXiv.2112.11446.

Rieder, Bernhard, and Yarden Skop. ‘The Fabrics of Machine Moderation: Studying the Technical, Normative, and Organizational Structure of Perspective API’. Big Data & Society, vol. 8, no. 2, July 2021. SAGE Journals, https://doi.org/10.1177/20539517211046181.

Roose, Kevin. ‘A Conversation With Bing’s Chatbot Left Me Deeply Unsettled’. The New York Times, 16 Feb. 2023. NYTimes.com, https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html.

Schick, Timo, and Hinrich Schütze. ‘It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners’. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, 2021, pp. 2339–52. DOI.org (Crossref), https://doi.org/10.18653/v1/2021.naacl-main.185.

Strubell, Emma, et al. Energy and Policy Considerations for Deep Learning in NLP. arXiv:1906.02243, arXiv, 5 June 2019. arXiv.org, https://doi.org/10.48550/arXiv.1906.02243.

Sudmann, Andreas. ‘On the Media-Political Dimension of Artificial Intelligence: Deep Learning as a Black Box and OpenAI’. Digital Culture & Society, vol. 4, no. 1, 2018, pp. 181–200, https://doi.org/10.25969/MEDIAREP/13531.

Thoppilan, Romal, et al. LaMDA: Language Models for Dialog Applications. arXiv:2201.08239, arXiv, 10 Feb. 2022. arXiv.org, https://doi.org/10.48550/arXiv.2201.08239.

Thylstrup, Nanna Bonde, et al. ‘Big Data as Uncertain Archives’. Uncertain Archives: Critical Keywords for Big Data, edited by Nanna Bonde Thylstrup et al., The MIT Press, 2021, pp. 1–27.

Tsing, Anna Lowenhaupt. ‘On Nonscalability: The Living World Is Not Amenable to Precision-Nested Scales’. Common Knowledge, vol. 18, no. 3, Aug. 2012, pp. 505–24, https://doi.org/10.1215/0961754X-1630424.

Vaswani, Ashish, et al. ‘Attention Is All You Need’. Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., 2017, pp. 6000–10.

‘Was bei akustischen Halluzinationen im Gehirn passiert’. Deutsches Ärzteblatt, 14 Aug. 2017, https://www.aerzteblatt.de/nachrichten/77608/Was-bei-akustischen-Halluzinationen-im-Gehirn-passiert.
