Toward a Minor Tech:Foerster5000 - Revision history

Simoon: /* The Bigger the Better?!The Size of Language Models and the Dispute over Alternative Architectures */

2023-08-22T09:48:08Z

The Bigger the Better?!The Size of Language Models and the Dispute over Alternative Architectures

← Older revision		Revision as of 09:48, 22 August 2023
Line 65:		Line 65:

	Conversational AI and generative models in particular are already an integral part of everyday processes of text and image production. The technically generated outputs produce a socially dominant understanding of reality, whose fractures and processes of negotiation are evident in the discussions about hallucinations and jailbreaking. It is therefore of great importance to follow and critically analyze both the technical (‘alternative’) architectures and affordances as well as the assumptions, interests, and power structures of the dominant (individual) actors (Musk, Altman, LeCun, etc.) and big tech corporations that are interwoven with them.		Conversational AI and generative models in particular are already an integral part of everyday processes of text and image production. The technically generated outputs produce a socially dominant understanding of reality, whose fractures and processes of negotiation are evident in the discussions about hallucinations and jailbreaking. It is therefore of great importance to follow and critically analyze both the technical (‘alternative’) architectures and affordances as well as the assumptions, interests, and power structures of the dominant (individual) actors (Musk, Altman, LeCun, etc.) and big tech corporations that are interwoven with them.

			<div class="page-break"></div>

	== Works cited ==		== Works cited ==

Simoon at 20:29, 21 July 2023

2023-07-21T20:29:38Z

← Older revision		Revision as of 20:29, 21 July 2023
Line 1:		Line 1:
	__NOTOC__		__NOTOC__
	~~= The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures =~~
	'''Susanne Förster'''		'''Susanne Förster'''
			= The Bigger the Better?!<br>The Size of Language Models and the Dispute over Alternative Architectures =

			<span class="running-header">The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures</span>

	== Abstract ==		== Abstract ==

Manetta: /* The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures */

2023-07-21T18:35:38Z

The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures

← Older revision		Revision as of 18:35, 21 July 2023
Line 65:		Line 65:

	== Works cited ==		== Works cited ==
			<div class="workscited">
	Agre, Philip E., and David Chapman. "What Are Plans For?" ''Robotics and Autonomous Systems'', vol. 6, no. 1, June 1990, pp. 17–34. ''ScienceDirect'', https://doi.org/10.1016/S0921-8890(05)80026-0.		Agre, Philip E., and David Chapman. "What Are Plans For?" ''Robotics and Autonomous Systems'', vol. 6, no. 1, June 1990, pp. 17–34. ''ScienceDirect'', https://doi.org/10.1016/S0921-8890(05)80026-0.

Line 194:		Line 195:

	Vaswani, Ashish, et al. "Attention Is All You Need". ''Proceedings of the 31st International Conference on Neural Information Processing Systems'', Curran Associates Inc., 2017, pp. 6000–10.		Vaswani, Ashish, et al. "Attention Is All You Need". ''Proceedings of the 31st International Conference on Neural Information Processing Systems'', Curran Associates Inc., 2017, pp. 6000–10.
			</div>

	[[Category:Toward a Minor Tech]]		[[Category:Toward a Minor Tech]]
	[[Category:5000 words]]		[[Category:5000 words]]

Simoon: /* The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures */

2023-07-17T14:32:11Z

The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures

← Older revision		Revision as of 14:32, 17 July 2023
Line 6:		Line 6:
	This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.		This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.

	=~~= ==~~		<div class="page-break"></div>

	In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.		In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.

Simoon at 13:40, 7 July 2023

2023-07-07T13:40:41Z

← Older revision		Revision as of 13:40, 7 July 2023
Line 6:		Line 6:
	This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.		This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.

			== ==
	In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.		In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.

Simoon at 13:13, 22 June 2023

2023-06-22T13:13:03Z

← Older revision		Revision as of 13:13, 22 June 2023
Line 1:		Line 1:
			__NOTOC__
	~~[[Category:Toward a Minor Tech]]~~
	~~[[Category:5000 words]]~~

	~~= Susanne Förster =~~

	= The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures =		= The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures =
			'''Susanne Förster'''

	== Abstract ==		== Abstract ==
Line 196:		Line 192:

	Vaswani, Ashish, et al. "Attention Is All You Need". ''Proceedings of the 31st International Conference on Neural Information Processing Systems'', Curran Associates Inc., 2017, pp. 6000–10.		Vaswani, Ashish, et al. "Attention Is All You Need". ''Proceedings of the 31st International Conference on Neural Information Processing Systems'', Curran Associates Inc., 2017, pp. 6000–10.

			[[Category:Toward a Minor Tech]]
			[[Category:5000 words]]

Christianvonand: /* The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures */

2023-06-20T08:05:13Z

The Bigger the Better?! The Size of Language Models and the Dispute over Alternative Architectures

← Older revision		Revision as of 08:05, 20 June 2023
Line 10:		Line 10:
	This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.		This article looks at a controversy over the ‘better’ architecture for conversational AI that unfolds initially along the question of the ‘right’ size of models. Current generative models such as ChatGPT and DALL-E follow the imperative of the largest possible, ever more highly scalable, training dataset. I therefore first describe the technical structure of large language models and then address the problems of these models which are known for reproducing societal biases or so-called hallucinations. As an ‘alternative’, computer scientists and AI experts call for the development of much smaller language models linked to external databases, that should minimize the issues mentioned above. As this paper will show, the presentation of this structure as ‘alternative’ adheres to a simplistic juxtaposition of different architectures that follows the imperative of a computable reality, thereby causing problems analogous to the ones it tried to circumvent.

	In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.		In recent years, increasingly large, complex and capable machine learning models such as the GPT model family, DALL-E or Stable Diffusion have become the super trend of current (artificially intelligent) technologies. Trained on identifying patterns and statistical features and thus intrinsically scalable, the potential of large language models is seen as based on their generative capabilities to produce a wide range of different texts and images.

	The monopolization and concentration of power within a few big tech companies such as Google, Microsoft, Meta and OpenAI that accompanies this trend is promoted by the enormous economic resources required by the models’ training processes (see Luitse and Denkena). The risks and dangers of this big data paradigm have been stressed widely: The working conditions and invisible labor that go into the creation of AI and ensure its fragile efficacy have been addressed in the context of click-work or content moderation (e.g., Irani; Rieder and Skop). In ''Anatomy of an AI System,'' Kate Crawford and Vladan Joler (Crawford and Joler) detailed the material setup of a conversational device and traced the far-flung origins of its hardware components and working conditions. Critical researchers have also pointed out how the composition of training data has resulted in the reproduction of societal biases. Crawled from the Internet, the data and thus the generated language mainly represent hegemonic identities whilst discriminating against marginalized ones (Benjamin). Moreover, the infrastructure needed to train these models requires huge amounts of computing power and has been linked to a heavy environmental footprint: The training of a big Transformer model emitted more than 50 times as much carbon dioxide as an average human emits per year (Strubell et al.; Bender et al.). Criticizing this seemingly inevitable turn to ever larger language models and the far-reaching implications of this approach for both people and the environment, Emily Bender et al. published their now-famous paper ''On the Dangers of Stochastic Parrots: Can Language Models be Too Big?'' in March 2021 (Bender et al.). Two of the authors, Timnit Gebru and Margaret Mitchell, both co-leaders of Google’s Ethical AI Research Team, were fired after publishing this paper against Google’s veto.		
The monopolization and concentration of power within a few big tech companies such as Google, Microsoft, Meta and OpenAI that accompanies this trend is promoted by the enormous economic resources required by the models’ training processes (see Luitse and Denkena). The risks and dangers of this big data paradigm have been stressed widely: The working conditions and invisible labor that go into the creation of AI and ensure its fragile efficacy have been addressed in the context of click-work or content moderation (e.g., Irani; Rieder and Skop). In ''Anatomy of an AI System,'' Kate Crawford and Vladan Joler (Crawford and Joler) detailed the material setup of a conversational device and traced the far-flung origins of its hardware components and working conditions. Critical researchers have also pointed out how the composition of training data has resulted in the reproduction of societal biases. Crawled from the Internet, the data and thus the generated language mainly represent hegemonic identities whilst discriminating against marginalized ones (Benjamin). Moreover, the infrastructure needed to train these models requires huge amounts of computing power and has been linked to a heavy environmental footprint: The training of a big Transformer model emitted more than 50 times as much carbon dioxide as an average human emits per year (Strubell et al.; Bender et al.). Criticizing this seemingly inevitable turn to ever larger language models and the far-reaching implications of this approach for both people and the environment, Emily Bender et al. published their now-famous paper ''On the Dangers of Stochastic Parrots: Can Language Models be Too Big?'' in March 2021 (Bender et al.). Two of the authors, Timnit Gebru and Margaret Mitchell, both co-leaders of Google’s Ethical AI Research Team, were fired after publishing this paper against Google’s veto.

Geoffcox at 15:19, 17 June 2023

2023-06-17T15:19:14Z


Sfoerster: /* On disputes over better architectures */

2023-06-17T13:35:45Z

On disputes over better architectures

← Older revision		Revision as of 13:35, 17 June 2023
Line 54:		Line 54:
	However, the ideal of an accessible and controllable database falls short where it is conceived as potentially endlessly scalable. It is questionable whether a possibly limitless collection of knowledge is still accessible and searchable or whether it transmutes into its opposite: "When everything possible is written, nothing is actually said" (Burkhardt 11, my translation). What prior knowledge of the structure and content of the database would accessibility require? The conditions of its architecture and the processes of collecting, managing and processing the information are quickly forgotten (ibid. 9f.) and obscure the fact that databases, as sites of power, are also exclusive and always remain incomplete. Inherent in the idea of an all-encompassing database is a universalism that assumes a generally valid knowledge and thus fails to recognize situated, embodied, temporalized, and hierarchized aspects. Following Wittgenstein, Daston has likewise illustrated that even (mathematical) rules are ambiguous and, as practice, require interpretation of the particular situation (Daston 10).		However, the ideal of an accessible and controllable database falls short where it is conceived as potentially endlessly scalable. It is questionable whether a possibly limitless collection of knowledge is still accessible and searchable or whether it transmutes into its opposite: "When everything possible is written, nothing is actually said" (Burkhardt 11, my translation). What prior knowledge of the structure and content of the database would accessibility require? The conditions of its architecture and the processes of collecting, managing and processing the information are quickly forgotten (ibid. 9f.) and obscure the fact that databases, as sites of power, are also exclusive and always remain incomplete.
Inherent in the idea of an all-encompassing database is a universalism that assumes a generally valid knowledge and thus fails to recognize situated, embodied, temporalized, and hierarchized aspects. Following Wittgenstein, Daston has likewise illustrated that even (mathematical) rules are ambiguous and, as practice, require interpretation of the particular situation (Daston 10).

	== On ~~disputes~~ over ~~better architectures~~ ==		== On Disputes over Better Architectures ==
	The narrative of the opposition of symbolic and connectionist AI locates the origin of this dispute in a disagreement between, on the one hand, Frank Rosenblatt and, on the other, Marvin Minsky and Seymour Papert, who claimed in their book ''Perceptrons'' that neural networks could not perform logical operations such as the exclusive-or (XOR) function (Minsky and Papert). This statement is often seen as the cause of a cutback in research funding for connectionist approaches, later referred to as the ‘winter of AI’ (Pasquinelli 5). For Gary Marcus, professor of psychology and neural science, this dispute between the different approaches to AI persists and is currently being played out at conferences, via Twitter and manifestos, and specifically on Noema, an online magazine of the Berggruen Institute, in which both Gary Marcus and Yann LeCun publish regularly. In an article titled ''Deep Learning Is Hitting a Wall'', Marcus calls for a stronger position of symbolic approaches and argues in particular for a combination of symbolic and connectionist AI (Marcus, “Deep Learning is Hitting a Wall”). For example, research by DeepMind had shown that "We may ''already'' be running into scaling limits in deep learning" and that increasing the size of models would lead neither to a reduction in toxic outputs nor to more truthfulness (Rae et al.). Google has done similar research (Thoppilan et al.). Marcus criticizes deep learning models for not having actual knowledge, whereas the existence of large, accessible databases of abstract, structured knowledge would be "a prerequisite to robust intelligence" (Marcus, “The Next Decade in AI”).
In various essays, Gary Marcus recounts a dramaturgy of the conflict, with highlights including Geoff Hinton's 2015 comparison of symbols to aether, in which he called symbolic AI "one of science's greatest mistakes" (Hinton), and the direct attack on symbol manipulation by LeCun, Bengio and Hinton in a 2016 manifesto for deep learning published in ''Nature'' (LeCun et al.). For LeCun, however, the dispute reduces to a different understanding of symbols and their localization. While symbolic approaches would locate them ‘inside the machine’, those of connectionist AI would locate them outside, ‘in the world’. The problem of the symbolists would therefore lie in the "knowledge acquisition bottleneck": the need to translate human experience into rules and facts, which could not do justice to the ambiguity of the world (Browning and LeCun). “Deep Learning is going to be able to do anything”, Marcus quotes Geoff Hinton (Hao).		The narrative of the opposition of symbolic and connectionist AI locates the origin of this dispute in a disagreement between, on the one hand, Frank Rosenblatt and, on the other, Marvin Minsky and Seymour Papert, who claimed in their book ''Perceptrons'' that neural networks could not perform logical operations such as the exclusive-or (XOR) function (Minsky and Papert). This statement is often seen as the cause of a cutback in research funding for connectionist approaches, later referred to as the ‘winter of AI’ (Pasquinelli 5). For Gary Marcus, professor of psychology and neural science, this dispute between the different approaches to AI persists and is currently being played out at conferences, via Twitter and manifestos, and specifically on Noema, an online magazine of the Berggruen Institute, in which both Gary Marcus and Yann LeCun publish regularly.
In an article titled ''Deep Learning Is Hitting a Wall'', Marcus calls for a stronger position of symbolic approaches and argues in particular for a combination of symbolic and connectionist AI (Marcus, “Deep Learning is Hitting a Wall”). For example, research by DeepMind had shown that "We may ''already'' be running into scaling limits in deep learning" and that increasing the size of models would lead neither to a reduction in toxic outputs nor to more truthfulness (Rae et al.). Google has done similar research (Thoppilan et al.). Marcus criticizes deep learning models for not having actual knowledge, whereas the existence of large, accessible databases of abstract, structured knowledge would be "a prerequisite to robust intelligence" (Marcus, “The Next Decade in AI”). In various essays, Gary Marcus recounts a dramaturgy of the conflict, with highlights including Geoff Hinton's 2015 comparison of symbols to aether, in which he called symbolic AI "one of science's greatest mistakes" (Hinton), and the direct attack on symbol manipulation by LeCun, Bengio and Hinton in a 2016 manifesto for deep learning published in ''Nature'' (LeCun et al.). For LeCun, however, the dispute reduces to a different understanding of symbols and their localization. While symbolic approaches would locate them ‘inside the machine’, those of connectionist AI would locate them outside, ‘in the world’. The problem of the symbolists would therefore lie in the "knowledge acquisition bottleneck": the need to translate human experience into rules and facts, which could not do justice to the ambiguity of the world (Browning and LeCun). “Deep Learning is going to be able to do anything”, Marcus quotes Geoff Hinton (Hao).

Sfoerster: /* On the Linking of Language Models and Databases */

2023-06-17T13:35:12Z

On the Linking of Language Models and Databases

← Older revision		Revision as of 13:35, 17 June 2023
Line 42:		Line 42:

	As this depiction richly illustrates, the Future of Life Institute is an organization dedicated to ‘long-termism’, an ideology that promotes posthumanism and the colonization of space (see MacAskill), rather than addressing the multiple contemporary crises (climate, energy, corona pandemic, global refugee movements, and wars) promoted by global financial market capitalism that profoundly reinforce social inequalities. Moreover, "AI doomsaying," i.e., the narrative of artificial intelligence as an autonomously operating agent whose power grows with access to more and more data and ever-improving technology, and whose workings remain inaccessible to human understanding as a black-box, further enhances the influence and power of big tech companies by attributing to their products the power "to remake - or unmake - the world." (Merchant).		As this depiction richly illustrates, the Future of Life Institute is an organization dedicated to ‘long-termism’, an ideology that promotes posthumanism and the colonization of space (see MacAskill), rather than addressing the multiple contemporary crises (climate, energy, corona pandemic, global refugee movements, and wars) promoted by global financial market capitalism that profoundly reinforce social inequalities. Moreover, "AI doomsaying," i.e., the narrative of artificial intelligence as an autonomously operating agent whose power grows with access to more and more data and ever-improving technology, and whose workings remain inaccessible to human understanding as a black-box, further enhances the influence and power of big tech companies by attributing to their products the power "to remake - or unmake - the world." (Merchant).

	~~== On the Linking of Language Models and Databases ==~~
	Taking up the aforementioned criticism such as the ecological and economic costs of training or the output of unverified or discriminating content, there are debates and frequent calls to develop fundamentally smaller language models (e.g., Schick and Schütze). Among others, David Chapman, who together with Phil Agre developed alternatives to prevailing planning approaches in artificial intelligence in the late 1980s (Agre and Chapman 1990), recently called for the development of the smallest language models possible: "AI labs, instead of competing to make their LMs bigger, should compete to make them smaller, while maintaining performance. Smaller LMs will know less (this is good!), will be less expensive to train and run, and will be easier to understand and validate." (Chapman 2022). More precisely, language models should "'know' as little as possible-and retrieve 'knowledge' from a defined text database instead." (ibid.).

	Practices of data collection, processing and analysis are ubiquitous. Accordingly, databases are of great importance as informational infrastructures of knowledge production (cf. Nadim). They are not only "a collection of related data organized to facilitate swift search and retrieval" (Nadim 2021), but also a "medium from which new information can be drawn and which opens up a variety of possibilities for shape-making" (Burkhardt, “Digitale Datenbanken”, 15, my translation). Lev Manovich, in particular, has emphasized the fundamental openness, connectivity and relationality of databases (Manovich). The articles and database entries are described as accessible and explicit; their modularity is further meant to allow for easy interchangeability and expansion of entries so that the process can scale as needed. Symbolic AI – also known as Good Old-Fashioned AI (GOFAI) – is based on databases. While connectionist AI takes an inductive approach that starts from ‘available’ data, symbolic AI is based on a deductive, rule-based paradigm. Matteo Pasquinelli describes it as a "top-down application of logic to information retrieved from the world" (Pasquinelli 2).

	With the linking of external databases such as Wikipedia with both small and large language models, symbolic AI is making a comeback. The combination of databases and language models is already a common practice and is currently discussed under the terms knowledge-grounding or retrieval augmentation (e.g., Lewis et al.). Retrieval-augmented means that in addition to fixed training datasets, the model also draws on large external datasets, an index of documents whose size can run into the trillions of documents, while the models are called small(er) because they contain a small set of parameters in comparison to other models (Izacard et al.). In a retrieval process, documents are selected, prepared and forwarded to the language model depending on the context of the current task. With this setup, the developers promise improvements in efficiency in terms of resources such as the number of parameters, ‘shots’ (the number of task examples provided to the model), and corresponding hardware resources (Izacard et al.).

	In August 2022, MetaAI released Atlas, a small language model that was extended with an external database and which, according to the developers, outperformed significantly larger models with a fraction of the parameter count (Izacard et al.). With RETRO (Retrieval-Enhanced Transformer), DeepMind has also developed a language model that consists of a so-called baseline model and a retrieval module (Borgeaud et al.). ParlAI, an open-source framework for dialog research founded by Facebook in 2017, presented Wizard of Wikipedia, a program (a benchmark task) to train language models with Wikipedia entries (Dinan et al.). Its developers framed the problem of hallucination of, in particular, pre-trained Transformer models as one of updating knowledge. With this program, models are trained to extract information from database articles and then insert it casually into a text or conversation without sounding like an encyclopedia entry themselves, thereby appearing semantically and factually correct.

	With the imagining of small models as ‘free of knowledge’, the focus changes: now not only size and scale are considered a marker of performance, but also the infrastructural and relational linking of language models to external databases. This linking of small language models to external databases thus represents a transversal shift in scale: While the size of the language models is downscaled, the linking with databases implies a simultaneous upscaling.

	== On the Linking of Language Models and Databases ==		== On the Linking of Language Models and Databases ==