Content Form:APRJA 13 Pierre Depaz

From creative crowd wiki
Revision as of 10:34, 13 August 2024 by Geoffcox (talk | contribs)


Pierre Depaz

Shaping Vectors

Discipline and Control in Word Embeddings

Abstract

This article investigates how the word embeddings at the heart of large language models are shaped into acceptable meanings. We show how such shaping follows two educational logics. The use of benchmarks to discover the capabilities of large language models exhibits similar features to Foucault’s disciplining school enclosures, while the process of reinforcement learning is framed as a modulation made explicit in Deleuze’s control societies. The consequences of this shaping into acceptable meaning are argued to result in semantic subspaces. These semantic subspaces are presented as the restricted lexical possibilities of human-machine dialogic interaction, and their consequences are discussed.

Introduction

When following the direction from man towards programmer in a space composed of word vectors, computational linguists Bolukbasi et al. encountered a problem: the resulting value when starting from woman was homemaker (Bolukbasi et al., 2016). In order to correct this mistake (programmer should be to woman as programmer is to man), they developed algorithms to "de-bias" word embeddings (the vector representation of text) and thus provide a different configuration of words that would be considered less sexist.
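The geometric intuition behind such a correction can be illustrated with a minimal sketch. It assumes invented three-dimensional vectors and a gender direction estimated from a single word pair; the method published by Bolukbasi et al. derives that direction from several pairs and includes further steps not shown here. The function name `neutralize` and all values are hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
embeddings = {
    "man":        [1.0, 0.2, 0.5],
    "woman":      [-1.0, 0.2, 0.5],
    "programmer": [0.4, 0.9, 0.1],
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

# Assumption: the "gender direction" is approximated by one word pair.
gender_direction = normalize(
    [w - m for w, m in zip(embeddings["woman"], embeddings["man"])]
)

def neutralize(vector, direction):
    """Remove the component of `vector` that lies along `direction`."""
    projection = dot(vector, direction)
    return [x - projection * d for x, d in zip(vector, direction)]

before = dot(embeddings["programmer"], gender_direction)
neutral_programmer = neutralize(embeddings["programmer"], gender_direction)
after = dot(neutral_programmer, gender_direction)

print(f"gender component before: {before:.2f}, after: {abs(after):.2f}")
```

In this toy space, the reshaped vector for programmer no longer leans towards either end of the man/woman axis, which is one way of reading what "a different configuration of words" means operationally.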

Word embeddings are ways to organize words in space such that their proximity or distance to other words holds semantic information. However, an unwanted proximity or distance might be interpreted as bias by researchers and users alike (Noble, 2018; Bender et al., 2021; Steyerl, 2023), and can be understood as a sense-making problem, in which a given semantic output does not correspond to the expectation. And yet, as Bolukbasi and their colleagues show, it is possible to reconfigure semantic fields such that they make more acceptable sense. This article investigates how word embeddings, as used in large language models (LLMs), are the result of shaping processes, and how these shaping processes are akin to educational processes.

We define shaping processes as the different steps in the development of a technical artefact that modify both its function and user perceptions. This article focuses on two specific processes, benchmarking and reinforcement learning, to highlight the overall tendency in which such shaping processes inscribe themselves. As such, the central questions we address are: under which logic do shaping processes take place? How are technical processes implementing such logics in order to discover meaning-making capabilities in LLMs? And who determines the kind of sense that is being made by a large language model? We hypothesize that these processes can be productively analyzed through the dual lens of discipline and control, as put forth, respectively, by Michel Foucault (Foucault, 1993) and Gilles Deleuze (Deleuze, 1992), particularly in their discussion of education; through this, we show that shaping logics, when it comes to generative cognitive technologies, influence the development and assessment of meaning-making abilities both in the machine and the human.

We begin by exploring how meaning can be encoded digitally by making the relationship between syntax and semantics in computer environments explicit. By comparing binary encoding and vector encoding, we highlight the complexities of the latter, particularly when assessing meaningfulness. We then trace how those vectors are being shaped (that is, being rendered operationally meaningful) within LLMs. Specifically, we pay attention to two particular steps in the creation process of an LLM: benchmarking and reinforcement learning. We highlight how these techniques, a combination of discipline and control, contribute to the normalization and standardization of meaning, but also to its modulation and adaptation, and result in semantic subspaces.

Discussing Alan Turing’s proposal of machine intelligence as an educational problem, we conclude by turning to theories of co-construction of intelligence (Bachimont, 2004; Stiegler, 2010) to sketch out, through examples of linguistic normalization, hallucinations, and prompting, how such word embeddings can operate logics of control themselves.

1. From a bit to a vector

The question of discursive communication in technical systems is inseparable from the question of encoding. Whether as frequency-modulated hertzian waves, pixel arrays, or smoke clouds, different encodings enable different discourses (Postman, 1985). This section focuses on the shift from one encoding to the other and its semantic implications, looking at both the bit and the vector as a means to represent information in digital environments and highlighting how sense-making shifts from one to the other.

1.1 External reference in the bit

Before the electrification of computers, the use of binary distinction greatly facilitated automation, from the programming of textile patterns in jacquard looms to the processing of punch cards in census exercises (Ceruzzi, 2003). In the context of mechanical work, the binary sign’s only significant property is that it has two mutually exclusive states; from these states, it becomes possible to encode representation (in the form of binary digits) and action (in the form of Boolean logic). Binary is entirely decontextualized, and it does not matter whether the binary sign is represented as a pair of 0/1, red/blue, low/high, cold/hot, as long as it is a disjointed pair [1].

While enabling flexible representation, this lack of context requires additional cognitive apparatuses, such as references and conventions against which a particular configuration of binary can be checked. As with all codes, a cipher is needed to access the meaning encoded in the binary representation (Kittler, 2008). From 01001010 as input, the convention of 4-delimited base 2 encoding allows us to retrieve decimal numbers, here the number 74. Once such a number has been decoded, we can further decode it into a letter, following here the reference table of the American Standard Code for Information Interchange (ASCII), in which case the number 74 will be interpreted as the upper case letter J. The equivalent for actions encoded in binary is the truth table, which establishes the results of particular combinations of Boolean logic operations.
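The two successive decodings described above, and a truth table for two Boolean operations, can be reproduced in a few lines. This is only an illustrative sketch in Python; the variable names are ours, and the choice of AND and OR as example operations is an assumption.

```python
from itertools import product

# First convention: read the bit string as a base-2 number.
bits = "01001010"
number = int(bits, 2)

# Second convention: look the number up in the ASCII reference table.
letter = chr(number)

print(bits, "->", number, "->", letter)

# Truth table: every combination of two binary inputs,
# together with the result of AND and OR on them.
truth_table = [
    (a, b, a and b, a or b)
    for a, b in product([False, True], repeat=2)
]

for a, b, conjunction, disjunction in truth_table:
    print(a, b, "AND:", conjunction, "OR:", disjunction)
```

Neither step works without an external reference: `int(bits, 2)` presupposes the base-2 convention, and `chr` presupposes a character table in which 74 stands for J.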

This decontextualized binary sign was contemporary with another decontextualization: that of the message. Claude Shannon’s theory of communication famously proposed that meaning was irrelevant when calculating the means of communication and that one should, therefore, focus on maximally faithful recreation of the input signal, avoiding any kind of noise interference (understood as the corruption of the initial value of the transmitting medium) (Shannon, 1948). Encoding information through specific signs, whether Morse code or binary code, lent itself particularly well to this paradigm of information transmission. However, such a system holds a second assumption: it assumes the meaningfulness of the source. Indeed, in order to decode a message under Shannon’s theory at all, one must presuppose there is a sensical message to decode.

While binary encoding might at first be seen as a decontextualized sign, as a technical object it also exists in a network of relations, involving at least reference documents, transmission media and human agents that are all necessary for it to be productively operationalized. Such productivity is achieved specifically by setting aside meaning to focus on syntax.

1.2 Internal reference in the vector

From the 1950s until the 2010s, the binary digit remained the dominant form of encoding information in digital systems. Throughout the 1970s, though, another form appeared, known as Vector Space Models (VSM). Originally proposed by Gerard Salton, this technique for information retrieval relied on the key insight, proposed by linguist John Firth in 1957, that “[we] shall know words by the company they keep” (Firth 12), hence departing from an essentialist view of language towards a pragmatic one, in which the context of a given word should be part of its encoding (Salton et al., 1975). Such encoding became particularly popular in broader digital information systems after Yoshua Bengio and his team combined it with neural network algorithms at the dawn of the twenty-first century (Cardon, 2018).

A vector is a mathematical entity that consists of a series of numbers grouped together to represent another entity. Often, vectors are associated with spatial operations: the entities they represent can be either a point or a direction. In computer science, vectors are used to represent entities known as features, measurable properties of an object (for instance, a human can be said to have features such as age, height, skin pigmentation, credit score, and political leaning). Today, such representations are at the core of contemporary machine learning models, allowing a new kind of translation between the world and the computer (Rieder, 2020).

In machine learning, a vector represents the current values of a given object, such that a human would have a value of 0 for the property “melting point”, while water would have a non-0 value for the property “melting point”. Conversely, water would have a value of 0 for the property “gender”, while a human would have a non-0 value for that same property. However, this implies that each feature in this space is related to all the other dimensions of the space: a human could potentially have a non-0 value for the property “melting point”. Vectors thus always contain the potential features of the whole space in which they exist and are more or less tightly defined in terms of each other.
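A minimal sketch can make this shared space concrete. The feature names, their order, and all numeric values below are invented for illustration, including the arbitrary encoding of “gender” as 1.0 and the use of Kelvin for the melting point.

```python
# Hypothetical feature space: every entity gets a slot for every feature.
FEATURES = ["age_years", "height_m", "melting_point_k", "gender"]

# Invented values: a 0 marks a feature that does not apply to the entity.
human = [34.0, 1.7, 0.0, 1.0]
water = [0.0, 0.0, 273.15, 0.0]

def active_features(vector):
    """Return the names of the features with a non-zero value."""
    return [name for name, value in zip(FEATURES, vector) if value != 0]

print(active_features(human))
print(active_features(water))

# Nothing in the encoding itself prevents a human from being given
# a melting point: the slot exists for every vector in the space.
odd_human = list(human)
odd_human[FEATURES.index("melting_point_k")] = 310.0
print(active_features(odd_human))
```

Both vectors have the same length and the same slots; what distinguishes a human from water is only the distribution of values across those slots.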

If binary enabled a syntactic exchange (everything can be encoded as a series of 0s and 1s), vectors enable a semantic exchange (everything can be described in terms of everything else). Combining vectors entails a more malleable manipulation of meaning throughout lexical fields. As a vector goes from Berlin to Germany, it represents the concept capital city (Guo et al., 2023).
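The idea that a direction between two words can stand for a concept can be sketched with toy vectors. The embeddings below are invented and three-dimensional (real models learn hundreds of dimensions from text), and the helper names `cosine` and `nearest` are hypothetical.

```python
import math

# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "Germany": [0.9, 0.1, 0.0],
    "Berlin":  [0.8, 0.2, 1.0],
    "France":  [0.1, 0.9, 0.1],
    "Paris":   [0.2, 0.8, 1.0],
    "bread":   [0.5, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(target, exclude=()):
    """Return the word whose vector is most similar to `target`."""
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

# The direction that goes from Berlin to Germany.
offset = [g - b for g, b in zip(embeddings["Germany"], embeddings["Berlin"])]

# Following the same direction from Paris.
target = [p + o for p, o in zip(embeddings["Paris"], offset)]
result = nearest(target, exclude={"Paris"})

print("Paris + (Germany - Berlin) is closest to:", result)
```

In this toy space, the direction learned from one capital and its country transfers to another pair, which is the sense in which the vector itself "represents" the relation.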

Because features exist in relation to one another, and meaning is constructed through the local similarity of vectors, semantic space both flexibly stores meaning (each number in a vector can subtly change without affecting overall meaning) and systematically retrieves it (all vectors exist in the same dimensions).
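Both properties can be observed in a small sketch, assuming invented three-dimensional vectors and cosine similarity as the measure of local similarity (a common choice, though not the only one). All names and values are hypothetical.

```python
import math

# Hypothetical vocabulary, invented for illustration only.
vocabulary = {
    "Paris":  [0.2, 0.8, 1.0],
    "France": [0.1, 0.9, 0.1],
    "bread":  [0.5, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Flexible storage: each number is nudged slightly...
nudged = [0.21, 0.79, 1.02]
similarity = cosine(vocabulary["Paris"], nudged)

# Systematic retrieval: ...yet the same comparison, applied across the
# same dimensions for every word, still finds the original entry.
retrieved = max(vocabulary, key=lambda w: cosine(nudged, vocabulary[w]))

print(f"similarity to original: {similarity:.4f}")
print("retrieved word:", retrieved)
```

The small change in each coordinate leaves the vector in the same neighbourhood, so the lookup is unaffected.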

1.3 Expected meaning, unexpected meaning

The nature of meaning differs depending on encoding, but this is by no means exclusive to digital inscription systems. For instance, Jack Goody’s work on lists and Bruno Latour’s on perspective both suggest epistemological consequences inherent in the choice of one particular syntactic system over another (Goody, 1986; Latour, 2013). While binary encoding allows a translation between physical phenomena and concepts, between electricity and numbers, and while Boolean logic facilitates the implementation of symbolic processing, vectors open up a new perspective on at least one particular level: the spatial dimension of their semantics.

The breadth of the data encoded, packaged in online corpora such as Common Crawl, is valuable insofar as it is mostly syntactically correct natural language. However, it does not follow that its recombination by way of large language model generation will be sensical because the source of such recombination cannot be attributed to a meaningful agent. The problem with language generation based on vector encoding is, therefore, that meaning is ontologically uncertain because it is statistical (software engineers tried to wrangle uncertainty out of the electrical circuits by forcing the continuous voltage into the discrete binary). Such uncertainty brings the acceptability of meaning into question — which can have either potentially boring or dramatic consequences. While binary encoding limits the acceptability of meaning to faithful signal reconstitution, vector encoding gives it a more complicated dimension.

Reconstituting meaning from binary encoding has always been a clearly defined problem, involving only mathematical reconstitution of the original message. Correctness of meaning, on the other hand, began as a computer-syntactic problem, but shifted with vectors to become a human-semantic problem.







Finally, we sketched out how such a combination of discipline and control in shaping word embeddings can affect users. Through dialogic interaction, the user probes the spatial configurations of meaning, but the exact topology of these configurations nonetheless remains elusive, and can thus impact what can be said, and what can be imagined, a new addition to the existing challenges of linguistic expression in the era of computation.

Notes

This article has benefited greatly from thorough discussions with, and copy edits by, Sara Messelaar Hammerschmidt.

  1. In practice, the representation of binary digits as a pair of 0 and 1 is the most convenient.

Works cited

Biography

Pierre Depaz is currently a Lecturer of Interactive Media at NYU Berlin. His research focuses on understanding how software operates procedural translation of non-computational entities, and how it affects humans’ perceptions and affordances with the world. ORCID: https://orcid.org/0009-0009-1489-247X