Glossary/Technical

Intersectional AI A-to-Z
This glossary of terms for Intersectional AI A-to-Z is a great place to get started. It offers only one possible set of definitions for these complex ideas, and it is meant as an open invitation for conversations and amendments! These concepts show the complexity of the topic seen from multiple angles, yet it is important to try to break them down into plain language in order to offer more openings for folks to join these conversations. Please chime in, ask questions, and help make these definitions better!



"it's often the way the technology is being used, rather than the technology itself, that determines whether it is appropriate to call it AI or not." (Elements of AI, Building AI)

"'[A]rtificial intelligence' means lots of things, depending on whether you’re reading science fiction or selling a new app or doing academic research. When someone says they have an AI-powered chatbot, should we expect it to have opinions and feelings like the fictional C-3PO? Or is it just an algorithm that learned to guess how humans are likely to respond to a given phrase? Or a spreadsheet that matches words in your question with a library of preformulated answers? Or an underpaid human who types all the answers from some remote location? Or—even—a completely scripted conversation where human and AI are reading human-written lines like characters in a play? Confusingly, at various times, all these have been referred to as AI. For the purposes of this book, I’ll use the term AI the way it’s mostly used by programmers today: to refer to a particular style of computer program called a machine learning algorithm." (Shane 2019, 7–8)

artificial intelligence
The collection of programs that use machine learning...have qualities that allow them to operate autonomously (without constant instructions given by the programmer or user), ...

bias & variance
In a machine learning problem, bias is the technical term for when a model is underfit to the problem, meaning it cannot find a pattern in the data (or can't find the pattern expected by the creator). This can happen when the model is too simple for the problem, or when the data used to train it is too sparse or not representative enough.

Meanwhile, variance, or overfitting, is when a model overstates a pattern, overcomplicating the relationships in the data based on the creator's expected outcome.

As bias decreases, variance tends to increase, and vice versa. Other trade-offs include accuracy vs. interpretability, complexity vs. scalability, domain-specific knowledge vs. data-driven approaches, and a better algorithm vs. more data. (Rajati)
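To make underfit and overfit a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the data (a straight-line trend plus random noise), the "mean model" that ignores its input, and the "memorizer" that copies the nearest training example. It is one way to see the trade-off, not a standard recipe.

```python
import random

def make_data(n, seed):
    """Made-up data: a straight-line trend (y = 2x) plus random noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 2) for x in xs]
    return xs, ys

def mean_model(train_x, train_y):
    """Underfit (high bias): ignores x and always predicts the average y."""
    average = sum(train_y) / len(train_y)
    return lambda x: average

def memorizer_model(train_x, train_y):
    """Overfit (high variance): copies the y of the closest training point, noise and all."""
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30, seed=1)
test_x, test_y = make_data(30, seed=2)  # new data the models never saw

underfit = mean_model(train_x, train_y)
overfit = memorizer_model(train_x, train_y)

errors = {
    "underfit_train": mean_squared_error(underfit, train_x, train_y),
    "underfit_test": mean_squared_error(underfit, test_x, test_y),
    "overfit_train": mean_squared_error(overfit, train_x, train_y),
    "overfit_test": mean_squared_error(overfit, test_x, test_y),
}

for name, value in errors.items():
    print(f"{name}: {value:.2f}")
```

The memorizer scores a perfect zero error on its own training data, because every training point is its own nearest neighbor, but it does worse on new data because it has also memorized the noise. The mean model misses the trend in both cases.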

confidence interval
A range of numbers that helps describe how uncertain an estimate is. A confidence interval is built so that it has a high (e.g. 95%) chance of containing the true* value (that is, the accurate answer to the question being asked). So the wider the interval, the more uncertainty and doubt. Confidence intervals are used in statistics and in AI to gauge the reliability of a model (formula). *Classically, this presumes a hidden but "true" unknown value that is independent of the model (and this is not always the case, of course). Unfortunately, uncertainty remains inherent in prediction and difficult to comprehend in models, even for the researchers who create them (D'Ignazio & Klein 2019).
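As a rough illustration, a 95% confidence interval for an average can be sketched in a few lines of Python. The sample values below are invented, and the multiplier 1.96 assumes the estimate is roughly normally distributed; for a sample this small, a t-distribution would usually give a somewhat wider interval.

```python
import math
import statistics

def mean_confidence_interval(samples, z=1.96):
    """Approximate confidence interval for the mean.

    z=1.96 corresponds to roughly 95% under a normal approximation
    (an assumption of this sketch, not the only way to build an interval).
    """
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    margin = z * standard_error
    return mean - margin, mean + margin

# Hypothetical measurements, made up for this example.
samples = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7]

low, high = mean_confidence_interval(samples)
print(f"estimate: {statistics.mean(samples):.2f}")
print(f"interval: {low:.2f} to {high:.2f}")
```

The interval is centered on the estimate, and it gets wider when the samples are more spread out or when there are fewer of them.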

data cleaning
Data does not come in ready to go; it must be preprocessed. This includes many adjustments that can affect the outcome, including selecting a subset of data (sampling), standardizing and scaling it in relation to a baseline (normalization), handling missing data and outliers with decision trees (which Adrian Mackenzie calls "affiliated with arbitrariness"), as well as feature creation and extraction (discussed in F). The transformation of real-world information into "data" is never a neutral process but relies heavily on the conditions and goals of the research in context.
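Two of these steps, handling missing data and normalization, can be sketched as follows. Note the assumptions: the ages are invented, and missing values are filled with the median, which is a simpler stand-in chosen for this example rather than the decision-tree approach mentioned above. Each of these choices is exactly the kind of adjustment that can change the outcome.

```python
import statistics

def fill_missing(values):
    """Replace missing entries (None) with the median of the observed values.

    Using the median is one choice among many; it is an assumption of this sketch.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values so the smallest becomes 0.0 and the largest becomes 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # avoid dividing by zero when all values match
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical raw data with two missing entries.
raw_ages = [34, None, 29, 41, None, 52]

filled = fill_missing(raw_ages)
normalized = min_max_normalize(filled)

print(filled)
print([round(v, 2) for v in normalized])
```

Filling gaps with the median quietly pulls the data toward the middle, which is one small way that "cleaning" shapes what a model later learns.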

"there is no “neutral,” “natural,” or “apolitical” vantage point that training data can be built upon. There is no easy technical “fix” by shifting demographics, deleting offensive terms, or seeking equal representation by skin tone. The whole endeavor of collecting images, categorizing them, and labeling them is itself a form of politics, filled with questions about who gets to decide what images mean and what kinds of social and political work those representations perform." (Crawford & Paglen "Excavating AI" 2019)

feature extraction
Features are the items in a model that its designers decide are relevant,

GANs
GAN stands for generative adversarial network, a now-popular kind of machine learning used to generate new data, such as the images seen in the "AI dreaming" aesthetic. It requires two parts. One part (the discriminator) is trained on existing data in order to check the second part's work. The second part (the generator) tries to generate new data that can fool the first part (hence adversaries).

information
Claude Shannon... &/or vs intelligence

k-means & k-nearest neighbors
k-nearest neighbors (KNN) is a lazy learning algorithm, which means it stores the training data and waits until it receives a test point before doing any work, whereas eager learning generalizes and builds a model first.
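The "lazy" part of k-nearest neighbors is easier to see in code. In this minimal sketch there is no training step at all: the function just keeps the labeled points and, when asked about a new point, lets the k closest ones vote. The points, labels, and the choice of k=3 are all made up for illustration.

```python
import math
from collections import Counter

def knn_predict(training_points, query, k=3):
    """Label a query point by majority vote among its k closest training points.

    training_points is a list of ((x, y), label) pairs.
    """
    by_distance = sorted(
        training_points,
        key=lambda item: math.dist(item[0], query),  # straight-line distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled data: two loose clusters.
training_points = [
    ((1.0, 1.0), "circle"),
    ((1.5, 2.0), "circle"),
    ((2.0, 1.5), "circle"),
    ((6.0, 6.0), "square"),
    ((6.5, 7.0), "square"),
    ((7.0, 6.5), "square"),
]

prediction_a = knn_predict(training_points, (1.2, 1.4))
prediction_b = knn_predict(training_points, (6.4, 6.6))
print(prediction_a, prediction_b)
```

All the work happens inside `knn_predict` at question time, which is why a lazy learner can be slow on large datasets even though it needs no model-building up front.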

machine learning
A field within AI that focuses on adaptive tools. These are "systems that improve their performance in a given task with more and more experience or data" (Elements of AI).

"the “depth” of deep learning refers to the complexity of a mathematical model," (models)

"Essentially all models are wrong but some are useful." (George Box)

"Data science is a recent umbrella term (term that covers several subdisciplines) that includes machine learning and statistics, certain aspects of computer science including algorithms, data storage, and web application development. Data science is also a practical discipline that requires understanding of the domain in which it is applied in, for example, business or science" (Elements of AI)

neural network
A type of machine learning designed with brain neurons as inspiration. Neural networks move information step by step through many connected "nodes." The connections between nodes are weighted, and these weights are adjusted according to how far the result is from the expected outcome.
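A full network has many nodes, but the weighting and adjusting can be sketched with just one. The example below is a simplified illustration under stated assumptions: a single node with a sigmoid activation, trained on the logical OR of two inputs, with a learning rate and number of passes picked by hand. Real networks stack many such nodes in layers.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1 / (1 + math.exp(-z))

def predict(weights, bias, inputs):
    """One node: a weighted sum of the inputs, passed through the sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def train(examples, epochs=2000, learning_rate=0.5):
    """Repeatedly nudge the weights to shrink the gap between output and target."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = predict(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Expected outcomes chosen by the creator: the logical OR of two inputs.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train(or_examples)
outputs = [round(predict(weights, bias, inputs)) for inputs, _ in or_examples]
print(outputs)
```

The node only ever learns what its examples reward, so the "expected outcome" supplied by its creators is built into the final weights.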

(bag-of-)words
A natural language processing method of analyzing and classifying text that looks only at how frequently each word occurs, while disregarding the order of the words, syntax, or grammar, as if the words were all thrown in a bag.
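A minimal sketch of the idea, using two invented sentences: once each sentence is reduced to word counts, the two bags are identical even though the sentences mean opposite things. The simple rule used here for splitting text into words is an assumption of this example.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each word appears, ignoring case, order, and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

bag_a = bag_of_words("The dog chased the cat.")
bag_b = bag_of_words("The cat chased the dog.")

print(bag_a)
print(bag_a == bag_b)
```

This shows both the convenience of the method and what it throws away: who chased whom is lost as soon as the words go in the bag.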

Y (as output)
The simplest machine learning model looks like this: $Y = f(X) + \epsilon$. Written out, it means that the output $Y$ of a model is a function $f$ of the inputs $X$ (which are also called predictors or features and are known) plus the error $\epsilon$ (which is unknown). A function, here, just means that some calculation (algorithm, operation, recipe) is performed on the stuff inside $f$ based on what the model's creators have determined yields expected, "appropriate" results.
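The idea of an output being a function of the inputs plus an error can be sketched in code. Everything here is hypothetical: the "true" function (a straight line), the amount of random error, and the choice to recover the line with ordinary least squares. In real problems the function is not known in advance, which is the whole difficulty.

```python
import random

def f(x):
    """A made-up 'true' relationship, chosen only for this example."""
    return 3 * x + 1

rng = random.Random(0)
xs = [float(i) for i in range(20)]          # inputs: known
noise = [rng.gauss(0, 1) for _ in xs]       # error: unknown in practice
ys = [f(x) + e for x, e in zip(xs, noise)]  # outputs: function of inputs plus error

def fit_line(xs, ys):
    """Estimate a slope and intercept from the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
print(f"estimated function: y = {slope:.2f} * x + {intercept:.2f}")
```

The estimate lands close to the made-up line but not exactly on it, because the model only ever sees the outputs with the error already mixed in.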