An AI and Machine Learning Glossary

Adriana Campoy and Pablo Grill
July 5, 2019

Artificial intelligence presents fantastic opportunities for many industries, and sophilabs is excited to be a part of this growing field in technology. For the average layperson, though, it can be tricky to keep up with the terminology. We've put together this short glossary to define some of the most commonly used terms in the field.


Accuracy

Metric that indicates the fraction of correct predictions out of the total number of predictions a machine learning model made. A highly accurate model produces few false positives and few false negatives. 1
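As a minimal sketch of the fraction described above, the following Python snippet counts matching predictions and divides by the total. The function name `accuracy` and the sample labels are illustrative assumptions, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Hypothetical labels: 1 = condition present, 0 = condition absent.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 6 of 8 predictions are correct -> 0.75
```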


Algorithm

In machine learning, a mathematical model that allows a computer to make predictions or decisions based on the data it receives, rather than making predictions or decisions it was specifically programmed to make. 2

Artificial Intelligence (AI)

Technology that gives computers properties of human intelligence, such as the ability to reason, solve problems, make decisions, and learn from past experience. Some applications of artificial intelligence include robotics, autonomous vehicles, pattern recognition, and machine learning. 3

Big Data

Very large sets of data, usually from multiple sources, that may present data management challenges due to the data's volume, inconsistent quality, the variety of types of data, and the high velocity at which the data is received. In machine learning, big data can be used to train more accurate models that can make better predictions and sounder decisions. 4


Chatbot

Also virtual assistant. Software that can have a conversation with a human user via text or voice by understanding the user's input, identifying the user's intent, and responding accordingly. Simple chatbots understand the user's input based on pre-programmed keywords, whereas smart chatbots use artificial intelligence to understand and adapt to the user's requests. 5
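To make the "simple chatbot" case concrete, here is a rough Python sketch of keyword matching. The `RESPONSES` table, the `FALLBACK` text, and the `reply` function are all hypothetical names chosen for illustration; a smart chatbot would replace the keyword lookup with a trained language model.

```python
import re

# Hypothetical keyword-to-response table; a real bot would have many more entries.
RESPONSES = {
    "hours": "We are open from 9am to 5pm, Monday to Friday.",
    "price": "Our basic plan costs $10 per month.",
    "hello": "Hello! How can I help you today?",
}
FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"


def reply(message):
    """Return the response for the first known keyword found in the message."""
    # Lowercase the message and keep only alphabetic words, dropping punctuation.
    words = set(re.findall(r"[a-z]+", message.lower()))
    for keyword, response in RESPONSES.items():
        if keyword in words:
            return response
    return FALLBACK


print(reply("What are your hours?"))
print(reply("Tell me a joke"))
```

The second call falls through to the fallback message, which shows the main limitation of keyword matching: anything outside the pre-programmed vocabulary is not understood.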

Computer Vision

A subfield of artificial intelligence and machine learning that is concerned with helping computers recognize images and understand their content the way a human would. Some areas of computer vision are image classification, image captioning, and object localization, detection, and segmentation. 6

Data Analysis

A field that aims to summarize, describe, and visualize data in order to understand it better and find ways to approach a problem. 7

Data Visualization

Subfield in data analysis that involves creating graphs, charts, and other visuals in order to get a better understanding of data. 8


Dataset

A collection of separate but related sets of information that a computer manipulates as a single unit. 9

Deep Learning

Deep learning refers to a machine learning model built from a neural network with multiple layers. These multiple layers allow the model to deal with high levels of complexity and abstraction. For example, a trained deep learning model is able to discern and classify unlabelled and unstructured data, such as a collection of millions of photos, videos, texts, and audio recordings. 10

Face Detection

Type of computer vision focused on the task of finding human faces in a photo.

Face Recognition

Type of computer vision focused not only on detecting a human face, but identifying whose face it is.

False Negative

A false negative occurs when a machine learning model fails to predict or identify something it is supposed to find. For example, in a machine learning model that is meant to detect cancer, a true negative result would occur when the model states there is no cancer, and the patient is in fact cancer-free. A false negative, however, would state that there is no cancer when cancer is actually present. False negatives and false positives can be used to measure how well a machine learning algorithm performs.

False Positive

A false positive occurs when a machine learning model predicts something that doesn't happen or identifies something that is not actually there. To return to the example of a cancer-detecting machine learning model, a true positive result in this context states that a patient has cancer when the patient does indeed have cancer. A false positive states that a patient has cancer when they are in fact cancer-free. False positives, along with false negatives, can be used to measure a machine learning algorithm's performance.
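The four outcomes discussed in the last two entries (true positive, false positive, true negative, false negative) can be tallied with a few lines of Python. This is a sketch under the assumption that labels are encoded as 1 for "cancer present" and 0 for "cancer-free"; the function name `confusion_counts` and the sample data are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # model says cancer, patient has cancer
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # model says cancer, patient is cancer-free
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # model says no cancer, patient is cancer-free
        else:
            counts["FN"] += 1  # model says no cancer, patient has cancer
    return counts


# Hypothetical ground truth and model output for eight patients.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

print(confusion_counts(y_true, y_pred))
# {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
```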


Generalization

A model's ability to adapt properly to new, previously unseen data that has been drawn from the same distribution as the data used to create the model. 11

Image Recognition

Subfield of computer vision that enables a computer to identify people, objects, buildings, or other variables in a photo.

Machine Learning (ML)

A field in artificial intelligence that uses algorithms to enable a computer to make decisions or predictions based on the data it receives, thus allowing a computer to "learn" from data rather than follow programmed instructions.

Named-Entity Recognition (NER)

A data extraction task that finds and categorizes named-entities in a text. Named-entities may include names of individuals, organizations, locations, and products, or numerical expressions of time and monetary value. NER is an application of natural language processing (NLP). 12

Natural Language Processing (NLP)

Field in artificial intelligence that enables computers to read, understand, and manipulate human language. NLP allows computers to understand both written text and speech. Applications of NLP include email filtering systems, speech-to-text conversion, voice commands, automatic translations, named-entity recognition (NER), and sentiment analysis. 13

Neural Networks

Set of algorithms in a machine learning model, organized as layers of interconnected units called neurons. Some capabilities of neural networks include classifying and labeling data (e.g., detecting and recognizing faces and voices); finding similarities and anomalies (e.g., comparing documents or detecting fraud); and making predictions (e.g., about human health, consumer trends, or a variety of other topics). 14


Neuron

Also node. The basic unit in a neural network. A neuron receives an input, does some computation with it, produces an output, and determines whether the signal should travel further within the network. 15
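One common way to realize this description is a weighted sum followed by an activation function. The sketch below assumes a sigmoid activation and a fixed firing threshold of 0.5; both choices, along with the name `neuron` and the sample weights, are illustrative assumptions rather than the only way a neuron can be defined.

```python
import math


def neuron(inputs, weights, bias, threshold=0.5):
    """Compute a single neuron's output and whether it passes the signal on."""
    # Computation step: weighted sum of the inputs plus a bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Output step: squash the sum into the range (0, 1) with a sigmoid.
    activation = 1 / (1 + math.exp(-z))
    # Decide whether the signal is strong enough to travel further.
    fires = activation >= threshold
    return activation, fires


# Hypothetical weights and bias for a two-input neuron.
print(neuron([1.0, 0.0], weights=[2.0, -1.0], bias=-1.0))  # weighted sum is 1.0
print(neuron([0.0, 1.0], weights=[2.0, -1.0], bias=-1.0))  # weighted sum is -2.0
```

With these numbers the first input produces an activation above the threshold and the second one below it, so only the first signal would be passed along.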


Optimization

Also mathematical optimization or mathematical programming. Branch of applied mathematics in which the goal is to select the best option from a set of alternatives. Optimization is essential to many machine learning algorithms, which aim to approximate the optimal solution without calculating it exactly. 16
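Gradient descent is one widely used technique for approximating an optimum step by step instead of solving for it directly. The sketch below applies it to the toy function f(x) = (x - 3)^2, whose minimum is at x = 3; the function, the learning rate, and the step count are assumptions picked for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Toy objective f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), start=0.0)

print(round(x_min, 6))  # 3.0
```

The algorithm never solves f'(x) = 0 algebraically. It only follows the slope downhill, which is why the same idea scales to models with millions of parameters.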


Overfitting

In machine learning, overfitting occurs when the program models the training data "too well," meaning that noise and anomalies in the training data are learned as if they were genuine concepts. This hurts the program's accuracy on new data, and as a result it may identify or classify that data incorrectly. 17
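A small, contrived example can show this effect. In the sketch below, a 1-nearest-neighbor classifier memorizes its training set, including one mislabeled point, and then repeats that mistake on nearby test points. The data, the assumed true rule (label is 1 when x is at least 5), and the function names are all made up for illustration.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of the training point closest to x."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def accuracy(train, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(1 for x, label in examples if nearest_neighbor_predict(train, x) == label)
    return hits / len(examples)


# Assumed true rule: label is 1 when x >= 5. The point (2.0, 1) is an anomaly.
train = [(1.0, 0), (2.0, 1), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(1.9, 0), (2.2, 0), (3.5, 0), (6.5, 1), (7.5, 1)]

print(accuracy(train, train))  # 1.0: every training point is memorized
print(accuracy(train, test))   # 0.6: the anomaly is repeated on new data
```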


Precision

Metric that considers true positive predictions over the total number of positive predictions (true and false) that a machine learning model makes. A very precise machine learning model has a low number of false positives. 18
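Expressed as a formula, this is TP / (TP + FP). A minimal Python sketch follows; the function name and the counts are hypothetical, and returning 0.0 when the model makes no positive predictions is an assumed convention rather than a universal rule.

```python
def precision(true_positives, false_positives):
    """True positives divided by all positive predictions."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0  # assumed convention when the model predicts no positives
    return true_positives / predicted_positives


# Hypothetical counts: 8 correct positive predictions, 2 false alarms.
print(precision(8, 2))  # 8 / (8 + 2) = 0.8
```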


Python

Programming language that is widely used in the field of machine learning. To read more about why Python is a great language for machine learning, check out our recent blog post.


Recall

Metric that measures the proportion of actual positives that were identified correctly. It's calculated by dividing true positives by the sum of true positives and false negatives. 19
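The calculation described above, TP / (TP + FN), can be sketched in Python as follows. The function name and counts are hypothetical, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(true_positives, false_negatives):
    """True positives divided by all actual positives."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        return 0.0  # assumed convention when there are no actual positives
    return true_positives / actual_positives


# Hypothetical counts: 8 positives found, 8 positives missed.
print(recall(8, 8))  # 8 / (8 + 8) = 0.5
```

A model can score well on precision and poorly on recall (or the reverse), which is why the two metrics are usually reported together.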

Reinforcement Learning

In machine learning, a way to train an algorithm through experience by using rewards (when it performs well) and punishments (when it performs poorly). The algorithm learns by trying to maximize its rewards and minimize its punishments. 20
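One simple setting that illustrates this idea is the multi-armed bandit, sketched below with an epsilon-greedy strategy. The agent gets +1 as a reward and -1 as a punishment, and keeps a running average of what each action has earned. The reward probabilities, the exploration rate, and the function name `run_bandit` are assumptions for illustration; real reinforcement learning problems usually also involve states and delayed rewards.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn the average payoff of each action by trial and error."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # estimated payoff per action
    pulls = [0] * len(reward_probs)     # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(values)), key=lambda a: values[a])
        # Reward (+1) with the action's success probability, else punishment (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        pulls[action] += 1
        # Incremental update of the running average for this action.
        values[action] += (reward - values[action]) / pulls[action]
    return values, pulls


# Hypothetical environment: action 0 is rewarded 20% of the time, action 1 80%.
values, pulls = run_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, pulls)
```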

Sentiment Analysis

Also opinion mining. A natural language processing (NLP) problem that aims to understand attitudes and opinions expressed in written or spoken language and turn them into structured data. Sentiment analysis has practical applications in customer service, marketing, and public relations, and is often used to analyze product reviews and social media content. 21

Speech Recognition

Ability of a computer to understand human speech. Applications of speech recognition technology include voice commands and dictation via speech-to-text conversion. Not to be confused with voice recognition (see definition below).

Supervised Learning

The most common process by which a machine learning algorithm is trained. During this process, the algorithm is given training data, and its predictions are corrected by an answer key. In this way, the algorithm is able to learn and then adjust and improve its performance. 22
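The "answer key" idea can be illustrated with a perceptron, one of the simplest supervised learners: it makes a prediction, compares it with the correct label, and nudges its weights whenever it is wrong. This is a sketch under stated assumptions; the logical OR dataset, the learning rate, and the function names are chosen only for illustration.

```python
def predict(weights, bias, inputs):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Adjust weights whenever a prediction disagrees with the answer key."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)  # 0 when correct
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Labeled training data: inputs paired with the correct output of logical OR.
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(examples)

print([predict(weights, bias, inputs) for inputs, _ in examples])  # [0, 1, 1, 1]
```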

Training Data

The data a computer receives that allows it to build a machine learning model. For example, to teach a computer to recognize a handwritten letter A, it has to be fed many examples of what a handwritten A looks like, as well as examples of handwritten letters that are not A.


Underfitting

Occurs when a machine learning model performs poorly because it can't capture the patterns in the training data. The solution is usually to try different algorithms. 23

Unsupervised Learning

Training process for a machine learning algorithm in which data is not labelled previously, so there is no answer key. The goal of unsupervised learning is not to produce correct answers like in supervised learning, but rather, to discover ways to understand the data by finding associations or putting it into groups. 24
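Clustering is a typical example of putting unlabelled data into groups. The sketch below runs a very small k-means procedure on one-dimensional data with two clusters. The data points, the choice of starting centers (the smallest and largest values), and the function name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(points, iterations=10):
    """Split 1-D points into two groups by alternating assignment and averaging."""
    # Assumed initialization: start the two centers at the extremes of the data.
    centers = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the nearer center.
        labels = [
            0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


# Unlabelled data: no answer key is provided, only the raw values.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, labels = kmeans_1d(points)

print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm is never told which group a point belongs to. It discovers the two groups purely from how the values cluster together.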

Voice Recognition

The ability of a computer to identify the individual speaker based on the pitch of their voice and patterns in their speech. Voice recognition can be applied to a variety of areas, including security and authentication systems and criminal investigations. 25

  1. "Classification: Accuracy," Machine Learning Crash Course, Google Developers.

  2. Wikipedia, The Free Encyclopedia, s.v. "Machine Learning," accessed May 2, 2019. 

  3. Encyclopædia Britannica Online, s.v. "Artificial Intelligence" by B.J. Copeland, accessed May 2, 2019. 

  4. Gil Press, "12 Big Data Definitions: What's Yours?" Forbes, September 4, 2014. 

  5. Anadea, "What is a Chatbot and How to Use it for Your Business," *Medium*, January 5, 2018. 

  6. Jason Brownlee, "A Gentle Introduction to Computer Vision," Machine Learning Mastery, March 19, 2019. 

  7. Jason Brownlee, "Quick and Dirty Data Analysis for Your Machine Learning Problem," Machine Learning Mastery, February 14, 2014. 

  8. Ibid. 

  9. Cambridge Dictionary, s.v. "dataset," accessed May 2, 2019. 

  10. "Artificial Intelligence (AI) vs. Machine Learning vs. Deep Learning," Artificial Intelligence Wiki; Jason Brownlee, "What is Deep Learning?" Machine Learning Mastery, August 16, 2016. 

  11. "Generalization," Machine Learning Crash Course, Google Developers. 

  12. Techopedia, s.v. "Named-Entity Recognition (NER)," accessed July 5, 2019. 

  13. "Natural Language Processing: What It Is and Why It Matters," SAS.

  14. "A Beginner's Guide to Neural Networking and Deep Learning," Artificial Intelligence Wiki.

  15. Ibid. 

  16. "Introduction to Mathematical Optimization,"

  17. Jason Brownlee, "Overfitting and Underfitting with Machine Learning Algorithms," Machine Learning Mastery, March 21, 2016. 

  18. Shruti Saxena, "Precision vs Recall," *Towards Data Science*, May 11, 2018. 

  19. "Classification: Precision and Recall," Machine Learning Crash Course, Google Developers. 

  20. Techopedia, s.v. "Reinforcement Learning," accessed May 14, 2019. 

  21. "Sentiment Analysis: Nearly Everything You Need to Know," MonkeyLearn.

  22. Jason Brownlee, "Supervised and Unsupervised Machine Learning Algorithms," Machine Learning Mastery, March 16, 2016. 

  23. Jason Brownlee, "Overfitting and Underfitting with Machine Learning Algorithms," Machine Learning Mastery, March 21, 2016. 

  24. Jason Brownlee, "Supervised and Unsupervised Machine Learning Algorithms," Machine Learning Mastery, March 16, 2016. 

  25. Chris Kikel, "Difference Between Voice Recognition and Speech Recognition," Total Voice Technologies.

"An AI and Machine Learning Glossary" by Adriana Campoy and Pablo Grill is licensed under CC BY SA. Source code examples are licensed under MIT.

Photo by Frank V.
