An interactive guide that helps you understand the fundamental concepts and terminology behind artificial intelligence, explained in simple and clear language.
A token is the smallest piece of text an AI model can process—a word, part of a word, punctuation, or character. Each token gets a unique ID number the model uses to understand text.
When you input text, the model first breaks it into tokens and assigns each a unique ID number for processing.
Tokenization converts text into smaller pieces called tokens—words, subwords, or characters—that the model can process.
Tokenization happens automatically as the first step—the model must break text into tokens before it can process anything.
OpenAI Tokenizer, an interactive tool that lets you see exactly how text gets broken down into tokens
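The text-to-tokens-to-IDs pipeline can be sketched in a few lines. This is a toy word-level tokenizer with an invented vocabulary, not a real subword algorithm like BPE, but the overall flow is the same.

```python
# Toy vocabulary, invented for illustration. Real models learn a
# subword vocabulary of tens of thousands of entries from data.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, ".": 5}

def tokenize(text):
    """Split text into tokens and map each token to its vocabulary ID."""
    tokens = text.lower().replace(".", " .").split()
    return tokens, [vocab[t] for t in tokens]

tokens, ids = tokenize("The cat sat on the mat.")
# tokens -> ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']
# ids    -> [0, 1, 2, 3, 0, 4, 5]
```

Note how "the" maps to the same ID both times it appears: the model sees IDs, not raw text.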
An embedding converts each token into numbers that capture its meaning. Tokens with similar meanings are positioned close together in this mathematical space, helping the model understand word relationships.
Embedding transforms tokens into numerical points, organizing similar concepts together. This happens after tokenization and enables the model to understand semantic relationships.
Wikipedia's 'Vector space', a detailed explanation of how vector spaces work mathematically
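The idea that "similar meanings sit close together" can be made concrete with cosine similarity. The 3-dimensional vectors below are made-up toy values; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Toy embeddings, invented for illustration.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Measure how closely two embedding vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
# "cat" is closer to "dog" than to "car" in this space.
```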
A context window is the maximum amount of text (in tokens) a model can process at once. Information beyond this limit isn't accessible during processing.
From a 1983 Steve Jobs talk:

How are these computers all going to work together? They're probably going to work together a lot like people do. Sometimes they're going to work together really well, and other times they're not going to work together so well.
What's happened, there's been a few installations where people have hooked these things together. The one installation that stands out is at Xerox Palo Alto Research Center, or PARC, for short. And they hooked about a hundred computers together on what's called a local area network, which is just a cable that carries all this information back and forth. […]
Then an interesting thing happened. There were twenty people interested in volleyball. So a volleyball distribution list evolved, and then, when the volleyball game next week was changed, you'd write a quick memo and send it to the volleyball distribution list. Then there was a Chinese food cooking list. And before long, there were more lists than people.
And it was a very, very interesting phenomenon, because I think that that's exactly what's going to happen as we start to tie these things [computers] together: they're going to facilitate communication and facilitate bringing people together in the special interests that they have.
And we're about five years away from really solving the problems of hooking these computers together in the office. And we're about ten to fifteen years away from solving the problems of hooking them together in the home. A lot of people are working on it, but it's a pretty fierce problem.
Now, Apple's strategy is really simple. What we want to do is put an incredibly great computer in a book that you carry around with you, that you can learn how to use in twenty minutes. That's what we want to do. And we want to do it this decade. And we really want to do it with a radio link in it so you don't have to hook up to anything—you're in communication with all these larger databases and other computers. We don't know how to do that now. It's impossible technically.
We're trying to get away from programming. We've got to get away from programming because people don't want to program computers. People want to use computers.
We [at Apple] feel that, for some crazy reason, we're in the right place at the right time to put something back. And what I mean by that is, most of us didn't make the clothes we're wearing, and we didn't cook or grow the food that we eat, and we're speaking a language that was developed by other people, and we use a mathematics that was developed by other people. We are constantly taking.
And the ability to put something back into the pool of human experience is extremely neat. I think that everyone knows that in the next ten years we have the chance to really do that. And we [will] look back—and while we're doing it, it's pretty fun, too—we will look back and say, "God, we were a part of that!"
We started with nothing. So whenever you start with nothing, you can always shoot for the moon. You have nothing to lose. And the thing that happens is—when you sort of get something, it's very easy to go into cover-your-ass mode, and then you become conservative and vote for Ronnie. So what we're trying to do is to realize the very amazing time that we're in and not go into that mode.
I can't tell you why you need a home computer right now. I mean, people ask me, "Why should I buy a computer in my home?"
And I say, "Well, to learn about it, to run some fun simulations. If you've got some kids, they should probably know about it in terms of literacy. They can probably get some good educational software, especially if they're younger.
You can hook up to the source and, you know, do whatever you're going to do. Meet women, I don't know. But other than that, there's no good reason to buy one for your house right now. But there will be. There will be."
I don't think finance is what drives people at Apple. I don't think it's money, but feeling like you own a piece of the company, and this is your damn company, and if you see something … We always tell people, "You work for Apple first and your boss second." We feel pretty strongly about that.
When you have a million people using something, then that's when creativity really starts to happen on a very rapid scale. […] We need some revolutions like [the] Lisa [computer], but we also then need to get millions of units out there and let the world innovate—because the world's pretty good at innovating, we've found.
Each model has a token limit—from thousands to millions—which determines how much context it can remember and use.
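The hard limit means long inputs have to be cut down before they reach the model. A minimal sketch, keeping only the most recent tokens; real systems often use smarter strategies such as summarization or retrieval.

```python
def fit_context(token_ids, max_tokens):
    """Keep only the most recent tokens that fit in the context window."""
    if len(token_ids) <= max_tokens:
        return token_ids
    return token_ids[-max_tokens:]  # older tokens are simply dropped

history = list(range(10))         # stand-in for a long conversation
window = fit_context(history, 4)  # -> [6, 7, 8, 9]
```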
Latent space is an internal map where the model organizes learned knowledge. Similar concepts cluster together, creating groups of related ideas.
Each point represents an embedding, positioned near similar concepts. This organization helps the model efficiently understand relationships and find connections.
A neural network is a system of interconnected layers that learns from examples. Each layer processes information, extracting increasingly complex patterns to recognize images, understand language, and perform tasks.
Information flows through layers, getting refined at each step. Early layers detect simple features, while deeper layers combine them into complex patterns, transforming questions into answers.
TensorFlow Playground, an interactive visual demonstration that shows how neural networks learn and process information through their layers
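A forward pass through layers can be sketched directly. The weights below are hand-picked arbitrary numbers; a real network learns them from data during training.

```python
def relu(x):
    """A common activation function: pass positives through, zero out negatives."""
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One dense layer: weighted sum of inputs plus bias, then activation."""
    return [
        relu(sum(i * w for i, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Information flows through two layers, refined at each step.
x = [1.0, 0.5]
hidden = layer(x, weights=[[0.4, -0.2], [0.3, 0.8]], biases=[0.1, -0.1])
output = layer(hidden, weights=[[0.5, 0.5]], biases=[0.0])
```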
Parameters are numerical values the model learns during training. They control how network parts connect and respond, collectively defining the model's knowledge and behavior.
Each parameter influences how the model processes data. More parameters (billions or trillions) often enable more complex patterns, though training data quality also matters.
Understanding Model Parameters: 8B vs 70B Explained, a clear explanation of what parameter counts mean and how they affect model performance
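Parameter counts follow directly from layer sizes: in a dense layer, each output unit has one weight per input plus one bias. The layer sizes below are illustrative, not taken from any particular model.

```python
def dense_params(n_in, n_out):
    """Parameters in a dense layer: weights (n_in * n_out) plus biases (n_out)."""
    return n_in * n_out + n_out

# A small example network: 512 -> 1024 -> 256
total = dense_params(512, 1024) + dense_params(1024, 256)
# 512*1024 + 1024 = 525_312; 1024*256 + 256 = 262_400; total = 787_712
```

Scaling the same arithmetic across many large layers is how models reach billions of parameters.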
An AI model is a system trained on large amounts of data that learns patterns and relationships. Once trained, it can make predictions, generate content, or understand new information.
A trained neural network that recognizes patterns and relationships, enabling it to make predictions or generate new content.
models.dev, a comprehensive directory of available AI models and development tools
A Transformer processes all words in a text simultaneously, not sequentially. This parallel processing helps it understand word relationships across the entire text, improving context understanding.
Transformers understand connections across entire sentences or documents, not just neighboring words—like reading a whole page at once instead of word by word.
Attention Is All You Need (Vaswani et al., 2017), the groundbreaking research paper that introduced the Transformer architecture
Attention determines which words are most important for understanding meaning. Each word "pays attention" to others and assigns greater weight to relevant ones, helping the model focus on what matters.
Attention identifies crucial words and relationships, highlighting what's most relevant rather than treating all words equally.
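The "pays attention and assigns greater weight" idea is, at its core, dot-product scoring followed by a softmax. This sketch uses toy vectors, not real learned representations, and shows a single query attending over several keys.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score the query against each key, then normalize into attention weights."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = attention_weights(query, keys)
# The first key, most similar to the query, receives the largest weight.
```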
Pre-training is the initial phase where a model learns from vast amounts of text data, building a foundation of language patterns, context, and general knowledge.
Pre-training gives the model a broad foundation of general knowledge, so it can learn specific tasks faster by building on what it already knows rather than starting from scratch.
Fine-tuning trains a pre-trained model further on specialized data. It keeps general knowledge but learns to apply it in focused ways for specific tasks or domains.
Fine-tuning adds specialized skills (like design terminology or medical language) while preserving general knowledge—like teaching technical jargon to someone who already speaks English.
Reinforcement learning trains models through feedback loops. The model tries actions, receives rewards or penalties, and learns which actions lead to better outcomes over time.
Through repeated trial, feedback, and learning from mistakes, the model gradually improves—similar to how humans learn.
Training language models to follow instructions with human feedback (Ouyang et al., 2022), the research paper that demonstrated how reinforcement learning from human feedback improves AI model behavior
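The try/feedback/improve loop can be sketched with a simple two-action problem. The reward probabilities below are invented; the point is that the running estimates shift toward the action that pays off more often.

```python
import random

random.seed(0)
reward_prob = {"A": 0.8, "B": 0.2}   # action A is better (invented values)
estimates = {"A": 0.0, "B": 0.0}     # the agent's learned value of each action
counts = {"A": 0, "B": 0}

for step in range(1000):
    # Explore 10% of the time; otherwise exploit the best-looking action.
    if random.random() < 0.1:
        action = random.choice(["A", "B"])
    else:
        action = max(estimates, key=estimates.get)
    reward = 1.0 if random.random() < reward_prob[action] else 0.0
    counts[action] += 1
    # Incrementally update the running average reward for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]

# After many trials, the estimate for "A" ends up higher than for "B".
```

This is a bandit-style toy, far simpler than the human-feedback training used for language models, but the feedback loop is the same shape.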
Chain of thought shows the model's step-by-step reasoning. Instead of jumping to answers, it breaks problems into smaller steps, making the process transparent and often more accurate.
Chain of thought makes reasoning visible and often improves accuracy by forcing systematic thinking rather than guessing.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022), the research that introduced and demonstrated the effectiveness of chain-of-thought reasoning
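In practice, chain-of-thought reasoning is often elicited by a small change to the prompt. The question and sample response below are invented for illustration.

```python
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

direct_prompt = question
cot_prompt = question + " Let's think step by step."

# A chain-of-thought response might look like:
# "12 pens is 4 groups of 3. Each group costs $2. 4 x $2 = $8."
```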
Inference is when a trained model uses its knowledge to generate responses. It predicts one token at a time, using each prediction to inform the next until completing the output.
Inference happens every time you interact with AI—the model processes your input and generates a response token by token.
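The "one token at a time, each prediction informing the next" loop can be sketched with a stub model. The lookup table below stands in for a real model, which would compute probabilities over a huge vocabulary.

```python
# Stub "model": maps a token sequence to the most likely next token.
# Entries are invented for illustration.
next_token = {
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): "<end>",
}

def generate(prompt_tokens, max_tokens=10):
    """Generate token by token until an end marker or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        token = next_token.get(tuple(tokens), "<end>")
        if token == "<end>":
            break
        tokens.append(token)  # each prediction feeds the next step
    return tokens

result = generate(["the"])  # -> ['the', 'cat', 'sat']
```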
RAG (Retrieval-Augmented Generation) lets models search and retrieve information from external sources before generating answers, providing more accurate and up-to-date responses.
RAG powers AI assistants that search the web or databases. The system retrieves relevant information first, then combines it with the model's training to generate better answers.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020), the original research paper that introduced the RAG approach
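The retrieve-then-generate flow can be sketched as two steps: pick the most relevant document, then build a prompt combining it with the question. The documents and the word-overlap scorer are simplified stand-ins; real systems compare embeddings rather than raw words.

```python
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "The Pacific is the largest ocean.",
]

def retrieve(question, docs):
    """Pick the document sharing the most words with the question."""
    q_words = set(question.lower().replace("?", "").split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

question = "Who created Python?"
context = retrieve(question, documents)
# The retrieved context is combined with the question before generation.
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
```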
An AI agent is an autonomous system that independently plans, takes actions, and accomplishes tasks. Unlike simple chatbots, agents use tools, learn from feedback, remember interactions, and adapt to achieve goals.
Agents operate independently, breaking tasks into steps, executing them, evaluating results, and adjusting strategy—all without constant human intervention.
Building effective agents, Engineering at Anthropic, a comprehensive guide on how to design and build effective AI agents
A workflow is a structured sequence of steps that accomplish a task. Each step takes the previous output, processes it, and passes results to the next, creating a predictable path.
Workflows organize complex processes into connected steps, making them easier to understand, maintain, and optimize.
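The "each step takes the previous output" pattern is a simple pipeline. The three steps below are trivial stand-ins for real processing stages.

```python
def clean(text):
    return text.strip().lower()

def split_words(text):
    return text.split()

def count_words(tokens):
    return len(tokens)

def run_workflow(data, steps):
    """Feed each step's output into the next, in order."""
    for step in steps:
        data = step(data)
    return data

result = run_workflow("  Hello Workflow World  ", [clean, split_words, count_words])
# -> 3
```

Because each step only depends on the previous output, steps can be tested, replaced, or reordered independently.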
An LLM (Large Language Model) is a large neural network trained on vast amounts of text. It learns language nuances, predicts next words, and generates coherent, contextually appropriate text.
LLMs power modern AI assistants, chatbots, and language systems. Their ability to understand context, generate human-like text, and work across languages makes them versatile.