LLMs Demystified
Understand how Large Language Models actually work
Why I Built This
When ChatGPT launched, I was like everyone else—completely blown away. But as a DevOps engineer, I have this annoying habit: I can't just use something without understanding how it works.
So I tried to learn. And hit a wall.
Every article threw around words like transformers, attention mechanisms, embeddings, and tokens as if I should already know what they mean. Papers were filled with equations that made my eyes glaze over.
And the explanations? "Neural networks learn patterns from data." Cool. But HOW? What does that actually mean?
I could use the API. I could prompt engineer my way through problems. But I had no idea what was happening inside that black box.
So I'm building this page. Not because I've figured it all out—but because teaching is how I learn. If you're frustrated like I was, let's figure this out together.
LLMs Are Not Magic. They're Prediction Machines.
Here's the uncomfortable truth that took me way too long to grasp: Large Language Models don't "understand" anything. They don't "think." They don't have opinions or consciousness.
An LLM is just a very sophisticated autocomplete. Given some text, it predicts what token comes next.
The Core Loop
1. You type text: 'The quick brown fox'
2. The LLM converts it to numbers: Text → Tokens → Vectors
3. It predicts the next token: statistically, 'jumps' is the most likely continuation
Then it appends that token and repeats. Again. And again. That's it. That's the entire magic trick.
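Here's what that loop looks like in code. This is a minimal sketch, not the real serving code: it assumes the Hugging Face transformers library and the small GPT-2 model (any causal language model works the same way), and it uses plain greedy decoding for simplicity.

```python
# Minimal sketch of the autoregressive loop: predict one token, append it, repeat.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox"
input_ids = tokenizer.encode(text, return_tensors="pt")   # text -> token IDs

for _ in range(5):                        # generate 5 new tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits  # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()      # greedy: take the single most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)  # append and go again

print(tokenizer.decode(input_ids[0]))     # e.g. "The quick brown fox jumps over the ..."
```

Real systems sample from the probability distribution instead of always taking the top token (that's the "Sampling" step in the Output Block below), but the loop itself is exactly this: predict, append, repeat.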
The Pipeline We'll Explore
Input Block
- Tokenization
- Token Embeddings
- Position Encoding

Processor
- Attention
- Feed-Forward
- Layer Norm

Output Block
- Prediction Head
- Softmax
- Sampling
We're starting with the Input Block—how text becomes numbers.
By the end of this section, you'll understand:
- Tokenization: how "hello" becomes [15496] and why that matters
- Embeddings: how tokens become 768-dimensional vectors that capture meaning
- Why "cat" and "dog" are closer than "cat" and "chair" in vector space (both ideas are sketched in code below)
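To make those two ideas concrete, here's a rough sketch. It assumes OpenAI's tiktoken package for tokenization (exact token IDs depend on which tokenizer you load) and uses made-up 3-dimensional vectors for the distance check, since real embeddings have hundreds of dimensions.

```python
# Hedged sketch: tokenization with tiktoken, plus a toy cosine-similarity check.
# The "embedding" numbers below are invented purely for illustration.
import tiktoken
import numpy as np

enc = tiktoken.get_encoding("gpt2")
print(enc.encode("Hello"))        # a short list of IDs, e.g. [15496] in GPT-2's vocabulary
print(enc.encode("Hello world"))  # two tokens: "Hello" and " world"

# Toy 3-dimensional "embeddings" (real models use 768+ dimensions)
cat   = np.array([0.9, 0.8, 0.1])
dog   = np.array([0.85, 0.75, 0.2])
chair = np.array([0.1, 0.2, 0.9])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(cat, dog))    # high  -> "cat" and "dog" live near each other
print(cosine(cat, chair))  # lower -> "cat" and "chair" are far apart
```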
Ready to demystify the black box? Let's start with the first step: Tokenization.