personal learning notes
The Road to Understanding LLMs
Unpacking the concepts that brought us from backpropagation all the way to talking AI assistants, by tracing the chronology of foundational research in AI and ML.
Backpropagation
How do models ‘learn’? The algorithm that unlocked deep learning
Long Short-Term Memory
Teaching networks to remember over long sequences
Convolutional Neural Networks
Spatial structure meets learnable filters
Neural Language Model
Words as vectors, language as probability
Word2Vec
King − Man + Woman = Queen
Sequence to Sequence
Encoding meaning, then decoding it in another language
Adam Optimizer
The optimizer that trains almost everything
Attention
Letting the decoder look back at relevant input
Residual Networks
Skip connections that let gradients flow through 152 layers
BPE Tokenisation
Breaking words into learnable subword pieces
Attention Is All You Need
The architecture behind every modern LLM
GPT-1
Pre-train once, fine-tune for anything
BERT
Reading left and right at the same time
Scaling Laws
More data, more compute, predictably better
GPT-3
175 billion parameters and emergent few-shot abilities