personal learning notes

The Road to Understanding LLMs

A chronological journey through 17 foundational papers — from backpropagation to chain-of-thought prompting.

ERA 1 · THE FOUNDATIONS 1986–2003
1986

Backpropagation

Rumelhart, Hinton & Williams

The algorithm that unlocked deep learning
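The idea can be sketched in a few lines of NumPy: push an error signal backwards through each layer with the chain rule, then check one gradient entry numerically. A toy two-layer network for illustration, not the paper's notation.

```python
import numpy as np

# Tiny two-layer net on a scalar target; gradients by the chain rule.
rng = np.random.default_rng(0)
x = rng.normal(size=(3,))
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
y = 1.0

def forward(W1, W2):
    h = np.tanh(W1 @ x)          # hidden activations
    out = W2 @ h                  # scalar output
    return h, out, 0.5 * (out[0] - y) ** 2

h, out, loss = forward(W1, W2)

# Backward pass: propagate dLoss/dout back through each layer.
d_out = out[0] - y                               # dL/dout
dW2 = d_out * h[None, :]                         # dL/dW2
d_h = d_out * W2[0]                              # dL/dh
dW1 = ((1 - h**2) * d_h)[:, None] * x[None, :]   # through tanh to dL/dW1

# Verify one entry against a finite difference.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
num = (forward(W1p, W2)[2] - loss) / eps
print(abs(num - dW1[0, 0]) < 1e-4)  # True
```

The finite-difference check at the end is the standard sanity test for a hand-derived backward pass.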

1997

Long Short-Term Memory

Hochreiter & Schmidhuber

Teaching networks to remember over long sequences
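One LSTM step can be sketched directly from the gating equations: a forget gate, an input gate, and an output gate control a persistent cell state. The stacked-gate weight layout here is a convention of this sketch, not the paper's original formulation (which, notably, had no forget gate yet).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM step: gates decide what to forget, what to write, what to expose.
    z = W @ x + U @ h + b                 # all four gate pre-activations at once
    d = h.shape[0]
    f = sigmoid(z[0*d:1*d])               # forget gate
    i = sigmoid(z[1*d:2*d])               # input gate
    o = sigmoid(z[2*d:3*d])               # output gate
    g = np.tanh(z[3*d:4*d])               # candidate cell update
    c = f * c + i * g                     # cell state: the long-term memory
    h = o * np.tanh(c)                    # hidden state: the per-step output
    return h, c

rng = np.random.default_rng(0)
dx, dh = 4, 8
W = rng.normal(size=(4 * dh, dx))
U = rng.normal(size=(4 * dh, dh))
b = np.zeros(4 * dh)
h, c = np.zeros(dh), np.zeros(dh)
for x in rng.normal(size=(10, dx)):       # run over a short input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (8,)
```

The additive update `c = f * c + i * g` is the point: gradients can flow through the cell state without vanishing the way they do through repeated matrix multiplications.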

1998

Convolutional Neural Networks

LeCun et al.

Spatial structure meets learnable filters
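The "learnable filter" half of that tagline is just a small kernel slid over the image. A minimal valid-mode sketch (cross-correlation, as deep learning frameworks implement it), with a hand-picked edge filter standing in for a learned one:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid cross-correlation: slide the filter over every position.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter responds only where intensity changes left-to-right.
img = np.zeros((5, 5))
img[:, 2:] = 1.0                      # dark left half, bright right half
edge = np.array([[-1.0, 1.0]])
out = conv2d(img, edge)
print(out[0])  # [0. 1. 0. 0.] -- fires exactly at the edge
```

In a CNN the kernel values are parameters learned by backpropagation, and the same weights are reused at every spatial position.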

2003

Neural Language Model

Bengio et al.

Words as vectors, language as probability

ERA 2 · THE BUILDING BLOCKS 2013–2016
2013

Word2Vec

Mikolov et al.

King − Man + Woman = Queen
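The famous analogy is literal vector arithmetic followed by a nearest-neighbour search. A sketch with hand-picked 3-d toy vectors (real Word2Vec embeddings are learned from co-occurrence statistics; these values are chosen so the gender and royalty directions line up):

```python
import numpy as np

# Toy embeddings; assumption: hand-crafted, not trained.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.1, 0.8, 0.9]),
    "apple": np.array([0.5, 0.0, 0.2]),   # distractor
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman should land nearest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, emb[w]))
print(best)  # queen
```

Excluding the query words from the candidate set matters: the nearest neighbour of the raw sum is often `king` itself in real embedding spaces.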

2014

Sequence to Sequence

Sutskever, Vinyals & Le

Encoding meaning, then decoding it in another language

2015

Adam Optimizer

Kingma & Ba

The optimizer that trains almost everything
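The update rule fits in a few lines: exponential moving averages of the gradient and its square, bias-corrected, then a per-parameter step. A NumPy sketch of the paper's Algorithm 1 on a toy quadratic:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Moving averages of the gradient (m) and its elementwise square (v).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    # Bias correction counteracts the zero-initialisation of m and v.
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(x) = x^2 starting from x = 5; gradient is 2x.
x, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(x)  # close to 0
```

The division by `sqrt(v_hat)` is what gives each parameter its own effective step size, which is why the same defaults work across so many architectures.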

2015

Attention

Bahdanau, Cho & Bengio

Letting the decoder look back at relevant input
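"Looking back" means scoring every encoder state against the current decoder state, softmaxing the scores into weights, and taking a weighted sum. A sketch of Bahdanau-style additive attention; the parameter names `Wd`, `We`, `va` are illustrative (in the real model they are learned jointly with the rest of the network):

```python
import numpy as np

def additive_attention(dec_state, enc_states, Wd, We, va):
    # Bahdanau-style score: a small feed-forward net, not a dot product.
    scores = np.tanh(enc_states @ We.T + dec_state @ Wd.T) @ va
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over source positions
    context = weights @ enc_states        # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
T, d, a = 4, 8, 16                        # source length, hidden dim, attention dim
enc = rng.normal(size=(T, d))
dec = rng.normal(size=(d,))
Wd = rng.normal(size=(a, d))
We = rng.normal(size=(a, d))
va = rng.normal(size=(a,))
context, w = additive_attention(dec, enc, Wd, We, va)
print(w.sum())  # 1.0 -- a proper distribution over source words
```

The context vector is recomputed at every decoding step, which is exactly what frees the model from squeezing the whole source sentence into one fixed vector.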

2016

Residual Networks

He et al.

Skip connections that let gradients flow through 152 layers
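A residual block computes `y = F(x) + x`: the layers only learn the residual `F`, and the identity path carries gradients straight through. A minimal two-layer sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # y = F(x) + x: the skip connection adds the input back unchanged.
    return relu(x @ W1.T) @ W2.T + x

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(d,))

# With zero weights the block is exactly the identity -- the property that
# lets very deep stacks start out well-behaved and stay trainable.
y = residual_block(x, np.zeros((d, d)), np.zeros((d, d)))
print(np.allclose(y, x))  # True
```

Stacking 152 of these is still "just" function composition, but every block's gradient includes an identity term, so the signal never has to survive 152 multiplications.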

2016

BPE Tokenisation

Sennrich, Haddow & Birch

Breaking words into learnable subword pieces
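The training loop is simple: start from characters and repeatedly merge the most frequent adjacent pair. A sketch using corpus counts echoing the paper's running example (end-of-word markers omitted for brevity):

```python
from collections import Counter

def merge_word(word, pair, merged):
    # Replace every occurrence of the pair inside one word.
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def bpe_merges(word_counts, num_merges):
    # Each word starts as a tuple of characters.
    vocab = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, count in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        vocab = {merge_word(w, best, merged): c for w, c in vocab.items()}
    return merges

corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = bpe_merges(corpus, 2)
print(merges)  # [('e', 's'), ('es', 't')]
```

Frequent words end up as single tokens while rare words decompose into subwords, so the vocabulary stays small without an out-of-vocabulary problem.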

ERA 3 · THE BREAKTHROUGH 2017
2017

Attention Is All You Need

Vaswani et al.

The architecture behind every modern LLM
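At its core is one operation, scaled dot-product attention: `softmax(QK^T / sqrt(d_k)) V`. A sketch of the single-head case (the full model adds multiple heads, projections, and position-wise layers on top):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Every query attends to every key; sqrt(d_k) keeps the logits well-scaled.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
# Self-attention: queries, keys, and values all come from the same sequence
# (the real model applies learned projections to X first).
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (5, 8)
```

Unlike an RNN, every pair of positions interacts in one step, which is what makes training parallel across the whole sequence.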

ERA 4 · THE LLM ERA 2018–2022
2018

GPT-1

Radford et al.

Pre-train once, fine-tune for anything

2019

BERT

Devlin et al.

Reading left and right at the same time

2020

Scaling Laws

Kaplan et al.

More data, more compute, predictably better
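"Predictably" means the loss follows a power law in parameter count. A sketch of the parameter-count law using the approximate constants reported in the paper (for models trained with enough data and compute):

```python
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    # Kaplan et al. fit: L(N) ~ (N_c / N)^alpha_N.
    # n_c and alpha are the paper's reported approximate constants.
    return (n_c / n_params) ** alpha

# Doubling parameters shrinks loss by a constant factor of 2**alpha (~5%),
# regardless of where you start on the curve.
ratio = predicted_loss(1e9) / predicted_loss(2e9)
print(round(ratio, 3))  # 1.054
```

Each doubling buys the same multiplicative improvement, which is why performance at a new scale can be forecast from fits at smaller scales.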

2020

GPT-3

Brown et al.

175 billion parameters and emergent few-shot abilities

2022

InstructGPT

Ouyang et al.

Aligning language models with human intent

2022

Chain-of-Thought

Wei et al.

Think step by step
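The technique is purely a prompting change: few-shot exemplars whose answers include the intermediate reasoning. A sketch of the format, with exemplar wording paraphrased from the paper's running example:

```python
# Standard few-shot prompt: the exemplar answer is just the final number.
standard = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\nA: 11\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many do they have?\nA:"
)

# Chain-of-thought: same prompt, but the exemplar answer spells out the steps,
# prompting the model to emit its own reasoning before the final answer.
cot = standard.replace(
    "A: 11",
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.",
)
print(len(cot) > len(standard))  # True
```

No weights change, no fine-tuning: at sufficient model scale, the worked exemplar alone is enough to elicit step-by-step reasoning on the new question.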