Complete Study Roadmap

Build Your Own Claude-Level AI

A complete, research-backed roadmap covering every phase — from math fundamentals to training billion-parameter language models. Curated papers, books, courses, and hands-on projects.

6 Phases
5–7 Years (Full-Time)
25+ Research Papers
15+ Books
20+ Courses
// calculator

Your Personal Timeline

Adjust your weekly study hours to see a personalized completion estimate.

Timeline Estimator

Total study effort required: approx. 10,000 – 15,000 hours

Hours / week: 20h
Prior experience: 0%
Estimate: 12.0 years (144 months), 12,500 total hours
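The figures above are consistent with simple arithmetic: 12,500 hours (the midpoint of the 10,000 to 15,000 range) divided by 20 hours per week. Below is a minimal sketch of that calculation. It assumes a 52-week year and that prior experience scales the total down proportionally. The function name `estimate_timeline` and the proportional discount are illustrative assumptions, not the page's actual calculator logic.

```python
def estimate_timeline(hours_per_week, prior_experience=0.0, total_hours=12_500):
    """Return (years, months, remaining_hours) for a weekly study budget.

    Assumptions (not taken from the page): a 52-week year, and prior
    experience given as a fraction in [0, 1) that scales the total down.
    """
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    if not 0.0 <= prior_experience < 1.0:
        raise ValueError("prior_experience must be in [0, 1)")
    remaining = total_hours * (1.0 - prior_experience)
    weeks = remaining / hours_per_week
    years = weeks / 52
    months = years * 12
    return years, months, remaining


years, months, hours = estimate_timeline(20)
print(f"{years:.1f} years, {months:.0f} months, {hours:,.0f} hours")
# prints: 12.0 years, 144 months, 12,500 hours
```

Under the same assumptions, 40 hours per week gives roughly 6 years, which falls inside the 5–7 year full-time figure quoted at the top.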
// roadmap

6 Phases to Claude-Level AI

Each phase builds on the previous.

P01: Math Foundations
P02: Machine Learning
P03: Deep Learning
P04: Transformers & LLMs
P05: RLHF & Alignment
P06: Infrastructure
// arxiv

Must-Read Research Papers

The foundational papers that define modern large language models. Read these in order.

2017
Attention Is All You Need
Vaswani et al. (Google Brain)
THE transformer paper. Every modern LLM is built on this architecture.
Read on arXiv →
2018
BERT: Pre-training of Deep Bidirectional Transformers
Devlin et al. (Google)
Established masked language modeling pre-training. Still widely used.
Read on arXiv →
2020
Language Models are Few-Shot Learners (GPT-3)
Brown et al. (OpenAI)
175B-parameter model that showed few-shot, in-context learning improving sharply with scale.
Read on arXiv →
2022
Constitutional AI: Harmlessness from AI Feedback
Bai et al. (Anthropic)
Trains a harmless assistant using AI-generated feedback guided by a written set of principles. Core Anthropic technique.
Read on arXiv →
2020
Scaling Laws for Neural Language Models
Kaplan et al. (OpenAI)
Predicts how model loss scales with compute, data, and parameters.
Read on arXiv →
2022
Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
Ouyang et al. (OpenAI)
Applied the RLHF pipeline (supervised fine-tuning, reward model, PPO) to instruction following. The paper that established the 'assistant' paradigm.
Read on arXiv →
2023
LLaMA: Open and Efficient Foundation Language Models
Touvron et al. (Meta AI)
Well-documented open-weights LLM. Great for studying architecture.
Read on arXiv →
2022
FlashAttention: Fast and Memory-Efficient Exact Attention
Dao et al.
IO-aware exact attention. Critical for training long-context models.
Read on arXiv →
2023
Direct Preference Optimization (DPO)
Rafailov et al.
Simplifies RLHF by removing the separate reward model. Widely adopted.
Read on arXiv →
2021
LoRA: Low-Rank Adaptation of Large Language Models
Hu et al. (Microsoft)
Freezes the base weights and trains small low-rank adapter matrices, cutting trainable parameters and optimizer memory dramatically. Most useful paper for practitioners.
Read on arXiv →
2017
Proximal Policy Optimization Algorithms
Schulman et al. (OpenAI)
The RL algorithm most commonly used for the policy-optimization step of RLHF. Must understand.
Read on arXiv →
2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Bubeck et al. (Microsoft)
Fascinating analysis of GPT-4's emergent capabilities and reasoning.
Read on arXiv →
2023
GPT-4 Technical Report
OpenAI
State of the art at its March 2023 release. Understand what the frontier looked like.
Read on arXiv →
2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Rajbhandari et al. (Microsoft)
Distributed training memory optimization. You need this to train large models.
Read on arXiv →
2014
Adam: A Method for Stochastic Optimization
Kingma & Ba
The optimizer behind almost every LLM. Understand it at a deep level.
Read on arXiv →
2021
RoFormer: Enhanced Transformer with Rotary Position Embedding (RoPE)
Su et al.
Position encoding used in LLaMA and Mistral. Encodes relative position by rotating query and key vectors.
Read on arXiv →
2020
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Gao et al. (EleutherAI)
How to build training datasets at scale. Data quality matters most.
Read on arXiv →
2024
Mixtral of Experts
Jiang et al. (Mistral AI)
A strong open-weights sparse mixture-of-experts model. Useful for understanding sparse expert routing.
Read on arXiv →
// library

Essential Books

Hand-picked books from beginner to advanced, covering math, ML, deep learning, and NLP.

📐
Mathematics for Machine Learning
Deisenroth, Faisal & Ong
beginner
Free PDF. Covers linear algebra, calculus, probability — everything you need.
🧠
Deep Learning
Goodfellow, Bengio & Courville
intermediate
Free at deeplearningbook.org. The textbook for the field. Dense but essential.
📙
The Elements of Statistical Learning
Hastie, Tibshirani & Friedman
advanced
Free PDF at ESL website. Statistical ML — rigorous and thorough.
📗
Hands-On Machine Learning (3rd Ed.)
Aurélien Géron
beginner
Best practical ML book. Scikit-learn to neural networks. Highly recommended.
🤗
NLP with Transformers
Lewis Tunstall et al. (Hugging Face)
intermediate
The go-to practical book for transformers. Pairs with HF libraries perfectly.
📖
Build a Large Language Model From Scratch
Sebastian Raschka
intermediate
Step-by-step code. You build a GPT-2 equivalent from scratch. Outstanding.
🦜
Speech and Language Processing (3rd Ed.)
Jurafsky & Martin
intermediate
Free online. Stanford's NLP bible. Theory of language models from first principles.
💻
Deep Learning with PyTorch
Eli Stevens, Luca Antiga & Thomas Viehmann
beginner
Best PyTorch book. Goes from tensors to production models.
🛠️
Designing Machine Learning Systems
Chip Huyen
intermediate
ML in production: data, training, deployment, monitoring. Invaluable.
🤖
Human Compatible: AI and the Problem of Control
Stuart Russell
beginner
Why AI safety matters and how to think about aligned AI systems.
🛡️
The Alignment Problem
Brian Christian
beginner
Accessible deep dive into AI alignment challenges. Essential background.
⚙️
Designing Data-Intensive Applications
Martin Kleppmann
intermediate
Distributed data systems fundamentals that underpin ML infrastructure. Required for serious scaling.
📘
Pattern Recognition and Machine Learning
Christopher Bishop
advanced
Bayesian perspective on ML. Essential for cutting-edge research understanding.
Neural Networks and Deep Learning (online book)
Michael Nielsen
beginner
Free at neuralnetworksanddeeplearning.com. Best visual explanation of backprop.
// learn

Top Courses

Free and paid courses from the world's best AI educators.

MIT OpenCourseWare
Linear Algebra (18.06SC)
Gilbert Strang
Free · math
View Course →
Khan Academy
Multivariable Calculus & Probability
Khan Academy
Free · math
View Course →
Coursera / DeepLearning.AI
Machine Learning Specialization
Andrew Ng
Paid · ml
View Course →
Coursera / DeepLearning.AI
Deep Learning Specialization (5 courses)
Andrew Ng
Paid · dl
View Course →
Fast.ai
Practical Deep Learning for Coders (Part 1 & 2)
Jeremy Howard
Free · dl · ml
View Course →
YouTube / karpathy.ai
Neural Networks: Zero to Hero
Andrej Karpathy
Free · dl · nlp
View Course →
Stanford University
CS224N — NLP with Deep Learning
Christopher Manning
Free · nlp
View Course →
Stanford University
CS229 — Machine Learning
Andrew Ng
Free · ml · math
View Course →
Hugging Face
NLP Course (Transformers & Datasets)
Hugging Face Team
Free · nlp
View Course →
NYU / Collège de France
Deep Learning Course (DS-GA 1008)
Yann LeCun
Free · dl
View Course →
UC Berkeley
CS285 — Deep Reinforcement Learning
Sergey Levine
Free · ml
View Course →
OpenAI
Spinning Up in Deep RL
OpenAI Research
Free · ml
View Course →
Full Stack Deep Learning
LLM Bootcamp 2023
FSDL Team
Free · nlp · dl
View Course →
DeepLearning.AI (Short Course)
Reinforcement Learning from Human Feedback
DeepLearning.AI
Paid · nlp
View Course →
Google
Machine Learning Crash Course
Google Team
Free · ml
View Course →
Coursera / DeepLearning.AI
MLOps Specialization
Andrew Ng
Paid · ml
View Course →
3Blue1Brown YouTube
Neural Networks (visual explanations)
Grant Sanderson
Free · dl · math
View Course →
Weights & Biases
Effective MLOps: Model Development
W&B Team
Free · ml
View Course →
// tracker

Skills Checklist

Track your progress across 34 skills in six phases.

Phase 1 — Math Foundations
Linear Algebra basics (vectors, matrices, eigenvalues)
Calculus (derivatives, chain rule, gradient)
Probability & statistics (Bayes, distributions)
Information theory basics (entropy, KL divergence)
Implement gradient descent from scratch
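As a starting point for the last item above, here is a minimal sketch of gradient descent on a one-variable function. The function name `gradient_descent`, the learning rate, and the step count are illustrative choices, and the quadratic is a toy objective picked because its minimum is known.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimise a function of one variable given its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step against the gradient
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 4))  # prints: 3.0
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, which is why 100 steps are more than enough here.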
Phase 2 — Machine Learning
Understand bias-variance tradeoff
Linear & logistic regression from scratch
Decision trees, random forests, XGBoost
Model evaluation (cross-validation, AUC, F1)
Feature engineering pipeline
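For the "logistic regression from scratch" item above, the sketch below fits a one-feature model with batch gradient descent on the log loss. The toy data, the names `sigmoid` and `fit_logistic`, and the hyperparameters are assumptions chosen for illustration. The feature is already centred, which keeps the example well-behaved.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy, linearly separable data with a centred feature (illustrative only).
xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
preds = [int(sigmoid(w * x + b) >= 0.5) for x in xs]
print(preds)  # prints: [0, 0, 0, 1, 1, 1]
```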
Phase 3 — Deep Learning
Backpropagation from scratch
CNNs (convolutions, pooling, ResNet)
RNNs and LSTMs
Batch normalization & dropout
Build micrograd & makemore (Karpathy)
Train a model on GPU in PyTorch
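The "backpropagation from scratch" item can be approached with a tiny scalar autograd engine in the spirit of micrograd. The sketch below is not Karpathy's code: the class layout and the names `Value` and `backward_fn` are assumptions, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that records how it was computed."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
loss = a * b + c
loss.backward()
print(a.grad, b.grad, c.grad)  # prints: -3.0 2.0 1.0
```

Gradients are accumulated with `+=` so that a value used more than once in the expression receives the sum of its contributions.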
Phase 4 — Transformers & LLMs
Understand self-attention mechanism
Implement multi-head attention from scratch
BPE tokenizer from scratch
Build nanoGPT end-to-end
Understand positional encoding (RoPE)
Fine-tune a pretrained model (BERT/GPT-2)
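Before implementing multi-head attention, it may help to write out a single head. The sketch below is single-head scaled dot-product attention with a causal mask, in plain Python lists rather than tensors. It omits the learned query, key, and value projections, and the function names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of vectors, one per token position.
    """
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        # Position i may only attend to positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_attention(x, x, x)  # no learned projections, for illustration only
print(y[0])  # prints: [1.0, 0.0]
```

The first position can only attend to itself, so its output equals its own value vector. A multi-head version would run several such heads on separate learned projections and concatenate the results.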
Phase 5 — RLHF & Alignment
Understand reward modelling
PPO algorithm from scratch
Run DPO on a small model
Understand Constitutional AI principles
Build a basic red-teaming evaluation
Read all Anthropic alignment papers
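For the DPO item above, the core of the method is a single loss term per preference pair. The sketch below computes it from four sequence log-probabilities, which in practice would come from the policy and a frozen reference model. The argument names and the default `beta=0.1` are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference, the loss is log(2).
print(round(dpo_loss(-5.0, -7.0, -5.0, -7.0), 4))  # prints: 0.6931
```

The loss falls below log(2) once the policy raises the chosen response's log-probability relative to the reference by more than it raises the rejected one's.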
Phase 6 — Infrastructure
Multi-GPU training setup (DeepSpeed)
LoRA fine-tuning (PEFT library)
Quantization (GPTQ, AWQ, and the GGUF file format)
Deploy LLM API (vLLM / llama.cpp)
Pre-train a 125M-parameter GPT on a custom corpus
Set up experiment tracking (W&B)
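To see why the LoRA item above matters for infrastructure budgets, the arithmetic sketch below compares trainable parameters for one weight matrix with and without low-rank adapters. It does not use the PEFT library. The function name and the 4096 x 4096, rank-8 example are illustrative assumptions.

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare trainable parameters: full weight matrix vs. LoRA adapters.

    LoRA keeps W (d_out x d_in) frozen and trains B (d_out x rank) and
    A (rank x d_in), so the effective weight is W + B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints: full: 16,777,216  lora: 65,536  ratio: 0.39%
```

The frozen base weights still have to fit in memory, which is why LoRA is often combined with the quantization methods listed above.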