Complete Study Roadmap

Build Your Own Claude-Level AI

A complete, research-backed roadmap covering every phase — from math fundamentals to training billion-parameter language models. Curated papers, books, courses, and hands-on projects.

6 Phases

5–7 Years (Full-Time)

25+ Research Papers

15+ Books

20+ Courses

// calculator

Your Personal Timeline

Adjust your weekly study hours to see a personalized completion estimate.

Timeline Estimator

Total study effort required: approx. 10,000 – 15,000 hours

Hours / week: 20h

Prior experience: 0%

12.0 Years

144 Months

12,500 Total Hours
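The estimate above can be reproduced with simple arithmetic. The sketch below is an assumption about how such a calculator might work, not the page's actual formula: it assumes 52 study weeks per year and that prior experience removes a proportional share of the total hours. The function name `estimate_timeline` is hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption: no weeks off


def estimate_timeline(total_hours, hours_per_week, prior_experience=0.0):
    """Return (years, months) needed to cover the remaining study hours.

    prior_experience is a fraction in [0, 1]; it is assumed to reduce
    the total effort proportionally.
    """
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    if not 0.0 <= prior_experience <= 1.0:
        raise ValueError("prior_experience must be between 0 and 1")
    remaining_hours = total_hours * (1.0 - prior_experience)
    years = remaining_hours / (hours_per_week * WEEKS_PER_YEAR)
    return years, years * 12


# Defaults shown on the page: 12,500 hours, 20 hours per week, 0% experience.
years, months = estimate_timeline(12_500, 20, 0.0)
print(f"{years:.1f} years, {months:.0f} months")  # 12.0 years, 144 months
```

Under the same assumptions, 40 hours per week gives roughly 6 years, which is consistent with the 5–7 year full-time figure.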

// roadmap

6 Phases to Claude-Level AI

Each phase builds on the previous and comes with its own books, papers, courses, and projects.

P01 · P02 · P03 · P04 · P05 · P06

// arxiv

Must-Read Research Papers

The foundational papers that define modern large language models. Read these in order.

2017

Attention Is All You Need

Vaswani et al. (Google Brain)

The original transformer paper. Nearly every modern LLM builds on this architecture.

2018

BERT: Pre-training of Deep Bidirectional Transformers

Devlin et al. (Google)

Established masked language modeling pre-training. Still widely used.

2020

Language Models are Few-Shot Learners (GPT-3)

Brown et al. (OpenAI)

175B-parameter model that showed strong few-shot, in-context learning at scale.

2022

Constitutional AI: Harmlessness from AI Feedback

Bai et al. (Anthropic)

How Claude is trained to be helpful and safe. Core Anthropic technique.

2020

Scaling Laws for Neural Language Models

Kaplan et al. (OpenAI)

Predicts how model loss scales with compute, data, and parameters.

2022

Training Language Models to Follow Instructions (InstructGPT)

Ouyang et al. (OpenAI)

Applied the RLHF pipeline (supervised fine-tuning, reward model, PPO) to instruction following. The paper that popularized the 'assistant' paradigm.

2023

LLaMA: Open and Efficient Foundation Language Models

Touvron et al. (Meta AI)

A well-documented open-weight LLM family. Great for studying architecture.

2022

FlashAttention: Fast and Memory-Efficient Exact Attention

Dao et al.

IO-aware exact attention. Critical for training long-context models.

2023

Direct Preference Optimization (DPO)

Rafailov et al.

Simplifies RLHF by optimizing directly on preference pairs, without a separate reward model. Widely adopted.

2021

LoRA: Low-Rank Adaptation of Large Language Models

Hu et al. (Microsoft)

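Fine-tunes large models by training small low-rank adapter matrices while the base weights stay frozen. Fitting a 70B model on a single GPU usually also requires quantization (as in QLoRA). Most useful paper for practitioners.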

2017

Proximal Policy Optimization Algorithms

Schulman et al. (OpenAI)

The RL algorithm used in the classic RLHF pipeline. Must understand.

2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Bubeck et al. (Microsoft)

Fascinating analysis of GPT-4's emergent capabilities and reasoning.

2023

GPT-4 Technical Report

OpenAI

The state-of-the-art LLM at its release. Shows what the frontier looks like, although the report discloses few architecture or training details.

2019

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

Rajbhandari et al. (Microsoft)

Distributed training memory optimization. You need this to train large models.

2014

Adam: A Method for Stochastic Optimization

Kingma & Ba

The optimizer behind almost every LLM. Understand it at a deep level.

2021

RoFormer: Enhanced Transformer with Rotary Position Embedding (RoPE)

Su et al.

Position encoding used in LLaMA and Mistral. Encodes relative position by rotating query and key vectors.

2020

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Gao et al. (EleutherAI)

How to build training datasets at scale. Data quality matters most.

2024

Mixtral of Experts

Jiang et al. (Mistral AI)

A strong open-weight sparse mixture-of-experts (MoE) model. Good for understanding sparse expert routing.

// library

Essential Books

Hand-picked books from beginner to advanced, covering math, ML, deep learning, and NLP.

📐

Mathematics for Machine Learning

Deisenroth, Faisal & Ong

beginner

Free PDF. Covers linear algebra, calculus, probability — everything you need.

🧠

Deep Learning

Goodfellow, Bengio & Courville

intermediate

Free at deeplearningbook.org. The textbook for the field. Dense but essential.

📙

The Elements of Statistical Learning

Hastie, Tibshirani & Friedman

advanced

Free PDF at ESL website. Statistical ML — rigorous and thorough.

📗

Hands-On Machine Learning (3rd Ed.)

Aurélien Géron

beginner

Best practical ML book. Scikit-learn to neural networks. Highly recommended.

🤗

NLP with Transformers

Lewis Tunstall et al. (Hugging Face)

intermediate

The go-to practical book for transformers. Pairs with HF libraries perfectly.

📖

Build a Large Language Model From Scratch

Sebastian Raschka

intermediate

Step-by-step code. You build a GPT-2 equivalent from scratch. Outstanding.

🦜

Speech and Language Processing (3rd Ed.)

Jurafsky & Martin

intermediate

Free online. Stanford's NLP bible. Theory of language models from first principles.

💻

Deep Learning with PyTorch

Eli Stevens, Luca Antiga & Thomas Viehmann

beginner

Best PyTorch book. Goes from tensors to production models.

🛠️

Designing Machine Learning Systems

Chip Huyen

intermediate

ML in production: data, training, deployment, monitoring. Invaluable.

🤖

Human Compatible: AI and the Problem of Control

Stuart Russell

beginner

Why AI safety matters and how to think about aligned AI systems.

🛡️

The Alignment Problem

Brian Christian

beginner

Accessible deep dive into AI alignment challenges. Essential background.

⚙️

Designing Data-Intensive Applications

Martin Kleppmann

intermediate

Distributed systems for ML infrastructure. Required for serious scaling.

📘

Pattern Recognition and Machine Learning

Christopher Bishop

advanced

Bayesian perspective on ML. Essential for cutting-edge research understanding.

⚡

Neural Networks and Deep Learning (online book)

Michael Nielsen

beginner

Free at neuralnetworksanddeeplearning.com. Best visual explanation of backprop.

// learn

Top Courses

Free and paid courses from the world's best AI educators.

MIT OpenCourseWare

Linear Algebra (18.06SC)

Gilbert Strang

Free · math

Khan Academy

Multivariable Calculus & Probability

Khan Academy

Free · math

Coursera / DeepLearning.AI

Machine Learning Specialization

Andrew Ng

Paid · ml

Coursera / DeepLearning.AI

Deep Learning Specialization (5 courses)

Andrew Ng

Paid · dl

Fast.ai

Practical Deep Learning for Coders (Part 1 & 2)

Jeremy Howard

Free · dl, ml

YouTube / karpathy.ai

Neural Networks: Zero to Hero

Andrej Karpathy

Free · dl, nlp

Stanford University

CS224N — NLP with Deep Learning

Christopher Manning

Free · nlp

Stanford University

CS229 — Machine Learning

Andrew Ng

Free · ml, math

Hugging Face

NLP Course (Transformers & Datasets)

Hugging Face Team

Free · nlp

NYU / Collège de France

Deep Learning Course (DS-GA 1008)

Yann LeCun

Free · dl

UC Berkeley

CS285 — Deep Reinforcement Learning

Sergey Levine

Free · ml

OpenAI

Spinning Up in Deep RL

OpenAI Research

Free · ml

Full Stack Deep Learning

LLM Bootcamp 2023

FSDL Team

Free · nlp, dl

DeepLearning.AI (Short Course)

Reinforcement Learning from Human Feedback

DeepLearning.AI

Paid · nlp

Google

Machine Learning Crash Course

Google Team

Free · ml

Coursera / DeepLearning.AI

MLOps Specialization

Andrew Ng

Paid · ml

3Blue1Brown YouTube

Neural Networks (visual explanations)

Grant Sanderson

Free · dl, math

Weights & Biases

Effective MLOps: Model Development

W&B Team

Free · ml

// tracker

Skills Checklist

Track your progress. Your selections are saved in your browser.

Overall Progress: 0% done (0 of 34 skills completed)

Phase 1 — Math Foundations

Linear Algebra basics (vectors, matrices, eigenvalues)

Calculus (derivatives, chain rule, gradient)

Probability & statistics (Bayes, distributions)

Information theory basics (entropy, KL divergence)

Implement gradient descent from scratch
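As a starting point for the last item in this phase, here is a minimal sketch of gradient descent in one dimension. The function name `gradient_descent`, the learning rate, and the example objective f(x) = (x - 3)^2 are illustrative choices, not part of the roadmap.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a 1-D function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # move downhill, scaled by the learning rate
    return x


# Example objective: f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
# The minimum is at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 6))
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step, so 100 steps are more than enough to converge. Setting `lr` above 1.0 makes the same objective diverge, which is worth trying once.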

Phase 2 — Machine Learning

Understand bias-variance tradeoff

Linear & logistic regression from scratch

Decision trees, random forests, XGBoost

Model evaluation (CV, AUC, F1)

Feature engineering pipeline

Phase 3 — Deep Learning

Backpropagation from scratch

CNNs (convolutions, pooling, ResNet)

RNNs and LSTMs

Batch normalization & dropout

Build micrograd & makemore (Karpathy)

Train a model on GPU in PyTorch

Phase 4 — Transformers & LLMs

Understand self-attention mechanism

Implement multi-head attention from scratch

BPE tokenizer from scratch

Build nanoGPT end-to-end

Understand positional encoding (RoPE)

Fine-tune a pretrained model (BERT/GPT-2)
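The self-attention items above can be approached with a small pure-Python sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, with an optional causal mask. It is a teaching sketch under simplifying assumptions: no learned projections, no batching, and a single head rather than the multi-head version named in the checklist. The function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Position i may only attend to positions j <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
causal_out = attention(Q, K, V, causal=True)
print(causal_out[0])  # first token can only see itself: [1.0, 2.0]
```

A multi-head version would split the model dimension into several such heads, run each on its own learned projections of the input, and concatenate the results.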

Phase 5 — RLHF & Alignment

Understand reward modeling

PPO algorithm from scratch

Run DPO on small model

Understand Constitutional AI principles

Build a basic red-teaming evaluation

Read all Anthropic alignment papers
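Before running DPO on a real model, it can help to compute the loss by hand for one preference pair. The sketch below follows the DPO objective, -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]). The function names, the example log-probabilities, and beta = 0.1 are illustrative assumptions; in practice the inputs are summed token log-probabilities from the policy and a frozen reference model.

```python
import math


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) pair of sequence log-probs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Policy identical to the reference: margin is 0, so the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy has shifted probability toward the chosen answer: loss drops.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's, which is the behavior a separate reward model plus PPO would otherwise be used to induce.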

Phase 6 — Infrastructure

Multi-GPU training setup (DeepSpeed)

LoRA fine-tuning (PEFT library)

Quantization (GGUF, AWQ, GPTQ)

Deploy LLM API (vLLM / llama.cpp)

Pre-train 125M GPT on custom corpus

Set up experiment tracking (W&B)
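To see why the LoRA item is feasible on modest hardware, it helps to count trainable parameters. LoRA replaces the update to a frozen d_out x d_in weight matrix with two low-rank factors, B (d_out x r) and A (r x d_in). The sketch below uses a hypothetical 4096 x 4096 projection and rank 8; these sizes are illustrative assumptions, not taken from any specific model.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


d_in = d_out = 4096  # hypothetical projection size
rank = 8             # hypothetical LoRA rank

full_params = d_in * d_out
lora_params = lora_trainable_params(d_in, d_out, rank)

print(full_params)                 # 16777216
print(lora_params)                 # 65536
print(full_params // lora_params)  # 256
```

Fewer trainable parameters means fewer gradients and optimizer states to hold in memory. The frozen base weights must still fit on the device, which is why LoRA is often combined with the quantization methods listed in this phase.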