Nano Chat: Complete Small Language Model Pipeline
Nano Chat provides a complete small language model pipeline from tokenization to deployment.
Nano Chat is an educational open-source project by Andrej Karpathy (OpenAI co-founder and former Director of AI at Tesla) that demonstrates how to build a small ChatGPT-style chat model from scratch. It covers the complete pipeline: custom BPE tokenizer training, transformer pretraining, alignment through instruction tuning, and chat UI deployment.
Pipeline:
Stage 1 - BPE tokenizer trained from scratch: corpus collection, conversion of text to bytes, repeated merging of the most frequent adjacent pair, vocabulary construction.
Stage 2 - Standard transformer decoder pretrained on clean web text with AdamW, learning-rate scheduling, and gradient accumulation.
Stage 3 - Instruction tuning on conversational data to give the model dialogue capability.
Stage 4 - Web UI with local inference, quantization, and KV cache management.
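The Stage 1 steps (byte conversion, frequent pair merging, vocabulary construction) can be illustrated with a minimal byte-level BPE sketch. This is not Nano Chat's actual tokenizer code. The function names `train_bpe`, `encode`, and `decode` are hypothetical and chosen for illustration, and the sketch skips practical details such as regex pre-splitting and special tokens.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, vocab_size):
    """Learn merge rules on top of the 256 raw byte values."""
    assert vocab_size >= 256
    ids = list(text.encode("utf-8"))  # byte conversion
    merges = {}  # (id, id) -> new id, kept in the order learned
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        pair = max(counts, key=counts.get)  # most frequent pair
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Tokenize text by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Rebuild the vocabulary from the merges and map ids back to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_bpe("aaabdaaabac", vocab_size=259)
tokens = encode("aaabdaaabac", merges)
print(len(tokens), decode(tokens, merges))
```

On this toy string, three merges shrink the 11 input bytes to 5 tokens, and decoding recovers the original text. A real tokenizer would train on a large corpus and a far larger vocabulary.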
The 561M-parameter version costs roughly $100 to train on an 8xH100 node. Inference runs on consumer GPUs or even CPUs.
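A quick back-of-the-envelope calculation shows why a model of this size fits on consumer hardware. The sketch below estimates weight memory only (activations and the KV cache add more), and the per-parameter byte sizes are the standard ones for each precision. The hourly node price is an assumed figure for illustration, not a number from the project, so the resulting training time is only indicative.

```python
PARAMS = 561_000_000  # parameter count quoted above

# Storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def training_hours(budget_usd, node_usd_per_hour):
    """Wall-clock hours a budget buys at a given node rental price."""
    return budget_usd / node_usd_per_hour


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")

# ASSUMPTION: $24/hour for an 8xH100 node; actual cloud prices vary.
ASSUMED_NODE_USD_PER_HOUR = 24.0
print(f"~{training_hours(100, ASSUMED_NODE_USD_PER_HOUR):.1f} hours of training")
```

At 16-bit precision the weights take a little over 1 GB, and 4-bit quantization brings that under 0.3 GB, which is why CPU inference is plausible.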
Educational value: the project demystifies LLMs, turning them from a black box into a white box for researchers, engineers, students, and enterprises.
The project reflects the 2026 small language model (SLM) renaissance: many tasks need only 0.5-2B parameters, and models of that size bring lower latency, edge device compatibility, data privacy, and fast iteration.
Nano Chat is a blueprint, not a product. It shows that conversational AI requires a deep understanding of fundamentals and sound engineering practice, not black magic or astronomical budgets.