The nanoChat project democratizes LLM training, allowing anyone to train a ChatGPT-style model for just $100.
For years, the ability to train a Large Language Model (LLM) from scratch has been a privilege reserved for tech giants with hundred-million-dollar budgets. For students, developers, and researchers, the inner workings of models like ChatGPT have been a “black box”—something to be used via an API, but never truly understood or built. On October 17, 2025, Andrej Karpathy, one of the founding members of OpenAI and former Director of AI at Tesla, shattered that barrier. He released nanoChat, a complete, end-to-end recipe to train your own ChatGPT-style LLM from scratch. The cost? Just $100. The time? As little as 4 hours.
This is a revolutionary moment in AI education. It democratizes access to knowledge that was previously inaccessible, moving the community from simply using LLMs to building them. But the GitHub repository, while brilliant, is dense. The average developer is asking: “This is amazing, but where do I even start? What hardware do I need? How do I actually run this?”
This guide is your answer. Written from the perspective of an AI engineer who has implemented nanoChat and trained multiple LLMs, this is the complete, step-by-step nanoChat tutorial you need. We’ll go beyond the code and provide a full LLM training guide, including hardware selection, cost breakdown, common errors, and performance optimization tricks that you’ll only learn from real-world experience.
nanoChat is not just another code library; it is a complete, self-contained “recipe” for creating a small but functional ChatGPT clone. In about 8,000 lines of clean, minimal code (mostly Python with a bit of Rust), Karpathy has laid out the entire LLM training pipeline. This is a massive leap from his previous nanoGPT project, which only covered pre-training.
The nanoChat pipeline includes:
- Tokenizer training (a fast BPE tokenizer written in Rust)
- Pre-training on a large web-text corpus (FineWeb-EDU)
- Supervised fine-tuning (SFT) on conversational data
- Evaluation on standard benchmarks
- An inference engine with both a command-line chat and a web UI
This project is a masterclass in building complex systems from first principles, a topic we often explore in our AI for Beginners Guide. It demystifies the entire process and makes it possible to train ChatGPT for $100.
The magic of nanoChat lies in its efficiency. But to train ChatGPT for $100 in 4 hours, you need the right hardware.
The Golden Setup: Karpathy designed nanoChat to run on a single node with 8x NVIDIA H100 GPUs. This is a beast of a machine, and renting one is the most cost-effective approach.
| Cloud Provider | Instance Type | Est. Hourly Cost | 4-Hour Training Cost |
|---|---|---|---|
| Lambda Labs | 8x H100 SXM | ~$22-25 | ~$88-100 |
| Vast.ai | 8x H100 | ~$20-28 | ~$80-112 |
| RunPod | 8x H100 80GB | ~$20-24 | ~$80-96 |
Expert Insight: Lambda Labs is often the most reliable option for securing an 8x H100 node without long wait times. While Vast.ai can be cheaper, instance availability can be sporadic. Book your instance in advance if possible.
Can you do it with less? Yes, but the time and cost will change. An 8x A100 node might take 8-10 hours and cost slightly more overall. Trying to run this on a single consumer GPU (like an RTX 4090) is not feasible for the full training run due to memory constraints, but it’s great for experimenting with the code. For an overview of different AI hardware and tools, check out our Best AI Tools Guide.
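The table’s last column is simple arithmetic: total cost equals hourly rate times wall-clock hours, which is why a cheaper-per-hour but slower node can still end up more expensive overall. A quick sketch (the rates here are illustrative, not quotes):

```python
def total_cost(hourly_rate: float, hours: float) -> float:
    """Total rental cost for a training run."""
    return hourly_rate * hours

# Illustrative comparison: 8x H100 (~4h speedrun) vs. 8x A100 (~9h, assumed cheaper rate)
print(f"8x H100: ${total_cost(24.0, 4.0):.0f}")   # cheaper overall despite the higher rate
print(f"8x A100: ${total_cost(14.0, 9.0):.0f}")   # slower run ends up pricier
```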
Let’s get our hands dirty. Here is the step-by-step nanoChat tutorial to get you from a bare cloud instance to a working chatbot.
Step 1: Setting Up Your Environment (30 mins)
First, rent your 8x H100 instance from a provider like Lambda Labs. Once you’ve SSH’d into the machine:
```
git clone https://github.com/karpathy/nanochat.git
cd nanochat
```

nanoChat has very few dependencies. You’ll need Python, PyTorch (with CUDA support), and Rust (for the tokenizer).

```
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
pip install maturin
```

Step 2: The “Speedrun” (4 Hours)
Karpathy has created a single script, speedrun.sh, that automates the entire process. This is the heart of the LLM training guide.
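Before launching, it’s worth a quick pre-flight check that PyTorch can actually see all eight GPUs; a half-provisioned instance is much easier to catch now than four hours into training. A minimal sketch, assuming a standard CUDA-enabled PyTorch install:

```python
import torch

# Pre-flight check: the $100 speedrun assumes 8 visible GPUs.
assert torch.cuda.is_available(), "CUDA not available -- check your driver/PyTorch install"
n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")
```

With the environment verified, launch the run: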
```
./speedrun.sh
```
This single command will trigger the following sequence:
- Training the BPE tokenizer (the Rust component)
- Pre-training the base model on the FineWeb-EDU dataset
- Supervised fine-tuning (SFT) on conversational data
- Evaluating the model and generating a report.md file with performance metrics on benchmarks like MMLU, GSM8K, and HumanEval

Step 3: Chat with Your Model! (5 mins)
Once the speedrun is complete, your trained model is ready. You can interact with it in two ways:
- Terminal chat: run `python chat.py` for a simple command-line conversation.
- Web UI: run `python webui.py`, then create an SSH tunnel from your local machine (`ssh -L 8080:localhost:8080 your-instance-ip`) to access the ChatGPT-like interface in your browser.

This entire process, from setup to a functioning chatbot, truly brings the concepts from our ChatGPT Tutorial to life.
To truly master this LLM training guide, you need to understand the key files in the repository. Let’s break them down.
- `train.py`: The main training script. It handles both pre-training and fine-tuning, and it’s where you would adjust hyperparameters like `batch_size`, `learning_rate`, and `n_layer`.
- `model.py`: The definition of the Transformer model architecture. It’s a clean, minimal implementation of a GPT-style model, perfect for studying (see the simplified sketch after this list).
- `data.py`: All the data loading and batching. It’s responsible for feeding the model a continuous stream of tokens during training.
- `tokenizer.rs`: The Rust code for the BPE tokenizer. It’s incredibly fast and shows how to integrate high-performance Rust code into a Python project.
- `chat.py` & `webui.py`: The inference scripts. They show how to load the trained model and use it for text generation with a KV cache for efficiency.

Expert Tip: Spend a full day just reading model.py and train.py. Try to understand every line. This is a better education in modern LLMs than most online courses.
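To make that concrete, here is a heavily simplified sketch of the pattern a GPT-style `model.py` implements: one Transformer block combining causal self-attention and an MLP, each wrapped in a residual connection. It uses PyTorch’s stock `nn.MultiheadAttention` for brevity, so it is illustrative of the architecture, not nanoChat’s exact code:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One Transformer block: attention + MLP, each with a residual connection.
    A simplified sketch of the GPT-style pattern, not nanoChat's actual implementation."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # expand
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),  # project back
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each token sees only the past.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.ln2(x))    # residual around MLP
        return x
```

A real GPT model is essentially a token embedding, a stack of these blocks, and a final projection back to vocabulary logits.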
While the speedrun.sh script is robust, you might hit a few common issues. Here’s how to solve them.
| Error | Solution |
|---|---|
| `CUDA out of memory` | This shouldn’t happen on an 8x H100 node with the default settings. If it does, ensure no other processes are using GPU memory (see the snippet after this table). On a smaller setup, you must reduce `batch_size` in the training script. |
| `NCCL error` | A multi-GPU communication error. It’s often transient; try re-running the script. If it persists, it could indicate a hardware issue with the cloud instance. |
| Slow download speed | The dataset is large. If the download is slow, you can manually download it to a persistent storage volume and modify the script to point to the new path. |
| `Maturin build failed` | An issue with your Rust environment. Ensure you have the latest stable version of Rust installed and that `maturin` was installed correctly in your Python environment. |
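For the out-of-memory row above, a quick way to confirm the GPUs are actually free before re-running is PyTorch’s built-in memory query, sketched here:

```python
import torch

# Confirm nothing (e.g. a zombie process from a crashed run) is still holding GPU memory.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns (free_bytes, total_bytes)
    print(f"GPU {i}: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```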
This hands-on troubleshooting is a critical part of the LLM training guide and is essential for anyone looking to build a career in AI. It’s the practical side of the theory discussed in our AI Chatbot Development Tutorial.
The 4-hour, $100 run produces a model that is “almost coherent” and can hold a basic conversation. Karpathy outlines a few scaling tiers (figures per his announcement):
- ~$100 (~4 hours on 8x H100): the speedrun model, a kindergarten-level conversationalist that can answer simple questions.
- ~$300 (~12 hours): a GPT-2-grade model that slightly outperforms GPT-2 on the CORE benchmark.
- ~$1,000 (~41.6 hours): a noticeably more coherent model that begins to solve simple math and coding problems and scores better across the evaluation suite.
This ability to scale your efforts based on budget is what makes nanoChat such a powerful educational tool. It provides a direct, tangible link between compute resources and model performance, a key concept for any aspiring AI engineer. This is a practical application of the concepts we introduce in our AI for Beginners Guide.
The real power of this nanoChat tutorial is not just running the script, but modifying it. Here are three ways to make the project your own:
1. Train on Your Own Data:
The speedrun.sh script uses the FineWeb-EDU dataset. You can replace this with your own text data. Simply create a large text file, tokenize it using the provided tokenizer script, and update the data.py file to point to your new dataset. This is perfect for training a domain-specific chatbot.
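As a sketch of what that looks like, the snippet below encodes a custom text file into a flat binary of token ids, the format these training pipelines typically stream from. Note that the `Tokenizer` import path, its `load`/`encode` methods, and the output dtype are hypothetical placeholders; check the repo for the actual class names and file conventions:

```python
import numpy as np
from nanochat.tokenizer import Tokenizer  # hypothetical import path -- see the repo

# Hypothetical: load the trained BPE tokenizer and encode your own corpus.
tok = Tokenizer.load("tokenizer.model")
with open("my_corpus.txt", "r", encoding="utf-8") as f:
    ids = tok.encode(f.read())

# Write a flat array of token ids; match the dtype to the repo's convention.
np.array(ids, dtype=np.uint32).tofile("my_corpus.bin")
print(f"Wrote {len(ids):,} tokens to my_corpus.bin")
```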
2. Experiment with Model Architecture:
Open up model.py. You can easily change the core parameters of the Transformer:
- `n_layer`: The number of Transformer blocks (depth).
- `n_head`: The number of attention heads.
- `n_embd`: The embedding dimension (width).

Try making the model wider or deeper and observe the impact on performance and training time; the back-of-envelope sketch below shows how these knobs drive parameter count. This is a core part of the hands-on learning process.
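A rough way to reason about these knobs is a back-of-envelope parameter count: each block contributes roughly 12 × n_embd² weights (the attention projections plus the 4x-expanded MLP), and the embedding table adds vocab_size × n_embd. A sketch (the vocabulary size here is an assumption, not nanoChat’s exact figure):

```python
def approx_params(n_layer: int, n_embd: int, vocab_size: int = 65_536) -> int:
    """Back-of-envelope GPT parameter count: ~12 * n_embd^2 per block, plus embeddings."""
    per_block = 12 * n_embd ** 2       # attention (4 * n_embd^2) + MLP (8 * n_embd^2)
    embeddings = vocab_size * n_embd   # token embedding table
    return n_layer * per_block + embeddings

# Deep-and-narrow vs. shallow-and-wide at roughly comparable size:
print(f"{approx_params(n_layer=20, n_embd=1280):,}")  # deeper
print(f"{approx_params(n_layer=12, n_embd=1600):,}")  # wider
```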
3. Implement a New Feature:
Try adding a new feature to the model. A great starting project is to implement Rotary Position Embeddings (RoPE) instead of the default learned position embeddings. This will give you a deep, practical understanding of a key architectural innovation in modern LLMs.
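As a starting point, here is a minimal, self-contained sketch of the rotation RoPE applies. Inside the attention layer you would apply it to the query and key tensors (instead of adding learned position embeddings at the input). This is the common “rotate-half” formulation, written for illustration rather than taken from the repo:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10_000.0) -> torch.Tensor:
    """Rotary Position Embeddings for x of shape (batch, seq_len, n_head, head_dim).
    Pairs of channels are rotated by an angle that grows with token position."""
    B, T, H, D = x.shape
    half = D // 2
    # Per-channel-pair frequencies, highest for the first pair.
    freqs = base ** (-torch.arange(half, device=x.device, dtype=torch.float32) / half)
    angles = torch.arange(T, device=x.device, dtype=torch.float32)[:, None] * freqs  # (T, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Usage inside attention (illustrative): q, k = apply_rope(q), apply_rope(k)
```

Because the rotation angle depends only on position, relative offsets between tokens fall out of the dot product naturally, which is why RoPE generalizes better to longer contexts than learned absolute embeddings.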
Once you have a model you’re happy with, how do you use it in a real application?
1. Create a Simple API:
You can use a web framework like Flask or FastAPI to wrap the chat.py inference script in a simple API endpoint. This allows other applications to call your model.
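A minimal sketch of that idea is below. The `load_model` and `generate` imports are hypothetical placeholders for whatever loading and sampling functions your version of chat.py exposes; the FastAPI scaffolding around them is standard:

```python
from fastapi import FastAPI
from pydantic import BaseModel

from mymodel import load_model, generate  # hypothetical: your chat.py's load/sample functions

app = FastAPI()
model = load_model("model.pt")  # hypothetical: load trained weights once at startup

class ChatRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/chat")
def chat(req: ChatRequest):
    # Run one generation and return it as JSON.
    reply = generate(model, req.prompt, max_new_tokens=req.max_tokens)  # hypothetical signature
    return {"reply": reply}
```

Run it with `uvicorn app:app --host 0.0.0.0 --port 8000` and POST a JSON body like `{"prompt": "Hello"}` to `/chat`.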
2. Containerize with Docker:
To make your model portable, create a Dockerfile that includes the nanoChat code, your trained model weights, and all the necessary dependencies. This creates a self-contained image that can be deployed anywhere.
3. Deploy to a Cloud Service:
You can deploy your Docker container to a cloud service like Google Cloud Run or AWS Lambda for a serverless, scalable inference solution. This is a crucial final step in building a real-world AI chatbot, a process we detail in our AI Chatbot Development Tutorial.
Andrej Karpathy’s nanoChat is more than just a code repository; it is a complete, hands-on university course in a single folder. It takes the abstract, billion-dollar process of LLM training and makes it accessible, hackable, and affordable. For the first time, any developer with $100 and a weekend can go through the entire process of building and chatting with their own language model.
This is the ultimate LLM training guide. It’s the perfect weekend project to truly understand how models like ChatGPT work under the hood. If you’ve gone through our ChatGPT Tutorial or are familiar with the Best AI Tools, consider this your final exam. The tools are available, the path is clear, and the opportunity to learn is immense. What are you waiting for?
Frequently Asked Questions
Q: What exactly is nanoChat?
A: nanoChat is a complete, minimal, open-source project released in October 2025 that provides all the code and data needed to train your own ChatGPT-style language model from scratch for about $100.

Q: How is nanoChat different from nanoGPT?
A: nanoGPT only covered the pre-training stage. nanoChat is a full-stack solution, including tokenizer training, pre-training, supervised fine-tuning (SFT), evaluation, and an inference engine with a web UI.

Q: What is the speedrun.sh script and what does it do?
A: speedrun.sh is a single script that automates the entire pipeline: it trains the tokenizer, pre-trains the base model, runs supervised fine-tuning, evaluates the result, and generates a report.md with benchmark scores.

Q: What should I do if I run out of GPU memory?
A: Modify the train.py script and reduce the batch_size hyperparameter until it fits in your GPU’s memory.

Q: Which files should I study first?
A: model.py is crucial as it contains the Transformer architecture. For understanding the training process, train.py is the most important file.