nanochat - Inward App

nanochat

Build your won ChatGPT for $100 on a single GPU

Website github.com

What it is

nanochat is a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase. Run tokenization, pretraining, finetuning, evaluation, inference, and web UI on a single 8XH100 node.

Intent

I need it when

Build and deploy a conversational AI model trained on custom data

After training, nanochat provides a built-in ChatGPT-like web UI (chat_web script) for interactive inference. Users can train a model, then immediately chat with it via a familiar interface without additional deployment infrastructure.

Understand and modify LLM training code with minimal abstraction layers

nanochat emphasizes minimal, vanilla PyTorch code designed to be hackable and understandable. The codebase is compact and covers all major LLM stages, making it ideal for researchers and developers who want to learn or experiment with training mechanics without navigating complex frameworks.

Experiment with LLM training stages including tokenization, pretraining, finetuning, and inference

nanochat covers the complete LLM pipeline in one codebase: tokenization, pretraining, finetuning, evaluation, inference, and includes a ChatGPT-like web UI for interactive testing. Researchers can quickly iterate on model changes using small-scale runs (e.g., 12-layer GPT-1 in ~5 minutes).

Train a custom language model on a budget with minimal code complexity

nanochat provides a minimal, hackable harness that trains GPT-2-capability models for ~$48 on an 8XH100 node (~2 hours), down from $43,000 in 2019. Single-dial complexity control via --depth parameter automatically optimizes all hyperparameters, eliminating manual tuning.

Benchmark and optimize LLM training speed and efficiency on modern hardware

nanochat maintains a 'GPT-2 speedrun' leaderboard tracking wall-clock training time and DCLM CORE scores. The framework is designed for single-GPU-node efficiency with explicit precision control (bfloat16, float32, float16) and detailed VRAM/throughput monitoring via wandb integration.

Drop

Not a fit when

User lacks access to GPU hardware (requires single GPU node minimum, preferably 8XH100 or 8XA100 for optimal performance)
User needs a production-ready, fully-supported commercial LLM training platform with enterprise SLAs and dedicated support
User wants to train very large models beyond single-node capacity without significant code modification and distributed training expertise
User requires a graphical user interface for model training and configuration without command-line proficiency
User needs pre-trained models ready for immediate deployment rather than training from scratch

Commercials

Pricing

Free and open source