GPT-Style Model from Scratch

This project is a complete implementation of a GPT-style decoder-only transformer, built from scratch in PyTorch. Rather than relying on high-level libraries, it implements every component by hand to give a deep understanding of how modern language models work.
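To give a flavor of what "from scratch" means here, the following is a minimal sketch of a causal (masked) multi-head self-attention layer in PyTorch. The class name, dimensions, and structure are illustrative assumptions, not the project's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (illustrative sketch)."""
    def __init__(self, d_model=128, n_heads=4, max_len=256):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        # Lower-triangular mask: each position may only attend to itself
        # and to earlier positions in the sequence.
        mask = torch.tril(torch.ones(max_len, max_len)).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the model dimension into (n_heads, head_dim).
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        att = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```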


For the complete source code, educational notebooks, and training scripts, see the project's GitHub repository.

The project includes a complete training pipeline: you can train the model with just a data.txt file containing raw text, and optionally fine-tune it using RLHF (Reinforcement Learning from Human Feedback) with prompt-response pairs and ratings.
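As a sketch of what such a pipeline can look like, assuming character-level tokenization of data.txt (the project's actual tokenizer and batching may differ): the raw text is mapped to integer token ids, and each training example pairs a window of tokens with the same window shifted one position, i.e. the next-token targets.

```python
import torch

# Hypothetical data preparation: character-level tokenization of data.txt.
text = open("data.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

def get_batch(data, block_size=128, batch_size=32):
    """Sample random windows: x is a chunk of tokens, y is the same chunk
    shifted one position to the left (the next-token targets)."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y
```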

Key components of the transformer architecture (built from scratch; see the code sketch after these lists):

- Token and positional embeddings
- Masked (causal) multi-head self-attention
- Position-wise feed-forward layers
- Residual connections and layer normalization
- A final linear language-modeling head over the vocabulary

What makes this project unique:

- Every layer is implemented by hand in PyTorch, without high-level modeling libraries
- Training runs from a single data.txt file of raw text
- Optional RLHF fine-tuning on prompt-response pairs with ratings
- Educational notebooks that accompany the clean, documented code
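The sketch below shows, under the same illustrative assumptions as the attention module above, how these components can be wired into a small GPT-style model: pre-norm decoder blocks stacked between the embeddings and a language-modeling head. Class names and hyperparameters are placeholders, not the project's.

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    """One decoder block: pre-norm attention and MLP, each with a residual connection."""
    def __init__(self, d_model, n_heads, max_len):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads, max_len)  # from the sketch above
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x

class MiniGPT(nn.Module):
    """Token + positional embeddings, a stack of decoder blocks, and an LM head."""
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=4, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.Sequential(
            *[GPTBlock(d_model, n_heads, max_len) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.lm_head(self.ln_f(x))  # logits over the vocabulary
```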

The training process is elegantly simple: given a sequence of words, predict the next one. Through millions of iterations, the model learns grammar, facts, context, and even writing style—all without explicit human labels. This is the same fundamental approach used by large language models like GPT-3 and GPT-4.
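Concretely, next-token prediction reduces to a cross-entropy loss between the model's logits and the input sequence shifted by one position. A minimal training loop, reusing the hypothetical MiniGPT and get_batch sketches above with placeholder hyperparameters:

```python
import torch
import torch.nn.functional as F

model = MiniGPT(vocab_size=len(chars))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(5000):
    x, y = get_batch(data)            # x: context tokens, y: next-token targets
    logits = model(x)                 # (batch, time, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```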

This project serves as a comprehensive educational resource for anyone wanting to understand transformers from the ground up, providing both theoretical knowledge through notebooks and practical implementation through clean, documented code.
