Foundations
LLM Fundamentals
Large language models (LLMs) generate text through next-token prediction, using transformer architectures with self-attention mechanisms to process and produce sequences of language. Understanding how scale, training data, and architecture choices affect model capabilities is essential for building effective agents, because these fundamentals explain why models can follow instructions, use tools, and reason through complex problems. The relationship between pretraining data, fine-tuning, and reinforcement learning from human feedback (RLHF) determines a model's behavior and limitations, which directly shapes how agents perform in production.
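The next-token prediction loop described above can be sketched in a few lines. This is a toy illustration, not a real transformer: the lookup table standing in for the model, the tiny vocabulary, and the `generate` helper are all invented for this example. A real LLM would compute the logits with a neural network over the full context, but the decoding loop itself looks much the same.

```python
import math
import random

# Toy "language model": maps a context (here, just the last token) to
# logits over a tiny vocabulary. A real LLM computes these logits with
# a transformer over the whole context; this table is purely illustrative.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]
LOGITS = {
    "the": [0.1, 2.0, 0.1, 0.1, 1.5, 0.1],  # after "the": likely "cat"
    "cat": [0.1, 0.1, 2.5, 0.1, 0.1, 0.2],  # after "cat": likely "sat"
    "sat": [0.1, 0.1, 0.1, 2.5, 0.1, 0.2],
    "on":  [0.5, 0.1, 0.1, 0.1, 2.0, 0.1],
    "mat": [0.1, 0.1, 0.1, 0.1, 0.1, 3.0],  # after "mat": likely end of text
}

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_tokens=10, greedy=True):
    """Autoregressive decoding: predict one token, append it, repeat."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        context = tokens[-1]  # this toy model conditions only on the last token
        probs = softmax(LOGITS[context])
        if greedy:
            next_id = probs.index(max(probs))  # argmax (greedy) decoding
        else:
            next_id = random.choices(range(len(VOCAB)), weights=probs)[0]
        token = VOCAB[next_id]
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(generate(["the"]))  # → ['the', 'cat', 'sat', 'on', 'mat']
```

Switching `greedy=False` samples from the distribution instead of always taking the most likely token, which is essentially what temperature and sampling settings control in production LLM APIs.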
Resources

- Attention Is All You Need (arxiv.org) — The original transformer paper from Google that introduced the architecture powering all modern LLMs
- What Are Large Language Models? (developers.google.com) — Google's accessible introduction to LLM concepts and terminology
- Building LLMs from Scratch (manning.com) — Sebastian Raschka's hands-on book for understanding LLM internals by building one
- The Illustrated Transformer (jalammar.github.io) — Jay Alammar's visual walkthrough of how transformers process information
- Andrej Karpathy - Intro to Large Language Models (youtube.com) — One-hour overview of LLM fundamentals from one of the field's best educators