Blog
Recent Posts

TurboQuant - Compressing KV Caches to 3 Bits

LoRA & Parameter-Efficient Fine-Tuning - Adapting Giants on a Budget

Mixture of Experts - Scaling Without the Compute Tax

Model Quantization - Squeezing Giants into Laptops

RLHF - Teaching Language Models to Follow Human Intent

Speculative Decoding - Making LLMs Think Faster

Flash Attention - Breaking the Memory Wall

KV Caching - Making Transformers Actually Fast

Attention Is All You Need - A Visual Story

Language Modeling & Recurrent Networks

Regularization & Stability - Training Networks That Generalize

Optimizers & Training - Making Neural Networks Learn Faster

Deep Learning from First Principles

Blog covers powered by GPT-4o

PG 101 - Building Postgres Extensions
