DIY LLMs are moving from hobbyist side projects into real research conversation. The spark: hands-on tutorials from Andrej Karpathy that show how to build a GPT-like model from scratch. And the news that Discrete Distribution Networks (DDN) earned an ICLR 2025 slot signals a shift from toy models to serious architectures. [1][2]
DIY LLM Tutorials
- Andrej Karpathy's video "Let's Build GPT: from scratch, in code, spelled out" lays out the practical path [1].
- NanoGPT, Karpathy's minimal GPT training repository, is accessible and easy to run [1].
- His other videos range from approachable explanations to deep technical dives into what LLMs really do [1].
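The "from scratch" path in Karpathy's lecture starts with a character-level bigram model before attention is introduced. That starting point can be sketched in plain Python; this is a toy count-based version for illustration, not the PyTorch code from the video:

```python
from collections import defaultdict
import random

def train_bigram(text):
    """Count how often each character follows another."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample characters one at a time from the learned bigram counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # dead end: no observed successor
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "hello world, hello llm world"
model = train_bigram(corpus)
print(generate(model, "h", 10))
```

The lecture then replaces the count table with a learned embedding and, step by step, adds self-attention to reach a GPT-like model; the sampling loop stays conceptually the same.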
DDN: A New Architecture on the Rise
- Discrete Distribution Networks (DDN) is a novel generative model built on simple, elegant principles; the paper has been accepted to ICLR 2025 [2].
- It generates multiple outputs in a single forward pass, and together these outputs form a discrete distribution. It also features Zero-Shot Conditional Generation, a one-dimensional discrete latent representation organized as a tree, and full end-to-end differentiability [2].
- The approach can be combined with GPT-style systems and even explored as DDN LLMs, including ideas like minimizing tokenizers and using speculative sampling [2].
- In comparisons, DDN sits alongside diffusion, GANs, VAEs, and autoregressive models, offering a distinctive, hierarchical take [2].
- ICLR reviewers called the method novel and elegant, hinting at new directions for generative modeling in LLMs [2].
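The tree-structured latent described above can be illustrated with a toy sketch: at each level the model proposes K candidate outputs, one is chosen (closest to a guidance target for zero-shot conditional generation, or at random for unconditional sampling), and the sequence of choices forms the 1-D discrete latent code. This is an assumption-laden illustration; `propose` is a hypothetical stand-in for the network, not the paper's architecture:

```python
import random

K, L = 4, 6                 # branching factor and number of levels (toy values)
rng = random.Random(42)

def propose(current, level):
    """Hypothetical stand-in for the network: K candidate refinements
    of the current output, finer-grained at deeper levels."""
    step = 1.0 / (2 ** level)
    return [current + rng.uniform(-step, step) for _ in range(K)]

def sample(target=None):
    """Walk the K-ary tree of candidates. With a `target`, greedily pick
    the closest candidate at each level (a toy analogue of zero-shot
    guided generation); without one, pick uniformly at random."""
    x, latent = 0.0, []
    for level in range(L):
        cands = propose(x, level)
        if target is None:
            idx = rng.randrange(K)
        else:
            idx = min(range(K), key=lambda i: abs(cands[i] - target))
        x = cands[idx]
        latent.append(idx)
    return x, latent

x, code = sample(target=0.7)
print(x, code)  # `code` is a length-L sequence of choices, each in [0, K)
```

The point of the sketch is the data structure: one forward pass per level yields K discrete options, so a full sample is just a path through a K-ary tree, and that path is the compact discrete latent the thread discusses.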
Closing thought: the DIY LLM era is maturing, from hands-on builds to architecturally innovative paths that could reshape how we think about chat, code, and compression.
References
[1] Ask HN: Build Your Own LLM? A thread asking for tutorials on building toy LLMs from scratch; cites Karpathy's videos, NanoGPT, and related resources for learning the concepts faster.
[2] Show HN: I invented a new generative model and got accepted to ICLR. A thread discussing Discrete Distribution Networks (DDN): the ICLR acceptance, integration with GPT/LLMs, zero-shot generation, and comparisons to diffusion, GANs, and VQ-VAE.