Source: https://www.databricks.com/blog/mpt-7b

Open source transformer

Our MPT model series is:

  • Licensed for commercial use (unlike LLaMA).
  • Trained on a large amount of data (1T tokens like LLaMA vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
  • Prepared to handle extremely long inputs thanks to ALiBi (we trained on inputs of up to 65k tokens and can handle inputs of up to 84k tokens, vs. 2k-4k for other open source models); see the ALiBi sketch after this list.
  • Optimized for fast training and inference (via FlashAttention and FasterTransformer); a loading sketch follows below.
  • Equipped with highly efficient open-source training code.
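For context on the long-input point, here is a minimal sketch of how an ALiBi bias can be built. This is a generic PyTorch illustration of the technique, not the exact kernel used in MPT's training stack: each attention head gets a slope, and the bias penalizes attention scores linearly with query-key distance, so the model extrapolates to sequence lengths beyond those seen in training without learned position embeddings.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build the ALiBi additive attention bias: a per-head linear penalty
    proportional to the distance between query and key positions."""
    # Head-specific slopes form a geometric sequence 2^(-8/h), 2^(-16/h), ...
    # (the construction from the ALiBi paper when num_heads is a power of 2).
    slopes = torch.tensor(
        [2 ** (-8 * (i + 1) / num_heads) for i in range(num_heads)]
    )
    # Relative distance |j - i| between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).abs()   # (seq_len, seq_len)
    # Zero on the diagonal, increasingly negative with distance, one slope per head.
    return slopes[:, None, None] * -distance          # (num_heads, seq_len, seq_len)

# The bias is simply added to the raw attention scores before the softmax:
# scores = q @ k.transpose(-1, -2) / head_dim ** 0.5 + alibi_bias(num_heads, seq_len)
```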
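And here is a minimal loading sketch following the pattern documented on the MPT-7B Hugging Face model card. The `attn_config` / `attn_impl` fields and the `max_seq_len` override are part of MosaicML's custom MPT modeling code, and the exact values shown are illustrative assumptions rather than the only supported configuration:

```python
import torch
import transformers

name = "mosaicml/mpt-7b"

# MPT ships as custom modeling code on the Hub, so trust_remote_code is required.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)

# Switch the attention implementation to the Triton FlashAttention kernel;
# the default 'torch' implementation is the portable fallback.
config.attn_config["attn_impl"] = "triton"

# Thanks to ALiBi, the maximum sequence length can be raised beyond the
# 2048 tokens used in pretraining (e.g. 4096 here).
config.update({"max_seq_len": 4096})

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```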