Source: https://www.databricks.com/blog/mpt-7b
MPT-7B is an open-source transformer. Our MPT model series is:
- Licensed for commercial use (unlike LLaMA).
- Trained on a large amount of data (1T tokens like LLaMA vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
- Prepared to handle extremely long inputs thanks to ALiBi (we trained on up to 65k inputs and can handle up to 84k vs. 2k-4k for other open source models); see the ALiBi sketch after this list.
- Optimized for fast training and inference (via FlashAttention and FasterTransformer); a fused-attention example follows below.
- Equipped with highly efficient open-source training code.
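To make the long-context point concrete, here is a minimal sketch of how ALiBi works: instead of learned positional embeddings, a fixed, head-specific linear penalty is added to the attention logits, which is what lets the model run on sequences longer than those it was trained on. This is an illustrative PyTorch implementation, not MPT's actual training code; names like `alibi_slopes` and `alibi_bias` are ours.

```python
import math
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """Head-specific slopes: a geometric series (e.g. 1/2, 1/4, ..., 1/256 for 8 heads)."""
    def slopes_power_of_2(n):
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * (start ** i) for i in range(n)]
    if math.log2(n_heads).is_integer():
        return torch.tensor(slopes_power_of_2(n_heads))
    # For head counts that are not powers of two, interleave slopes
    # from the nearest powers of two (as in the ALiBi paper).
    closest = 2 ** math.floor(math.log2(n_heads))
    return torch.tensor(
        slopes_power_of_2(closest)
        + slopes_power_of_2(2 * closest)[0::2][: n_heads - closest]
    )

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """(n_heads, seq_len, seq_len) additive bias: zero on the diagonal,
    increasingly negative the further a key lies behind the query."""
    distance = torch.arange(seq_len).view(1, -1) - torch.arange(seq_len).view(-1, 1)
    distance = distance.clamp(max=0)  # causal: only past positions get a penalty
    return alibi_slopes(n_heads).view(-1, 1, 1) * distance

# The bias is simply added to the attention logits before softmax:
# scores = q @ k.transpose(-1, -2) / math.sqrt(head_dim) + alibi_bias(n_heads, seq_len)
```

Because the penalty is a simple linear function of distance, there is no position-embedding table to resize, which is why a model trained on 65k-token inputs can still be run on longer sequences at inference time.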
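The speed point can be illustrated with a fused attention kernel. The snippet below uses PyTorch 2.x's `scaled_dot_product_attention`, which dispatches to a FlashAttention-style kernel on supported GPUs; MPT's own stack wires in FlashAttention and FasterTransformer directly, so treat this purely as a sketch of the technique, not as the MPT code path.

```python
import torch
import torch.nn.functional as F

# Shapes: (batch, heads, seq_len, head_dim), half precision on GPU so the
# dispatcher can pick a fused, FlashAttention-style kernel.
q = torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused, memory-efficient attention: avoids materializing the full
# (seq_len x seq_len) score matrix, which is what keeps long-context
# training and inference fast.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```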