Source: https://www.databricks.com/blog/mpt-7b

Open source transformer

Our MPT model series is:

  • Licensed for commercial use (unlike LLaMA).
  • Trained on a large amount of data (1T tokens like LLaMA vs. 300B for Pythia, 300B for OpenLLaMA, and 800B for StableLM).
  • Prepared to handle extremely long inputs thanks to ALiBi (we trained on inputs of up to 65k tokens and can handle inputs of up to 84k tokens, vs. 2k-4k for other open source models); see the ALiBi sketch after this list.
  • Optimized for fast training and inference (via FlashAttention and FasterTransformer); a loading sketch follows below.
  • Equipped with highly efficient open-source training code.
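For context on the long-input point, here is a minimal sketch of how an ALiBi bias can be built. This is a generic PyTorch illustration of the technique, not the exact kernel used in MPT's training stack: each attention head gets a slope, and the bias penalizes attention scores linearly with query-key distance, so the model extrapolates to sequence lengths beyond those seen in training without learned position embeddings.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build the ALiBi additive attention bias: a per-head linear penalty
    proportional to the distance between query and key positions."""
    # Head-specific slopes form a geometric sequence 2^(-8/h), 2^(-16/h), ...
    # (the construction from the ALiBi paper when num_heads is a power of 2).
    slopes = torch.tensor(
        [2 ** (-8 * (i + 1) / num_heads) for i in range(num_heads)]
    )
    # Relative distance |j - i| between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).abs()   # (seq_len, seq_len)
    # Zero on the diagonal, increasingly negative with distance, one slope per head.
    return slopes[:, None, None] * -distance          # (num_heads, seq_len, seq_len)

# The bias is simply added to the raw attention scores before the softmax:
# scores = q @ k.transpose(-1, -2) / head_dim ** 0.5 + alibi_bias(num_heads, seq_len)
```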
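And here is a minimal loading sketch following the pattern documented on the MPT-7B Hugging Face model card. The `attn_config` / `attn_impl` fields and the `max_seq_len` override are part of MosaicML's custom MPT modeling code, and the exact values shown are illustrative assumptions rather than the only supported configuration:

```python
import torch
import transformers

name = "mosaicml/mpt-7b"

# MPT ships as custom modeling code on the Hub, so trust_remote_code is required.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)

# Switch the attention implementation to the Triton FlashAttention kernel;
# the default 'torch' implementation is the portable fallback.
config.attn_config["attn_impl"] = "triton"

# Thanks to ALiBi, the maximum sequence length can be raised beyond the
# 2048 tokens used in pretraining (e.g. 4096 here).
config.update({"max_seq_len": 4096})

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```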