Mistral AI released Mixtral 8x22B, its largest and most capable open-source Mixture of Experts (MoE) model to date. With 141B total parameters, of which 39B are active per token, it outperformed GPT-3.5 Turbo, Grok-1, and Llama 2 70B across most benchmarks. The model performed strongly on multilingual tasks, math, and coding while keeping inference efficient through sparse activation.
- 141B parameters, 39B active per token
- Outperformed GPT-3.5 Turbo
- Strong multilingual capabilities
- Efficient MoE inference
- Apache 2.0 license
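To make "sparse activation" concrete, the short sketch below computes the share of weights touched per token from the parameter counts quoted in this timeline. The helper name `active_fraction` is illustrative, not part of any Mistral tooling.

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Fraction of parameters used per token (both arguments in billions)."""
    return active_b / total_b


# Parameter counts as quoted in the release summaries (billions).
models = {
    "Mixtral 8x22B": (39.0, 141.0),
    "Mixtral 8x7B": (12.9, 46.7),
}

for name, (active, total) in models.items():
    share = active_fraction(active, total)
    print(f"{name}: {share:.1%} of weights active per token")
```

Both models activate a bit over a quarter of their weights per token, which is why per-token compute is closer to that of a much smaller dense model.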
mistral, model-release, open-source, mixture-of-experts, mixtral
Mistral AI released Mixtral 8x7B, a sparse Mixture of Experts (MoE) model that matched or outperformed GPT-3.5 Turbo and Llama 2 70B on most benchmarks. Because only 12.9B of its 46.7B total parameters are active per token, it was faster and cheaper to run than a dense model of the same total size.
- Sparse Mixture of Experts (MoE) architecture
- Outperformed GPT-3.5 on most benchmarks
- Only 12.9B active parameters per token
- Apache 2.0 open-source license
- Efficient inference with high performance
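The routing idea behind a sparse MoE layer can be sketched in a few lines. This is a toy, assuming top-2 routing over eight experts with a softmax over the selected router scores, as Mistral describes for Mixtral; the scalar "experts" and the names `top_k_route` and `moe_forward` are hypothetical stand-ins for real feed-forward blocks.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Weights are normalised over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: float,
                router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]],
                k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

print(top_k_route(logits))
print(moe_forward(1.0, logits, experts))
```

Only two of the eight expert functions are called for this token, which is the source of the gap between total and active parameters.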
mistral, model-release, open-source, mixture-of-experts, mixtral
Mistral AI released Mistral 7B, an open-source model that outperformed Llama 2 13B and approached Llama 1 34B performance. It used grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle longer sequences at lower cost. The model became the foundation for many open-source fine-tunes and demonstrated that smaller, well-designed models can compete with larger ones.
- 7B parameters, outperformed Llama 2 13B
- Grouped-query attention (GQA)
- Sliding window attention (SWA)
- Apache 2.0 open-source license
- Foundation for many fine-tunes
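The two attention techniques named above can be illustrated with a minimal sketch. The sizes here (6 tokens, window of 3, 8 query heads sharing 2 key/value heads) are toy values chosen for readability, not Mistral 7B's actual configuration, and the function names are hypothetical.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Attention is causal (j <= i) and limited to the most recent `window` tokens.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """In GQA, consecutive query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Sliding window attention: each row shows which keys a query can see.
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# Grouped-query attention: map each of 8 query heads to one of 2 KV heads.
print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The window caps how many past tokens each position attends to, and sharing key/value heads shrinks the KV cache, which is where the inference savings come from.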
research-paper, mistral, open-source, sliding-window