Mistral AI | AGI Progress Tracker

Major

Mixtral 8x22B Released

2024-04-17

Mistral AI released Mixtral 8x22B, its largest and most capable open-source sparse Mixture of Experts (MoE) model to date. With 141B total parameters, of which 39B are active per token, it outperformed GPT-3.5 Turbo, Grok-1, and Llama 2 70B on most benchmarks. The model performed strongly on multilingual, math, and coding tasks while keeping inference efficient through sparse activation.

141B parameters, 39B active per token
Outperformed GPT-3.5 Turbo
Strong multilingual capabilities
Efficient MoE inference
Apache 2.0 license
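The gap between 141B total and 39B active parameters comes from the MoE design: attention, router, norms, and embeddings are shared, while each layer holds 8 feed-forward experts and only 2 run per token. The sketch below reproduces both figures under stated assumptions. The hyperparameters are taken from the publicly released Hugging Face config for Mixtral-8x22B-v0.1, the experts are assumed to be bias-free SwiGLU FFNs (three weight matrices each), and input embeddings and output head are assumed untied. `MoEConfig` and `param_counts` are hypothetical helper names, not part of any Mistral library.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hyperparameters assumed from the public Hugging Face config (illustrative)."""
    dim: int          # hidden size
    ffn_dim: int      # intermediate size of each expert FFN
    n_layers: int
    n_heads: int      # query heads
    n_kv_heads: int   # key/value heads (grouped-query attention)
    vocab: int
    n_experts: int = 8
    top_k: int = 2    # experts activated per token


def param_counts(c: MoEConfig) -> tuple[int, int]:
    """Return (total, active-per-token) parameter counts for a Mixtral-style MoE."""
    head_dim = c.dim // c.n_heads
    # Shared attention: Q and O are dim x dim; K and V are smaller under GQA.
    attn = 2 * c.dim * c.dim + 2 * c.dim * c.n_kv_heads * head_dim
    # Each expert is assumed to be a SwiGLU FFN with gate, up, and down matrices.
    expert = 3 * c.dim * c.ffn_dim
    router = c.dim * c.n_experts          # linear gate producing one logit per expert
    norms = 2 * c.dim                     # two RMSNorm weight vectors per layer
    shared = c.n_layers * (attn + router + norms) + c.dim  # plus the final norm
    embeddings = 2 * c.vocab * c.dim      # untied input embedding and output head
    total = shared + embeddings + c.n_layers * c.n_experts * expert
    active = shared + embeddings + c.n_layers * c.top_k * expert
    return total, active


mixtral_8x22b = MoEConfig(dim=6144, ffn_dim=16384, n_layers=56,
                          n_heads=48, n_kv_heads=8, vocab=32000)
total, active = param_counts(mixtral_8x22b)
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
# prints: total ≈ 140.6B, active ≈ 39.2B
```

Only the expert term scales with `n_experts` versus `top_k`, which is why per-token compute tracks the 39B active figure rather than the 141B total. Changing the assumed vocabulary size (for example to 32,768) moves the totals only marginally.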

mistral, model-release, open-source, mixture-of-experts, mixtral

Sources

Mixtral 8x22B Release

Major

Mixtral 8x7B Released

2023-12-11

Mistral AI released Mixtral 8x7B, a sparse Mixture of Experts (MoE) model that outperformed Llama 2 70B on most benchmarks and matched or exceeded GPT-3.5 on most standard benchmarks. Because only 12.9B of its 46.7B total parameters are active per token, it was faster and cheaper to run than dense models of comparable quality.

Sparse Mixture of Experts (MoE) architecture
Outperformed GPT-3.5 on most benchmarks
Only 12.9B active parameters per token
Apache 2.0 open-source license
Efficient inference with high performance
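The efficiency claim rests on sparse routing: for each token, a learned gate scores all experts, only the top-scoring ones are evaluated, and their outputs are mixed with softmax weights. The Mixtral paper describes top-2 routing over 8 experts with a softmax taken over the selected logits only; the sketch below implements that scheme in plain Python under simplifying assumptions. The experts here are toy functions rather than SwiGLU FFNs, a single vector stands in for one token, and `moe_layer` and `make_expert` are hypothetical names for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_layer(x: Vector, gate_weights: List[Vector],
              experts: List[Callable[[Vector], Vector]], top_k: int = 2) -> Vector:
    """Route one token to its top_k experts and mix their outputs."""
    # Router logits: one dot product per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the mixing weights sum to 1.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Usage: 8 toy experts that record when they run and scale the input by their index.
calls: List[int] = []


def make_expert(i: int) -> Callable[[Vector], Vector]:
    def expert(v: Vector) -> Vector:
        calls.append(i)
        return [float(i) * vi for vi in v]
    return expert


experts = [make_expert(i) for i in range(8)]
gate = [[float(i), 0.0] for i in range(8)]  # logit for expert i is i * x[0]
y = moe_layer([1.0, 2.0], gate, experts)
print(sorted(calls))  # prints: [6, 7]
```

Only two of the eight expert functions execute, which mirrors why per-token compute scales with active rather than total parameters; all experts must still be held in memory, since any token may select any of them.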

mistral, model-release, open-source, mixture-of-experts, mixtral

Sources

Mixtral 8x7B Release Announcement

Major

Mistral 7B: A 7-billion-parameter language model

2023-09-27

Mistral AI released Mistral 7B, an open-source model that outperformed Llama 2 13B on all evaluated benchmarks and outperformed Llama 1 34B on reasoning, math, and code generation. It used grouped-query attention (GQA) and sliding window attention (SWA) for faster inference. The model became the foundation for many open-source fine-tunes and demonstrated that smaller, well-designed models can compete with larger ones.

7B parameters, outperforms Llama 2 13B
Grouped-query attention (GQA)
Sliding window attention (SWA)
Apache 2.0 open-source license
Foundation for many fine-tunes
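Both attention techniques reduce inference cost in simple ways. With SWA, each token attends only to itself and the previous `window - 1` positions instead of the whole prefix; with GQA, several query heads share one key/value head, shrinking the KV cache. The sketch below shows both in plain Python. The head counts (32 query heads, 8 KV heads) follow the Mistral 7B paper's reported configuration, while the tiny window of 3 is chosen only to make the mask readable (the paper reports a window of 4096). `sliding_window_mask` and `kv_head_for_query_head` are hypothetical helper names.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    # Causal (j <= i) and limited to the most recent `window` positions (i - j < window).
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_heads // n_kv_heads
    return q_head // group_size


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx

# With 32 query heads and 8 KV heads, each KV head serves a group of 4 query heads,
# so the KV cache is 4x smaller than with one KV head per query head.
print([kv_head_for_query_head(h, n_heads=32, n_kv_heads=8) for h in range(8)])
# prints: [0, 0, 0, 0, 1, 1, 1, 1]
```

Although each layer looks back at most `window` positions, stacking layers lets information propagate further: after k layers a token can, in principle, be influenced by tokens up to roughly k × window positions earlier.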

research-paper, mistral, open-source, sliding-window

Sources

Mistral 7B Paper