Microsoft released Phi-3, showing that a small model can rival much larger ones. By training on carefully curated, "textbook-quality" data, the 3.8B-parameter Phi-3-mini matched Llama 2 7B and approached Mixtral 8x7B on benchmarks. This demonstrated that data quality can compensate for model size, challenging the "bigger is always better" paradigm.
- 3.8B params matches Llama 2 7B
- Textbook-quality training data
- Small but highly capable
- Challenges scaling-law assumptions
- Mobile-friendly size
Tags: research-paper, phi, small-models, microsoft
Microsoft announced Bing Chat, integrating OpenAI's GPT-4 into Bing search. This marked the beginning of the AI-powered search wars and showcased GPT-4's capabilities before OpenAI officially announced the model, shaking up a search industry dominated by Google.
- First public use of GPT-4
- Integrated into Bing search
- Web browsing capabilities
- Real-time information access
- Shook up the search industry
Tags: microsoft, product-launch, gpt, llm, search
Microsoft Research Asia introduced Swin Transformer, a hierarchical vision transformer that computes self-attention within local windows and shifts those windows between consecutive layers so that information flows across window boundaries. It set new state-of-the-art results on COCO object detection and ADE20K semantic segmentation and performed strongly on ImageNet classification. Swin Transformer became a foundational computer vision backbone and won the ICCV 2021 Best Paper Award (Marr Prize).
- Hierarchical vision architecture
- Shifted window self-attention
- Computational complexity linear in image size
- Strong results on ImageNet classification
- ICCV 2021 Best Paper
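As a rough illustration of the shifted-window scheme, the Python sketch below partitions a small grid of token ids into non-overlapping M×M windows and applies a cyclic shift of M // 2 cells, so the next layer's windows mix tokens that previously sat in different windows. It covers only the token bookkeeping: the real model also computes attention inside each window and masks token pairs that the cyclic roll wraps together, and both steps are omitted here. The `attention_flops` helper encodes the cost estimates given in the Swin paper, 4hwC² + 2(hw)²C for global self-attention and 4hwC² + 2M²hwC for window attention, which is the source of the linear-in-image-size claim. All function names are hypothetical.

```python
# Sketch of Swin-style window bookkeeping on a grid of token ids.
# Assumption: plain nested lists stand in for feature maps; no attention is computed.

def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows, in row-major order.

    Assumes H and W are divisible by m.
    """
    h, w = len(grid), len(grid[0])
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([row[left:left + m] for row in grid[top:top + m]])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid contents up and left by `shift` cells, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def attention_flops(h, w, c, m=None):
    """Cost estimates for an h x w feature map with c channels.

    m=None: global multi-head self-attention, quadratic in h * w.
    m given: attention inside m x m windows, linear in h * w.
    """
    if m is None:
        return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Usage: a 4x4 map of token ids 0..15 with window size 2 (shift = 2 // 2 = 1).
grid = [[4 * i + j for j in range(4)] for i in range(4)]
print(window_partition(grid, 2)[0])     # [[0, 1], [4, 5]]
shifted = cyclic_shift(grid, 2 // 2)
print(window_partition(shifted, 2)[0])  # [[5, 6], [9, 10]]: one id from each original window
```

Doubling both spatial dimensions multiplies the windowed cost by exactly 4, while the global cost grows faster because of its (hw)² term.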
Tags: research-paper, vision, transformer, swin
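Microsoft Research introduced ResNet, a landmark architecture whose skip connections (identity shortcuts) made it practical to train networks with 152 layers or more. The shortcuts let signals and gradients pass directly through many layers, easing the degradation problem in which adding layers to a plain network increased its training error. ResNet won the ImageNet (ILSVRC) 2015 classification challenge with 3.57% top-5 error, well below the roughly 5.1% estimated for a human annotator.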
- 152 layers (vs. 8 in AlexNet)
- Skip connections (residual learning)
- Eased training of very deep networks
- Top-5 error below the ~5.1% human estimate
- Architecture widely adopted
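To make the skip-connection idea concrete, here is a minimal pure-Python sketch of a residual block, y = relu(F(x) + x), next to the same layers without the shortcut. It is a simplification under stated assumptions: F is a hypothetical two-layer fully connected transform rather than the convolutions with batch normalization used in the actual network, and the helper names are made up for illustration. With every weight at zero, a deep stack of residual blocks passes a non-negative input through unchanged while the plain stack collapses it to zeros, which mirrors the paper's argument that an identity mapping is easy to represent when each block only has to learn a residual.

```python
# Sketch of a residual block versus a plain block on Python lists.
# Assumption: F is a hypothetical two-layer fully connected transform;
# real ResNets use convolutions and batch normalization instead.

def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x); the shortcut adds x back."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


def plain_block(x, w1, w2):
    """The same two layers without the identity shortcut: y = relu(F(x))."""
    return relu(matvec(w2, relu(matvec(w1, x))))


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def stack(block, x, depth):
    """Apply `depth` blocks in sequence, with every weight set to zero."""
    n = len(x)
    for _ in range(depth):
        x = block(x, zeros(n), zeros(n))
    return x


# Usage: with zero weights, the residual stack keeps a non-negative input intact.
x = [1.0, 2.0, 3.0]
print(stack(residual_block, x, depth=50))  # [1.0, 2.0, 3.0]
print(stack(plain_block, x, depth=50))     # [0.0, 0.0, 0.0]
```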
Tags: microsoft, research-paper, vision, deep-learning, cnn