
Microsoft

A company-specific timeline showing the most important milestones for Microsoft.

  • 4 milestones
  • Range: 2015-12-10 to 2024-05-13

Major

Phi-3 Technical Report: A Highly Capable Language Model

Microsoft released Phi-3, showing that a small model (Phi-3-mini, 3.8B parameters) can rival much larger ones. Trained on heavily curated, "textbook-quality" data, Phi-3-mini matched Llama 2 7B and approached Mixtral 8x7B on benchmarks. This suggested that data quality can partly compensate for model size, challenging the "bigger is always better" paradigm.

  • 3.8B params matches Llama 2 7B
  • Textbook-quality training data
  • Small but highly capable
  • Challenges scaling-law assumptions
  • Mobile-friendly size
research-paper, phi, small-models, microsoft

Major

Microsoft Bing Chat with GPT-4

Microsoft announced Bing Chat, integrating OpenAI's GPT-4 into Bing search. The launch set off a round of competition over AI-powered search and put GPT-4 in front of the public before OpenAI officially announced the model, unsettling a search industry dominated by Google.

  • First public use of GPT-4
  • Integrated into Bing search
  • Web browsing capabilities
  • Real-time information access
  • Shook search industry
microsoft, product-launch, gpt, llm, search

Major

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

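Microsoft Research Asia introduced Swin Transformer, a hierarchical vision transformer that computes self-attention inside local windows and shifts those windows between consecutive layers. Restricting attention to windows keeps the cost linear in image size, while the shifts let information flow across window boundaries. It achieved state-of-the-art performance on image classification, object detection, and semantic segmentation, became a foundational architecture for computer vision, and won the ICCV 2021 best paper award (Marr Prize).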
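
The shifted-window idea can be illustrated with a small sketch. This is not the Swin implementation: the helper names `partition_windows` and `cyclic_shift`, the 4x4 grid of token indices, and the window size of 2 are all assumptions chosen for illustration, and the attention masking that the real model applies to wrapped-around windows is omitted.

```python
from typing import List

Grid = List[List[int]]


def partition_windows(grid: Grid, window: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping window x window blocks (flattened)."""
    height, width = len(grid), len(grid[0])
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` positions, wrapping around the edges."""
    height, width = len(grid), len(grid[0])
    return [[grid[(r + shift) % height][(c + shift) % width]
             for c in range(width)]
            for r in range(height)]


# Hypothetical 4x4 feature map whose entries are token indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Layer 1: attention would run independently inside each regular window.
regular = partition_windows(grid, window=2)

# Layer 2: shift by half a window, then partition again.
shifted = partition_windows(cyclic_shift(grid, shift=1), window=2)

print(regular[0])  # tokens grouped together in the first regular window
print(shifted[0])  # tokens grouped together in the first shifted window
```

In this toy setup the first shifted window groups tokens that sat in four different regular windows, which is how neighbouring windows exchange information from one layer to the next. Each token still attends to only `window * window` others, so total work grows linearly with the number of tokens.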

  • Hierarchical vision architecture
  • Shifted window self-attention
  • Computational complexity linear in image size
  • State-of-the-art on ImageNet
  • ICCV 2021 Best Paper
research-paper, vision, transformer, swin

Major

Deep Residual Learning (ResNet)

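Microsoft Research introduced ResNet, an architecture that uses skip connections to train very deep networks, up to 152 layers on ImageNet. Each block learns a residual that is added to an identity shortcut, which eased the optimization problems (accuracy degradation and vanishing gradients) that had limited network depth. A ResNet ensemble won the ImageNet (ILSVRC) 2015 classification task with 3.57% top-5 error, below the roughly 5% error commonly cited as human-level performance.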
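
A skip connection can be sketched in a few lines. The following is a toy illustration rather than the paper's architecture: it uses small dense layers on plain Python lists instead of convolutions, leaves out batch normalization and the ReLU that follows the addition, and the names `relu`, `linear`, and `residual_block` are assumptions made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute F(x) + x, where F(x) = W2 * relu(W1 * x)."""
    residual = linear(w2, relu(linear(w1, v)))
    # Skip connection: the input is added back to the learned residual.
    return [f + x for f, x in zip(residual, v)]


x = [1.0, -2.0, 3.0]

# With all-zero weights the residual branch contributes nothing,
# so the block passes its input through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
identity_out = residual_block(x, zeros, zeros)

# With identity weights the residual branch computes relu(x).
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
mixed_out = residual_block(x, eye, eye)

print(identity_out)
print(mixed_out)
```

The zero-weight case shows why depth becomes easier to add: a block that has learned nothing still behaves as an identity mapping, so stacking more blocks need not make the network worse than a shallower one.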

  • 152 layers (vs. 8 in AlexNet)
  • Skip connections (residual learning)
  • Eased training of very deep networks (degradation, vanishing gradients)
  • 3.57% top-5 error, below the commonly cited human-level estimate
  • Architecture widely adopted
microsoft, research-paper, vision, deep-learning, cnn
