Mixture of Experts (MoE)

Outline

Concepts and early works

  1. HydraNet
  2. Multi-gate Mixture-of-Experts
  3. Sparsely gated MoE layer (see the sketch below)
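
The sparsely gated MoE layer (item 3) is the routing idea the later works in this outline build on: a lightweight gating network scores every token against every expert, only the top-k experts are actually evaluated, and their outputs are combined with the renormalized gate weights. Below is a minimal PyTorch sketch under those assumptions; the names `SparseMoE`, `num_experts`, and `k` are illustrative, and the original paper's noisy gating and load-balancing loss are omitted.

```python
# Minimal sketch of a sparsely gated MoE layer (top-k routing).
# Assumptions: token-level routing, small feed-forward experts; the
# noisy gating and auxiliary losses of Shazeer et al. (2017) are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                                # (tokens, experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)             # renormalize over the top-k
        out = torch.zeros_like(x)
        # Each token passes through only its top-k experts (sparse activation).
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

moe = SparseMoE(d_model=16, d_hidden=32)
tokens = torch.randn(8, 16)
print(moe(tokens).shape)  # torch.Size([8, 16])
```

Because only k experts run per token, compute stays roughly constant as `num_experts` grows, which is the property the Transformer-scale works below exploit.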

Classic Transformer-based methods

  1. Switch Transformer (see the routing sketch after this list)
  2. GShard
  3. Vision MoE
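
Switch Transformer (item 1) simplifies the gate above to top-1 routing: each token goes to a single expert, and each expert accepts tokens only up to a fixed capacity, with overflow tokens bypassing the expert via the residual connection. A minimal sketch under the same token-level view as before; `switch_route` and the hard cutoff are illustrative, and the paper's capacity factor and auxiliary load-balancing loss are not reproduced here.

```python
# Minimal sketch of Switch-Transformer-style top-1 routing.
# The hard per-expert capacity cutoff below is an illustrative
# simplification of the paper's capacity-factor mechanism.
import torch
import torch.nn.functional as F

def switch_route(logits: torch.Tensor, capacity: int):
    """logits: (tokens, experts). Returns each token's expert id, its gate
    weight, and a keep-mask that drops tokens overflowing an expert's capacity."""
    probs = F.softmax(logits, dim=-1)
    gate, expert_id = probs.max(dim=-1)        # top-1: one expert per token
    keep = torch.zeros_like(gate, dtype=torch.bool)
    for e in range(logits.shape[-1]):
        pos = (expert_id == e).nonzero(as_tuple=True)[0]
        keep[pos[:capacity]] = True            # tokens past capacity are dropped
    return expert_id, gate, keep

logits = torch.randn(8, 4)
expert_id, gate, keep = switch_route(logits, capacity=3)
print(expert_id.tolist(), keep.tolist())
```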

Recent works

  1. LIMoE
  2. DSelect-k
  3. BASE
  4. Hash Layers (see the sketch after this list)
  5. Sparse MLP
  6. Swin-MoE
  7. Uni-Perceiver-MoE
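
Hash Layers (item 4) drop the learned gate entirely: each token's expert is a fixed hash of its vocabulary id, so routing has no parameters to train. A minimal sketch; the modulo hash and the `hash_route` name are illustrative choices, not the paper's exact hash functions.

```python
# Minimal sketch of Hash-Layer-style routing: a fixed, parameter-free
# mapping from token ids to experts (here an illustrative modulo hash).
import torch

def hash_route(token_ids: torch.Tensor, num_experts: int) -> torch.Tensor:
    return token_ids % num_experts  # same token id always hits the same expert

token_ids = torch.tensor([17, 4, 91, 4, 256])
print(hash_route(token_ids, num_experts=8).tolist())  # [1, 4, 3, 4, 0]
```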

Talk Video Replay

https://meeting.tencent.com/v2/cloud-record/share?id=36844c5c-30f5-474a-a62c-fa6f817e9a55&from=3

Slides

Link: https://pan.baidu.com/s/1WzvNxFre7mjpd0d1NhiThg?pwd=c9kh Extraction code: c9kh