
Content provided by Software Engineering. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Software Engineering or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

🤖 DeepSeek-V3: A 671B Parameter Mixture-of-Experts Language Model

30:29
 

An overview of the document describing DeepSeek-V3, a 671B-parameter Mixture-of-Experts language model. The document highlights the model's architecture, including its load-balancing and multi-token prediction strategies, and its efficient training process using FP8 precision. Benchmark results show DeepSeek-V3 performing strongly against other open-source models and some closed-source models, particularly on math and code tasks. The document also gives instructions for running DeepSeek-V3 locally with various frameworks and hardware, including NVIDIA GPUs, AMD GPUs, and Huawei Ascend NPUs. Finally, it includes licensing and contact information.


360 episodes

