Optimization Techniques for Powerful yet Tiny Machine Learning Models
Can machine learning models be both powerful and tiny? Join us in this episode of TinyML Talks, where we uncover groundbreaking techniques for making machine learning more efficient through high-level synthesis. We sit down with Russell Clayne, Technical Director at Siemens EDA, who guides us through the intricate process of pruning convolutional and deep neural networks. Discover how post-training quantization and quantization-aware training can trim down models without sacrificing performance, making them perfect for custom hardware accelerators like FPGAs and ASICs.
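To make the pruning and post-training quantization ideas concrete, here is a minimal NumPy sketch of magnitude pruning followed by symmetric int8 quantization. It illustrates the general techniques only; the values, threshold, and flow are assumptions for the example, not the specific tooling discussed in the episode.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(64, 64)).astype(np.float32)  # stand-in layer weights

# Magnitude pruning: zero out the weights with the smallest absolute values.
prune_fraction = 0.5
threshold = np.quantile(np.abs(weights), prune_fraction)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)
print(f"sparsity after pruning: {np.mean(pruned == 0):.2%}")

# Post-training quantization: map the remaining float weights to int8
# using a single symmetric, per-tensor scale factor.
scale = np.max(np.abs(pruned)) / 127.0
q_weights = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
dequantized = q_weights.astype(np.float32) * scale
print(f"mean quantization error: {np.mean(np.abs(dequantized - pruned)):.6f}")
```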
From there, we dive into a practical case study involving an MNIST-based network. Russell demonstrates how sensitivity analysis, network pruning, and quantization can significantly reduce neural network size while maintaining accuracy. Learn why fixed-point arithmetic outperforms floating-point in custom hardware, and how leading research from MIT, together with industry advancements, is revolutionizing automated network optimization and model compression. You'll see that these techniques are not just theoretical but are being applied in real-world designs to reduce silicon area and energy consumption.
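The fixed-point argument can be seen in a few lines: once weights and activations are quantized, the multiply-accumulate work is pure integer arithmetic, and only a final rescale involves the real-valued scale factors. Below is a hedged NumPy sketch with illustrative random data, not the MNIST network from the episode.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=256).astype(np.float32)   # stand-in activations
w = rng.normal(0, 0.3, size=256).astype(np.float32)   # stand-in weights

# Quantize both operands to 8-bit fixed point (symmetric, per-tensor scales).
sx = np.max(np.abs(x)) / 127.0
sw = np.max(np.abs(w)) / 127.0
qx = np.round(x / sx).astype(np.int32)
qw = np.round(w / sw).astype(np.int32)

# The dot product is integer multiply-accumulate; only the final rescale
# needs the scale factors, which is what makes fixed-point datapaths far
# cheaper than floating-point ones in custom hardware.
acc = int(np.dot(qx, qw))          # integer accumulator
approx = acc * sx * sw             # rescale back to a real-valued result
exact = float(np.dot(x, w))
print(f"float32: {exact:.4f}  int8 fixed-point: {approx:.4f}")
```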
Finally, explore the collaborative efforts between Siemens, Columbia University, and GlobalFoundries on a wake-word analysis project. Russell explains how transitioning to hardware accelerators via high-level synthesis (HLS) tools can yield substantial performance improvements and energy savings. Understand the practicalities of using Algorithmic C (AC) data types and Python-to-RTL tools to optimize ML workflows. Whether it's quantization-aware training, data movement optimization, or the finer details of using HLS libraries, this episode is packed with actionable insights for streamlining your machine learning models.
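Before committing bit widths to Algorithmic C types such as ac_fixed in the HLS C++, it is common to explore word lengths at the Python level. The sketch below emulates an ac_fixed-like signed fixed-point type to show how shrinking the word length affects a small dot product; the rounding and saturation semantics here are simplified assumptions for illustration, not the AC library's exact behavior.

```python
import numpy as np

def to_fixed(x, total_bits=8, int_bits=3):
    """Emulate a signed fixed-point value with total_bits bits, int_bits of
    which sit left of the binary point (sign included): round to the nearest
    representable step and saturate at the type's range. Assumed semantics,
    for word-length exploration only."""
    frac_bits = total_bits - int_bits
    step = 2.0 ** (-frac_bits)
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - step
    return np.clip(np.round(np.asarray(x) / step) * step, lo, hi)

# Explore how word length affects a small dot product before fixing
# a bit width in the HLS design.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 32)
w = rng.normal(0, 0.4, 32)
exact = float(np.dot(x, w))
for bits in (16, 10, 8, 6):
    approx = float(np.dot(to_fixed(x, bits, 3), to_fixed(w, bits, 3)))
    print(f"{bits:2d}-bit fixed point: {approx:+.4f} (float: {exact:+.4f})")
```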
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Chapters
1. TinyML Talks (00:00:00)
2. Network Pruning and Quantization (00:10:51)
3. Optimizing Quantized Neural Networks (00:21:51)
4. High-Level Synthesis for ML Acceleration (00:37:27)
5. Hardware Design and Optimization Techniques (00:47:06)