Content provided by Adam Bien. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Adam Bien or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

TornadoVM: The Need for GPU Speed

59:41
 
An airhacks.fm conversation about:
starting with Java 8, first computer experiences with a Pentium 2, Doom 2 and Microsoft Paint, university introduction to object-oriented programming using Objects First and the BlueJ IDE, Monte Carlo simulations for financial portfolio optimization in Java, porting Java applications to OpenCL for GPU acceleration and achieving a 20x speedup, working at Huawei on GPU hardware, writing unit tests as an introduction to TornadoVM, working on FPGA integration and Graal compiler optimizations, experience at the OctoAI startup doing AI compiler optimizations for TensorFlow and PyTorch models, the evolution of model formats from ONNX to GGUF, standardization of LLM inference through Llama models, implementing GPU-accelerated Llama 3 inference in pure Java using TornadoVM, achieving a 3-6x speedup over CPU implementations, supporting multiple models including Mistral and working on Qwen 3 and DeepSeek, differences between models lying mainly in the normalization layers, GGUF becoming a quasi-standard for LLM model distribution, TornadoVM's Consume and Persist API for optimizing GPU data transfers, challenges with the OpenCL deprecation on macOS and plans for a Metal backend, importance of developer experience and avoiding Python dependencies for Java projects, runtime and compiler optimizations for GPU inference, kernel fusion techniques, upcoming integration with LangChain4j, potential of the Java ecosystem with GraalVM and Project Panama FFM for high-performance inference, advantages of Java's multi-threading capabilities for inference workloads

354 episodes
