Content provided by Nicolay Gerold. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ppacc.player.fm/legal.

#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster

1:06:58
Manage episode 485223080 series 3585930

Nicolay here,

Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production.

Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant.

His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.

The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that - copy-pasting from tutorials without understanding the why.

Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of."

Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.

In the podcast, we also cover:

  • Why fine-tuning is almost always the wrong choice
  • The "just-in-time" learning approach for staying sane in AI
  • Building writing assistants that actually preserve your voice
  • Why robots, not chatbots, are the real endgame

💡 Core Concepts

  • Agentic Patterns: These patterns seem complex but are actually straightforward to implement once you understand the core loop.
    • ReAct: Agents that Reason, Act, and Observe in a loop
    • Reflection: Agents that review and improve their own outputs
  • Fine-tuning vs Base Model + Prompting: Fine-tuning involves taking a pre-trained model and training it further on your specific data. The alternative is using base models with careful prompting and context engineering. Paul's take: "Fine-tuning adds so much complexity... if you add fine-tuning to create a new feature, it's just from one day to one week."
  • RAG: A technique where you retrieve relevant documents/information and include them in the LLM's context to generate better responses. Paul's approach: "In the beginning I also want to avoid RAG and just introduce a more guided research approach. Like I say, hey, these are the resources that I want to use in this article."
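The ReAct pattern really is just a loop, as described above. Here is a minimal sketch; `fake_model` and the `count_words` tool are hypothetical stand-ins for an LLM call and a real tool, not anything discussed in the episode:

```python
# Minimal sketch of the ReAct loop: Reason -> Act -> Observe, repeated.
# fake_model is a hypothetical stand-in for an LLM API call.

def fake_model(history: str) -> str:
    """Stand-in LLM: decides the next step from the transcript so far."""
    if "Observation:" not in history:
        return "Thought: I need the word count.\nAction: count_words[hello agentic world]"
    return "Thought: I have what I need.\nFinal Answer: 3"

TOOLS = {"count_words": lambda text: str(len(text.split()))}

def react_loop(question: str, max_steps: int = 5):
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(history)          # Reason
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        # Act: parse the Action line and run the named tool.
        action = step.split("Action:")[1].strip()
        name, arg = action.split("[", 1)
        observation = TOOLS[name.strip()](arg.rstrip("]"))
        # Observe: feed the tool result back into the transcript.
        history += f"\nObservation: {observation}"
    return None

print(react_loop("How many words in 'hello agentic world'?"))  # -> 3
```

Swap `fake_model` for a real model call and `TOOLS` for real tools, and this is the entire core loop most agent frameworks wrap.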
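The RAG idea above also fits in a few lines. This sketch uses naive word-overlap scoring and made-up documents purely for illustration; a production system would use embeddings and a vector store:

```python
# Minimal RAG sketch: retrieve the top-k relevant snippets, then
# include them in the prompt. Scoring is naive word overlap.

DOCS = [
    "Temporal provides durable execution for long-running workflows.",
    "LangGraph is often used for quick agent prototyping.",
    "UV is a fast Python package manager.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved snippets into the LLM's context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is UV in Python?", DOCS))
```

Paul's "guided research" alternative skips the `retrieve` step entirely: you hand-pick the resources and build the context yourself.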

📶 Connect with Paul:

📶 Connect with Nicolay:

⏱️ Important Moments

  • From CUDA to LLMs: [02:20] Paul's journey from writing CUDA kernels and 3D object detection to modern AI applications.
  • AI Content Is Natural Evolution: [11:19] Why AI writing tools are like the internet transition for artists - tools change, creativity remains.
  • The Framework Trap: [36:41] "I see them as no code or low code tools... not good frameworks to build on top of."
  • Fine-Tuning Complexity Bomb: [27:41] How fine-tuning turns 1-day features into 1-week experiments.
  • End-to-End First: [22:44] "I don't focus on accuracy, performance, or latency initially. I just want an end-to-end process that works."
  • The Orchestration Solution: [40:04] Why Temporal, DBOS, and Restate beat LLM-specific orchestrators.
  • Hype Filtering System: [54:06] Paul's approach: read about new tools, wait 2-3 months, only adopt if still relevant.
  • Just-in-Time vs Just-in-Case: [57:50] The crucial difference between learning for potential needs vs immediate application.
  • Robot Vision: [50:29] Why LLMs are just stepping stones to embodied AI and the unsolved challenges ahead.

🛠️ Tools & Tech Mentioned

  • LangGraph (for prototyping only)
  • Temporal (durable execution)
  • DBOS (simpler orchestration)
  • Restate (developer-friendly orchestration)
  • Ray (distributed compute)
  • UV (Python packaging)
  • Prefect (workflow orchestration)

📚 Recommended Resources

🔮 What's Next

Next week, we will take a detour into the networking behind voice AI with Russell D’Sa from LiveKit.

💬 Join The Conversation

Follow How AI Is Built on YouTube, Bluesky, or Spotify.

If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at [email protected].

I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.

♻️ I am trying to build a new platform for engineers to share the experience they have earned building and deploying systems in production. Pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️


Nicolay here,

Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production.

Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant.

His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.

The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that - copy-pasting from tutorials without understanding the why.

Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of."

Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.

In the podcast, we also cover:

  • Why fine-tuning is almost always the wrong choice
  • The "just-in-time" learning approach for staying sane in AI
  • Building writing assistants that actually preserve your voice
  • Why robots, not chatbots, are the real endgame

💡 Core Concepts

  • Agentic Patterns: These patterns seem complex but are actually straightforward to implement once you understand the core loop.
    • React: Agents that Reason, Act, and Observe in a loop
    • Reflection: Agents that review and improve their own outputs
  • Fine-tuning vs Base Model + Prompting: Fine-tuning involves taking a pre-trained model and training it further on your specific data. The alternative is using base models with careful prompting and context engineering. Paul's take: "Fine-tuning adds so much complexity... if you add fine-tuning to create a new feature, it's just from one day to one week."
  • RAG: A technique where you retrieve relevant documents/information and include them in the LLM's context to generate better responses. Paul's approach: "In the beginning I also want to avoid RAG and just introduce a more guided research approach. Like I say, hey, these are the resources that I want to use in this article."

📶 Connect with Paul:

📶 Connect with Nicolay:

⏱️ Important Moments

  • From CUDA to LLMs: [02:20] Paul's journey from writing CUDA kernels and 3D object detection to modern AI applications.
  • AI Content Is Natural Evolution: [11:19] Why AI writing tools are like the internet transition for artists - tools change, creativity remains.
  • The Framework Trap: [36:41] "I see them as no code or low code tools... not good frameworks to build on top of."
  • Fine-Tuning Complexity Bomb: [27:41] How fine-tuning turns 1-day features into 1-week experiments.
  • End-to-End First: [22:44] "I don't focus on accuracy, performance, or latency initially. I just want an end-to-end process that works."
  • The Orchestration Solution: [40:04] Why Temporal, D-Boss, and Restate beat LLM-specific orchestrators.
  • Hype Filtering System: [54:06] Paul's approach: read about new tools, wait 2-3 months, only adopt if still relevant.
  • Just-in-Time vs Just-in-Case: [57:50] The crucial difference between learning for potential needs vs immediate application.
  • Robot Vision: [50:29] Why LLMs are just stepping stones to embodied AI and the unsolved challenges ahead.

🛠️ Tools & Tech Mentioned

  • LangGraph (for prototyping only)
  • Temporal (durable execution)
  • DBOS (simpler orchestration)
  • Restate (developer-friendly orchestration)
  • Ray (distributed compute)
  • UV (Python packaging)
  • Prefect (workflow orchestration)

📚 Recommended Resources

🔮 What's Next

Next week, we will take a detour and go into the networking behind voice AI with Russell D’Sa from Livekit.

💬 Join The Conversation

Follow How AI Is Built on YouTube, Bluesky, or Spotify.

If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at [email protected].

I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.

♻️ I am trying to build the new platform for engineers to share their experience that they have earned after building and deploying stuff into production. Pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️

  continue reading

57 episodes

All episodes

×
 
Loading …

Welcome to Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Quick Reference Guide

Listen to this show while you explore
Play