#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster
Nicolay here,
Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production.
Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant.
His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.
The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that - copy-pasting from tutorials without understanding the why.
Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of."
Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.
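To make "write your own database layer" concrete, here is a minimal sketch of a hand-rolled persistence class. SQLite and this schema are illustrative assumptions on my part, not anything from the episode; the point is a few dozen explicit lines you can read, debug, and swap, instead of a framework's opaque memory abstraction.

```python
# A hand-rolled persistence layer: small, explicit, fully under your control.
# SQLite and the schema are illustrative assumptions.
import sqlite3


class DocumentStore:
    """A few dozen lines you own, instead of a framework abstraction."""

    def __init__(self, path: str = "docs.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, text TEXT)"
        )

    def add(self, text: str) -> int:
        # Insert a document and return its generated id.
        cur = self.conn.execute("INSERT INTO documents (text) VALUES (?)", (text,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, doc_id: int) -> str | None:
        # Fetch a document by id; None if it does not exist.
        row = self.conn.execute(
            "SELECT text FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None
```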
In the podcast, we also cover:
- Why fine-tuning is almost always the wrong choice
- The "just-in-time" learning approach for staying sane in AI
- Building writing assistants that actually preserve your voice
- Why robots, not chatbots, are the real endgame
💡 Core Concepts
- Agentic Patterns: These patterns seem complex but are straightforward to implement once you understand the core loop (a minimal sketch follows this list).
- ReAct: Agents that Reason, Act, and Observe in a loop
- Reflection: Agents that review and improve their own outputs
- Fine-tuning vs Base Model + Prompting: Fine-tuning involves taking a pre-trained model and training it further on your specific data. The alternative is using base models with careful prompting and context engineering. Paul's take: "Fine-tuning adds so much complexity... if you add fine-tuning to create a new feature, it's just from one day to one week."
- RAG: A technique where you retrieve relevant documents/information and include them in the LLM's context to generate better responses. Paul's approach: "In the beginning I also want to avoid RAG and just introduce a more guided research approach. Like I say, hey, these are the resources that I want to use in this article."
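To make the ReAct pattern concrete, here is a minimal sketch of the reason-act-observe loop. The `call_llm` helper and the `tools` dict are hypothetical stand-ins, not any specific library's API; this shows the pattern, not Paul's implementation.

```python
# A minimal ReAct loop: reason (LLM decides), act (run a tool), observe
# (feed the result back). call_llm and tools are hypothetical stand-ins.
def react_agent(task: str, tools: dict, call_llm, max_steps: int = 5) -> str:
    context = f"Task: {task}"
    for _ in range(max_steps):
        # Reason: the model decides the next step.
        decision = call_llm(
            f"{context}\nRespond with 'ACT: <tool> <input>' or 'FINISH: <answer>'."
        )
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        # Act: run the tool the model named.
        _, tool_name, tool_input = decision.split(" ", 2)
        observation = tools[tool_name](tool_input)
        # Observe: append the result so the next iteration can use it.
        context += f"\n{decision}\nObservation: {observation}"
    return "Stopped: step budget exhausted."
```

The whole pattern really is one loop: ask the model what to do, run the tool it names, append the observation, repeat.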
📶 Connect with Paul:
📶 Connect with Nicolay:
- X / Twitter
- Bluesky
- Website
- My Agency Aisbach (for AI implementations / strategy)
⏱️ Important Moments
- From CUDA to LLMs: [02:20] Paul's journey from writing CUDA kernels and 3D object detection to modern AI applications.
- AI Content Is Natural Evolution: [11:19] Why AI writing tools are like the internet transition for artists - tools change, creativity remains.
- The Framework Trap: [36:41] "I see them as no code or low code tools... not good frameworks to build on top of."
- Fine-Tuning Complexity Bomb: [27:41] How fine-tuning turns 1-day features into 1-week experiments.
- End-to-End First: [22:44] "I don't focus on accuracy, performance, or latency initially. I just want an end-to-end process that works."
- The Orchestration Solution: [40:04] Why Temporal, DBOS, and Restate beat LLM-specific orchestrators (a minimal Temporal sketch follows this list).
- Hype Filtering System: [54:06] Paul's approach: read about new tools, wait 2-3 months, only adopt if still relevant.
- Just-in-Time vs Just-in-Case: [57:50] The crucial difference between learning for potential needs vs immediate application.
- Robot Vision: [50:29] Why LLMs are just stepping stones to embodied AI and the unsolved challenges ahead.
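As a taste of the orchestration approach Paul favors, here is a minimal sketch using Temporal's Python SDK. The workflow and activity names, and the placeholder draft step, are illustrative assumptions, not code from the episode.

```python
# A minimal durable workflow with Temporal's Python SDK (temporalio).
# The names and the placeholder LLM step are illustrative assumptions.
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def draft_article(topic: str) -> str:
    # Hypothetical LLM call; Temporal retries this step on failure.
    return f"Draft about {topic}"  # replace with a real model call


@workflow.defn
class WritingWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        # Durable execution: progress is persisted, so a crash mid-run
        # resumes here instead of restarting the whole pipeline.
        return await workflow.execute_activity(
            draft_article,
            topic,
            start_to_close_timeout=timedelta(minutes=5),
        )
```

The appeal over LLM-specific orchestrators is exactly this generality: retries, timeouts, and crash recovery come from the platform, not from glue code in your agent framework.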
🛠️ Tools & Tech Mentioned
- LangGraph (for prototyping only)
- Temporal (durable execution)
- DBOS (simpler orchestration)
- Restate (developer-friendly orchestration)
- Ray (distributed compute)
- uv (Python packaging)
- Prefect (workflow orchestration)
📚 Recommended Resources
- The Economist Style Guide (for writing)
- Brandon Sanderson's Writing Approach (worldbuilding first)
- LangGraph Academy (free, covers agent patterns)
- Ray Documentation (Paul's next deep dive)
🔮 What's Next
Next week, we will take a detour and go into the networking behind voice AI with Russell D'Sa from LiveKit.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at [email protected].
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ I am trying to build a new platform for engineers to share the experience they've earned building and deploying stuff in production. Pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️