Thoughtful discussions about current topics, moderated by American Banker editors.

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Relevant, Inspirational, and Transformative. Explore Judaism with Rabbi Einhorn

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
27:19
For this week's paper read, we actually dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a …

Banks struggle to keep up with threat of AI deepfakes
15:22
Valerie Abend, Accenture’s financial services cybersecurity lead, explains what banks get wrong about fending off AI-based threats and what they should do instead.

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam
26:11
This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session we covered Gemini 2.5's architecture, its advancements in reasoning and multimodality, and its impressive context window. We also talked about how benchmar…

'It can eliminate things we don't like doing': Vlad Lukic on AI
18:54
Generative AI will remove toil from our day-to-day jobs, argues Lukic, who is managing director and senior partner at Boston Consulting Group.

We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately, and thought it was well worth digging into. Learn how this open standard is revolutionizing AI by enabling seamless integration between LLMs and external data sources, fundamentally transformin…

Could industry standards prevent the next Synapse-style mess?
17:09
“We want to put banks in the risk management driver’s seat,” says Sima Gandhi, co-founder of the Council for Fintech Ecosystem Standards, which has worked with a group of fintechs to create risk and compliance standards banks can use to evaluate their fintech partners.

‘It’s a tremendous priority’: Huntington CFO Wasserman on AI
22:48
Software code generation and knowledge management are two of the places the bank has begun using generative AI to improve efficiency.

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs
30:23
This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements, including: DeepSeek’s Big Launch Week: A look at FlashMLA (DeepSeek’s new approach to efficient inference) and DeepEP (their communication library for expert-parallel mixture-of-experts models).…

How DeepSeek is Pushing the Boundaries of AI Development
29:54
This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its significant breakthrough in inference speed over other models. What’s next for AI (and open source)? From training strategies to real-world performance, here’s …

How one New York bank is getting younger people to save
17:07
Thomas Rudzewick, CEO of Maspeth Savings Bank, worries about the growing crisis of low savings among millennials and Gen Z. He believes banks like his can help reverse this trend with financial literacy and innovative savings tools.

Multiagent Finetuning: A Conversation with Researcher Yilun Du
30:03
We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains." This paper introduces a multiagent finetuning framework that enhances the performance and diversity of language models by employing a society of…

East West Bank’s CEO on how the bank coped with LA wildfires
31:04
The bank, which is headquartered in Pasadena, had to quickly switch to remote work for many employees and come up with relief programs for customers whose homes and businesses were destroyed by fire.

Where Citi Ventures is placing its fintech bets in 2025
31:20
Arvind Purushotham, head of Citi Ventures, shares where he and his team see opportunities and how they vet tech startups.

Training Large Language Models to Reason in Continuous Latent Space
24:58
LLMs have typically been restricted to reasoning in the "language space," where chain-of-thought (CoT) is used to solve complex reasoning problems. But a new paper argues that language space may not always be the best for reasoning. In this paper read, we cover an exciting new technique from a team at Meta called Chain of Continuous Thought, also known…

Why banks keep failing at money laundering
29:12
A lack of resources is one common cause of AML penalties, says consultant Aaron Ansari.

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods
28:57
We discuss a major survey of work and research on LLM-as-Judge from the last few years. "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods" systematically examines the LLMs-as-Judge framework across five dimensions: functionality, methodology, applications, meta-evaluation, and limitations. This survey gives us a bird's-eye view…

The banks that implement AI well, from titans to mavericks
16:11
Some banks are “punching above their weight,” according to Dan Latimore, chief research officer at The Financial Revolutionist. Here’s how they do it.

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies
28:47
LLMs have revolutionized natural language processing, showcasing remarkable versatility and capabilities. But individual LLMs often exhibit distinct strengths and weaknesses, influenced by differences in their training corpora. This diversity poses a challenge: how can we maximize the efficiency and utility of LLMs? A new paper, "Merge, Ensemble, a…

The case for a human crime officer in every bank
20:57
Ian Mitchell, founder of The Knoble, an organization that works with law enforcement and with banks to fight human trafficking and other human crimes, explains some of his group’s recent work and why banks need someone dedicated to fighting human crime.

Agent-as-a-Judge: Evaluate Agents with Agents
24:54
This week, we break down the “Agent-as-a-Judge” framework, a new agent evaluation paradigm that’s kind of like getting robots to grade each other’s homework. Where typical evaluation methods focus solely on outcomes or demand extensive manual work, this approach uses agent systems to evaluate agent systems, offering intermediate feedback throughout …

AARP’s Jilenne Gunther has advice for banks on elder fraud
20:38
Elder financial exploitation has been a problem for banks for years, and it’s getting worse. Gunther offers practical suggestions for what banks should do when they suspect an older customer is a victim.

We break down OpenAI’s Realtime API. Learn how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots or dynamic content tools, or enhancing real-time collaboration, we walk through the API’s capabilities, potential use cases, and best p…

Southern Bancorp’s answer to the home affordability crisis
19:43
The Arkansas community development financial institution has an ambitious goal of making $500 million worth of mortgages in rural, minority and low-income neighborhoods.

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems
46:46
As multi-agent systems grow in importance for fields ranging from customer support to autonomous decision-making, OpenAI has introduced Swarm, an experimental framework that simplifies the process of building and managing these systems. Swarm, a lightweight Python library, is designed for educational purposes, stripping away complex abstractions to…

In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but under-discussed component enables the seamless and snappy interactions we expect from modern AI systems. Harrison Chu breaks down how the KV cache works, h…
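The KV-cache episode description names the mechanism without showing it. As a rough illustration only, assuming a single attention head with toy 2-dimensional projection matrices (`Wq`, `Wk`, `Wv`, `KVCache`, `uncached_step`, and the weights are all hypothetical, not taken from the episode or any library), a cached decoder projects only the newest token at each step and appends its key and value, instead of re-projecting the whole prefix. The loop at the end checks that cached and uncached outputs agree.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                              # subtract max for a stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class KVCache:
    """Toy single-head cache holding the key and value of every token seen so far."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the newest token is projected; earlier keys/values come from the cache.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        q = matvec(self.Wq, x)
        # Attention still reads every cached key, but nothing is re-projected.
        return attend(q, self.keys, self.values)


def uncached_step(Wq, Wk, Wv, prefix):
    """Same output for the last token, but re-projects the entire prefix."""
    keys = [matvec(Wk, x) for x in prefix]
    values = [matvec(Wv, x) for x in prefix]
    return attend(matvec(Wq, prefix[-1]), keys, values)


# Hypothetical toy weights and token embeddings.
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache(Wq, Wk, Wv)
for t, x in enumerate(tokens):
    cached = cache.step(x)
    reference = uncached_step(Wq, Wk, Wv, tokens[: t + 1])
    assert all(abs(a - b) < 1e-12 for a, b in zip(cached, reference))
```

In this sketch the per-step projection work stays constant (one token) rather than growing with the prefix; the attention itself still scans all cached keys, so it grows linearly with sequence length.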

‘Job satisfaction will go up’: How generative AI is changing work
20:00
The technology will take on routine, dull work, says Alenka Grealish, principal analyst at Celent.

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs
3:31
In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler. This innovative entropy-based sampling technique, nicknamed the 'Shrek Sampler,' is transforming LLMs. Harrison talks about how this method improves upon traditional sampling strategies by leveraging entropy and varentropy to produce more dynam…
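The Shrek Sampler blurb mentions entropy and varentropy without defining them. A minimal sketch, assuming the usual definitions (entropy as the expected surprisal of the next-token distribution, varentropy as the variance of that surprisal), computes both from a logit vector. The `choose_strategy` function, its thresholds, and its three branch names are hypothetical and only illustrate switching sampling behavior on these two signals; they are not the Shrek Sampler's actual rules.

```python
import math


def entropy_varentropy(logits):
    """Entropy and varentropy (in nats) of softmax(logits).

    entropy    H = -sum_i p_i * log p_i          (expected surprisal)
    varentropy V =  sum_i p_i * (log p_i + H)^2  (variance of surprisal)
    """
    m = max(logits)                               # shift for numerical stability
    log_z = math.log(sum(math.exp(x - m) for x in logits))
    log_p = [x - m - log_z for x in logits]       # log-softmax
    p = [math.exp(lp) for lp in log_p]
    h = -sum(pi * lpi for pi, lpi in zip(p, log_p))
    v = sum(pi * (lpi + h) ** 2 for pi, lpi in zip(p, log_p))
    return h, v


def choose_strategy(logits, low=0.5, high=3.0):
    """Hypothetical rule: thresholds and branches are illustrative only."""
    h, v = entropy_varentropy(logits)
    if h < low and v < low:
        return "greedy"      # confident and consistent: take the argmax token
    if h > high and v > high:
        return "explore"     # uncertain and uneven: e.g. raise temperature or branch
    return "sample"          # otherwise: ordinary temperature sampling


print(choose_strategy([20.0, 0.0, 0.0, 0.0]))  # one dominant token -> greedy
print(choose_strategy([0.0, 0.0, 0.0, 0.0]))   # flat distribution -> sample
```

A flat distribution over four tokens gives every token the same surprisal, so its varentropy is zero even though its entropy is log 4; telling those two situations apart is the point of tracking both signals.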

Google's NotebookLM and the Future of AI-Generated Audio
43:28
This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive into some technical underpinnings of the product, specifically the SoundStorm model used for generating high-quality audio, and how it leverages a hie…

How banks’ use of generative AI has evolved over the past year
19:53
Financial institutions have dramatically increased their investment and trust in large language models since 2023. Kartik Ramakrishnan, Capgemini’s deputy CEO of Financial Services and head of banking and capital markets, shares the results of a recent report that analyzed these changes.

Exploring OpenAI's o1-preview and o1-mini
42:02
OpenAI recently released o1-preview, which it claims outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and to handle complex tasks, especially science and math questions, better than OpenAI's other models. We take a closer look at the latest crop of o1 models, and we also highlight some research …

Upstart’s CEO Dave Girouard explains brighter outlook for rest of 2024
22:49
Advances in the company’s AI-based lending models have made them better at predicting risk, which has led to growth, he says.

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning
26:54
A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the results. We dive into some recent drama in the AI community as a jumping-off point for a discussion about Reflection 70B. In 2023, there was a paper wri…

Composable Interventions for Language Models
42:35
This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new framework, composable interventions, that allows for the study of multiple interventions applied sequentially to the same language model. The discussio…

‘These models will always hallucinate’: Seth Dobrin on LLMs
18:23
Dobrin, founder of advisory firm Qantm AI and former global chief AI officer at IBM, warns that popular generative AI models were trained on the whole of the internet and hallucinate at an unacceptable rate.

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
39:05
This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of LLMs and evaluate them alongside human annotations which they find to have a high inter-annotator agreement. The study includes nine judge models and ni…

‘Fraud is pervasive throughout the entire industry’: Crypto insider
21:34
Jake Donoghue, author of the book Crypto Confidential, shares some of the worst practices he saw as a founder of a cryptocurrency company.

Breaking Down Meta's Llama 3 Herd of Models
44:40
Meta just released Llama 3.1 405B. According to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation.” Will the latest Llama herd ignite new applications and modeling paradigms like synthetic data gene…

Some banks are making a Faustian bargain with fintechs: Karen Petrou
18:57
Karen Petrou, the managing partner at Federal Financial Analytics and a long-time observer of banking and regulation, says banks need to do far more due diligence on potential fintech partners and exert more control over these relationships.

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
33:57
Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.” The paper this week introduces LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. The researchers integrated the…

What military members need from their banks
25:05
Two veterans and executives at Armed Forces Bank, Tom McLean and Jodi Vickery, share the challenges they see their customers face and new products the bank has rolled out this year to better serve them.

‘Regulators are wise to be more careful’ after Chevron ruling
47:07
Gene Scalia, the banking lobby’s lawyer on retainer for a potential challenge to Washington’s capital reform effort, discusses the state of administrative law after the overturning of a key legal precedent.

RAFT: Adapting Language Model to Domain Specific RAG
44:01
Where adapting LLMs to specialized domains is essential (e.g., recent news, enterprise private documents), we discuss a paper that asks how we adapt pre-trained LLMs for RAG in specialized domains. SallyAnn DeLucia is joined by Sai Kolasani, researcher at UC Berkeley’s RISE Lab (and Arize AI Intern), to talk about his work on RAFT: Adapting Languag…

Climate First Bank's plans to expand nationwide
30:17
The Florida bank has emerged from de novo status and can now offer its solar loans in a larger geographic footprint.

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic
44:00
It’s been an exciting couple weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We’re excited to chat about this significant step forward in understanding how LLMs work and the implications it has for deeper understanding of the neural activity of language models. We take a closer look at some recent research from…

Can data ownership be preserved in generative AI?
16:46
Foundation models like GPT-4, the large language model behind ChatGPT, have hoovered up content from publications like The New York Times and social media sites like Reddit, and OpenAI faces several lawsuits because of this. John Thompson, global head of artificial intelligence at EY and author of the book Data for All, has set up what is …

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment
48:07
We break down the paper Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment. Ensuring alignment (aka: making models behave in accordance with human intentions) has become a critical task before deploying LLMs in real-world applications. However, a major challenge faced by practitioners is the lack of clear guid…

What might digital identity look like in the future?
21:30
Proof of identity is critical for many things, including being able to open a bank account, get a job, or obtain health care. Yet proving one’s identity is getting harder in a world of frequent data breaches. We asked Mariana Dahan, founder of the World Identity Network and chair of the Universal ID Council, what she thinks will solve this problem.…

Breaking Down EvalGen: Who Validates the Validators?
44:47
Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week’s paper explores EvalGen, a m…

“The law was very clear”: Inside the Fed master account debate
36:53
Custodia Founder and CEO Caitlin Long says the Federal Reserve has rewritten the rules around accessing the government's payments system. The central bank and a federal court judge disagree. Editor’s note: This conversation was recorded on April 17. On April 26, Custodia Bank filed a notice of appeal, signaling that it will challenge the district c…

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models
45:07
This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. Learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest o…