AI Research Papers
Dive into the latest technical papers with the Arize Community.
Sign up to join us for bi-weekly AI research paper readings.
Trending AI Research
Some of the most popular AI research papers we've covered lately.
Explore More AI Research
Stay up to date with the latest breakthroughs in AI.

Accurate KV Cache Quantization with Outlier Tokens Tracing
Researchers propose a more accurate way to quantize the KV cache by tracing outlier tokens, compressing it while preserving model quality.
Scalable Chain of Thoughts via Elastic Reasoning
Elastic Reasoning is a novel framework designed to improve the efficiency and scalability of large reasoning models (LRMs) by explicitly separating the reasoning process into two distinct phases: thinking and solution.
Sleep-time Compute: Beyond Inference Scaling at Test-time
We recently discussed “Sleep-time Compute: Beyond Inference Scaling at Test-time,” new research from the team at Letta.
Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies
LLMs have revolutionized natural language processing with their remarkable versatility and capabilities. But individual LLMs often exhibit distinct strengths and weaknesses, shaped by differences in their training corpora. This diversity poses a challenge: how can we combine LLMs to maximize their efficiency and utility?
Agent-as-a-Judge: Evaluate Agents with Agents
This week we dive into a paper that presents the “Agent-as-a-Judge” framework, a new paradigm for evaluating agent systems.
Introduction to OpenAI’s Realtime API
We break down OpenAI’s Realtime API. Sally-Ann DeLucia and Aparna Dhinakaran cover how to integrate powerful language models into your applications for instant, context-aware responses that drive user engagement.
Model Context Protocol (MCP) from Anthropic
Want to learn more about Anthropic’s groundbreaking Model Context Protocol (MCP)? We break down how this open standard is revolutionizing AI by enabling seamless integration between LLMs and external data sources, fundamentally transforming them into capable, context-aware agents.
How DeepSeek is Pushing the Boundaries of AI Development
How do you train an AI model to think more like a human? That’s the challenge DeepSeek is tackling with its latest models, which push the boundaries of reasoning and reinforcement learning.
Multiagent Finetuning: A Conversation with Researcher Yilun Du
This week we were excited to talk with Yilun Du, Senior Research Scientist at Google DeepMind and incoming Assistant Professor at Harvard, about his latest paper “Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains.”
Swarm: OpenAI’s Experimental Approach to Multi-Agent Systems
As multi-agent systems grow in importance for fields ranging from customer support to autonomous decision-making, OpenAI has introduced Swarm, an experimental framework that simplifies the process of building and managing these systems.
Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning
A recent announcement on X touted a fine-tuned model with outstanding performance and claimed these results were achieved through reflection tuning.
Top AI research papers
Paper | Description |
---|---|
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | Showcases the potential of AI agents to automate the entire scientific discovery process, providing insights into building self-sufficient AI systems. |
AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms | Demonstrates the capability of AI agents to innovate and optimize algorithms without human input, highlighting advances in autonomous AI development. |
Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation | Offers a method for AI agents to autonomously generate research hypotheses, which could help develop more intelligent and creative AI systems. |
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models | Provides a tool for assessing and improving the safety and reliability of AI agents, crucial for deploying trustworthy AI applications. |
ChatQA: Surpassing GPT-4 on Conversational QA and RAG | Underscores the potential of community-driven research and development in advancing the capabilities of large language models. |