AI Research Papers

Dive into the latest technical papers with the Arize Community.
Sign up to join us for bi-weekly AI research paper readings.

Trending AI Research

Some of the most popular AI research papers we've covered lately.

Podcast

Deep Papers

Deep Papers is a podcast series, running since 2023, that features deep dives on today’s most important AI papers and research.


AI Benchmark Deep Dive: Gemini 2.5 and Humanity’s Last Exam

A comprehensive overview of modern AI benchmarks, taking a close look at Google’s recent Gemini 2.5 release and its performance on key evaluations.

LibreEval: A Smarter Way to Detect LLM Hallucinations

The Arize team has generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models.

Sleep-time Compute: Beyond Inference Scaling at Test-time

A new paper from researchers at Letta

Explore More AI Research

Stay up to date with the latest breakthroughs in AI.

Top AI research papers

Source	Description
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery	Showcases the potential of AI agents to automate the entire scientific discovery process, providing insights into building self-sufficient AI systems.
AlphaEvolve: A Gemini-Powered Coding Agent for Designing Advanced Algorithms	Demonstrates the capability of AI agents to innovate and optimize algorithms without human input, highlighting advancements in autonomous AI development.
Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation	Offers a method for AI agents to autonomously generate research hypotheses, which could help develop more intelligent and creative AI systems.
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models	Provides a tool for assessing and improving the safety and reliability of AI agents, crucial for deploying trustworthy AI applications.
ChatQA: Surpassing GPT-4 on Conversational QA and RAG	Underscores the potential of community-driven research and development in advancing the capabilities of large language models.

Recommended resources

AI Agent Workflows and Architectures Masterclass

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

Build More Accurate AI Apps Through Fast Experimentation with Arize Phoenix, Langflow, and NVIDIA

Explore More AI Research

Accurate KV Cache Quantization with Outlier Tokens Tracing

Scalable Chain of Thoughts via Elastic Reasoning

Sleep-time Compute: Beyond Inference Scaling at Test-time

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

Agent-as-a-Judge: Evaluate Agents with Agents

Introduction to OpenAI’s Realtime API

Model Context Protocol (MCP) from Anthropic

How DeepSeek is Pushing the Boundaries of AI Development

Multiagent Finetuning: A Conversation with Researcher Yilun Du

Swarm: OpenAI’s Experimental Approach to Multi-Agent Systems

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

Composable Interventions for Language Models

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Extending the Context Window of LLaMA Models Paper Reading

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

RAFT: Adapting Language Model to Domain Specific RAG

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment

Breaking Down EvalGen: Who Validates the Validators?

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

Demystifying Amazon’s Chronos: Learning the Language of Time Series

Anthropic Claude 3

Reinforcement Learning in the Era of LLMs

Sora: OpenAI’s Text-to-Video Generation Model

Phi-2 Model

Mistral AI (Mixtral-8x7B): Performance, Benchmarks

How to Prompt LLMs for Text-to-SQL

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

Explaining Grokking Through Circuit Efficiency

Large Content And Behavior Models to Understand, Simulate, and Optimize Content and Behavior

Skeleton of Thought: LLMs Can Do Parallel Decoding Paper Reading

Llama 2: Open Foundation and Fine-Tuned Chat Models Paper Reading

Lost in the Middle: How Language Models Use Long Contexts Paper Reading

Orca: Progressive Learning from Complex Explanation Traces of GPT-4 Paper Reading

One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

Voyager: An Open-Ended Embodied Agent with LLMs Paper Reading and Discussion

Retrieval-Augmented Generation – Paper Reading and Discussion

LoRA: Low-Rank Adaptation of Large Language Models Paper Reading and Discussion

Drag Your GAN: Interactive Point-Based Manipulation on the Generative Image Manifold

LIMA: Less Is More for Alignment – Paper Reading and Discussion

Hungry Hungry Hippos (H3) and Language Modeling with State Space Models

Toolformer: Training LLMs To Use Tools

OpenAI on Reinforcement Learning With Human Feedback (RLHF)
