Rebecca Qian

I am the Co-Founder and CTO of Patronus AI. We are creating frontier evaluations and reinforcement learning environments to steer AGI. Our mission is to enable scalable oversight: building the infrastructure needed to understand, measure, and align increasingly capable models. I believe this is the most important problem humanity must solve in our lifetimes.

My current research focuses on developing hyperrealistic RL environments that mimic human problem-solving and collaboration. I am interested in reward design, adaptability, task decomposition, evolutionary task design, tool use, and curriculum learning.

I have always been fascinated by individual and collective intelligence: how humans reason and learn, and how systems self-organize toward understanding. I am inspired by the intersection of philosophy, cognitive science, and machine learning, and motivated by the belief that our progress in AI will shape every other field of knowledge.

Previously, I was a researcher at Meta AI (FAIR), where I demonstrated that pretraining LLMs on demographically altered corpora resulted in demographically invariant (fairer) language models.

Research

MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments [2025]

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning [2025]

TRAIL: Trace Reasoning and Agentic Issue Localization [2025]

GLIDER: Grading LLM Interactions and Decisions using Explainable Ranking [2024]

Lynx: An Open Source Hallucination Evaluation Model [2024]

FinanceBench: A New Benchmark for Financial Question Answering [2023]

SimpleSafetyTests: A Test Suite for Identifying Critical Safety Risks in Large Language Models [2023]

Step by Step to Fairness: Attributing Societal Bias in Task-Oriented Dialogue Systems [2023]

Perturbation Augmentation for Fairer NLP [2022]

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction [2022]

Human Evaluation of Conversations is an Open Problem: Comparing the Sensitivity of Various Methods for Evaluating Dialogue Agents [2022]