Master Student

Microsoft

Biography

I am a senior software engineer at Microsoft working on the SCOPE query optimizer, which sits at the core of Microsoft’s in-house data lake, a platform that processes nearly one million jobs and exabytes of data every day.

I am also an independent researcher in AI infra. I have years of experience in:

  • AI Agents
    • Maintainer of OpenHands, a leading open-source coding agent platform. 52k stars on GitHub, 800k+ downloads. [CODE] [PAPER] [DOWNLOAD]
    • Creator of TheAgentCompany, an AI benchmark on consequential real world tasks. [CODE] [PAPER] [WEBSITE]
  • Graph database
    • Maintainer of JanusGraph, a leading open-source graph database. 5.5k stars on GitHub, 500k+ downloads. [CODE] [DOWNLOAD]

Specialties

  • AI Infra
  • Big Data

Education

  • Master of Computational Science, 2022

    LTI, Carnegie Mellon University

  • Bachelor in Computer Science, 2019

    The University of Hong Kong

  • Exchange Student, 2018

    University of Toronto

  • Summer Student, 2016

    University of California, Berkeley

Projects

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

TheAgentCompany is the first benchmark that examines AI’s ability to complete consequential real-world tasks. I designed a reproducible, extensible evaluation framework from scratch, led a team of 10+ software engineers on the implementation, and co-authored the paper as one of the primary authors. It is under review at ICML 2025.

OpenHands - An Open Platform for AI Software Developers as Generalist Agents

OpenHands is the most popular open-source coding agent. I am one of its early cofounders and have been an active maintainer since 2024. It has been downloaded more than 800k times, covered by multiple media outlets, and cited by 118+ academic papers.

JanusGraph - leading open-source graph database

Graph databases are a fundamental building block for AI applications that leverage private domain knowledge. Since 2019, I have served on the Technical Steering Committee and led the development of JanusGraph, the most popular open-source distributed graph database. It has been downloaded more than 500k times.

PLOVER: Virtualized State Machine Replication System

PLOVER is the first Virtualized SMR (VSMR) system that achieves fast, multi-core scalable virtual machine fault-tolerance.

MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks

A web application for motif clique search and interactive graph analysis

Publications

(2024). TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. Under Review at ICML 2025.

PDF Code

(2024). OpenHands: An Open Platform for AI Software Developers as Generalist Agents. The Thirteenth International Conference on Learning Representations (ICLR 2025).

PDF Code

(2020). MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks. 36th IEEE International Conference on Data Engineering (ICDE 2020) Demo Track.

PDF

(2020). Stable community structures and social exclusion. International Conference on Social Informatics.

PDF

(2018). PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance. 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18).

PDF Code Slides

Working Experience

Adventure in industry

Senior Software Engineer

Microsoft

Feb 2023 – Present Redmond, WA, USA

I work on the SCOPE query optimizer team. SCOPE is the query language for Microsoft’s internal big data platform. It underpins the company’s data processing, handling exabytes of data and nearly a million jobs every day.

I lead the performance improvement efforts, including building sophisticated statistics models, tuning TPC-DS benchmarks, and developing state-of-the-art algorithms for join operations. My work has saved the company millions of dollars per year.

Software Engineer

Goldman Sachs

Jul 2019 – Jul 2021 Hong Kong

My team built a massive live knowledge graph that enabled company-wide business intelligence, machine learning, monitoring, and data governance. My contributions included:

• Graph query optimizations that reduced average system latency by 50%.

• An Infrastructure as Code (IaC) solution that reduced data governance labor by 90%.

• A Spark streaming pipeline that ingested ~10 million production telemetry events daily.

Awards

Silver Medal in IEEE-CIS Fraud Detection Competition

Rank 124/6381 (Top 2%)

Silver Medal in Instant Gratification Competition

Rank 34/1832 (Top 2%)

Dean’s Honours List

HKU Foundation Scholarships for Outstanding Students