I am currently working on the following projects:
- We exploit reinforcement learning to select (near-)optimal query plans.
We use neither cost nor cardinality models, nor data statistics.
Instead, we divide query execution into micro episodes and try
different plans in different episodes. By measuring evaluation progress
per time unit and optimally balancing exploration and exploitation in
plan selections, we guarantee near-optimal expected execution cost
for queries on large data.
See here for more details, talks, and source code.
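The episode-based balancing of exploration and exploitation can be sketched as a multi-armed bandit: each candidate plan is an arm, each micro episode pulls one arm, and the reward is the evaluation progress achieved per time unit. The plan names, episode count, and simulated progress function below are hypothetical illustrations under a standard UCB1 rule, not SkinnerDB's actual algorithm.

```python
import math
import random

# Hypothetical sketch: treat each candidate query plan as a bandit arm,
# run one plan per micro episode, and use the UCB1 rule to balance
# exploration and exploitation. The reward is the measured evaluation
# progress per time unit; no cost or cardinality model is consulted.

def ucb_plan_selection(plans, episodes, measure_progress):
    counts = {p: 0 for p in plans}     # episodes run with each plan
    rewards = {p: 0.0 for p in plans}  # cumulative progress per plan
    for t in range(1, episodes + 1):
        untried = [p for p in plans if counts[p] == 0]
        if untried:
            plan = untried[0]          # try every plan at least once
        else:
            plan = max(plans, key=lambda p: rewards[p] / counts[p]
                       + math.sqrt(2 * math.log(t) / counts[p]))
        counts[plan] += 1
        rewards[plan] += measure_progress(plan)
    # Report the plan with the best average progress per episode.
    return max(plans, key=lambda p: rewards[p] / counts[p])

# Toy usage: plan "B" makes the most progress per time unit on average.
random.seed(0)
speed = {"A": 0.2, "B": 0.9, "C": 0.5}
best = ucb_plan_selection(list(speed), 500,
                          lambda p: random.uniform(0, 2 * speed[p]))
```

Because UCB1's regret grows only logarithmically in the number of episodes, almost all episodes are eventually spent on near-optimal plans, which is the intuition behind the expected-cost guarantee above.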
- We are working on several research projects around voice-based data access.
Those projects fall into three broad categories: research on how to interpret
user speech input more reliably, research on optimally summarizing trends in
query results via voice output ("data vocalization"), and research on
specializing query processing to voice interfaces (e.g., by interleaving
system speaking time with processing time). Research outcomes are integrated
into CiceroDB, a novel DBMS designed from the ground up for voice-based data access.
See here for more details, talks, and publications.
- Data are often summarized via text documents; examples include newspaper articles
by data journalists, business reports, and scientific papers. Mistakes in data
summaries often go unnoticed, as there is no time to verify each claim.
In collaboration with Google NYC, we created a system, similar to a spell checker,
that verifies consistency between natural language claims and relational data sets.
See here for a live
demo, talks, and publications.
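The core check can be illustrated in miniature: map a claim about an aggregate value to an SQL query over the data set and compare the claimed value against the query result. The table, claim representation, and tolerance below are hypothetical illustrations, not the AggChecker system itself (which additionally interprets natural language claims).

```python
import sqlite3

# Hypothetical sketch of claim verification: a "claim" asserts an
# aggregate value over a relational table; we evaluate the matching
# SQL query and flag the claim if the actual value disagrees.

def check_claim(conn, claim, tolerance=0.0):
    sql = f"SELECT {claim['aggregate']}({claim['column']}) FROM {claim['table']}"
    if claim.get("filter"):
        sql += f" WHERE {claim['filter']}"
    (actual,) = conn.execute(sql).fetchone()
    return abs(actual - claim["value"]) <= tolerance, actual

# Toy data set: sales amounts per region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 150.0), ("west", 50.0)])

# A consistent claim ("east sales total 250") and an erroneous one
# ("average sale is 200").
ok, actual = check_claim(conn, {"aggregate": "SUM", "column": "amount",
                                "table": "sales",
                                "filter": "region = 'east'", "value": 250.0})
bad, _ = check_claim(conn, {"aggregate": "AVG", "column": "amount",
                            "table": "sales", "value": 200.0})
```

The hard part in practice is the step omitted here: translating an ambiguous natural language claim into the right query in the first place.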
- Optimizer Testing
This line of work is targeted at supporting developers of query optimizers.
To assess the quality of query optimizers (and to identify candidate areas
for improvements), it is useful to compare plans produced by an optimizer
to guaranteed optimal plans. Guaranteed optimal plans are difficult to find
since we cannot rely on the optimizer's cost or cardinality models. We are developing
approaches to find guaranteed optimal plans via offline optimization.
This requires executing plans or plan fragments to obtain
guaranteed bounds on execution costs and intermediate result sizes.
See here for more details, talks, and publications.
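The offline-optimization idea can be illustrated with a tiny dynamic program over join orders. The cardinalities below are stubbed placeholders for values that would be obtained by actually executing plan fragments, and the cost model (sum of intermediate result sizes) is an assumed simplification, not the one used in our systems.

```python
from itertools import combinations

# Stubbed "measured" cardinalities per relation subset; in offline
# optimization these would come from executing plan fragments, so they
# are guaranteed rather than estimated.
measured = {
    frozenset("R"): 100, frozenset("S"): 50, frozenset("T"): 10,
    frozenset("RS"): 500, frozenset("RT"): 20, frozenset("ST"): 200,
    frozenset("RST"): 80,
}

def optimal_join_order(relations):
    # Dynamic programming over relation subsets; with exact cardinalities
    # the resulting plan is guaranteed optimal under this cost model
    # (sum of intermediate result sizes). best maps a subset to
    # (cost, plan string); single relations cost nothing.
    best = {frozenset([r]): (0, r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            for left_size in range(1, size):
                for left in map(frozenset,
                                combinations(sorted(subset), left_size)):
                    right = subset - left
                    cost = (best[left][0] + best[right][0]
                            + measured[subset])
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (
                            cost, f"({best[left][1]} ⋈ {best[right][1]})")
    return best[frozenset(relations)]

cost, plan = optimal_join_order("RST")
```

With exact subset cardinalities in hand, the plan returned by the dynamic program serves as a ground-truth baseline against which a production optimizer's plans can be compared.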
This list does not include several projects (in the areas of deterministic
approximation and automated fact checking) that we started recently.
I am currently working with the following PhD students:
- Saehan Jo
- George Karagiannis
- Jialing Pei
- Ziyun Wei
I am regularly teaching the following courses at Cornell:
- CS 4320: Introduction to Database Systems
- CS 4321: Practicum in Database Systems
- CS 6320: Advanced Database Systems
- CS 7390: Seminar in Database Systems
More details about those courses can be found here.
I serve(d) in the following capacities:
- I recently began serving as associate editor for SIGMOD Record
- I serve(d) as reviewer for SIGMOD 2018, 2019, and 2020
- I serve(d) as reviewer for VLDB 2018, 2019, and 2020
- VLDB 2019 AggChecker: a fact-checking system for text summaries of relational data sets.
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
- SIGMOD 2019 SkinnerDB: Regret-bounded query evaluation via reinforcement learning.
Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis.
- SIGMOD 2019 A holistic approach for query evaluation and result vocalization in voice-based OLAP.
Immanuel Trummer, Yicheng Wang, Saketh Mahankali.
- SIGMOD 2019 Exact cardinality query optimization with bounded execution cost.
- SIGMOD 2019 Verifying text summaries of relational data sets.
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
- CIDR 2019 Data Vocalization with CiceroDB.
I gratefully acknowledge funding from the following sources:
- Regret-bounded query evaluation via reinforcement learning
- Google Faculty Research Award for data vocalization
- Google Faculty Research Award for building an "anti-knowledge base"
- Support by Huawei for research on deterministic approximation