I am currently working on the following projects:
- We exploit reinforcement learning to select (near-)optimal query plans.
We use neither cost or cardinality models nor data statistics.
Instead, we divide query execution into micro episodes and try
different plans in different episodes. By measuring evaluation progress
per time unit and optimally balancing exploration and exploitation in
plan selections, we guarantee near-optimal expected execution cost
for queries on large data.
See here for more details, talks, and source code.
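The balance between exploration and exploitation described above can be illustrated with a minimal sketch. It assumes a UCB1 bandit over a small, fixed set of candidate plans, with simulated per-episode progress as the reward. The plan names, progress rates, and function names are hypothetical and not taken from the project's source code, and the learning algorithm used in the actual system may differ.

```python
import math
import random


def select_plan(counts, rewards, step):
    """Pick the plan with the highest UCB1 score (assumed selection rule)."""
    # Try every plan once before comparing confidence bounds.
    for plan, n in counts.items():
        if n == 0:
            return plan
    return max(
        counts,
        key=lambda p: rewards[p] / counts[p]
        + math.sqrt(2 * math.log(step) / counts[p]),
    )


def run_episodes(progress_rates, episodes=2000, seed=0):
    """Simulate micro episodes; return how often each plan was chosen.

    progress_rates maps a hypothetical plan name to the probability that
    one episode with that plan makes progress (reward 1, otherwise 0).
    """
    rng = random.Random(seed)
    counts = {p: 0 for p in progress_rates}
    rewards = {p: 0.0 for p in progress_rates}
    for step in range(1, episodes + 1):
        plan = select_plan(counts, rewards, step)
        # Stand-in for "evaluation progress per time unit" in one episode.
        reward = 1.0 if rng.random() < progress_rates[plan] else 0.0
        counts[plan] += 1
        rewards[plan] += reward
    return counts
```

For example, `run_episodes({"plan_a": 0.2, "plan_b": 0.5, "plan_c": 0.9})` should spend most episodes on `plan_c` while still sampling the other plans occasionally.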
We are working on several research projects around voice-based data access.
Those projects fall into three broad categories: research on how to interpret
user speech input more reliably, research on optimally summarizing trends in
query results via voice output ("data vocalization"), and research on
specializing query processing to voice interfaces (e.g., by interleaving
system speaking time with processing time). Research outcomes are integrated
into CiceroDB, a novel DBMS designed from the ground up for voice-based data access.
See here for more details, talks, and publications.
Data are often summarized via text documents; examples include newspaper articles
by data journalists, business reports, and scientific papers. Mistakes in data
summaries often go unnoticed, as there is no time to verify each claim.
In collaboration with Google NYC, we created a system, similar to a spell checker,
that verifies consistency between natural language claims and relational data sets.
See here for a live
demo, talks, and publications.
Typically, approximate processing uses sampling and produces confidence bounds on query aggregates.
In BitGourmet, we produce deterministic bounds that are guaranteed to contain the accurate values.
For that, we read carefully selected bits from each row, exploiting a "Bit-Store" data layout
that allows us to retrieve specific bit positions in specific columns efficiently.
See here for details, talks, and publications.
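The idea of deterministic bounds from partially read values can be illustrated with a minimal sketch. It assumes unsigned fixed-width integers and reads only the most significant bits of each value. The function name and this simple choice of bits are assumptions for illustration; how BitGourmet selects bits is not shown here.

```python
def bounds_from_top_bits(values, width, k):
    """Return deterministic (lo, hi) bounds on sum(values).

    Only the k most significant of `width` bits of each value are "read";
    the unread low-order bits are assumed all-zero for the lower bound
    and all-one for the upper bound.
    """
    unread = width - k
    lo = hi = 0
    for v in values:
        prefix = v >> unread                # the bits we actually read
        lo += prefix << unread              # unread bits set to 0
        hi += (prefix << unread) | ((1 << unread) - 1)  # unread bits set to 1
    return lo, hi
```

With `values = [200, 13, 97, 255]`, `width = 8`, and `k = 4`, the function returns `(528, 588)`, an interval that is certain to contain the exact sum 565. Reading all eight bits collapses the interval to the exact value.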
- Optimizer Testing
This line of work is targeted at supporting developers of query optimizers.
To assess the quality of query optimizers (and to identify candidate areas
for improvements), it is useful to compare plans produced by an optimizer
to guaranteed optimal plans. Guaranteed optimal plans are difficult to find
since we cannot rely on the optimizer's cost or cardinality model. We are developing
approaches to find guaranteed optimal plans via offline optimization.
This requires executing plans or plan fragments to obtain
guaranteed bounds on execution costs and intermediate result sizes.
See here for more details, talks, and publications.
This list does not include several projects (in the areas of deterministic
approximation and automated fact checking) that we started recently.
I am currently working with the following PhD students:
- Saehan Jo
- George Karagiannis
- Junxiong Wang
- Ziyun Wei
I am regularly teaching the following courses at Cornell:
- CS 4320: Introduction to Database Systems
- CS 4321: Practicum in Database Systems
- CS 6320: Advanced Database Systems
- CS 7390: Seminar in Database Systems
More details about those courses can be found here.
I serve(d) in the following capacities:
- I recently began serving as associate editor for SIGMOD Record
- I serve(d) as reviewer for SIGMOD 2018, 2019, and 2020
- I serve(d) as reviewer for VLDB 2018, 2019, and 2020
- SIGMOD 2020 Demonstration of BitGourmet: data analysis via deterministic approximation. Saehan Jo, Immanuel Trummer.
- VLDB 2020 Mining an “Anti-Knowledge Base” from Wikipedia updates
with applications to fact checking and beyond. Georgios Karagiannis, Immanuel
Trummer, Saehan Jo, Shubham Khandelwal, Xuezhi Wang, Cong Yu
- CIDR 2020 BitGourmet: deterministic approximation via optimized bit selections. Saehan Jo, Immanuel Trummer.
- VLDB 2019 AggChecker: a fact-checking system for text summaries of relational data sets.
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
- SIGMOD 2019 SkinnerDB: Regret-bounded query evaluation via reinforcement learning.
Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis.
- SIGMOD 2019 A holistic approach for query evaluation and result vocalization in voice-based OLAP.
Immanuel Trummer, Yicheng Wang, Saketh Mahankali.
- SIGMOD 2019 Exact cardinality query optimization with bounded execution cost.
- SIGMOD 2019 Verifying text summaries of relational data sets.
Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
- CIDR 2019 Data Vocalization with CiceroDB.
I gratefully acknowledge funding from the following sources:
regret-bounded query evaluation via reinforcement learning
- Google Faculty Research Award for data vocalization
- Google Faculty Research Award for building an "anti-knowledge base"
- Support by Huawei for research on deterministic approximation