Immanuel Trummer

I am an assistant professor of computer science at Cornell University and head of the Cornell Database Group. My research is generally about making data analysis more efficient (e.g., by leveraging reinforcement learning for query planning or approximate processing methods), or about making data access more user-friendly (e.g., via voice interfaces or via special-purpose natural language query interfaces).

I am always looking for outstanding students who combine a strong formal background with excellent implementation skills. If interested, mail me your CV. Follow me on Twitter for the latest news.

Current Projects

I am currently working on the following projects:

SkinnerDB
We exploit reinforcement learning to select (near-)optimal query plans. We use neither cost nor cardinality models, nor data statistics. Instead, we divide query execution into micro-episodes and try different plans in different episodes. By measuring evaluation progress per time unit and optimally balancing exploration and exploitation in plan selection, we guarantee near-optimal expected execution cost for queries on large data. See here for more details, talks, and source code.
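To make the exploration/exploitation trade-off concrete, here is a minimal Python sketch of a UCB1 bandit choosing among a small, fixed set of candidate plans. Each micro-episode yields a reward equal to the (simulated) progress made in one time slice. This is a simplification, not SkinnerDB's implementation: the flat plan list, the simulated progress rates, and the function names `ucb1_select` and `run_episodes` are hypothetical, and SkinnerDB itself searches over join orders.

```python
import math
import random


def ucb1_select(counts, rewards, t, c=math.sqrt(2)):
    """Pick a plan index with the UCB1 rule; t is the current episode number."""
    # Try every plan once before trusting the averages.
    for plan, n in enumerate(counts):
        if n == 0:
            return plan
    # Average reward (exploitation) plus a confidence bonus (exploration).
    return max(
        range(len(counts)),
        key=lambda p: rewards[p] / counts[p] + c * math.sqrt(math.log(t) / counts[p]),
    )


def run_episodes(progress_rates, episodes, seed=0):
    """Simulate micro-episodes; progress_rates are hypothetical per-plan mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(progress_rates)
    rewards = [0.0] * len(progress_rates)
    for t in range(1, episodes + 1):
        plan = ucb1_select(counts, rewards, t)
        # Simulated progress in one time slice, clipped to [0, 1].
        r = min(1.0, max(0.0, rng.gauss(progress_rates[plan], 0.05)))
        counts[plan] += 1
        rewards[plan] += r
    return counts
```

For example, `run_episodes([0.1, 0.5, 0.2], 2000)` should assign most episodes to the second plan, since UCB1 keeps trying the slower plans only often enough to rule them out.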
CiceroDB
We are working on several research projects around voice-based data access. Those projects fall into three broad categories: research on how to interpret user speech input more reliably, research on optimally summarizing trends in query results via voice output ("data vocalization"), and research on specializing query processing to voice interfaces (e.g., by interleaving system speaking time with processing time). Research outcomes are integrated into CiceroDB, a novel DBMS designed from the ground up for voice-based data access. See here for more details, talks, and publications.
AggChecker
Data are often summarized in text documents; examples include newspaper articles by data journalists, business reports, and scientific papers. Mistakes in such data summaries often go unnoticed, as there is no time to verify each claim. In collaboration with Google NYC, we created a system, similar to a spell checker, that verifies consistency between natural language claims and relational data sets. See here for a live demo, talks, and publications.
BitGourmet
Typically, approximate processing uses sampling and produces confidence bounds on query aggregates. In BitGourmet, we instead produce deterministic bounds that are guaranteed to contain the exact values. To do so, we read carefully selected bits from each row, exploiting a "Bit-Store" data layout that allows us to retrieve specific bit positions in specific columns efficiently. See here for details, talks, and publications.
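The following sketch illustrates why reading only some bits can still give deterministic guarantees. It assumes unsigned integers of `total_bits` bits and a SUM aggregate. Reading the top `read_bits` bits of each value fixes a lower bound (all unread bits set to 0) and an upper bound (all unread bits set to 1). The function name and the fixed per-row bit prefix are hypothetical simplifications, since BitGourmet optimizes which bits to read.

```python
def sum_bounds(values, total_bits, read_bits):
    """Deterministic bounds on sum(values) after reading only the high-order bits."""
    if not 0 <= read_bits <= total_bits:
        raise ValueError("read_bits must be between 0 and total_bits")
    if any(v < 0 or v >= (1 << total_bits) for v in values):
        raise ValueError("values must be unsigned total_bits-bit integers")
    unread = total_bits - read_bits
    low_mask = (1 << unread) - 1                    # bits we skip reading
    high_mask = ((1 << total_bits) - 1) ^ low_mask  # bits we do read
    # Lower bound: assume every unread bit is 0.
    lower = sum(v & high_mask for v in values)
    # Upper bound: assume every unread bit is 1.
    upper = lower + len(values) * low_mask
    return lower, upper
```

For example, `sum_bounds([13, 7, 200], 8, 4)` returns `(192, 237)`, which brackets the exact sum 220; reading all 8 bits collapses the bounds to `(220, 220)`.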
Optimizer Testing
This line of work is targeted at supporting developers of query optimizers. To assess the quality of query optimizers (and to identify candidate areas for improvement), it is useful to compare plans produced by an optimizer to guaranteed optimal plans. Guaranteed optimal plans are difficult to find since we cannot rely on the optimizer's own cost or cardinality models. We are developing approaches to find guaranteed optimal plans via offline optimization. This requires executing plans or plan fragments to obtain guaranteed bounds on execution costs and intermediate result sizes. See here for more details, talks, and publications.
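As a rough illustration of why exact intermediate result sizes enable guaranteed optimal plans, the sketch below assumes that the exact result size of every table subset has already been measured by execution. It then computes an optimal join tree by dynamic programming over subsets. The cost metric (sum of intermediate result sizes, often called C_out), the dictionary `card`, and the function `optimal_plan` are assumptions for illustration. Measuring every subset can be expensive in practice, which is what bounding the execution cost of optimization aims to avoid.

```python
from itertools import combinations


def optimal_plan(tables, card):
    """Return (cost, join tree) minimizing the sum of intermediate result sizes.

    card maps each frozenset of two or more tables to its exact result size,
    assumed to have been measured by executing the corresponding plan fragment.
    """
    names = sorted(tables)
    # Base tables: no intermediate results yet, so cost 0.
    best = {frozenset([t]): (0, t) for t in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            anchor, rest = subset[0], subset[1:]
            options = []
            # Enumerate splits (left, right); keeping `anchor` on the left skips mirrored duplicates.
            for k in range(size - 1):
                for extra in combinations(rest, k):
                    left = frozenset((anchor,) + extra)
                    right = s - left
                    cost = best[left][0] + best[right][0] + card[s]
                    options.append((cost, (best[left][1], best[right][1])))
            best[s] = min(options, key=lambda option: option[0])
    return best[frozenset(names)]
```

For example, with tables `A`, `B`, `C` and measured sizes |AB| = 100, |AC| = 5, |BC| = 50, |ABC| = 10, the function returns cost 15 for the plan `(('A', 'C'), 'B')`.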
This list does not include several projects (in the areas of deterministic approximation and automated fact checking) that we started recently.

PhD Students

I am currently working with the following PhD students:

  • Saehan Jo
  • George Karagiannis
  • Junxiong Wang
  • Ziyun Wei

Teaching

I am regularly teaching the following courses at Cornell:

  • CS 4320: Introduction to Database Systems
  • CS 4321: Practicum in Database Systems
  • CS 6320: Advanced Database Systems
  • CS 7390: Seminar in Database Systems

More details about those courses can be found here.

Service

I serve(d) in the following capacities:

  • I recently started serving as associate editor for SIGMOD Record
  • I serve(d) as reviewer for SIGMOD 2018, 2019, and 2020
  • I serve(d) as reviewer for VLDB 2018, 2019, and 2020

Publications

2020

  • VLDB 2020 Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification. George Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer.
  • VLDB 2020 Demonstration of ScroogeDB: getting more bang for the buck with deterministic approximation in the Cloud. Saehan Jo, Jialing Pei, Immanuel Trummer.
  • VLDB 2020 Demonstrating the voice-based exploration of large data sets with CiceroDB-Zero. Immanuel Trummer.
  • VLDB 2020 Scrutinizer: fact checking statistical claims. George Karagiannis, Mohammed Saeed, Paolo Papotti, Immanuel Trummer.
  • SIGMOD 2020 Demonstration of BitGourmet: data analysis via deterministic approximation. Saehan Jo, Immanuel Trummer.
  • VLDB 2020 Mining an “Anti-Knowledge Base” from Wikipedia updates with applications to fact checking and beyond. Georgios Karagiannis, Immanuel Trummer, Saehan Jo, Shubham Khandelwal, Xuezhi Wang, Cong Yu.
  • CIDR 2020 BitGourmet: deterministic approximation via optimized bit selections. Saehan Jo, Immanuel Trummer.

2019

  • VLDB 2019 AggChecker: a fact-checking system for text summaries of relational data sets. Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
  • SIGMOD 2019 SkinnerDB: Regret-bounded query evaluation via reinforcement learning. Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, Joseph Antonakakis.
  • SIGMOD 2019 A holistic approach for query evaluation and result vocalization in voice-based OLAP. Immanuel Trummer, Yicheng Wang, Saketh Mahankali.
  • SIGMOD 2019 Exact cardinality query optimization with bounded execution cost. Immanuel Trummer.
  • SIGMOD 2019 Verifying text summaries of relational data sets. Saehan Jo, Immanuel Trummer, Weicheng Yu, Xuezhi Wang, Cong Yu, Daniel Liu, Niyati Mehta.
  • CIDR 2019 Data Vocalization with CiceroDB. Immanuel Trummer.

Funding

I gratefully acknowledge funding from the following sources:

  • NSF-1910830: regret-bounded query evaluation via reinforcement learning
  • Google Faculty Research Award for data vocalization
  • Google Faculty Research Award for building an "anti-knowledge base"
  • Support by Huawei for research on deterministic approximation

News 2020

  • Google funds our research on data-driven claim verification!
  • Our paper "Scrutinizer: a mixed-initiative approach to large-scale, data-driven claim verification" by George et al. was accepted at VLDB 2020!
  • Our paper "Demonstration of ScroogeDB: getting more bang for the buck with deterministic approximation in the Cloud" by Saehan et al. was accepted at VLDB 2020!
  • Our paper "Demonstrating the voice-based exploration of large data sets with CiceroDB-Zero" by Immanuel was accepted at VLDB 2020!
  • Our paper "Scrutinizer: fact checking statistical claims" by George et al. was accepted at VLDB 2020!
  • Our work on data-driven fact checking is covered by major European newspapers!
  • Our paper on SkinnerDB was selected for the Best of SIGMOD edition of TODS!
  • Our paper "Demonstration of BitGourmet: data analysis via deterministic approximation" by Saehan and Immanuel was accepted at SIGMOD 2020!

News 2019

News 2018

  • Our paper "Data vocalization with CiceroDB" by Immanuel Trummer was accepted at CIDR 2019!
  • Our paper "Verifying text summaries of relational data sets" by Saehan Jo et al. was accepted at SIGMOD 2019!
  • Our paper "SkinnerDB: regret-bounded query evaluation via reinforcement learning" by Immanuel Trummer et al. was accepted at SIGMOD 2019!
  • Our paper "A holistic approach for query evaluation and result vocalization in voice-based OLAP" by Immanuel Trummer et al. was accepted at SIGMOD 2019!
  • Our paper "Exact cardinality query optimization with bounded execution cost" by Immanuel Trummer was accepted at SIGMOD 2019!
  • Congrats to Samuel Moseley for winning a VLDB 2018 NSF Travel Grant!
  • Congrats to Mark Bryan for winning an honorable mention for the 2018 CRA Outstanding Undergraduate Researcher Award!
  • Our paper "Vocalizing Large Time Series Efficiently" was accepted at VLDB 2018.
  • Our paper "SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning" was accepted at VLDB 2018.
  • A pre-print of our paper on computational fact checking is online.
  • Congrats to Mark Bryan for winning the JP Morgan BOOM Award 2018 for CiceroDB!