This page gives a quick overview of my work in the following areas.

Query Optimization

[Figure: Dissertation on query optimization - overview]

Declarative query languages such as SQL allow users to simply describe the data they need instead of specifying how to generate it. Such languages rely on query optimizer components that translate declarative queries into executable query plans. The goal of query optimization is generally to find an optimal query plan for a given query (e.g., a plan with minimal execution time). The query optimization problem was introduced in the 1970s and is perhaps the most extensively studied optimization problem in the database community. However, the context of query optimization is continuously changing, which leads to novel challenges. For instance, modern execution engines have many more tuning parameters than before and motivate us to compare alternative query plans according to multiple cost metrics instead of only one. Both developments make optimization more challenging. On the other hand, novel optimization platforms are nowadays available that can be exploited for query optimization.
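
To make the multi-metric comparison concrete, here is a minimal sketch (the plan names, metrics, and cost values are invented for illustration and are not taken from my publications): a plan is only discarded if another plan dominates it, i.e., is at least as good on every metric and strictly better on at least one.

    # Minimal sketch: comparing query plans under multiple cost metrics.
    # All plan names and cost values below are illustrative, not real measurements.

    def dominates(costs_a, costs_b):
        # Plan A dominates plan B if A is at least as good on every metric
        # and strictly better on at least one.
        return all(a <= b for a, b in zip(costs_a, costs_b)) and \
               any(a < b for a, b in zip(costs_a, costs_b))

    def pareto_frontier(plans):
        # Keep only plans that no other plan dominates.
        return {name: costs for name, costs in plans.items()
                if not any(dominates(other, costs)
                           for other_name, other in plans.items()
                           if other_name != name)}

    # Cost vectors: (estimated execution time in seconds, memory footprint in MB).
    plans = {
        "hash_join":          (2.0, 800.0),
        "index_nested_loops": (5.0, 100.0),
        "sort_merge":         (6.0, 900.0),  # dominated by hash_join
    }

    print(pareto_frontier(plans))  # keeps hash_join and index_nested_loops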

In my dissertation, I address several extended variants of query optimization for which no practical algorithms were previously available. I also show how to optimize queries in the traditional query optimization setting that are far too large for prior methods. I propose a variety of approaches for making query optimization more efficient; they can be categorized into three high-level ideas, as illustrated in the overview figure above. First, if queries correspond to query templates that are known in advance, we can make optimization a pre-processing step. Even though optimization may take a long time, it does not add any run time overhead. Second, we can speed up optimization by relaxing optimality guarantees. We have published approximation schemes, incremental algorithms, and randomized algorithms for novel query optimization variants. Finally, we can decrease optimization time by leveraging novel software and hardware platforms for optimization. We have recently shown how to leverage integer programming solvers, massively parallel computation clusters, and quantum annealers for query optimization variants. In many scenarios, our approaches reduce optimization times from hours or even years to just minutes or even seconds.
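
As one generic illustration of the second idea (trading optimality guarantees for speed), the sketch below applies randomized hill climbing to join ordering; the cost model, cardinalities, and moves are simplified placeholders and do not correspond to any specific algorithm from the dissertation.

    import random

    # Illustrative relation cardinalities (made up).
    CARDINALITIES = {"R": 1_000, "S": 50_000, "T": 200, "U": 10_000}

    def plan_cost(join_order):
        # Toy cost model: sum of intermediate result sizes of a left-deep plan,
        # assuming a fixed join selectivity. Real optimizers use far richer models.
        selectivity = 0.001
        cost, intermediate = 0.0, CARDINALITIES[join_order[0]]
        for relation in join_order[1:]:
            intermediate = intermediate * CARDINALITIES[relation] * selectivity
            cost += intermediate
        return cost

    def random_local_search(relations, iterations=1000, seed=42):
        # Randomized hill climbing: swap two relations in the join order and
        # keep the new order whenever it lowers the estimated cost.
        rng = random.Random(seed)
        order = list(relations)
        rng.shuffle(order)
        best_cost = plan_cost(order)
        for _ in range(iterations):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            cost = plan_cost(order)
            if cost < best_cost:
                best_cost = cost
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
        return order, best_cost

    print(random_local_search(["R", "S", "T", "U"]))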

Data Vocalization

Driven by advances in speech recognition and synthesis, the interaction between users and computers is currently shifting towards voice-based interfaces. Several major IT companies have recently presented devices and tools that use voice as the primary communication medium. Examples include, but are not limited to, Google Home, Amazon Echo, and Apple's Siri. Those and other devices often need to communicate relational data to users (e.g., Google Home returns structured search results via voice output). There is a large body of research on how to optimally represent relational data to users. However, prior work focuses nearly exclusively on visual data representations, which have been dominant for a long time.

In this project, we study the question of how to vocalize data, i.e., how to translate data into optimal voice output. Voice output has to be extremely efficient. While users can quickly identify relevant parts in a plot or written text themselves (via skimming), they have to trust the computer to select only the most relevant information for voice output. Voice output transmits information rather slowly (compared to reading text); still, it has to be short (so as not to exceed the user's attention span) and simply structured (to limit the cognitive load on the listener). We treat voice output generation as a global optimization problem in which various metrics (e.g., speaking time, precision of transmitted information, cognitive load on the listener) come into play.
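
The toy sketch below illustrates this multi-metric view; the candidate descriptions, metric estimates, and weights are invented for illustration and are not part of CiceroDB.

    # Toy illustration: pick the voice description that minimizes a weighted
    # combination of speaking time, information loss, and cognitive load.
    # All candidates, metric estimates, and weights below are invented.

    CANDIDATES = [
        # (description, speaking time in seconds, information loss 0-1, cognitive load 0-1)
        ("The average price is 23 dollars.",                     2.0, 0.40, 0.10),
        ("Prices range from 5 to 80 dollars, averaging 23.",     4.0, 0.20, 0.30),
        ("Prices are 5, 7, 12, 23, 31, 44, 58, and 80 dollars.", 8.0, 0.00, 0.90),
    ]

    WEIGHTS = (0.3, 5.0, 1.0)  # relative importance of the three metrics

    def output_cost(candidate):
        _, time_s, info_loss, cognitive_load = candidate
        w_time, w_loss, w_load = WEIGHTS
        return w_time * time_s + w_loss * info_loss + w_load * cognitive_load

    best = min(CANDIDATES, key=output_cost)
    print(best[0])  # with these weights: the medium-detail description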

We are currently studying data vocalization in different scenarios, characterized by data type, size, and user context, among other criteria. We are also studying how to optimally support voice output via data processing, i.e., how to avoid generating results at a level of detail that cannot be transmitted via speech. The results of this research project are integrated into CiceroDB, a prototypical database system that is designed from the ground up for efficient and effective voice output.
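
As a toy illustration of trimming result detail for speech (not an actual CiceroDB component), the sketch below collapses a long list of per-category counts into the few largest categories plus one aggregated remainder before handing it to voice output.

    # Toy illustration: coarsen a query result so it stays speakable.
    # Categories and counts are invented.
    result = {"sedan": 420, "suv": 310, "hatchback": 120,
              "coupe": 45, "wagon": 30, "convertible": 15}

    def coarsen(counts, max_items=3):
        # Keep the largest categories and merge the rest into one "other" bucket.
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        head, tail = ranked[:max_items], ranked[max_items:]
        if tail:
            head.append(("other", sum(v for _, v in tail)))
        return head

    speakable = coarsen(result)
    print("; ".join(f"{name}: {count}" for name, count in speakable))
    # prints "sedan: 420; suv: 310; hatchback: 120; other: 90"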

Fact Checking

Data sets are often summarized via natural language text documents. Examples include newspaper articles by data journalists, scientific papers summarizing experimental results, and business reports summarizing quarterly sales. A majority of the population never accesses raw relational data but relies on text summaries alone. In that context, the following question arises: how can we verify that such summaries are consistent with the data?

We are developing approaches for automated and semi-automated fact checking of data summaries to answer that question. A text document, together with an associated data set, forms the input for fact checking. Our goal is to identify erroneous claims about the data in the input text. More precisely, we focus on text passages that can be translated into a pair consisting of an SQL query and a claimed query result. A claim is erroneous if evaluating the query yields a result that cannot be rounded to the one claimed in the text.
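
A minimal sketch of this correctness criterion follows; the rounding rule below (infer the claim's precision from its decimal places and round half up) is a plausible simplification for illustration, not necessarily the exact rule used in our system.

    from decimal import Decimal, ROUND_HALF_UP

    def claim_consistent(actual, claimed_text):
        # Accept the claim if the actual query result, rounded to the precision
        # of the claimed value, matches that value. Simplified for illustration.
        claimed = Decimal(claimed_text)
        exponent = claimed.as_tuple().exponent   # e.g. -1 for "3.5", 0 for "42"
        quantum = Decimal(1).scaleb(exponent)    # rounding unit matching the claim
        rounded = Decimal(str(actual)).quantize(quantum, rounding=ROUND_HALF_UP)
        return rounded == claimed

    print(claim_consistent(3.456, "3.5"))  # True: 3.456 rounds to 3.5
    print(claim_consistent(3.456, "3.4"))  # False
    print(claim_consistent(41.7, "42"))    # True: rounds to the claimed integer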

In our first project in this space, we have developed a "fact checker" tool that supports authors in producing accurate data summaries. The tool is similar in spirit to a spell checker: where a spell checker helps users avoid spelling and grammatical mistakes, the fact checker helps them avoid erroneous claims. We focus on a restricted class of claims that are at the same time common and error-prone. The fact checker translates text passages into equivalent SQL queries, evaluates them on a database, and marks up potentially erroneous claims. Users obtain a natural language explanation summarizing the system's interpretation of specific text passages and can easily take corrective actions if necessary. We have recently used this tool to identify erroneous claims, some of which had gone unnoticed for years, in articles from several major newspapers.
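
The following self-contained sketch mimics that workflow on a toy SQLite database; the table, the claims, and their query translations are invented, and the real system's translation from text to SQL is considerably more involved.

    import sqlite3

    # Toy database standing in for the data set behind a summary article.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("north", 120.0), ("south", 80.0), ("west", 95.0)])

    # Each claim: a text passage, the SQL query it was translated to,
    # and the value claimed in the text (all invented for illustration).
    claims = [
        ("Average revenue per region was about 98.",
         "SELECT AVG(revenue) FROM sales", 98.0),
        ("The northern region brought in 150 in revenue.",
         "SELECT revenue FROM sales WHERE region = 'north'", 150.0),
    ]

    for passage, query, claimed in claims:
        (actual,) = conn.execute(query).fetchone()
        # Simplified consistency test: accept if the result rounds to the claim.
        marker = "OK" if round(actual) == round(claimed) else "CHECK"
        print(f"[{marker}] {passage} (query result: {actual:.1f})")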

Quantum Computing

Quantum computing is currently reaching a tipping point where a purely theoretical research area is turning into one that allows empirical evaluation. Several companies have recently presented the first hardware devices that are claimed to realize at least a limited form of quantum computing. We currently have access to a D-Wave 2X adiabatic quantum annealer with over 1000 qubits. Such machines solve NP-hard optimization problems. Using them in practice is, however, often challenging and requires problem-specific transformation and mapping methods. Our goal is to find out if and how such machines can be used to solve optimization problems that arise in the context of data analysis and data management.
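
For context, the annealer accepts problems in QUBO form (quadratic unconstrained binary optimization), i.e., it minimizes a quadratic function over binary variables. The sketch below shows that input format together with a brute-force reference solver; the coefficients are made up, and a real annealer additionally imposes hardware connectivity constraints.

    from itertools import product

    # A QUBO instance: minimize sum_i a_i * x_i + sum_{i<j} b_ij * x_i * x_j
    # over binary variables x_i. The coefficients below are made up.
    linear = {0: -1.0, 1: -1.0, 2: 2.0}
    quadratic = {(0, 1): 3.0, (1, 2): -2.0}

    def energy(assignment):
        # Energy of one binary assignment under the QUBO coefficients.
        value = sum(coeff * assignment[i] for i, coeff in linear.items())
        value += sum(coeff * assignment[i] * assignment[j]
                     for (i, j), coeff in quadratic.items())
        return value

    # Brute force over all assignments (feasible only for tiny instances;
    # the annealer searches this space physically).
    best = min(product((0, 1), repeat=len(linear)), key=energy)
    print(best, energy(best))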

I recently presented the first paper on quantum computing at the VLDB conference (see the talk video below). In this paper, we focus on a classical optimization problem from the database domain, the problem of multiple query optimization. We show how to map instances of that problem to the restrictive input format supported by the quantum annealer. Doing so is challenging, mainly due to the limited number of qubits and the sparse connections between qubits. We describe problem representations with low asymptotic qubit growth that map well to the sparse connection structure between qubits. Also, we introduce an algorithm that calculates such mappings efficiently at run time. Finally, we present experimental results obtained on the quantum annealer, thereby evaluating the practicality and performance of the proposed method.
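
To give a flavor of such a mapping (with made-up plan costs and sharing savings; the actual encoding in the paper is more refined and also addresses the hardware's sparse qubit connectivity), the sketch below turns a tiny multiple query optimization instance into QUBO coefficients: binary variables indicate plan choices, plan costs and sharing savings become linear and quadratic terms, and a penalty enforces that exactly one plan is selected per query.

    # Toy mapping of multiple query optimization (MQO) to QUBO coefficients.
    # One binary variable per (query, candidate plan) pair; all numbers are made up.

    plan_costs = {           # cost of executing each candidate plan
        ("q1", "p1"): 5.0, ("q1", "p2"): 7.0,
        ("q2", "p3"): 6.0, ("q2", "p4"): 4.0,
    }
    sharing_savings = {      # cost saved when two plans share intermediate results
        (("q1", "p2"), ("q2", "p3")): 5.0,
    }
    PENALTY = 100.0          # weight enforcing "exactly one plan per query"

    variables = list(plan_costs)
    linear, quadratic = {}, {}

    # Plan costs become linear terms. The per-query constraint
    # (sum of that query's variables equals 1) is added as the quadratic penalty
    # PENALTY * (sum - 1)^2, expanded into linear and pairwise terms
    # (using x^2 = x for binary x; the constant offset is dropped).
    for var, cost in plan_costs.items():
        linear[var] = cost - PENALTY
    for a in variables:
        for b in variables:
            if a < b and a[0] == b[0]:       # two plans of the same query
                quadratic[(a, b)] = 2 * PENALTY
    # Sharing savings become negative quadratic terms between plans of different queries.
    for pair, saving in sharing_savings.items():
        quadratic[pair] = quadratic.get(pair, 0.0) - saving

    print(linear)
    print(quadratic)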


Text Mining and Machine Learning

Search engine providers such as Google regularly receive queries that contain subjective predicates. Users query, for instance, for "big cities", "cute animals", or "safe cities". In order to answer those queries from structured data, search engine providers need to understand which subjective properties the average user associates with which entities. This was the motivation for a project that I recently concluded at Google Mountain View. During this project, we developed the Surveyor system, which mines the entire Web to find billions of subjective associations.

We use natural language analysis to identify statements in Web text that express an opinion about whether or not a specific property applies to a specific entity. As we consider subjective properties, user opinions diverge, and we need to resolve conflicting opinions in order to correctly infer the majority opinion. This is surprisingly difficult: simple resolution strategies (e.g., taking the majority vote among conflicting opinions) lead to poor results, i.e., a poor match with opinions collected from test users. The reason is various types of skew that influence the probability that users express a certain opinion on the Web. For instance, users who think that a certain city is big are more likely to write about it on the Web. Hence, they are always overrepresented in a sample of opinions collected from the Web. We overcome those challenges by learning property- and entity-type-specific user behavior models via unsupervised machine learning, using an expectation-maximization approach. We show in a large user study that those models let us infer the opinions of the average user quite reliably. The system is described in more detail in the corresponding SIGMOD 2015 talk:
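
To give a flavor of the expectation-maximization idea, here is a generic sketch with made-up statement counts (it is not the actual Surveyor model): a two-component mixture jointly estimates, per entity, the probability that the property truly applies and, at the same time, how often positive statements appear even for entities where it does not, which captures reporting skew.

    import math

    # Observed (positive, negative) statement counts per entity (made up).
    observations = {
        "tokyo":     (90, 10),
        "berlin":    (40, 30),
        "reykjavik": (15, 60),
        "zurich":    (20, 55),
    }

    def em(observations, iterations=50):
        # Two-component binomial mixture fitted with expectation-maximization.
        # Component "true": property applies, positive statements are likely.
        # Component "false": property does not apply, but skew still yields positives.
        prior, p_pos_true, p_pos_false = 0.5, 0.8, 0.3  # initial guesses
        for _ in range(iterations):
            # E-step: posterior probability that the property applies to each entity.
            resp = {}
            for entity, (pos, neg) in observations.items():
                log_t = math.log(prior) + pos * math.log(p_pos_true) \
                        + neg * math.log(1 - p_pos_true)
                log_f = math.log(1 - prior) + pos * math.log(p_pos_false) \
                        + neg * math.log(1 - p_pos_false)
                m = max(log_t, log_f)
                resp[entity] = math.exp(log_t - m) / (math.exp(log_t - m)
                                                      + math.exp(log_f - m))
            # M-step: re-estimate the prior and the two positive-statement rates.
            prior = sum(resp.values()) / len(resp)
            num_t = den_t = num_f = den_f = 0.0
            for entity, (pos, neg) in observations.items():
                r = resp[entity]
                num_t += r * pos
                den_t += r * (pos + neg)
                num_f += (1 - r) * pos
                den_f += (1 - r) * (pos + neg)
            p_pos_true, p_pos_false = num_t / den_t, num_f / den_f
        return resp, p_pos_true, p_pos_false

    resp, p_true, p_false = em(observations)
    print({entity: round(r, 2) for entity, r in resp.items()})
    print(round(p_true, 2), round(p_false, 2))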