§1. Overview
Equitas is a tool for automated verification of equivalence of SQL queries. Automated SQL equivalence verification tools (e.g., Cossette) will help improve the efficacy and robustness of query optimisers in database management systems.
Given a pair of SQL queries, a verification tool determines whether they are equivalent or not. For example, consider a pair of queries operating on the following tables:
- Employee table (EMP): ⟨EMP_ID, EMP_NAME, DEPT_ID⟩
- Department table (DEPT): ⟨DEPT_ID, DEPT_NAME⟩
[sql]
Q1:SELECT ∗ FROM
(SELECT ∗ FROM EMP WHERE DEPT_ID = 10) AS T
WHERE T.DEPT_ID + 5 > T.EMP_ID;
Q2:SELECT ∗ FROM
(SELECT ∗ FROM EMP WHERE DEPT_ID = 10) AS T
WHERE 15 > T.EMP_ID;
[/sql]
Q1 is a nested query where the inner query selects employees whose DEPT_ID is 10. The outer query then applies another filter on the results of the inner query by retrieving tuples where DEPT_ID + 5 is larger than EMP_ID. Q2 is another nested query where the inner query retrieves tuples from EMP whose DEPT_ID is 10. The outer query then selects a subset of those tuples whose EMP_ID is less than 15. Since the inner query in Q2 only selects tuples whose DEPT_ID is 10, the outer predicates of both queries Q1 and Q2 are equivalent. Given that these queries are equivalent, the database system may re-use the results of Q1 to answer Q2, thereby eschewing redundant query execution.
Equitas is based on a novel approach to determining query equivalence based on symbolic representation. The key idea is to effectively transform a wide range of SQL queries into first order logic formulae and then use an SMT solver to efficiently verify their equivalence.
§2. Introduction
Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. It is challenging for developers and database administrators to manually detect overlap across queries since they may be distributed across teams, organization roles, and geographic locations. Thus, we require automated cloud-scale tools for identifying equivalent queries to minimize computation overlap.
Cossette is the state-of-the-art algebraic approach to automated verification of query equivalence. While it is capable of modeling the semantics of complex real-world SQL queries, it suffers from two limitations. First, they are unable to model the semantics of widely-used SQL features, such as complex query predicates and three-valued logic. Second, they have a computationally intensive verification procedure. These limitations restrict their efficacy and efficiency in cloud-scale database-as-a-service offerings.
Equitas leverages an alternate approach to determining query equivalence based on symbolic representation. It derives the symbolic representation of the SQL queries and uses satisfiability modulo theories (SMT) to determine their equivalence. This approach can model the semantics of widely-used SQL features, such as complex query predicates, arithmetic operations, and three-valued logic. Reducing the problem of determining the equivalence of queries to that of deciding the satisfiability of formulae in first-order logic enables the usage of computationally efficient SMT solvers.
§3. Design
The figure above illustrates the pipeline for determining query equivalence. Equitas takes as input a pair of SQL queries to be checked (Q1 and Q2) and returns a boolean decision that indicates whether the given pair of queries are equivalent. The pipeline consists of three stages.
- The first stage is a compiler that takes the given pair of SQL queries and converts them into logical query execution plans.
- The second stage consists of Equitas which determines the equivalence of the logical query execution plans emitted by the compiler. EQUITAS is written in 3660 lines of Java.
- The third stage is an SMT solver that is leveraged by EQUITAS for determining the satisfiability of FOL formula. EQUITAS leverages the open-source Z3 SMT solver.
§4. Publications
- Qi Zhou, Joy Arulraj, Shamkant B. Navathe, William Harris, and Dong Xu, “Automated verification of query equivalence using satisfiability modulo theories,” PVLDB, 12(11):1276–1288, 2019. [PDF][DOI][BIBTEX]
§5. People
§6. Acknowledgements
This work was supported (in part) by the U.S. National Science Foundation (IIS-1850342). We thank Zhenyu Hou, Tao Guan, and Jingren Zhou from Alibaba for their constructive feedback. We are grateful to Shumo Chu for sharing the query benchmark.
§7. Future Plans
- We plan to publish the source code of Equitas in summer 2020.
- We are collaborating with the Apache Calcite optimizer team.
- Drop us a note at: qizhou@gatech.edu