§1. Overview
EVA is a video database management system (VDBMS) that automatically materializes and reuses the results of expensive UDFs to facilitate faster exploratory data analysis. Here is a demo of the EVA system on Colab.
Consider a law enforcement officer analyzing a video data set for tracking a suspicious vehicle with the help of a witness. Initially, the witness may only recall the vehicle model (e.g., SUV) and the approximate time frame in which they saw the vehicle (e.g., nighttime). So, the officer starts with Q1 and searches for all SUVs present during that timeframe to identify the suspicious vehicle. While going over the frames with SUVs returned by Q1, the witness might recall the color of the vehicle (e.g., red). Then, the officer narrows down the search space and looks for the license plate of all red-colored SUVs (Q2).
[sql] Q1:SELECT timestamp , bbox, VEHICLE_COLOR(bbox, frame) FROM VIDEO CROSS APPLY OBJECT_DETECTOR(frame) ACCURACY 'HIGH' WHERE timestamp > 6pm AND label ='car'AND AREA(bbox) > 0.3 AND VEHICLE_MODEL(bbox, frame) ='SUV'; Q2:SELECT timestamp , bbox, LICENSE(bbox, frame) FROM VIDEO CROSS APPLY OBJECT_DETECTOR(frame) ACCURACY 'HIGH' WHERE timestamp > 7pm AND timestamp < 8pm AND label ='car' AND AREA(bbox) > 0.3 AND VEHICLE_COLOR(bbox, frame) ='red' AND VEHICLE_MODEL(bbox, frame) ='SUV'; [/sql]
Across these two queries, several UDFs are redundantly evaluated (i.e.,VehicleModel, ObjectDetector, VehicleColor, Area) on many frames. We seek to accelerate these queries by materializing and reusing the results of expensive UDFs since these functions dominate the overall query processing time.
EVA uses a novel technique for analyzing UDF-based predicates using symbolic computing to determine the degree of reuse across queries for a given UDF. It employs a conditional apply operator to automatically transform UDF invocations to reuse materialized results from previous queries. Its ranking functions that guide predicate reordering and model selection decisions take the availability of reuse opportunities into consideration to determine the cost of UDF evaluation.
§2. Introduction
Researchers have presented novel techniques for efficiently analyzing visual data at scale in video database management systems(VDBMSs), including sampling, filtering, and specialized neural networks. However, in exploratory video analytics, queries often exhibit a significant overlap of computation due to the redundant execution of user-defined functions (abbrev., UDFs) associated with computer vision tasks (e.g., object detection).
Consider queries Q1 and Q2. We notice that certain UDFs are present in both queries (i.e.,VehicleModel,VehicleColor,ObjectDetector,and Area), indicating an opportunity for reusing results. But, it is challenging to determine the degree of overlap between the predicates in Q1 and Q2 for each UDF due to the complexity of the expressions. Furthermore, the VDBMS must search for reuse opportunities across all previously executed queries (i.e., not just compare two queries).
Reuse algorithms in traditional DBMSs rely on the exact matching of sub-plans between two queries. This rigid approach does not handle the queries like Q1 and Q2, as they vary in their complex predicates. Researchers have recently presented novel systems, such as Recycler and HashStash, that extend query matching to compare different predicates. However, Recycler only supports a single range predicate, and HashStash only supports a few hard-coded rules for predicate analyses. We need a technique for capturing the semantics of these predicates and accurately determining the degree of overlap between them.
EVA leverages a novel technique for determining how to reuse UDF results across queries using symbolic computing at query optimization time. The optimizer symbolically analyzes the predicates associated with every UDF invocation to identify opportunities for reusing results. We formulate transformation rules in EVA’s Cascades-style optimizer to rewrite queries to leverage views.
§3. Design
EVA leverages the materialized UDF results of previous queries to accelerate subsequent queries. We design a semantic reuse optimization algorithm that is triggered after the canonical optimization algorithms have been applied. The figure above shows the key steps of this algorithm.
- Identifying candidate UDFs. For all the UDFs found in the query plan, the optimizer identifies the UDFs whose results are worth materializing. It uses the profiled evaluation cost to filters out inexpensive UDFs like Area.
- Compute UDF Signature. A UDF’s signature serves as a unique fingerprint and helps identify occurrences of the same UDF across queries. When the optimizer applies the canonical transformation rules for rewriting the query, it keeps track of the signature associated with every UDF occurrence within the query and appends it to the UDFManager. EVA reuses results across UDFs with identical signatures. The UDFManager maintains a mapping from the UDFsignature to the corresponding materialized view.
- Materialization-aware optimizations. Given a candidate UDF and its historical invocations from the UDFManager, the optimizer seeks to answer two questions. First, if multiple UDFs must be evaluated on the same input table, what is the optimal ordering in which these UDFs should be evaluated? Second, if a collection of deep learning models are suitable for a given vision task, what are the appropriate models (physical UDFs) to minimize the execution cost?
- Rule-based transformation on the candidate UDFs. Lastly, the optimizer performs a rule-based transformation on the candidate UDFs to leverage existing results in materialized views.
§4. Publications
- BIBTEX TODO
- Demo
- Colab Demo
- Github
§5. People
*Both authors contributed to the work equally
§6. Acknowledgments
This work was supported in part by the U.S. National Science Foundation (IIS-1850342, IIS-1908984, CNS-1909346, CNS-2008368), Alibaba Innovative Research (AIR) Program, Cisco, Adobe, Intel, and a gift from Microsoft Corp. We thank colleagues in Georgia Tech Database Group and Embedded Pervasive Lab for their constructive feedback in improving the system.
§7. Future Plans
- We plan to support more SQL operators such as JOIN in EVA.
- We plan to investigate other video analytics techniques such as filters in EVA.
- Raise issues and pull requests at https://github.com/georgia-tech-db/eva