TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
Proceedings of the ACM on Programming Languages (2023)
Abstract
Finding errors in machine learning applications requires a thorough
exploration of their behavior over data. Existing approaches used by
practitioners are often ad-hoc and lack the abstractions needed to scale this
process. We present TorchQL, a programming framework to evaluate and improve
the correctness of machine learning applications. TorchQL allows users to write
queries to specify and check integrity constraints over machine learning models
and datasets. It seamlessly integrates relational algebra with functional
programming to allow for highly expressive queries using only eight intuitive
operators. We evaluate TorchQL on diverse use-cases including finding critical
temporal inconsistencies in objects detected across video frames in autonomous
driving, finding data imputation errors in time-series medical records, finding
data labeling errors in real-world images, and evaluating biases and
constraining outputs of language models. Our experiments show that TorchQL
enables up to 13x faster query executions than baselines like Pandas and
MongoDB, and up to 40% shorter queries than these baselines. We also conduct a
user study and find that TorchQL is natural enough for developers familiar with
Python to specify complex integrity constraints.
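To make the autonomous-driving use-case concrete, the sketch below illustrates the kind of temporal integrity constraint the abstract describes, written in plain Python rather than TorchQL's actual operators (the function name and data layout here are hypothetical, for illustration only): an object detected across video frames should not vanish and later reappear.

```python
# Hypothetical illustration -- NOT TorchQL's API. It shows the kind of
# temporal integrity constraint described in the abstract: an object
# detected in a video should appear in a contiguous run of frames.

def temporal_inconsistencies(detections):
    """detections: list of (frame_index, object_id) pairs.
    Returns object ids that disappear and later reappear."""
    frames_by_obj = {}
    for frame, obj in detections:
        frames_by_obj.setdefault(obj, set()).add(frame)
    flagged = []
    for obj, frames in frames_by_obj.items():
        lo, hi = min(frames), max(frames)
        # A gap inside the object's frame span means it vanished
        # and came back -- a likely tracking or detection error.
        if len(frames) != hi - lo + 1:
            flagged.append(obj)
    return sorted(flagged)

# Example: "car_1" is seen in frames 0, 1, 3 but missing in frame 2.
dets = [(0, "car_1"), (1, "car_1"), (3, "car_1"),
        (0, "ped_7"), (1, "ped_7"), (2, "ped_7")]
print(temporal_inconsistencies(dets))  # ['car_1']
```

In TorchQL, a constraint like this would instead be phrased as a query over the detection table using the framework's relational and functional operators.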
Keywords
Integrity Constraints, Machine Learning, Query Languages