Free keywords:
Computer Science, Learning (cs.LG); Computer Science, Databases (cs.DB)
Abstract:
Much of the world's most valued data is stored in relational databases and
data warehouses, where the data is organized into many tables connected by
primary-foreign key relations. However, building machine learning models using
this data is both challenging and time-consuming. The core problem is that no
machine learning method is capable of learning on multiple tables
interconnected by primary-foreign key relations. Current methods can only learn
from a single table, so the data must first be manually joined and aggregated
into a single training table, a process known as feature engineering. Feature
engineering is slow, error-prone, and leads to suboptimal models. Here we
introduce an end-to-end deep representation learning approach to directly learn
on data laid out across multiple tables. We name our approach Relational Deep
Learning (RDL). The core idea is to view relational databases as a temporal,
heterogeneous graph, with a node for each row in each table, and edges
specified by primary-foreign key links. Message Passing Graph Neural Networks
can then automatically learn across the graph to extract representations that
leverage all input data, without any manual feature engineering. Relational
Deep Learning leads to more accurate models that can be built much faster. To
facilitate research in this area, we develop RelBench, a set of benchmark
datasets and an implementation of Relational Deep Learning. The data covers a
wide spectrum, from discussions on Stack Exchange to book reviews on the Amazon
Product Catalog. Overall, we define a new research area that generalizes graph
machine learning and broadens its applicability to a wide set of AI use cases.
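The row-to-node, key-to-edge construction described in the abstract can be sketched in a few lines. The following is an illustrative toy example, not the paper's implementation: the table names, keys, and feature vectors are invented, and a simple mean aggregation stands in for a learned message-passing layer.

```python
# Two toy tables linked by a primary-foreign key: each review row points at
# a user row. Every row becomes a node; every key link becomes an edge.
users = {  # primary key -> feature vector
    1: [1.0, 0.0],
    2: [0.0, 1.0],
}
reviews = {  # primary key -> (foreign key into users, feature vector)
    10: (1, [2.0, 2.0]),
    11: (1, [4.0, 0.0]),
    12: (2, [1.0, 3.0]),
}

# Edges of the heterogeneous graph, typed by the tables they connect.
edges = [(("review", rid), ("user", uid)) for rid, (uid, _) in reviews.items()]

def mean_aggregate(users, reviews, edges):
    """One message-passing round: each user node averages the feature
    vectors of its incoming review nodes and concatenates that summary
    onto its own features -- no manual join or aggregation query needed."""
    incoming = {uid: [] for uid in users}
    for (_, rid), (_, uid) in edges:
        incoming[uid].append(reviews[rid][1])
    out = {}
    for uid, feats in users.items():
        msgs = incoming[uid]
        agg = [sum(col) / len(msgs) for col in zip(*msgs)] if msgs else [0.0, 0.0]
        out[uid] = feats + agg  # self features + aggregated neighbor features
    return out

embeddings = mean_aggregate(users, reviews, edges)
print(embeddings[1])  # user 1 summarizes reviews 10 and 11
```

In a real GNN the fixed mean would be replaced by learned, trainable message and update functions applied over several rounds, but the graph construction step is the same idea.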