Year
2023
Location
Halıcıoğlu Data Science Institute, UC San Diego
Category
Research
Duration
5 Months
Graph Neural Networks (GNNs) have gained tremendous popularity for their potential to effectively learn from graph-structured data, commonly encountered in real-world applications. However, most of these models, based on the message-passing paradigm (interactions within a neighborhood of a few nodes), can only handle local interactions within a graph. When we enforce the models to use information from far away nodes, we will encounter two major issues: oversmoothing & oversquashing. Architectures such as the transformer and diffusion models are introduced to solve this; although transformers are powerful, they require significant computational resources for both training and inference, thereby limiting their scalability, particularly for graphs with long-term dependencies. Hence, this paper proposes GraphHSCN—a Heterogenized Spectral Cluster Network, a message-passing-based approach specifically designed for capturing long-range interaction. On our first iteration of ablation studies, we observe reduced time complexities compared to SAN, the most popular graph transformer model, yet comparable performance in graph-level prediction tasks.
Graph coarsening via spectral clustering: We propose a scheme to coarsen graph representation via spectral clustering with the relaxed formulation of the MinCUT problem, as presented in the paper from Bianchi et. al. We observe the structural patterns uncovered by SC reveal which long-range virtual connections should be made.
New connections learned by a heterogeneous network: We create an intra-cluster connection with a virtual node, and learn the new relationship as a graph indepdenent of the original graph. A heterogeneous convolutional network is trained on these separate relations, further coarsening the representations. On our set of ablation studies, and after hyperparameter tuning, Graph-HSCN out-performs the traditional message-passing architectures by up to 10 percent, achieving metrics similar to those of SAN while reducing the time complexity.
We presented a poster, which you can view here, at the HDSI's Capstone Project Showcase in Spring 2023.
Refer to our GitHub repository for the source code and resources for running our model.