HPC Lab
Nucleus Decomposition: Finding the Hierarchy of Dense Subgraphs
General information
Finding dense substructures in a graph is a fundamental graph mining operation, with applications
in bioinformatics, social networks, and visualization to name a few. Yet most standard
formulations of this problem (like clique, quasiclique, kdensest subgraph) are NPhard.
Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense
substructures, understand their distribution in the graph, and ideally determine
relationships among them. Current dense subgraph finding algorithms usually
optimize some objective, and only find a few such subgraphs without providing any structural
relations.
We define the nucleus decomposition of a graph, which represents the graph
as a forest of nuclei. Each nucleus is a subgraph where smaller cliques
are present in many larger cliques. The forest of nuclei is a hierarchy by containment,
where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei
can have limited intersections, which allows for discovery of overlapping dense subgraphs.
With the right parameters, the nucleus decomposition generalizes the classic notions of kcores and ktrusses.
We give provable efficient algorithms for nucleus decompositions, and empirically
evaluate their behavior in a variety of real graphs. The tree of nuclei consistently
gives a global, hierarchical snapshot of dense substructures, and
outputs dense subgraphs of higher quality than other stateoftheart solutions.
Our algorithm can process graphs with tens of millions of edges in less than an hour.
If you use Nucleus Decomposition, please cite:

A. E. Sarıyüce, C. Seshadhri, A. Pınar, Ü. V. Çatalyürek
"Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions",
24th International World Wide Web Conference (WWW), May 2015.
Download
Latest release: Nucleus362015
Dataset: data examples