GraphFrames User Guide (Python)

This notebook demonstrates examples from the GraphFrames User Guide.

Requirements

This notebook requires Databricks Runtime for Machine Learning.

GraphFrame(v:[id: string, name: string ... 1 more field], e:[src: string, dst: string ... 1 more field])

The number of follow edges is 4

Stateful queries

Most motif queries are stateless and simple to express, as in the examples above. The next example demonstrates a more complex query that carries state along a path in the motif. Such queries can be expressed by combining GraphFrame motif finding with filters on the result, where the filters use sequence operations to build an expression over DataFrame columns.

For example, suppose you want to identify a chain of 4 vertices with some property defined by a sequence of functions. That is, among chains of 4 vertices a->b->c->d, identify the subset of chains matching this complex filter:

Initialize state on path.
Update state based on vertex a.
Update state based on vertex b.
Same for c and d.
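The steps above amount to a fold over the path's vertices: start from an initial state, apply one update per vertex, then test the final state. The following is a minimal plain-Python sketch of that pattern. It is not the GraphFrames API. The function `chain_matches`, the vertex records, and their `age` attribute are hypothetical and serve only to illustrate the idea.

```python
from functools import reduce


def chain_matches(chain, init, update, accept):
    """Fold `update` over the chain's vertices in path order, then test the final state.

    chain  -- vertices a, b, c, d in path order (hypothetical dict records here)
    init   -- initial state for the path
    update -- function (state, vertex) -> new state
    accept -- predicate applied to the final state
    """
    final_state = reduce(update, chain, init)
    return accept(final_state)


def count_over_30(count, vertex):
    # Hypothetical state update: count vertices whose "age" exceeds 30.
    return count + 1 if vertex["age"] > 30 else count


# Hypothetical vertex records; "age" is an illustrative attribute only.
a = {"id": "a", "age": 34}
b = {"id": "b", "age": 36}
c = {"id": "c", "age": 30}
d = {"id": "d", "age": 29}

print(chain_matches([a, b, c, d], 0, count_over_30, lambda n: n >= 2))  # True
```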

If the final state matches some condition, the filter accepts the chain. The following code snippets demonstrate this process. The code identifies chains of 4 vertices where at least 2 of the 3 edges are “friend” relationships. In this example, the state is the current count of “friend” edges. In general, the state could be any DataFrame Column.
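The sketch below is a rough plain-Python analogue of the friend-count example. In place of a GraphFrame it uses a hypothetical in-memory edge list of `(src, dst, relationship)` tuples. `find_chains` stands in for the motif `(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)`. `functools.reduce` folds over the edge names to build one per-row state expression, much as a GraphFrames version would fold over Column expressions (for example with `pyspark.sql.functions.when`). All names and data here are illustrative assumptions.

```python
from functools import reduce

# Hypothetical edge list: (src, dst, relationship).
EDGES = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
    ("f", "c", "follow"),
    ("e", "f", "follow"),
    ("e", "d", "friend"),
    ("d", "a", "friend"),
    ("a", "e", "friend"),
]


def find_chains(edges):
    """Enumerate every path a->b->c->d as a row {"ab": edge, "bc": edge, "cd": edge}.

    This sketch does not require the four vertices to be distinct.
    """
    rows = []
    for ab in edges:
        for bc in edges:
            if bc[0] != ab[1]:
                continue
            for cd in edges:
                if cd[0] == bc[1]:
                    rows.append({"ab": ab, "bc": bc, "cd": cd})
    return rows


def cum_friends(count_expr, edge_name):
    """Extend the running state by one edge: add 1 if that edge is a "friend" edge."""
    return lambda row: count_expr(row) + (1 if row[edge_name][2] == "friend" else 0)


# Build the state expression once by folding over the edge names in path order,
# starting from a constant initial state of 0.
num_friends = reduce(cum_friends, ["ab", "bc", "cd"], lambda row: 0)

# Keep only chains where at least 2 of the 3 edges are "friend" edges.
chains_with_2_friends = [row for row in find_chains(EDGES) if num_friends(row) >= 2]

print(len(chains_with_2_friends))  # 6
for row in chains_with_2_friends:
    print("->".join([row["ab"][0], row["bc"][0], row["cd"][0], row["cd"][1]]))
```

Building `num_friends` once and then applying it to every row mirrors the design of the DataFrame approach: the fold produces a single expression, and a single filter evaluates it per row.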

Label Propagation

Run the static Label Propagation Algorithm (LPA) to detect communities in networks.

Each node in the network is initially assigned to its own community. At every superstep, nodes send their community affiliation to all neighbors and update their state to the most frequent community affiliation of incoming messages.

LPA is a standard community detection algorithm for graphs. It is computationally inexpensive, although (1) convergence is not guaranteed and (2) it can end up with trivial solutions (for example, all nodes assigned to a single community).
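The following is a minimal single-machine sketch of static LPA as described above, offered for intuition only. It is not GraphFrames' distributed implementation, which is exposed as `GraphFrame.labelPropagation(maxIter=...)`. The sketch makes these assumptions: edges are treated as undirected, all nodes update synchronously in each superstep, and ties between equally frequent labels go to the smallest label. The name `label_propagation` is hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter):
    """Run a fixed number of synchronous LPA supersteps over an undirected view of `edges`."""
    neighbors = defaultdict(list)
    for src, dst in edges:
        # Messages flow in both directions along each edge.
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    # Every node starts in its own community.
    labels = {node: node for node in neighbors}

    for _ in range(max_iter):
        new_labels = {}
        for node, nbrs in neighbors.items():
            # Count the community labels received from all neighbors.
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            # Adopt the most frequent label; break ties by the smallest label.
            new_labels[node] = min(lbl for lbl, c in counts.items() if c == best)
        labels = new_labels  # all nodes switch at once (one superstep)
    return labels


two_triangles = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)]
print(label_propagation(two_triangles, max_iter=5))  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}

# A single edge never settles: its endpoints swap labels on every superstep.
print(label_propagation([(1, 2)], max_iter=1))  # {1: 2, 2: 1}
print(label_propagation([(1, 2)], max_iter=2))  # {1: 1, 2: 2}
```

On two disjoint triangles, the labels settle into one community per triangle. The single-edge case illustrates the non-convergence caveat, which is why a static run stops after a fixed number of supersteps instead of waiting for a fixed point.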

GraphFrame(v:[id: string, name: string ... 2 more fields], e:[src: string, dst: string ... 2 more fields])
