Creating an airport centrality data visualization web application with TigerGraph & Streamlit
Introduction
Centrality measures are an important tool for analyzing networks: not only social networks, but any network, such as an electrical grid or a national road network. The analysis relies on a few simple topological measures that score each node by its importance within the larger network.
TigerGraph
TigerGraph is a native parallel graph database, purpose-built for loading massive amounts of data (terabytes) in hours and analyzing relationships 10 or more hops deep in real time. It supports transactional as well as analytical workloads, is ACID compliant, and scales up and out with database sharding. TigerGraph positions itself as a fast, scalable graph database for the enterprise.
Streamlit
Streamlit is an open-source app framework for Machine Learning and Data Science teams. It’s an awesome new tool that allows engineers to quickly build highly interactive web applications around their data, machine learning models, and pretty much anything. The best thing about Streamlit is it doesn’t require any knowledge of web development.
First things first: let’s take a quick look at the web application.
Set up database
Create a TigerGraph cloud account
To create a TigerGraph Cloud account, go to https://tgcloud.io/ and click the Login/Register button. Enter your email and password to create a new account. Once you log in to TigerGraph Cloud, you will see the TigerGraph dashboard.
Create a TigerGraph solution with Centrality Starter kit
Select My Solutions in the sidebar to open the solutions page, then click the Create Solution button in the header. You will see the instructions for creating a new solution.
Choose Graph Analytics-Centrality Algorithms v3 as the starter kit. Customize your solution name and subdomain in the Solution Settings step, then submit to create the solution. Once the solution has finished provisioning, open it in GraphStudio.
Queries Overview
Switch from the global view to MyGraph to see the schema of this database. In the Write Queries section, you will find a number of queries that are ready to use. They include several centrality algorithms, such as Closeness Centrality, Betweenness Centrality, and PageRank Centrality, so we don’t need to spend a lot of time learning or writing those algorithms in GSQL. Very convenient!
Working with Centrality Queries by Different Countries
In this project, I wanted to display four types of centrality score for a selected country: degree centrality, closeness centrality, betweenness centrality and PageRank centrality. Not all of these queries are available by default in the form I needed, so I had to modify or create some of them.
Degree Centrality by country
Degree centrality is very simple. It is based on the number of links held by each node. The links can be counted as incoming, outgoing, or both. I used the outdegree as the degree centrality of each vertex. Here is my implementation:
CREATE QUERY dc_by_country(/* Parameters here */STRING country, INT outputlimit=500) FOR GRAPH MyGraph {
/* Write query logic here */
TYPEDEF TUPLE<VERTEX Vertex_ID, STRING name, FLOAT lat, FLOAT lng, FLOAT score> vertex_score;
HeapAccum<vertex_score> (outputlimit, score DESC) @@topScores;
SumAccum<FLOAT> @score = 1;
Start = {Airport.*};
Start = SELECT s FROM Start:s WHERE s.country == country;
Start = SELECT v FROM Start:v
POST-ACCUM v.@score = v.outdegree("flight_to");
IF outputlimit > 0 THEN
V = SELECT s FROM Start:s
POST-ACCUM @@topScores += vertex_score(s, s.name, s.latitude, s.longitude, s.@score);
PRINT @@topScores;
END;
}
Closeness Centrality by country
Closeness Centrality is a way of detecting nodes that are able to spread information very efficiently through a graph. It is based on the average distance from a vertex to every other vertex: the smaller that average, the higher the score.
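To make the idea concrete, here is a minimal Python sketch of closeness centrality on a toy directed graph. It is not the starter kit's cc_subquery: the graph, the function name and the normalization (reachable vertices divided by the sum of their distances) are assumptions made for illustration, and the built-in query may normalize differently.

```python
from collections import deque

def closeness_centrality(graph, source):
    """Closeness of `source` in a directed graph given as {node: [neighbours]}.

    Defined here as (number of reachable nodes) / (sum of shortest-path
    distances to them), i.e. the inverse of the average hop distance.
    """
    distances = {source: 0}
    queue = deque([source])
    # Breadth-first search gives shortest hop counts in an unweighted graph.
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    if total == 0:  # nothing reachable from the source
        return 0.0
    return (len(distances) - 1) / total

# Toy flight network (made-up airport codes).
flights = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
for airport in flights:
    print(airport, round(closeness_centrality(flights, airport), 3))
```

Airport A reaches every destination in one hop, so it gets the highest score in this toy network.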
The solution has a built-in closeness centrality query (called cc), which calculates closeness centrality for all airports in the database. I only need to select the target airports by country first, then calculate their closeness centrality scores with the provided algorithm. Here is my implementation:
CREATE QUERY cc_by_country (BOOL display, INT outputLimit=500, INT maxHops, STRING country) FOR GRAPH MyGraph {
# Closeness Centrality main query
TYPEDEF TUPLE<VERTEX Vertex_ID, STRING name, FLOAT lat, FLOAT lng, FLOAT score> vertexScore;
HeapAccum<vertexScore>(outputLimit, score DESC) @@topScores;
SumAccum<float> @score;
SetAccum<EDGE> @@edgeSet; # list of all edges, if display is needed
INT numVert;
#INT maxHops = 10; # measure distance for vertices up to 10 hops away
Start = {Airport.*};
IF country != "" THEN
Start = SELECT v
FROM Start:v
WHERE v.country == country;
END;
#Total number of vertices considered in graph
numVert = Start.size();
# get closeness centrality for each vertex
Start = SELECT s FROM Start:s
POST-ACCUM s.@score = cc_subquery(s,numVert,maxHops),
@@topScores += vertexScore(s, s.name, s.latitude, s.longitude, cc_subquery(s,numVert,maxHops));
PRINT @@topScores;
IF display THEN
PRINT Start[Start.@score];
Start = SELECT s
FROM Start:s -(flight_to:e)-> :t
ACCUM @@edgeSet += e;
PRINT @@edgeSet;
END;
}
Betweenness Centrality by country
Betweenness Centrality measures the extent to which a vertex lies on paths between other vertices. Vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others.
The built-in betweenness centrality algorithm (called betweenness_cent) can calculate the centrality score of every airport in the selected country. However, it only returns the airport id and its score. To display each airport and its score on a map, I also need the airport's latitude and longitude, so I modified the returned data format. Here is the query:
CREATE QUERY betweenness_cent (INT maxHops, INT maxItems=500, STRING country) FOR GRAPH MyGraph {
# Betweenness Centrality main query
MapAccum<VERTEX,SumAccum<float>> @@BC;
SumAccum<float> @cent;
Start = {ANY};
IF country != "" THEN
Start = SELECT v FROM Start:v
WHERE v.country == country;
END;
Start = SELECT s FROM Start:s
ACCUM @@BC += bc_subquery(s, maxHops);
# Write scores to local accumulators of vertices.
Start = SELECT s FROM Start:s
POST-ACCUM s.@cent += @@BC.get(s)
ORDER BY s.@cent DESC
LIMIT maxItems;
PRINT Start[Start.id, Start.name, Start.latitude, Start.longitude, Start.@cent];
}
PageRank Centrality by country
Likewise, the default PageRank centrality query (called pageRank_by_country) only returns the airport id and its score. I need more airport information, such as latitude, longitude and name, so I modified the result format. Here is the query:
CREATE QUERY pageRank_by_country(FLOAT maxChange=0.001, INT maxIter=20, FLOAT damping=0.85, STRING country, BOOL display=False, INT outputLimit=500) FOR GRAPH MyGraph {
TYPEDEF TUPLE<vertex Vertex_ID, FLOAT lng, FLOAT lat, STRING name, FLOAT score> vertexScore;
HeapAccum<vertexScore>(outputLimit, score DESC) @@topScores;
MaxAccum<float> @@maxDiff = 9999; # max score change in an iteration
SumAccum<float> @received_score = 0; # sum of scores each vertex receives FROM neighbors
SumAccum<float> @score = 1; # Initial score for every vertex is 1.
SetAccum<EDGE> @@edgeSet; # list of all edges, if display is needed
Start = {Airport.*}; # Start with all vertices of specified type(s)
Start = SELECT v
FROM Start:v
WHERE v.country == country;
WHILE @@maxDiff > maxChange LIMIT maxIter DO
@@maxDiff = 0;
V = SELECT s
FROM Start:s -(flight_to:e)-> :t
ACCUM t.@received_score += s.@score/(s.outdegree("flight_to"))
POST-ACCUM s.@score = (1.0-damping) + damping * s.@received_score,
s.@received_score = 0,
@@maxDiff += abs(s.@score - s.@score');
END; # END WHILE loop
IF outputLimit > 0 THEN
V = SELECT s FROM Start:s
POST-ACCUM @@topScores += vertexScore(s, s.longitude, s.latitude, s.name, s.@score);
PRINT @@topScores;
END;
IF display THEN
PRINT Start[Start.@score];
Start = SELECT s
FROM Start:s -(flight_to:e)-> :t
ACCUM @@edgeSet += e;
PRINT @@edgeSet;
END;
}
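For readers less familiar with GSQL accumulators, the iteration in the query above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a port of the installed query: the toy graph and the function name are made up, every vertex starts at a score of 1, and the loop stops when the largest score change drops to max_change or after max_iter rounds.

```python
def pagerank(graph, damping=0.85, max_change=0.001, max_iter=20):
    """Iterative PageRank on a directed graph given as {node: [targets]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    scores = {node: 1.0 for node in nodes}  # initial score of 1 per vertex
    for _ in range(max_iter):
        received = {node: 0.0 for node in nodes}
        # Each vertex splits its current score evenly across its out-edges.
        for source, targets in graph.items():
            if not targets:
                continue
            share = scores[source] / len(targets)
            for target in targets:
                received[target] += share
        new_scores = {
            node: (1.0 - damping) + damping * received[node] for node in nodes
        }
        max_diff = max(abs(new_scores[node] - scores[node]) for node in nodes)
        scores = new_scores
        if max_diff <= max_change:  # converged
            break
    return scores

# Toy flight network (made-up airport codes).
flights = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for airport, score in sorted(pagerank(flights).items()):
    print(airport, round(score, 3))
```

Because every vertex in this toy graph has at least one outgoing edge, the scores always sum to the number of vertices.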
To run the queries from Python, you need to install them in GraphStudio before calling them with pyTigerGraph.
Import packages
I used several packages to implement this web application. Here is a list of packages I used:
import pyTigerGraphBeta as tg #TigerGraph python connector
import streamlit as st #web framework
import pydeck as pdk #used for map display
import altair as alt #display chart
import json
import pandas as pd
import numpy as np
Make sure all of these packages are installed in your python environment using:
pip install [package_name]
Connect to TigerGraph database
To connect to TigerGraph Cloud, you need the subdomain name (hostname), the graph name, the GraphStudio username and password (‘tigergraph’ is the default username), and an API token.
import cfg
import pyTigerGraph as tg
#get token
cfg.token = tg.TigerGraphConnection(host="<hostname>", graphname="<graph_name>").getToken(cfg.secret, "<token_lifetime>")[0]
#connect to tg cloud
conn = tg.TigerGraphConnection(host="<hostname>", graphname="<graph_name>", password=cfg.password, apiToken=cfg.token)
With the TigerGraph connection established, we can run the queries with the runInstalledQuery function. I will use this function in the next section.
Create an Application
So far, we have done a lot of work on database queries and the connection. Let’s move on to the frontend: building a Streamlit web application.
Layout and Widgets
Streamlit's default layout is a narrow central column, which leaves a lot of blank space on the left and right. I set the layout to wide because it looks nicer than the default. st.beta_columns places widgets side by side.
#set layout as wide
st.set_page_config(layout="wide")
#show side by side map
#col1 and col2 split the page into two equal parts
col1, col2 = st.beta_columns(2)
col3, col4 = st.beta_columns(2)
As the screenshot shows, the web page has two parts: a sidebar pinned to the left and the main page. Streamlit gives us a very straightforward API called st.sidebar, so we don’t need to worry about writing a complex CSS layout. If you want to add more widgets to the sidebar, just use st.sidebar.[element_name]. In this application, I used a select box, slider, checkbox and text input.
Fetch Data and Feed Them into Map and Chart
Since we are using pyTigerGraph, it’s quite simple to fetch data from TigerGraph Cloud. We can use the runInstalledQuery(queryname, params=None, timeout=None, sizeLimit=None) method to run an installed query. The params argument is either a string in param1=value1&param2=value2 format or a dictionary. Let’s take fetching degree centrality by country as an example:
First, I defined params as "country=" + country. The country on the left of = must match the parameter name defined in the query; the country on the right of = is the target value I am searching for. Then I called conn.runInstalledQuery with the query name dc_by_country and passed params. If the query runs without error, you get a JSON object like this:
Then you need to parse that JSON and construct a DataFrame res from the parsed list. Here is what res looks like:
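The response and the parsing code are not reproduced in this text, so the following is only a rough sketch of the two steps. It assumes the response is a list of result sets, one of which holds the @@topScores list with the tuple fields declared in dc_by_country (Vertex_ID, name, lat, lng, score). The helper names and the sample values are made up for illustration; only conn.runInstalledQuery comes from the article.

```python
from urllib.parse import quote, urlencode

def build_params(country):
    """Build the 'country=<value>' query string, percent-encoding spaces."""
    return urlencode({"country": country}, quote_via=quote)

def top_scores(response, key="@@topScores"):
    """Collect the rows stored under `key` from every result set."""
    rows = []
    for result_set in response:
        rows.extend(result_set.get(key, []))
    return rows

# Hypothetical response shape with invented values, for illustration only.
sample_response = [
    {
        "@@topScores": [
            {"Vertex_ID": "AAA-1", "name": "Example One", "lat": 10.0, "lng": 20.0, "score": 12.0},
            {"Vertex_ID": "BBB-2", "name": "Example Two", "lat": 11.0, "lng": 21.0, "score": 7.0},
        ]
    }
]

print(build_params("United States"))  # country=United%20States
rows = top_scores(sample_response)
print([row["name"] for row in rows])

# With a live connection, the same helpers would be used roughly as:
# response = conn.runInstalledQuery("dc_by_country", params=build_params(country))
# res = pd.DataFrame(top_scores(response))
```

A list of flat dictionaries like rows can be passed directly to pandas.DataFrame, which turns each key into a column.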
The last step is to reshape the res data into a form the map can recognize, then feed it to the map. I used the pydeck package to display the data on the map. Pydeck is a powerful Python data visualization library that supports different types of charts and maps. Here is how I implemented the map:
As for the chart, I used the Altair library. Altair is also a visualization tool for Python, similar to pydeck. Although the two libraries overlap, I decided to use Altair for the chart because I think it produces more attractive charts than pydeck.
Here is how I modify the data format to feed the chart:
Here is how to draw a chart:
Fetching and displaying data for the other centrality measures works the same way as for degree centrality, so I won’t explain them one by one. If you have questions about how to implement them, you are welcome to check our git repo.
Get Nearby Airports’ Centrality Score of Input City
In the previous part, we focused on the centrality scores of a country’s airports. In this part, we are going to explore the centrality of the airports near a given city.
Obtain geographical coordinates of input city
Install and import the geopy package in your project:
#install in terminal
pip install geopy
#import in .py file
from geopy.geocoders import Nominatim
Here is how I use this package to get the city’s coordinates:
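The original snippet is not included in this text. As a minimal sketch, geocoding a city name with geopy could look like the following. The helper name city_coordinates and the user_agent string are assumptions for illustration. The geocoder is passed in as an argument so the helper can be tried without network access.

```python
def city_coordinates(city, geocode):
    """Return (latitude, longitude) for a city name, or None if not found.

    `geocode` is any callable that maps a query string to an object with
    `latitude` and `longitude` attributes (or None when nothing matches),
    for example the `geocode` method of geopy's Nominatim geocoder.
    """
    location = geocode(city)
    if location is None:
        return None
    return (location.latitude, location.longitude)

# Usage with geopy (needs network access; the user_agent value is a placeholder):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="airport-centrality-demo")
# print(city_coordinates("Seattle", geolocator.geocode))
```

Handling the None case matters in a web form, because a mistyped city name returns no location rather than raising an error.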
Calculate nearby airports
We now have the city’s coordinates. We can also set the range, in miles, for nearby airports on the page (as marked below):
With the city coordinates and the distance range, we can write a GSQL query to find all airports within that range in TigerGraph. Here is how I calculate the eligible airports:
CREATE QUERY calculateWeights(/* Parameters here */FLOAT lat, FLOAT lng, int distance) FOR GRAPH MyGraph {
/* Write query logic here */
TYPEDEF TUPLE<VERTEX Vertex_ID> vertex_dis;
SetAccum<vertex_dis> @@resultSet;
double pi = 3.14159265359; // pi
double R = 3958.8; // earth's radius in miles
//to_vertex_set("CNX-3931", "Airport");
Start = {Airport.*};
ResultSet = {};
Heavy = SELECT s FROM Start:s
ACCUM
double lat1 = s.latitude * pi / 180, // lat1 to radians
double lat2 = lat * pi / 180, // lat2 to radians
double deltalat = (lat - s.latitude) * pi / 180, // lat change in radians
double deltalong = (lng - s.longitude) * pi / 180, // long change in radians
double a = sin(deltalat/2) * sin(deltalat/2)
+ cos(lat1) * cos(lat2)
* sin(deltalong/2) * sin(deltalong/2),
//double atanp1 = sqrt(a), // temp
//double atanp2 = sqrt(1-a), // temp
double c = 2 * atan2(sqrt(a), sqrt(1-a)),
INT miles = ceil(R * c),
IF miles < distance THEN
@@resultSet += vertex_dis(s)
END;
PRINT @@resultSet;
}
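The query above applies the haversine formula to each airport. The same calculation can be sketched in Python, which is handy for checking results outside the database. The function names and the sample airport tuples are made up for illustration; the Earth radius and the ceil-then-compare step follow the GSQL query.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # same value as R in the GSQL query

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lng2 - lng1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    a = min(1.0, max(0.0, a))  # guard against floating-point drift
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_MILES * c

def airports_within(airports, lat, lng, distance):
    """Return ids of airports closer than `distance` miles to (lat, lng).

    `airports` is an iterable of (airport_id, latitude, longitude) tuples.
    The distance is rounded up before comparing, as in the GSQL query.
    """
    return [
        airport_id
        for airport_id, a_lat, a_lng in airports
        if math.ceil(haversine_miles(a_lat, a_lng, lat, lng)) < distance
    ]

# Made-up sample points on the equator.
sample = [("X", 0.0, 0.0), ("Y", 0.0, 90.0)]
print(airports_within(sample, 0.0, 0.0, 100))
```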
From @@resultSet, we get the id of each eligible airport. We move on to the last step: calculating the centrality scores of these airports.
Fetch data and feed them into map and chart
I will again use degree centrality as the example in this section.
First, we need to write a new degree centrality query:
CREATE QUERY degreeCentrality(/* Parameters here */SET<VERTEX> source, INT outputlimit = 100) FOR GRAPH MyGraph {
/* Write query logic here */
TYPEDEF TUPLE<VERTEX Vertex_ID, FLOAT lat, FLOAT lng, STRING name, FLOAT score> vertex_score;
HeapAccum<vertex_score> (outputlimit, score DESC) @@topScores;
SumAccum<FLOAT> @score = 1;
Start = {source};
Start = SELECT v FROM Start:v
POST-ACCUM v.@score = v.outdegree("flight_to");
IF outputlimit > 0 THEN
V = SELECT s FROM Start:s
POST-ACCUM @@topScores += vertex_score(s, s.latitude, s.longitude, s.name, s.@score);
PRINT @@topScores;
END;
}
Then, we convert @@resultSet to a DataFrame airports in Python. Because I used a SET of vertices without a defined type, the SET is treated as an array. In that case, I need to build a parameter string like this: source[0]=AIRPORT_ID&source[0].type=Airport&source[1]=AIRPORT_ID&source[1].type=Airport. (For more detail, please check the documentation here)
Thus, I use airports to build a new list as the parameter when executing runInstalledQuery. Here is my implementation:
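The original snippet is not reproduced in this text. A minimal sketch of building the indexed source parameter string, and of splitting a long id list into smaller batches so each request URL stays short, might look like this. The helper names and the batch size are assumptions. The pairs are joined with a single &, the standard query-string separator.

```python
def build_source_params(airport_ids, vertex_type="Airport", name="source"):
    """Build 'source[0]=ID&source[0].type=Airport&...' for a SET<VERTEX> parameter."""
    parts = []
    for index, airport_id in enumerate(airport_ids):
        parts.append(f"{name}[{index}]={airport_id}")
        parts.append(f"{name}[{index}].type={vertex_type}")
    return "&".join(parts)

def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]

# Made-up airport ids, for illustration only.
airport_ids = ["AAA-1", "BBB-2", "CCC-3"]
for chunk in batches(airport_ids, 2):
    print(build_source_params(chunk))

# With a live connection, each chunk would be sent as its own request, roughly:
# for chunk in batches(airport_ids, 50):
#     response = conn.runInstalledQuery("degreeCentrality", params=build_source_params(chunk))
```

The index restarts at 0 in every batch because each batch is an independent request with its own source array.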
To avoid a 414 error (URI too long), I split the request into several parts. Now we get a res like this:
The code for displaying the data on the map and chart is similar to the country centrality part, so I will not go into detail. The complete code is here; you are welcome to check it out and leave me a comment.
Next step
This was my first time building a web application with Python, and I learned a lot about data visualization along the way. I had a lot of fun with it, and I hope this blog gives you some fun and inspiration too. Now, here is a challenge for you: build your own web application with TigerGraph and Streamlit.
Thanks for reading!
Resources
https://docs.streamlit.io/en/stable/api.html
https://pytigergraph.github.io/pyTigerGraph/
https://deckgl.readthedocs.io/en/latest/
https://altair-viz.github.io/gallery/index.html
https://www.youtube.com/watch?v=msbR_S___R8