One of the things I’m most excited about with the graph journey is prediction algorithms.
What makes them fun and accessible is that a lot of them are actually quite simple: finding the fewest “hops” between two nodes (unweighted shortest path algos), or scoring a node by the sum of its shortest distances to every other node (closeness centrality algos), where a smaller sum means a more central node.
One of the most reliable predictors of the future is relationships and their proximity. In graph language, proximity is accounted for by “weighting” relationships. If Philly is 80 miles from Cape May, their relationship gets a weight of 80 miles.
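To make the weighting idea concrete, here is a minimal sketch of a weighted graph and a shortest-distance lookup using Dijkstra's algorithm. Only the Philly to Cape May weight of 80 comes from the text above; "Town A", "Town B", their distances, and the names `ROADS` and `shortest_distance` are made up for illustration.

```python
import heapq

# Undirected weighted graph: each edge weight is a distance in miles.
# Only Philly <-> Cape May (80) comes from the post; the other towns
# and distances are hypothetical placeholders.
ROADS = {
    "Philly": {"Cape May": 80, "Town A": 30},
    "Town A": {"Philly": 30, "Town B": 35},
    "Town B": {"Town A": 35, "Cape May": 25},
    "Cape May": {"Philly": 80, "Town B": 25},
}


def shortest_distance(graph, start, goal):
    """Return the smallest total edge weight from start to goal (Dijkstra)."""
    best = {start: 0}          # best known distance to each node so far
    queue = [(0, start)]       # min-heap of (distance, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue           # stale entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None                # goal is not reachable from start


direct = shortest_distance(ROADS, "Philly", "Cape May")
to_town_b = shortest_distance(ROADS, "Philly", "Town B")
print(direct, to_town_b)
```

With these made-up numbers the direct Philly to Cape May edge beats the detour through the two towns, while the cheapest way to Town B runs through Town A rather than Cape May.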
In school I remember learning that the greatest predictor of the strength of a friendship was proximity. Pretty obvious when you think about it, but just that simple fact means we can predict so much.
When I moved home from China, the likelihood I would meet my wife here jumped. Once we did meet, it became inevitable that we would meet each other’s friends and families.
I would likely meet a few of Ann’s Philly friends: they were nearby, in touch, and would do social things together. Then after some threshold of seriousness was reached, her parents. Then siblings. At some point, like over a holiday break, I’d meet her close high school friends who had since moved out of town.
At that point, you could look at interests, personality, and past behavior to get a sense of how close we were each likely to become with people in each other’s networks. Using a range of relevant inputs, you could get a pretty good idea early in our courtship of who in her network I’d be close with, and vice versa.
Some predictions are so obvious we call them facts. It’s chemistry when you mix baking soda and vinegar, because you know you’re going to get fizz. It’s gravity when you throw something up and it comes down.
Network science works in a similarly predictable fashion. Data scientists use the same algorithms to study cancer medications as they do to study network characteristics.
Shortest path algorithms can help us find the fastest way to reach a goal. I imagine mapping a network and figuring out who you know that also knows some podcast host you want to connect with.
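Here is a rough sketch of that podcast host idea, using a breadth-first search to find the chain of introductions with the fewest hops. The network, the names in it, and the function `fewest_hops_path` are all hypothetical; a real version would start from your own contact data.

```python
from collections import deque

# Hypothetical "who knows whom" network; every name is a placeholder.
KNOWS = {
    "me": ["Alex", "Sam"],
    "Alex": ["me", "Lee"],
    "Sam": ["me", "Lee", "Kim"],
    "Lee": ["Alex", "Sam", "podcast host"],
    "Kim": ["Sam"],
    "podcast host": ["Lee"],
}


def fewest_hops_path(graph, start, goal):
    """Breadth-first search: return one path with the fewest hops, or None."""
    previous = {start: None}   # who introduced us to each person
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            # Walk the introductions backwards, then reverse them.
            path = []
            while person is not None:
                path.append(person)
                person = previous[person]
            return path[::-1]
        for friend in graph[person]:
            if friend not in previous:
                previous[friend] = person
                queue.append(friend)
    return None                # no chain of introductions exists


path = fewest_hops_path(KNOWS, "me", "podcast host")
print(" -> ".join(path))
```

Because every relationship counts as one hop here, this is the unweighted case. If some introductions were "warmer" than others, you would add weights and switch to a weighted shortest path instead.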
Closeness centrality can help us predict how quickly information will disseminate, and through whom. I imagine it would be great to know who is most effective at sharing content or influencing an online conversation.
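A small sketch of how closeness centrality could be computed on an unweighted network. It assumes the common normalization (number of other reachable nodes divided by the sum of hop distances to them), so a higher score means closer to everyone else. The five-node `NETWORK` and the function names are invented for the example.

```python
from collections import deque

# Hypothetical five-person network; the letters are placeholders.
NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B", "E"],
    "E": ["D"],
}


def hop_distances(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def closeness(graph, node):
    """Closeness = (reachable nodes - 1) / sum of hop distances to them."""
    distances = hop_distances(graph, node)
    total = sum(distances.values())
    if total == 0:
        return 0.0             # isolated node: nothing to be close to
    return (len(distances) - 1) / total


scores = {node: closeness(NETWORK, node) for node in NETWORK}
most_central = max(scores, key=scores.get)
print(most_central, scores)
```

In this toy network the best-connected person in the middle scores highest, which is the sort of person you would expect to spread information fastest.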
Some easy-to-make predictions are so obvious to us that we take them for granted, and some we’re just blind to (back to bias).
And finally, there are hard-to-make predictions that would be obvious if only we could wrangle our own data. Or not. I don’t know. Still preprocessing the HTML. Hopefully something worth sharing on that soon!