# Philosophical Influence in the SEP

The SEP data can be used to answer questions about which philosophers, arguments, or concepts are the most "influential" in academic philosophy. We have to be careful here, however, because the inference from "The node representing the article on John Rawls has one of the highest betweenness centrality scores in the fall 2018 SEP data." to "John Rawls is one of the most influential philosophers who ever lived." is a shaky one. In what follows I'm going to try to unpack inferences like this one to see what we really can infer from the structure of this network data.

## Centrality Measures or Measures of Relative Importance

Node centrality in a graph can be measured in several ways, but all centrality measures are intended to represent the "importance" or "influence" of a node in a graph, relative to other nodes in that graph. In this section I briefly define the principal measures of centrality for nodes in a graph and evaluate each of them as a potential measure of relative importance for the philosophical topics in the SEP.

The simplest centrality measure is **degree centrality**, which counts the edges incident to each node. The more edges that are directly connected to a node, the greater that node's degree. For the SEP data, this measure is a useful first pass for identifying highly connected nodes, but it is not especially useful as a measure of relative importance, for two reasons. First, a node can acquire high degree simply by including lots of links to other nodes. Degree may therefore measure each author's estimation of which philosophical topics are relevant to their article, but it is not a good measure of how important that article is in the encyclopedia as a whole, or in academic philosophy more generally. Second, in a directed graph degree centrality is the sum of in-degree and out-degree, so it combines incoming and outgoing links. Because degree is a mixed measure, it is more difficult to interpret as a measure of influence. Plausibly, a node is more influential if it has more *incoming* links than *outgoing* links, since incoming links are determined by many authors of many articles. We could focus on in-degree alone, but this may also bias us against certain articles that are well-connected but (for whatever reason) have few incoming links.
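To make the in-degree/out-degree distinction concrete, here is a minimal sketch in plain Python. The edge list is a made-up toy example, not actual SEP link data, and the article names are only illustrative.

```python
from collections import Counter

# Hypothetical toy link data: (source article, target article) pairs.
edges = [
    ("rawls", "justice"),
    ("rawls", "kant"),
    ("justice", "rawls"),
    ("kant", "rawls"),
    ("hume", "kant"),
]

# Out-degree counts links an article makes; in-degree counts links it receives.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)
nodes = sorted({n for edge in edges for n in edge})

# Total degree in a directed graph is in-degree plus out-degree.
degree = {n: in_degree[n] + out_degree[n] for n in nodes}

for n in nodes:
    print(f"{n:8s} in={in_degree[n]} out={out_degree[n]} total={degree[n]}")
```

In this toy graph the `hume` node has out-degree 1 but in-degree 0: its author links out, but no other article links to it, which is exactly the asymmetry that total degree hides.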

Next is **betweenness centrality**. We need to have in mind the concept of a *shortest path* in order to define this measure. For an arbitrary pair of nodes (i,j) in a graph, a path is a sequence of connected nodes that begins with i and ends with j without ever repeating a node or edge. A shortest path is a path with the fewest edges; a pair of nodes may have more than one. Betweenness centrality counts how often a node lies on a shortest path between pairs of other nodes in the graph. Betweenness centrality is a useful measure of relative importance in a graph under the supposition that transfer between nodes has a non-zero cost and non-zero benefit. Under these constraints there is an incentive to traverse the shortest path between nodes, and so a node with high betweenness centrality will be more valuable in the sense that it connects many pairs of nodes along their shortest paths.

In the SEP data, betweenness centrality is a reasonable measure for node importance because philosophers are incentivized to know what ideas, concepts, and thinkers are relevant to their work. If a node lies along a path between two nodes, then that node is more likely to be encountered by a researcher. As we will see, the nodes with the highest betweenness centrality are disproportionately articles about philosophers themselves, rather than about arguments or concepts. I suspect the reason for this is that while two very different topics may have little in common, individual philosophers who worked on those topics can easily bring them together.
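As an illustration of how betweenness can be computed, the sketch below follows the general shape of Brandes' algorithm for unweighted directed graphs: a breadth-first search from each source counts shortest paths, and a backward pass credits the nodes that lie on them. The function name, the adjacency-dict format, and the toy graph (one philosopher article bridging three topic articles) are all assumptions made for this example, not the code or data behind the SEP analysis.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted directed graph.

    `adj` maps every node to a list of the nodes it links to.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: from the farthest nodes inward, pass credit
        # to predecessors in proportion to their share of shortest paths.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy graph: three topics linked only through one philosopher.
toy = {
    "ethics": ["kant"],
    "metaphysics": ["kant"],
    "aesthetics": ["kant"],
    "kant": ["ethics", "metaphysics", "aesthetics"],
}
scores = betweenness(toy)
print(scores)
```

Here every shortest path between two topic nodes runs through `kant`, so that node collects all of the betweenness (one unit for each of the six ordered topic pairs) while the topic nodes score zero.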

The third centrality measure is **eigenvector centrality**. Eigenvector centrality assigns a normalized score to every node in the network and then repeatedly passes each node's score to the nodes connected to it. Eigenvector centrality scores equilibrate when the scores received and passed are equal for all nodes in the graph. Thus, nodes with a high eigenvector centrality score are more likely to be connected to *other* high-scoring nodes. The PageRank algorithm is a variant of eigenvector centrality, so one may suppose that it would be a useful measure of relative importance in the SEP data. However, because nodes in the SEP graph tend to have relatively high degree, the distribution of eigenvector centralities is heavily skewed.
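The score-passing description above corresponds to power iteration. The following is a minimal sketch on a small *undirected* toy graph; treating links as undirected, the function name, and the node labels are simplifying assumptions for illustration only.

```python
def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as {node: [neighbours]}."""
    score = {v: 1.0 / len(adj) for v in adj}
    for _ in range(iterations):
        # Each node keeps its own score and receives its neighbours' scores.
        # Keeping the node's own score amounts to iterating with A + I, which
        # has the same leading eigenvector as A but avoids oscillation.
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        # Renormalize so the scores stay bounded.
        norm = sum(x * x for x in new.values()) ** 0.5
        new = {v: x / norm for v, x in new.items()}
        # Stop once the scores have equilibrated.
        if sum(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score

# Hypothetical toy graph: b and d both have degree 2, but b's other
# neighbour (c) is better connected than d's other neighbour (e).
toy = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e"],
    "e": ["d"],
}
scores = eigenvector_centrality(toy)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Nodes `b` and `d` have the same degree, yet `b` ends up with the higher score because its neighbours are themselves better connected. That is the sense in which eigenvector centrality rewards links to other high-scoring nodes.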

One interesting project would be to weight the eigenvector centrality scores based on the frequency with which articles in the SEP are accessed, and generate something like an alpha centrality score. Alpha centrality adds an external score to the normalized scores of eigenvector centrality, which affects the conditions under which the scores equilibrate.
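A rough sketch of that idea, assuming alpha centrality in the form x = alpha * (scores of in-linking nodes) + e, solved by simple fixed-point iteration. The `views` weights stand in for article access frequencies and are invented for the example; the function name and the value of `alpha` are likewise assumptions, and the iteration only converges when `alpha` is smaller than the reciprocal of the adjacency matrix's largest eigenvalue.

```python
def alpha_centrality(in_links, external, alpha=0.1, iterations=200):
    """Iterate x[v] = alpha * sum(x[u] for u linking to v) + external[v].

    `in_links` maps each node to the list of nodes that link to it.
    `external` maps each node to its externally supplied weight.
    """
    score = dict(external)
    for _ in range(iterations):
        score = {
            v: alpha * sum(score[u] for u in in_links[v]) + external[v]
            for v in in_links
        }
    return score

# Hypothetical toy data: x and y have identical link structure, but x is
# accessed three times as often (these access weights are made up).
in_links = {"x": ["hub"], "y": ["hub"], "hub": ["x", "y"]}
views = {"x": 3.0, "y": 1.0, "hub": 1.0}

scores = alpha_centrality(in_links, views, alpha=0.1)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Because `x` and `y` occupy structurally identical positions, any difference between their scores comes entirely from the external weights, which is what distinguishes this measure from plain eigenvector centrality.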

The fourth and final centrality measure I'll discuss here, **closeness centrality**, is standardly defined, following Bavelas (1950), as the reciprocal of the average length of the shortest path between a node and all other nodes in the graph, so that nodes near the rest of the graph score higher. Since the graphs for the early years of the SEP data are disconnected, and some shortest paths therefore do not exist, we can use closeness centrality's close cousin harmonic centrality for those years.[^1] We can also normalize this centrality measure to compare across years, but we should also report the average path length of the entire network, as it gives us an idea of the graph's overall connectedness.
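The difference between the two measures shows up as soon as a graph has unreachable nodes. The sketch below computes closeness as the reciprocal of the mean shortest-path distance and harmonic centrality as the sum of reciprocal distances. Both helper functions and both toy graphs are assumptions made for illustration, and returning `None` for the undefined case is just one possible convention.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """Reciprocal of the mean distance from v; None if some node is unreachable."""
    dist = bfs_distances(adj, v)
    if len(dist) < len(adj):
        return None
    return (len(adj) - 1) / sum(dist.values())

def harmonic(adj, v):
    """Sum of reciprocal distances; unreachable nodes contribute zero."""
    dist = bfs_distances(adj, v)
    return sum(1 / d for u, d in dist.items() if u != v)

# Hypothetical toy graphs: a connected path a-b-c, then the same path
# plus an isolated node d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
split = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}

print(closeness(path, "b"), harmonic(path, "b"))
print(closeness(split, "b"), harmonic(split, "b"))
```

Adding the isolated node leaves closeness undefined for every node in `split`, while the harmonic scores of `a`, `b`, and `c` are unchanged and `d` simply scores zero.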

All of these centrality measures tell us something slightly different about the nodes in question, and in isolation each is relatively uninformative about a given node's relative importance in the overall SEP network. I'm hoping that in concert they can start to give us an idea of which nodes appear most influential in the graph, and which nodes remain so over time.

I've calculated degree, betweenness, and eigenvector centrality for each year and reported that information below the visualizations. The next step is to identify which nodes appear in the top 1% or so of each of these centrality measures and how these values change as the SEP network grows.
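One way that next step might look, sketched with invented scores: take the top fraction of nodes under each measure and intersect the results. The helper name `top_fraction`, the score values, and the 50% cutoff (chosen only because the toy data has four nodes) are all hypothetical.

```python
import math

def top_fraction(scores, fraction=0.01):
    """Return the nodes in the top `fraction` of a {node: score} mapping."""
    # Always keep at least one node, and round the cutoff up.
    k = max(1, math.ceil(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical scores for four articles under two measures.
betweenness = {"rawls": 0.30, "kant": 0.25, "justice": 0.05, "hume": 0.10}
eigenvector = {"rawls": 0.60, "kant": 0.20, "justice": 0.50, "hume": 0.10}

top_b = set(top_fraction(betweenness, fraction=0.5))
top_e = set(top_fraction(eigenvector, fraction=0.5))

# Nodes that rank highly on both measures.
print(sorted(top_b & top_e))
```

Running the same comparison on each year's scores would show which nodes stay in the top group as the network grows.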

[^1]: Marchiori and Latora (2000), Dekker (2005), and Rochat (2009)