Network Analysis of the Stanford Encyclopedia

This project is a network-focused analysis of the Stanford Encyclopedia of Philosophy (SEP) with the aim of investigating its global structure. To generate these data, I scraped the most recent (at the time) archive of the encyclopedia and constructed node and edge lists based on the following criteria.

  • Each node represents a single article.
  • At the bottom of each article there is a list of "Related Entries" (see e.g. the "Models in Science" related entries) that link to either entire articles or sections of articles.
  • For any article A that contains a link to article B in "Related Entries", a directed edge was created from node A to node B. Links to sections of articles were treated as if they linked to the entire article.
  • Edges were weighted according to the number of links contained in an article.
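
The criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the scraper actually used for the project: the function names, the shape of the `related` input, and the example hrefs are all hypothetical. It also assumes one particular reading of the weighting rule, namely that an edge's weight is the number of "Related Entries" links in A that resolve to B, and it assumes that self-links are dropped.

```python
from collections import Counter
from urllib.parse import urlparse


def article_id(href):
    """Collapse a link to its article: ignore any '#section' fragment."""
    path = urlparse(href).path            # the fragment is parsed separately from the path
    return path.rstrip("/").split("/")[-1]


def build_edges(related):
    """Build weighted directed edges from scraped 'Related Entries' links.

    related maps an article id to the list of hrefs found in its
    'Related Entries' section (hypothetical input format).
    Returns a Counter mapping (source, target) -> weight.
    """
    weights = Counter()
    for source, hrefs in related.items():
        for href in hrefs:
            target = article_id(href)
            # Assumption: skip unparseable links and links back to the same article.
            if target and target != source:
                weights[(source, target)] += 1
    return weights


# Toy input for illustration only; these are not scraped values.
related = {
    "models-science": [
        "../scientific-explanation/",
        "../simulations-science/",
        "../scientific-explanation/#DNModel",   # section link, counted toward the article
    ],
    "simulations-science": ["../models-science/"],
}

edges = build_edges(related)
nodes = sorted(set(related) | {target for _, target in edges})
```

Here the two links from the first article to the same target, one of them a section link, collapse into a single edge of weight 2.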

After generating the node and edge lists, I performed the analysis and visualization in R and Gephi. The following visualizations were generated using a community-detection algorithm in Gephi. The first is an overall look at the entire network, in which node color indicates community membership. The subsequent five subgraphs pick out the individual communities for analysis. In those subgraphs, node color saturation represents betweenness centrality and node size represents degree. Because reading anything useful directly off these hairballs is somewhat difficult, you can click through on each image for a higher-resolution PDF and find more discussion below.

Analysis

A couple of things stand out immediately to me when looking at these graphs. First is that the largest three subgraphs correspond, generally, to epistemology & metaphysics (E&M, purple), value theory (blue), and history (yellow). I guess it isn't surprising that we should see these pretty broad category distinctions in the network, given that philosophers created the SEP and philosophers tend to organize the field this way too. The fairly coarse-grained community-detection algorithm also picked out two others: philosophy of the sciences (red) and East Asian philosophy (green). These were slightly more surprising to me, since the philosophy of the sciences typically falls under the broad umbrella of E&M and since I would've thought that Asian philosophy would be more tightly linked to history or value theory.

Philosophical Communities and Methods

What I think makes more sense of this result is that the community-detection algorithm is highlighting a methodological distinction in the articles. The thing that isolates philosophy of the sciences from E&M and Asian philosophy from history is method. Notice too that causation, specifically as it relates to fundamental physics, is included in the red subgraph. It seems to me that the red subgraph represents those philosophers of science who are actively engaged in projects within the context of a scientific field, as opposed to broader epistemic or metaphysical projects about science or its outputs.

The Asian philosophy articles are also highly insular relative to other parts of the graph, which is explained by the fact that most of the articles about Asian philosophers and their work include as 'Related Entries' only other articles in the same tradition. For instance, although the article on Mencius describes his influential views on virtue and the role that emotions play in an ethical life, the related entries for Mencius list nothing except other articles in his tradition. I'm not claiming anything about the appropriateness of this kind of categorization or about the assessment of what is properly 'related' to the article on Mencius, but it explains the isolation in the graph.
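
One way to put a number on this kind of insularity is the share of each community's outgoing edge weight that lands back inside the same community. The sketch below is only an illustration of that idea and was not part of the original analysis. The function name, the input formats, and the toy node names and values are all hypothetical, and the community labels are assumed to come from whatever community-detection run is being examined.

```python
from collections import defaultdict


def internal_edge_share(edges, community):
    """Share of each community's outgoing edge weight that stays inside it.

    edges:     {(source, target): weight}
    community: {node: community label}
    Returns    {label: internal weight / total outgoing weight}
    """
    internal = defaultdict(float)
    total = defaultdict(float)
    for (source, target), weight in edges.items():
        label = community[source]
        total[label] += weight
        if community.get(target) == label:
            internal[label] += weight
    return {label: internal[label] / total[label] for label in total}


# Toy values for illustration only; not taken from the SEP network.
toy_edges = {
    ("green_1", "green_2"): 1,
    ("green_2", "green_1"): 1,
    ("blue_1", "blue_2"): 1,
    ("blue_1", "green_1"): 1,
}
toy_community = {
    "green_1": "green",
    "green_2": "green",
    "blue_1": "blue",
    "blue_2": "blue",
}

shares = internal_edge_share(toy_edges, toy_community)
```

A share of 1.0 means every outgoing link stays within the community, which is the pattern described for the Mencius entry. In the toy data the green community has that value, while half of the blue community's outgoing weight leaves it.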

Although it is not pictured above, applying the community-detection algorithm to the larger subgraphs identifies additional subcommunities that are fairly cohesive and consistent with current practice. For instance, at a first pass E&M divides into logic/philosophy of mathematics, epistemology, analytic metaphysics, and philosophy of mind/cognitive science. History divides into ancient Greek/Latin, medieval philosophy, 17th/18th century European philosophy, and German philosophy. Value theory subcommunities include normative ethics, social/political philosophy, philosophy of religion (with a heavy emphasis on articles on free will), and philosophy of law.

Node-Level Properties

Looking at node-level properties reveals that the most influential nodes (according to several centrality metrics including betweenness, degree, and eigenvector) are most often articles on individuals. A little more than half of the top 20 articles on each of those three measures are about philosophers, rather than theories, problems, or concepts. The most influential philosophers (again, in this network according to these three measures) include Aristotle, Aquinas, Descartes, Frege, Hume, Husserl, Leibniz, Locke, Kant, Mill, Plato, Rawls, and Russell. For the most part the 'Related Entries' sections of extremely general article topics (like epistemology) tend to be sparsely populated (the article on medieval philosophy being one notable exception), meaning that although these articles are highly general they fail to command much influence in this particular network.
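
For readers who want to see what two of these measures compute, here is a small self-contained sketch. It is not the R or Gephi code used for the analysis. It ignores edge weights, leaves the betweenness scores unnormalized, and omits eigenvector centrality. Betweenness is computed with Brandes' algorithm for unweighted directed graphs. The function names, the adjacency-list format, and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def degree(adj):
    """Total degree (in-degree plus out-degree) of every node."""
    deg = {v: len(targets) for v, targets in adj.items()}
    for targets in adj.values():
        for w in targets:
            deg[w] = deg.get(w, 0) + 1
    return deg


def betweenness(adj):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    nodes = set(adj) | {w for targets in adj.values() for w in targets}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def top_k(scores, k):
    """The k highest-scoring nodes; ties are broken alphabetically."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy graph for illustration only: a -> b -> c and d -> b.
toy_adj = {"a": ["b"], "b": ["c"], "c": [], "d": ["b"]}
toy_degree = degree(toy_adj)
toy_betweenness = betweenness(toy_adj)
most_central = top_k(toy_betweenness, 1)
```

In the toy graph, node b sits on both shortest paths that pass through an intermediate node (a to c and d to c), so it has the highest betweenness as well as the highest degree. The "top 20" comparison described above amounts to calling `top_k` with k = 20 on each measure and counting how many of the returned articles are about individual philosophers.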

Next Steps

Some additional things I'd like to do with the SEP archives include the following:

  • Additional visualizations of the philosophical subcommunities (including an analysis of these subcommunities individually).
  • A pipeline for analyzing all archives of the SEP and generating some dynamic visualizations of the encyclopedia's growth over time.
  • Comparison of the SEP to other philosophy reference works?
  • Comparison of node centrality with SEP published usage statistics.