Let's continue the exploration of the Star Wars expanded universe! The first part of this post can be found here. To recap quickly for the avid and impatient reader, I extracted a lot of information from Wookieepedia, the dedicated Star Wars wiki. In this post, we focus on exploring the network of all characters of the expanded universe.
Star Wars character galaxy!
If the graph is not too big, it is generally interesting to draw it to have an overview of its properties. So let's have a look at (a part of) our network of characters. But to visualize the entire graph, you have to choose a side of the Force.
So what's happening here?
Well, the first noticeable thing is that the graph is disconnected. This means that small clusters of nodes, drawn on the outer rim of the galaxy, are only connected among themselves and are not linked to the rest of the corpus. The most probable explanation is that the wiki is (still) incomplete, with no connections recorded from the isolated clusters to the main characters. In contrast, the core of the galaxy is well connected and forms the largest connected component of the graph. The colors reveal communities of characters with a large number of interactions.
From this observation, we could actually create a new tool that prioritizes the parts of the wiki that need to be completed. Imagine a small game where you pick one of the isolated characters. The goal would be to search the web for the missing connections to known characters. To create a link, you would simply draw an edge from the unknown character to its known counterpart. Using the wisdom of crowds, the missing links could be completed in no time!
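The split between the well-connected core and the isolated outer-rim clusters can be sketched with a plain breadth-first search. This is only an illustration under stated assumptions: the edge list below is toy data (the "Isolated" names are hypothetical), and `connected_components` is a helper written for this example, not the code used to produce the figures.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)

    components.sort(key=len, reverse=True)
    return components


# Toy edge list (hypothetical data, not the real Wookieepedia extract).
toy_edges = [
    ("Luke Skywalker", "Han Solo"),
    ("Han Solo", "Chewbacca"),
    ("Luke Skywalker", "Yoda"),
    ("Isolated A", "Isolated B"),
]

components = connected_components(toy_edges)
core = components[0]        # largest connected component
outer_rim = components[1:]  # small clusters that may be missing wiki links
```

In this sketch, every cluster in `outer_rim` would be a candidate for the completion game: each one is a group of characters with no recorded link to the core.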
I'm curious, what's in the center?
Well, that's easy: the most connected characters. If you recall my first post on the subject, I showed the most connected characters of the dataset. Now let's go deeper and extract the subgraph of those characters. To do so, we select the most connected nodes (the top 80 here) and keep only the edges connected to those nodes. To improve readability, only the 20 most connected characters are labeled here.
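The selection step described above can be sketched as follows. Two assumptions are made here and may differ from the original processing: an edge is kept only when both of its endpoints are in the top set (an induced subgraph), and ties in degree are broken alphabetically. The function name `top_k_subgraph` and the toy edges are hypothetical.

```python
from collections import Counter


def top_k_subgraph(edges, k):
    """Return the k highest-degree nodes and the edges running between them."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # Sort by decreasing degree; break ties by name so the result is deterministic.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    top_nodes = ranked[:k]

    # Assumption: keep an edge only if BOTH endpoints are in the top set.
    keep = set(top_nodes)
    sub_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return top_nodes, sub_edges


# Toy edge list (hypothetical data); the post uses k=80 and labels the top 20.
toy_edges = [
    ("Luke Skywalker", "Han Solo"),
    ("Luke Skywalker", "Leia Organa"),
    ("Luke Skywalker", "Yoda"),
    ("Han Solo", "Leia Organa"),
    ("Han Solo", "Chewbacca"),
    ("Yoda", "Count Dooku"),
]

top_nodes, sub_edges = top_k_subgraph(toy_edges, k=3)
labeled = top_nodes[:2]  # only the very top nodes get a label in the drawing
```

The edge list is assumed to contain each pair of characters at most once, so that counting edges gives the number of distinct connections.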
This visualization is very interesting because it reveals how communities of characters interact together. For the curious, I used graph-tool to draw the graph.
You must unlearn what you have learned
Although it might be unclear to the untrained eye, the result of the hierarchical community detection algorithm is both plausible and reassuring. Looking at the different communities, we see that not only are the nodes grouped with the characters they interact with the most in the expanded universe, which matches the definition of a community, but they are also grouped by era. This makes sense: characters mostly interact with others who are alive at the same time, even in a fictional world.
Old Republic Era. Represented by its most emblematic character, Revan. Quoting Wookieepedia: Revan ... renowned as the Revanchist, honored as the Revan, reviled as Revan the Butcher, dreaded as the Dark Lord of the Sith Darth Revan, and praised as the Prodigal Knight. Note that the Old Republic community has links to characters living more than 3000 years after this era. In Star Wars, it is not uncommon for "Force" ghosts (like Obi-Wan) to be summoned. Links can also be created when a Jedi or Sith interacts with a holocron from an ancient master.
Rise of the Empire, episodes 1, 2, 3. We have Yoda, Obi-Wan, Amidala, Mace Windu, Count Dooku, Jango Fett, etc. While some of them also exist after this era, they are more connected to people in this era. For instance, in episodes 5 and 6 Yoda only speaks with Luke and Obi-Wan, whereas in the previous trilogy he is at the head of the Jedi Council! We also have Ahsoka Tano, a blue-eyed Togruta female, padawan of Anakin Skywalker during the Clone Wars. Another curiosity is Jabba Desilijic Tiure, better known as Jabba the Hutt, the fat (1358 kg) 600-year-old Hutt. Most of his criminal activities actually take place before the birth of Luke.
Rebellion Era, episodes 4, 5, 6. This cluster groups the most famous characters of the movies: Han Solo, Chewbacca, Luke Skywalker, Palpatine and Anakin Skywalker (aka Darth Vader, if you didn't know). You may wonder why Boba Fett, clone son of Jango Fett, appears as one of the most connected characters. If so, just take a look at the length of his biography. The most famous bounty hunter of the galaxy has his fair share of adventures!
New Republic era and after, now considered Legends. Here we find Mara Jade Skywalker, wife of Luke Skywalker, and Darth Caedus, son of Han and Leia, who turned to the dark side... Sounds like Disney's seventh episode, eh? Mitth'raw'nuruodo (better known as Thrawn) is the famous Imperial Grand Admiral who crushed the New Republic by taking control of the remaining Imperial forces after the destruction of the second Death Star. And what about Wedge Antilles? You may know him as Red Leader during the Battle of Endor! He is a famous pilot and Luke Skywalker's lifelong friend. Let's finish with Lando Calrissian, the professional gambler who apparently betrayed Han Solo on Bespin, the city in the clouds, but who redeemed himself by taking part in Han's rescue from Jabba. His story begins to unravel after the fall of the Emperor.
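The claim that communities line up with eras could be checked numerically along the following lines. This is a rough sketch, not the analysis behind the post: the community ids and the era labels are hypothetical toy values (Jabba is deliberately given a different era label to show an imperfect community), and `dominant_era_per_community` is a helper invented for this example.

```python
from collections import Counter, defaultdict


def dominant_era_per_community(community_of, era_of):
    """Map each community to its most frequent era and that era's share of members."""
    eras_by_community = defaultdict(Counter)
    for character, community in community_of.items():
        eras_by_community[community][era_of[character]] += 1

    summary = {}
    for community, counts in eras_by_community.items():
        era, hits = counts.most_common(1)[0]
        summary[community] = (era, hits / sum(counts.values()))
    return summary


# Hypothetical community assignment (the ids 0, 1, 2 are arbitrary).
community_of = {
    "Revan": 0,
    "Yoda": 1,
    "Mace Windu": 1,
    "Jabba Desilijic Tiure": 1,
    "Luke Skywalker": 2,
    "Han Solo": 2,
}

# Hypothetical era labels, chosen only to illustrate the computation.
era_of = {
    "Revan": "Old Republic",
    "Yoda": "Rise of the Empire",
    "Mace Windu": "Rise of the Empire",
    "Jabba Desilijic Tiure": "Rebellion",
    "Luke Skywalker": "Rebellion",
    "Han Solo": "Rebellion",
}

summary = dominant_era_per_community(community_of, era_of)
```

A share close to 1 for every community would support the reading that the detected communities follow the eras; lower values point to characters who bridge several periods.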
As you can see, the whole Star Wars universe is coherent, and fun details can be revealed using graph theory. However, let's not forget that we need data to do data-science. In our case, wiki contributors are of paramount importance, as they create the content we blog about. May the Force be with them, always.
Could you make this whole galaxy of characters interactive?
Patience. (Obi-Wan to Luke, The Empire Strikes Back)
TL;DR: it's in progress.
I would like to create a smooth experience for exploring the expanded universe, similar to what I did for the Montreux Jazz Festival. Imagine an interface where you could search for characters and display information about them, with a preview of the wiki on the side. The challenging part is finding meaningful anchors for the nodes, like the geographical locations used for the artists of the MJF.
As always, stay tuned for awesome data-science on Star Wars!