For the longest time (and before my D3 Organization Chart post) I had been looking for organization charts in my company.  Some departments' charts are outdated, some are unavailable on the intranet, and some are in one format or another.  I ended up with dozens of puzzle pieces that still needed to be fit together, and in the end I gave up.

Then the other day I happened to be in Outlook inspecting our Global Address List (GAL) and noticed this tab.

HOORAH!  For each employee we have their manager and subordinates, hence it should be easy to create an org chart.  Well not quite.

Collecting Data

The actual difficulty in this task is downloading the data.  You can open up your GAL and a particular contact and actually SEE the data, but if you copy that contact to your local address book (which you could export), the information disappears.  You cannot export the GAL either, unless you are the Exchange admin.  Trust me, I fought and fought to no avail.

Luckily a friend came to my rescue.  They mentioned CSVDE.exe, which is a command line utility for importing and exporting Active Directory data.  Note that it appears to be available only on Windows Server (some people have mentioned copying the executable and running it locally, however I had little luck).  What I did was simply log into one of our servers and run the following:

CSVDE -f allpersons.csv -r objectCategory=person -l "objectclass, objectcategory, givenName, sn, displayName, mail, title, department, manager, directReports, company"

The first parameter is the file to save to, the second is a filter, and the third is a list of fields.  Here is an explanation of some of the less obvious fields:

  • DN – Distinguished Name (KEY)
  • objectClass – The class of the object, for example objectClass = User. Other classes include Computer, organizationalUnit, and even container, which makes this an important top-level field.
  • sn – Surname
  • givenName – First Name
  • displayName – typically SURNAME, GIVEN
  • objectCategory – Defines the Active Directory Schema category. For example, objectCategory = Person
  • manager – DN (key) of manager
  • mail – email address
  • directReports – DN (key) of subordinates (semicolon separated list)

I highly suggest using the filter, since the export comes from Active Directory, which includes many objects that are not people.
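As a rough sketch of what reading the export can look like (in Python rather than R), the snippet below parses a CSVDE-style file and splits directReports into a list. It assumes a header row, quoted DN values, and semicolon separated reports as described above. The sample rows and the load_people name are made up for illustration, and a real export will contain more columns.

```python
import csv
import io

# Hypothetical sample in the shape of a CSVDE export; real files will differ.
SAMPLE = '''DN,displayName,department,manager,directReports
"CN=Ann Lee,OU=Staff,DC=example,DC=com","Lee, Ann",Exec,,"CN=Bob Ray,OU=Staff,DC=example,DC=com;CN=Cy Poe,OU=Staff,DC=example,DC=com"
"CN=Bob Ray,OU=Staff,DC=example,DC=com","Ray, Bob",Sales,"CN=Ann Lee,OU=Staff,DC=example,DC=com",
"CN=Cy Poe,OU=Staff,DC=example,DC=com","Poe, Cy",IT,"CN=Ann Lee,OU=Staff,DC=example,DC=com",
'''

def load_people(handle):
    """Return one dict per row, with directReports split into a list of DNs."""
    people = []
    for row in csv.DictReader(handle):
        raw = row.get("directReports") or ""
        # DNs contain commas, so the multi-valued field is split on semicolons.
        row["directReports"] = [dn for dn in raw.split(";") if dn]
        people.append(row)
    return people

people = load_people(io.StringIO(SAMPLE))
```

For a real export you would pass an open file handle (for example one opened on allpersons.csv with newline="") instead of the in-memory sample.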

Data Cleaning

Next I cleaned up the data, since even with the filter not all records are valid.  For the most part, I removed:
  • Terminated employees
  • Employees who had neither a manager nor any subordinates
  • Employees who did not have a comma in their displayName
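Two of these rules can be sketched as a simple record filter in Python. This reads the second rule as "neither a manager nor subordinates", which is an assumption. It skips terminated employees because how they are flagged will depend on your directory. The keep_person name and the sample records are hypothetical.

```python
def keep_person(row):
    """Apply two of the cleaning rules to one record (a dict of AD fields)."""
    has_manager = bool(row.get("manager"))
    has_reports = bool(row.get("directReports"))
    has_comma = "," in (row.get("displayName") or "")
    # Keep records linked to someone else whose name looks like "SURNAME, GIVEN".
    return (has_manager or has_reports) and has_comma

# Hypothetical records for illustration only.
rows = [
    {"DN": "CN=Ann", "displayName": "Lee, Ann", "manager": "", "directReports": "CN=Bob"},
    {"DN": "CN=Bob", "displayName": "Ray, Bob", "manager": "CN=Ann", "directReports": ""},
    {"DN": "CN=Printer", "displayName": "Printer 3rd Floor", "manager": "CN=Bob", "directReports": ""},
    {"DN": "CN=Solo", "displayName": "Solo, Sam", "manager": "", "directReports": ""},
]
cleaned = [r for r in rows if keep_person(r)]
```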
From this set I created two edge lists:
  1. Manager (manager) & Report (DN)
  2. Manager (DN) & Report (directReport) via
    ddply(subset(data, hasReport), .(DN),
         summarize,
         report = unlist(strsplit(directReports, ";")))

I built two because I was not sure that a manager's directReports always mirrored the manager field of each subordinate.  Next you just need to union the two lists via unique(rbind(...)) and voilà, you have the edge list.  The node list is simply DN (the ID) and any other fields you wish (I pulled displayName and department).  BE SURE to relabel DN to ID and manager/report to source/target for Gephi before saving.
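The same union can be sketched in Python with a set, which drops duplicate edges automatically. The records below are hypothetical (with short DNs for readability), and the Id, Label, Source, and Target headers are the column names I understand Gephi's spreadsheet import to look for, so check them against your Gephi version.

```python
import csv
import io

# Hypothetical cleaned records; directReports is a semicolon separated string.
rows = [
    {"DN": "A", "displayName": "Lee, Ann", "department": "Exec", "manager": "", "directReports": "B;C"},
    {"DN": "B", "displayName": "Ray, Bob", "department": "Sales", "manager": "A", "directReports": ""},
    {"DN": "C", "displayName": "Poe, Cy", "department": "IT", "manager": "", "directReports": ""},
]

edges = set()
for r in rows:
    if r["manager"]:
        # Edge list 1: the manager field points at this person's manager.
        edges.add((r["manager"], r["DN"]))
    for report in filter(None, r["directReports"].split(";")):
        # Edge list 2: this person points at each of their direct reports.
        edges.add((r["DN"], report))

def to_csv(header, records):
    """Render a header and rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return out.getvalue()

edge_csv = to_csv(["Source", "Target"], sorted(edges))
node_csv = to_csv(["Id", "Label", "department"],
                  [(r["DN"], r["displayName"], r["department"]) for r in rows])
```

In this sample, C has no manager field, so the A to C edge only shows up through A's directReports, which is the kind of mismatch the second list is there to catch.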

Graphing

Finally we move to the fun part, and for this we use my favorite network visualization application: Gephi.  By no means can I give a full tutorial here; rather, I will summarize some things I did with this network.

Right off the bat I had two issues.  First, my data had many isolated networks of just a few people, somehow not linked in any way to the giant component.  We fix this by using the Giant Component filter.  What is nice about this filter is that it adds a new column named Component ID, which identifies each unconnected network.  Hence all I had to do was delete every component other than the largest (via a partition filter).  At this point Nodes = Edges + 1, as expected for a tree.
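To make the idea concrete outside of Gephi, here is a minimal Python sketch that groups an undirected edge list into connected components and keeps the largest one. It is not how Gephi implements its filter, just the same concept. The components function and the sample edges are made up for illustration.

```python
from collections import defaultdict, deque

def components(edges):
    """Group the nodes of an undirected graph into connected components."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        groups.append(group)
    return groups

# Hypothetical edge list: one org tree plus a stray pair.
edges = [("ceo", "vp1"), ("ceo", "vp2"), ("vp1", "ann"), ("x", "y")]
giant = max(components(edges), key=len)
giant_edges = [(a, b) for a, b in edges if a in giant and b in giant]
```

A quick sanity check on the result is that a clean org tree has exactly one more node than it has edges.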

The second problem is that my network is too big (the Active Directory actually held worldwide information).  To split it up I took our CEO and found their neighbors.  This is done via the Ego Network filter, entering their node ID and setting the depth to 1.  Then I created a new column and labeled these nodes.  I deleted the CEO and then reapplied Giant Component to get the new Component ID of each network.  Ta-da!
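The effect of that split can be sketched in Python: label every node with the neighbor of the top node whose branch it sits in, never walking back through the top node. This is an illustration of the idea rather than what Gephi does internally. split_below and the sample edges are hypothetical names and data.

```python
from collections import defaultdict

def split_below(root, edges):
    """Label every node with the neighbor of `root` whose branch it sits in."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    branch = {}
    for head in sorted(neighbors[root]):   # the depth-1 ego network of root
        stack = [head]
        while stack:
            node = stack.pop()
            if node == root or node in branch:
                continue                   # never walk back through root
            branch[node] = head
            stack.extend(neighbors[node])
    return branch

# Hypothetical org tree with two branches under the CEO.
edges = [("ceo", "vp1"), ("ceo", "vp2"), ("vp1", "ann"), ("vp2", "bob"), ("bob", "cy")]
branch = split_below("ceo", edges)
```

Each distinct value in branch then plays the role of one of the smaller networks left after the CEO is deleted.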

Use Force Atlas 2 with a large Tolerance (speed) to quickly arrange the graph.  Then reset the speed and turn on Dissuade Hubs.  To wrap up the graphing, I offer some tips:

  • As it is running move large portions of the nodes around to jiggle free the branches.  Once the speed is reset, you can move smaller branches around to prevent overlaps.
  • You can also run Yifan Hu afterwards.  This algorithm yields more of a tree structure than bubbles of neighbors, however it appears to expand the network oddly.
  • To reduce the network size you could remove non-managers simply by using a Degree Range filter of greater than 1 (since those employees should only have one connection)
  • Finally, reduce the size of the nodes and add labels.
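The Degree Range tip boils down to counting connections per node. A small Python sketch of the same idea, using a made-up edge list, looks like this:

```python
from collections import Counter

# Hypothetical edge list: vp1 manages two people, vp2 manages nobody.
edges = [("ceo", "vp1"), ("ceo", "vp2"), ("vp1", "ann"), ("vp1", "bob")]

# Count how many edges touch each node, ignoring direction.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Non-managers have a single connection (their own manager), so keep degree > 1.
managers = {node for node, d in degree.items() if d > 1}
manager_edges = [(a, b) for a, b in edges if a in managers and b in managers]
```

One caveat of this shortcut is that a top-level person with exactly one report also has degree 1 and would be filtered out along with the non-managers.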

 

Further Thoughts

Currently I am still tweaking the networks so they can be printed and still be readable (you can blow up the plots across multiple sheets via MS Paint, PosteRazor, GIMP, or Inkscape).  Unfortunately this isn't very analytical, just labor intensive, since Gephi does not handle text very well (wrapping and better collision detection would be nice).  However, a few things I am curious about are:
  • Comparing this network to other connection modes, such as the network of employee communication (which should be cross-department) and where employees are located spatially (which could simply be coloring)
  • Error detection.  Since I also have address and department information attached to each employee we could use community detection or other methods to see if an employee is not labeled like their peers.
  • Similar to above, but just report general statistics of branches.  For instance does a particular branch have higher turnover?  Of course HR data would be needed, which with that addition would yield an abundance of questions.
  • Creating a quick (R, SQL, ???) script to give the path between two people.  Since I run into many people all the time I often wonder how I am connected to them.
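For the last idea, a breadth-first search over the undirected edge list is one simple way to get the path between two people. The Python sketch below is only an illustration. path_between and the sample edges are hypothetical, and in a real org tree the path would normally run up to a shared manager and back down.

```python
from collections import defaultdict, deque

def path_between(edges, start, goal):
    """Return the shortest chain of people from start to goal, or None."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    previous = {start: None}          # remembers how each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to start, then reverse
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical org tree.
edges = [("ceo", "vp1"), ("ceo", "vp2"), ("vp1", "ann"), ("vp2", "bob"), ("bob", "cy")]
route = path_between(edges, "ann", "cy")
```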

This of course was done prior to utilizing D3.  Check out the next incarnation of this in my D3 Organization Chart post.
