Friday, 23 August 2013

Venn diagrams

I'd always thought that drawing Venn diagrams was quite trivial, until I needed to create some recently. They are trivial if we only have 2 equal-sized sets. If we have 3 sets, and we want the circle sizes to represent the number of data items in each set, then the layout algorithm is more complex. Luckily there's a great package for Python called matplotlib-venn which does exactly what I wanted.
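As a rough sketch of the 3-set case: matplotlib-venn's venn3 function accepts the seven region sizes as a tuple in the order (Abc, aBc, ABc, abC, AbC, aBC, ABC). The data and labels below are made up, and region_sizes and draw are hypothetical helper names rather than part of the package.

```python
def region_sizes(a, b, c):
    """Sizes of the 7 regions of a 3-set Venn diagram.

    Order: (Abc, aBc, ABc, abC, AbC, aBC, ABC), where an upper-case letter
    means "inside that set" and a lower-case letter means "outside it".
    """
    return (
        len(a - b - c),    # only A
        len(b - a - c),    # only B
        len((a & b) - c),  # A and B, not C
        len(c - a - b),    # only C
        len((a & c) - b),  # A and C, not B
        len((b & c) - a),  # B and C, not A
        len(a & b & c),    # all three
    )


def draw(a, b, c, labels=("A", "B", "C")):
    """Draw an area-weighted diagram; needs matplotlib and matplotlib-venn."""
    import matplotlib.pyplot as plt
    from matplotlib_venn import venn3

    venn3(subsets=region_sizes(a, b, c), set_labels=labels)
    plt.show()


# Made-up example data
a = {1, 2, 3, 4, 5, 6}
b = {4, 5, 6, 7, 8}
c = {6, 8, 9}
print(region_sizes(a, b, c))  # (3, 1, 2, 1, 0, 1, 1)
```

Calling draw(a, b, c) should then produce the picture, assuming both packages are installed. Keeping the counting separate from the plotting makes it easy to check the numbers before worrying about layout.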

However, if I have 4 sets, then it gets more complicated (and isn't yet handled by matplotlib-venn). A Venn diagram must show all possible intersections, including any that contain no items. Venn used overlapping ellipses to show how this could be achieved.

Diagram by RupertMillard from Wikimedia Commons
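To get a feel for how quickly "all possible intersections" grows, here is a small sketch that lists every combination of sets a Venn diagram has to give a region to. The function name venn_regions and the single-letter set names are only for illustration.

```python
from itertools import combinations


def venn_regions(names):
    """List every non-empty combination of set names.

    Each combination stands for one region a Venn diagram must draw:
    the items that are in exactly those sets and in no others.
    """
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


for n in range(2, 7):
    names = "ABCDEF"[:n]
    print(n, "sets ->", len(venn_regions(names)), "regions")
```

For n sets this is 2^n - 1 regions, so 4 sets need 15 (16 counting the area outside every set). Four circles can divide the plane into at most 14 areas, which is why the 4-set diagram needs ellipses or some other shape.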

This is where Venn diagrams differ from Euler diagrams. Euler diagrams don't show empty intersections, so they can look much simpler than Venn diagrams, and can contain fully-nested circles. There are Venn diagrams that can represent all the overlaps of 5 and 6 sets, but we'd end up with some extremely complex diagrams that don't really aid the visualisation of our data.
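One way to see the difference on real data is to work out which intersections are actually empty. The sketch below is only an illustration with made-up sets; exclusive_region and split_regions are hypothetical helper names.

```python
from itertools import combinations


def exclusive_region(sets, inside):
    """Items that are in every set named in `inside` and in no other set."""
    members = set.intersection(*(sets[name] for name in inside))
    for name, items in sets.items():
        if name not in inside:
            members = members - items
    return members


def split_regions(sets):
    """Return (non_empty, empty) lists of regions, each a tuple of set names."""
    non_empty, empty = [], []
    names = sorted(sets)
    for size in range(1, len(names) + 1):
        for inside in combinations(names, size):
            target = non_empty if exclusive_region(sets, inside) else empty
            target.append(inside)
    return non_empty, empty


# Made-up data: B sits entirely inside A, and C overlaps nothing.
sets = {"A": {1, 2, 3, 4}, "B": {3, 4}, "C": {9}}
non_empty, empty = split_regions(sets)
print("Euler diagram draws:", non_empty)
print("Venn diagram also draws (empty):", empty)
```

Here an Euler diagram only needs three regions (B nested inside A, with C off to one side), whereas a Venn diagram would still draw all seven.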

Update (9 Oct 2013): Here's a JavaScript/D3.js interactive version with more thoughts on why it's a difficult problem.

Update (20th March 2014): Here are a couple of completely over the top diagrams: a pine tree and a banana. Venn or Euler?


  1. Nice thoughts Amanda!

    We recently proposed a new visualization technique for overlapping sets that is more scalable than Venn / Euler diagrams:

    Hope you find it useful, and would be happy to hear your comments!

  2. Yes, that's interesting (and a helpful video), and seems sensible. It does look a bit like a hairball, but then visualising so many overlaps will always look complex. Also looks like it displays pairwise overlaps well, but adding the higher degree overlaps complicates the picture.

  3. Thanks for your feedback Amanda!

    I agree about both points and am trying to find both computational and visual ways to deal with the hairball and the clutter.

    I will research what the DM community did in this regard, and would be grateful for any pointers you become aware of.