The writeups of these solutions have been provided by Immo Hüneke of Zühlke. Many thanks, Immo!
Although there were three in our pair, we failed to complete the challenge. We had a reasonably plausible approach, but it failed, probably because we were trying to use too many unfamiliar components.
Our intended approach was to use an analysis tool to spit out a CSV file, which we would then pull into Excel to plot the graph.
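Even a crude home-grown metric extractor would have been enough to feed Excel. A minimal sketch of the idea in Python (not what we used on the day; the lines-of-code metric and the file layout are purely illustrative):

```python
import csv
import os
import sys


def count_loc(path):
    """Count non-blank lines that aren't pure // comments in a Java file."""
    loc = 0
    with open(path, encoding="utf-8", errors="replace") as src:
        for line in src:
            stripped = line.strip()
            if stripped and not stripped.startswith("//"):
                loc += 1
    return loc


def write_metrics_csv(source_root, out_file):
    """Walk a source tree and emit one CSV row per .java file."""
    writer = csv.writer(out_file)
    writer.writerow(["file", "loc"])
    for dirpath, _dirs, files in os.walk(source_root):
        for name in sorted(files):
            if name.endswith(".java"):
                full = os.path.join(dirpath, name)
                writer.writerow([full, count_loc(full)])


if __name__ == "__main__":
    # e.g. python metrics.py src/ > metrics.csv, then open in Excel
    write_metrics_csv(sys.argv[1], sys.stdout)
```

The resulting CSV opens directly in Excel for charting, which was the whole point of the CSV intermediate step.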
Checkstyle seemed like a good tool for obtaining code metrics, but we found it hard to get productive with. Immo had used it before, but only as a pre-installed plugin under IntelliJ, which wasn't available in this context.
JDepend was much easier to install and use, but unfortunately it works only on compiled classes, which wasn't obvious at the outset.
The Metrics plugin at http://metrics.sourceforge.net/ was the most promising: it produces the numbers we want and can be run outside Eclipse as a standalone Ant task.
As a result of spending too long finding ways to get the basic numbers, we were unable to integrate the three main components of the solution.
There are actually commercial tools that come very close to achieving what we want; see http://www.cenqua.com/fisheye. It links to a Subversion or CVS repository (many others are supported) and provides graphical views of code size, volatility and so on; what is currently missing, unfortunately, is quality metrics.
The Subversion repositories used by the different teams turned out to be a major bottleneck.
We got a basic solution working and then tried to enhance it. The approach used HTML scraping to obtain the basic horoscope text (from Yahoo). Unfortunately, after half a dozen test runs, Yahoo stopped answering our requests.
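The scraping step itself is simple once you know where the text sits in the page. A sketch of the idea in Python (the `class="horoscope"` anchor is hypothetical; Yahoo's actual markup differed and has long since changed, and the fetch itself is deliberately left out):

```python
from html.parser import HTMLParser


class HoroscopeExtractor(HTMLParser):
    """Collect the text inside the first <p class="horoscope"> element.

    The tag and class name are assumptions for illustration; on the day
    we had to inspect the real page source to find the right anchor.
    """

    def __init__(self):
        super().__init__()
        self.capturing = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and ("class", "horoscope") in attrs and not self.done:
            self.capturing = True

    def handle_endtag(self, tag):
        if tag == "p" and self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace left over from the HTML layout
        return " ".join(" ".join(self.parts).split())


def extract_horoscope(html):
    parser = HoroscopeExtractor()
    parser.feed(html)
    return parser.text()
```

In the real script the HTML would come from something like `urllib.request.urlopen(url).read()`; separating the fetch from the parse also makes the parser testable offline, which matters once the site starts refusing your requests.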
From there, it was easy to put the text into an on-line text summariser such as http://textmining.i2r.a-star.edu.sg/people/kanagasa/ts/cgi-bin/sumnew.pl or http://swesum.nada.kth.se/index-eng.html to get a one-line summary.
However, Nat changed the requirements in consultation with one of our team, so that he now wanted an overall plus/minus verdict. That put us into analysing the frequencies of words in the horoscope page and using the WordNet database to categorise the most frequently occurring non-stopwords.
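The frequency analysis we had in mind looks roughly like this in Python. The stopword list and the positive/negative word sets below are tiny hand-made stand-ins for what WordNet categorisation would have supplied:

```python
import re
from collections import Counter

# Illustrative stand-ins only; the real thing would use a proper
# stopword list and WordNet-derived categories.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "you", "your", "will", "is", "in"}
POSITIVE = {"luck", "love", "success", "joy", "opportunity"}
NEGATIVE = {"trouble", "conflict", "loss", "caution", "risk"}


def word_frequencies(text):
    """Frequency of each lower-cased non-stopword in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def verdict(text):
    """Overall plus/minus verdict from positive vs negative word counts."""
    freqs = word_frequencies(text)
    score = sum(n for w, n in freqs.items() if w in POSITIVE) \
          - sum(n for w, n in freqs.items() if w in NEGATIVE)
    return "+" if score >= 0 else "-"
```

Crude as it is, counting category hits is exactly the shape of the analysis: the hard part we never got to was mapping frequent words onto sensible categories.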
Needless to say, we ran out of time. But here's a good recipe-book reference for text manipulation using Unix tools: http://www.dsl.org/cookbook/cookbook_16.html
We found that this site, http://www.larnercorp.com/trumps/, lets you enter the information to create your own Top Trumps cards. I registered and created a card as an experiment, but we didn't fancy trying to screen-scrape the form to generate the cards.
Second attempt: split the problem into two (data capture and card generation). Rusty PHP scripting skills were exhumed to create an HTML template that looked reasonably OK (attached). We started with a sample CSV file with just two entries.
Time ran out, though, before we could scrape all the information off the Web automatically. Hats off to the wizard teams who managed to use search engines and regex processing to pull significant information about all the participants.
At least we didn't run into problems with search engines blocking us!