Last November I attended an Open Data Masterclass at Aberdeen University. I took notes as the day progressed and these are shared below, sometimes quite skeletally, sometimes more completely.
It was a very interesting and useful day, and it clarified and solidified my thinking on the relevance and significance of open data to what my employer does online at present and where we’re headed.
Welcome and Introduction: Prof Pete Edwards
Pete explained the work of Aberdeen University in this area. He gave an interesting example of First Group and how they are committed to adopting open and linked data and are working with Aberdeen University in this area.
Welcome and Intro: Hanif Rahemtulla (Horizon Digital Economy Research, Nottingham)
There is a significant government drive to use Open Data. How are we to make sense of it?
The Open Data movement has many drivers. It started under Labour but is now taking hold in the Coalition Government, who are actually pushing it.
Transparency In Government: James Forrester – Cabinet Office Transparency Unit
Historically, lots of data have been released in PDF format. Opening it up makes it more accessible – eg by adding a graphical interface.
Open Knowledge Foundation make it open and usable through sites such as Where Does My Money Go.
We need to help people better understand the data. The Government publishes huge volumes of data on for example schools through inspection reports. Why should parents have to learn and understand government systems to understand what the data means? How do you choose a school? You need to understand the information available. Third party sites are opening up that data and presenting data in simple graphic fashion.
Channel 4 application – Mapumental. http://mapumental.channel4.com/ (NB access is currently limited and is initially London-specific).
Select where you work, set your maximum desired commuting time and mix with house price and ‘scenicness’ data from various sources. It shows where you could live, ie the areas that meet your criteria, in a way that matching three sets of raw numbers never could.
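To make the idea concrete, here is a minimal sketch in Python of combining three criteria into one filter. It is not Mapumental's actual code: the area names, figures, field names and the 0 to 10 scenicness scale are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    commute_minutes: int      # travel time to the chosen workplace
    median_house_price: int   # in pounds
    scenicness: float         # assumed scale: 0 (dull) to 10 (beautiful)


# Made-up sample values, for illustration only.
AREAS = [
    Area("Area A", 25, 310_000, 4.5),
    Area("Area B", 40, 220_000, 7.0),
    Area("Area C", 55, 180_000, 8.5),
    Area("Area D", 35, 450_000, 6.0),
]


def matching_areas(areas, max_commute, max_price, min_scenicness):
    """Return the names of areas that satisfy all three criteria at once."""
    return [
        a.name
        for a in areas
        if a.commute_minutes <= max_commute
        and a.median_house_price <= max_price
        and a.scenicness >= min_scenicness
    ]


print(matching_areas(AREAS, max_commute=45, max_price=350_000, min_scenicness=5.0))
```

With these sample values only Area B passes all three tests. A real tool would do the same kind of matching per map tile and draw the result, which is what makes it easier to read than three tables of numbers.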
If we make more data available then more tools with more complex visualisations are possible.
- http://www.patientslikeme.com/ - share info on health issues.
- 18 biggest Govt Depts in Whitehall. Smart meters were set up to show electricity and gas usage. Competition to reduce energy with a prize. The winner: http://www.govspark.org.uk/ set up by a 16-year-old.
- A 17-year-old developed an iPhone app for alarms for getting up at the last possible moment based on data such as travel times. Biggest app on iTunes for several days.
- http://www.asborometer.com/ – showed gap between fear of crime and real crime stats based on location.
- Championed by Prime Minister. COINS data went live June 10. http://data.gov.uk/dataset/coins
More than 120GB of data was released through BitTorrent. Media responses differed. The Guardian built a front end to allow better analysis: http://coins.guardian.co.uk/coins-explorer/search . The FT, by comparison, complained that it crashed their journalist’s MS Excel when he tried to load all 120GB.
There is now a Central Govt Transparency Board. Local government is also involved, and England and Wales in particular are doing well. The Scottish Govt are championing transparency although less through open data. Only two Scottish Local Authorities are publishing open data. See http://openlylocal.com/councils/open?country=Scotland
The push includes accountability, citizen choice and the ‘Big Society’. Data needs to be findable, licensed and in a usable format.
An illustration: The Government Statistics unit refused to release open data. They published a PDF. A newspaper then asked an intern to retype it as copy, but that person unfortunately added a mistake. The headline which was based on the consequently-inaccurate data was completely wrong and very negative. The newspaper did publish a subsequent retraction – but it was buried on p29. Nobody saw it (compared to the front page headline). The statistics unit now see the case for publishing openly.
4,000+ datasets. Permission to reuse – very open license. Usable formats: CSV, API, Linked Data. The site has Featured Data examples.
There is a big community ethos on the site: wiki, discussion forums, blogs. There’s also a gallery of apps created by the community.
A Five Star Guide to making data available has been created by Tim Berners-Lee and Nigel Shadbolt. There is a link in the slideshow below. Citizens can request data from http://data.gov.uk.
What is coming next? More, better, data. Opening up the catalogue, allowing community-submitted data, reducing barriers so that it is easier to publish and easier to link (eg Gridworks, now http://code.google.com/p/google-refine/), and choice in re-use.
Finally, remember, there are opportunities for volunteers to help charities to make use of the data.
Linked Data – Data on the Web: Richard Wallis / Talis
There are lots of examples. Mostly graphic – data hidden beneath rich interfaces. Linked data is an approach to publishing data. It is easy to publish, easy to consume. It goes further than CSV. It covers both the data model and practices. But it requires extra technology today.
It is not about publishing data per se. It is a way of describing things and connecting things on the web. Web of Data is a semantic web. eg Book is linked to Author and the link has meaning (is author).
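As a rough illustration of the Book and Author example, the sketch below stores each statement as a (subject, predicate, object) triple, which is the shape RDF uses. It is plain Python rather than a real RDF library, and every URI and name in it is a placeholder invented for the example.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All URIs and literal values below are placeholders, not real identifiers.
BOOK_1 = "http://example.org/book/1"
BOOK_2 = "http://example.org/book/2"
PERSON_1 = "http://example.org/person/1"
HAS_AUTHOR = "http://example.org/terms/hasAuthor"
TITLE = "http://example.org/terms/title"
NAME = "http://example.org/terms/name"

TRIPLES = [
    (BOOK_1, HAS_AUTHOR, PERSON_1),
    (BOOK_1, TITLE, "An Example Book"),
    (BOOK_2, HAS_AUTHOR, PERSON_1),
    (PERSON_1, NAME, "A. N. Author"),
]


def objects(triples, subject, predicate):
    """Follow a named link forwards: what does this subject point to?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Follow a named link backwards: what points to this object?"""
    return [s for s, p, o in triples if p == predicate and o == obj]


# From a book to its author, then on to the author's name.
author = objects(TRIPLES, BOOK_1, HAS_AUTHOR)[0]
print(objects(TRIPLES, author, NAME))

# From the author back to every book that links to them.
print(subjects(TRIPLES, HAS_AUTHOR, author))
```

The point is that the link itself carries meaning ("has author"), so the same data can be walked in either direction without a schema designed for that one question.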
There is a massive web of linked data centred on DBpedia. (http://dbpedia.org/About)
There are some significant global organisations using linked data behind firewalls (and often linking to open external data sets) to drive business intelligence.
The BBC Natural History site is built on RDF, with assistance from Talis, eg the Mallard duck entry. RDF explains relationships between things. http://www.bbc.co.uk/nature/life/Mallard
The Talis BIS example was built in three weeks. It mixes several data sources and presents them in an accessible fashion.
Comparison: RDF is to Linked Data as HTML was to web pages in 1995. It is almost the opposite of an RDBMS: not stand-alone systems. It makes data part of the web.
You still need to decide who to trust – anyone can say anything.
The slide show is available at http://www.slideshare.com/mmmmmrob
Challenges, data and tools – Ordnance Survey: Ian Holt, Ordnance Survey
This is a free service. You can build a site and serve about 250 people per day using the site (approx. 65,000 tiles per day).
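As a quick check on how those two figures relate, and assuming they describe the same daily allowance, the arithmetic works out as follows. The per-visitor figure is my own derivation, not something quoted on the day.

```python
# Figures as noted from the talk; treating them as one daily allowance
# is an assumption made for this calculation.
tiles_per_day = 65_000
visitors_per_day = 250

tiles_per_visitor = tiles_per_day / visitors_per_day
print(tiles_per_visitor)  # 260.0
```

So the estimate implies roughly 260 map tiles per visitor per day.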
The Webmap Builder offers easy introduction – wizard-driven
Gallery of examples. Developer Community forum. Very active.
API offers huge feature set.
Some great OS Data Mash ups.
Geovation Challenge: Chris Parker, Ordnance Survey
Covers climate change challenges – public, private, civil society, communities and individuals.
Link to NESTA.
Do’s and Don’ts of opening up data.
Innovation projects which focus on people’s needs have a 70% success rate, double that of those that don’t.
Geovation challenge process.
http://www.geovation.org.uk/geovationchallenge/ – win a slice of £150,000
Deadline was mid Feb 2011.
See http://www.dragontail.co.uk – re-programme the gritting routes used by council.
There are now Linked Data Browsers – eg http://linksailor.com also Tabulator, Sheaflight, Object View
In the afternoon we worked on three projects which involved obtaining open data sets, filtering those to isolate specific relevant data and converting that to online mapped systems.
We used QGIS (an open source GIS system – http://www.qgis.com) for local viewing and manipulation of mapped data, and GeoCommons (http://www.geocommons.com ) to convert to mapped formats and to publish for public consumption.
You can find my output from one of the exercises here: http://geocommons.com/maps/37099 This was achieved in less than an hour: taking a huge full-UK data source, isolating the Aberdeen data, converting the format where appropriate, then uploading and formatting the map.
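On the day the filtering and conversion were done with QGIS and GeoCommons, not with code. For anyone who would rather script the same two steps, here is a minimal Python sketch that isolates points inside a bounding box and converts them to GeoJSON. The bounding box is only a rough guess at the Aberdeen area, and the sample rows and column names (name, lat, lon) are hypothetical stand-ins for the real data source.

```python
import csv
import io
import json

# Rough bounding box around Aberdeen (an assumption for illustration,
# not the boundary used in the exercise).
ABERDEEN_BBOX = {"min_lat": 57.05, "max_lat": 57.25, "min_lon": -2.30, "max_lon": -2.00}

# Tiny stand-in for a large UK-wide CSV; rows and column names are hypothetical.
SAMPLE_CSV = """name,lat,lon
Site 1,57.15,-2.10
Site 2,51.50,-0.12
Site 3,57.20,-2.05
Site 4,55.95,-3.19
"""


def in_bbox(lat, lon, bbox):
    """True if the point lies inside the bounding box (edges included)."""
    return (
        bbox["min_lat"] <= lat <= bbox["max_lat"]
        and bbox["min_lon"] <= lon <= bbox["max_lon"]
    )


def csv_to_geojson(csv_text, bbox):
    """Keep only rows inside bbox and return them as a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = float(row["lat"]), float(row["lon"])
        if not in_bbox(lat, lon, bbox):
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_geojson(SAMPLE_CSV, ABERDEEN_BBOX)
print(json.dumps(collection, indent=2))
```

With the sample rows above, only Site 1 and Site 3 fall inside the box. For a real file you would read from disk instead of a string and write the result to a .geojson file for upload.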
I have the source material (running to 500MB+) and instructions for the exercises should anyone wish to try them.
All in all this was a very useful day. It showed clearly that our take at work on Open Data is moving in the right direction, and it showed how we should look to improve further.
(Originally written Nov 2010)