As the quantity of data available online grows, particularly structured or ‘linked’ data, the questions of what value can be derived from that data, and how much, become increasingly interesting.
Working at the Royal Botanic Gardens, Kew, I’m particularly interested in biodiversity data, what it might be used for, by whom, and to what end. This is both an academic interest and a pressing need given the impending crisis that threatens biodiversity around the world.
Many people, including Professor Nigel Shadbolt, Professor of Artificial Intelligence at Southampton University, describe the supply of online data as a ‘superabundant’ deluge of information. In a paper on semantic responses, Professor Shadbolt and his colleagues estimated that the amount of data generated in 2010 would be around 1.2 million petabytes. To put that into some context: if you tried to read through this data, assuming an average reading speed of, say, 1,000 characters (approximately 1 kilobyte) per minute, it would take you about 2 trillion years! So techniques of data mining (not a new term; it has been around for several decades) are increasingly essential in locating and making sense of this incredible data mountain.
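The reading-time figure can be checked with a few lines of arithmetic. This is a rough sketch only: it assumes decimal units (1 petabyte = 10^15 bytes, 1 kilobyte = 1,000 bytes), one byte per character, and reading around the clock with no breaks. None of these assumptions is stated in the paper itself.

```python
# Rough check of the reading-time estimate, under assumed decimal units.
BYTES_PER_PETABYTE = 10**15        # assumption: decimal petabyte
MINUTES_PER_YEAR = 60 * 24 * 365   # non-leap year, reading without a break


def years_to_read(petabytes: float, bytes_per_minute: float = 1_000) -> float:
    """Years needed to read `petabytes` of data at `bytes_per_minute`."""
    total_bytes = petabytes * BYTES_PER_PETABYTE
    minutes = total_bytes / bytes_per_minute
    return minutes / MINUTES_PER_YEAR


years = years_to_read(1.2e6)  # 1.2 million petabytes
print(f"{years:.2e} years")
```

Under these assumptions the result comes out at a little over 2 trillion years, in line with the figure quoted above.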
Biodiversity data is a subset of this deluge. There is no reliable estimate of the quantity of biodiversity data available, but it is huge – GBIF (the Global Biodiversity Information Facility) reported in 2010 that it had 216 million records of primary biodiversity data available through its portal, and it estimates that the data records available at partner institutions run into several billion. Kew alone has tens of millions of data records.
So what is the role of data mining in making sense out of the information we have recorded about the planet’s biodiversity?
At Kew, some important aspects of scientific discovery rely on identifying patterns from large data sets. Biodiversity data usually includes accurate location-based information (for example the location a specimen was collected), providing a powerful opportunity to mine data by location. A good example is the work conducted recently to assess the risk to plant life around the world, expressed as the Sampled Red List Index (SRLI) for plants. Researchers at Kew, the Natural History Museum, ZSL and the International Union for Conservation of Nature (IUCN) took a representative sample of 7,000 species from around the world. Using a combination of bespoke and existing tools such as Google Earth, they mined data from the partners’ collections, remote sensing data from satellites, and other sources such as GBIF to arrive at the final assessment.
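At its simplest, mining by location means filtering records to a region and counting what falls inside it. The sketch below is purely illustrative: the `Occurrence` record, the `in_bounding_box` and `species_counts` helpers, and the sample coordinates are all hypothetical. It does not represent the SRLI method or any Kew or GBIF data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical specimen record: a species name plus a collection point."""
    species: str
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


def in_bounding_box(rec, south, west, north, east):
    """True if the record lies inside the box (box must not cross the antimeridian)."""
    return south <= rec.lat <= north and west <= rec.lon <= east


def species_counts(records, south, west, north, east):
    """Count records per species within the bounding box."""
    return Counter(
        r.species for r in records if in_bounding_box(r, south, west, north, east)
    )


# Made-up sample data, for illustration only.
records = [
    Occurrence("Quercus robur", 51.48, -0.29),
    Occurrence("Quercus robur", 51.50, -0.12),
    Occurrence("Adansonia grandidieri", -20.25, 44.42),
]

# A rough box around Greater London (assumed coordinates).
print(species_counts(records, south=51.2, west=-0.6, north=51.8, east=0.4))
```

A real assessment would of course need far more than a rectangle: projections, habitat layers and expert review of the underlying records all matter.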
Another important use of biodiversity data is to derive models that can be used to make ecological predictions, for example when modelling climate scenarios. Projects such as TRY work through a global partnership of institutions that provide primary biodiversity data, which is mined to derive traits used in these models.
More miners – and mines
In a growing number of cases, people are mining data sets that nominally have nothing to do with biodiversity, to reveal new information. A fascinating example is that of a citizen scientist from Maine who used tourist images from Flickr to track the migration journey of a humpback whale from Brazil to Madagascar, publishing her results in the Royal Society’s Biology Letters. With Facebook now reaching over 500 million people, there is bound to be some useful biodiversity information to mine.
From eBird to iSpot, there is no shortage of opportunities for citizen scientists to invest time in documenting biodiversity – and they are doing it in large numbers. Increasingly, data providers are also finding ways of making their data available to these groups. In the UK, the National Biodiversity Network offers a set of web services that enable developers to use its data when creating applications, and GBIF similarly offers a number of services for accessing its global data. The Encyclopedia of Life (EOL) aggregates biodiversity data aimed at a broader range of audiences, and there is now an API (application programming interface) that can be used to build apps on it.
Although we’re not inundated with applications based on this kind of data, there are signs that both specialists and amateurs are approaching the data more creatively. There are for example some good illustrations of what is possible using data visualisation such as species heatmaps, or Google Earth layers showing species distribution. And there are certainly developers keen to get their hands on new data. Take the realm of civil data, for example, where organisations like MySociety create hugely popular apps out of freely available data.
So why are there not more biodiversity apps? Well, the data is certainly harder to decipher, and in some cases includes concepts that simply don’t make sense to a non-specialist. So perhaps closer partnership between data publishers and app developers might stimulate more activity – maybe in the form of hack days or so-called ‘crowd-sourced’ projects.
If this happened, what would they build? Perhaps field guides compiled on the fly for a user-defined region, food-chain or ecological modelling, or visualisation of the effects of man-made structures such as roads on habitats? The possibilities may be endless, and in some cases could prove genuinely insightful.
The value of more diverse communities using this data may be in the serendipity that it creates. The example of whale tracking via Flickr is a case in point. Not only will different communities look to new data sets with which to combine the primary data (even perhaps social networks such as Facebook or Twitter), but they may also approach the problem from new angles.
A partnership of miners
Although I believe that getting a broader base of people interested in biodiversity data could have significant benefits, I suspect that the cutting edge of mining biodiversity data will remain with the specialists.
Without expert involvement, mining data can lead to misinterpretation or false conclusions, especially where the data is complex and opaque. Only within the bioinformatics communities do you find the combination of taxonomic, GIS and regional expertise needed to make major breakthroughs in understanding from this data. In fact, many of the potential apps imagined above would probably need expert input to create genuinely valuable products.
And there is an important footnote – the data does not digitise itself, curate itself and offer itself up for use. Creating and maintaining usable datasets that can be mined is an expensive (although valuable) function. Kew and other institutions are having to consider how to ‘biocurate’* their data for future use.
But where I think we could all benefit is by creating more opportunities for citizen scientists, experts and the public to engage together with this critically important data.
- Mike Saunders -**
References & Notes
* For a review of biocuration, see Howe, D. et al. (2008) Big data: The future of biocuration. Nature 455, 47–50.
** This article was originally published on nature.com blog, 7 March 2011.
Filed under: Engaging audiences, Garden Life, Kew website | Tags: AV, digital
As the Digital Team at Kew endeavours to produce more video and slide shows for the web in-house, I decided (as a member of the team) that it was time to get myself out and about a bit more, to learn more about the tricks of the trade – in particular top hints and tips for making great rich media for the web.
As luck would have it, a week or two ago, Sound Delivery and Third Sector PR and Communications Network announced a free knowledge-sharing workshop about making audio slide shows. Held at the Computer Club in the Aldgate area, the event took the format of a Q&A session with journalist Paul Kerley, the BBC’s online audio slide show guru.
Paul was really open in sharing the knowledge and skills that he’s acquired over the years. So, for those of you out there like me, who are just starting out in audio/video production for the web, here are some of Paul’s top tips for making fab audio slide shows:
1) For a successful slide show you need to source around 8-10 images per minute. The quality of images is really important. Use the best that you have.
2) For a 3-minute slide show you will need around 20 minutes of audio to edit from. Personal stories and first-hand testimonials, with the interview questions edited out, work best.
3) Plan your slide show as much as possible beforehand. Be clear about what you want to communicate and know your audience. Work out what questions you want to ask your subject, and have a list of photographs that you need to source and/or take.
4) Have a chat with your subject before you turn up to record the interview/audio. Get to know them a little bit, discuss the subject matter and find out the kind of things that they might want to talk about up front.
5) Brief your subject about what to expect in the recording session and advise them not to bring along a script to read out.
6) If you have sourced images already, make sure that you talk about these with your subject when conducting the interview.
7) When putting your slide show together, always start with the images and then build the story (the audio track) around them.
8) Make sure you include a variety of images in your slide show and remember that ‘relevance’ is key. It sounds obvious, but the images used should complement the audio and help to tell the story.
9) If you need to drop in bits of organisational messaging, make sure that you do this as short snippets within the story. Resist building the slide show around messaging.
10) Remember that a good opening image is crucial to capture people’s attention online. A strong closing image is also important to help viewers remember your slide show.
11) Add music to your slide show if you can. Music is emotive and can help to enhance the impact of your slideshow and connect with your audience. I’ve added some links to music licensing providers that sometimes do deals for charities below.
Promoting your slideshow
12) Research and find your ‘subject matter allies’ online. Develop a network of mutually beneficial connections with sympathetic bloggers and mainstream media channels where possible. Popular blogs and mainstream media channels (like the BBC, the Guardian and TelegraphTV) are often seeking good audio/video content to use online.
13) Where possible, take advantage of topical trends by promoting your slide show when it’s most relevant to the media appetite. If you make a slide show about growing pumpkins, approach popular bloggers and contacts at mainstream media channels in time for Halloween.
- A Tour of Duty, by Paul Kerley
- FAB camp stories: ‘My sister is smiling again’, by Sound Delivery
- Loving and living with Alzheimers, by Paul Kerley
As with all these things, I imagine it’s far more difficult to produce a great audio slide show in practice, than it may first appear in theory. But the team here at Kew are going to give it a really good go. Watch this space for upcoming Kew Media efforts!
- Claire Welsby -