Archive for the ‘crowd-sourcing’ Category

Research questions, abstract problems – a round table on Citizen Science

February 26, 2017

I recently participated in a round-table discussion entitled “Impossible Partnerships”, organized by The Cultural Capital Exchange at the Royal Institution, on the theme of Citizen Science; the Impossible Partnerships of the title being those between the academy and the wider public. It is always interesting to attend citizen science events – I get so caught up in the humanities crowdsourcing world (such as it is) that it’s good to revisit the intellectual field that it came from in the first place. This is one of those blog posts whose main aim is to organize my own notes and straighten my own thinking after the event, so don’t read on if you are expecting deep or profound insights.


Crucible of knowledge: the Royal Institution’s famous lecture theatre

Galaxy Zoo of course featured heavily. This remains one of the poster-child citizen science projects, because it gets the basics right. It looks good, it works, it reaches out to build relationships with new communities (including the humanities), and it is particularly good at taking what works and configuring it to function in those new communities. We figured that one of the common factors that keeps it working across different areas is its success in tapping into the intrinsic motivations of people who are interested in the content – citizen scientists are interested in science. There is also an element of altruism involved, giving one’s time and effort for the greater good – but one point I think we agreed on is that it is far, far easier to classify the kinds of task involved than the people undertaking them. This was our rationale in that 2012 scoping study of humanities crowdsourcing.

A key distinction was made between projects which aggregate or process data, and those which generate new data. Galaxy Zoo is mainly about taking empirical content and aggregating it, in contrast, say, to a project that seeks to gather public observations of butterfly or bird populations. This could be a really interesting distinction for humanities crowdsourcing too, but one which becomes problematic where one type of question leads to the other. What if content is processed/digitized through transcription (for example), and this seeds ideas which lead to amateur scholars generating blog posts, articles, discussions, ideas, books etc.? Does this sort of thing happen in citizen science? (Genuine question – maybe it does.) So this is one of those key distinctions between citizen science and citizen humanities. The raw material of the former is often natural phenomena – bird populations, raw imagery of galaxies, protein sequences – but in the latter it can be digital material that “citizen humanists” have created from whatever source.

Another key question which came up several times during the afternoon was the nature of science itself, and how citizen science relates to it. A professional scientist will begin an experiment with several possible hypotheses, then test them against the data. Citizen scientists do not necessarily organize their thinking in this way. This raises the question: can the frameworks and research questions of a project be co-produced with public audiences? Or do they have to be determined by a central team of professionals, and farmed out to wider audiences? This is certainly the implication of Jeff Howe’s original framing of crowdsourcing:

“All these companies grew up in the Internet age and were designed to take advantage of the networked world. … [I]t doesn’t matter where the laborers are – they might be down the block, they might be in Indonesia – as long as they are connected to the network.

Technological advances in everything from product design software to digital video cameras are breaking down the cost barriers that once separated amateurs from professionals. … The labor isn’t always free, but it costs a lot less than paying traditional employees. It’s not outsourcing; it’s crowdsourcing.”

So is it the case that citizen science is about abstract research problems – “are golden finches as common in area X now as they were five years ago?” rather than concrete research questions – “why has the population of golden finches declined over the last five years?”

For me, the main takeaway was our recognition that citizen science and “conventional” science are not, and should not try to be, the same thing, and should not have the same goals. The important thing in citizen science is not to focus on the “conventional” scientific outcomes of good, methodologically sound and peer-reviewable research – that is, at most, an incidental benefit – but on the relationships it creates between professional academic scientists and non-scientists, and how these can help build a more scientifically literate population. The same should go for the citizen humanities. We can all count bird populations, we can all classify galaxies, we can all transcribe handwritten text, but the most profitable goal for citizen science/humanities is a more collaborative social understanding of why doing so matters.

Sourcing GIS data

March 29, 2016

Where does one get GIS data for teaching purposes? This is the sort of question one might ask on Twitter. However, while, like many, I have learned to overcome, or at least creatively ignore, the constraints of 140 characters, it can’t really be done for a question this broad, or with as many attendant sub-issues. That said, this post was finally edged into existence by a Twitter follow, from “Canadian GIS & Geomatics Resources” (@CanadianGIS). So many thanks to them for the unintended prod. The linked website of this account states:

I am sure that almost any geomatics professional would agree that a major part of any GIS are the data sets involved. The data can be in the form of vectors, rasters, aerial photography or statistical tabular data and most often the data component can be very costly or labor intensive.

Too true. And as the university term ends, reviewing the issue from the point of view of teaching seems apposite.

First, of course, students need to know what a shapefile actually is. A shapefile is the building block of GIS, the dataset where an individual map layer lives. Points, lines, polygons: Cartesian geography is what makes the world go round – or at least the digital world, if we accept the oft-quoted statistic that 80% of all online material is in some way georeferenced. I have made various efforts to establish the veracity or otherwise of this statistic, and if anyone has any leads, I would be most grateful if you would share them with me by email or, better still, in the comments section here. Surely it can’t be any less than that now, with the emergence of mobile computing and the saturation of the 4G smartphone market. Anyway…
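To make “points, lines, polygons” concrete for students, here is a minimal Python sketch of how a vector layer might be modelled in memory. It is an illustration only, not the shapefile format itself (which is a set of binary files); the `Feature` class, its field names and the `polygon_area` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A coordinate pair on a flat (Cartesian) plane, e.g. (easting, northing).
Coord = Tuple[float, float]


@dataclass
class Feature:
    """One item in a map layer: a geometry plus its attribute record.

    Hypothetical structure for teaching purposes only.
    """
    geometry_type: str            # "Point", "LineString" or "Polygon"
    coordinates: List[Coord]      # one pair for a point, several otherwise
    attributes: Dict[str, str] = field(default_factory=dict)


def polygon_area(ring: List[Coord]) -> float:
    """Planar area of a closed ring (first pair == last pair), shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Three features, one of each geometry type (made-up coordinates).
point = Feature("Point", [(2.0, 3.0)], {"name": "well"})
line = Feature("LineString", [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)], {"name": "path"})
square = Feature(
    "Polygon",
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)],
    {"name": "field"},
)
```

The point of the sketch is that every feature, however rich its real-world referent, is reduced to a geometry type, a list of number pairs and a flat attribute record.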

In my postgraduate course on digital mapping, part of a Digital Humanities MA programme, I have used the Ordnance Survey Open Data resources; Geofabrik, an on-demand batch download service for OpenStreetMap data; Web Feature Service data from Westminster City Council; and continental coastline data from the European Environment Agency. The first two in particular are useful, as they provide different perspectives: respectively, the central mapping and the open source/crowdsourced geodata angles. But in the expediency required of teaching a module, their main virtues are the fact they’re free, (fairly) reliable, free, malleable, and can be delivered straight to the student’s machine, or classroom PC (infrastructure problems aside – but that’s a different matter) – and loaded into a package such as QGIS. But I also use some shapefiles, specifically point files, I created myself. Students should also be encouraged to consider where the data comes from, and how it was made. This seems to be the most important aspect of geospatial within the Digital Humanities. This data is out there, it can be downloaded, but to understand what it actually *is*, what it actually means, you have to create it. That can mean writing Python scripts to extract toponyms, considering how place is represented in a text, or poring over Google Earth to identify latitude/longitude references for archaeological features.
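As an illustration of the kind of Python script mentioned above, here is a deliberately naive sketch of toponym extraction by gazetteer lookup. Everything in it is an assumption made for the example: the `GAZETTEER` dictionary is a hypothetical two-entry stand-in with approximate coordinates, and the `extract_toponyms` function is not taken from any particular library or from the course materials.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "London": (51.51, -0.13),
    "York": (53.96, -1.08),
}


def extract_toponyms(
    text: str, gazetteer: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, int, Tuple[float, float]]]:
    """Return (name, character offset, coordinates) for each gazetteer hit.

    Matching is on whole words only, so "York" does not match inside
    "Yorkshire". Results are ordered by their position in the text.
    """
    found = []
    for name, coords in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((name, match.start(), coords))
    return sorted(found, key=lambda item: item[1])


sample = "The road from York to London does not pass through Yorkshire."
hits = extract_toponyms(sample, GAZETTEER)
names = [name for name, _, _ in hits]
```

Even a toy like this forces the questions the module is after: who decided what counts as a place, which spelling is canonical, and which single coordinate pair stands in for a whole city.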

This goes to the heart of what it means to create geodata, certainly in the Digital Humanities. As with the Ordnance Survey and Geofabrik data, much of the geodata around us on the internet arrives pre-packaged and with all its assumptions hidden from view. Agnieszka Leszczynski, whose excellent work on the distinction between quantitative and qualitative geography I have been re-reading as part of preparation for various forthcoming writings, calls this a ‘datalogical’ view of the world. Everything is abstracted as computable points, lines and polygons (or rasters). Such data is abstracted from the ‘infological’ view of the world, as understood by the humanities. As Leszczynski puts it: “The conceptual errors and semantic ambiguities of representation in the infological world propagate and assume materiality in the form of bits and bytes”[1]. It is this process of assumption that a good DH module on digital mapping must address.

In the course of this module I have also become aware of important intellectual gaps in this sort of provision. Nowhere, for example, in either the OS or Geofabrik datasets, is there information on British public Rights of Way (PROWs). I’m going to be needing this data later in the summer for my own research on the historical geography of corpse roads (more here in the future, I hope). But a bit of Googling turned up the following blog reply from OS at the time of the OS data release in April 2010:

I’ve done some more digging on ROW information. It is the IP of the Local Authorities and currently we have an agreement that allows us to include it in OS Explorer and OS Landranger Maps. Copies of the ‘Definitive Map’ are passed to our Data Collection and Management team where any changes are put into our GIS system in a vector format. These changes get fed through to Cartographic Production who update the ROW information within our raster mapping. Digitising the changes in this way is actually something we’ve not been doing for very long so we don’t have a full coverage in vector format, but it seems the answer to your question is a bit of both! I hope that makes sense![2]

So… teaching GIS in the arcane backstreets of the (digital) spatial humanities still means seeing what is not there due to IP as well as what is.

[1] Leszczynski, Agnieszka. “Quantitative Limits to Qualitative Engagements: GIS, Its Critics, and the Philosophical Divide.” The Professional Geographer 61.3 (2009): 350-365.


To crowd-source or not to crowd-source

January 6, 2013

Shortly before Christmas, I was engaged in discussion with a Swedish-based colleague about crowd-sourcing and the humanities. My colleague – an environmental archaeologist – posited that it could be demonstrated that crowd-sourcing was not an effective methodology for his area. Ask randomly selected members of the public to draw a Viking helmet. You would get a series of not dissimilar depictions – a sort of pointed or semi-conical helmet, with horns on either side. But Viking helmets did not have horns.

Having recently published a report for the AHRC on humanities crowd-sourcing, a research review which looked at around 100 publications, and about the same number of projects, activities, blogs etc, I would say the answer to this apparent fault is: don’t identify Viking helmets by asking the public to draw them. Obvious as this may sound, it is in fact just a simple example of a complex calculation that needs to be carried out when assessing if crowd-sourcing is appropriate for any particular problem. Too often, we found in our review, crowd-sourcing was used simply because there was a data resource there, or some infrastructure which would enable it, and not because there was a really important or interesting question that could be posed by engaging the public – although we found honourable exceptions to this. Many such projects contributed to the workshop we held last May, which can be found here. To help identify which sorts of problems would be appropriate, we have developed – or rather, since this will undoubtedly evolve in the future, I should say we are developing – a four-facet typology of humanities crowd-sourcing scenarios. These facets are asset type (the content or data forming the subject of the activity), process type (what is done with that content), task type (how it is done), and output type (the thing, resource or knowledge produced). What we are now working on is identifying – or trying to identify – examples of how these might fit together to form successful crowd-sourcing workflows.

To put it in the terms of my friend’s challenge: an accurate image of a Viking helmet is not an output which can be generated by setting creative tasks to underpin the process of recording and creating content, and the ephemeral and unanchored public conception of what a Viking helmet looks like is not an appropriate asset to draw from. Obvious as this may sound, it hints that a systematic framework for identifying where crowd-sourcing will, and won’t, work, is methodologically possible. And this could, potentially, be very valuable as the humanities face increasing interest from well-organized and well-funded citizen science communities such as Zooniverse (which already supports and facilitates several of the early success stories in humanities crowd-sourcing such as Ancient Lives and OldWeather).
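The four facets can be sketched as a simple record, which makes it easier to see how a scenario such as the Viking helmet one decomposes. This is a minimal illustration in Python, not part of the AHRC report: the class name `CrowdsourcingScenario`, its field names and the free-text values are assumptions chosen for this example, paraphrasing the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrowdsourcingScenario:
    """One humanities crowd-sourcing scenario described by four facets.

    Hypothetical structure; the facet values are free text here.
    """
    asset_type: str    # the content or data forming the subject of the activity
    process_type: str  # what is done with that content
    task_type: str     # how it is done
    output_type: str   # the thing, resource or knowledge produced

    def describe(self) -> str:
        """One-line summary of how the four facets fit together."""
        return (
            f"{self.task_type} tasks applied to {self.asset_type}, "
            f"through {self.process_type}, to produce {self.output_type}"
        )


# The Viking helmet challenge, expressed in the four facets.
viking_helmet = CrowdsourcingScenario(
    asset_type="the public conception of a Viking helmet",
    process_type="recording and creating content",
    task_type="creative",
    output_type="an accurate image of a Viking helmet",
)

summary = viking_helmet.describe()
```

Written out this way, the mismatch is visible at a glance: the asset cannot support the output, whatever task is set in between.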

This of course raises a host of other issues. How on earth can peer-review structures cope with this, and should they try to? What motivates the public, and indeed academics, to engage with crowd-sourcing? We hint at some answers. Transparency and documentation are essential for the former area, and we found that in the latter, most projects swiftly develop a core community of very dedicated followers who undertake reams of work, but – possibly like many more conventional collaborations – finding those people, or letting them find you, is not always easy.

The final AHRC report is available: Crowdsourcing-connected-communities.