Archive for the ‘workshops’ Category

Research questions, abstract problems – a round table on Citizen Science

February 26, 2017

I recently participated in a round-table discussion entitled “Impossible Partnerships”, organized by The Cultural Capital Exchange at the Royal Institution, on the theme of Citizen Science; the Impossible Partnerships of the title being those between the academy and the wider public. It is always interesting to attend citizen science events – I get so caught up in the humanities crowdsourcing world (such as it is) that it’s good to revisit the intellectual field that it came from in the first place. This is one of those blog posts whose main aim is to organize my own notes and straighten my own thinking after the event, so don’t read on if you are expecting deep or profound insights.


Crucible of knowledge: the Royal Institution’s famous lecture theatre

Galaxy Zoo of course featured heavily. This remains one of the poster-child citizen science projects, because it gets the basics right. It looks good, it works, it reaches out to build relationships with new communities (including the humanities), and it is particularly good at taking what works and configuring it to function in those new communities. We figured that one of the common factors that keeps it working across different areas is its success in tapping into the intrinsic motivations of people who are interested in the content – citizen scientists are interested in science. There is also an element of altruism involved, giving one’s time and effort for the greater good – but one point I think we agreed on is that it is far, far easier to classify the kinds of task involved than the people undertaking them. This was our rationale in that 2012 scoping study of humanities crowdsourcing.

A key distinction was made between projects which aggregate or process data, and those which generate new data. Galaxy Zoo is mainly about taking empirical content and aggregating it, in contrast, say, to a project that seeks to gather public observations of butterfly or bird populations. This could be a really interesting distinction for humanities crowdsourcing too, but one which becomes problematic where one type of question leads to the other. What if content is processed/digitized through transcription (for example), and this seeds ideas which lead to amateur scholars generating blog posts, articles, discussions, ideas, books etc.? Does this sort of thing happen in citizen science? (Genuine question – maybe it does.) So this is one of those key distinctions between citizen science and citizen humanities. The raw material of the former is often natural phenomena – bird populations, raw imagery of galaxies, protein sequences – but in the latter it can be digital material that “citizen humanists” have created from whatever source.

Another key question which came up several times during the afternoon was the nature of science itself, and how citizen science relates to it. A professional scientist will begin an experiment with several possible hypotheses, then test them against the data. Citizen scientists do not necessarily organize their thinking in this way. This raises the question: can the frameworks and research questions of a project be co-produced with public audiences? Or do they have to be determined by a central team of professionals, and farmed out to wider audiences? This is certainly the implication of Jeff Howe’s original framing of crowdsourcing:

“All these companies grew up in the Internet age and were designed to take advantage of the networked world. … [I]t doesn’t matter where the laborers are – they might be down the block, they might be in Indonesia – as long as they are connected to the network.

Technological advances in everything from product design software to digital video cameras are breaking down the cost barriers that once separated amateurs from professionals. … The labor isn’t always free, but it costs a lot less than paying traditional employees. It’s not outsourcing; it’s crowdsourcing.”

So is it the case that citizen science is about abstract research problems – “are golden finches as common in area X now as they were five years ago?” rather than concrete research questions – “why has the population of golden finches declined over the last five years?”

For me, the main takeaway was our recognition that citizen science and “conventional” science are not, and should not try to be, the same thing, and should not have the same goals. The important thing in citizen science is not to focus on the “conventional” scientific outcomes of good, methodologically sound and peer-reviewable research – that is, at most, an incidental benefit – but on the relationships between professional academic scientists and non-scientists it creates, and how these can help build a more scientifically literate population. The same should go for the citizen humanities. We can all count bird populations, we can all classify galaxies, we can all transcribe handwritten text, but the most profitable goal for citizen science/humanities is a more collaborative social understanding of why doing so matters.


CAA1 – The Digital Humanities and Archaeology Venn Diagram

April 1, 2012

The question ‘what is the digital humanities’ is hardly new; nor is discussion of the various epistemologies of which the digital humanities are made. However, discussion of the relationship which archaeology has with the digital humanities – whatever the epistemology of either – has been curiously lacking. Perhaps this is because archaeology has such strong and independent digital traditions, and such a set of well-understood quantitative methods, that the close analysis of those traditions – familiar to readers of Humanist, say – seems redundant. However, at the excellent CAA International conference in Southampton last week, there was a dedicated round-table session on the ‘Digital Humanities/Archaeology Venn Diagram’, in which I was a participant. This session highlighted that the situation is far more nuanced and complex than it first seems, as is so often the case with digital humanities.

A Venn Diagram, of course, assumes two or more discrete groups of objects, where some objects contain the attributes of only one group, and others share attributes of multiple groups. So – assuming that one can draw a Venn loop big enough to contain the digital humanities – what objects do they share with archaeology? As I have not been the first to point out, digital humanities is mainly concerned with methods. This, indeed, was the basis of Short and McCarty’s famous diagram. The full title of CAA – Computer Applications and Quantitative Methods in Archaeology – suggests that a methodological focus is one such object shared by both groups. However, unlike the digital humanities, archaeology is concerned with a well-defined set of questions. Most, if not all, of these questions derive from ‘what happened in the past?’. Invariably the answers lie, in turn, in a certain class of material; and indeed we refer collectively to this class as ‘material culture’. And digital methods are a means that we use to the end of getting at the knowledge that comes from interpretation of material culture.

The digital humanities have a much broader shared heritage which, as well as being methodological, is also primarily textual. This fact is illustrated by the main print publication in the field being called Literary and Linguistic Computing. It is not, I think, insignificant as an indication of how things have moved on that a much more recently (2007) founded journal has the less content-specific title Digital Humanities Quarterly. This, I suspect, is related to the reason why digitisation so often falls between the cracks in the priorities of funding agencies: there is a perception that the world of printed text is so vast that trying to add to the corpus incrementally would be like painting the Forth Bridge with a toothbrush (although this doesn’t affect my general view that the biggest enemy of mass digitisation today is not FEC or public spending cuts, but the Mauer im Kopf, the ‘wall in the head’, formed by notions of data ownership and IPR). The digital humanities are facing a tension, as they always have, between variable availability of digital material, and the broad access to content that any porting over to the ‘digital’ that the word ‘humanities’ implies. As Stuart Jeffrey’s talk in the session made clear, the questions facing archaeology are more about what data archaeologists throw away: the emergence of Twitter, for example, gives an illusion of ephemerality, but every tweet adds to the increasing cloud of noise on the internet; and those charged with preserving the archaeological record in digital form must decide where the noise ends and the record begins.

There is also the question of what digital methods *do* to our data. Most scholars who call themselves ‘digital humanists’ would reject the notion that textual analysis, which begins with semantic and/or stylometric mark-up, is a purely quantitative exercise; they would hold that qualitative aspects of reading and analysis arise from, and challenge, the additional knowledge which is imparted to a text in the course of encoding by an expert. However, as a baseline, it is exactly the kind of quantitative reading of primary material which archaeology – going back to the early 1990s – characterized as reductionist and positivist. Outside the shared zone of the Venn diagram, then, must be considered the notions of positivism and reductionism: they present fundamentally different challenges to archaeological material than they do to other kinds of primary resource, certainly including text, but also, I suspect, to other kinds of ‘humanist’ material as well.

A final point which emerged from the session is the disciplinary nature(s) of archaeology and the digital humanities themselves. I would like to pose the question as to why the former is often expressed as a singular noun whereas the latter is a plural. Plurality in ‘the humanities’ is taken implicitly. It conjures up notions of a holistic liberal arts education in the human condition, taking in the fruits of all the arts and sciences in which humankind has excelled over the centuries. But some humanities are surely more digital than others. Some branches of learning, such as corpus linguistics, lend themselves to quantitative analysis of their material. Others tend towards the qualitative, and need to be prefixed by correspondingly different kinds of ‘digital’. Others are still more interpretive, with their practitioners actively resisting ‘number crunching’. Therefore, instead of being satisfied with ‘The Digital Humanities’ as an awkward collective noun, maybe we could look to free ourselves of the restrictions of nomenclature by recognizing that we can’t impose homogeneity, and nor should we try to. Maybe we could even extend this logic, and start thinking in terms of ‘digital archaeologies’; of branches of archaeology which require (e.g.) archiving, communication, semantic web, UGC and so on; and some which don’t require any. I can’t doubt that the richness and variety of the conference last week is the strongest argument possible for this.

CeRch seminar: Webometric Analyses of Social Web Texts: case studies Twitter and YouTube

October 24, 2011

Herewith a slightly belated report of the recent talk in the CeRch seminar series given by Professor Mike Thelwall of Wolverhampton University. Mike’s talk, Webometric Analyses of Social Web Texts: case studies Twitter and YouTube, concerned getting useful information out of social media, primarily by social science means: information, specifically, about the sentiment of the communications on those platforms. His group produces software for text-based information analysis, making it easy to gather and process large-scale data, focusing on Twitter, YouTube (especially the textual comments), the web in general, the Technorati blog search engine, and Bing. This shows how a website is positioned on the web, and gives insights as to how its users are interacting with it.

In sentiment analysis, a computer programme reads text and predicts whether it is positive or negative in flavour, and how strongly that positivity or negativity is expressed. This is immensely useful in market research, and is widely employed by big corporations. It also goes to the heart of why social media work: they function well with human emotions, and the analysis tracks what role sentiments play in them. The sentiment analysis engine is designed for text that is not written with good grammar. At its heart is a list of 2,489 terms which are either normally positive or negative. Each has a ‘normal’ value, and ratings of -2 to -5. Mike was asked if it could be adapted to slang words, which often develop, and sometimes recede, rapidly. Experience is that it copes well with changing language over time – new words don’t have a big impact in the immediate term. However, the engine does not appear to work with sarcastic statements which, linguistically, might have diction opposite to their meaning, nor with (for example) ‘typical British understatement’. This means that it does not work very well for news fora, where comments are often sarcastic and/or ironic (e.g. ‘David Cameron must be very happy that I have lost my job’). There is a need for contextual knowledge – e.g. ‘This book has a brilliant cover’ means ‘this is a terrible book’, in the context of the phrase ‘don’t judge a book by its cover’. Automating the analysis of such contextual minutiae would be a gigantic task, and the project is not attempting to do so.
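To make the sarcasm problem concrete, here is a minimal sketch of a lexicon-based scorer in Python. It is not the engine Mike described: the six-word lexicon, the strength values and the `score` function are all invented for illustration, and the only assumption carried over from the talk is the general idea of looking words up in a list of positive and negative terms.

```python
import re

# Hypothetical toy lexicon. The engine described in the talk has 2,489 terms;
# these words and strength values are made up purely for illustration.
LEXICON = {
    "happy": 3,
    "brilliant": 4,
    "love": 3,
    "terrible": -4,
    "hate": -4,
    "lost": -2,
}


def score(text):
    """Return (strongest positive, strongest negative) term strength in text.

    Words are matched one at a time with no context, which is exactly why
    sarcasm and irony defeat this kind of approach.
    """
    words = re.findall(r"[a-z']+", text.lower())
    strengths = [LEXICON[w] for w in words if w in LEXICON]
    positive = max((s for s in strengths if s > 0), default=0)
    negative = min((s for s in strengths if s < 0), default=0)
    return positive, negative


if __name__ == "__main__":
    print(score("This book has a brilliant cover"))
    print(score("David Cameron must be very happy that I have lost my job"))
```

Under these made-up values the book-cover sentence comes out as purely positive, and the Cameron sentence registers a positive term stronger than its negative one, although a human reader takes both as negative. Word lookup alone has no access to the context that flips the meaning.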

Mike also discussed the Cyberemotions project. This looked at peaks of individual words in Twitter, e.g. Chile, when the earthquake struck in February 2010. As might be expected, positivity decreased. But negativity increased only by 9%: it was suggested that this might have been to do with praise for the response of the emergency services, or good wishes to the Chilean people. Also, the very transience of social media means that people might not need to express sentiment one way or another. For example, simply mentioning the earthquake and its context would be enough to convey the message the writer needed to convey. Mike also talked about the sentiment engine’s analysis of YouTube. As a whole, most YouTube comments are positive; however, those individual videos which provoke many responses are frequently negatively viewed.

Try the sentiment engine (www. One wonders if it might be useful in XML/RDF projects such as SAWS, or indeed to book reviews on publications such as

Digital Classicist: Classical studies facing digital research infrastructures: from practice to requirements

July 11, 2011

Apologies are due to Agiatis Benardou. I am a couple of weeks late posting my discussion of her paper in the Digital Classicist Seminar Series, Classical studies facing digital research infrastructures: from practice to requirements. Agiatis is from the Digital Curation Unit, part of the “Athena” Research Centre, and her talk focused in the main on the preparatory phase of DARIAH, the European Arts and Humanities Research Infrastructure project. She began by outlining her own research background in Classics, which contained very little computing (it surely can’t be coincidence that the digital humanities is so full of former and practicing archaeologists and classicists).

DARIAH is a technical and conceptual project, with the aim of providing a research infrastructure for the Arts and Humanities across Europe. In practice, it is an umbrella for other projects, involving a big effort in the areas of law and finance, as well as technical infrastructure. A key part of this is to ensure that scholars in the arts and humanities are supported at each stage of the research lifecycle. This means ensuring that the requirements at each stage are understood. The DCU was part of the technical workpackage in DARIAH, and was tasked with doing this. Its approach was to develop a conceptual framework to map user requirements using an abstract model to represent the information practices within humanities research.

This included an empirical study of scholarly research activity. The main form of data collection was interviews with humanities scholars. The design of the study included transcription, coding and analysis of recordings of these interviews. Context was provided by a good deal of previous work in this area, in the form of user studies of information browsing behaviour. In the 1980s, this carried the assumption that most humanists were ‘lone scholars’, with little interest in, or need for, collaborative practices. This however gave way to an increasingly self-critical awareness of how humanists work, highlighting practices such as annotation, which *might* be for the consumption of the lone scholar, but which equally might be a means of communicating interpretation and thinking. This in turn led to a consideration of scholarly primitives – low-level, basic things humanists do both all the time and – often – at the same time. Agiatis cited the six types of information retrieval behaviour identified by D. Ellis, as revisited for the humanities by John Unsworth: Discovering, associating, comparing, referring, sampling, illustrating and representing.

The DCU’s aim was to produce a map of who does what and how. If one has a research goal, for example to produce a commentary on Homer, what are the scholarly activities that one would need to achieve that, and what processes do those activities involve? To this end, Agiatis highlighted the following aspects that need to be mapped: Actor (researcher), Research activity, Research goal, information object, tool/service, format, and resource type. The properties that link these include hasType, Creates, partOf, Searches, refersTo and Scholarly Activity.
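One way to picture such a map is as a set of subject-property-object statements. The Python sketch below is my own illustration, not the DCU’s model: the entity types and property names are the ones listed above, but every instance (the manuscript images, the collation activity), the `Triple` structure, the two helper functions and the choice of which property links which pair of things are assumptions made up for the Homer example.

```python
from collections import namedtuple

# A statement linking two things via one of the properties named in the talk.
Triple = namedtuple("Triple", ["subject", "prop", "obj"])

# Hypothetical instances for the Homer-commentary example. Which property
# connects which kinds of entity is an assumption, not the DCU's definition.
TRIPLES = [
    Triple("researcher", "hasType", "Actor"),
    Triple("commentary on Homer", "hasType", "Research goal"),
    Triple("collating manuscripts", "hasType", "Research activity"),
    Triple("collating manuscripts", "partOf", "commentary on Homer"),
    Triple("manuscript images", "hasType", "Information object"),
    Triple("researcher", "Searches", "manuscript images"),
    Triple("researcher", "Creates", "annotated text"),
    Triple("annotated text", "refersTo", "manuscript images"),
]


def objects_of(subject, prop):
    """Everything that `subject` is linked to via `prop`."""
    return [t.obj for t in TRIPLES if t.subject == subject and t.prop == prop]


def activities_for(goal):
    """Activities recorded as partOf the given research goal."""
    return [t.subject for t in TRIPLES if t.prop == "partOf" and t.obj == goal]


if __name__ == "__main__":
    print(activities_for("commentary on Homer"))
    print(objects_of("researcher", "Creates"))
```

Even at this toy scale, the question ‘what activities does this goal involve?’ becomes a query over the statements rather than a matter of reading a static diagram.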

A meaningful map of these processes must include meaningful descriptions of information types. DARIAH therefore has to embrace multiple interconnected objects that need to be identified, represented, and managed, so they can be curated and reached throughout the digital research lifecycle. In this regard, there is a distinction that is second nature to most archaeologists, between the visual representation of information, and hands-on access to objects.

The main interest of Agiatis’s paper for me was the possibilities the DCU’s approach holds for specific research problems. One could easily see, for example, how the Methods Taxonomy could be better represented as a set of processes rather than as a static group of abstract entities, as it is at the moment. But if one could specify the properties of a particular purpose, the approach would be even more useful: for example one could test the efficacy of augmented reality by mapping the ways scholars engage with and use AR environments.

End of project MiPP workshop

July 10, 2011

I was at the closing MiPP project workshop in Sussex last week. Due to a concatenation of various circumstances, I had to take a large broomstick, which will be used in next week’s motion capture exercises at Butser Farm and in Sussex, on a set of trains from Reading, via the EVA London 2011 conference in Central London, to the workshop in Falmer, Sussex. Given this thing is six feet tall and required its own train seat (see picture), I got a variety of looks from my fellow passengers, especially on the Underground, ranging from suspicion to pity to humour. I imagined how one might have handled a conversation: ‘There’s a logical explanation. Yes, it’s going to be used as a prop in an experiment to test the environment of Iron Age round houses in cyberspace versus the real thing in the present day.’ ‘Oh yes? And that’s your idea of a logical explanation is it?’

Of course I could have really freaked people out by getting off the train at Gatwick Airport and wandering around the terminal, asking for directions to the runway.

As with the entire MiPP project, the workshop was highly interdisciplinary. A varied set of presentations included ones from Bernard Frischer of the University of Virginia, on digital representation of sculpture, and from colleagues at Southampton on the fantastic PATINA project, all of which coalesced around questions of process, and how we represent it. Tom Frankland’s presentation on studying archaeological processes, including such offsite considerations as the difference between note taking in the lab and in the field, filled in numerous gaps of documentation that our work at Silchester last summer left.

When I got to my feet on day two to present, I veered slightly off my promised topic (as with most presentations I have ever given) and elected instead to reflect on the nature of remediated archaeological objects. I would suggest that there is a three-way continuum on which any digital object representing an archaeological artefact or process may be plotted: the empirical, the interpretive and the conjectural. An empirical statement, such as Dr. Peter Reynolds, the founder of Butser Farm, would have approved of, might state that ‘the inner ring of this round house comprised twelve upright posts, because we can discern twelve post holes in ring formation’. An interpretative conclusion might be built on top of this, stating that, because ceramic sherds were found in the post hole, cooking and/or eating took place near to this inner ring. This could in turn lead to a conjecture that a particular kind of meat was cooked in a particular way at this location, based not on interpretation or empirical evidence immediately to hand, but on the general context of the environment, and on what is known more broadly about Iron Age domestic practice.

More on all this next week, after capture sessions at Butser.