There’s an interesting discussion going on in the Forum on Information Standards in Heritage (FISH) list, on which I have been lurking, about standardizing grey literature. For the non-cognoscenti, grey literature consists of reports about heritage objects and activities (at least in this context; the term is known in other areas too), especially archaeological excavations, which are not widely available and therefore not widely read. The thread was started by the data standards section of English Heritage, with the aim of establishing how the heritage community might go about standardizing the reporting process, thereby making the grey literature more accessible. Numerous approaches have been discussed. Catherine Hardman of the ADS, for example, mentioned the A&H e-Science programme-funded Archaeotools project, which uses natural language processing to index grey lit. by ‘what’, ‘where’ and ‘when’ entities after it has been deposited. This, of course, depends on the resource being digital in the first place (or having been digitized), which it may not be. The virtues of paper record keeping have also been aired in the discussion, and clearly no one is suggesting doing away with it.

My own thoughts: the word ‘literature’ of course implies a fundamentally non-digital way of doing things. Leif Isaksen has raised, on the list, the importance of ‘grey data’, and I think this raises fundamental issues of *how the reports are compiled*, or rather how they could or should be. As I and others have discussed elsewhere, the process of gathering data in archaeology and heritage is faced with many new digital opportunities, the good old VERA project at Reading being a case in point. Perhaps some elements of grey literature – and only some, before anyone writes any angry comments – namely those reports which document projects that are already gathering significant amounts of data digitally, should be seen as documentation and interpretation of that data, drawing perhaps on some of the good practice models of the old AHDS. This would enable the depositor to ‘tie’ the report to whatever format the data may be in – photos, GIS/GPS points, spreadsheets, etc. ‘Standardization’, such as it is, could then be drawn from schema types such as RDF. In such cases, why go through the fundamentally ‘literary’ process of compiling a grey literature report, when something much more lightweight could and should be possible in the digital age?
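
To make the idea of ‘tying’ a report to its data a little more concrete, here is a minimal sketch in Python of what such a lightweight record might look like, written out as RDF triples in N-Triples form. It is an illustration only, not a proposed standard: the `example.org` URIs, the file names and the helper functions are all hypothetical, and the choice of the Dublin Core terms `title`, `references` and `format` is my own assumption about what a minimal description could use.

```python
# Minimal sketch: describe a report and the data files it documents as
# N-Triples lines. All example.org URIs and file names are hypothetical.
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal (only backslashes and quotes are escaped)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_report(report_uri, title, datasets):
    """Return N-Triples lines tying a report to the data files it interprets."""
    lines = [f"{uri(report_uri)} {uri(DCT + 'title')} {literal(title)} ."]
    for data_uri, media_type in datasets:
        # The report references the data file ...
        lines.append(f"{uri(report_uri)} {uri(DCT + 'references')} {uri(data_uri)} .")
        # ... and the data file declares what format it is in.
        lines.append(f"{uri(data_uri)} {uri(DCT + 'format')} {literal(media_type)} .")
    return lines


triples = describe_report(
    "http://example.org/reports/site-42",
    "Evaluation report, Trench 3",
    [
        ("http://example.org/data/site-42/finds.csv", "text/csv"),
        ("http://example.org/data/site-42/trench3.gpx", "application/gpx+xml"),
    ],
)
print("\n".join(triples))
```

The point is simply that the ‘report’ here is a thin layer of description and interpretation sitting on top of the photos, GPS points or spreadsheets, rather than a free-standing literary document.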


    Stuart, on reading this and the FISH thread I am thinking of the guest blogs that Ant Beck wrote for the Open Knowledge Foundation, working forwards from his talk at 2010’s OKCon. Follow the links for parts II and III.

    This talks in part about recording and analysing observations as they happen, technologically, during an archaeological dig, and about how GIS work, if done nearer to the time and place of investigation and shared sooner and more widely, could deeply influence later interpretation. Less “decoupled synthesis” and more preservation of the sources and also, as you pointed out in an earlier blog, the methods needed to re-create the process that the sources go through during interpretation…

    Not sure if “grey data” is a helpful term; perhaps it will be. It seems to have a wider context than grey literature, e.g. stretching all the way through institutions themselves, actively or passively present as an artefact of how things are run and managed. The concern here is that you end up with “shadow systems”: that is to say, no externally-observable data at all, with information transported by less traceable and recordable media like phone calls, in reaction to an increased peer pressure towards openness…

    Jo, many thanks for the comment, and for pinging Ant’s excellent reviews of STARS and DART. I would say all data is ‘grey’ to one extent or another and, at risk of stretching the metaphor too far, maybe there is more than one shade of grey to contend with.

    Two quick responses – one key thing this highlights is the assumption in this discussion that *all* information in grey literature is in some way equally valid, and should be preserved. Anyone whose desk, like mine, is characterized by high-rise tower blocks of documents, ‘filed’ there on some vague expectation that I might need to look back at them one day, will be able to relate to this mentality. However, as Ant points out in the posting, grey lit is ‘tertiary data (an extraction of synthetic data derived from the primary record)’, which highlights that we approach it much as we have always approached university libraries – ‘universal’ collections of prepared scholarly material, there for all to mine using tools we give them (with a nod to the fact that much grey literature is not produced in universities at all). This sounds great, but then I look at those piles on my desk, and wonder if the assumption that EVERYTHING needs to be preserved might bear some re-examination, and if instead we might use standards in grey literature to filter the things we need to keep from the things we don’t.

    The second thing – and in retrospect, I’m surprised that this didn’t figure more prominently in the FISH discussion – is the issue of how trustworthy this stuff is. No aspersions at all are being cast on the authors of the reports or their organizations, but as has been pointed out, these are highly focused and specific documents, compiled for very specific and prescribed reasons, and without the kind of heavy-duty peer review you get in conventionally published material. Ant mentions the example of a search criterion in STARS of ‘Post holes that contain ritual deposits’. In many contexts, especially prehistoric ones, ascribing ‘ritual’ characteristics to sites, features or artefacts is frequently tentative and not universally agreed upon by other specialists. Might any kind of structured grey literature system risk ‘hard coding’ such assumptions into the discourse and giving them undue authority (which in most cases, the authors would never have intended)?

