Tuesday, 11 November 2014

Seth van Hooland and Ruben Verborgh: Linked Data for Libraries, Archives and Museums (2014)

Edition: Facet (2014)
Review number: 1490

Linked Data has been a buzzword for a couple of years now, and is consistently the topic of questions I receive at work. This is usually due to someone reading an advocacy post somewhere, which leads them to ask "Could we fix all our resource discovery problems by releasing our metadata in linked data form?" So this is a timely publication, even though much of it simply backs up the reasons I have for being cautious about advocating such an approach myself.

The authors clearly aim Linked Data for... at a non-technical librarian - and I am an IT professional who has worked in a library for most of my working life, so not their prime audience. Van Hooland and Verborgh are clearly working hard to deliver their material in the way best suited to their target group. So, for instance, while there is some description of how different approaches to metadata work (tabular, relational, structured, linked), this is really only enough to support a discussion of the pros and cons of the different methods. It is clear that this is not a book which advocates Linked Data for the sake of it, but one which wants to make it possible for readers to evaluate for themselves whether it is a good approach for a particular metadata collection. This is a refreshingly mature approach, as unthinking blanket promotion of the current buzzword technologies (from "Internet" and "XML" to, more recently, "social" and "cloud") is one of the main reasons why people grow disillusioned: the race to move to the new paradigm is run whether or not it is appropriate in an individual case.

The fast-moving nature of the linked data community means, unfortunately, that parts of the book are already obsolete. Some of the case studies and useful websites discussed already lead to blank pages or errors, or to material which differs from the description in the text. This is inevitable in a book on this topic, but it does reduce the usefulness and impact of the book.

Much of the book resonated strongly with my experience of working on projects which considered using or producing linked data, or which actually created it. It is clear that understanding and improving the metadata involved is absolutely key to a successful release of a linked data version of an existing data set, and so the main chapters are successively concerned with cleaning, reconciling, enriching, and publishing metadata. While linked data is the motivating factor of the discussion, much of it is likely to be of interest to any data set manager who is looking to improve the metadata they hold. Each chapter is accompanied by a real-world case study, which is useful as a pointer to how the more theoretical ideas can be implemented in a specific scenario. I did feel that some of the discussion which revolves around the use of specific software (for such tasks as enriching metadata) was perhaps too closely tied to something which is unlikely to remain constant, but in general the case studies are an excellent part of the book. In a few years, the software discussed may no longer be available, may have changed name, or (most likely) may have been changed and updated so that the discussion of it is no longer applicable. Any book which extensively references work under development or online has this problem, of course, so this is not a criticism specific to this book.

Having an IT rather than librarianship background, I did find that some things which I was already familiar with were treated in more detail than I needed, especially the slightly heavy-handed advocacy of REST as an API architecture in the final section on the publication of data. I suspect I would have felt this even if I wasn't already familiar with REST, so this was a rare instance of the authors not pitching their discussion at the right level. The advocacy takes up so much of the publishing section that other issues which may be important (such as infrastructure requirements and the use of analytics) are basically ignored, which seemed to me to make this section less valuable. (The importance which the authors give publishing is perhaps indicated by the 44 pages they give it, as opposed to the more than 180 which are used to discuss the metadata aspects of linked data.)

Did the book help me to answer the questions that people throw at me? Probably not. But it does confirm the caveats I have, which include data quality, the paucity of existing links inside the data, and the need to enrich data before publishing. The key question is why it is worth exposing a particular dataset, and the answer must rest on the value of the use cases for the data, not on the fact that it's something everyone is doing. A good introduction for librarians, if falling a bit short in the final stages. I rate the book at 7/10.