Monday, 27 September 2010

Fun with metadata

Many – possibly all of us – have long lists of police officers on our websites, mostly those attached to Safer Neighbourhood Teams. They’re a pain to maintain – but our usage stats show that members of the public do find them useful.

Here’s one of ours:

http://www.northyorkshire.police.uk/index.aspx?articleid=1540

I’ve always felt the way we published ours was a bit unsophisticated. Essentially, each officer is an article managed and rendered by our content management system, but there’s no way to extract any meaningful information from the profile other than by manually copying and pasting.

In an effort to improve this, I spent a few hours rummaging around in our officer “contact” template. We use GOSS iCM, and our contact template looks something like this:

<h2><!--#Contact[CONTACT.NAME.TITLE]--> <!--#Contact[CONTACT.NAME.FIRSTNAME]--> <!--#Contact[CONTACT.NAME.LASTNAME]--></h2>

<div class="label">Address:</div>

<div class="value"><!--~Contact[Address]--></div>

I’ve cut it down for simplicity, but you get the idea. The things like #Contact[CONTACT.NAME.TITLE] are variables that the CMS reads from the individual officer article itself.

If the user views the page source, the HTML looks like this:

<h2>PC Ed Rogerson</h2>

<div class="label">Address:</div>

<div class="value">Harrogate Police Station North Park Road Harrogate North Yorkshire HG1 5PJ United Kingdom</div>

So the CMS knows that #Contact[CONTACT.NAME.TITLE] must be a rank (PCSO, Sergeant, whatever). But once the page is rendered into HTML, that “semantic” information is lost. The web browser just sees it as plain text (“PC” in this case).

What we want is some way to tell Google (and any other web services that are interested) that Ed Rogerson is a named person, with the rank of PC, who works at Harrogate Police Station.

Google recently started incorporating “semantic” data into its search results in certain circumstances. Search for Tamara Drewe (http://www.google.com/search?q=tamara+drew) and look for the IMDb result, but don’t click on it. You’ll see that Google has worked out who the director is, who the actors are, and even the average viewer rating.

This information is generated automatically, via metadata – and we can use exactly the same principle to tell Google more about our officers.

(Crime mapping was before my time, but I understand that some of the more advanced forces also populate maps.police.uk with team data via XML. This is basically the same principle, but I don’t think it is a consistent standard that is of any use outside TeamDB.)

By adding metadata to our officer contact list, we can add useful information such as an individual’s contact phone number or even their beat ward. One day – we’re not there yet! – a user could type their street name into Google, and it would bring up their local officer (not simply because that street name appears in the officer’s profile, but because Google knows the geographic location of the officer’s beat).

The good news is that adding RDFa metadata is pretty straightforward. Using our code above as an example, here is the rendered HTML:

<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">

<h2><span property="v:title">PC</span> <span property="v:name">Ed Rogerson</span></h2>

<div class="label">Force:</div>

<div class="value"><span property="v:affiliation">North Yorkshire Police</span></div>

<div class="label">Address:</div>

<div class="value"><span property="v:address">Harrogate Police Station North Park Road Harrogate North Yorkshire HG1 5PJ United Kingdom</span></div>

</div>

The outer div declares the vocabulary (the xmlns:v attribute) and states, via typeof, that everything inside it describes a Person. Each piece of information (rank, name, address) is then wrapped in a span tag with a property attribute naming what it is. I also added an “affiliation” property in there, to tell Google that the individual works for North Yorkshire Police.
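To make the mapping from officer record to marked-up HTML explicit, here is a minimal Python sketch. It is not how GOSS iCM renders templates; the officer dictionary and the span and render_officer helpers are hypothetical names made up for illustration. It only shows the idea that each field ends up inside a span with the matching property.

```python
from html import escape

# Hypothetical officer record. In the CMS these values would come from the
# fields of the contact article (title, name, address and so on).
officer = {
    "title": "PC",
    "name": "Ed Rogerson",
    "affiliation": "North Yorkshire Police",
    "address": "Harrogate Police Station North Park Road Harrogate "
               "North Yorkshire HG1 5PJ United Kingdom",
}


def span(prop, value):
    """Wrap one value in a span carrying a property from the v: vocabulary."""
    return f'<span property="v:{prop}">{escape(value)}</span>'


def render_officer(record):
    """Return the officer profile as RDFa-annotated HTML (illustrative only)."""
    return "\n".join([
        '<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">',
        f'<h2>{span("title", record["title"])} {span("name", record["name"])}</h2>',
        '<div class="label">Force:</div>',
        f'<div class="value">{span("affiliation", record["affiliation"])}</div>',
        '<div class="label">Address:</div>',
        f'<div class="value">{span("address", record["address"])}</div>',
        '</div>',
    ])


print(render_officer(officer))
```

In a real template the same effect comes from putting the span tags around the existing CMS variables, so the markup is written once and every officer page picks it up.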

You can easily see how this works in the real world by running the page through Google’s Rich Snippets Testing Tool.

Ignore the breadcrumb stuff (although that’s quite clever too) and you’ll see that Google now recognises Ed as a human being with the following information attached to him:

Person

title = PC
name = Ed Rogerson
affiliation = North Yorkshire Police
address = Harrogate Police Station North Park Road Harrogate North Yorkshire HG1 5PJ United Kingdom
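If you want to check the markup without Google’s tool, the listing above can be roughly reproduced with a few lines of Python. This is a simplified sketch using the standard library’s html.parser, not a real RDFa processor: the PropertyExtractor class is a made-up name, and it assumes properties are never nested inside one another, which holds for the example markup.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect the text inside every element that has a property attribute."""

    def __init__(self):
        super().__init__()
        self.item_type = None   # value of the first typeof attribute seen
        self.properties = {}    # property name -> text content
        self._current = None    # property currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs and self.item_type is None:
            self.item_type = attrs["typeof"]
        if "property" in attrs:
            self._current = attrs["property"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a property-bearing element.
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Assumption: properties are not nested, so any end tag closes one.
        self._current = None


markup = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
<h2><span property="v:title">PC</span> <span property="v:name">Ed Rogerson</span></h2>
<div class="label">Force:</div>
<div class="value"><span property="v:affiliation">North Yorkshire Police</span></div>
<div class="label">Address:</div>
<div class="value"><span property="v:address">Harrogate Police Station North Park Road Harrogate North Yorkshire HG1 5PJ United Kingdom</span></div>
</div>
"""

parser = PropertyExtractor()
parser.feed(markup)
parser.close()

print(parser.item_type)
for name, value in parser.properties.items():
    print(name, "=", value)
```

Run against the example markup, it reports the type v:Person and the same four properties the testing tool lists, with the v: prefix still attached to each name.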

This is all pretty new – if I had more time, I’m sure I could do a lot more with it. At the moment Google only uses RDFa in very specific circumstances, such as reviews and business directories, but it does look likely that they will expand it in the future. There’s lots of useful documentation here:

http://www.google.com/support/webmasters/bin/topic.py?hl=en&topic=21997

In short: assuming you aren’t afraid to get your hands dirty with some fairly simple HTML, adding metadata is a really straightforward tweak that will – in the not too distant future – make your police officer databases much more useful.

1 comment:

  1. Tom,

    Thanks for adding your blog to the NPWMG blog. Do you know if NYP contribute their local policing data to CrimeMapper via the XML feed or manually via the TeamDB database? I thought most forces ended up being able to submit their data via XML in the end.

    It’s interesting to debate whether we should share metadata within associated RSS/XML feeds or within the page code (or both)!
