Blog

Nature’s Metadata for Web Pages

Tony Hammond

Tony Hammond – 2008 May 19

In Metadata

Well, we may not be the first but wanted anyway to report that Nature has now embedded metadata (HTML meta tags) into all its newly published pages including full text, abstracts and landing pages (all bar four titles which are currently being worked on). Metadata coverage extends back through the Nature archives (and depth of coverage varies depending on title). This conforms to the W3C’s Guideline 13.2 in the Web Content Accessibility Guidelines 1.0 which exhorts content publishers to “provide metadata to add semantic information to pages and sites”.

Metadata is provided in both DC and PRISM formats as well as in Google’s own bespoke metadata format. This generally follows the DCMI recommendation “Expressing Dublin Core metadata using HTML/XHTML meta and link elements”, and the earlier RFC 2731 “Encoding Dublin Core Metadata in HTML”. (Note that the schema name is normalized to lowercase.) Some notes:

  • The DOI is included in the “dc.identifier” term in URI form, which is the Crossref recommendation for citing a DOI.
  • We could also consider adding “prism.doi” to disclose the native DOI form. This requires the PRISM namespace declaration to be bumped to v2.0. We might consider synchronizing this change with our RSS feeds, which are currently pegged at v1.2, although note that the RSS module mod_prism currently applies only to PRISM v1.2.
  • We could then also add a “prism.url” term to link back (through the DOI proxy server) to the content site. The namespace issue listed above still holds.
  • The “citation_” terms are not anchored in any published namespace, which makes this term set problematic for application reuse. It would be useful to be able to reference a namespace (e.g. “rel="schema.gs" href="..."”) for these terms and to cite them as e.g. “gs.citation_title”.

The HTML metadata sets from an example landing page are presented below.
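
For anyone wanting to try the conventions in the notes above, here is a minimal Python sketch. It is an illustration only, not Nature's actual markup: the function name `build_meta_tags`, the DOI `10.1000/example`, the `doi:` scheme used for the URI form, the PRISM v2.0 namespace URI and the `https://doi.org/` proxy address are all assumptions made for the example. The “citation_” terms are left out because the post gives no namespace for them.

```python
from html import escape

# Namespace URIs for the lowercase schema declarations.
# The PRISM URI is an assumption (v2.0 basic namespace); adjust as needed.
SCHEMAS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}


def build_meta_tags(doi, title, include_prism_doi=False):
    """Return HTML head lines carrying DC (and optionally PRISM) terms.

    Hypothetical helper: `doi` is the bare DOI string, `title` the page title.
    """
    # One <link rel="schema.xx"> declaration per term set, schema name lowercase.
    lines = [
        f'<link rel="schema.{prefix}" href="{escape(uri)}" />'
        for prefix, uri in SCHEMAS.items()
    ]

    terms = [
        ("dc.title", title),
        # DOI in URI form (the "doi:" scheme is an assumption here).
        ("dc.identifier", f"doi:{doi}"),
    ]
    if include_prism_doi:
        # Native DOI form, plus a link back through the DOI proxy server.
        terms.append(("prism.doi", doi))
        terms.append(("prism.url", f"https://doi.org/{doi}"))

    # Escape attribute values so titles with & or quotes stay valid HTML.
    lines += [
        f'<meta name="{name}" content="{escape(value)}" />'
        for name, value in terms
    ]
    return lines


if __name__ == "__main__":
    for line in build_meta_tags("10.1000/example", "An Example Article", True):
        print(line)
```

Keeping “prism.doi” and “prism.url” behind a flag mirrors the note that those two terms depend on bumping the PRISM namespace declaration to v2.0.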

Word Add-in for Scholarly Authoring and Publishing

Last week Pablo Fernicola sent me email announcing that Microsoft have finally released a beta of their Word plugin for marking-up manuscripts with the NLM DTD. I say “finally” because we’ve known this was on the way and have been pretty excited to see it. We once even hoped that MS might be able to show the plug-in at the ALPSP session on the NLM DTD, but we couldn’t quite manage it.

prism:doi

Tony Hammond

Tony Hammond – 2008 February 22

In Metadata

The new PRISM spec (v. 2.0) was published this week, see the press release. (Downloads are available here.)

This is a significant development as there is support for XMP profiles, to complement the existing XML and RDF/XML profiles. And, as PRISM is one of the major vocabularies being used by publishers, I would urge you all to go take a look at it and to consider upgrading your applications to use it.

One caveat. There’s a new element prism:doi (PRISM Namespace, 4.2.13) which sits alongside another new element prism:url (PRISM Namespace, 4.2.55). Unfortunately the prism:doi element is shown to take a DOI proxy URL as its value - and not the DOI string itself, e.g.

Crossref Citation Plugin (for WordPress)

OK, after a number of delays due to everything from indexing slowness to router problems, I’m happy to say that the first public beta of our WordPress citation plugin is available for download via SourceForge. A Movable Type version is in the works.

And congratulations to Trey at OpenHelix who became laudably impatient, found the SourceForge entry for the plugin back on February 8th and seems to have been testing it since. He has a nice description of how it works (along with screenshots), so I won’t repeat the effort here.

Having said that, I do include the text of the README after the jump. Please have a look at it before you install, because it might save you some mystification.

DC in (X)HTML Meta/Links

Tony Hammond

Tony Hammond – 2007 November 06

In Metadata

This message, posted yesterday to the dc-general list (extract follows), may be of interest:

“Public Comment on encoding specifications for Dublin Core metadata in HTML and XHTML

2007-11-05, Public Comment is being held from 5 November through 3 December 2007 on the DCMI Proposed Recommendation, “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” «http://dublincore.org/documents/2007/11/05/dc-html/» by Pete Johnston and Andy Powell. Interested members of the public are invited to post comments to the DC-ARCHITECTURE mailing list «http://www.jiscmail.ac.uk/lists/dc-architecture.html», including “[DC-HTML Public Comment]” in the subject line. Depending on comments received, the specification may be finalized after the comment period as a DCMI Recommendation.”

OpenDocument Adds RDF

Tony Hammond

Tony Hammond – 2007 October 14

In Metadata

Bruce D’Arcus left a comment here in which he linked to a post of his: “OpenDocument’s New Metadata System”. Not everybody reads comments so I’m repeating it here. His post is worth reading on two counts:

  1. He talks about the new metadata functionality for OpenDocument 1.2 which uses generic RDF. As he says:

     > _“Unlike Microsoft’s custom schema support, we provide this through the standard model of RDF. What this means is that implementors can provide a generic metadata API in their applications, based on an open standard, most likely just using off-the-shelf code libraries.”_

     This is great. It means that description is left up to the user rather than being restricted by any vendor limitation. (Ideally we would like to see the same for XMP. But Adobe is unlikely to budge because of the legacy code base and documents. It’s a wonder that Adobe still wants XMP to breathe.)

  2. He cites a wonderful passage from Rob Weir of IBM (something I had been meaning to blog about, but too late now) about the changing shape of documents. I can only say, go read Bruce’s post and then Rob’s post. But anyway a spoiler here:

     > _“The concept of a document as being a single storage of data that lives in a single place, entire, self-contained and complete is nearing an end. A document is a stream, a thread in space and time, connected to other documents, containing other documents, contained in other documents, in multiple layers of meaning and in multiple dimensions.”_

I think the ODF initiative is fantastic and wish that Adobe could follow suit. However, I do still hold out some hope for XMP. After all, nobody else AFAICT is doing anything remotely similar for multimedia. Where’s the W3C and co. when you really need them? (Oh yeah, faffing about the new Semantic Web logo. 😉)

Scholarly DC

Tony Hammond

Tony Hammond – 2007 October 05

In Metadata

This was just sent out to the DC-GENERAL mailing list about the new DCMI Community for Scholarly Communications. As Julie Allinson says:

“The aim of the group is to provide a central place for individuals and organisations to exchange information, knowledge and general discussion on issues relating to using Dublin Core for describing items of ‘scholarly communications’, be they research papers, conference presentations, images, data objects. With digital repositories of scholarly materials increasingly being established across the world, this group would like to offer a home for exploring the metadata issues faced.”

Custom Panel for CC

Tony Hammond

Tony Hammond – 2007 September 15

In Metadata

Creative Commons now have a custom panel for adding CC licenses using Adobe apps - see here.

Interesting on two counts:

  • Machine readable licenses
  • XMP metadata

But I still think that batch solutions for adding XMP metadata are really required for publishing workflows. And ideally there should be support for adding arbitrary XMP packets if we’re going to have truly rich metadata. I rather fear the constraints that custom panels place upon the publisher.

Last Orders Please!

Tony Hammond

Tony Hammond – 2007 September 13

In Metadata

Public comment period on the PRISM 2.0 draft ends Saturday (Sept. 15) ahead of next week’s WG meeting to review feedback and finalize the spec.

(I put in some comments about XMP already. Hope they got that.)

The Second Wave

Tony Hammond

Tony Hammond – 2007 September 11

In Metadata

You might have been wondering why I’ve been banging on about XMP here. Why the emphasis on one vendor technology on a blog focussed on an industry linking solution? Well, this post is an attempt to answer that.

Four years ago we at Nature Publishing Group, along with a select few early adopters, started up our RSS news feeds. We chose RSS 1.0 as our platform, which allowed us to embed a rich metadata term set using multiple schemas - especially Dublin Core and PRISM. We evangelized this a good deal at the time and published documents on XML.com (Jul. ’03) and in D-Lib Magazine (Dec. ’04) as well as speaking about this at various meetings and blogging about it. Since that time many more publishers have come on board and now provide RSS routinely, many of them choosing to enrich their feeds with metadata.

Well, RSS can be seen in hindsight as being the First Wave of projecting a web presence beyond the content platform using standard markup formats. With this embedded metadata a publisher can expand their web footprint and allow users to link back to their content server.

Now, XMP with its potential for embedding metadata in rich media can be seen as a Second Wave. Media assets distributed over the network can now carry along their own metadata and identity which can be leveraged by third-party applications to provide interesting new functionalities and link-back capability. Again a projection of web presence.

(Continues.)