RDFa and Drupal

Last year, I wrote a blog post titled "Drupal, the semantic web and search" that outlined how search engines like Google and Yahoo! are getting increasingly hungry for structured data. That is no surprise: if they could build a global, vertical search engine that, say, searches all products online, or one that searches all job applications online, they could disintermediate many existing companies.

More exciting to me is how search engines can help bootstrap the semantic web as they build out these vertical search engines, and the role that content management systems like Drupal get to play in this. Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface to the HTML code generated by Drupal.

In my DrupalCon Boston keynote presentation last year, I laid down the challenge that we need to put fields in core and make them first-class citizens. Once fields are thus empowered, they can be associated with rich, semantic metadata that Drupal could output in its XHTML as RDFa. For example, say we have an HTML text field that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines can then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envisioned a Drupal core CCK with the power to do just that.
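To make the 'price' example concrete, here is a minimal sketch in Python of how a field-to-property mapping could surface in the markup. It is not Drupal's implementation: the function names, the 'ex' prefix and the http://example.com/vocab# namespace are all made up for illustration. It only relies on the RDFa attributes 'about' (the thing being described) and 'property' (what a value means).

```python
from html import escape

# Hypothetical vocabulary, for illustration only (not a real published schema).
VOCAB_PREFIX = "ex"
VOCAB_URI = "http://example.com/vocab#"


def render_field(value, rdf_property=None):
    """Wrap a field value in a <span>; add an RDFa 'property' attribute if mapped."""
    text = escape(str(value))
    if rdf_property is None:
        # Unmapped fields are still shown, just without semantic annotation.
        return f"<span>{text}</span>"
    return f'<span property="{escape(rdf_property)}">{text}</span>'


def render_node(path, fields, mapping):
    """Render all fields of one item inside a <div> that names its subject."""
    spans = [render_field(value, mapping.get(name)) for name, value in fields.items()]
    body = "\n  ".join(spans)
    return (
        f'<div xmlns:{VOCAB_PREFIX}="{VOCAB_URI}" about="{escape(path)}">\n'
        f"  {body}\n"
        f"</div>"
    )


# Assumed sample data: two mapped fields and one unmapped field.
fields = {"price": "19.99", "color": "red", "note": "in stock"}
mapping = {"price": "ex:price", "color": "ex:color"}
print(render_node("/node/1", fields, mapping))
```

The page looks the same to a human visitor, but a parser that understands RDFa can now read that /node/1 has a price of 19.99 and a color of red, while the unmapped 'note' field stays plain text.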

In the year since Boston, the Drupal community has built exactly what I asked for. I was planning to show a video of their work in my keynote presentation at DrupalCon DC earlier this month. Unfortunately, I ran out of time before I could show it. However, it was shown in the "Semantic Web and Why You Should Care" session, and today Stephane Corlosquet posted all the details in the semantic web group on drupal.org. The video paints a picture of what is possible with today's Drupal technology, and of what will hopefully become possible with Drupal core at some point. The prototypes in this video were built using contributed modules for Drupal 6. However, since last year, we have fields in core and we've already begun putting some RDFa in core, too.

Ben Lavender produced the screencast, Josh Huckabee built the Exhibit view and Stephane Corlosquet built the SearchMonkey applications and the social network site. Other people who helped include Axel Polleres and Andreas Harth (creator of VisiNav). The work on both this video and the featured modules has been sponsored by DERI Galway, Harvard IIC and OpenBand.

Comments

Matt Farina (not verified):

RDFa adds weight to a page to make it more readable to machines. But extra weight causes pages to load a little slower, according to a blog post from Dries earlier this year at http://buytaert.net/faster-is-beter.

This leaves me wondering whether RDFa should have an on/off switch, and whether RDF with an auto-discovery link would be better.

March 16, 2009
scor (not verified):

There will be a switch. The weight of the RDFa markup within the HTML is negligible in most cases.

March 16, 2009
Wim Leers (not verified):

It's not so much the amount of HTML that makes a web page slow; it's the components the page references that make it slow. See my article about Drupal's page loading performance for details.

March 16, 2009
John Breslin (not verified):

Excellent post, thanks Dries!

March 16, 2009
Jim (not verified):
March 17, 2009
Cindy (not verified):

This is very exciting and in line with a project that I am aware of. It is described here:

http://www.wikieducator.org/OER_Collaboration_in_African_Agricultural_Ed...

They say "This project proposes to develop the technology to facilitate remixing of content between WikiEducator and the Connexions authoring platforms."

Do you see RDFa and Drupal in line for "remixing"? Could this project use something like Drupal as a prototype?

March 17, 2009
vango (not verified):

I'd also like to see the same interface whether you are mapping RDF terms from attributes in core, contributed modules, CCK fields or whatever.

November 21, 2009


© 1999-2014 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.