Drupal, the semantic web and search

All major search engines, including Google and Yahoo!, are moving aggressively to capture structured data. This isn't exactly a surprise, because structured data offers a tremendous opportunity. Let's take the example of product search. Imagine the web as a huge database of millions of products, and search engines like Google and Yahoo! giving you a rich set of controls to filter by price, availability, color, shipping cost, user ratings, and more. Wouldn't it be great to be able to search all the world's products from a single page with a single interface? I think so too.

It is waiting to happen; we just have to connect the dots. That is, we have to make Drupal emit structured information.

Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics, including product information. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface in the HTML code generated by Drupal. As a result, search engines can't recognize a product page as a product, and they fail to include it in their world-wide product database.

I first talked about the semantic web and Drupal in my DrupalCon keynote last year in Boston. In my presentation, I laid down the challenge that we need to put fields in core and make them first-class citizens. Once fields are thus empowered, they can be associated with rich, semantic metadata that Drupal could output in its XHTML as RDFa. For example, say we have an HTML textfield that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envision a Drupal core CCK with the power to do just that.
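
To make the 'price' example concrete, here is a minimal sketch of the idea, written in Python rather than Drupal's PHP and not based on any existing Drupal API. The field-to-property mapping, the "ex:" prefix, the example.org vocabulary URL and the function names are all hypothetical. It only shows how a field that has been assigned an RDF property could carry that property into the markup as an RDFa attribute.

```python
from html import escape

# Hypothetical mapping from field names to RDF properties.
# The "ex:" prefix and these property names are illustrative assumptions,
# not part of any real vocabulary or of Drupal itself.
FIELD_RDF_PROPERTIES = {
    "price": "ex:price",
    "shipping_cost": "ex:shippingCost",
    "color": "ex:color",
}


def render_field(name, value):
    """Render one field as a span; add an RDFa property attribute if one is mapped."""
    text = escape(str(value))
    prop = FIELD_RDF_PROPERTIES.get(name)
    if prop is None:
        # Fields without an assigned RDF property are emitted as plain markup.
        return f"<span>{text}</span>"
    return f'<span property="{escape(prop)}">{text}</span>'


def render_node(about, fields, vocab_ns="http://example.org/vocab#"):
    """Wrap the rendered fields in a div naming the subject and declaring the prefix."""
    body = "\n".join("  " + render_field(n, v) for n, v in fields.items())
    return f'<div xmlns:ex="{escape(vocab_ns)}" about="{escape(about)}">\n{body}\n</div>'


print(render_node("/node/1", {"price": "19.99", "color": "red", "sku": "A-1"}))
```

In this sketch the 'sku' field has no mapping, so it is emitted without semantic markup. That is the situation all fields are in today.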

Here is another example. Imagine a standard Drupal node-type called 'job'. The fields in the job node-type would have RDF properties associated with them mapping to salary, duration, industry, location, and so on. Creating a new job posting on a Drupal site would generate RDFa that semantic search engines like Yahoo!'s SearchMonkey would pick up and the job would be included in their world-wide job database.
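
Seen from the search engine's side, picking up such a job posting could look roughly like the sketch below. It uses only Python's standard html.parser and is a deliberately simplified illustration, not a conforming RDFa parser and not how SearchMonkey actually works. The 'ex:' properties and the sample markup are assumptions, and nested elements inside a property element are not handled.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements that carry a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property of the element currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property element.
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None


def extract_properties(html):
    """Return the (property, value) pairs found in an HTML fragment."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical markup for a 'job' node; the vocabulary is made up for illustration.
JOB_HTML = """
<div about="/node/42" typeof="ex:Job">
  <span property="ex:salary">50000</span>
  <span property="ex:location">Boston</span>
</div>
"""

print(extract_properties(JOB_HTML))
# [('ex:salary', '50000'), ('ex:location', 'Boston')]
```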

Technologies like this disintermediate so many existing websites and organizations that it makes my head spin. It is too great an opportunity for us to pass up. By adding semantic technology to Drupal core, I think we can make a notable contribution to the future of the web.

This kind of technology is not limited to global search. On a social networking site built with Drupal, it opens up the possibility to do all sorts of deep social searches - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches.
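
As a rough illustration of searching within your network, the sketch below treats "knows" relationships (in the spirit of FOAF's foaf:knows) as a graph and restricts a keyword search to people within a given number of hops. The people, posts and function names are invented for the example. A real implementation would read the relationships from published FOAF and SIOC data rather than from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical "knows" relationships between users.
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": set(),
    "dave": {"erin"},
    "erin": set(),
}

# Hypothetical posts on the site.
POSTS = [
    {"author": "bob", "title": "Selling a road bike"},
    {"author": "dave", "title": "Bike repair tips"},
    {"author": "erin", "title": "Bike for sale"},
    {"author": "carol", "title": "Looking for a flat"},
]


def within_network(start, max_depth):
    """Return everyone reachable from start in at most max_depth 'knows' hops."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        depth = depth_of[person]
        if depth == max_depth:
            continue  # do not expand beyond the requested level of relationship
        for friend in KNOWS.get(person, ()):
            if friend not in depth_of:
                depth_of[friend] = depth + 1
                queue.append(friend)
    del depth_of[start]
    return set(depth_of)


def search_network(start, max_depth, keyword):
    """Return titles of posts by people in the network whose title contains keyword."""
    people = within_network(start, max_depth)
    return [
        post["title"]
        for post in POSTS
        if post["author"] in people and keyword.lower() in post["title"].lower()
    ]


print(search_network("alice", 2, "bike"))
# ['Selling a road bike', 'Bike repair tips']
```

Raising or lowering max_depth corresponds to searching wider or narrower levels of relationship while the keyword filter stays the same.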

I can has semweb in Drupal core?

Comments

Philippe Jadin (not verified):

How about a module that spits rdf into each node html?

The rdf can be simple property -> value pairs.

Have a UI that allows users to define the rdf property name for each CCK field, and a way to represent the value of each field.

This would not be complex, the structure is already there with CCK.

And also, provide a list of "sane" property names, so we don't end up filling the web with "articles", "article", "product", "item",... whose "cost", "price" or "value" cannot be understood by search engines.

October 15, 2008
Dries:

To obtain wide-spread adoption of RDFa in Drupal, and to tap the real potential, this ought to be part of Drupal core. This is not a contributed module play.

October 16, 2008
Olivier (not verified):

In a first iteration, all the modules could do their best to integrate semantic information into the HTML they output; RDFa and microformats are well suited for that.

I thought that microformats would have a great future but it seems it's somewhat stuck. There still is no product microformat http://microformats.org/wiki/product for example.

October 15, 2008
David Peterson (not verified):

Okay, I am a bit biased ;) but great post Dries.

Exposing your database (but only what you want) and linking it into the graph of things is an amazing opportunity. This extends and enhances the social graph [1] in any way you want. Now you can plug in *anything*!

I can't wait Dries. Can I have it now?

Cheers,

David

[1] http://blogs.zdnet.com/BTL/?p=5156

October 15, 2008
Manu Sporny (not verified):

Great post, Dries! Just a quick mention that not only is the RDFa community behind Drupal in the adoption of RDFa, but we'd like to be actively involved and help in any way that would benefit the Drupal community.

We've been discussing how we can help with David Peterson on the public RDFa mailing list:

http://lists.w3.org/Archives/Public/public-rdfa/2008Sep/0019.html
http://lists.w3.org/Archives/Public/public-rdfa/2008Oct/0000.html

If there is anything that we can do to help - please let us know, or feel free to discuss this further on the public-rdfa mailing list:

http://lists.w3.org/Archives/Public/public-rdfa/

-- manu

--
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches

October 15, 2008
Dries:

Manu, thanks! There are several ways you can help: (i) you can provide programming resources, (ii) you can provide funding so that a Drupal company can work on this, or (iii) you can provide technical assistance/reviews.

October 16, 2008
riffraff (not verified):

Although there is still no product microformat, there is hListing, which already has a bit of deployment.
That said, I like RDFa, although I feel that providing a pure RDF representation, and then just linking to it in the head, could be even cooler.

October 15, 2008
Benjamin Melançon (not verified):

Yes, you can!

First, congratulations for having it in drop. Now the world is catching up; it is time to put RDF back in Drupal core.

Second, here are the threads that David Peterson started on the W3C RDF in XHTML mailing list on how to help Drupal implement useful FOAF and SIOC vocabularies as RDFa, which continue into October.

Third, scientists are very interested in RDF and Drupal, as evidenced at the Taxonomy code sprint sponsored by Encyclopedia Of Life. The science angle is where we're working now, for other people, but Agaric would have walked away from this job (even though we love our client) if we thought many hours of Semantic Web work wouldn't benefit anyone else. So this is very, very gratifying.

Fourth, Kingsley Idehen of OpenLink Software converted Drupal's database schema to RDF a year and a half ago, which could make a very good starting point.

[End of informative portion of post, now I'm just too excited to stop talking.]

The Semantic Web was always something that I thought was a good idea, but figured someone else would do it (and I intend to help bring democratic mass communication to the world, so modest goals aren't one of my qualities). So this is a pretty recent understanding I'm going to try to share, to restate what the Semantic Web is all about: it's a web of meaning, allowing computers to easily know things about content and its relationships, where right now they just know there's a document.

Because with specifications like FOAF and SIOC we're talking about the ultimate of open standards, the data won't be limited to use just by Google, Yahoo, Amazon, or Microsoft. So it's not just products and services that can have all this data exposed and super-intelligently searchable and filterable, but things we want to share for free and collaborate on also. Once the infrastructure is there, sharing data (rather than "just HTML") can be essentially free.

It hasn't happened yet, and that makes even people who have long been working on semantic web projects a bit nervous. "The web worked and took off because it was simple," more than one such person has in effect said to me. "Any kid could put together an HTML page. RDF is not... easy."

Is the answer obvious to everyone yet?

Linked data may not be the driver of much adoption of Drupal, not right away. Instead, it will be (another) gift from Drupal to the world: another building block of the Semantic Web that makes it real, that makes its uses and users legion. And like the Web itself, the Semantic Web benefits from network effects: the more people involved, the greater the value of the whole network to everyone.

October 15, 2008
Benjamin Melançon (not verified):

Sorry, one very important link I missed:

The Semantic Web group on groups.drupal.org, and in particular the Drupal RDF Schema proposal post.

Just for fun: In addition to talking about putting RDF in Drupal, here's what Drupal looks like described in RDF, which may help to show what kind of structure can be given to content (not that it would look like this, but this is what computers could see).

benjamin, Agaric Design Collective

October 15, 2008
Evan Goer (not verified):

Your analysis is exactly right, Dries -- we at Yahoo! are keenly interested in getting structured data out on the web, and major CMSes like Drupal are among the best ways to do it. As Benjamin says above, this doesn't just benefit the big players. SearchMonkey is one use case for structured data, sure... but personally, I'm hoping that once the data is out there, nimble and smart people all over the web will figure out all sorts of interesting uses for it.

Anyway, my team has worked with a number of sites on structured markup since the SearchMonkey project started, and we'd be happy to help you get going too -- whether it's assistance with vocabularies, or even direct code contributions. Just let us know what we can do to help.

Evan Goer
Yahoo! SearchMonkey Team

October 15, 2008
John Breslin (not verified):

Great post Dries and very exciting to imagine the possibilities this would enable.

Let me know what we can do to help. We have tried to align SIOC terms as closely as possible to concepts in Drupal and other open source social software (the Drupal module was one of the first SIOC exporters we made in fact) and will continue to do so...

Thanks!

John.
--
(Founder of the SIOC project)

October 16, 2008
Dries:

John, any interest in taking the module to the next level by feeding them to Drupal core in bite-size chunks?

October 16, 2008
John Breslin (not verified):

Dries - sounds great - I'll consult with my colleagues and fellow Drupal-heads scor and terraces to see what we can do next...

October 16, 2008
scor (not verified):

Yes, RDFa is promising! This post is an opportunity to revive the Drupal core RDF Schema proposal I posted earlier this year.

Besides SearchMonkey, other search engines are already able to crawl RDFa, like the semantic web search engine Sindice, which currently indexes more than 37 million documents.

Turning the tens of thousands of existing Drupal sites into RDF nodes is an exciting goal! Let's see how much RDF spice we can get in Drupal 7...

October 16, 2008
Steven Pemberton (not verified):

"Disintermediate". Exactly! This is just what I was talking about in my talk at XTech this year "Why you should have a website".

http://homepages.cwi.nl/~steven/vandf/2008.03-website.html

October 16, 2008
Peter Krantz (not verified):

And, if a large number of the Drupal plugin developers adopt RDFa, I can see Drupal becoming a fantastic platform for content publishing in a variety of domains.

RDFa could potentially benefit users with disabilities as well.

Keep up the good work!

October 16, 2008
Knud Möller (not verified):

Exposing the Drupal DB as RDF and thus to the Semantic Web is brilliant, I really hope this gets into Drupal Core! We just announced a Drupal site which does the exact opposite - we're exposing an RDF database with conference metadata through Drupal: http://data.semanticweb.org. It's all nicely served as linked data, uses content negotiation, etc. We have packaged all the code into a module, but I'm not sure yet how to generalise it enough so that it would make sense to release the code publicly. Anyway, if both approaches - Drupal->RDF and RDF->Drupal - worked hand in hand, this would be very exciting.

October 16, 2008
Rainer (not verified):

I have some rdf code on my Drupal contact page, just as a "pure html" block under the submit-message button. That's a vcard/hcard for me as an IT professional.
To have one (or a few) data models for jobs (that apply to the whole world of business) would be incomparably more difficult, and the definition would take ages.

The other side: When all products on all pages of the world are rdf-enabled and therefore readable and comparable by machines (Google), wouldn't a smart spammer just fake a set of rdf data to give you some nasty popups?

It's not only to understand the data - it's also about trusting the data.

I think it's too far away for D7, just put some code in blocks and you're done for now.

October 20, 2008
Daniel Chvatik (not verified):

I can't wait to hear the updates on Semantic Web progress at DrupalCon in DC in March! This is a great opportunity for Drupal to extend its lead on other CMS frameworks...

January 30, 2009
scor (not verified):

If you want to support RDF in Drupal core, you can now donate at http://drupal.org/node/443824

April 25, 2009
Silona (not verified):

Hey Dries - long time no chat...

Citability.org - asking that all public govt documentation be web citable on a paragraph level

Example: archive.house.gov/HR1586IH/20030318093016#S1b1Bii

The example would point to section 1, subsection b, chapter 1, paragraph B, clause ii in SB 1234 on Mar 18, 2003 at 09:30:16 am UTC

Recovery.gov and nysenate.gov are both done in Drupal. David Strauss believes this could be implemented VERY quickly.

I am talking to Louis Montagne about doing a codeathon in France where we create this tool for drupal.

you in?

It would make the web into a giant database with those UID into govt documentation and all the relevant blog posts and wiki citations that include them!

Cheers!
Silona

May 15, 2009
Paolo Bouquet (not verified):

I really share Benjamin Melançon's enthusiasm about this opportunity that Drupal may provide for the Semantic Web. Here's a possible contribution to make it happen soon.

One of the key issues in exposing RDFa in any web portal is that it is not directly integrated with any other structured data on the web, and therefore its use (and reuse) is very limited. And this is mainly because, despite the attention on linked data stuff and the like, very little care is taken in making sure that when I publish something about an entity (person, product, company, event, document, ...) it has a URI which can be connected to and integrated with other information about the same entity in other web locations. If this does not happen, I'm afraid the costs of publishing RDF and RDFa on the web will outweigh the benefits, and very few users will make the investment necessary for exposing good RDFa on the web.

The EU funded project called OKKAM (http://www.okkam.org/) has made available an open and public infrastructure, called Entity Name System (or ENS), for sharing and looking up identifiers (URIs) for any type of entity. It can be used by any application through APIs for looking up URIs of entities, creating new identifiers, adding known identifiers to an entity description, and so on (see our Community Portal and http://api.okkam.org/ for documentation and testing interfaces). The ENS is open and free, and can be used for creating and sharing a permanent identifier for any entity (including your dog or your car), not just encyclopedic or "popular" entities like "Barack Obama" or "Berlin"!

Among many other pilot projects, my group is starting right now a project for adding RDFa with OKKAM identifiers in the new portal of my University (the University of Trento in Italy), which is under development with **Drupal**! So I'd be very interested in collecting suggestions and sharing experience on this. Please get in touch with me if you think this package Drupal + RDFa + OKKAM can be of interest for you or your company/institution.

May 12, 2010
Bruce Whealton (not verified):

These are great improvements with Drupal 7. Very impressive. We really need this. I can see the potential already. It is a technology that is important for web designers/developers, site owners, SEO, and that's just the beginning. I'll be sure to import your blog as an RSS feed on my sites and keep up to date on what is happening. Great accomplishment!
Bruce

January 2, 2011

© 1999-2014 Dries Buytaert Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Drupal is a Registered Trademark of Dries Buytaert.