Two years ago at DrupalCon Boston, I declared that we should embrace the semantic web, and that, as a first step, we should add RDFa support to Drupal core. Since then, I've written extensively about the importance of semantic technologies in Drupal, and how I believe Drupal can play an important role in helping to bootstrap the semantic web.
Drupal 7, the next major release of Drupal, will ship with RDFa support directly "out of the box." To help people understand what is possible with RDFa support and how it enables us to do cool new things, take a look at the video below. I showed this video in my DrupalCon San Francisco keynote, two years after my initial RDFa challenge at DrupalCon Boston.
There are many other things we can build on top of this core support, but this is a start. I'm personally very excited to see this vision being realized, and I'm very thankful to all the people that helped make this possible. Kudos to Lin Clark from DERI, NUI Galway for her work on building the demo and recording the screencast, and to Stéphane Corlosquet from MIND Informatics at Mass General for leading the RDF in Drupal 7 efforts. The newly launched Semantic Drupal website contains articles, video tutorials and news on building Linked Data sites with Drupal 7.
Web founder Sir Tim Berners-Lee and Professor Nigel Shadbolt unveiled Data.gov.uk today. The new website offers public sector data, ranging from traffic statistics to crime figures, for private or commercial use. It is designed to be similar to the Obama administration's data.gov project, run by Vivek Kundra, Chief Information Officer in the US.
What is exciting is that Data.gov.uk uses both Drupal and various semantic web technologies to encourage people to create data mashups and to visualize the data in clear, imaginative ways that provide more insight into the underlying information. It is great to see Open Source, Open Data and government meet, and it is great validation for Drupal's adoption of the semantic web. Hot stuff!
Two days ago, Google announced "Rich Snippets", a move that is sure to shake up the SEO industry, and cause hundreds of thousands of people to reconsider their skepticism of the semantic web. Yes, that probably includes many of you.
Google's Rich Snippets provide summary information to help users quickly identify the relevance of their search results. For example, if you search for a restaurant, rich snippets may include an average review score, a price range, or more. As users get more sophisticated at search, they'll ask Google increasingly complex questions. Rich Snippets allow Google to stay on top of that trend, and keep it from losing users to competitors.
It is very hard for search engines to understand the structure and semantics of data embedded in an HTML page. To create these snippets, Google needs the help of hundreds of thousands of webmasters around the world, and by extension, content management systems like Drupal, Joomla!, and others. Specifically, Google is asking all of us to surface structured data to their crawlers by marking up our HTML with RDFa and Microformats. When Google announced Rich Snippets this week, they really announced support for RDFa and Microformats, and the semantic web in general. This is big.
Initially, Google's adoption of RDFa will disrupt the current approaches to Search Engine Optimization (SEO). With Google entering the RDFa game, the words "semantic markup" will get redefined. Every webmaster wanting to improve click-through rates, reduce bounce rates, and improve conversion rates can no longer ignore RDFa or Microformats. Structured data is the new SEO.
As I've written before, search engines like Google and Yahoo! will provide the killer apps (e.g. vertical search engines) that the semantic web has been waiting for. Five years from now, we'll look back and say: "All it took was some incentive for the SEO industry." ...
Rich Snippets is a natural step in making search better. It provides a glimpse into the future of search, and tempts us with the possibilities of the semantic web. Right now, Google has a database of pages. If you read between the lines of their announcement, what Google is really asking is for us to help them build giant specialized databases of all products, people, places and events in the world. This provides opportunities well beyond providing rich search snippets. We're turning the web into a giant database for Google (and others) to slice and dice as needed.
For example, it is easy to see that a database of all the job applications in the world, built by crawling hundreds of thousands of independent RDFa-enabled sites, will impact specialized job sites. Or how a database of all the product or movie reviews in the world could affect specialized review sites. It might seem scary on the surface, but it really isn't. On the web, scale and reach are more important than scarcity -- you win by setting data free, not by holding it close to your chest.
For many of us in the Drupal community, Google's announcement couldn't be more timely. The Drupal community has been working on adding RDFa support to Drupal 7, and at this very moment, people from the community are gathering in Galway for a week-long code sprint to get more RDFa support into Drupal 7 core. Once again, Drupal proves itself to be on the cutting edge, and is taking a leadership role in adopting semantic web technologies. As I said in my DrupalCon Boston keynote 1.5 years ago, I believe that Drupal can become a significant player in the development of the semantic web. It's bullish, and maybe even naive, but I couldn't be more excited about giving the semantic web snowball a small push.
The Obama administration recently excited the world of open source software by choosing to launch recovery.gov on Drupal. Their choice of a free, open source platform over any proprietary system is as hopeful and promising as the purpose of the website they built, which is to lend transparency to the spending of the $800 billion in economic stimulus money. We should be happy both that the U.S. government is embracing Open Source software, and that it is promoting Open Data.
I recently blogged about how hundreds of thousands of Drupal sites contain vast amounts of structured data, but that structure has been hidden deep in Drupal databases and never surfaced to the HTML level. To counter this, I'd like the upcoming version of Drupal to emit structured information through the addition of RDFa metadata for both common and custom content types. This could help the Obama administration with their goals around Open Data.
Instead of needing to do all of the data analysis themselves, governments should work on making data available in machine readable formats. This would have the effect of enabling citizens and organizations to query and combine that data, to answer interesting questions not asked before, and to build new services that help other citizens. Just look at Apps for Democracy.
According to Georges Thomas from recovery.gov, the Obama administration wants to do exactly that. Thomas presented some additional details on how they envisioned making all of that data available. Furthermore, they recently solicited proposals for which technologies to use. Tim Berners-Lee, the inventor of the web, submitted a proposal for Linked Open Data. Various people, including myself, wrote in to express our support for Tim Berners-Lee's proposal.
To achieve these goals, and help governments transition into an era of open, linked data, Drupal has some growing to do. As mentioned earlier, we are organizing code sprints that aim to make Drupal 7 a more powerful tool for managing RDF data.
Given that recovery.gov already runs Drupal, and given that I would like to see more Semantic Web technologies in core, I couldn't be more excited. With the right encouragement and technological tools, government sites can expose vast amounts of data covering an enormous range of concepts and topics. This data will be exposed in an open, reusable form that can be searched or leveraged by organizations and individuals as they require. We, the Drupal community, have a unique opportunity to help reshape how politics is done.
Step one is to make the data available -- and that is exactly what we try to accomplish with Drupal 7 and beyond. Many of the technologies -- such as RDF, RDFa, SIOC, FOAF, OAuth, and OpenID -- are available. It's a simple matter of programming to start putting these together, and it takes projects like Drupal to help bootstrap them. Time to get our hands dirty!
From May 11th to 14th, smart minds will be meeting in Galway, Ireland, to work on keeping Drupal a cutting edge technology. Drupal has a proud history of staying ahead of the curve in adopting or pioneering new technologies, and this has been a large contributing factor to Drupal's success. To stay ahead of the curve, continuous hard work is needed, and this is why the upcoming RDF in Core code sprint, sponsored by DERI, is so important.
What are the goals of this sprint? In a nutshell, the goal is to give true meaning to Drupal's data. Drupal is capable of collecting and presenting a lot of data, in no small part thanks to CCK, now Fields in Core for Drupal 7. This data is still meaningless in the Semantic Web sense because other computer agents can't make sense of the data that Drupal presents.
The goals proposed by the RDF in Core sprint would change this, and fields added to Drupal would contain semantic meaning useful to other tools (like next generation search engines). As I wrote earlier, existing search engines such as Google and Yahoo! SearchMonkey have already started to take advantage of RDF, and emerging tools such as Sindice and Visinav are currently crawling and indexing the Web of Data.
This Web of Data promises to be browsable just like a huge database, e.g. by means of query languages for RDF such as SPARQL (the name similarity with SQL is not a coincidence). Besides the first goal of getting RDF into Drupal Core, more flexible extensions such as RDFCCK are already on the way, to make your Drupal sites a part of the so-called Linked Data cloud.
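To give a feel for what "browsable like a database" means, here is a minimal sketch in Python of the basic idea behind a SPARQL query: matching triple patterns that contain variables against a set of subject-predicate-object triples. This is an illustration only, not a SPARQL engine. The `ex:` property names, the node paths and the helper names `match` and `query` are all made up for the example.

```python
def match(pattern, triple, bindings):
    """Try to unify one triple pattern with one triple.

    Pattern terms starting with "?" are variables. Returns the extended
    bindings on success, or None if the triple does not fit.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical data, as it might be gathered from several sites.
TRIPLES = [
    ("/node/1", "ex:type", "ex:Job"),
    ("/node/1", "ex:location", "Galway"),
    ("/node/2", "ex:type", "ex:Job"),
    ("/node/2", "ex:location", "Boston"),
    ("/node/3", "ex:type", "ex:Product"),
    ("/node/3", "ex:location", "Galway"),
]

# Roughly the same question as the SPARQL query:
#   SELECT ?job WHERE { ?job ex:type ex:Job . ?job ex:location "Galway" . }
results = query(
    [("?job", "ex:type", "ex:Job"), ("?job", "ex:location", "Galway")],
    TRIPLES,
)
print(results)
```

The point is that the question is asked of the data's meaning (the properties), not of any one site's database schema.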
I've written extensively about the importance of semantic technologies in Drupal before, and am therefore personally very excited for this sprint to happen. Many thanks to Stéphane Corlosquet for organizing this, and to DERI for hosting it. At the present time, a number of people who would like to attend are unable to due to shortages of funds. If you'd like to support this sprint financially, a ChipIn collection is underway to help bring a few more smart minds to the meeting so that they can best accomplish all of their goals.
Last year, I wrote a blog post titled Drupal, the semantic web and search that outlined how search engines like Google and Yahoo! are getting increasingly hungry for structured data. It is no surprise, because if they could build a global, vertical search engine that, say, searches all products online, or one that searches all job applications online, they could disintermediate many existing companies.
More exciting to me is how search engines can help bootstrap the semantic web as they build out these vertical search engines, and the role that content management systems like Drupal get to play in this. Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface to the HTML code generated by Drupal.
In my DrupalCon Boston keynote presentation last year, I laid down the challenge that we need to put fields in core and make them first class citizens. Once fields are thus empowered, they can be associated with rich, semantic meta-data that Drupal could output in its XHTML as RDFa. For example, say we have an HTML textfield that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envisioned a Drupal core CCK with the power to do just that.
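To make the 'price' example a bit more concrete, here is a minimal sketch in Python of the idea: a field that has an RDF property assigned gets that property written into its markup as an RDFa attribute. This is not how Drupal implements it. The mapping table, the function names and the `ex:` vocabulary at example.org are placeholders I made up for illustration.

```python
from html import escape

# Hypothetical mapping from field names to RDF properties. The "ex:" prefix
# and its vocabulary URL are placeholders, not a real vocabulary.
FIELD_RDF_MAPPING = {
    "price": "ex:price",
    "shipping_cost": "ex:shippingCost",
    "weight": "ex:weight",
    "color": "ex:color",
}


def render_field(name, value):
    """Wrap a field value in a span carrying its RDFa property, if mapped."""
    text = escape(str(value))
    prop = FIELD_RDF_MAPPING.get(name)
    if prop is None:
        # Unmapped fields are emitted as plain markup with no semantics.
        return f"<span>{text}</span>"
    return f'<span property="{escape(prop)}">{text}</span>'


def render_node(about, fields):
    """Render a node's fields inside a container that names the subject."""
    body = "\n".join(f"  {render_field(n, v)}" for n, v in fields.items())
    return (
        f'<div xmlns:ex="http://example.org/vocab#" about="{escape(about)}">\n'
        f"{body}\n</div>"
    )


html = render_node("/node/1", {"price": "19.99", "color": "red", "note": "new"})
print(html)
```

The page looks the same to a human visitor, but a crawler that understands RDFa can now tell that "19.99" is the price of the thing described at /node/1.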
In the year since Boston, the Drupal community has built exactly what I asked for. I was planning to show a video of their work in my keynote presentation at DrupalCon DC earlier this month. Unfortunately, I ran out of time before I could show it. However, it was shown in the "Semantic Web and Why You Should Care" session, and today Stéphane Corlosquet posted all the details in the semantic web group on drupal.org. The video paints a picture of what is possible with today's Drupal technology, but also of what will hopefully become possible with Drupal core at some point. The prototypes in this video were built using contributed modules for Drupal 6. However, since last year, we have fields in core and we've already begun putting some RDFa in core, too.
Ben Lavender produced the screencast, Josh Huckabee built the Exhibit view and Stéphane Corlosquet built the SearchMonkey applications and the social network site. Other people who helped include Axel Polleres and Andreas Harth (creator of VisiNav). The work on both this video and the featured modules has been sponsored by DERI Galway, Harvard IIC and OpenBand.
Last week at DrupalCon DC I gave my traditional state of Drupal presentation in front of 1400 Drupalistas. The video of the presentation is provided below, and you can download a copy of my slides (PDF, 20 MB) as well. The video is available in alternative encoding formats from archive.org. Topics I talked about: the history of Drupal, the Drupal 7 release, the future of Drupal, etc. Have a look!
All major search engines, including Google and Yahoo!, are moving aggressively trying to capture structured data. This isn't exactly a surprise because it provides tremendous opportunity. Let's take the example of product search. Imagine the web as a huge database of millions of products, and search engines like Google and Yahoo! giving you a rich set of controls to filter by price, availability, color, shipping cost, user ratings, and more. Wouldn't it be great to be able to search all the world's products from a single page with a single interface? I'd think so too.
It is waiting to happen; we just have to connect the dots. That is, we have to make Drupal emit structured information.
Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics, including product information. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface to the HTML code generated by Drupal. As such, search engines can't pick it up as a product, and they'd fail to include it in their world-wide product database.
I first talked about the semantic web and Drupal in my DrupalCon keynote last year in Boston. In my presentation, I laid down the challenge that we need to put fields in core and make them first class citizens. Once fields are thus empowered, they can be associated with rich, semantic meta-data that Drupal could output in its XHTML as RDFa. For example, say we have an HTML textfield that captures a number, and that we assign it an RDF property of 'price'. Semantic search engines then recognize it as a 'price' field. Add fields for 'shipping cost', 'weight', 'color' (and/or any number of others) and the possibilities become very exciting. I envision a Drupal core CCK with the power to do just that.
Here is another example. Imagine a standard Drupal node-type called 'job'. The fields in the job node-type would have RDF properties associated with them mapping to salary, duration, industry, location, and so on. Creating a new job posting on a Drupal site would generate RDFa that semantic search engines like Yahoo!'s SearchMonkey would pick up and the job would be included in their world-wide job database.
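From the crawler's side, picking up such a job posting could look roughly like the sketch below. It uses Python's standard `html.parser` to pull (subject, property, value) triples out of a tiny subset of RDFa: an `about` attribute on an enclosing element and `property` attributes on elements with plain-text content. Real RDFa processing has many more rules, and I am assuming well-formed, properly nested markup. The `ex:` property names and the sample posting are made up.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (subject, property, text) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one (subject, property) entry per open element
        self._text = []   # text collected for the property being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The subject is inherited from the enclosing element unless this
        # element sets a new one with "about".
        parent_subject = self._stack[-1][0] if self._stack else None
        subject = attrs.get("about", parent_subject)
        prop = attrs.get("property")
        if prop is not None:
            self._text = []
        self._stack.append((subject, prop))

    def handle_data(self, data):
        # Only keep text that sits directly inside a property element.
        if self._stack and self._stack[-1][1] is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        subject, prop = self._stack.pop()
        if prop is not None:
            value = "".join(self._text).strip()
            self.triples.append((subject, prop, value))


# Hypothetical output of a 'job' node; "ex:" is a placeholder vocabulary.
JOB_HTML = """
<div about="/node/42">
  <h2 property="ex:title">Drupal developer</h2>
  <span property="ex:salary">50000</span>
  <span property="ex:location">Galway</span>
</div>
"""

parser = PropertyExtractor()
parser.feed(JOB_HTML)
parser.close()
for triple in parser.triples:
    print(triple)
```

Each extracted triple is one row for that world-wide job database, and the crawler never needed to know anything about the site's theme or database layout.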
Technologies like this disintermediate so many existing websites and organizations that it makes my head spin. It is too great an opportunity for us to pass up on. By adding semantic technology to Drupal core, I think we can make a notable contribution to the future of the web.
This kind of technology is not limited to global search. On a social networking site built with Drupal, it opens up the possibility to do all sorts of deep social searches - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches.
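As a rough sketch of such a search, assuming the relationships are available as triples: the Python below walks `foaf:knows` links outward from one person, up to a maximum number of hops, and then filters the people it found by another property. `foaf:knows` is a real FOAF property; the `ex:skill` property, the names and the function names are invented for the example, and I treat `foaf:knows` as one-directional here.

```python
from collections import deque

KNOWS = "foaf:knows"


def network(triples, start, max_depth):
    """Map everyone reachable via foaf:knows within max_depth hops
    to their distance from `start`."""
    neighbours = {}
    for s, p, o in triples:
        if p == KNOWS:
            neighbours.setdefault(s, set()).add(o)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_depth:
            continue  # do not look beyond the requested level
        for friend in sorted(neighbours.get(person, ())):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance


def search_network(triples, start, max_depth, prop, value):
    """Find people in the network whose `prop` equals `value`,
    nearest relationships first."""
    reachable = network(triples, start, max_depth)
    hits = {s for s, p, o in triples if p == prop and o == value}
    return sorted((d, who) for who, d in reachable.items() if who in hits)


# Hypothetical data; "ex:skill" is a placeholder property.
TRIPLES = [
    ("alice", KNOWS, "bob"),
    ("bob", KNOWS, "carol"),
    ("carol", KNOWS, "dave"),
    ("bob", "ex:skill", "drupal"),
    ("carol", "ex:skill", "drupal"),
    ("dave", "ex:skill", "drupal"),
]

# Who within two hops of alice knows Drupal?
found = search_network(TRIPLES, "alice", 2, "ex:skill", "drupal")
print(found)
```

Raising `max_depth` widens the circle, which is the "levels of relationships" part of the search; the property filter is the "other criteria" part.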
I can has semweb in Drupal core?
The presentation discusses the results of the recent survey that I conducted; the survey ran for 30+ days and collected more than 1300 responses so it should provide a good idea of the community's current thinking. I'll provide more color and details about the survey results in a number of follow-up posts.
Last week at DrupalCon Boston I gave my traditional state of Drupal presentation in front of 850 Drupalistas. The video of the presentation is provided below, and you can download a copy of my slides (PDF, 15 MB) as well. The video is available in alternative encoding formats from archive.org.
Topics I talked about: the Drupal 6 release, the state of our union, the need for a drupal.org redesign, the Drupal 7 killer release, the Drupal 7 development cycle, usability, test-driven development, the future of Drupal and the semantic web, etc. There is a lot of material in this presentation and during the course of the next few weeks, I plan to break it down into a number of extended blog posts. Stay tuned!