Blog Item Format

The XML for a blog post in an RSS 2.0 document looks like this:

<title>My Item</title>
<pubDate>Thu, 25 Aug 2005 12:31:00 GMT</pubDate>
<guid isPermaLink="false">b18f7ed9-yadda-yadda</guid>
<description>This is the text of my blog post.</description>

This is similar to what Community Server serves up. Most of it is right from the RSS spec, but some of it isn't. Not that it's illegal - RSS allows extensions, and the elements in the dc, slash, and wfw namespaces are defined by RSS extension modules.

  • dc:creator is "an entity primarily responsible for making the content of the resource."
  • slash:comments is simply the number of comments that exist for the item.
  • wfw:commentRss is where comments on this item can be found, in a machine-readable format.
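
To make the three extension elements concrete, here is a minimal Python sketch of how a consumer might read them out of an item. The namespace URIs are the ones these modules are commonly declared with; the sample item itself (author name, comment count, URL) is made up for illustration and is not Community Server output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs commonly used for these three RSS extension modules.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "slash": "http://purl.org/rss/1.0/modules/slash/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
}

# Hypothetical item: the author, comment count, and URL are invented.
SAMPLE_ITEM = """\
<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
      xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <title>My Item</title>
  <dc:creator>Example Author</dc:creator>
  <slash:comments>3</slash:comments>
  <wfw:commentRss>http://example.com/comments/my-item.xml</wfw:commentRss>
</item>
"""


def read_extensions(item_xml):
    """Pull the dc, slash, and wfw values out of a single <item>."""
    item = ET.fromstring(item_xml)
    return {
        "creator": item.findtext("dc:creator", namespaces=NS),
        # Treat a missing slash:comments element as zero comments.
        "comments": int(item.findtext("slash:comments", default="0", namespaces=NS)),
        "comment_rss": item.findtext("wfw:commentRss", namespaces=NS),
    }


info = read_extensions(SAMPLE_ITEM)
```

The prefixes (dc, slash, wfw) are only a convention; what a parser actually matches on is the namespace URI behind each prefix.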

The nice thing about RSS extensions is that ad-hoc ones like these can develop, and the good ones will stick. The bad thing is that tools consuming RSS don't have a single specification to look to; they need to consume RSS plus some set of extensions, and it's up to each tool to decide which extensions it wants to support.
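
One way a consuming tool might cope with this, sketched in Python: collect the namespaces an item actually uses and compare them against the set the tool has chosen to understand. The SUPPORTED set and the sample item here are assumptions for illustration, not a description of any real reader.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
SLASH = "http://purl.org/rss/1.0/modules/slash/"
WFW = "http://wellformedweb.org/CommentAPI/"

# Hypothetical: the extensions this particular consumer decided to support.
SUPPORTED = {DC, SLASH}

# Hypothetical item using two extension namespaces.
SAMPLE_ITEM = f"""\
<item xmlns:dc="{DC}" xmlns:wfw="{WFW}">
  <title>My Item</title>
  <dc:creator>Example Author</dc:creator>
  <wfw:commentRss>http://example.com/comments/my-item.xml</wfw:commentRss>
</item>
"""


def extension_namespaces(item_xml):
    """Return the namespace URIs used by any element in the item."""
    found = set()
    for element in ET.fromstring(item_xml).iter():
        # ElementTree writes namespaced tags as "{uri}localname".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


used = extension_namespaces(SAMPLE_ITEM)
unsupported = used - SUPPORTED
```

Here unsupported ends up holding only the wfw namespace, which the tool could then ignore, log, or pass through untouched.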

My migration project requires that I take a single blog post and output it as XML.  To preserve the comments, I should also write out all of the post's comments as RSS, and have the post's item contain a <wfw:commentRss> that points to that static comment "feed".
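
As a rough sketch of that plan, the Python below builds the item for one post and a separate RSS document for its comments. The shape of the post dictionary, the function names, and the example.com URLs are all assumptions made up for this illustration; a real migration would pull these from the old blogging tool.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

WFW = "http://wellformedweb.org/CommentAPI/"
SLASH = "http://purl.org/rss/1.0/modules/slash/"
# Ask ElementTree to serialize these namespaces with their usual prefixes.
ET.register_namespace("wfw", WFW)
ET.register_namespace("slash", SLASH)


def rss_date(when):
    # RSS 2.0 uses RFC 822 style dates; `when` must be an aware UTC datetime.
    return format_datetime(when, usegmt=True)


def build_post_item(post, comment_feed_url):
    """Build the <item> for one post, pointing at its static comment feed."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = post["title"]
    ET.SubElement(item, "pubDate").text = rss_date(post["published"])
    guid = ET.SubElement(item, "guid", isPermaLink="false")
    guid.text = post["guid"]
    ET.SubElement(item, "description").text = post["body"]
    ET.SubElement(item, f"{{{SLASH}}}comments").text = str(len(post["comments"]))
    ET.SubElement(item, f"{{{WFW}}}commentRss").text = comment_feed_url
    return item


def build_comment_feed(post, post_url):
    """Build a standalone RSS 2.0 document holding the post's comments."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Comments on: " + post["title"]
    ET.SubElement(channel, "link").text = post_url
    ET.SubElement(channel, "description").text = "Archived comments"
    for comment in post["comments"]:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "pubDate").text = rss_date(comment["published"])
        ET.SubElement(entry, "description").text = comment["body"]
    return rss


# Hypothetical post data; the comment is invented for the example.
post = {
    "title": "My Item",
    "guid": "example-guid",
    "body": "This is the text of my blog post.",
    "published": datetime(2005, 8, 25, 12, 31, tzinfo=timezone.utc),
    "comments": [
        {"body": "Nice post.",
         "published": datetime(2005, 8, 26, 9, 0, tzinfo=timezone.utc)},
    ],
}

item = build_post_item(post, "http://example.com/comments/my-item.xml")
feed = build_comment_feed(post, "http://example.com/posts/my-item")
item_xml = ET.tostring(item, encoding="unicode")
feed_xml = ET.tostring(feed, encoding="unicode")
```

Each string could then be written to its own file, with the comment feed saved at whatever location the <wfw:commentRss> URL resolves to.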

There is probably some extra data I might like to capture as well - such as the number of hits the item has received in the current blogging tool.  But maybe that belongs somewhere else.

What I'd like to know is whether anyone has tackled this problem already and defined an XML schema intended to represent a blog post.  Something like RFC 2822 for an <item>.