I decided previously to use an RSS <item> as the file format for a blog post. Looking into the MetaWeblogAPI, it seems like that was a good move, since posting with MetaWeblogAPI involves simply sending the <item> to post.
So there are two things to do:
- Figure out the SQL query to run against Community Server to pull out the data I need
- Write some XML code to take this data and write it to disk as an XML file
The SQL query looks like this:
SELECT cs_Posts.PostDate, cs_Posts.Subject, cs_Posts.Body
FROM cs_Posts INNER JOIN cs_Sections
ON cs_Posts.SectionID = cs_Sections.SectionID
LEFT OUTER JOIN cs_weblog_Weblogs
ON cs_Posts.SectionID = cs_weblog_Weblogs.SectionID
LEFT OUTER JOIN cs_Threads
ON cs_Posts.ThreadID = cs_Threads.ThreadID
LEFT OUTER JOIN cs_weblog_Posts
ON cs_Posts.PostID = cs_weblog_Posts.PostID
WHERE (cs_Posts.PostLevel = 1) AND (cs_Sections.Name = N'steve''s blog')
This gets me three output columns: PostDate, Subject, and Body. Perfect. Substitute your blog name for mine if you’re going to reuse this query.
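If the query is run from Java, the blog name can be passed as a parameter instead of being edited into the SQL text. The following is only a sketch under assumptions: the `PostQuery` and `Post` names are hypothetical, and the caller is assumed to supply an open JDBC `Connection` to the Community Server database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

final class PostQuery {
    // Same joins and filter as the query above; the blog name becomes a "?" parameter,
    // so apostrophes in the name need no manual doubling.
    static final String SQL =
          "SELECT cs_Posts.PostDate, cs_Posts.Subject, cs_Posts.Body "
        + "FROM cs_Posts INNER JOIN cs_Sections "
        + "ON cs_Posts.SectionID = cs_Sections.SectionID "
        + "LEFT OUTER JOIN cs_weblog_Weblogs "
        + "ON cs_Posts.SectionID = cs_weblog_Weblogs.SectionID "
        + "LEFT OUTER JOIN cs_Threads "
        + "ON cs_Posts.ThreadID = cs_Threads.ThreadID "
        + "LEFT OUTER JOIN cs_weblog_Posts "
        + "ON cs_Posts.PostID = cs_weblog_Posts.PostID "
        + "WHERE (cs_Posts.PostLevel = 1) AND (cs_Sections.Name = ?)";

    /** Hypothetical value class holding the three output columns. */
    static final class Post {
        final Timestamp date;
        final String subject;
        final String body;

        Post(Timestamp date, String subject, String body) {
            this.date = date;
            this.subject = subject;
            this.body = body;
        }
    }

    /** Runs the query for one blog and collects the rows. */
    static List<Post> fetchPosts(Connection conn, String blogName) throws SQLException {
        List<Post> posts = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, blogName);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    posts.add(new Post(rs.getTimestamp("PostDate"),
                                       rs.getString("Subject"),
                                       rs.getString("Body")));
                }
            }
        }
        return posts;
    }
}
```

A call would look like `PostQuery.fetchPosts(conn, "steve's blog")`, with the apostrophe written as-is.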
Populating a DOM with these is a simple matter:
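Element rootNode = xmlDoc.createElement("item");
Element titleNode = xmlDoc.createElement("title");
Element bodyNode = xmlDoc.createElement("description");
Element dateNode = xmlDoc.createElement("pubDate");
rootNode.appendChild(titleNode);
rootNode.appendChild(bodyNode);
rootNode.appendChild(dateNode);
xmlDoc.appendChild(rootNode);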
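For completeness, here is a rough, self-contained version of that step that also creates the document, fills in the text of each element, and attaches the elements to the tree. The `ItemBuilder` class and `buildItem` method are hypothetical names, and the date is assumed to arrive already formatted as a string.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ItemBuilder {
    /** Builds a DOM document whose root is an RSS-style <item>. */
    static Document buildItem(String subject, String body, String pubDate) {
        Document xmlDoc;
        try {
            xmlDoc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable XML parser configuration", e);
        }

        Element rootNode = xmlDoc.createElement("item");
        xmlDoc.appendChild(rootNode);

        // Subject column becomes <title>.
        Element titleNode = xmlDoc.createElement("title");
        titleNode.setTextContent(subject);
        rootNode.appendChild(titleNode);

        // Body column becomes <description>; markup in it is stored as text
        // and escaped when the document is serialized.
        Element bodyNode = xmlDoc.createElement("description");
        bodyNode.setTextContent(body);
        rootNode.appendChild(bodyNode);

        // PostDate column becomes <pubDate>.
        Element dateNode = xmlDoc.createElement("pubDate");
        dateNode.setTextContent(pubDate);
        rootNode.appendChild(dateNode);

        return xmlDoc;
    }
}
```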
Now I want to figure out what the common way to write out XML with Java is. I found the wrong way to do it. Actually that page says “Serialization has been left as a task for vendor specific classes”. It’s hard to believe that something as core as writing XML would be left out of the framework.
Fortunately Sun has a page that shows how to write out a DOM as an XML file (as part of their XSLT tutorial, strangely enough).
Where in C# you’d simply write:
The equivalent in Java is something like this:
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
File f = new File("MyFile.xml");
DOMSource source = new DOMSource(xmlDoc);
StreamResult result = new StreamResult(f);
transformer.transform(source, result);
I’m not sure why Java is so needlessly obtuse, but there it is.
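The transformer approach can be wrapped up once and reused. This is a minimal sketch, not a definitive implementation: `DomWriter`, `write`, and `sampleItem` are hypothetical names, and the demo writes to a `StringWriter` rather than a file so it can run anywhere.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomWriter {
    /** Serializes the document to the given result with an identity transform. */
    static void write(Document doc, StreamResult result) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), result);
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    /** Hypothetical helper: builds a tiny <item> document to demo with. */
    static Document sampleItem(String title) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element item = doc.createElement("item");
            Element titleNode = doc.createElement("title");
            titleNode.setTextContent(title);
            item.appendChild(titleNode);
            doc.appendChild(item);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable XML parser configuration", e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        write(sampleItem("Hello"), new StreamResult(out));
        System.out.println(out);
    }
}
```

To write to disk instead, pass a file-backed result: `DomWriter.write(doc, new StreamResult(new java.io.File("MyFile.xml")))`.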
But now that this part is done, I have a directory with one XML file, each containing an <item>, for every blog posting I wrote with .Text over the last few years. 391 items. I’ve put the raw items here.
An idea I’ve been working on lately is using OPML to identify collections of things like mail messages and blog posts, and having web tools work with those collections, wherever they may be, rather than managing their own private copy of the data. It’d be great to be able to refer to the Flickr metadata they manage for my photos directly, rather than having to do it through their API. But that’s another story.