While doing my round of blogs I found this post by Stephane Rodriguez on Open XML's support for embedding custom schemas in a WordprocessingML document. After wading through the ranting, I found little proof of actual knowledge of Open XML.
What Stephane rants about is Open XML's support for defining business semantics in a document, but he doesn't really understand what it is about. Given his post, I'd be amazed if he actually looked at this feature, since he forgot to mention the one single WordprocessingML tag used to define Custom XML markup.
His opening paragraph starts out by saying that Custom XML is a poorly chosen name, since XML is already custom. Ehr… He then claims that it is application specific, which it isn't, and calls it useless, while in reality this support is one of those things that really drives business solutions, since it makes it easy to get to the document data independently of formatting and layout.
The second section, 'Custom XML definition, as per Microsoft', serves little purpose but to belittle Microsoft, while this is an ECMA feature! Perhaps you should investigate ECMA's stance and not Microsoft's. All in all, there is little useful stuff in that section.
Next is 'Enough marketing fluff. What it really is', which is a hilarious title since Stephane actually doesn't know what it is, but does try his best to figure it out without reading the Open XML spec.
Stephane says to take a piece of markup like this:
<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>
Add in some extra tags like so:
<w:p>
<w:r>
<mytag>myvalue</mytag>
<w:t>test</w:t>
</w:r>
</w:p>
And then be amazed at the fact that your document is broken. Wow, unknown XML breaks a parser, how insightful. The fix that he tries next is even more wondrous, and gives me the feeling that he is not that in tune with XML at all. Solution to prevent breaking? Let's try and add a random namespace prefix, and let's re-use the one used by WordprocessingML. Ehr….
<w:p>
<w:r>
<w:mytag>myvalue</w:mytag>
<w:t>test</w:t>
</w:r>
</w:p>
By the way Stephane, WordprocessingML doesn't define a w:stupid tag or a w:Idontknowxmlnamespaces tag either.
To summarize in a nifty table:
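To make the namespace point concrete, here is a minimal Python sketch using the standard library's ElementTree. It assumes the w prefix is bound to the usual WordprocessingML main namespace URI; the fragment itself is the one from above. It shows that the w: prefix is not decoration: the parser expands w:mytag into a name inside the WordprocessingML namespace, so the element claims to be part of a vocabulary that never defined it.

```python
import xml.etree.ElementTree as ET

# Assumption: "w" is bound to the WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

FRAGMENT = f"""
<w:p xmlns:w="{W_NS}">
  <w:r>
    <w:mytag>myvalue</w:mytag>
    <w:t>test</w:t>
  </w:r>
</w:p>
"""

def expanded_names(xml_text):
    """Return the namespace-qualified name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names = expanded_names(FRAGMENT)
for name in names:
    # Each name is printed as {namespace-uri}local-name.
    print(name)
```

The fragment is perfectly well-formed, which is why an XML parser accepts it. It is the WordprocessingML schema, not the parser, that has no idea what a mytag in that namespace is supposed to be.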
Test | Idea? |
mytag="myvalue" | Stupid |
<mytag>myvalue</mytag> | Stupid |
<w:mytag>myvalue</w:mytag> | Stupid |
What Custom XML is all about
So let me explain custom XML markup. It is about embedding custom XML, defined outside of Open XML, to support solutions which aim to structure a document using business semantics, not only using formatting. That is a great advance, since you want to get to the data directly, and not by saying that the customer name is the 3rd paragraph. The issue is that you cannot just allow any arbitrary XML to be stored in the WordprocessingML markup. This would make documents application specific, and it would make validation meaningless, since any XML at all would have to be accepted as valid. Not a great idea.
So the way to do this in Open XML is to use a w:customXml tag for it:
<w:p>
<w:customXml w:element="customerName" w:uri="urn:my:order">
<w:r>
<w:t>test</w:t>
</w:r>
</w:customXml>
</w:p>
This allows you to embed business semantics in such a way that it is discoverable, and implementers not interested in using that feature can skip over it easily, without needing to know what application stored it there.
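As a rough illustration of both kinds of consumer, here is a small Python sketch. It assumes the custom element's namespace is carried in a w:uri attribute, which is the attribute name I understand the ECMA-376 schema to use; the function names plain_text and tagged_values are made up for this example.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

PARAGRAPH = """<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:customXml w:element="customerName" w:uri="urn:my:order">
    <w:r><w:t>test</w:t></w:r>
  </w:customXml>
</w:p>"""

def plain_text(node):
    """Consumer that ignores the semantics: concatenate every w:t below node."""
    return "".join(t.text or "" for t in node.iter(W + "t"))

def tagged_values(node):
    """Consumer that uses the semantics: map (uri, element) to the wrapped text."""
    values = {}
    for custom in node.iter(W + "customXml"):
        key = (custom.get(W + "uri"), custom.get(W + "element"))
        values[key] = plain_text(custom)
    return values

paragraph = ET.fromstring(PARAGRAPH)
print(plain_text(paragraph))
print(tagged_values(paragraph))
```

Neither function needs to know which application wrote the markup. The first one never even looks at the w:customXml wrapper, and the second one only relies on the two attributes that identify the business element.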
You can even easily extract your data using one simple generic XSLT as I have shown on my older Info Support blog:
http://blogs.infosupport.com/wouterv/archive/2007/10/11/Extracting-data-from-a-xml_2D00_mapped-document.aspx
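For those who prefer code over XSLT, the same idea can be sketched in Python. This is not the stylesheet from that post, just a hedged approximation of the approach: walk the main document part, keep the w:customXml structure, and drop everything that is formatting. It assumes the main part lives at word/document.xml (strictly speaking you should locate it through the package relationships) and that the namespace sits in a w:uri attribute; extract_business_xml and the "extracted" wrapper element are names invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def _convert(node, parent):
    """Copy the custom XML structure under node into parent, skipping formatting."""
    for child in node:
        if child.tag == W + "customXml":
            uri = child.get(W + "uri")
            name = child.get(W + "element")
            tag = f"{{{uri}}}{name}" if uri else name
            element = ET.SubElement(parent, tag)
            _convert(child, element)
            if len(element) == 0:
                # Leaf business element: its value is the visible text it wraps.
                element.text = "".join(t.text or "" for t in child.iter(W + "t"))
        else:
            # Paragraphs, runs and so on are layout: look through them.
            _convert(child, parent)

def extract_business_xml(docx_file, root_tag="extracted"):
    """Return an element tree holding only the business data of a .docx file."""
    with zipfile.ZipFile(docx_file) as package:
        # Assumption: conventional location of the main document part.
        document = ET.fromstring(package.read("word/document.xml"))
    root = ET.Element(root_tag)
    _convert(document, root)
    return root
```

Calling ET.tostring(extract_business_xml("order.docx")) on a mapped document (order.docx being a hypothetical file name) would give you the business XML without a single formatting tag in it.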
Stephane, the only thing defective by design is your blog. Sorry dude, but please investigate before spending a Friday ranting.
And please excuse my slightly aggressive tone today. I am fed up with people saying stuff about Open XML on their blogs, influencing the ISO process, while the actual knowledge contained in those posts is so extremely slim. It is not even funny anymore.
[Edit]
I just have to respond to the last claim in his post, quoted here:
Something interesting to note is that Microsoft thinks that storing data inside the ZIP package independently of the document is a good thing. From a pure technical point of view, you can view this "Custom XML data" as a cache of values thanks to which the consumer is able to drill into the data without a connection to the actual data source (corporate data). But there is a major flaw. Anybody using this feature will end up storing arbitrary data in ZIP packages shared across colleagues and others inside and outside the organization. Eventually, confidential information from the corporate databases will end up there, and a PR disaster automatically follows. You don't want to use this feature.
Dude, so by separating the content from the formatting you are creating PR disasters? Wow. When you are not using Custom XML your document still displays a customer name; it is only harder to discover it as such. How does defining a semantic structure for it create a PR disaster? There is no more and no less data in the document itself.
[/Edit]