Wouter

Go Search
Home
Contact Me
  

Wouter > Posts > Custom XML? This Custom XML!!!
Custom XML? This Custom XML!!!

While doing my round of blogs, I found this post by Stephane Rodriguez on Open XML's support for embedding custom schemas in a WordprocessingML document. After wading through the ranting, I found little evidence of actual knowledge of Open XML.

What Stephane rants about is Open XML's support for defining business semantics in a document, but he doesn't really understand what it is about. Given his post, I'd be amazed if he actually looked at this feature, since he never mentions the one WordprocessingML element used to define custom XML markup.

His opening paragraph claims that Custom XML is a poorly chosen name, since XML is already custom. Ehr… He then claims that it is application-specific, which it isn't, and calls it useless, while in reality this support is one of the things that really drive business adoption, because it makes it easy to get at document data independently of formatting and layout.

The second section, 'Custom XML definition, as per Microsoft', serves little purpose other than to belittle Microsoft, even though this is an Ecma feature! Perhaps he should investigate Ecma's stance rather than Microsoft's. All in all, there is little of use in that section.

Next comes 'Enough marketing fluff. What it really is', a hilarious title, since Stephane doesn't actually know what it is, although he does try his best to figure it out without reading the Open XML spec.

Stephane says to take a piece of markup like this:

<w:p>
  <w:r>
    <w:t>test</w:t>
  </w:r>
</w:p>

Add in some extra tags like so:

<w:p>
  <w:r>
    <mytag>myvalue</mytag>
    <w:t>test</w:t>
  </w:r>
</w:p>

And then be amazed that your document is broken. Wow, unknown XML breaks a parser; how insightful. The fix he tries next is even more wondrous and gives me the feeling that he is not in tune with XML at all. His solution to prevent the breakage? Add an arbitrary namespace prefix, and reuse the one that belongs to WordprocessingML. Ehr…

<w:p>
  <w:r>
    <w:mytag>myvalue</w:mytag>
    <w:t>test</w:t>
  </w:r>
</w:p>

By the way, Stephane, WordprocessingML doesn't define a w:stupid tag or a w:Idontknowxmlnamespaces tag either.
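A namespace prefix is only shorthand for a namespace URI, so w:mytag does not escape the problem; it claims that mytag belongs to the WordprocessingML namespace, which defines no such element. The Python sketch below (standard-library xml.etree.ElementTree only) makes this visible by expanding element names and checking the children of each w:r against a hypothetical, deliberately tiny subset of allowed elements. The real schema allows many more children, and Word does far more than this one check; the helper name unknown_run_children is illustrative.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML main namespace that the "w" prefix is bound to.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, deliberately tiny subset of elements allowed inside <w:r>;
# the real schema (ECMA-376) defines many more.
KNOWN_RUN_CHILDREN = {f"{{{W_NS}}}{name}" for name in ("rPr", "t", "tab", "br")}


def unknown_run_children(paragraph_xml: str) -> list[str]:
    """Return the expanded {uri}name of every <w:r> child not in the subset."""
    root = ET.fromstring(paragraph_xml)
    return [
        child.tag
        for run in root.iter(f"{{{W_NS}}}r")
        for child in run
        if child.tag not in KNOWN_RUN_CHILDREN
    ]


# Both variants from the post, with the w prefix declared so they parse at all.
no_prefix = f'<w:p xmlns:w="{W_NS}"><w:r><mytag>myvalue</mytag><w:t>test</w:t></w:r></w:p>'
w_prefix = f'<w:p xmlns:w="{W_NS}"><w:r><w:mytag>myvalue</w:mytag><w:t>test</w:t></w:r></w:p>'

print(unknown_run_children(no_prefix))  # ['mytag'] (no namespace at all)
print(unknown_run_children(w_prefix))   # the same local name, now in the w namespace
```

Adding the w: prefix only moves mytag from "no namespace" into a namespace that has never heard of it.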

To summarize in a nifty table as well:

| Test | Idea? |
|---|---|
| mytag="myvalue" | Stupid |
| <mytag>myvalue</mytag> | Stupid |
| <w:mytag>myvalue</w:mytag> | Stupid |

What Custom XML is all about

So let me explain custom XML markup. It is about embedding XML elements defined outside of Open XML, to support solutions that structure a document using business semantics, not just formatting. This is a great advance, because you want to get to the data directly, rather than by saying that the customer name is the third paragraph. The catch is that you cannot just allow arbitrary XML to be mixed into the WordprocessingML markup. That would make documents application-specific, and it would make validation meaningless, since any XML would then be valid. Not a great idea.

So the way to do this in Open XML is to use the w:customXml element:

<w:p>
  <w:customXml w:element="customerName" w:uri="urn:my:order">
    <w:r>
      <w:t>test</w:t>
    </w:r>
  </w:customXml>
</w:p>

This lets you embed business semantics in a discoverable way, while implementers not interested in the feature can easily skip over it, without needing to know which application stored it there.
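To make the wrapper concrete, here is a minimal Python sketch using xml.etree.ElementTree. It assumes the w:uri and w:element attribute names from ECMA-376, wraps an existing run in w:customXml, and then reads the paragraph text the way a consumer that ignores custom XML might. The helper names wrap_run and plain_text are hypothetical. Because iter() walks all descendants, the wrapper is simply transparent to the text reader.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}name notation.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
ET.register_namespace("w", W_NS)  # serialize with the familiar w: prefix


def wrap_run(paragraph: ET.Element, index: int, uri: str, element: str) -> ET.Element:
    """Wrap the index-th child of a w:p (assumed to be a w:r) in w:customXml.

    Assumes the namespace attribute is spelled w:uri, as in ECMA-376.
    """
    run = paragraph[index]
    custom = ET.Element(W + "customXml", {W + "uri": uri, W + "element": element})
    paragraph.remove(run)            # detach the run from the paragraph...
    custom.append(run)               # ...move it under the wrapper...
    paragraph.insert(index, custom)  # ...and put the wrapper where the run was
    return custom


def plain_text(paragraph: ET.Element) -> str:
    """Concatenate all w:t text; iter() descends through w:customXml wrappers."""
    return "".join(t.text or "" for t in paragraph.iter(W + "t"))


p = ET.fromstring(f'<w:p xmlns:w="{W_NS}"><w:r><w:t>test</w:t></w:r></w:p>')
before = plain_text(p)
wrap_run(p, 0, "urn:my:order", "customerName")
print(ET.tostring(p, encoding="unicode"))
print(plain_text(p) == before)  # True: same visible text, now with semantics
```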

You can even extract your data easily with one simple, generic XSLT stylesheet, as I have shown on my older Info Support blog:

http://blogs.infosupport.com/wouterv/archive/2007/10/11/Extracting-data-from-a-xml_2D00_mapped-document.aspx
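The stylesheet in that post is XSLT. As a rough stand-in using a different technique, the Python sketch below does the same kind of generic extraction with ElementTree: it lists every w:customXml element in document order with its namespace, element name, and the text it contains. The function name extract_custom_xml, the sample document, and the customer name Contoso are illustrative assumptions, and the sketch assumes the w:uri attribute name from ECMA-376.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def extract_custom_xml(document_xml: str) -> list[tuple[str, str, str]]:
    """Return (uri, element, text) for every w:customXml, in document order."""
    root = ET.fromstring(document_xml)
    results = []
    for custom in root.iter(W + "customXml"):
        # Only the w:t text inside this wrapper belongs to the tagged value.
        text = "".join(t.text or "" for t in custom.iter(W + "t"))
        results.append((custom.get(W + "uri", ""), custom.get(W + "element", ""), text))
    return results


# Hypothetical document body: formatting-only text plus one tagged value.
doc = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Customer: </w:t></w:r>
<w:customXml w:uri="urn:my:order" w:element="customerName">
<w:r><w:t>Contoso</w:t></w:r></w:customXml></w:p>
</w:body></w:document>"""

for uri, element, text in extract_custom_xml(doc):
    print(f"{{{uri}}}{element} = {text!r}")  # {urn:my:order}customerName = 'Contoso'
```

Nothing in the extractor knows anything about the order schema; it only relies on the single w:customXml element, which is what makes a generic tool possible.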

Stephane, the only thing defective by design is your blog. Sorry dude, but please investigate before spending a Friday ranting.

Please also excuse my somewhat aggressive writing today. I am fed up with people making claims about Open XML on their blogs, influencing the ISO process, while the actual knowledge in those posts is extremely slim. It's not even funny anymore.

[Edit]

I just have to respond to the last thing claimed in his post:

Something interesting to note is that Microsoft thinks that storing data inside the ZIP package independently of the document is a good thing. From a pure technical point of view, you can view this "Custom XML data" as a cache of values thanks to which the consumer is able to drill into the data without a connection to the actual data source (corporate data). But there is a major flaw. Anybody using this feature will end up storing arbitrary data in ZIP packages shared across colleagues and others inside and outside the organization. Eventually, confidential information from the corporate databases will end up there, and a PR disaster automatically follows. You don't want to use this feature.

Dude, so by separating the content from the formatting you are creating PR disasters? Wow. Without custom XML, your document still displays the customer name; it is only harder to discover it as such. How does defining a semantic structure for it create a PR disaster? There is no more and no less data in the document itself.
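The quoted concern is about custom XML data parts, which are ordinary XML files inside the ZIP package that anyone with a ZIP tool can open. As a sketch of how one might audit a package before sharing it, the Python code below lists those parts with the standard zipfile module. It assumes the naming Word typically uses (customXml/item1.xml and so on, with customXml/itemProps1.xml holding part properties); the spec locates parts through relationships rather than fixed names, so a robust reader would follow those instead. The function name and the toy package are hypothetical.

```python
import io
import re
import zipfile

# Assumed naming convention: customXml/item1.xml, customXml/item2.xml, ...
# (customXml/itemPropsN.xml holds part properties and is skipped).
ITEM_PART = re.compile(r"customXml/item\d+\.xml")


def list_custom_xml_parts(docx_bytes: bytes) -> dict[str, str]:
    """Map each custom XML data part name to its XML text."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            if ITEM_PART.fullmatch(name):
                parts[name] = package.read(name).decode("utf-8")
    return parts


# Build a toy in-memory package; a real .docx contains many more parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")  # placeholder content
    package.writestr("customXml/item1.xml",
                     '<order xmlns="urn:my:order"><customerName>Contoso</customerName></order>')
    package.writestr("customXml/itemProps1.xml", "<ds:datastoreItem/>")  # placeholder

for name, content in list_custom_xml_parts(buffer.getvalue()).items():
    print(name, content)
```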
[/Edit]

Comments

Fredrik E. Nilsen

I think this explains why Mr. Rodriguez never accepts comments on any of his blog posts. ;)
at 3/24/2008 11:20 PM

Jesper Lund Stocholm

Hi Wouter, great article. I took the "OOXML is defective by design" post apart in Summer 2007 (sadly, in Danish) and I was amazed at how horribly wrong the article was. Like you, I couldn't help but think: "Does he know at all what he is talking about?" I thought about writing something about his new article, but reading it I couldn't find an entry point ... simply because every new paragraph contained a new error. ... and I was also amused by his "Perhaps this is a namespace issue. Let's prefix our custom XML with w ..." WTF? :o) Anyone referencing his article as "devastating technical proof" or similar should be slapped around a bit with a rainy-wet edition of Sunday's New York Times.
at 3/25/2008 10:08 AM

Cmdr Flibberty Jibbits

I have always been suspicious of Stephane Rodriguez. He runs a business making Excel extraction tools and was an early critic of OOXML in that domain. His arguments have always been very flaky, and he has been publicly exposed in a number of threads around the web (like when he did not understand the spreadsheet). He seems to have expanded from his area of expertise into other areas, and I suspect he is being paid to do this. Stephane's own business was threatened by the availability of OOXML, as it depended entirely on being able to decode obscure features of the old binary file formats. Microsoft's further announcements of translators from the binary formats to OOXML might have threatened his business even more. It is in this environment of fear that Mr Rodriguez was probably approached by an interested party that funded him to write anti-OOXML pieces. The pieces are not very savvy, and as this article proves (along with many other articles that have debunked him), his knowledge is not very deep. Why someone would jeopardize his own business and reputation is beyond me, and I can only conclude that he is being paid some consulting fee to do so.
at 3/25/2008 5:30 PM

Julien Chable

I love the content, and especially the ending! Too many people are talking about Open XML without having tested it or read the specs. The Custom XML feature is one of the most powerful features of Open XML for business scenarios and document-centric solutions.
 
Thank you Wouter :p
at 3/25/2008 6:31 PM

Simone Pringle

The ability to embed and interweave business data into transportable and humanly readable documents is extremely useful.  Take for instance the efforts to standardize the embedding of patient medical data into PDF documents (aka PDF/H).
 
Records For Living has been able to take advantage of Open XML's capabilities with regards to its support of custom schemas to integrate two industry standards: Ecma's Open XML and the ASTM's Continuity of Care Record (CCR).  The combination is powerful: patients can use personal health record (PHR) software to exchange 'live' reports with their doctors in a way that is both human and machine readable.
 
Thank you for setting the record straight on this capability of the Open XML standard.
 
Simone
at 3/25/2008 7:31 PM

W.Meints

I'm getting a bit tired of developers doing such stupid tests. I'm working on an application that is going to parse Word 2007 documents, and thus far I haven't seen any problems with parsing or creating Word documents.
 
Good to see people with real knowledge of OOXML setting these weird claims straight.
at 4/1/2008 8:33 AM

Gareth Horton

Nice one Wouter, especially from the point of view of a nice, clear, simple intro to custom XML.
 
I don't think one can doubt Stephane's technical talents, given his work on BIFF8 and even his partial reverse engineering of BIFF12 (xlsb).
 
I think, as others do, that there must be some kind of agenda here.
 
Gareth
at 4/4/2008 6:57 PM

xmler

All the examples I see use some text, but what we all want is ... some text .... where we are annotating Word docs with real data. Can it do this?
at 11/5/2008 9:50 PM

Wouter

Haven't tested it, should work most of the time.
Wouter van Vugt at 5/1/2009 5:39 PM
