Wouter


Such silence

Has everybody over-blogged their respective selves or something? Not that long ago I used to get an entertaining story at least once a week…

Oh well, I've been offline for some time as well, but I'll have a nice announcement sometime later this week. I've spent the time not blogging quite usefully: coding! (And sometimes helping to paint stuff, and... painting yet more stuff. Our baby is coming in just about 2 months, so we'll be moving to DEFCON 1 soon.)

Business Data Catalog and Document Information Panels – EOF Issue resolved

About a week ago one of my students told me that he was running into issues with the Business Data column in SharePoint when it was used as a column in a document library. The document information panel displayed by Microsoft Office was unable to resolve entities, and returned a somewhat cryptic JavaScript error message:

Expected token ‘EOF’ found ‘:’.
//ns2-->:<--
File:script.js
Line:170

Online there are more people running into the issue:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1482403&SiteID=1

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3016317&SiteID=1

So after some debugging I found the issue, and a work-around that you can use to allow BDC fields to appear in the document information panel, at least until Microsoft fixes the issue.

One nice thing is that you are allowed to debug the JavaScript causing the error, and if you do, you drop into the bdc_SaveResolvedValues function, which, obviously, saves resolved values. Inside this function a few things take place. First, a concatenated string with the primary field name and secondary field names is constructed. Next, this string is split using ‘:’ as the separator character. Each string resulting from the split is then used in an XPath expression.

Here’s the thing. If you haven’t selected any secondary fields in your BDC column, the concatenated string looks like the following, note the absence of any content after the ‘:’ character.

MyFieldID:

If this string is then split using the ‘:’ character, you get two strings back, one containing ‘MyFieldID’, the other is empty since there are no characters trailing after the ‘:’, and there is your issue. Now the empty string is used to build the XPath expression, which will look like ‘//ns2:’, without any node name to query because of the empty string.
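The failure described above can be reproduced outside SharePoint. The following is only a minimal sketch in Python, not Microsoft's script: the function names `build_xpaths` and `build_xpaths_safe` are hypothetical, and the `ns2` prefix is taken from the error message. It shows how splitting on ‘:’ produces an empty trailing piece, and how skipping empty pieces would avoid the broken expression.

```python
def build_xpaths(field_string, prefix="ns2"):
    """Mimic the described logic: split on ':' and build one XPath per piece."""
    return ["//%s:%s" % (prefix, name) for name in field_string.split(":")]


def build_xpaths_safe(field_string, prefix="ns2"):
    """Same idea, but skip the empty pieces that a trailing ':' leaves behind."""
    return ["//%s:%s" % (prefix, name) for name in field_string.split(":") if name]


# A BDC column without secondary fields yields a string ending in ':'.
broken = build_xpaths("MyFieldID:")
fixed = build_xpaths_safe("MyFieldID:")

print(broken)  # the second entry has no node name after the prefix
print(fixed)
```

The second entry of `broken` is the `//ns2:` expression from the error message; `fixed` only contains the query for the primary field.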

Of course the real resolution would be a better script, but I believe it is fully generated at runtime, so I am not sure how to go about that without help from either the community or Microsoft. However, you can use a quick fix: include more than one field from the BDC in your Business Data Field. To do this, go to the configuration of your Business Data Field in the library where you are using it, and select an item under ‘Add a column to show each of these additional fields’. This will prevent the string-splitting error.

Hope it helps.

Is it official? Does ODF suck?
In danger of being called pro-Microsoft:
 
 
But really, he does have a point. Microsoft has been shipping better products to more people, and as such, is hated by just about everyone who wants to benefit from the less-great, less-sold (also called "Open Source"). 
 
Damn you Microsoft!!!! I want to take over the world, but you've already taken it!

Open XML Explained in Vietnamese
Just learned that the Open XML explained book is now also available in Vietnamese:
 
Anybody who can send me a copy? I'd be much obliged :)
 
 
“Crawling through the needle’s eye”

Taken from the mailing list of the 'open' doc society:

OOXML which was submitted by Microsoft to ECMA, and by ECMA to ISO, has literally crawled through the needles eye.


Michiel was even so kind as to post the ISO results before ISO takes their official position tomorrow. Now I know the society is called 'OpenDoc(umentformat) society', but to actually open documents to the general society which ISO hasn't even released, that would seem to be somewhat inappropriate. Is this the kind of respect for ISO I can expect from that society?

After a year of discussion and repairs it still receives the very minimum of support

The report states 75% approval of Open XML, against a required minimum of 66.66%. That to me is not the very minimum. The report states 14% disapproval of Open XML, against an allowed maximum of 25%. That is nowhere near the limit either. Not a landslide majority, but a majority nonetheless.

Sigh…

Still that negativism, and twisting of news. We can chalk it up alongside the news about Germany and Norway.

Open XML to become an ISO standard!!! (???)

If I am to read the Open Malaysia blog correctly, and if their vote tracking is up to date, Open XML will pass and become an ISO standard. Not official just yet, but….

Rejoice!

I hope we will be hearing the official result sometime in the following week. That makes tonight almost as exciting as the night before my tenth birthday :)

Why using a single style container is a very bad thing

In my previous blog post on separating style from fiction, Stephane mentioned correctly that ODF uses a single container for styles, which is re-used for documents and spreadsheets. To me this is a bad thing, like I responded to his comment. Why? Well, it makes less sense to provide the same level of formatting for documents as for spreadsheets, and vice-versa. A little investigation into ODF and its main authoring tool proves this issue.

So, let's investigate.

First let me create an ODF document. Just add some text to it, and select that text and change the kerning option:

To change the kerning option, you go to the character formatting dialog, third tab for positioning. (the following screenie is a Dutch localized version of Writer)

This will result in the following layout:

Cool, kerning works in the OpenOffice.org Writer application. The markup for this is generically available for spreadsheets as well. It looks like the following:

<style:text-properties fo:letter-spacing="0.318cm" fo:background-color="transparent" />

Now for a spreadsheet. You can create a spreadsheet and style a single cell. The Calc formatting dialog for cells doesn't contain the kerning setting like the Writer dialog. Hmmm… Similar ODF markup container, different things allowed. This will not help me find out if my document uses unsupported features. Calc does not have a 'position' tab, but I can of course change the markup directly.

So you open the ODF document, edit the markup directly to add the fo:letter-spacing attribute and see what happens. First of all, nothing crashes, no indication of anything being wrong. But the text in my cell has no kerning applied:

To make matters worse, if you now edit the cell formatting, and save the document again, Calc removes the fo:letter-spacing attribute. I remember the following statement on Stephane's blog:

That is after all what XML fragments are for. It should be able to make in-place replacements of XML fragments and leave the rest of the file untouched.

As such, OpenOffice.org is also not a native XML application, throwing away content at a whim, without notification.

Custom XML? This Custom XML!!!

While doing my round of blogs I found this post by Stephane Rodriguez on the support in Open XML for embedded custom schemas in a WordprocessingML document, and after wading through the ranting I found little proof of actual knowledge of Open XML.

What Stephane rants about is Open XML support for defining business semantics in a document, but he doesn't really understand what it is about. Given his post, I'd be amazed if he actually looked at this feature, since he forgot to mention the one single WordprocessingML tag used to define Custom XML markup.

His opening paragraph starts out by saying that Custom XML is a poorly chosen name, since XML is already custom. Ehr… Next he claims that it is application specific, which it isn't, and then calls it useless, while in reality this support is one of those things that really drive businesses, since it makes it easy to get to document data outside of the formatting and layout.

The second section, 'Custom XML definition, as per Microsoft', serves little purpose but to belittle Microsoft, while it is an ECMA feature! Perhaps you should investigate ECMA's stance and not Microsoft's. All in all, there is little useful stuff in that area.

Next is 'Enough marketing fluff. What it really is', which is a hilarious title since Stephane actually doesn't know what it is, but does try his best to figure it out without reading the Open XML spec.

Stephane says to take a piece of markup like such:

<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>

Add in some extra tags like so:

<w:p>
<w:r>
<mytag>myvalue</mytag>
<w:t>test</w:t>
</w:r>
</w:p>

And then be amazed at the fact that your document is broken. Wow, unknown XML breaks a parser, how insightful. The fix that he tries next is even more wondrous, and gives me the feeling that he is not that in tune with XML at all. Solution to prevent breaking? Let's try and add a random namespace prefix, and let's re-use the one used by WordprocessingML. Ehr….

<w:p>
<w:r>
<w:mytag>myvalue</w:mytag>
<w:t>test</w:t>
</w:r>
</w:p>

By the way Stephane, WordprocessingML doesn't support or define the w:stupid tag or the w:Idontknowxmlnamespaces tag either.

To summarize in a nifty table as well:

Test                          Idea?
mytag="myvalue"               Stupid
<mytag>myvalue</mytag>        Stupid
<w:mytag>myvalue</w:mytag>    Stupid

What Custom XML is all about

So let me explain custom XML markup. It is about embedding custom XML defined outside of Open XML, to support solutions that aim to structure a document using business semantics and not only formatting. That is a great advance, since you want to get to the data directly, and not by saying that the customer name is the 3rd paragraph. The issue is that you cannot just allow any arbitrary XML to be stored in the WordprocessingML package. That would become application specific, and it would break validation, since any XML would then have to be accepted as valid. Not a great idea.

So the way to do this in Open XML is to use a w:customXml element for it:

<w:p>
<w:customXml w:element="customerName" w:uri="urn:my:order">
<w:r>
<w:t>test</w:t>
</w:r>
</w:customXml>
</w:p>

This allows you to embed business semantics in such a way that it is discoverable, and implementers not interested in using that feature can skip over it easily, without needing to know what application stored it there.

You can even easily extract your data using one simple generic XSLT as I have shown on my older Info Support blog:

http://blogs.infosupport.com/wouterv/archive/2007/10/11/Extracting-data-from-a-xml_2D00_mapped-document.aspx
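To give a rough idea of what such an extraction involves, here is a minimal sketch in Python rather than XSLT. It is not the code from the linked post: the function name `extract_custom_xml` and the inline sample document are assumptions made for illustration. It walks every w:customXml element and collects the text of the runs inside it, keyed by the business element name.

```python
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML main document parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical stand-in for the main document part of a package.
SAMPLE = """<w:document xmlns:w="%s">
  <w:body>
    <w:p>
      <w:customXml w:element="customerName">
        <w:r><w:t>test</w:t></w:r>
      </w:customXml>
    </w:p>
  </w:body>
</w:document>""" % W


def extract_custom_xml(document_xml):
    """Map each custom XML element name to the text it wraps."""
    root = ET.fromstring(document_xml)
    result = {}
    for node in root.iter("{%s}customXml" % W):
        name = node.get("{%s}element" % W)
        # Concatenate all text runs below this custom XML element.
        text = "".join(t.text or "" for t in node.iter("{%s}t" % W))
        result[name] = text
    return result


data = extract_custom_xml(SAMPLE)
print(data)
```

Note that the code never needs to know which application wrote the markup; it only relies on the w:customXml element and its w:element attribute.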

Stephane, the only thing defective by design is your blog. Sorry dude, but please investigate before spending a Friday ranting.

And please excuse the slightly aggressive tone of my writing today. I am fed up with people saying stuff about Open XML on their blogs, influencing the ISO process, while the actual knowledge contained in the blog post is so extremely slim. It's not even funny anymore.

[Edit]

I just have to respond to the last thing claimed in his post:

Something interesting to note is that Microsoft thinks that storing data inside the ZIP package independently of the document is a good thing. From a pure technical point of view, you can view this "Custom XML data" as a cache of values thanks to which the consumer is able to drill into the data without a connection to the actual data source (corporate data). But there is a major flaw. Anybody using this feature will end up storing arbitrary data in ZIP packages shared across colleagues and others inside and outside the organization. Eventually, confidential information from the corporate databases will end up there, and a PR disaster automatically follows. You don't want to use this feature.

Dude, so by separating the content from the formatting you are creating PR disasters? Wow. When not using CustomXML your document still displays a customer name, it is only harder to discover it as such. How does defining a semantic structure for it create a PR disaster? There is no more or no less data in the document itself.
[/Edit]

 

 

Comments notification error

The comment feature on my blog requires me to accept your comment before it is published. This way I want to prevent name-calling, but I think up until now I have passed 100% of the items posted. Today I got an email asking why a comment didn't appear. I checked and found about a dozen comments in the 'pending' state. I accepted them all, and I think I know what is wrong. The thing that is broken is me. As a dev guy, my dev stuff (source control etc.) is in top-notch working order, but my IT pro stuff (DNS, networking, AD…) is not really IT pro just yet, more IT intermediate… :)

On separating style from fiction

As I mentioned in my previous blog post I was planning on writing something about the separation of document style (bold / italic etc) from document content (text, tables, images etc), and in this post I will go ahead and act on that plan. The main reason is an event held two days ago at the Delft university called 'The war surrounding Microsoft's standard' (not a title to indicate an open discussion, but I digress). During the interesting discussions with some of the attendees one thing came up more than once: the notion of separating styles from document content, and how Open XML does this in a not-so-nice fashion compared to ODF. Something to investigate!

Note that I am not pro- or anti-ODF. If you use it to drive your business, may your business fare well.

Before going into the differences in implementation and what ramifications that has for developers of business solutions first let me discuss how you can style a document in Open XML and ODF. Note that I am not an ODF expert, so if there are any misrepresentations please feel free to publicly chastise me, or just tell me so I can adapt this post to my increase in knowledge. I will also focus on a simple document with text only, not on spreadsheets or presentations.

The Wordprocessing specification of Office Open XML supports three ways to define formatting for your text. After a simple inspection of the OpenOffice.org Writer application it seems that ODF supports the same methods.

Format type          Usage
Default              Used when no direct format or style is applied.
Direct formatting    A container for formatting which is applied to a single element.
Styles               A container for formatting which can be applied to multiple elements.

The defaults are pretty obvious: what font, alignment, color etc… will be used when the author of the document hasn't explicitly set a value. Direct formatting is something that you do as an author when you realize that you want some piece of text to be bold, select it, and press the Bold button, usually on some type of toolbar (or ribbon). Next are the styles, which are more usable because you create styles separate from the document and apply them to various pieces of content. We probably all know 'Heading 1', 'Heading 2' etc….

Separating the formatting from the content basically makes it easy to change the way your document looks without needing to touch the content of the document (a bit duh-ish, I know). Note that I explicitly state this for the application of styles. In my opinion direct formatting is exactly the opposite. When you apply a style, you are basically saying 'I want this to be a heading', which is expressly different from 'I want this to be bold'. The first is document specific, the second content specific. In other words, styles should travel with the document, while direct formatting should travel with the content. You can see this in the way formatting and styles are stored and applied. ODF uses the same method as Open XML for calculating the final picture for a piece of content which has a style applied as well as a direct format (and implicitly the document defaults).

The following sample is the same for Open XML as for ODF. First the document defaults are applied to the text, next the style is applied (which can be a style hierarchy), and finally the direct format is applied to the text. In the sample the style indicates that the text should be bold and italic. The direct format of the word 'some' says no bold and no italics, so the end result should be equal to the last item.
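The order of application described above can be written down as a small sketch. This is only an illustration in Python under simplifying assumptions: formatting is modelled as a plain dictionary, and the function name `resolve_formatting` is hypothetical rather than taken from either specification.

```python
def resolve_formatting(defaults, style_chain, direct):
    """Apply defaults first, then each style in the hierarchy, then direct formatting."""
    effective = dict(defaults)
    for style in style_chain:  # base style first, most derived style last
        effective.update(style)
    effective.update(direct)  # direct formatting always wins
    return effective


defaults = {"bold": False, "italic": False}
style_chain = [{"bold": True, "italic": True}]

# Text that only has the style applied.
styled = resolve_formatting(defaults, style_chain, {})

# The word 'some', where direct formatting switches bold and italic off again.
some = resolve_formatting(defaults, style_chain, {"bold": False, "italic": False})

print(styled)
print(some)
```

Later layers only override the properties they actually set, which is why the word 'some' ends up neither bold nor italic while the rest of the styled text keeps both.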

Now if this is all the same for ODF and Open XML, the difficulty with this separation must be in the way the format stores this information. Let's take a closer look.

Let's first look at an ODF document which uses a direct format. Note that I removed unrelated content and indented the XML (for all samples). The following XML is part of the main document body. The XML shows that a style definition is used to store the direct formatting for the span of text. The style is identified using a computer-generated name (the author pressed the bold button, no name dialogs appeared obviously). The style definition is stored inside the same XML part as the textual content.

ODF – Direct format

Stored in main document part

<office:document-content>
    <office:automatic-styles>
        <style:style style:name="T1"
                     style:family="text">
            <style:text-properties
                fo:font-weight="bold" />
        </style:style>
    </office:automatic-styles>
    <office:body>
        <office:text>
            <text:p>
                <text:span text:style-name="T1">Text</text:span>
            </text:p>
        </office:text>
    </office:body>
</office:document-content>

Second sample, Open XML using direct formatting. There are some similarities to ODF, such as the usage of a container element for storing the properties (rPr versus text-properties). However, Open XML chooses to keep the properties enclosed in the content that has been formatted by the author, and hence lacks the separate style definition with the computer-generated name. You can also note the usage of the w:t element for storing the text, not used by ODF, which I think is due to the mixed versus non-mixed content model, but that is a discussion for another day.

Open XML – Direct format

Stored in main document part

<w:document>
    <w:body>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:b />
                </w:rPr>
                <w:t>Text</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>

Now for the application of styles. First ODF. Two samples are necessary since the style itself is stored in a different part than the content of the document (styles allow easy changing without touching content, remember). The first sample shows the document content referring to the style, the second XML sample shows the style part of an ODF document. In the first sample, note that the style is indicated with the exact same attribute as in the sample using direct formatting. Also note that the name of the style is now functional since the author chose it when he created the style. The second sample is largely the same as the first ODF sample, only moved to a separate part in the ODF ZIP package.

ODF – Styles

Stored in main document part

<office:document-content>
    <office:body>
        <office:text>
            <text:p>
                <text:span text:style-name="MyStyle">Text</text:span>
            </text:p>
        </office:text>
    </office:body>
</office:document-content>

ODF – Styles

Stored in styles part

<office:document-styles>
    <office:styles>
        <style:style style:name="MyStyle"
                     style:family="text">
            <style:text-properties
                fo:font-weight="bold" />
        </style:style>
    </office:styles>
</office:document-styles>

Next of course are the same samples for Open XML. Some similar things going on here. The style itself is stored outside of the document content in its own part in the Open XML ZIP package. The style has a (somewhat) useful name and is referenced by the formatted content using that name. A big difference is the use of a different piece of markup to identify the style separate from the direct formatting, for which ODF uses the same attribute.

Open XML – Styles

Stored in main document part

<w:document>
    <w:body>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:rStyle w:val="MyStyle" />
                </w:rPr>
                <w:t>Text</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>

Open XML – Styles

Stored in styles part

<w:styles>
    <w:style w:type="character" w:styleId="MyStyle">
        <w:rPr>
            <w:b />
        </w:rPr>
    </w:style>
</w:styles>

So now that we have examined some of the XML samples for direct formatting and styles let's talk about the results of this for you as a developer of software that uses this stuff. Five scenarios that I want to investigate.

Copying document content

Let's say I want to copy a paragraph from one document to another. Since I want to copy only a little bit of content, I want the style of the target document to apply to the copied paragraph (styles travel with documents, direct formatting with content). When using Open XML you can look up the element that you need to copy, say a paragraph, and copy this element to the target document. This will copy not only the paragraph, but also the direct formatting applied to the paragraph and all its inner content. When using ODF, you look up the element that you need to copy, a paragraph again, and copy this element to the target document. Next you need to parse this paragraph and look up all direct-formatting styles that are used in the paragraph and in the inner content, and then copy each of these separate styles into your target document, of course making sure to prevent naming collisions and adjusting the content accordingly. And in all likelihood you will have naming collisions, since the name 'T1' used by the OpenOffice.org Writer application will probably always be used (I think even a GUID for the name would have made life easier here for ODF developers).

Open XML

  • Look up element
  • Copy to target document

ODF

  • Look up element
  • Copy to target document
  • Parse element
  • Look up direct formatting styles
  • Prevent style name collisions
  • Copy styles
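The extra ODF steps listed above can be sketched with Python's standard library. This is a simplified illustration, not a complete implementation: the helper names (`make_doc`, `free_name`, `copy_paragraph`) and the two tiny in-memory documents are assumptions, only automatic styles in the content part are handled, and new names always use the 'T' prefix.

```python
import copy
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE_NAME = "{%s}name" % NS["style"]
REF = "{%s}style-name" % NS["text"]

TEMPLATE = (
    '<office:document-content xmlns:office="{office}" xmlns:style="{style}"'
    ' xmlns:text="{text}" xmlns:fo="{fo}">'
    '<office:automatic-styles>'
    '<style:style style:name="T1" style:family="text">'
    '<style:text-properties fo:font-weight="{weight}"/>'
    '</style:style>'
    '</office:automatic-styles>'
    '<office:body><office:text>'
    '<text:p><text:span text:style-name="T1">{content}</text:span></text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)


def make_doc(weight, content):
    """Build a tiny content part whose single span uses automatic style T1."""
    return ET.fromstring(TEMPLATE.format(weight=weight, content=content, **NS))


def free_name(taken):
    """Return the first 'T<n>' name that is not in use yet."""
    n = 1
    while "T%d" % n in taken:
        n += 1
    return "T%d" % n


def copy_paragraph(source, target):
    """Copy the first paragraph of source into target, renaming automatic styles."""
    src_styles = source.find("office:automatic-styles", NS)
    dst_styles = target.find("office:automatic-styles", NS)
    dst_text = target.find("office:body/office:text", NS)
    definitions = {s.get(STYLE_NAME): s for s in src_styles}
    taken = {s.get(STYLE_NAME) for s in dst_styles}

    paragraph = copy.deepcopy(source.find("office:body/office:text/text:p", NS))
    renamed = {}
    for node in paragraph.iter():
        old = node.get(REF)
        if old not in definitions:
            continue  # no style reference, or a named style from the styles part
        if old not in renamed:
            clone = copy.deepcopy(definitions[old])
            renamed[old] = free_name(taken)
            clone.set(STYLE_NAME, renamed[old])
            taken.add(renamed[old])
            dst_styles.append(clone)
        node.set(REF, renamed[old])
    dst_text.append(paragraph)
    return paragraph


source = make_doc("bold", "Copied")
target = make_doc("normal", "Existing")
copied = copy_paragraph(source, target)
print(copied.find("text:span", NS).get(REF))
```

Both documents use 'T1' for a different direct format, so the copied span has to be pointed at a freshly named copy of its style. With Open XML the deep copy of the paragraph alone would be enough, since the run properties travel inside the copied element.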

Adding new styles programmatically

If you create a new style programmatically, you need to prevent naming collisions again. In Open XML I open the styles part and check that single part. For ODF I need to open two parts, and check in both parts for naming collisions.

Open XML

  • Open styles part
  • Prevent style name collisions
  • Create style

ODF

  • Open main part
  • Prevent style name collisions
  • Open styles part
  • Prevent style name collisions
  • Create style
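The difference between the two lists comes down to how many places you must consult before a name is safe. A minimal sketch in Python, assuming the style names of each part have already been read into sets (the function name `unique_style_name` and the sample names are hypothetical):

```python
def unique_style_name(wanted, *parts):
    """Return wanted, or wanted with a numeric suffix, so it is unused in every part."""
    taken = set()
    for names in parts:
        taken.update(names)
    if wanted not in taken:
        return wanted
    n = 2
    while "%s_%d" % (wanted, n) in taken:
        n += 1
    return "%s_%d" % (wanted, n)


styles_part = {"Heading1", "MyStyle"}
content_part_automatic = {"T1", "T2"}

# Open XML: only the styles part has to be checked.
open_xml_name = unique_style_name("MyStyle", styles_part)

# ODF: the automatic styles in the main part have to be checked as well.
odf_name = unique_style_name("T1", content_part_automatic, styles_part)

print(open_xml_name, odf_name)
```

The check itself is the same in both cases; the ODF variant simply needs a second part opened and parsed before it can be trusted.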

Adding new styles through the UI

Take a look at the ODF sample of direct formatting and take notice of the 'T1' generated name being used. First of all, is this naming convention in the spec? (Seriously, I haven't looked yet.) Now what if I go into OpenOffice.org Writer and add a style using that exact same name (since I think T1 is a functional name for my style… somehow)? The office application must now find and change that piece of direct-formatted text, since the name of the new style collides with it. As a result the name of the direct format style changes. Hence you cannot easily rely on the name for identifying some formatted text. Given that the same attribute is used to identify a direct format as well as a style, you must go look if that style is a direct format, in which case you shouldn't cache the name somewhere in your code since it might change after being edited. Now you could reason that you should never rely on the name of a direct format: since it is only applied to a single piece of content, the name shouldn't matter, right? Wrong! The issue is that the same attribute is used to identify style and direct formatting, so you are stuck with checking this programmatically. This means you will need to learn more about ODF before doing this correctly, and in my opinion this makes things more difficult, not less. I can already imagine the naïve implementation saying that each style named T plus a number will probably be a direct format and not a real style.

(funny side-note is that at the event one of the speakers actually said that open source makes your life easy since you can peek into the source code instead of reading or defining things in a spec, which opens the floodgates for these types of implementation stupidities)
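Instead of guessing from the name, the programmatic check mentioned above can look at where the style is defined. The sketch below is an assumption-laden illustration in Python: it only inspects the automatic styles of the content part, and the function name `is_direct_format` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE_NAME = "{%s}name" % NS["style"]

# Hypothetical content part: T1 is generated, MyStyle lives in the styles part.
CONTENT = (
    '<office:document-content xmlns:office="{office}" xmlns:style="{style}"'
    ' xmlns:text="{text}" xmlns:fo="{fo}">'
    '<office:automatic-styles>'
    '<style:style style:name="T1" style:family="text">'
    '<style:text-properties fo:font-weight="bold"/>'
    '</style:style>'
    '</office:automatic-styles>'
    '<office:body><office:text><text:p>'
    '<text:span text:style-name="MyStyle">'
    '<text:span text:style-name="T1">Text</text:span>'
    '</text:span>'
    '</text:p></office:text></office:body>'
    '</office:document-content>'
).format(**NS)


def is_direct_format(content_root, style_name):
    """True when style_name is defined among the automatic styles of the content part."""
    automatic = content_root.find("office:automatic-styles", NS)
    if automatic is None:
        return False
    return any(s.get(STYLE_NAME) == style_name for s in automatic)


root = ET.fromstring(CONTENT)
print(is_direct_format(root, "T1"), is_direct_format(root, "MyStyle"))
```

This avoids the 'T plus a number' heuristic, but it still means an extra lookup for every style reference you encounter.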

Combining style and direct format

While thinking about these issues I suddenly thought of another interesting detail. How can you apply both a direct format and a style to some content if the same attribute name is used to identify both levels of formatting? Given the output of OpenOffice.org Writer I would suspect you can't. End result? Just put a span in a span! The outer span points to the real style being used, the inner span points to the direct formatting style. In Open XML this is a no-brainer since there are different elements for indicating the style and the direct format.

ODF – Direct format and style

<office:document-content>
    <office:body>
        <office:text>
            <text:p>
                <text:span text:style-name="MyStyle">
                    <text:span text:style-name="T1">Text</text:span>
                </text:span>
            </text:p>
        </office:text>
    </office:body>
</office:document-content>

And for Open XML:

Open XML – Direct format and style

<w:document>
    <w:body>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:b />
                    <w:rStyle w:val="MyStyle" />
                </w:rPr>
                <w:t>Text</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>

Amount of text content used

One thing I heard earlier about Open XML's way of doing things is that it is more verbose. Take a look at the following character counts for the Open XML and ODF samples.

(Note that this is a somewhat lame example, real verbosity checks will probably take a bit more than a few small pieces of XML)

Sample           Open XML          ODF
Direct format    100 characters    326 characters
Styles           218 characters    365 characters

In conclusion

Both ODF and Open XML use the same ideas for separating formatting from content, and support similar notions, only Open XML takes a vastly different route. In my humble opinion, Open XML takes an approach which is better suited to development of solutions and makes it easier to work with the markup.

Hope it helps


 Projects

Databinding toolkit for Word 2007
Open XML Activities for Windows Workflow Foundation
Package Explorer
Windows SharePoint Services 3 Workflow Designers
Word 2007 Source View