Skip to main content

Wouter

Go Search
Home
Contact Me
  

 ‭(Hidden)‬ Admin Links

Copying a chart from a spreadsheet to a presentation

Needing to copy a chart from a spreadsheet into a presentation is probably a common scenario for many of us. Luckily the Open XML SDK makes this remarkably easy. Let me show you how.

Note: For this code to work you need the Open XML Format SDK 2.0 – April CTP which you can get on MSDN.( http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0)

There are three steps involved in moving a chart from a spreadsheet into a presentation.

  1. Opening the source worksheet and target slide
  2. Copying the chart ( a separate part in the package)
  3. Creating a shape that references the chart

Let's go ahead and implement this feature. You can get the code for this sample here (http://blogs.code-counsel.net/Wouter/Downloads/CopyChart.zip)

First you will need to create spreadsheet similar to the following sample. I am using a simple table and a bar-chart.

Step 1 – Opening the spreadsheet and the presentation

The first step is to open both the spreadsheet and the presentation. After opening these two files the source and target parts need to be retrieved. We will copy the chart from a worksheet into a slide. This sample code assumes that the first worksheet contains the chart, and that the chart is copied to the first slide in the presentation. After retrieving the source and target parts the chart is copied and then a shape is added to the presentation that shows the chart.

using (SpreadsheetDocument spreadsheet =
    SpreadsheetDocument.Open("Spreadsheet.xlsx"false))
using(PresentationDocument presentation = 
    PresentationDocument.Open("Presentation.pptx"true))
{
    // Gets the from / to items
    WorksheetPart worksheet = spreadsheet.WorkbookPart.WorksheetParts.First();
    SlidePart slide = presentation.PresentationPart.SlideParts.First();

    // Copies the chart from the spreadsheet into the presentation
    string relationshipID = ImportChartIntoPresentation(worksheet, slide);
    // Adds a shape to the slide that shows the chart
    ReferenceChartFromSlide(slide, relationshipID);
}
Step 2 – Copying the chart data

Both SpreadsheetML and PresentationML use the same format for defining charts. The chart format is also fully standardized as part of DrawingML (the same DrawingML that allows you to do cool things such as SmartArt graphics). Reusing a chart from a spreadsheet in a presentation is therefore very easy. You only need to copy the full XML definition of the chart into the presentation. The key difference between SpreadsheetML and PresentationML when it comes to charting is how the chart is embedded into the document. SpreadsheetML uses cell-based anchors, something which you can obviously not use in a presentation. PresentationML uses a special shape to reference the chart.

The following code opens the chart referenced by the worksheet. It then creates a new chart attached to the slide and copies over the XML. Note that the XML definition of the chart, which includes the plotted data, is not changed. This makes it easy for us since we only need to worry about how the slide will display the chart.

static string ImportChartIntoPresentation(WorksheetPart worksheet, SlidePart slide)
{
    // Assume chart is the only drawing on the first worksheet           
    DrawingsPart worksheetDrawings = worksheet.DrawingsPart;
    ChartPart spreadsheetChart = worksheetDrawings.ChartParts.First();

    // Copy the chart over to the first slide of the presentation
    ChartPart presentationChart = slide.AddNewPart<ChartPart>();
    using (Stream stream = spreadsheetChart.GetStream())
    {
        presentationChart.FeedData(stream);
    }
    // Return the relationship ID so the chart shape can 
    // reference the chart
    return slide.GetIdOfPart(presentationChart);
}
Step 3 – Creating a chart shape

The last step is to create a shape in the presentation that points to the newly added chart part. Presentations use the p:graphicFrame shape to hold pointers to special shapes like a chart. The graphic frame is a normal shape with the same unique identifiers as other shapes and with information on size and positioning. The XML structure that you need to create is not very complex. The easiest way to find out what to create is to use an Open XML enabled office suite such as Lotus Symphony (didn't some company deem Open XML too complex to implement?) to create a chart and then look at the XML that is being generated.

The following sample shows how to define a shape on a slide that points to a chart. A few things of interest are the unique identifiers, the positioning of the chart, the fact that the graphic is pointing to a chart and the relationship ID of the chart.

First let's discuss the unique identifiers stored in the p:cNvPr node. Well, they need to be unique... Insightful isn't it? You can use X-Path to ensure the ID and name are unique, or just hope for the best like I am doing in the code sample accompanying this blog post.

The size and position stored in the p:xfrm element are also important if you want people to actually see the chart. Shapes of 0-width are supported. Both the position and size are specified in EMUs (English Metric Units). Think of an EMU as the least common denominator between various measurement systems such as meters, inches and pixels. This ensures you do not need floats for calculating positions. Computers do not like floats, they like integers much better.

The a:graphicData indicates that it contains a reference to a chart by using the uri attribute. You can store a variety of graphics in a graphic frame and this attribute indicates exactly what can be expected. The chart itself is not stored inline but it is referenced using a relationship ID. The XML structure is similar to the following example.

<p:graphicFrame>
    <
p:nvGraphicFramePr>
        <
p:cNvPr id="2" name="Chart 1" />
        <
p:cNvGraphicFramePr />
        <
p:nvPr />
    </
p:nvGraphicFramePr>
    <
p:xfrm>
        <
a:off x="1524000" y="1397000" />
        <
a:ext cx="6096000" cy="4064000" />
    </
p:xfrm>
    <
a:graphic>
        <
a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/chart">
            <
c:chart r:id="rId2" />
        </
a:graphicData>
    </
a:graphic>
</
p:graphicFrame>
The code to generate this markup is very similar to the markup itself. The Open XML SDK provides us with classes corresponding to each XML element. The classes can be composed together by using the constructor. The first few bits are displayed below. As you can see it follows the exact same XML structure. Later versions of the SKD might provide an even higher level of abstraction.

GraphicFrame graphicFrame = new GraphicFrame(
    new NonVisualGraphicFrameProperties(
        new NonVisualDrawingProperties()
        {
            Id = 2, // should calculate this to ensure it is unique
            Name = "Chart 1" // should also be unique
        },
        new NonVisualGraphicFrameDrawingProperties(),
        new AppNonVisualDrawingProperties()),

That's it. Now if you open the presentation you should see something that looks like the following picture. Outside of improving our sizing and positioning code you should be good to go!

Hope it helps! And have fun developing with the Open XML SDK!

Proposing an easy way to improve the ODF spec

Now that I am reading the ODF spec in more detail I have seen a recurring pattern. It is a pattern which hinders readability and could be easily improved. Here's the deal.

Take the ZIP package structure as a sample. In the ODF 1.2 spec (I use draft 6) you find that the ZIP structure is discussed in chapter 2 of part 3. (yes, the ODF spec is moving into separate parts just like Open XML). However, some important bits of information are not there. Each ODF document uses a concept called document roots as a starting point, and spreadsheets, documents and presentations use the same roots. Each root is stored in a separate file in the package. Since these files are common to all documents, and vital for understanding ZIP packages, you would expect the details to be available in the chapter on package structures. Instead this information is in another part of the spec (part 1, chapter 2.2 Document Representation). Now one might argue that the package and the files contained within are two different concepts, and perhaps if the ODF concept of a package is more elaborate (say, like the Open Packaging Convention of Open XML) this might be a valid point. But for the current state of ODF, package means that there are these 4 roots, always in the same location. To me it was hard to find this, since I looked in the packaging information, not anywhere else.

Something similar occurs in another area. This time the issue is with encryption. Encryption is discussed in chapter 2.3 of part 3. However, the fact that preview thumbnails are never encrypted and have a special process is not mentioned. Instead that is mentioned not in the part on encryption, but in the part on thumbnails. Logically I will never see that part on thumbnails when I try to decrypt a document. It is not on my scope and I expect to see all relevant info in the right place.

So, I propose improving the structure of the spec by consolidating the information into more logically correlated blocks.

Help me crack this cryptic nut

As some of you may know I have been trying to decrypt an ODF file encrypted with Open Office using .NET code.

After failing to do this I am looking for people's help in finding the issue with my code. Hope someone spots what I didn't. Find the entire project in the download area. (http://blogs.code-counsel.net/Wouter/Downloads/ODFDecrypt.zip)

The password for the file is 'test1'.

I downloaded the Blowfish implementation using the reference from the author's website (http://www.schneier.com/blowfish-download.html).

The failure is the checksum validation. The decompression is not performed yet since that would be next.

The relevant code is as follows. The omitted code opens an ODF file using SharpZipLib, finds all encrypted entries and performs the following steps:

// Retrieve the necessary values for decryption 
int entrySize = Int32.Parse(entrySizeString);
int iterationCount = Int32.Parse(iterationCountString);
byte[] iv = Convert.FromBase64String(ivStringBase64);
byte[] salt = Convert.FromBase64String(saltStringBase64);

// The Rfc2898DeriveBytes class implements 
// PBKDF2 using HMACSHA1 natively in .NET
Rfc2898DeriveBytes keyGenerator = new Rfc2898DeriveBytes(
    passwordHash, salt, iterationCount);
// Use it to get a 16 byte key
byte[] key = keyGenerator.GetBytes(16);

// Create a BlowfishCFB decryptor and initialize 
// with the IV
BlowfishCFB decryptor = new BlowfishCFB(key, 0, key.Length);
decryptor.SetIV(iv, 0);

// Decrypt the data into a memory buffer
MemoryStream compressedData = new MemoryStream();
ZipEntry zipEntry = file.GetEntry(entryPath);
using (Stream stream = file.GetInputStream(zipEntry))
{
    // perform a buffered approach
    int bufferSize = 10 * BlowfishCFB.BLOCK_SIZE;
    byte[] encryptedReadBuffer = new byte[bufferSize];
    byte[] decryptedWriteBuffer = new byte[bufferSize];
    int bytesRead = 0;
    // Try and read one full buffer of data at a time until 
    // there is no more data to read
    while ((bytesRead = stream.Read(|
        encryptedReadBuffer, 0, bufferSize)) > 0)
    {
        // decrypt the buffer that is open, 
        //up to the number of bytes actually read
        decryptor.Decrypt(
            encryptedReadBuffer,// read from 
            0, // at position
            decryptedWriteBuffer, // write to
            0, // at position
            bytesRead); // how many bytes to process?
        // The write buffer now as the decrypted data
        // write the buffer to the memory stream 
        compressedData.Write(
            decryptedWriteBuffer, 0, bytesRead);
    }
}
// Set the stream to the beginning for 
// calculating checksums
compressedData.Position = 0;

// Find out how many bytes are used in the checksum
// max of 1024
int nrOfBytesForValidation = compressedData.Length > 1024 ? 
    1024 : 
    (int)compressedData.Length;

// Get the bytes for calculating the checksum
byte[] validationBytes = new byte[nrOfBytesForValidation];
compressedData.Read(validationBytes, 0, nrOfBytesForValidation);

// Create first checksum using the SHA1 hash 
// for the validation bytes
byte[] checksum1 = SHA1Managed.Create().ComputeHash(
    validationBytes, 
    0,
    nrOfBytesForValidation);

// Create the second checksum from the 
// base 64 encoded metadata
byte[] checksum2 = Convert.FromBase64String(checksumBase64);

// Check if they are equal
for (int i = 0; i < checksum1.Length; i++)
{
    if (checksum1[i] != checksum2[i])
    {
        throw new Exception();
    }
}

Is ODF designed to be not implementable without source code?

The past week I have been looking into developing ODF solutions using nothing but information in the specification. My first focus has been on encryption of documents, and I have been largely unsuccessful.

Dennis Hamilton has been keenly able to jot my remarks into a more sensible story. The core issue is that ODF just does not specify a lot of vital details, like the character encoding of the password used for encryption, or which files to encrypt and which ones not to.

I have two questions that keep on popping up. I just totally do not understand the following two observations:

How come I am the one to notice how deficient ODF really is?

I am not a security expert and I have just been developing software for 10 years. How come the world-wide team of experts allowed ODF to go through ISO? Did they miss these glaring omissions? How is that even possible? What about the person writing that part of the spec? Didn't he have the clue that it is vital to know the password encoding? I just cannot imagine that it wasn't known, especially since a noob such as me can figure that out. How come ODF 1.2 changes exactly nothing in this part, while it must be known that the text is insufficient?

How come there has already been more text written on my comment than on encryption in the spec itself?

This is another one of those amazing things. There has already been ten times more text on my comment than what ODF writes on encryption. How the hell is that possible?

Now I know I should not be saying this, but I will anyway. Given the fact that the authors of ODF are experts, and given the fact that there is so much vital information missing from the spec, is it too much to say that the under-specification perpetrated by ODF is by design or at least intentional?


Say you want to standardize the way in which documents are encrypted, not any document, all! the documents in the world. You have a lot of discussions on it, what to do, what not to do, that sort of thing.

You then condense these discussions into 5 steps, totaling exactly 200 words (including numbering, and the header).

WTF?????


I know of at least one company that is not allowed to look at GPL source code that you need to read in order to implement ODF properly. Guess who?

These are just a few simple questions. Perhaps the ODF chair member Rob Weir would like to spend some of his time writing about these things instead of on perceived deficiencies in Open XML. Being the chair of ODF I would expect him to be highly passionate and equally worried about these things as I am. I know for sure that I cannot implement ODF using the information I am provided with.

Implementing ODF: Encryption

Recently Malte Timmerman of Sun blogged about working with ODF document encryption. With my experience building Open XML solutions using Microsoft .NET, I thought it would be a nice test to see if I could write some code to decrypt documents created by Open Office, purely using information in the ODF specification or any referenced materials.

During my hour-long research of the ODF spec I kept a record of my experiences. Often I don't do this and then forget the exact order of things…really annoying. This is the write-up of that hour. This post reflects my genuine experience with finding out about ODF encryption. I have not seen this part of ODF before, or the specification of the package structure.

Learning about ODF encryption

I started out from the blog post of Mr Timmerman, and downloaded the ODF specification from his hyperlink. I could start reading at chapter 17, but as anyone would do, I skipped right to 17.3 mentioned by Malte. ODF encryption it is.

The first thing that struck me is that the entire text on ODF encryption was about ½ an A4 in size. Not what I was expecting given the complexity and importance of this topic.

The first sentence was also not what I expected. It says:

The encryption process takes place in the following multiple stages:

BAM. There you have it. These are the steps, now shush. (FYI, stages are always multiple, that why it's stages)

This is seriously the first time I have seen this little information in a specification of this importance. Dutch government for instance mandates that software should be ODF enabled, but I think there was probably little research on what that means and how that will pan out when you start building real software. This is what I jotted down at that point:

  • What can be encrypted
  • Goal
  • How
    • Who file
    • Per zip entry
    • Per zip folder
    • Per logical part
  • Process

I can't place the 'who file' remark, so let's scrap that one (hey, it was late. I had a DevDays session to present, stuff happens). However, the other things are quite relevant. What can be encrypted? Binaries, XML, everything? How? Per zip entry, folder, or logical structures? What is the general process? Why even encrypt? Can't we encrypt the entire file?

As you can imagine, here I am, lost after the first sentence.

Now don't take this the wrong way, but can't someone at least write some words of explanation before showing the steps on how to do it? I am a trainer, and this is not how you train. And IMHO, a spec is used to train people. People bad-mouth Open XML for lots of things, but if there is anything it has on ODF it is documentation. Size does matter.

So, steps it is.

Going through the steps

1: A 20-byte SHA1 digest of the user entered password is created and passed to the package component.

Ok, I know SHA1. There is support available in .NET, no worries. But what is a package component, and who passes who what? If it is my application, isn't it already having the password? So what's up with the 'passing'? Let's do a search for 'package component', nope, first mention. It'll probably just be 'application'.

Then something more critical struck me. In what encoding must the password be placed? Not sure, and definitely not sure if it is mentioned somewhere in the ODF specification. And even if it is mentioned, the password is not exactly part of the document, so does the ODF specification even apply for that?

2.The package component initializes a random number generator with the current time.

Ok. What's up? There are many pseudo random number generators. Do I need to use a cryptographically strong generator, or just do a rand()? I am no expert, please enlighten me.

3.The random number generator is used to generate a random 8-byte initialization vector and 16-byte salt for each file.

For each file? When did files come in to play? Ok. So first there is no explanation, just steps. This is probably one of those in-line explanations. Does this mean I always have to encrypt each file in the ZIP, or just each ODF file I want to encrypt?

4.This salt is used together with the 20-byte SHA1 digest of the password to derive a unique 128-bit key for each file. The algorithm used to derive the key is PBKDF2 using HMAC-SHA-1 (see [RFC2898]) with an iteration count of 1024.

WOW. Hold ON!

The salt is "used together" with the 20-byte SHA1 digest???? WTF??? How does that work? What is "used together" for a term to use in a specification? I could guess now that we do PASS+SALT, but what if it is SALT+PASS (or SSTAPALS J lol). Safe to say that at that point in time I was not feeling that confident with getting my implementation right without access to Open Office. Not happy.

5.The derived key is used together with the initialization vector to encrypt the file using the Blowfish algorithm in cipher-feedback (CFB) mode.

Blowfish? Isn't the inventor on-record saying that he is amazed that Blowfish is still used? Not expecting too much support for it in .NET. That'll be a library to build or license. A comment on the blog of Malte indicates that it is also an issue for people working in government.

Ok. I have lots of questions on the steps. If I don't find this information I'd better debug my code J Downside is that I'll now have to test not only my code, but also how it works against all the office apps that I want to interoperate with. More work for me. Not happy again.

Compression of encrypted files

The encryption section ends with a small blurb on compression.

Each file that is encrypted is compressed before being encrypted.

What, what ,what? Even binaries? Using what algorithm? Earlier I felt that all files should be encrypted, while this sentence makes me feel only some files should. Which is it? There are at least a few options here (take into account that I am not a zipping expert)

  • ZIP allows files to be stored compressed or uncompressed individually.
  • I need to manually ZIP each entry in the (parent) ZIP, using the .NET ZipStream or similar.
  • They actually mean to ZIP the entire package, not files inside the package.

Is it just me, or do we all feel the point of a ZIP container is to ZIP stuff in it? Why do I need to that again? This will be some more integration tests.

Since it was the first mention of compression that I found, perhaps the next section will hold some information on this topic. So I go scan section 17.4, only to find the second paragraph mention 'magic number' mechanisms such as Unix file/magic utility.

And here I am, yet a little more lost than before. Now I also think ODF might be a little depending on one platform.

To allow the contents of the package file to be verified, it is necessary that encrypted files are flagged as 'STORED' rather than 'DEFLATED'.

Ok, probably something that is defined by the ZIP specification. No worries.

As entries which are 'STORED' must have their size equal to the compressed size, it is necessary to store the uncompressed size in the manifest.

I love sentences starting with 'as'. I understand only a little bit of this sentence. It first says 'because zipped entries have a size equal to their size (yeah, duh), you need to store the uncompressed size too. Not sure why I should store the original size separately. Normally when I send a ZIP file to someone, I don't send over an extra text file with that information either. Why is this so necessary and why don't you please tell me about it, and not make me feel more stupid than I am.

Here's one thing I love about the Open XML specification. There are many anchors in the PDF and 'see… blah' links. You can easily navigate using clicks and alt+ß.

Conclusion

So after my initial research of ODF encryption I found it:

  • Grossly underspecified
  • Incomplete on important details
  • No mention of decryption
  • Using algorithms which even the author does not want you to use, without platform support in at least .NET, Java perhaps too.
  • Hard to navigate since the specification does not use anchors / references.

As you can imagine ODF does not hold up to scrutiny that well. How can this important topic be so under-documented?

I will continue to build the decryption application and let you know about my progress.

The story of Lotus Symphony

YouTube has a video on the story of Lotus Symphony.

Around 2:15 into the video General Manager Bob Picciano indicates that it is not lost on IBM that the ballgame is called 'world class collaborative innovation'.

It is not lost on me that the slide that he shows during his talk uses graphics taken from Microsoft (link works only when using the US Office Online)

Now that is innovative!

Dennis Hamilton, also known as Orchmid later told me the graphic was freely available somewhere. Consider me corrected on this.

 

WSS 3 Workflow Tools @ DevDays 09

Come visit my workflow session @ DevDays 2009 tomorrow. 95% coding, 5% lame jokes.

During the session you will learn how easy it is to use InfoPath for developing workflow forms in SharePoint. Then I will show that using ASP.NET can be just as easy, and how this approach has a few special benefits for SharePoint developers.

I can't wait! Can you?

SharePoint Preconference @ DevDays The Hague 2009

In two days time you can take part in the deep-dive into SharePoint development at the DevDays in The Hague. It's an awesome opportunity to learn about new and cool ways to build SharePoint products using WCF, Workflow, Silverlight and MOSS Publishing. If you have time, I'd recommend attending this low-cost, high-power event. There is a ton of content to discuss. Personally I'd like to focus on the following topics:

SharePoint Overview

Receive a little background on SharePoint and the development tools. Learn about community tools such as VSeWSS, WspBuilder, custom MSBuild configuration and other means to get your WSP ready for prime-time.

Exposing SharePoint data using Web Services and WCF

Learn how to expose SharePoint data to the outside world using Web Services and WCF endpoints.

Automating stuff using Workflow Foundation

Nobody likes repetitive chores. SharePoint integrates workflow foundation at its core and allows you to automate various tasks. Learn how to build custom workflow activities which properly use the workflow data caching feature of SharePoint. Learn how to use InfoPath and ASP.NET as your form technology.

Making stuff look better with Silverlight integration

The big pain of any web application is that it is a web application, and the HTML that it produces is difficult to style and make dynamic. Learn how to integrate Silverlight into your page design, and communicate with back-end data using WCF.

Publishing with MOSS

Combine the things you learned into SharePoint public facing websites which conform to standards.

 

Hope to see you there!

Code Counsel goes live

With pride and joy I announce my homepage for Code Counsel is now live.

My homepage is built with the MOSS Publishing infrastructure. It is compliant, optimized for search engines and has a bunch of tech goodies I'll talk about later. My next step is to take my code and push it out to CodePlex as a free learning tool. In the mean time, visit my home on the web to learn about other things I do besides blogging and twittering.

Validity to schema is not an ODF conformance requirement

A comment by Rob Weir:

To the point: can you point me to where it is stated that validity to the manifest schema is an ODF conformance requirement? No, I don't think you can, because such a requirement does not exist.

Wow. Now compared to Rob I am a noobie in the field of document formats, but I would expect that if you create a normative schema, you'd want people to normalize based on that schema. You know, that's why you created that schema stuff in the first place. Apparently, for ODF it does not really matter. Rob must be right on all matters ODF. We can all just conform without it.

Hey I can do ODF too:

<wouter:odf>I am an interoperable document</wouter:odf>

Cool stuff those open-ended document formats. Who can I invite to look at my parsing code, you know, for interoperability. Got a schema somewhere, I think.

Of course Rob will now say 'it is not a conformance requirement', but what is the difference. Can you be un-conformant to the spec but be valid? Why did you create a spec then? Currently it seems that for ODF we can be un-conformant in a valid way. Funky Funky

1 - 10 Next

 Projects

Databinding toolkit for Word 2007Use SHIFT+ENTER to open the menu (new window).
Open XML Activities for Windows Workflow FoundationUse SHIFT+ENTER to open the menu (new window).
Package ExplorerUse SHIFT+ENTER to open the menu (new window).
Windows SharePoint Services 3 Workflow DesignersUse SHIFT+ENTER to open the menu (new window).
Word 2007 Source ViewUse SHIFT+ENTER to open the menu (new window).