Wouter

10 steps to building XHTML compliant and performing MOSS publishing sites

In this somewhat lengthy blog post I will provide guidelines for building XHTML compliant master pages for MOSS publishing sites in 10 easy steps. There are already some interesting considerations by Andrew Connell on MSDN that you might also want to read up on.

Why would you want to do this? To me it is all about consistent rendering across modern browsers. No differences in the box model, that sort of stuff. And wouldn't you say that the W3C logo is the sexiest you have ever seen?

When it comes to making MOSS XHTML compliant there is a lot that needs to be done. There are controls which don't render well, script registrations to fix and a few more issues.

First you should probably consider which scenarios you want to support. For intranet publishing or team sites, compliancy is usually not that interesting: you simply dictate which browser employees use, so who cares whether the markup is compliant as long as the page looks decent in that browser. For publicly accessible publishing sites it is a different ballgame. You do not know which browser will be used, and your homepage is your calling card, not a place where work gets done. One-pixel differences in the page display can really kill your visual design. XHTML is no perfect solution, but it is a lot better than sticking to HTML 4.01, where you know rendering differences will show up. Considering only the public, read-only view of a SharePoint site makes life much easier, because you will not have to re-implement things like the Site Actions menu. The solution you will read about here targets anonymous users only, with no editing capabilities at all.

When it comes to reaching compliancy for anonymous users there is one common tendency. People just love post-processing the generated HTML to scrub out whatever breaks validation. I see this as a perfectly workable solution, just not a high-performance one. I started out wondering whether you can get to the same point by just replacing a few pages and controls. Since there is no post-processing, performance does not suffer, and you can feel like the awesome developer you are.

So our goals:

  • XHTML compliancy for anonymous read access
  • No post processing or scrubbing of HTML
  • Some extra smarts like delay loading scripts when possible, meta-tags etc…
  • Support for variations in MOSS publishing

Let's do it! In 10 simple steps even!

There are a few areas you need to invest time in to get compliant. Here's the list of to-dos. Not all steps are purely about XHTML; a few are about adding the right metadata, scripts, that sort of stuff.

  1. Ensure the right <!DOCTYPE> declaration is used
  2. Fix the <html> tag
  3. Lazy loading / not loading core.js
  4. Rendering script links
  5. Rendering publishing pages
  6. Adding meta tags
  7. Overriding the presence feature
  8. Container controls for editing capabilities
  9. Rich Text Editor
  10. Done!

1. The <!DOCTYPE> declaration

This is of course a no-brainer. XHTML compliant websites use specific doc types, and you should replace the default SharePoint doc type with one indicating that the page is XHTML. I chose XHTML 1.0 Strict, using the following doc type declaration.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Wouldn't it be nice if that was all?

2. The <html> tag

The next thing in your HTML is the <html> tag. If you take a look at a basic <html> tag here's what you will find:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" dir="ltr">

What are you seeing here? First there is the xmlns attribute providing the XML namespace. Then there are two language attributes (lang and xml:lang) and finally the page layout direction. To get this tag you have two options. You can hard-code it into your master page, but then you are stuck with those language values. Wouldn't it be nice if these attributes actually represented the language of the page, based on the language pack of the site or the MOSS publishing variation? I thought so too. So the second option, and the one I use in my master page, is to generate the tag with a special control called the HtmlControl:

<CC:HtmlControl runat="server" dir="<%$Resources:wss, multipages_direction_dir_value %>">

This control is rather small, like many of the components you will find in this article. It is a normal ASP.NET control which reads the page language and then outputs the right tags for the lang and xml:lang attributes. The actual page language is retrieved using a helper class called PageLanguage, which will later be used in other controls too.

[ConstructorNeedsTag(false)]
public class HtmlControl : HtmlGenericControl
{
  public HtmlControl() : base("html")
  {}

  protected override void RenderAttributes(HtmlTextWriter writer)
  {
    writer.WriteAttribute("xmlns", "http://www.w3.org/1999/xhtml");
    string language = PageLanguage.GetPageLanguage();
    writer.WriteAttribute("lang", language);
    writer.WriteAttribute("xml:lang", language);
    base.RenderAttributes(writer);
  }
}

Inside the PageLanguage class I use variations combined with language packs to determine the real page language. The gist of that code is just a few lines long:

CultureInfo cultureInfo = null;

if (DetermineVariationBasedLanguage(out cultureInfo) == false)
{
  cultureInfo = new CultureInfo((int)SPContext.Current.Web.Language);
}
language = cultureInfo.TwoLetterISOLanguageName;
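
To make the fallback order concrete, here is a minimal sketch of that resolution step with the SharePoint parts stripped out. The class name PageLanguageResolver and its Resolve method are hypothetical, and the sketch assumes the variation (if any) has already been reduced to a culture name such as "nl-NL"; how DetermineVariationBasedLanguage actually finds that value is not shown in this post.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of SharePoint
// and not the original PageLanguage class.
public static class PageLanguageResolver
{
    // variationCulture: culture name taken from the variation, or null/empty
    //                   when the page is not part of a variation (assumption).
    // webLcid:          LCID of the site's language pack,
    //                   e.g. (int)SPContext.Current.Web.Language.
    public static string Resolve(string variationCulture, int webLcid)
    {
        CultureInfo culture = string.IsNullOrEmpty(variationCulture)
            ? new CultureInfo(webLcid)            // fall back to the language pack
            : new CultureInfo(variationCulture);  // the variation wins when present

        // "en", "nl", ... which is the value used for lang and xml:lang.
        return culture.TwoLetterISOLanguageName;
    }
}
```

Calling Resolve("nl-NL", 1033) picks the variation's language, while Resolve(null, 1033) falls back to the language pack of the site.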

There you go. A fully functional <html> tag. Nice!

3. Meta tags

If you go further down the master page you will first end up in the <head> of the HTML page. There are a few things that need to be done here in order to get compliant and a few more to be smart with scripts. First let's look at <meta> tags. My master page adds four <meta> tags. My page layouts add a few more, but these are used for search engine optimization. Let's focus on those in the master page.

The first <meta> tag can be hardcoded. It is the Content-Type meta tag.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

The second meta tag is the Expires meta tag. MOSS publishing pages already support the concept of being expired, but this information is not pushed down into the HTML stream. The following control reads that expiration date and puts it in the head of the page where it belongs.

<CC:ExpiresMetaTag runat="server" />

The code for this control is about as complex as the HtmlControl: just open the publishing page and read the EndDate field. There is just one funky thing to worry about, and that is the null or 'not-set' date. MOSS hardcodes this value to a specific date. This somewhat sucks, since you need to copy the 'not-set' value and check for it. If MOSS changes its internals, this code dies painfully.

// The 'not-set' end date copied from MOSS: 1 January 2050 (0x802 == 2050).
DateTime NeverEndDate = new DateTime(
  0x802, 1, 1, 0, 0, 0, DateTimeKind.Utc);

if (SPContext.Current.ListItem != null &&
  PublishingPage.IsPublishingPage(SPContext.Current.ListItem))
{
  PublishingPage page = PublishingPage.GetPublishingPage(
    SPContext.Current.ListItem);
  if (page.EndDate != NeverEndDate)
  {
    writer.AddAttribute(HtmlTextWriterAttribute.Name, "expires");
    // Pass the invariant culture so day and month names are always English.
    writer.AddAttribute(HtmlTextWriterAttribute.Content,
      page.EndDate.ToString(
        DateTimeFormatInfo.InvariantInfo.RFC1123Pattern,
        CultureInfo.InvariantCulture));
    writer.RenderBeginTag(HtmlTextWriterTag.Meta);
    writer.RenderEndTag();
  }
}
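
The date handling is the only part of this control that is easy to get subtly wrong, so here it is in isolation as a small sketch. ExpiresFormatter and TryFormat are hypothetical names, and the sketch assumes the end date is already expressed in UTC. It uses the standard "R" format specifier together with the invariant culture, so that the day and month names in the RFC 1123 string stay English whatever the locale of the site is.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; the names are not part of MOSS.
public static class ExpiresFormatter
{
    // The 'not-set' value used in the control above: 1 January 2050 (0x802 == 2050).
    public static readonly DateTime NeverEndDate =
        new DateTime(2050, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Returns false when the page never expires; otherwise sets value to an
    // RFC 1123 date string. Assumes endDateUtc is already in UTC, because the
    // "R" specifier formats the value as-is and does not convert it.
    public static bool TryFormat(DateTime endDateUtc, out string value)
    {
        if (endDateUtc == NeverEndDate)
        {
            value = string.Empty;
            return false;
        }

        value = endDateUtc.ToString("R", CultureInfo.InvariantCulture);
        return true;
    }
}
```

For an end date of midnight on 1 January 2010 (UTC) this yields "Fri, 01 Jan 2010 00:00:00 GMT", and for the 'not-set' date it reports that no tag should be written.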

More tag soup to follow. Next is the Robots <meta> tag. There is already a Robots meta tag included in SharePoint. Unfortunately it generates markup which uses CAPITALS for the tag names which is not allowed by the smart people of the W3C. This is just my cheap imitation of the built-in robots meta control.

<CC:RobotMetaTag runat="server" />

And the code for this control would be similar to the following.

writer.AddAttribute(HtmlTextWriterAttribute.Name, "robots");
bool indexed = SPContext.Current.Web.ASPXPageIndexed;
if (indexed) 
{
  writer.AddAttribute(HtmlTextWriterAttribute.Content,
    "index");
}
else
{
  writer.AddAttribute(HtmlTextWriterAttribute.Content,
    "noindex, nohtmlindex");
}

That's <meta> tag number three. The last one is the Language tag. Hey, we need one of those too! A nice detail is that it can plug into the HtmlControl language infrastructure.

<CC:ContentLanguageMetaTag runat="server" />

The full implementation of this tiny tag is as follows:

public class ContentLanguageMetaTag : WebControl
{
  protected override HtmlTextWriterTag TagKey
  {
    get { return HtmlTextWriterTag.Meta; }
  }

  protected override void AddAttributesToRender(
    HtmlTextWriter writer)
  {
    writer.AddAttribute(HtmlTextWriterAttribute.Name,
      "content-language");
    writer.AddAttribute(HtmlTextWriterAttribute.Content,
      PageLanguage.GetPageLanguage());
  }
}

Life can be easy indeed!

That's it for the meta information. I add a few extra like Keywords and Description used for search engine optimization, but that's another story.

4. Scripts

Still inside the <head> tag you will find the script registrations. I don't know about you, but for publishing I like to keep the payload as small as possible, which makes registering core.js a no-go if at all possible. Luckily Microsoft has a semi-supported "let's not load core.js" approach that we can use to prevent the script from being loaded. The downside is that this only works for anonymous access; authenticated users still need the script for the editing menus to work. The approach is documented on MSDN.

First you indicate to SharePoint that no scripts must be loaded by adding a ScriptLink control without a name attribute to the page (yes, you suppress scripts by adding a script control).

<SharePoint:ScriptLink runat="server" />

Next you add core.js only when the user is authenticated. You can build another one of these tiny controls for this.

<CC:CoreScriptLink runat="server" />

The implementation is horribly simple.

if (Context.Request.IsAuthenticated)
{
  ScriptLink.RegisterCore(Page, true);
}

Whew! We've finally resolved our script issues! There will be no core.js being pushed to anonymous users at all. Of course you should not be making use of any of the features provided in this script library, but you'll probably not need to anyway for your public facing website.

5. Presence

What's next? Since we're talking about scripts anyway, let's discuss the presence feature of MOSS. Presence is that menu that expands on people's names, allowing you to IM and email that person, amongst other things. It is implemented as a client-side ActiveX control. Many public publishing sites forget about this feature and leave it enabled, which tends to make an ugly information bar pop up in Internet Explorer.

How's that for being unprofessional? There are a few ways to work around this. To disable the feature you need to change the implementation of a core script function. Luckily JavaScript lets you replace a function by defining another one with the same name that loads later, so load precedence decides which one wins.

<CC:PresenceOverrideScriptLink runat="server" />

This takes care of it all. It is another one of those small controls. This one just registers a script for anonymous users that overrides SharePoint's default presence behavior.

if (Context.Request.IsAuthenticated == false)
{
  writer.AddAttribute(HtmlTextWriterAttribute.Type,
    "text/javascript");
  Uri scriptUri = new Uri(new Uri(SPContext.Current.Web.Url),
    "_layouts/CodeCounsel/axo.js");
  writer.AddAttribute(HtmlTextWriterAttribute.Src, scriptUri.ToString());
  writer.RenderBeginTag(HtmlTextWriterTag.Script);
  writer.RenderEndTag(); 
}
 

You have now created all the content that goes into the <head> of the page. Of course the <body> needs fixing too! And since this page is not static, more content will be added to the <head> at runtime that we will need to fix. Sigh. Let's first talk about the <body> tag difficulties.

6. Containerized controls

Containerized, is that even a word? The spell check seems to think so. Good enough for me.

What I mean to say here is that you need to put various core controls such as the Site Actions menu into a container which prevents the control from executing at all in certain scenarios. The reason being that these controls register non-compliant script markup somewhere during the Load phase. Script that you cannot fix without post-processing the entire HTML (you cannot easily break the page lifecycle of a single control to stop it from registering scripts).

For the Site Actions menu and the Web Part Manager I have two wrapper controls. These wrappers only add the Site Actions menu or the Web Part Manager when the user is actually authenticated. If the control is never added, it never registers nonconforming script, and we're happy again. My Site Actions menu looks like this:

<CC:SiteActionMenuContainer runat="server" />

The control is again supremely easy to build:

if (Context.Request.IsAuthenticated)
{
  Controls.Add( Page.LoadControl(
    "~/_controltemplates/PublishingActionMenu.ascx"));
}

It's the same for the Web Part Manager.

Almost there!

7. Adaptive rendering

ASP.NET supports the notion of control adapters that allow you to modify the way controls are executed on the page. You can also change the generated markup. You can imagine how easy it is to use this feature to change the breaking HTML.

Two control adapters are needed to create a compliant page. Remember the <SharePoint:ScriptLink> control in the page <head>? It is the largest source of headaches. It generates tons of non-compliant markup, like <script> tags with a LANGUAGE attribute. Not cool! The nice thing is that ScriptLink is a control, and thus we can use a control adapter to modify the ugly markup it generates. You cannot replace or remove ScriptLink, since that would break many features under the covers.

To move the scripts into compliancy you need various string replacements. Note that this is the same as post-processing or scrubbing the entire page, but on a much smaller scale.

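public override void Write(string s)
{
  // Lowercase the tags and drop the LANGUAGE attribute.
  s = s.Replace("<SCRIPT LANGUAGE='JavaScript'", "<script");
  s = s.Replace("</SCRIPT>", "</script>");
  // Strip any existing type attribute, then add exactly one to every <script>.
  s = s.Replace("type=\"text/javascript\"", "");
  s = s.Replace("type=\"text/JavaScript\"", "");
  s = s.Replace("<script", "<script type=\"text/javascript\"");
  s = s.Replace("language=\"javascript\"", "");
  s = s.Replace("language=\"JavaScript\"", "");
  // Expand the minimized defer attribute. This assumes "defer" only ever
  // appears as a bare attribute in the markup ScriptLink writes.
  s = s.Replace("defer", "defer=\"defer\"");
  InnerWriter.Write(s);
}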
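
The plain string replacements above depend on the exact casing and quoting that ScriptLink happens to emit. As an alternative sketch (not the code of the adapter described in this post), the same clean-up can be written with regular expressions that only touch the attributes of opening <script> tags. The class name ScriptMarkupScrubber is hypothetical, and the sketch assumes each opening tag arrives complete in a single string and that no attribute value contains a ">" character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of SharePoint or ASP.NET.
public static class ScriptMarkupScrubber
{
    // A complete opening <script ...> tag in any letter case; group 1 holds the attributes.
    private static readonly Regex OpenTag =
        new Regex(@"<script\b([^>]*)>", RegexOptions.IgnoreCase);
    private static readonly Regex CloseTag =
        new Regex(@"</script\s*>", RegexOptions.IgnoreCase);
    // language=... and type=... attributes, quoted or unquoted.
    private static readonly Regex LanguageAttr =
        new Regex(@"\s+language\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)", RegexOptions.IgnoreCase);
    private static readonly Regex TypeAttr =
        new Regex(@"\s+type\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)", RegexOptions.IgnoreCase);
    // A minimized defer attribute, i.e. one that is not followed by "=".
    private static readonly Regex BareDefer =
        new Regex(@"\s+defer\b(?!\s*=)", RegexOptions.IgnoreCase);

    public static string Scrub(string html)
    {
        string result = OpenTag.Replace(html, m =>
        {
            string attrs = m.Groups[1].Value;
            attrs = LanguageAttr.Replace(attrs, "");  // LANGUAGE is not valid in XHTML Strict
            attrs = TypeAttr.Replace(attrs, "");      // drop it, a single one is re-added below
            attrs = BareDefer.Replace(attrs, " defer=\"defer\"");
            return "<script type=\"text/javascript\"" + attrs + ">";
        });

        // Closing tags must be lowercase as well.
        return CloseTag.Replace(result, "</script>");
    }
}
```

Inside the writer used by the adapter, the override would then shrink to a single call such as InnerWriter.Write(ScriptMarkupScrubber.Scrub(s)). Because the defer pattern skips attributes that already have a value, running the scrubber twice over the same markup does not double up the attribute.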

The really funky control adapter is needed because of the WebPartPage base class used on all publishing pages. It generates exactly one line of non-compliant markup, namely a <script> tag that is missing the type="text/javascript" attribute. Something like the following:

<script> var MSOWebPartPageFormName = 'aspnetForm'</script>

The real pain is that this script is generated in the Load event of the WebPartPage base class. Hard to get to indeed! After some digging with Reflector you'll find that a control adapter for the PublishingPage class allows you to replace this script. However, this is totally funky: perhaps supported, perhaps not (it doesn't break anything, though).

protected override void OnInit(EventArgs e)
{
  if (Page.Request.IsAuthenticated == false &&
    Page.Form != null)
  {
    SPWebPartManager webPartManager =
      (SPWebPartManager)SPWebPartManager.GetCurrentWebPartManager(Page);
    if (webPartManager == null ||
      webPartManager.GetDisplayMode().AllowPageDesign == false)
    {
      this.Page.ClientScript.RegisterClientScriptBlock(
        typeof(WebPartPage),
        "FormNameVariable",
        @"<script type=""text/javascript""> var MSOWebPartPageFormName = '" +
          Page.Form.Name + "';</script>");
      ScriptLink.Register(Page, "init.js", true);
      Page.ClientScript.RegisterClientScriptBlock(
        typeof(Page), "MenuClientFilesKey", "", true);
      HttpContext.Current.Items["BrowserScriptLink"] = new object();
      return;
    }
  }
  base.OnInit(e);
}

All of this code normally executes in the WebPartPage base class. Notice the return statement? It breaks the page lifecycle: OnInit is never called on the WebPartPage class, and thus the breaking script is prevented.

8. Rich Text Editor

The last step is more of a user-land thing. The default in-browser rich text editor generates non-compliant markup, so you need to replace that editor in its entirety. My first choice was the Accessible Rich Text Editor from HiSoftware, which was built for Microsoft. However, I noticed that even with this compliant editor, the final markup on the page is still invalid. The issue seems to be that SharePoint scrubs the compliant markup coming from the editor before it is stored in the field, making the HTML invalid again. Double sigh.

To work around this last issue I chose to build a custom field type and use my own custom editor (actually, I integrated FCKeditor).

And what about steps 9 and 10? I lied, it's only eight.

  

Comments

Andre

Hi Wouter,
 
This is a very interesting article, and I was wondering if it is possible for you to create a zip containing all the controls you are mentioning? That would make a nice package for the community to use!
 
Cheers,
 
Andre
at 5/6/2009 6:18 PM

Steve

Thanks for the article - it's always nice to see an alternative approach.  I have to say that missing out the whole "making Publishing Page content compliant" is a bit of a side-step of the other issues that come up with many (if not most) of the Publishing web controls not producing XHTML compliant mark up.  Saying that, what you've got is a good starting point.
at 9/3/2009 12:25 PM
