XSLT Performance tip: don’t indent output

Summary: turn off xsl output indenting

When transforming XML via XSLT, make sure the output setting for indenting is turned off and honoured by your code.

Turning it off will

  • Speed up the transform time
  • Reduce the output size

How is XSLT output indent turned off?

For the xml output method, indenting is off by default (note that for the html output method the XSLT 1.0 default is indent="yes"). If for whatever reason it is on, you can turn it off simply by using this:

<xsl:output indent="no" />

I have seen a lot of .NET code that ignores this important setting when running XSLT. Sometimes this is because the output is written to a stream, or to something else, that doesn't take its output settings from the XSLT.

When transforming to an XmlWriter in .NET, be sure to create the XmlWriter using the overload that takes the output settings from the loaded XSLT, e.g.:

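using System;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// GetXslt, GetXsltArgs, XsltPath and _someXmlData are application-specific.
XslCompiledTransform xslt = GetXslt(XsltPath);
StringBuilder sb = new StringBuilder();

// The 2nd argument makes the writer honour the XSLT's output settings!
using (XmlWriter w = XmlWriter.Create(sb, xslt.OutputSettings))
{
    XsltArgumentList args = GetXsltArgs();

    XPathNavigator navigator = _someXmlData.CreateNavigator();

    if (navigator == null)
        throw new InvalidOperationException("Could not create a navigator over the input XML.");

    xslt.Transform(navigator, args, w);
}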

(xsl:output has many other useful attributes worth looking into.)
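The same principle applies outside .NET. As a rough sketch of the equivalent on the Java platform (assuming the JDK's built-in JAXP XSLT processor; the stylesheet and the order/invoice element names are invented for illustration), writing to a StreamResult lets the Transformer serialize the output itself, so the xsl:output settings are applied, and OutputKeys.INDENT can force indenting off from code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NoIndentTransform {

    // Illustrative stylesheet: wraps the <item> elements of an <order> in an <invoice>.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' indent='no' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/order'>"
        + "<invoice><xsl:copy-of select='item'/></invoice>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        // Optional: force indenting off from code, overriding xsl:output if it ever changes.
        t.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        // StreamResult lets the Transformer do the serializing, so xsl:output is honoured.
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<order><item id='1'/><item id='2'/></order>"));
    }
}
```

The explicit setOutputProperty call is redundant while the stylesheet says indent="no"; it is there only as a safeguard if the stylesheet is edited later.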

What kind of savings do you get?

The savings will differ for many reasons (the XML document size, the structure of the XML, the types of transformation being done, etc.), so it is hard to give a definitive figure. But here’s an illustrative example:

A few weeks back, a colleague at work was having trouble with an XSLT that was taking a long time to transform.

We started to have a look; I was expecting to find inefficient XPath, or an XML document structure that wasn’t conducive to decent transformation speed, or something like that.

However, the first thing I noticed was that his output was set to be indented, and the XML input was HUGE.

So, the first thing we did was turn it off.

That alone reduced a 12 minute transform (on a 300MB document) to just 1 minute!

In another scenario, a 70KB XML document was taking about 0.25 seconds to transform. Turning off indenting shaved off a few more milliseconds (I can’t remember the exact amount now) and saved about 2KB in the HTML output it generated.
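Those figures come from specific documents and won’t carry over directly. For a quick feel for the size difference on your own platform, a sketch like the following (Java/JAXP, using a synthetic document; the products/product structure is invented) serializes the same input with indenting off and on and compares the output lengths. It deliberately doesn’t time anything, since a single un-warmed run on the JVM would give misleading numbers:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentSizeComparison {

    // Builds a compact synthetic document with the given number of product records.
    static String buildXml(int records) {
        StringBuilder sb = new StringBuilder("<products>");
        for (int i = 0; i < records; i++) {
            sb.append("<product id='").append(i).append("'><name>Item ")
              .append(i).append("</name></product>");
        }
        return sb.append("</products>").toString();
    }

    // Identity transform with indenting switched on or off; returns the serialized text.
    static String serialize(String xml, boolean indent) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = buildXml(10_000);
        int plain = serialize(xml, false).length();
        int indented = serialize(xml, true).length();
        System.out.printf("indent=no: %d chars, indent=yes: %d chars (%d extra)%n",
                plain, indented, indented - plain);
    }
}
```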

Why can this make such a difference?

I think the specifics may vary depending on the XSLT processor you are using, but I believe it basically comes down to this:

  • Each newline, space or tab created for the indent requires an extra text node to hold those characters, which uses extra memory (though on its own this may not add much time to the transformation).
  • When the result is then saved to a file or written to an output stream, strings are often involved. String processing can be expensive in many programming languages, so each of these indentation text nodes needs handling. For very large documents (as above) this can mean a lot of unnecessary processing.
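One way to see those extra nodes, assuming the result is loaded back into a DOM the way a downstream consumer might, is to count the whitespace-only text nodes in a compact and an indented version of the same (made-up) document. This Java sketch uses the JDK’s built-in DOM parser:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class WhitespaceNodeCounter {

    // Recursively counts text nodes that contain nothing but white space.
    static int countWhitespaceNodes(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty()) {
            count++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countWhitespaceNodes(child);
        }
        return count;
    }

    // Parses the XML (no DTD, so whitespace text nodes are kept) and counts them.
    static int countIn(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return countWhitespaceNodes(doc);
    }

    public static void main(String[] args) throws Exception {
        String compact = "<a><b>1</b><b>2</b></a>";
        String indented = "<a>\n  <b>1</b>\n  <b>2</b>\n</a>";
        System.out.println(countIn(compact));   // 0
        System.out.println(countIn(indented));  // 3
    }
}
```

Each extra node is small, but a 300MB document with one indent node per element adds up to a very large number of them.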

But doesn’t this make the output harder to read?

The consumer of a transformation result is likely to be another process, such as another XSLT in a pipeline, another program, or even a web browser.

None of these typically cares about the extra white space, which would in fact require more processing when they load it.

If you need to view the XML, it may be worth keeping the indent off and opening the output in a text editor that can “pretty print” it for you. (Warning: some editors and IDEs, e.g. Visual Studio, may pretty print an XML document automatically when you open it, making you think the XML itself had been indented!)
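If your editor doesn’t pretty print for you, a small throwaway tool can produce an indented copy purely for viewing, leaving the production output compact. A minimal Java sketch (the PrettyPrint class name and its command-line usage are illustrative) using an identity transform with indenting turned on:

```java
import java.io.File;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringWriter;

public class PrettyPrint {

    // Identity transform with indenting switched on; meant for human viewing only.
    static String prettyPrint(Source source) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 1) {
            System.err.println("usage: java PrettyPrint <file.xml>");
            System.exit(1);
        }
        System.out.println(prettyPrint(new StreamSource(new File(args[0]))));
    }
}
```

The exact indentation width varies between JDK versions, which is another reason to treat the result as a viewing aid rather than as output to ship.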

Other savings are still possible

Turning off the indent is just one of many things you can look into. Other things include the following (though your mileage may vary):

  • Use attributes instead of elements (where possible; usually this is for simple values, such as numbers, dates, and very limited strings, and where the element is not expected to be indented)
  • Look at the XML structure to see if it can be improved to make XSLT processing easier
  • Cache the XSLT Processor
  • Cache the output
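As an illustration of caching the processor, on the Java platform a compiled stylesheet is represented by javax.xml.transform.Templates, which is documented as thread-safe, while individual Transformer objects are not. A sketch of a simple cache under those assumptions (the XsltCache class and its key scheme are hypothetical) might look like this:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltCache {

    // TransformerFactory is not guaranteed to be thread-safe, so access to it is synchronized.
    private final TransformerFactory factory = TransformerFactory.newInstance();

    // Compiled stylesheets keyed by a caller-chosen name; Templates objects are thread-safe.
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    public Templates get(String key, String xsltSource) {
        return cache.computeIfAbsent(key, k -> {
            synchronized (factory) {
                try {
                    return factory.newTemplates(new StreamSource(new StringReader(xsltSource)));
                } catch (TransformerConfigurationException e) {
                    throw new IllegalStateException("Could not compile stylesheet '" + k + "'", e);
                }
            }
        });
    }

    public String transform(String key, String xsltSource, String xml) throws TransformerException {
        // A Transformer is cheap to create from cached Templates but is not thread-safe,
        // so a fresh one is created for each transformation.
        Transformer t = get(key, xsltSource).newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
                + "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>"
                + "</xsl:stylesheet>";
        XsltCache cache = new XsltCache();
        // The stylesheet is compiled on the first call only; the second call reuses it.
        System.out.println(cache.transform("demo", xsl, "<in>hi</in>"));
        System.out.println(cache.transform("demo", xsl, "<in>again</in>"));
    }
}
```

The saving comes from not re-parsing and re-compiling the stylesheet on every request; creating a Transformer from cached Templates is comparatively cheap.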

I’ll try to expand on some of those in future posts.

Some more detailed XSLT performance tips which also created cleaner code were covered in an earlier post.

Why Use XSLT in Server Side Web Frameworks For Output Generation?

On this page:

  1. Summary
  2. Poor quality markup from the server side.
    1. Web application frameworks help immensely
    2. Some frameworks still produce poor quality markup
    3. Abstraction: a good idea but leaky
  3. Gaining control of how you create the output
    1. Why does it matter?
    2. Some frameworks give full control but may use specific templating languages
    3. Consultants and contractors moving between frameworks
  4. XSLT — an open standard
  5. XSLT in MVC
  6. Advantages/Benefits of an XSLT-based approach
    1. XSLT is an Open Standard
    2. Better Separation of the View from the Model and Controller in MVC
    3. Unit Testable
    4. Gain Full Control of the Markup Generation
    5. Platform and Framework Agnostic
    6. Portable skill
    7. Repeatable
    8. Development Productivity; Help Inject Key Skills into Project Delivery
    9. Encourage Front and Back-end Developers to Work More Closely
    10. Coding productivity
  7. Drawbacks (or perceived drawbacks) to the XSLT approach
    1. Not everyone likes XSLT!
    2. Yet another technology to learn
    3. XSLT 1.0 is Verbose and Weakly Typed
    4. XSLT is a limited language
    5. Poor Tool Support Compared to Other Languages?
    6. Refactoring in a mixed language environment can be problematic
    7. Getting data as XML may not be practical
    8. Perceived Performance Concerns
    9. Fragment Caching is Hard
    10. It implies error-prone XML-based templates
    11. Coding Productivity is Perceived to be Poor; Hand writing all that code seems inefficient
    12. Unlikely to get vendor buy-in if they have vested interest in other ways
    13. XSLT for web-based Views hasn’t taken off yet, so it must be a bad choice
  8. In conclusion
  9. Some examples
  10. Image credits
  11. Translation

Summary

This post follows on from previous posts about the way markup is sometimes created.

In a previous post, I worried about some web frameworks that abstract away the creation of markup, thus preventing web developers from creating (or making it more difficult for them to create) what they need: the leaky abstraction.

This post explores one way to overcome this, using XSLT.

As this is a rather long post, here are the key points:

  • Creating good quality markup is important for a variety of reasons (quick loading pages, cross browser compatibility, accessibility, etc).
  • Some server side frameworks may either generate poor quality markup or use templating systems that are specific to that framework.
  • Using XSLT to generate the output can be one way to overcome these limitations, while giving you full control of the markup that you need to create.
  • Markup generation via XSLT may be simpler to unit test as part of a developer’s daily work, which may not be the case (or not necessarily as simple) in other templating systems.
  • Being platform agnostic, XSLT can be an effective part of a View in an MVC framework and be applied to different frameworks and platforms, reducing upskilling costs in the long term.
  • XSLT may not always be appropriate in all cases, and many are certainly put off by its oddities or perceived limitations, but perhaps it deserves another look?

(Some forthcoming posts will provide some code examples, in particular of unit testing XSLT and how it might fit into MVC, as well as how it can be written in a reusable manner.)


XSLT Profilers

Microsoft recently announced an XSLT profiler for Visual Studio 2008. (I have used it briefly and it seems quite good.)

PHP recently announced PHP 5.3, which will include an XSLT profiler that can be invoked from within code. With current versions of PHP, and with the Microsoft profiler, you can only run the profiler from the command line or against static XML/XSL files, so being able to call it from within code is quite useful.

Run-time invocation helps because if you are passing parameters into the XSLT, or generating the XML programmatically through the DOM, it is much easier to profile. Otherwise, you need to capture and save the generated XML, then invoke a profiler separately from the command line.

An example of calling it from within PHP is this (taken from a SitePoint article explaining the new features — see previous link):


XSLT Tips for Cleaner Code and Better Performance

XSLT is a transformation language to convert XML from one format to another (or to another text-based output).

People seem to love or hate XSLT. Some find it hard to read or strange to get used to. Yet, it can be quite elegant when coded right. So this will be the first in a series of posts to show where it can be useful (and what its pitfalls/annoyances may be), how to make best use of XSLT, etc.

This first post looks at coding style in XSLT 1.0 and XPath 1.0.

I think some of the frustration with this technology comes from wanting to do procedural programming with it, whereas it is really closer to a functional (declarative) programming language: you define rules (templates) for what to act on, rather than spelling out step by step how to find it (kind of).

For example, consider the following, where a named template is used to create a link to a product:

<xsl:template name="CreateLink">
  <xsl:param name="product" />

  <xsl:element name="a">
    <xsl:attribute name="href">
      <xsl:value-of select="'/product/?id='" /><xsl:value-of select="normalize-space($product/@id)" />
    </xsl:attribute>
    <xsl:value-of select="$product/name" />
  </xsl:element>
</xsl:template>

I have found the above to be a common way people initially code their XSLTs. Yet, the following is far neater:

<xsl:template match="product">
  <a href="{concat('/product/?id=', normalize-space(./@id))}">
    <xsl:value-of select="./@name" />
  </a>
</xsl:template>

Not only is such neater code easier to read and maintain, it can even improve performance.

(Update: As Azat rightly notes in a comment below, the use of ‘./’ is redundant. That is definitely true. I should have mentioned originally that I tend to use it to help others in the team, especially those newer to XSLT, see more clearly which element is the context node for the template.)
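To try the match-based version, a minimal harness like the one below wraps it in a stylesheet and runs it with the JDK’s built-in XSLT processor. The input document is hypothetical and, like the template, assumes name is an attribute of product; the extra root template is an assumption added only to select the products:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ProductLinkDemo {

    // The match-based product template, plus a root template that selects the products.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select='products/product'/>"
        + "</xsl:template>"
        + "<xsl:template match='product'>"
        + "<a href=\"{concat('/product/?id=', normalize-space(./@id))}\">"
        + "<xsl:value-of select='./@name'/>"
        + "</a>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical input: the id has stray spaces, which normalize-space() removes.
        System.out.println(render("<products><product id=' 42 ' name='Widget'/></products>"));
    }
}
```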

Let's look at a few tips on how this may be possible (a future post will concentrate on additional performance-related tips; the tips below are primarily about coding style):
