Closure Tools

Security

Many web applications suffer from Cross-Site Scripting (XSS) vulnerabilities. XSS was number 2 in OWASP's top 10 application security risks in 2010. Closure Templates includes optional security features to prevent XSS in your application.

Table of Contents

  1. Autoescaping in Closure Templates
  2. Strict Autoescaping
    1. Kinds of Content
    2. Usage
    3. Passing parameters to strict templates
  3. Contextual Autoescaping (Deprecated)
  4. Anatomy of an XSS Hack (and its prevention)
  5. Escaping: the fine details
    1. Substitutions in HTML
    2. Substitutions in Tag and Attribute Names
    3. Substitutions in URLs
    4. Substitutions in JavaScript
    5. Substitutions in CSS
  6. Print Directives
  7. Formal Guarantees

Autoescaping in Closure Templates

XSS vulnerabilities typically occur when dynamic text from an untrusted source is embedded into an HTML document. Escaping prevents such attacks. Escaping is the process of converting text so that it is displayed properly in its context, such as turning angle brackets into &lt; and &gt; so they are not interpreted as tags.

The type of escaping needed will vary based on the location in the document where the value appears. For example, a value that appears inside a <style> tag will need to be escaped differently than a value that appears in a URI.
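
As a rough illustration of why the context matters, the sketch below escapes the same value once for HTML text and once for a URI query parameter. The escapeHtml helper is a hypothetical stand-in written for this example, not the function Closure Templates uses internally; encodeURIComponent is the standard JavaScript built-in.

```javascript
// Hypothetical helper for illustration only; not part of Closure Templates.
function escapeHtml(text) {
  return String(text)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&#39;');
}

const value = 'Tom & <Jerry>';

// HTML text context: markup characters become entities.
const inHtml = '<p>' + escapeHtml(value) + '</p>';

// URI query context: reserved characters are percent-encoded.
const inUri = '/search?q=' + encodeURIComponent(value);

console.log(inHtml);
console.log(inUri);
```

Neither encoding is a substitute for the other: the entity-encoded form is wrong inside a query string, and the percent-encoded form is unreadable as HTML text.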

Closure Templates' autoescaping ensures that every single dynamic value is escaped in a context-appropriate way.

Closure Templates supports a number of different autoescaping modes, the most recent of which is the strict escaping mode. The previous system of contextual autoescaping is now officially deprecated (as of Q3 2013).

Strict Autoescaping

The most secure way to use Closure Templates is with strict autoescaping. Strict templates are recursively guaranteed not to underescape the output -- that is, every last dynamic value is forcibly printed with the correct escaping technique.

The output of a strict template is not a plain string, but rather a SanitizedContent object, which associates a content kind with the text. The content kind represents how the content is intended to be used, and the type of escaping, if any, that has already been applied to it. This information is particularly important in cases where the output of one template is used as an input parameter to another template.

For every dynamic value that appears in the output of a template, Closure Templates automatically identifies the output context at the point of use. The output context is determined by the surrounding text.

The combination of these two factors - content kind and output context - determines what kind of escaping is applied to the text. For example, if the text has already been URI-escaped, and it's being used in a URI context, then there's no need to escape it again. This prevents "double-escaping" of the text.

(In the old contextual autoescaping mode, special print directives such as |noAutoescape were used to prevent double-escaping. Unfortunately, the only way to know that a |noAutoescape is safe is to track down every single place the data can originate. This is not easy in a large project, especially if the template is deep in a call chain. These directives are no longer needed in strict mode, and in fact are no longer allowed.)

Kinds of Content

The different content kinds are:

Content kind | Description | Example | Notes
html | HTML markup | <div>Hello!</div>
attributes | HTML attribute-value pairs | class="foo" width="100%" | Represents the combination of both attribute names and attribute values, and must include the quotation marks around the attribute value. If the template output is intended to be just an attribute value alone (the part inside the quotes), then specify either the text or html content kind.
text | Plain text, not yet escaped | Hello!
uri | URIs | http://www.google.com/search?q=android
css | Stylesheet text | .myClass{ color: red; display: block; }
js | JavaScript or JSON | {"a": 1, "b": 2}

It's important to understand what this "content kind" really represents, and avoid reading too much into it. The content kind isn't a compiler type; you won't get a warning if you attempt to use a text kind in a css context or vice versa. Rather, the content kind is an indication that the text is safe for a given context and therefore does not require additional escaping.

For input values that are not SanitizedContent objects, a strict template coerces the value to a text string, and then applies escaping based on the context.
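
The interaction between content kind and output context can be modeled with a few lines of JavaScript. This is only a sketch under simplifying assumptions: the Sanitized class, the escapers table, and printIn are hypothetical names invented for this example and do not mirror the real SanitizedContent API. In particular, a value whose kind does not match the context is simply treated as plain text here, which is cruder than what the real autoescaper does.

```javascript
// Hypothetical model of kind-aware escaping; illustrative names only.
class Sanitized {
  constructor(content, kind) {
    this.content = content;
    this.kind = kind;  // For example 'html' or 'uri'.
  }
}

// One escaper per output context (only two contexts in this sketch).
const escapers = {
  html: (s) => s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
      .replace(/>/g, '&gt;').replace(/"/g, '&quot;'),
  uri: (s) => encodeURIComponent(s),
};

// Prints `value` in `context`, escaping unless it is already safe there.
function printIn(context, value) {
  if (value instanceof Sanitized && value.kind === context) {
    return value.content;  // Kind matches context: no double-escaping.
  }
  // Everything else is coerced to a text string and escaped for the context.
  const text = value instanceof Sanitized ? value.content : String(value);
  return escapers[context](text);
}

console.log(printIn('html', '<b>Foo</b>'));
console.log(printIn('html', new Sanitized('<b>Foo</b>', 'html')));
```

The first call prints the entity-escaped string, while the second passes the markup through untouched because its kind already matches the context.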

Usage

To turn on strict autoescaping, add autoescape="strict" to your namespace or template declarations. The recommended practice is to enable strict mode at the namespace level, since that automatically enables it for all templates defined in that file.

By default, the output of a strict template will be html kind. If your template produces a different kind of content, you'll need to add kind attributes to your template. For example, a template that produces a URI might look like this:

A strict template
{template .googleUri autoescape="strict" kind="uri"}
  http://www.google.com/
{/template}

The kind attribute can be added to the following Closure Templates commands:

Closure Templates command | Notes
template | Optional. Assumed to be kind="html" if omitted.
deltemplate | Same as template. All matching delegates must have the same kind.
let | The kind attribute is required for "block form" let statements.
param | The kind attribute is required for "block form" param statements.

"Block form" means commands that contain templated content, as opposed to "short form" commands that contain value expressions such as $foo. The following example shows the difference between these two command forms, and illustrates the usage of the kind attribute:

{template .foo autoescape="strict" kind="html"}
  // Block-form 'let' command, 'kind' is required.
  {let $message kind="text"}
    {msg}Hi, {$name}!{/msg}
  {/let}

  // Short form 'let', no 'kind' attribute.
  {let $category: $categoryList[0] /}

  {call .bar}
    // Block-form 'param' command, kind is required.
    {param attributes kind="attributes"}
      title="{$message}"{sp}
      onclick="foo('{$message}')"
    {/param}
    {param content kind="html"}
      <b>{$message}</b>
    {/param}

    // Short-form 'param' command, no 'kind' attribute.
    {param visible: true /}
  {/call}
{/template}

Short form commands don't need the kind attribute because they pass values rather than constructing strings, and values simply keep whatever kind they already have.

Strict autoescaping can be turned on for some templates and not for others, so you do not need to change all your templates at once. However, given how prevalent XSS problems are, it is a good idea to eventually migrate all your templates to use strict autoescaping.

Strict mode templates can be called from older contextual-mode templates, but not the other way around (with one exception: strict-mode templates of type "text" can call non-strict templates).

When a contextual-mode template calls a strict-mode template, the compiler makes a check to ensure that the content kind of the strict template matches the context in the calling template. If not, then the template compiler signals an error.

Passing parameters to strict templates

For ordinary content that doesn't contain markup, you can just pass in the string values as template parameters as before, and they will get escaped.

For template parameters that contain markup that you don't want re-escaped, ensure the content is wrapped in an appropriate SanitizedContent object. However, make sure that the marked-up text is really safe, otherwise you risk introducing exactly the kind of XSS vulnerability that strict mode was designed to prevent.

For JavaScript, the function to wrap an HTML content string is called soydata.VERY_UNSAFE.ordainSanitizedHtml. In Java, the equivalent function is UnsafeSanitizedContentOrdainer.ordainAsSafe (in package com.google.template.soy.data).

You might want to place restrictions in your project that limit where and when these wrapper functions can be called, such as limiting these calls to a specific class or package that can easily be searched and audited. Otherwise, it becomes tempting to simply wrap arbitrary, untrusted strings whenever it's convenient in the code, which defeats the whole purpose of preventing security holes.

Contextual Autoescaping (Deprecated)

Contextual autoescaping is the evolutionary predecessor to strict autoescaping and is enabled by setting autoescape="contextual" in your template or namespace declarations. It's documented here for the sake of developers working with older templates.

Both contextual and strict autoescaping share many features because strict mode is built on top of contextual mode. Both modes support the notion of output contexts, and can handle dynamic values that are SanitizedContent objects. The biggest difference between them is that strict templates generate SanitizedContent objects, while contextual templates produce plain text.

Contextual autoescaping also supports a number of directives that are no longer useful in strict templates, including the various escaping directives such as |noAutoescape. (Part of the reason for introducing strict templates was that widespread use of |noAutoescape and similar directives for large projects made it difficult to reason about the behavior and safety of the templates.)

Except where noted, all of the documentation in the following sections applies to both strict and contextual templates.

Anatomy of an XSS Hack (and its prevention)

Templates make it easy to compose content in a language like HTML from static HTML and dynamic values. Closure Templates' autoescaping makes it even easier by letting you use the same values in many contexts without having to explicitly specify encoding.

An enterprising hacker might try to sneak a malicious value into your template to take it over via XSS, perhaps using:

{ x: 'javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>' }

If we pass this to a naive template, like

{template .foo autoescape="false"}
  <a href="{$x}"
   onclick="{$x}"
   >{$x}</a>
  <script>var x = '{$x}'</script>
  <style>
    p {
      font-family: "{$x}";
      background: url(/images?q={$x});
      left: {$x}
    }
  </style>
{/template}

then the attack succeeds. That template produces

  <a href="javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>"
   onclick="javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>"
   >javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script></a>
  <script>var x = 'javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>'</script>
  <style>
    p {
      font-family: "javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>";
      background: url(/images?q=javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>);
      left: javascript:/*</style></script>/**/ /<script>1/(alert(1337))//</script>
    }
  </style>

which pops up "1337" 6 times, and a seventh if you click the link.

Let's take another look at that malicious input to figure out why:

javascript:
    At the beginning of a URL, this changes the rest of the content into JavaScript. In a script statement, this is just an unused label.
/*</style></script>/**/
    This breaks out of any style or script element. If already in a script attribute value, this just looks like a comment. It prematurely ends any unquoted attribute value and its containing tag.
/<script>1/
    If outside a script, this starts a script tag with a useless division. Inside a script, this is a self-contained regular expression literal.
(alert(1337))
    If preceded by a regular expression literal, this tries to call it, but only after executing the real malicious code, alert(1337).
//</script>
    If inside a script tag, this closes it correctly. If inside a javascript: URL attribute or event handler attribute, this is a harmless comment.

Many of the pieces of that malicious input depend on being interpreted different ways by different parts of a browser. Autoescaping defangs this and other malicious inputs by choosing a single consistent meaning for a dynamic value, and choosing an escaping scheme that makes sure the browser will interpret it the same way.

Now pass that same malicious input to a contextually autoescaped template. Only the {template}'s autoescape attribute has changed:

{template .foo autoescape="contextual"}
  <a href="{$x}"
   onclick="{$x}"
   >{$x}</a>
  <script>var x = '{$x}'</script>
  <style>
    p {
      font-family: "{$x}";
      background: url(/images?q={$x});
      left: {$x}
    }
  </style>
{/template}

We get a very different, altogether saner output:

  <a href="#zSoyz"
   onclick="'javascript:/*&lt;/style&gt;&lt;/script&gt;/**/ /&lt;script&gt;1/(alert(1337))//&lt;/script&gt;'"
   >javascript:/*&lt;/style&gt;&lt;/script&gt;/**/ /&lt;script&gt;1/(alert(1337))//&lt;/script&gt;</a>
  <script>var x = 'javascript:/*\x3c/style\x3e\x3c/script\x3e/**/ /\x3cscript\x3e1/(alert(1337))//\x3c/script\x3e'</script>
  <style>
    p {
      font-family: "javascript:/*\3c /style\3e \3c /script\3e /**/ /\3c script\3e 1/(alert(1337))//\3c /script\3e ";
      background: url(/images?q=javascript%3A%2F%2A%3C%2Fstyle%3E%3C%2Fscript%3E%2F%2A%2A%2F%20%2F%3Cscript%3E1%2F%28alert%281337%29%29%2F%2F%3C%2Fscript%3E);
      left: zSoyz
    }
  </style>

  • When {$x} appeared inside HTML text, we entity-encoded it (< → &lt;).
  • When {$x} appeared inside a URL or as a CSS quantity, we rejected it because it had a protocol javascript: that was not http or https, and instead output a safe value #zSoyz. Had {$x} appeared in the query portion of a URL, we would have percent-encoded it instead of rejecting it outright (< → %3C).
  • When {$x} appeared in JavaScript, we wrapped it in quotes (if not already inside quotes) and escaped HTML special characters (< → \x3c).
  • When {$x} appeared inside CSS quotes, we did something similar to JavaScript, but using CSS escaping conventions (< → \3c ).

The malicious output was defanged.

Escaping: the fine details

Substitutions in HTML

When a print command appears where normal HTML text could appear, then the result is HTML entity-escaped. For example, in

  <div title="{$shortMessage}">{$longMessage}</div>

given ({ "shortMessage": "I <3 ponies!", "longMessage": "OMG! <3 <3 <3!" }) produces

  <div title="I &lt;3 ponies!">OMG! &lt;3 &lt;3 &lt;3!</div>

You can safely substitute data anywhere a tag can appear or in a plain attribute value. It's good practice to quote all your attributes, but if you do forget quotes, the autoescaper makes sure the attribute value cannot be split by spaces in the dynamic value. Given the input above, <div title={$shortMessage}> produces <div title=I&#32;&lt;3&#32;ponies!>. Spaces, which would normally end an unquoted attribute value, are encoded to keep the value together.
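
The unquoted-attribute behavior can be approximated as follows. This is a minimal sketch: escapeUnquotedAttr is a hypothetical helper written for this example, and the exact set of characters the real autoescaper encodes may be larger than the one assumed here.

```javascript
// Named entities for the common markup characters.
const NAMED_ENTITIES = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;'};

// Hypothetical helper: encodes every character that could end or split an
// unquoted attribute value (whitespace, quotes, '=', '`', and markup).
function escapeUnquotedAttr(text) {
  return String(text).replace(/[&<>"'`=\s]/g,
      (ch) => NAMED_ENTITIES[ch] || '&#' + ch.charCodeAt(0) + ';');
}

console.log('<div title=' + escapeUnquotedAttr('I <3 ponies!') + '>');
```

Because spaces are emitted as &#32;, the browser sees a single attribute value instead of a title attribute followed by stray tokens.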

To avoid over-escaping of known safe HTML, you can use sanitized content. The template <div>{$foo}</div> given { foo: new soydata.SanitizedHtml("<b>Foo</b>") } produces output that is not re-escaped: <div><b>Foo</b></div>, instead of the over-escaped version that would have been produced if the soydata.SanitizedHtml wrapper were not there: <div>&lt;b&gt;Foo&lt;/b&gt;</div>.

Sanitized content is safe to use with attributes and with elements that cannot contain tags such as TEXTAREA. The template <div title="{$foo}">{$foo}</div> given the input above produces a sensible output: <div title="Foo"><b>Foo</b></div>. When embedded in an HTML attribute, sanitized content will have tags stripped first.

Substitutions in Tag and Attribute Names

Substitutions in tag and attribute names are sanity-checked rather than entity-encoded.

<h{$headerLevel}>Foo</h{$headerLevel}> produces <h3>Foo</h3> for headerLevel=3, but for headerLevel='><script>alert(1337)<script' you get <hzSoyz>Foo</hzSoyz>. You'll also get a log message in Java, and in JavaScript, if you're running with Closure asserts enabled, you get an assert.

Don't try to dynamically specify special tag names (like script or style) or special attribute names (like href, style, or onclick). Trying to use <{$name}>{$content}</{$name}> with ({ "name": "script", "content": "alert(1337)" }) or <a {$name}="{$content}"> with ({ "name": "onmouseover", "content": "alert(1337)" }) is asking for trouble. Since the autoescaper cannot distinguish JavaScript, CSS, or URLs from plain HTML with those tag and attribute names, it must reject them.

Substitutions in URLs

Values that are substituted into different parts of URIs are treated differently. Substitutions in the query part are URI-escaped.

<a href="{$x}">    Entity-escape and filter out bad protocols.
    ({ "x": "http://foo/" })  →  <a href="http://foo/">
    ({ "x": "/foo?a=b&c=d" })  →  <a href="/foo?a=b&amp;c=d">
    ({ "x": "javascript:alert(1337)" })  →  <a href="#zSoyz">
<a href="/foo/{$x}">    Just entity-escape.
    ({ "x": "bar" })  →  <a href="/foo/bar">
    ({ "x": "bar&baz/boo" })  →  <a href="/foo/bar&amp;baz/boo">
<a href="/foo?q={$x}">    Percent-encode inside query.
    ({ "x": "bar&baz=boo" })  →  <a href="/foo?q=bar%26baz%3dboo">
    ({ "x": new soydata.SanitizedUri("bar&baz=boo") })  →  <a href="/foo?q=bar&amp;baz=boo">
    ({ "x": "A is #1" })  →  <a href="/foo?q=A%20is%20%231">
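
The protocol filtering applied at the start of a URL can be sketched like this. The filterUri function is a hypothetical, simplified stand-in: it assumes that only http, https, and scheme-less (relative) URLs are acceptable, which follows the description above but may not match the real filter's full allow-list. The result would still need entity-escaping before being placed in an attribute.

```javascript
// Hypothetical, simplified protocol filter; not the real implementation.
function filterUri(value) {
  const text = String(value);
  // A scheme is whatever precedes the first ':' provided no '/', '?' or '#'
  // comes before it. Relative URLs such as '/foo?a=b:c' have no scheme.
  const scheme = /^([^:\/?#]*):/.exec(text);
  if (scheme && !/^https?$/i.test(scheme[1])) {
    return '#zSoyz';  // Replace the whole value with a harmless fragment.
  }
  return text;
}

console.log(filterUri('http://foo/'));
console.log(filterUri('javascript:alert(1337)'));
```

Rejecting the whole value, rather than trying to repair it, avoids guessing what a URL with an unexpected protocol was meant to do.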

As long as you stick to standard HTML attribute names, the autoescaper figures out which attributes contain URLs, which contain CSS, etc. If you do decide to define custom attributes such as data-… attributes, you can still use a naming convention to tell the autoescaper which attributes have URL content: Names that start or end with "URL" or "URI", ignoring case, will be treated as having URL values. For example, the autoescaper treats data-secondaryUrl, foo:urlForLogin, and data-thesauri as having URL content; but not data-curliewurly. Precisely, /\bur[il]|ur[il]s?$/i is the set of custom attribute names with URL values.
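
The naming convention is easy to check with the regular expression quoted above. In this short sketch, only the pattern comes from the text; the hasUrlValue wrapper and the constant name are hypothetical.

```javascript
// Pattern quoted from the text: custom attribute names with URL values.
const URL_ATTR_NAME = /\bur[il]|ur[il]s?$/i;

// Hypothetical wrapper for this example.
function hasUrlValue(attrName) {
  return URL_ATTR_NAME.test(attrName);
}

for (const name of ['data-secondaryUrl', 'foo:urlForLogin',
                    'data-thesauri', 'data-curliewurly']) {
  console.log(name, hasUrlValue(name));
}
```

Note that data-thesauri matches only because it happens to end in "uri", so pick custom attribute names with this convention in mind.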

Substitutions in JavaScript

Values in JavaScript that are inside quotes are dealt with differently from those outside quotes.

<script>alert('{$x}');</script>    Escaped inside quotes.
    ({ "x": "O'Reilly Books" })  →  <script>alert('O\'Reilly Books');</script>
    ({ "x": new soydata.SanitizedJsStrChars("O\\'Reilly Books") })  →  <script>alert('O\'Reilly Books');</script>
<script>alert({$x});</script>    Without quotes, treated as a value.
    ({ "x": "O'Reilly Books" })  →  <script>alert('O\'Reilly Books');</script>
    ({ "x": 42 })  →  <script>alert( 42 );</script>
    ({ "x": true })  →  <script>alert( true );</script>
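
Escaping inside a quoted JavaScript string can be sketched as below. The escapeJsString helper is hypothetical and simplified: it assumes hex escapes (\xNN) for quotes and HTML special characters, which are equivalent in JavaScript to the \' form shown above, and it is not the autoescaper's actual code.

```javascript
// Hypothetical helper: makes text safe inside a quoted JavaScript string
// that is itself embedded in HTML.
function escapeJsString(text) {
  return String(text).replace(/[\\'"<>&=\r\n\u2028\u2029]/g, (ch) => {
    switch (ch) {
      case '\n': return '\\n';
      case '\r': return '\\r';
      case '\\': return '\\\\';
      default: {
        // Quotes and HTML special characters become hex escapes so that
        // the value can end neither the string nor the <script> element.
        const code = ch.charCodeAt(0);
        return code < 0x100 ?
            '\\x' + code.toString(16).padStart(2, '0') :
            '\\u' + code.toString(16).padStart(4, '0');
      }
    }
  });
}

console.log("alert('" + escapeJsString("O'Reilly Books") + "');");
console.log("var s = '" + escapeJsString('</script>') + "';");
```

Encoding < and > matters even inside a string literal, because the HTML parser looks for </script> before the JavaScript parser ever sees the code.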

Substitutions in CSS

Values in CSS can be parts of classes, IDs, quantities, colors, or URLs.

<style>div#{$id} {lb} {rb}</style>    Classes and IDs
    ({ "id": "foo-bar" })  →  <style>div#foo-bar { }</style>
<div style="color: {$x}">    Quantities
    ({ "x": "red" })  →  <div style="color: red">
    ({ "x": "#f00" })  →  <div style="color: #f00">
    ({ "x": "expression('alert(1337)')" })  →  <div style="color: zSoyz">
<div style="margin-{$ltr-dir}: 1em">    Property names
    ({ "ltr-dir": "left" })  →  <div style="margin-left: 1em">
    ({ "ltr-dir": "right" })  →  <div style="margin-right: 1em">
<style>p {lb} font-family: '{$x}' {rb}</style>    Quoted values
    ({ "x": "Arial" })  →  <style>p { font-family: 'Arial' }</style>
    ({ "x": "</style>" })  →  <style>p { font-family: '\3c \2f style\3e ' }</style>
<div style="background: url({$x})">    URLs in CSS are handled as in attributes above
    ({ "x": "/foo/bar" })  →  <div style="background: url(/foo/bar)">
    ({ "x": "javascript:alert(1337)" })  →  <div style="background: url(#zSoyz)">
    ({ "x": "?q=(O'Reilly) OR Books" })  →  <div style="background: url(?q=%28O%27Reilly%29%20OR%20Books)">
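
The quoted-value case uses CSS hex escapes, where each escaped character becomes a backslash, its code point in hex, and a terminating space. The sketch below reproduces the </style> example above; escapeCssString is a hypothetical helper, and the set of characters it escapes is an assumption chosen to match that example.

```javascript
// Hypothetical helper: escapes text for use inside a quoted CSS string.
function escapeCssString(text) {
  // Control characters, quotes, backslash, markup characters and '/' are
  // replaced by a hex escape followed by a space, which ends the escape.
  return String(text).replace(/[\x00-\x1f"'\\<>&\/]/g,
      (ch) => '\\' + ch.charCodeAt(0).toString(16) + ' ');
}

console.log("p { font-family: '" + escapeCssString('</style>') + "' }");
```

The trailing space keeps a following hex-like character (such as the "a" in "Arial") from being read as part of the escape.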

Print Directives

Autoescaping works by automatically adding print directives to templates, so you can remove the print directives that you explicitly added, including |escapeJs, |escapeUri, |escapeHtml, and especially those dangerous |noAutoescape directives.

In case you have defined custom print directives, the autoescaper does not interfere with any {print …} command containing a directive that returns true from shouldCancelAutoescape(). Thus, if the escape directive transforms plain text to the expected content type, then override shouldCancelAutoescape() to return true. If your custom directive expects already-escaped input instead of plain text, you can implement SanitizedContentOperator to get the autoescaper to insert escaping directives before your directive so they produce the already-escaped input and pipe it to your directive.

Formal Guarantees

Autoescaping augments Closure Templates to choose an appropriate encoding for each dynamic value so that even if a particular dynamic value can be controlled by an attacker, certain safety properties hold.

Specifically, if a template, and all the templates that it calls have autoescape="contextual" or autoescape="strict", and have no manual escaping overrides such as |noAutoescape, then the following properties hold:

Structure is preserved

If you, the Closure Templates author, write <b>{$x}</b>, then the tags <b> and </b> always correspond to matched tags in the template output regardless of the value of $x.

No dynamic value can change the meaning of an HTML, CSS, or JavaScript token in the template, or correspondences between pairs of matched tokens.

Only code in the template is executed

Dynamic values cannot specify unsafe code. Any code hidden in dynamic values (whether via <script> elements, javascript: URIs, or some other mechanism) is treated as plain text and encoded properly on output instead of being rendered as code.

Dynamic values that appear in JavaScript (e.g. $message in <script>alert('{$message}')</script>) are encoded to expressions without side effects or free variables (to preserve privacy constraints). Given { "message": "'//\ndoEvil()//" }, the template produces <script>alert('\x27//\ndoEvil()//');</script>, which alerts the garbage string passed in instead of calling doEvil.

All code in the template is executed

A dynamic value cannot cause code to fail to parse. Some applications have security-critical code that they need to run if JavaScript is enabled. Take for example the following template:

<script>
  var s = '{$s}';
  doSecurityCriticalStuff();
</script>

If the value of the variable s is a newline character "\n", then a non-autoescaped template would produce the following output:

<script>
  var s = '
';
  doSecurityCriticalStuff();
</script>

The autoescaped version of the template instead produces:

<script>
  var s = '\n';
  doSecurityCriticalStuff();
</script>

which parses properly.

If a template or the templates that it calls do not have autoescaping enabled, or use explicit escaping directives like |noAutoescape incorrectly, then the autoescaper makes a best effort to preserve these properties but might fail.
