What's wrong with Request.Headers["Accept-Encoding"].Contains("gzip")?
This post has been brewing for a little while. It stems from an inconsistency I've seen in code posted here and there over the web. Specifically, it often happens when trying to detect which encodings a browser lists in its Accept-Encoding header, so that a GZIP or DEFLATE filter can be used to compress the content.
The Offending Code
The code in question goes something like this:
string encoding = Request.Headers["Accept-Encoding"];
if (encoding.Contains("gzip"))
{
Response.AppendHeader("Content-Encoding", "gzip");
Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
}
else if (encoding.Contains("deflate"))
{
Response.AppendHeader("Content-Encoding", "deflate");
Response.Filter = new DeflateStream(Response.Filter, CompressionMode.Compress);
}
What's wrong with it? It seems to be pretty standard and widely in use.
Well, consider the following possible values that can be sent in the Accept-Encoding header.
"gzip,deflate"
"deflate,gzip"
"gzip;q=.5,deflate"
"gzip;q=0,deflate"
"deflate;q=0.5,gzip;q=0.5,identity"
"*"
Here are some observations from running the above code listing against each value, compared to what is expected by section 14 of RFC 2616.
- "gzip,deflate": This one passes the example code and the RFC fine. As the browser, we're expecting gzip first, otherwise give us deflate.
- "deflate,gzip": Unfortunately, as the browser, we want deflate first, but gzip is okay if the server can't give us our first preference; we're served gzip in this case, as the code checks whether "gzip" is contained within the Accept-Encoding header before it checks for "deflate". This isn't such a big deal as we're still receiving compressed content, but we should have been served deflate.
- "gzip;q=.5,deflate": This one uses another valid method to tell the server (and your application) that the browser would prefer deflate encoding before gzip encoding. When a "q=" is specified after the encoding, with a value between 0 and 1, the preference is given that weight (or quality). When no "q=" is specified, it defaults to 1. So this rule tells the server the browser wants gzip only half as much as it wants deflate compression. Because our code only checks whether gzip is contained within the header, it serves us gzip despite our preference. It's still not critical, as we can still accept the gzip encoding.
- "gzip;q=0,deflate": Here is where we get into trouble. Now our header is telling the server "Do NOT give me gzip compressed content, but I'll take content encoded with deflate if you can". Again, because the code simply checks whether gzip is contained within the header, regardless of its weight and of it being disallowed, we're still served gzip.
- "deflate;q=0.5,gzip;q=0.5,identity": This one is more of a trick question; it's not as innocent as you might think. For accept encodings, "identity" is a special case. It means that the content should be served without content encoding. Considering that the "q" defaults to "1" when not specified, this is the order of our preferred encodings: "identity,deflate;q=0.5,gzip;q=0.5". This means that we shouldn't apply a content encoding. But once again, because the code simply checks whether the header contains gzip, we get gzipped content.
- "*": The last one has said "We'll accept any encoding you want to serve", and while we could (and perhaps should) be served compressed content, because the code can't see gzip or deflate in the header, it serves the normal content. This one isn't so critical, but it could have been easily solved and would help reduce bandwidth and download time.
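To make the observations above easy to reproduce, here is a minimal sketch that runs the same substring test against each header value outside of ASP.NET. The class and method names (NaiveCheckDemo, NaiveChoice) are made up for this illustration, and the null guard is an addition; the original listing would throw if the header were missing.

```csharp
using System;

public static class NaiveCheckDemo
{
    // Mirrors the Contains() test from the first listing, but returns the
    // chosen encoding name instead of installing a Response.Filter.
    public static string NaiveChoice(string header)
    {
        if (header == null) return "identity"; // guard added for this sketch only
        if (header.Contains("gzip")) return "gzip";
        if (header.Contains("deflate")) return "deflate";
        return "identity";
    }

    public static void Main()
    {
        string[] headers =
        {
            "gzip,deflate",
            "deflate,gzip",
            "gzip;q=.5,deflate",
            "gzip;q=0,deflate",
            "deflate;q=0.5,gzip;q=0.5,identity",
            "*"
        };

        foreach (string header in headers)
            Console.WriteLine("{0,-40} -> {1}", header, NaiveChoice(header));
    }
}
```

The first five values all come back as gzip, including the one that forbids it, and "*" falls through to no compression at all.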
In all fairness, it's probably unlikely nowadays for a browser not to accept gzip encoding, so this probably affects very few requests - but it is incorrect. In my examples I've used an arbitrary ordering to illustrate the importance of the "q" part with regard to preference. Usually, however, the most preferred values appear first in the list - but it's not guaranteed.
To handle this situation we need to be able to find the preferred value that is accepted (and this can be applied to any of the similar headers, i.e. Accept-Encoding, Accept-Charset, Accept-Language and Accept).
Finding a Preferred HTTP Header Value in C#
I decided to split the header value up into its relevant segments, sort them, and then interrogate them in a generic list. Parsing the value into a list and then looping over the values probably costs more than a substring check, but versus having an incorrect implementation, as the first code listing does, it's worth it. My implementation of a QValueList follows shortly. First, if you're interested, I'd encourage you to have a look at the built-in Microsoft "System.Web.Caching.OutputCacheModule" module via Lutz Roeder's .NET Reflector. It uses a parsing method to determine whether an encoding is allowed or not; truthfully, it's also where I encountered the "identity" encoding and investigated its impact on what we're trying to achieve. So an alternative to this approach would be to parse the header in a similar way to the OutputCache module.
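As a rough idea of what that alternative could look like, here is a sketch of an "is this encoding allowed?" check. It is not the OutputCacheModule code; the names (EncodingCheck, IsAcceptable) are hypothetical, and it assumes the simple "name;q=value" form with no whitespace inside "q=". A missing or empty header is treated conservatively as allowing only "identity".

```csharp
using System;
using System.Globalization;

public static class EncodingCheck
{
    // Returns true if the Accept-Encoding header value permits the given coding.
    // An explicit entry for the coding wins over "*"; with neither present,
    // only "identity" is considered acceptable.
    public static bool IsAcceptable(string header, string coding)
    {
        bool isIdentity = coding.Equals("identity", StringComparison.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(header))
            return isIdentity;

        float? explicitQ = null;
        float? wildcardQ = null;

        foreach (string part in header.Split(','))
        {
            string[] pieces = part.Split(';');
            string name = pieces[0].Trim();
            float q = 1f; // q defaults to 1 when not specified

            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                float parsed;
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    float.TryParse(p.Substring(2), NumberStyles.Float,
                                   CultureInfo.InvariantCulture, out parsed))
                {
                    q = parsed;
                }
            }

            if (name.Equals(coding, StringComparison.OrdinalIgnoreCase))
                explicitQ = q;
            else if (name == "*")
                wildcardQ = q;
        }

        if (explicitQ.HasValue) return explicitQ.Value > 0;
        if (wildcardQ.HasValue) return wildcardQ.Value > 0;
        return isIdentity;
    }

    public static void Main()
    {
        Console.WriteLine(IsAcceptable("gzip;q=0,deflate", "gzip"));    // False
        Console.WriteLine(IsAcceptable("gzip;q=0,deflate", "deflate")); // True
        Console.WriteLine(IsAcceptable("*", "gzip"));                   // True
    }
}
```

A check like this answers yes or no for one encoding at a time; it does not rank the encodings, which is why I went with a sorted list instead.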
Here's how we'll change the typical code from the start of the post:
// load encodings from header
QValueList encodings = new QValueList(Request.Headers["Accept-Encoding"]);
// find the encoding that we can handle and the client accepts,
// in the client's order of preference
QValue preferred = encodings.FindPreferred("gzip", "deflate", "identity");
// if none of the preferred values were found, but the
// client accepts wildcard encodings and has not listed
// gzip explicitly (e.g. as "gzip;q=0"), default to gzip
if (preferred.IsEmpty && encodings.AcceptWildcard && encodings.Find("gzip").IsEmpty)
    preferred = new QValue("gzip");
// handle the preferred encoding
switch (preferred.Name)
{
case "gzip":
Response.AppendHeader("Content-Encoding", "gzip");
Response.Filter = new GZipStream(Response.Filter, CompressionMode.Compress);
break;
case "deflate":
Response.AppendHeader("Content-Encoding", "deflate");
Response.Filter = new DeflateStream(Response.Filter, CompressionMode.Compress);
break;
case "identity":
default:
break;
}
The first observation one might make is that there is quite a lot more code. However, it is required to get the correct behaviour. Besides, it's not a lot if you factor it into your code/library/component correctly.
So why is it called "QValue" and not simply "AcceptEncoding"? Well, you can apply this to any of the headers that use the same quality value (qvalue) convention, including:
Request.Headers["Accept-Encoding"]
Request.Headers["Accept-Charset"]
Request.Headers["Accept-Language"]
Request.Headers["Accept"]
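To show that nothing here is specific to encodings, the following standalone sketch ranks the values of any of these headers by weight. It is a compact illustration only, not the QValueList class; HeaderRanking and Rank are names invented for the example, and it assumes the weight appears as a "q=" parameter somewhere after the name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HeaderRanking
{
    // Returns the names with q > 0, highest q first; equal weights keep header order.
    public static List<string> Rank(string header)
    {
        List<KeyValuePair<string, float>> items = new List<KeyValuePair<string, float>>();
        if (header == null) return new List<string>();

        foreach (string part in header.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] pieces = part.Split(';');
            string name = pieces[0].Trim();
            if (name.Length == 0) continue;

            float q = 1f; // default weight
            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                float parsed;
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    float.TryParse(p.Substring(2), NumberStyles.Float,
                                   CultureInfo.InvariantCulture, out parsed))
                {
                    q = parsed;
                }
            }
            if (q > 0) items.Add(new KeyValuePair<string, float>(name, q));
        }

        // Insertion sort, descending by weight. It is stable, so ties stay in
        // header order (List<T>.Sort makes no such promise).
        for (int i = 1; i < items.Count; i++)
        {
            KeyValuePair<string, float> current = items[i];
            int j = i - 1;
            while (j >= 0 && items[j].Value < current.Value)
            {
                items[j + 1] = items[j];
                j--;
            }
            items[j + 1] = current;
        }

        List<string> names = new List<string>();
        foreach (KeyValuePair<string, float> item in items) names.Add(item.Key);
        return names;
    }

    public static void Main()
    {
        // Accept-Language style value; prints: en, en-gb, da
        Console.WriteLine(string.Join(", ", Rank("da;q=0.5,en-gb;q=0.8,en").ToArray()));
    }
}
```

The stable sort plays the same role as the ordinal in QValue: when two values carry the same weight, the one that appeared first in the header stays first.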
Source Code Listing & Download
Here's the code listing inline, or you can download the .cs file containing both QValue and QValueList.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
/// <summary>
/// Represents a weighted value (or quality value) from an HTTP header e.g. gzip;q=0.9, deflate, x-gzip;q=0.5
/// </summary>
/// <remarks>
/// accept-encoding spec:
/// http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
/// </remarks>
/// <example>
/// Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
/// Accept-Encoding: gzip,deflate
/// Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
/// Accept-Language: en-us,en;q=0.5
/// </example>
[DebuggerDisplay("QValue[{Name}, {Weight}]")]
public struct QValue : IComparable<QValue>
{
static char[] delimiters = { ';', '=' };
const float defaultWeight = 1;
#region Fields
string _name;
float _weight;
int _ordinal;
#endregion
#region Constructors
/// <summary>
/// Creates a new QValue by parsing the given value
/// for name and weight (qvalue)
/// </summary>
/// <param name="value">The value to be parsed e.g. gzip;q=0.3</param>
public QValue(string value)
: this(value, 0)
{ }
/// <summary>
/// Creates a new QValue by parsing the given value
/// for name and weight (qvalue) and assigns the given
/// ordinal
/// </summary>
/// <param name="value">The value to be parsed e.g. gzip;q=0.3</param>
/// <param name="ordinal">The ordinal/index where the item
/// was found in the original list.</param>
public QValue(string value, int ordinal)
{
_name = null;
_weight = 0;
_ordinal = ordinal;
ParseInternal(ref this, value);
}
#endregion
#region Properties
/// <summary>
/// The name of the value part
/// </summary>
public string Name
{
get { return _name; }
}
/// <summary>
/// The weighting (or qvalue, quality value) of the encoding
/// </summary>
public float Weight
{
get { return _weight; }
}
/// <summary>
/// Whether the value can be accepted
/// i.e. its weight is greater than zero
/// </summary>
public bool CanAccept
{
get { return _weight > 0; }
}
/// <summary>
/// Whether the value is empty (i.e. has no name)
/// </summary>
public bool IsEmpty
{
get { return string.IsNullOrEmpty(_name); }
}
#endregion
#region Methods
/// <summary>
/// Parses the given string for name and
/// weight (qvalue)
/// </summary>
/// <param name="value">The string to parse</param>
public static QValue Parse(string value)
{
QValue item = new QValue();
ParseInternal(ref item, value);
return item;
}
/// <summary>
/// Parses the given string for name and
/// weight (qvalue)
/// </summary>
/// <param name="value">The string to parse</param>
/// <param name="ordinal">The order of item in sequence</param>
/// <returns></returns>
public static QValue Parse(string value, int ordinal)
{
QValue item = Parse(value);
item._ordinal = ordinal;
return item;
}
/// <summary>
/// Parses the given string for name and
/// weight (qvalue)
/// </summary>
/// <param name="value">The string to parse</param>
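static void ParseInternal(ref QValue target, string value)
{
// a null value yields an empty QValue rather than throwing
if (value == null)
return;
string[] parts = value.Split(delimiters, 3);
if (parts.Length > 0)
{
target._name = parts[0].Trim();
target._weight = defaultWeight;
}
if (parts.Length == 3)
{
// parse with the invariant culture so "0.5" reads the same under
// every server locale; an unparseable weight keeps the default
// (TryParse on its own would reset it to 0 and reject the value)
float weight;
if (float.TryParse(parts[2], System.Globalization.NumberStyles.Float,
System.Globalization.CultureInfo.InvariantCulture, out weight))
target._weight = weight;
}
}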
#endregion
#region IComparable<QValue> Members
/// <summary>
/// Compares this instance to another QValue by
/// comparing first weights, then ordinals.
/// </summary>
/// <param name="other">The QValue to compare</param>
/// <returns></returns>
public int CompareTo(QValue other)
{
int value = _weight.CompareTo(other._weight);
if (value == 0)
{
int ord = -_ordinal;
value = ord.CompareTo(-other._ordinal);
}
return value;
}
#endregion
#region CompareByWeight
/// <summary>
/// Compares two QValues in ascending order.
/// </summary>
/// <param name="x">The first QValue</param>
/// <param name="y">The second QValue</param>
/// <returns></returns>
public static int CompareByWeightAsc(QValue x, QValue y)
{
return x.CompareTo(y);
}
/// <summary>
/// Compares two QValues in descending order.
/// </summary>
/// <param name="x">The first QValue</param>
/// <param name="y">The second QValue</param>
/// <returns></returns>
public static int CompareByWeightDesc(QValue x, QValue y)
{
return -x.CompareTo(y);
}
#endregion
}
/// <summary>
/// Provides a collection for working with qvalue http headers
/// </summary>
/// <remarks>
/// accept-encoding spec:
/// http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
/// </remarks>
[DebuggerDisplay("QValueList[{Count}, {AcceptWildcard}]")]
public sealed class QValueList : List<QValue>
{
static char[] delimiters = { ',' };
#region Fields
bool _acceptWildcard;
bool _autoSort;
#endregion
#region Constructors
/// <summary>
/// Creates a new instance of a QValueList from
/// the given string of comma delimited values
/// </summary>
/// <param name="values">The raw string of qvalues to load</param>
public QValueList(string values)
: this(null == values ? new string[0] : values.Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
{ }
/// <summary>
/// Creates a new instance of a QValueList from
/// the given string array of qvalues
/// </summary>
/// <param name="values">The array of qvalue strings
/// i.e. name(;q=[0-9\.]+)?</param>
/// <remarks>
/// Should AcceptWildcard include */* as well?
/// What about other wildcard forms?
/// </remarks>
public QValueList(string[] values)
{
int ordinal = -1;
foreach (string value in values)
{
QValue qvalue = QValue.Parse(value.Trim(), ++ordinal);
if (qvalue.Name.Equals("*")) // wildcard
_acceptWildcard = qvalue.CanAccept;
Add(qvalue);
}
// this list should be sorted by weight for
// methods like FindPreferred to work correctly
DefaultSort();
_autoSort = true;
}
#endregion
#region Properties
/// <summary>
/// Whether or not the wildcarded encoding is available and allowed
/// </summary>
public bool AcceptWildcard
{
get { return _acceptWildcard; }
}
/// <summary>
/// Whether, after an add operation, the list should be resorted
/// </summary>
public bool AutoSort
{
get { return _autoSort; }
set { _autoSort = value; }
}
/// <summary>
/// Synonym for FindPreferred
/// </summary>
/// <param name="candidates">The preferred order in which to return an encoding</param>
/// <returns>A QValue based on weight, or an empty QValue if none matches</returns>
public QValue this[params string[] candidates]
{
get { return FindPreferred(candidates); }
}
#endregion
#region Add
/// <summary>
/// Adds an item to the list, then applies sorting
/// if AutoSort is enabled.
/// </summary>
/// <param name="item">The item to add</param>
public new void Add(QValue item)
{
base.Add(item);
applyAutoSort();
}
#endregion
#region AddRange
/// <summary>
/// Adds a range of items to the list, then applies sorting
/// if AutoSort is enabled.
/// </summary>
/// <param name="collection">The items to add</param>
public new void AddRange(IEnumerable<QValue> collection)
{
bool state = _autoSort;
_autoSort = false;
base.AddRange(collection);
_autoSort = state;
applyAutoSort();
}
#endregion
#region Find
/// <summary>
/// Finds the first QValue with the given name (case-insensitive)
/// </summary>
/// <param name="name">The name of the QValue to search for</param>
/// <returns></returns>
public QValue Find(string name)
{
Predicate<QValue> criteria = delegate(QValue item) { return item.Name.Equals(name, StringComparison.OrdinalIgnoreCase); };
return Find(criteria);
}
#endregion
#region FindHighestWeight
/// <summary>
/// Returns the first match found from the given candidates
/// </summary>
/// <param name="candidates">The list of QValue names to find</param>
/// <returns>The first QValue match to be found</returns>
/// <remarks>Loops from the first item in the list to the last and finds
/// the first candidate - the list must be sorted for weight prior to
/// calling this method.</remarks>
public QValue FindHighestWeight(params string[] candidates)
{
Predicate<QValue> criteria = delegate(QValue item)
{
return isCandidate(item.Name, candidates);
};
return Find(criteria);
}
#endregion
#region FindPreferred
/// <summary>
/// Returns the first match found from the given candidates that is accepted
/// </summary>
/// <param name="candidates">The list of names to find</param>
/// <returns>The first QValue match to be found</returns>
/// <remarks>Loops from the first item in the list to the last and finds the
/// first candidate that can be accepted - the list must be sorted for weight
/// prior to calling this method.</remarks>
public QValue FindPreferred(params string[] candidates)
{
Predicate<QValue> criteria = delegate(QValue item)
{
return isCandidate(item.Name, candidates) && item.CanAccept;
};
return Find(criteria);
}
#endregion
#region DefaultSort
/// <summary>
/// Sorts the list comparing by weight in
/// descending order
/// </summary>
public void DefaultSort()
{
Sort(QValue.CompareByWeightDesc);
}
#endregion
#region applyAutoSort
/// <summary>
/// Applies the default sorting method if
/// the autosort field is currently enabled
/// </summary>
void applyAutoSort()
{
if (_autoSort)
DefaultSort();
}
#endregion
#region isCandidate
/// <summary>
/// Determines if the given item is contained within the supplied array
/// (case-insensitive)
/// </summary>
/// <param name="item">The string to search for</param>
/// <param name="candidates">The array to search in</param>
/// <returns></returns>
static bool isCandidate(string item, params string[] candidates)
{
foreach (string candidate in candidates)
{
if (candidate.Equals(item, StringComparison.OrdinalIgnoreCase))
return true;
}
return false;
}
#endregion
}