Filtering HTTP Requests with .NET

by Ben Lowery
10/20/2003

Introduction

ASP.NET has a number of extensibility points that developers can use. One such point is response filtering, accessible via the Filter property of the HttpResponse class. Filters intercept content destined for the client and have an opportunity to modify that content prior to sending it out. Filters are unique in that they can access the raw byte stream that is going to be sent to the client. This article will show you how to create and install a set of simple filters and will expose some of the gotchas that accompany the technology.

I highly recommend you download the code examples for this article and play around with them.

Building Blocks

The examples and code for this article work with either the 1.0 or 1.1 versions of the .NET framework. This article assumes you are familiar with creating virtual directories and using custom HttpModules and HttpHandlers inside of an ASP.NET app. If you're not, it would be a good idea to read up on them before proceeding or installing the examples. The examples must be rooted in a web application to work. The easiest way I know of to do so is to use the "Sharing and Security" option available via right-clicking on the folder where the examples live. Inside of the "Web Sharing" tab, simply share the folder using some name. For this article, I'll name the virtual directory ourfirstfilter.

Additionally, the ASP.NET worker process will require write permissions to the example directory. The samples perform file-based logging to illustrate how filters and the processing pipeline interact; therefore, the ASP.NET process has to be able to write out the files. Giving write access to the ASP.NET worker process is not a fantastic idea; I would highly recommend that you lock down the virtual directory to authenticated users.

Lastly, a word on scope. ASP.NET response filters will only see content flowing through the ASP.NET processing pipeline. This may seem obvious, but it catches a lot of people off guard. By default, static HTML files, static images, PHP, JSP, or any other technology that does not use the ASP.NET processing model will not be filtered. If you'd like to filter static HTML content, you could map the .html and .htm extensions onto the ASP.NET ISAPI extension and configure ASP.NET appropriately, but this comes at some performance cost. Whether the cost is too great is something only your situation can decide.

What is Filtering?

So what is filtering and why would you want to use it? Filtering allows you to intercept the content being written back to the client and do interesting things with it. Our goal with any web application is to respond to an HTTP request. This request could be formulated using a number of different methods: a user with a browser, an automated script using a library, a user at a command prompt using curl, or any number of other techniques. ASP.NET exposes the request and response via two objects, HttpRequest and HttpResponse. You can find this pair in the System.Web namespace of the System.Web.dll assembly. The HttpResponse object exposes a Filter property. That one property is what the rest of this article will discuss.

First, why would you want to filter? Why not just bake your logic into the application some other way? Truthfully, filtering is hard, and I've only seen a few really useful things done with it. One is my own HttpCompressionModule. This is an HttpModule that adds standard compression to any ASP.NET app (and is fraught with its own perils).

Another case that some people seem to pursue involves string replacement. With a filter, you could parse the output and replace certain tokens with something else. For example, you could write a censoring filter that finds and removes naughty words from a user-supplied comment. This approach can work, but it too is fraught with peril.

A Basic Filter

A filter has to derive from System.IO.Stream and follow a policy of taking ownership of the current filter, storing it, and writing filtered content into the held original filter. Filters are write-only streams, so your Stream subclass should support writing, at the very least. The filtering model uses chaining to accomplish its work; you can easily install multiple filters and have each perform some work as long as you follow the model. Here's a short example of what I mean, along with some code to install the filter multiple times:

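// A simple filter that does the right thing
using System;
using System.IO;

public class AnyFilter : Stream {
  Stream originalStream;

  public AnyFilter(Stream originalFilter) {
    originalStream = originalFilter;
  }

  // Stream is abstract; a filter is write-only, so reading and seeking are unsupported
  public override bool CanRead { get { return false; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return true; } }
  public override long Length { get { throw new NotSupportedException(); } }
  public override long Position {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }
  public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
  public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
  public override void SetLength(long value) { throw new NotSupportedException(); }
  public override void Flush() { originalStream.Flush(); }

  public override void Write(byte[] buffer, int offset, int count) {
    // perform filtering here, then write the filtered content down the chain
    originalStream.Write(buffer, offset, count);
  }
}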


// Installing the filter more than once
using System;
using System.Web.UI;

public class MyPage : Page {

  protected override void OnInit(EventArgs args) {

    // install filter once
    AnyFilter f = new AnyFilter(Response.Filter);
    Response.Filter = f;

    // install filter twice! g now wraps f
    AnyFilter g = new AnyFilter(Response.Filter);
    Response.Filter = g;
    base.OnInit(args);
  }

}

As you can see in the code above, you have to access and store the Response.Filter property. The following code will also work, but it is not recommended:

protected override void OnInit(EventArgs args) {
  // wraps the raw output stream, ignoring any filter already installed
  AnyFilter f = new AnyFilter(Response.OutputStream);
  Response.Filter = f;
  base.OnInit(args);
}

This code does not respect any other filters that may already be in place. Therefore, it is preferable to use the Filter property over the OutputStream.
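
To see the chaining order in isolation, here is a minimal sketch that runs outside of ASP.NET, with a MemoryStream standing in for the real response stream. The MarkerFilter and ChainDemo classes and the marker strings are invented for this illustration; only the wrap-and-forward pattern comes from the discussion above.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical filter for illustration: prefixes every write with a marker,
// then forwards the original bytes to the stream it wrapped.
public class MarkerFilter : Stream {
  private readonly Stream inner;
  private readonly byte[] marker;

  public MarkerFilter(Stream inner, string marker) {
    this.inner = inner;
    this.marker = Encoding.UTF8.GetBytes(marker);
  }

  // Write-only stream: reading and seeking are not supported.
  public override bool CanRead { get { return false; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return true; } }
  public override long Length { get { throw new NotSupportedException(); } }
  public override long Position {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }
  public override int Read(byte[] buffer, int offset, int count) {
    throw new NotSupportedException();
  }
  public override long Seek(long offset, SeekOrigin origin) {
    throw new NotSupportedException();
  }
  public override void SetLength(long value) {
    throw new NotSupportedException();
  }
  public override void Flush() { inner.Flush(); }

  public override void Write(byte[] buffer, int offset, int count) {
    inner.Write(marker, 0, marker.Length);
    inner.Write(buffer, offset, count);
  }
}

public sealed class ChainDemo {
  // Builds the same shape of chain as the OnInit example: g wraps f.
  public static string Run() {
    MemoryStream sink = new MemoryStream();   // stands in for the response stream
    Stream filter = sink;
    filter = new MarkerFilter(filter, "[f]"); // installed first, closest to the sink
    filter = new MarkerFilter(filter, "[g]"); // installed last, sees writes first

    byte[] body = Encoding.UTF8.GetBytes("hello");
    filter.Write(body, 0, body.Length);
    return Encoding.UTF8.GetString(sink.ToArray());
  }
}
```

ChainDemo.Run() returns "[f][g][f]hello": each of the two writes made by the outer filter g passes through the inner filter f before reaching the sink. Whatever a later-installed filter emits becomes the input of the filters installed before it.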

I have provided an HttpFilter class in the example code that performs the necessary overrides and can act as a good starting point. Given HttpFilter, a basic do-nothing filter looks like this:


using System;
using System.IO;
using OurFirstFilter;


public class PassThroughFilter : HttpFilter {
  
  public PassThroughFilter(Stream baseStream) : base(baseStream) {}
		
  public override void Write(byte[] buffer, int offset, int count) {

    if(Closed) throw new ObjectDisposedException("PassThroughFilter");
    BaseStream.Write(buffer, offset, count);

  }
}

Given a good base class, implementing a filter can be pretty easy.

Installing a Filter

The ASP.NET folks have given us a variety of ways to plug the filter into the HTTP processing pipeline. Anywhere we can access ASP.NET's Response object, we can insert a filter. Keep in mind that the filter must be installed before any content is written back to the client, which means installing it before anything is flushed. That said, the three primary places we're going to install a filter are: 1) from within a custom HttpHandler, 2) from within a custom HttpModule, and 3) from within an ASP.NET Page object. I'll cover these in reverse order.

Installing a filter from within a Page object is great if you want to scope the filter to one page on your site. Remember, you have to install the filter before flushing any content back to the client. The best place to do this with a Page is inside of OnInit(). The code looks like this:

using System;
using System.Web.UI;

public class MyPage : Page {

  protected override void OnInit(EventArgs e) {
    // MyFilter is any Stream-derived filter, such as AnyFilter above
    Response.Filter = new MyFilter(Response.Filter);
    base.OnInit(e);
  }
  
}

Great! However, what if we want to apply a filter to an entire site?

HttpModules fit this niche nicely. An HttpModule can sink appropriate events from the HttpApplication and install the filter as needed. A common place to install the filter is in a handler for the BeginRequest event. This ensures that the filter is installed before any content is written to the client. By using an HttpModule, you can easily filter content without having to change your content-generating pages. Additionally, any other requests that flow through the ASP.NET HTTP pipeline will also use the filter. This includes web services and any custom HttpHandlers you may be using. Check out the FilteringModule in the downloadable example code for an example of HttpModule-level filtering. This is probably the most common way to install a filter.
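
As a rough sketch of the BeginRequest approach, assuming a pass-through filter in place of real filtering logic: the names SiteWideFilteringModule and NoOpFilter are made up for this illustration and are not the FilteringModule from the sample code. The block needs a reference to System.Web, so it only compiles inside an ASP.NET project.

```csharp
using System;
using System.IO;
using System.Web;

// Hypothetical do-nothing filter; real filtering logic would go in Write.
public class NoOpFilter : Stream {
  private readonly Stream inner;

  public NoOpFilter(Stream inner) { this.inner = inner; }

  public override bool CanRead { get { return false; } }
  public override bool CanSeek { get { return false; } }
  public override bool CanWrite { get { return true; } }
  public override long Length { get { throw new NotSupportedException(); } }
  public override long Position {
    get { throw new NotSupportedException(); }
    set { throw new NotSupportedException(); }
  }
  public override int Read(byte[] buffer, int offset, int count) {
    throw new NotSupportedException();
  }
  public override long Seek(long offset, SeekOrigin origin) {
    throw new NotSupportedException();
  }
  public override void SetLength(long value) {
    throw new NotSupportedException();
  }
  public override void Flush() { inner.Flush(); }

  public override void Write(byte[] buffer, int offset, int count) {
    inner.Write(buffer, offset, count);
  }
}

// Hypothetical module: installs the filter at the start of every request.
public class SiteWideFilteringModule : IHttpModule {

  public void Init(HttpApplication context) {
    context.BeginRequest += new EventHandler(OnBeginRequest);
  }

  public void Dispose() { }

  private void OnBeginRequest(object sender, EventArgs e) {
    HttpResponse response = ((HttpApplication)sender).Response;
    // wrap whatever filter is already in place so the chain stays intact
    response.Filter = new NoOpFilter(response.Filter);
  }
}
```

The module still has to be registered in the application's web.config, under the httpModules element, before ASP.NET will load it.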

HttpHandlers have the easiest time installing a filter. You can set it directly into the Response exposed by the HttpContext passed into the ProcessRequest method. This gives you the ability to scope the filter to one handler, instead of an entire site. This is very similar to Page-based installation, but you don't have to worry about the Page processing model. Realistically, you're probably not going to do this; you're already controlling the entire process of responding to an HTTP request and there are usually simpler methods to accomplish the same result. Of course, most of the examples in the provided code are written against custom HttpModules.

Considerations

So far, filtering is sounding pretty nice. You can easily install a filter, it doesn't require any setup inside of IIS, and it can fit nicely into an xcopy-based deployment. In many ways, filtering inside of ASP.NET is a great alternative to writing an ISAPI filter. But as with any technology, there are drawbacks. With filters, the drawbacks center on state management, inefficiency, and a number of strange interactions with other parts of the ASP.NET framework.

Simply put, writing a good filter can be very hard. Take a look at the Catch22Filter in the provided sample. It's a pretty simple replacement filter. This filter looks for words in the output buffer and wraps them in some nice HTML that makes them appear as censored text in the final output. Hit censor.ashx and give it a shot. Notice that with the default word list, not only are the words in the body of the document censored, the words in the title, and possibly inside of tags, are also censored. To properly filter an HTML document, you have to keep track of where you are in the document and only apply the filter at the appropriate time. This may not seem too hard, but remember two things: 1) that you have to take into account different text encodings, and 2) looking at each character imposes a performance penalty, especially in terms of working set.
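
To give a feel for the bookkeeping involved, here is a deliberately simplified sketch of tracking position across Write calls. TagTracker is a hypothetical helper, not part of the sample code. It assumes an ASCII-compatible encoding such as UTF-8, where '<' and '>' are always single bytes, and it ignores comments, script blocks, and attribute values that contain '>'.

```csharp
// Hypothetical helper: remembers between Write calls whether the
// stream is currently inside a tag, since a tag can span two buffers.
public class TagTracker {
  private bool insideTag = false;

  // Returns how many bytes in this chunk fall outside of <...> markup.
  public int CountTextBytes(byte[] buffer, int offset, int count) {
    int textBytes = 0;
    for (int i = offset; i < offset + count; i++) {
      byte b = buffer[i];
      if (b == (byte)'<') {
        insideTag = true;
      } else if (b == (byte)'>' && insideTag) {
        insideTag = false;
      } else if (!insideTag) {
        textBytes++;
      }
    }
    return textBytes;
  }
}
```

A filter would hold one such tracker per response and consult it inside Write, so that a tag split across two buffers is still recognized as markup. Even this toy version has to carry state between calls, which is where much of the difficulty comes from.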

You may not realize it, but the byte[] being passed into the Write(byte[] buffer, int offset, int count) override has already been encoded into the target text encoding. That's right, it's already UTF-8 with the default ASP.NET install. If you're looking for any kind of token in the output stream, you have to encode the token using the same encoding and search for its byte sequence. You could convert the entire buffer back into a string, but that can create a large amount of garbage and negatively impact performance and working set. Be sure to profile your memory usage if you take this route. Additionally, for situations like the Catch22 filter where you're trying to find given words, you have to fully understand the character comparison rules for your target language. In the face of internationalization, it can get rather messy.
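
The byte-level search can be sketched as follows. ByteTokenFinder is a hypothetical helper written for this illustration, not the approach taken by the Catch22Filter. It does an exact, case-sensitive byte match and makes no attempt to handle a token that straddles two Write calls.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the article's sample code.
public class ByteTokenFinder {
  private readonly byte[] tokenBytes;

  // Encode the token once, using the same encoding as the response.
  public ByteTokenFinder(string token, Encoding responseEncoding) {
    tokenBytes = responseEncoding.GetBytes(token);
  }

  // Returns the buffer index of the first match within
  // [offset, offset + count), or -1 if the token is not found.
  public int IndexIn(byte[] buffer, int offset, int count) {
    if (tokenBytes.Length == 0) return -1;
    int last = offset + count - tokenBytes.Length;
    for (int i = offset; i <= last; i++) {
      int j = 0;
      while (j < tokenBytes.Length && buffer[i + j] == tokenBytes[j]) j++;
      if (j == tokenBytes.Length) return i;
    }
    return -1;
  }
}
```

Searching the UTF-8 bytes of "a na\u00EFve caf\u00E9" for "caf\u00E9" returns 9, whereas String.IndexOf on the same text returns 8, because the accented character before the match occupies two bytes. Inside a real filter, the encoding to pass in would presumably be the response's own, available as Response.ContentEncoding.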

Filtering sometimes feels more like a tacked-on feature than a first-class citizen in the processing pipeline. Filtering doesn't work if your code calls the End method on the HttpResponse, either directly or indirectly. When End is called, the HTTP pipeline bypasses the filter and sends whatever is currently in the real output buffer to the client. Unfortunately, Server.Transfer calls End as part of its processing. Therefore, if you make a call to Server.Transfer, your filter is going to be bypassed, and any work you expected it to perform will be skipped. To prove the point, hit the transfer.ashx HttpHandler in the provided samples. If you take a look at the log file, you'll see that a TracingFilter is constructed, but then nothing further happens. The writes by the transferred-to page are not passed into the filter, and the document escapes uncensored. A workaround is to use Server.Execute instead of Server.Transfer. The semantics are a bit different, as Server.Execute returns execution to the original page after the called page is executed, but it does work with filters.

The biggest problem with using an HttpModule is figuring out where to add yourself into the processing pipeline. There are a number of events you could sink, but which ones really make sense? For example, what if you want to only filter HTML content? To do this, you would probably check the ContentType header and then decide whether or not to install the filter. From looking at the docs, it would seem to make sense to sink the ReleaseRequestState event, check the ContentType, and then install the filter if we have a ContentType we like. The code looks something like this:

using System;
using System.Web;

namespace OurFirstFilter {
  
  public class HtmlOnlyFilteringModule : IHttpModule {
    
    public void Init(HttpApplication context) {
      context.ReleaseRequestState += new EventHandler(context_ReleaseRequestState);
    }

    public void Dispose() { }

    private void context_ReleaseRequestState(object sender, EventArgs e) {
      HttpResponse response = HttpContext.Current.Response;

      if(response.ContentType == "text/html") {
        response.Filter = new TimestampFilter(response.Filter);     
      }
    }
  }
}

This works wonderfully until you call the Flush method on the HttpResponse object from within your page. When you flush, content is written to the client before ReleaseRequestState is fired, so the filter is installed after some content has been written. To fix this, we need to also sink PreSendRequestHeaders (which should be called PreSendResponseHeaders) and remember whether the filter has already been installed. Here's the fixed code:

using System;
using System.Web;

namespace OurFirstFilter {

  public class HtmlOnlyFilteringModule : IHttpModule {

    const string CONTEXT_KEY = "HttpOnlyFilteringModuleInstalled";

    public void Init(HttpApplication context) {
      context.ReleaseRequestState += new EventHandler(HandleOpportunityToInstall);
      // have to sink PreSendRequestHeaders too to handle calls to Flush
      // comment this next line out and hit Flusher.ashx to see it fail
      context.PreSendRequestHeaders += new EventHandler(HandleOpportunityToInstall);
    }

    public void Dispose() { }

    private void HandleOpportunityToInstall(object sender, EventArgs e) {

      if(!HttpContext.Current.Items.Contains(CONTEXT_KEY)) {
      
        HttpResponse response = HttpContext.Current.Response;

        if(response.ContentType == "text/html") {
          response.Filter = new TimestampFilter(response.Filter);      
        }

        HttpContext.Current.Items.Add(CONTEXT_KEY, new object());
      }            
    }
  }
}

Notice that we remember if the filter has been installed by setting an item in the HttpContext's Items collection. We can't reliably use a member variable, because one instance of the HttpModule is reused among all requests. We really need to scope the "installed" memento to the current request, and the HttpContext is a great way to do that.

Alternatives

If the limited abilities of ASP.NET's HttpResponse filter don't meet your needs, there are a few other options. If you have your heart set on filtering the output of the ASP.NET HTTP pipeline, your only real option is to write an ISAPI filter. The ISAPI filter can do anything you desire and it is completely decoupled from ASP.NET. An ISAPI filter would work for any content flowing through IIS, including normal HTML files, PHP, JSP, and any other technology that can sit inside IIS. The downside is that you have to write an ISAPI filter, no small task.

Another alternative is to redesign your application to include the functionality you're trying to provide from within the existing ASP.NET framework. Be sure you understand all of the extensibility points of the framework before you latch onto the response filter as a golden hammer. There are often better ways to include common functionality. Check out user controls, the Regex classes, and custom HttpHandlers to make sure you can't accomplish your goal some other way. This is a bit of a cop-out, as you're not really filtering, you're refactoring.

Ben Lowery is a developer at FactSet Research Systems, where he works on all things great and small.

