Using NDoc: Adding World-Class Documentation to Your .NET Components

by Shawn Van Ness

I've never been a big fan of source-code-based documentation generators -- tools that attempt to produce reference documentation by mining specially-formatted comments out of source code. At least, I've not been a big fan of any of the popular ones that have existed for C++ and Java -- and certainly not for any purpose greater than supplementing an internal spec doc. The concept is clearly of great value: by scanning the source code, the doc-generator can alert the author to any code items that are missing documentation. But historically, in this developer's opinion, these tools have suffered from too many problems to be truly useful -- from buggy parsers to a wide range of usability issues, I've just never met a doc-generator that was worth the effort.

In fact, I so despised the lot of them that I once took a crack at writing one myself. But the rumors are true -- I've recently met a new documentation generator, and I've fallen in love. Its name is NDoc, and I do believe it loves me too.

NDoc is not Just Another Source-Code-Based Document Generator

Strictly speaking, NDoc isn't really a source-code-based documentation generator at all. It's simply a very fancy set of XSLT templates that consume the XML doc-comments emitted by the C# compiler, and emit a folder full of HTML files (in a style very reminiscent of the .NET docs on MSDN) and associated manifests for building compiled HtmlHelp modules, or CHM files. All this is wrapped by a nice GUI and command-line interface, of course.

What's the problem with source-code-based documentation generators? Historically, tools that mean to generate docs based on source code face two fundamental problems. The first problem is that they rely on third-party parsers, which have a tendency to be buggy. Let's face it, writing a parser for languages like C++ and IDL is a serious challenge. Modern languages like Java and C#, perhaps less so -- but still it's a non-trivial exercise, and there are bound to be bugs, oversights, and omissions. Sometimes it's something complex, like a nested class with a static constructor, other times it's something simple like a variable name containing an "ü" character. But there always seems to be some obscure corner case in the language's grammar that doesn't play well with the documentation generator's parser.

The other problem -- suffered by virtually all traditional documentation generators -- is that they tend to swamp one's code with "green". That is, if you actually want to write more than a sentence or two for a class or method (heaven forbid!), your actual code will become lost in a sea of doc-comments.

Furthermore, any serious development project with external documentation deliverables will, at some point, probably want to hand off the reference documentation to a team of technical writers. But nobody wants a horde of tech writers stomping all over their product's source code control system! What to do? There's just no easy way to separate doc-comments from source -- the two are inextricably linked (which, arguably, is the point of having a source-code-based documentation generator in the first place).

And all of this is to say nothing of the fact that source code text editors do not typically offer a very user-friendly interface for writing and maintaining English-language documentation.

I attempted to solve this second problem when I created DocGen (see References) by keeping the English-language documentation in a separate XML file, and "weaving" it into the code structure parsed from an IDL file. The technique worked well -- the XML file itself could be handed off to a technical writer, even checked out separately so as not to interfere with the development effort proper. And yet, the tool could still alert us to missing documentation, as it attempted to match the contents of the IDL file to elements in the XML file. Unfortunately, DocGen failed to rise above the first problem (IDL is a very ugly language, and our parser was a veritable nest of bugs). But the premise was sound.

NDoc (and .NET) to the Rescue!

Addressing the first problem, NDoc doesn't rely on a buggy C# parser. It doesn't have to scan your C# code at all -- it uses .NET reflection to peer inside of your compiled assemblies' metadata tables, which contain the name of every class, structure, method, event, and so forth, in your entire project. Unlike type libraries in the days of COM, the type information in a .NET assembly is complete and true, with nothing left to the imagination -- perfectly suited for generating full-fidelity documentation!
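To make the reflection idea concrete, here is a minimal sketch of walking an assembly's metadata. This is not NDoc's actual code; the MetadataWalker class and its Dump method are names I've made up for illustration, and the sketch only lists public types and the public members declared directly on them.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, for illustration only -- not part of NDoc.
public static class MetadataWalker
{
    // Prints every public type in the assembly, followed by the public
    // members declared directly on it (inherited members are skipped).
    // Returns the number of members printed.
    public static int Dump(Assembly assembly)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                                   BindingFlags.Static | BindingFlags.DeclaredOnly;
        int count = 0;
        foreach (Type type in assembly.GetExportedTypes())
        {
            Console.WriteLine(type.FullName);
            foreach (MemberInfo member in type.GetMembers(flags))
            {
                Console.WriteLine("  {0}: {1}", member.MemberType, member.Name);
                count++;
            }
        }
        return count;
    }

    public static void Main(string[] args)
    {
        // With no argument, inspect this program's own assembly.
        Assembly assembly = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : typeof(MetadataWalker).Assembly;
        Dump(assembly);
    }
}
```

Pass the path of a compiled assembly as the first argument to list its contents. Note that no source file is ever opened; everything comes from the compiled metadata.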

Addressing the second problem, NDoc doesn't force you to drown your source code in a sea of doc comments -- it fetches documentation text from an XML file. NDoc matches documentation from this XML file against type information from a .NET assembly to produce a folder full of HTML files (and optionally, a compiled CHM file). Although many programmers out there do choose to use C#'s doc-comment feature to generate this XML file, it's not strictly mandatory. And, as we'll see later, there are ways to point the C# compiler to an external XML file, as well.

Now how much would you pay? Nothing? That's fine, because NDoc is freely available on SourceForge!

A Quick Overview of C#'s XML Documentation Feature

The C# compiler includes a command-line switch (/doc:<file>) that instructs the compiler to emit an XML file containing docs mined from the C# source code. Such docs are marked up as XML, embedded within line comments that begin with a triple forward-slash sequence, like so:

using System;

namespace Arithex.Samples
{
  /// <summary>
  /// This is a summary of the Foo class.
  /// </summary>
  public class Foo
  {
    /// <summary>
    /// This is a summary of the Bar method.
    /// </summary>
    public void Bar(int x, int y)
    { }
  }
}

The XML produced for the above C# code looks something like this:

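<?xml version="1.0"?>
<doc>
  <members>
    <member name="T:Arithex.Samples.Foo">
      <summary>
      This is a summary of the Foo class.
      </summary>
    </member>
    <member name="M:Arithex.Samples.Foo.Bar(System.Int32,System.Int32)">
      <summary>
      This is a summary of the Bar method.
      </summary>
    </member>
  </members>
</doc>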

Now, this XML "schema" is a bit obtuse -- it models pretty much everything as a <member> element, with a single attribute, name, that indicates whether the item is a type, a method, a property, an event, or a field, via a single-letter prefix. The details (such as the types of a method's parameters) are also encoded into the name attribute in a crufty text format. It's ugly, but I suppose it gets the job done -- NDoc can consume it, and that's all that matters! For a thorough tour of C#'s doc comment feature and the XML it generates, I refer you to the MSDN links in the References section of this article. However, as a convenient reference, I've created a mapping of which XML doc-comment elements apply to which aspects of the C# language, and vice versa, in Tables 1 and 2.
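As a rough illustration of how a tool might match reflected type information against these name attributes, here is a small sketch. It is my own simplification, not NDoc's implementation: the DocIds class and its methods are hypothetical names, the XML is an inline stand-in for a real /doc output file, and the ID builder ignores nested types, generics, arrays, and ref/out parameters.

```csharp
using System;
using System.Reflection;
using System.Text;
using System.Xml;

namespace Arithex.Samples
{
    public class Foo
    {
        public void Bar(int x, int y) { }
    }
}

// Hypothetical helper, for illustration only.
public static class DocIds
{
    // "T:" followed by the type's full name.
    public static string ForType(Type type)
    {
        return "T:" + type.FullName;
    }

    // "M:" + declaring type + "." + method name, then the parameter types
    // in parentheses (omitted entirely when there are no parameters).
    public static string ForMethod(MethodInfo method)
    {
        StringBuilder id = new StringBuilder("M:");
        id.Append(method.DeclaringType.FullName).Append('.').Append(method.Name);
        ParameterInfo[] parameters = method.GetParameters();
        if (parameters.Length > 0)
        {
            id.Append('(');
            for (int i = 0; i < parameters.Length; i++)
            {
                if (i > 0) id.Append(',');
                id.Append(parameters[i].ParameterType.FullName);
            }
            id.Append(')');
        }
        return id.ToString();
    }

    // True if the doc file has a <member> element with this name attribute.
    public static bool IsDocumented(XmlDocument docs, string id)
    {
        return docs.SelectSingleNode("/doc/members/member[@name='" + id + "']") != null;
    }

    public static void Main()
    {
        // Stand-in for a compiler-generated file that documents Foo but not Bar.
        XmlDocument docs = new XmlDocument();
        docs.LoadXml(
            "<doc><members>" +
            "<member name=\"T:Arithex.Samples.Foo\"><summary>...</summary></member>" +
            "</members></doc>");

        Type foo = typeof(Arithex.Samples.Foo);
        string[] ids = { ForType(foo), ForMethod(foo.GetMethod("Bar")) };
        foreach (string id in ids)
        {
            Console.WriteLine("{0} -> {1}", id,
                IsDocumented(docs, id) ? "documented" : "MISSING");
        }
        // Prints:
        // T:Arithex.Samples.Foo -> documented
        // M:Arithex.Samples.Foo.Bar(System.Int32,System.Int32) -> MISSING
    }
}
```

This is the same trick that lets a tool warn about missing documentation: anything found by reflection whose ID has no matching member element is undocumented.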

Table 1: XML doc comment elements vs. C# language elements

XML doc element  C# language element
<c>              n/a -- used to mark up code, inline
<code>           n/a -- used to mark up blocks of code, as in an example
<example>        n/a -- used to declare a block of example code
<exception>      constructors, methods, properties -- used to declare the exceptions your code might throw
<include>        n/a -- used to import some XML docs from an external XML file
<list>           n/a -- used to mark up a list of items
<para>           n/a -- used to separate paragraphs
<param>          methods, constructors, delegates -- used to provide documentation for individual parameters
<paramref>       n/a -- used to mark up a reference to a method parameter
<permission>     methods, constructors, properties -- used to declare which .NET permissions are needed in order to call a method
<remarks>        classes, structs, interfaces, delegates, enums, methods, constructors, properties, events, fields -- this is the "long" description of a C# code element
<returns>        methods, delegates
<see>            n/a -- used to mark up a link to another documentation item, inline
<seealso>        anything -- used to provide a manifest of related documentation items
<summary>        classes, structs, interfaces, delegates, enums, methods, constructors, properties, events, fields -- this is the "short" description of a C# code element
<value>          properties

Table 2: C# language elements vs. XML doc-comment elements

C# language element  XML doc elements
class                <summary>, <remarks>, <seealso>*
struct               <summary>, <remarks>, <seealso>*
interface            <summary>, <remarks>, <seealso>*
delegate             <summary>, <remarks>, <seealso>*, <param>*, <returns>
enum                 <summary>, <remarks>, <seealso>*
constructor          <summary>, <remarks>, <seealso>*, <param>*, <permission>*, <exception>*
property             <summary>, <remarks>, <seealso>*, <value>, <permission>*, <exception>*
method               <summary>, <remarks>, <seealso>*, <param>*, <returns>, <permission>*, <exception>*
event                <summary>, <remarks>, <seealso>*

*may have multiple instances of this element
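To show several of these elements working together, here is a sketch of a fully documented method. The Calculator class and its Divide method are invented for this example; only the doc-comment elements themselves come from the tables above.

```csharp
using System;

namespace Arithex.Samples
{
    /// <summary>Simple integer arithmetic helpers (hypothetical example class).</summary>
    public static class Calculator
    {
        /// <summary>
        /// Divides one integer by another.
        /// </summary>
        /// <remarks>
        /// <para>The result is truncated toward zero, as with the C# <c>/</c> operator.</para>
        /// <para>See <see cref="System.Math"/> for other numeric helpers.</para>
        /// </remarks>
        /// <param name="dividend">The number to be divided.</param>
        /// <param name="divisor">The number to divide by.</param>
        /// <returns>
        /// The quotient of <paramref name="dividend"/> and <paramref name="divisor"/>.
        /// </returns>
        /// <exception cref="System.DivideByZeroException">
        /// Thrown when <paramref name="divisor"/> is zero.
        /// </exception>
        /// <seealso cref="System.Math"/>
        public static int Divide(int dividend, int divisor)
        {
            return dividend / divisor;
        }

        public static void Main()
        {
            Console.WriteLine(Divide(7, 2));   // prints 3
        }
    }
}
```

Compiling this file with the /doc switch described earlier should produce one member element for the class and one for each method, each carrying the child elements written above.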

