An Architectural Tour of Rotor
by David Stutz
The Microsoft Shared Source CLI Implementation (aka "Rotor") is a source code distribution that includes fully functional implementations of both the ECMA-334 C# language standard and the ECMA-335 Common Language Infrastructure standard. These standards together represent a substantial subset of what is available in the Microsoft .NET Framework. The source code will build and run under Windows XP or FreeBSD 4.5, and the distribution contains numerous additional goodies, including a JScript compiler written entirely in C#, an IL assembler, a disassembler, a debugger, tools for examining metadata, and other samples and utilities. To complement this article, we've also published "Get Your Rotor Running", which takes you through the steps of installing, building and running Rotor.
Downloading and expanding the Rotor tarball reveals a bewildering collection of scripts, license files, specifications, and subdirectories jammed full of mysterious source code. The READFIRST file may also seem daunting, as it refers to the distribution as "experimental" and "beta-quality code with known defects..." The prospective Rotor enthusiast may have several reservations at this point: "Hmmm, is this ready for prime time? What are the most interesting directories to browse?" This article will strive to answer these questions, dispel any such reservations, and help with the most interesting question of all: "How do you make Rotor build and run?"
A View from the Top (the sscli directory)
Before doing anything else, it is a good idea to take a look at the license for Rotor, which you will find in the root of the distribution in a file named license.txt. Rotor is liberally licensed for non-commercial use -- you are free to make modifications and to share these with other folks, for example -- but it is licensed. Before browsing the code, as with any source-code distribution, you'll want to ensure that you're comfortable with the terms that come attached to that code!
There are four interesting "districts" in the Rotor code:
- A full-featured runtime execution engine
- Frameworks that expose this runtime to programmers
- Compilers and other tools that target the runtime
- A portability layer, copious tests, and related build utilities
To download Rotor, and for support and license information, visit Microsoft's Shared Source CLI Beta page.
These four areas are spread, scattershot, across the source tree. As with any project of this scope, both history and build dependencies have conspired to make navigation less than perfect. Fortunately, there is documentation to help. Whether you want to learn, to tinker, or to experiment with Rotor's infrastructure, you are likely to find that the files in the docs directory make a valuable first stop (along with the file named readfirst.html found in the root of the distribution).
Execution Engine (the clr/src directory and environs)
Conceptually, the execution engine is the heart and soul of the CLI runtime, and as such, it contains a large quantity of fascinating code. Compilers and tools that target this engine, including the C# and JScript compilers that come as a part of Rotor, create and manipulate executable files that contain metadata tables, resource blobs, and code in the form of abstract CIL opcodes. (CIL, the Common Intermediate Language, is an intermediate representation of program instructions that can be shared by tools targeting the CLI.) Executables of this form are commonly referred to as "managed executables," and the code contained in them, when running under the control of the execution engine, is called "managed code."
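To make the idea of "abstract CIL opcodes" concrete, here is a minimal sketch of a stack evaluator for a handful of them. The opcode names (ldarg, ldc.i4, add, mul, ret) are real CIL instructions, but everything else is an assumption made for illustration: the `run_cil` function, the tuple encoding, and the lack of 32-bit overflow handling are not taken from Rotor, which compiles CIL to native code rather than interpreting it.

```python
# Toy evaluator for a few CIL-style stack opcodes. Illustration only:
# the encoding and the function name are hypothetical, not Rotor's.

def run_cil(instructions, args=()):
    """Evaluate a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for opcode, operand in instructions:
        if opcode == "ldc.i4":      # push an integer constant
            stack.append(operand)
        elif opcode == "ldarg":     # push the argument at the given index
            stack.append(args[operand])
        elif opcode == "add":       # pop two values, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "mul":       # pop two values, push their product
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "ret":       # return the value on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    raise ValueError("method body ended without ret")


# (x + 2) * 3, written as a compiler might plausibly emit it
body = [
    ("ldarg", 0),
    ("ldc.i4", 2),
    ("add", None),
    ("ldc.i4", 3),
    ("mul", None),
    ("ret", None),
]
result = run_cil(body, args=(5,))
```

The point of the sketch is that the method body names no registers and no machine instructions; it only pushes and pops values, which is what lets many different tools share the same representation.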
Dave Stutz is coauthoring O'Reilly's upcoming book on Rotor, Shared Source CLI Essentials. This book will provide a roadmap for anyone trying to navigate or manipulate the Shared Source CLI code, and will include a CD-ROM that contains all the source code and files.
The loading of a managed program is a miracle of self-assembly, during which those inert blobs of metadata, resources, and CIL are transformed into instructions executing directly on the microprocessor. The vm subdirectory contains the core of the life-support system for managed components that accomplishes this transformation, including the CLI's sophisticated automatic heap and stack management, its object-capable type system, and its mechanisms for safely loading code dynamically. The fusion and md (metadata) subdirectories are also important parts of this data-driven process: fusion has the code for resolving references to external types, and md has the metadata manipulation and validation code.
One instructive way to find your way through the execution engine is to take a very simple managed program and trace its execution in your debugger of choice. By doing this, you can see the initial load sequence used, and familiarize yourself with the C++ classes used to implement the execution engine itself. MethodTables, and finally, the various classes that represent managed objects directly, are all worth browsing.
fx, managedlibraries, bcl, and
In addition to the exotic machinery of managed code, you'll also find more familiar programming support infrastructure in the Rotor CLI, wrapped up as a set of class frameworks. The specification for these frameworks is part of ECMA-335, and includes a "base class library" (commonly referred to as "the BCL"), runtime infrastructure and reflection classes, networking and XML classes, and floating point and extended array libraries. All of these are in Rotor, in source code form. There are also a few additional libraries included in this distribution, most notably support for regular expressions and an extensive framework for type serialization, object remoting, and automatic type marshaling.
Unlike some virtualized execution platforms, the CLI was never designed to obscure the details of whatever system runs beneath it. Much as the implementers of the C programming language took a minimalist approach to exposing portable runtime services, the CLI likewise tries to provide only what is absolutely necessary. Of course, in 2002, the list of services that programmers take for granted is quite a bit larger than it was in the early 1970s: verified typesafety, support for Web services, and support for interop between managed and unmanaged code all fall on the list of today's "minimal subset."
Compilers and Tools (
One of the unique features of the CLI is the depth to which components written in many different languages can share their representation and runtime behavior. This seamless interoperability is one of the most compelling reasons to use the services of the CLI; component builders can exploit the unique characteristics of the platform on which their components are running, while still enjoying the benefits of shared infrastructure. Furthermore, tools written against the CLI will automatically complement pre-existing tools, languages, and runtimes because of the CLI's built-in capability for interoperation. To see this aspect of the CLI in action, examine the implementations of language compilers and tools found in the Rotor distribution.
The JScript compiler, found in the jscript directory, is completely written in C#. The language itself is quite interesting from the perspective of a compiler writer, since it supports dynamic reshaping of classes, as well as the runtime evaluation of arbitrary fragments of code. For those who would like to implement dynamic languages such as Python or Scheme on top of the CLI runtime, or understand how this could be done, this code will prove instructive. In particular, note the heavy use of runtime reflection and the dynamic emission of metadata.
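For readers who have not worked with a dynamic language, the following sketch shows the two features mentioned above, reshaping a class at runtime and evaluating a code fragment at runtime, along with a small piece of reflection. It is written in Python only because Python is one of the dynamic languages named above; the `Point` class and `norm_squared` method are hypothetical, and nothing here is taken from the JScript compiler's sources.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 4)


# Reshape the class after an instance already exists:
# attach a new method at runtime.
def norm_squared(self):
    return self.x * self.x + self.y * self.y


setattr(Point, "norm_squared", norm_squared)

# Evaluate a code fragment that is only known at runtime.
fragment = "p.norm_squared() + 1"
value = eval(fragment, {"p": p})

# Reflection: discover a member by name rather than by static declaration.
has_method = hasattr(p, "norm_squared")
```

A compiler targeting the CLI cannot resolve a call such as `p.norm_squared()` when it compiles the program, because the member may not exist yet. That is one reason to expect runtime reflection and dynamic emission of metadata in an implementation of such a language.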
In order to build the sources for the JScript compiler, there must be a C# compiler in the Rotor distribution. Not surprisingly, there is, and it can be found in the clr/src/csharp directory. C# is a new language that has been developed in parallel with the CLI to highlight the features of its environment. Not only does C# reflect these capabilities, but the language was also standardized by the same ECMA technical group that worked on the CLI. The Rotor C# implementation should be a useful guide to anyone building their own C# compiler and/or frameworks.
Besides the C# compiler, the sources to a managed code debugger, an assembler and disassembler (ILASM and ILDASM), an assembly linker, and a stand-alone verification tool reside in subdirectories of clr/src. These tools will be indispensable as you look through Rotor and work with the code; they will serve as both implementation examples and everyday programming tools.
The final compiler to point out during this leg of the tour is the combination JIT compiler and verifier that lives in the clr/src/fjit directory. Large parts of Rotor are written in C#, and because the Rotor C# compiler outputs CIL opcodes rather than native code, those parts are compiled by the JIT compiler when the types are loaded, at the last possible moment. The "just-in-time" approach to loading allows for an executable format that is portable from platform to platform -- those of you with both FreeBSD and Windows can compile a C# program on one OS and then run it on the other. Of course, in this case, both systems are built for x86 microprocessors, and so the same low-level instruction set can be used without changes. What is much more interesting is that the Rotor JIT has been designed to be portable to other microprocessors. Because its design is simple, it is easy to build new versions of the compiler that target other chips, which is a fine way to move on to the topic of portability ...
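The "last possible moment" idea can be sketched in a few lines. In this illustration, which is an assumption-laden stand-in rather than a description of the code in clr/src/fjit, a method is stored in portable form and compiled only when it is first called; later calls reuse the compiled result. The `LazyMethod` class is hypothetical, and Python's built-in `compile()` plays the role of native code generation.

```python
class LazyMethod:
    """Holds a method body in portable form and compiles it on first call."""

    def __init__(self, name, source):
        self.name = name
        self.source = source      # stand-in for a CIL method body
        self.compiled = None      # filled in on the first call
        self.compile_count = 0

    def _jit(self):
        # Stand-in for native code generation: compile() turns the
        # expression text into a code object that can be run repeatedly.
        self.compile_count += 1
        code = compile(self.source, f"<{self.name}>", "eval")
        return lambda **args: eval(code, {}, args)

    def __call__(self, **args):
        if self.compiled is None:   # first call: compile at the last moment
            self.compiled = self._jit()
        return self.compiled(**args)


square_plus_one = LazyMethod("square_plus_one", "x * x + 1")
first = square_plus_one(x=4)    # triggers compilation
second = square_plus_one(x=5)   # reuses the compiled code
```

Because only the `_jit` step knows anything about the target, the stored form of the method stays the same from platform to platform, which is the property that makes the executable format portable.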
The Platform Adaptation Layer and Build (pal, palrt, tests and
Our final stop on this whirlwind tour is Rotor's portability layer, tests, and build tools, which together enable moving this distribution to alternate platforms. It is relatively simple to target new microprocessor architectures in Rotor's JIT, but remember, there is more to a platform than just the instruction set on which it relies. The native operating system APIs and the toolchain through which they are programmed also present large porting issues! In order to facilitate moving Rotor's codebase, which was originally written for Windows, our development team adopted a very common porting strategy: the use of a "Platform Adaptation Layer" (PAL).
The code for the FreeBSD PAL, which is found in the pal/unix directory, is well worth a peek. This code was written to conform with a subset of the Win32 API, outlined in docs/techinfo/pal_guide.html and implemented in pal/rotor_pal.h. By mimicking Win32's semantics for structured exception handling, threading, synchronization primitives, file and network I/O, debugger support, and other similar system-level services, porting the Rotor codebase became more of an exercise in finding PAL bugs than in re-implementing existing code. Furthermore, anyone who wishes to move Rotor to new platforms should find repeating the same exercise straightforward. There is a small amount of platform-specific assembler code in the execution engine, but besides this, the PAL and JIT make up the bulk of porting work.
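The general shape of an adaptation layer can be sketched as follows. The callers are written against handle-based calls that report failure through return values, in the style of Win32, while the layer underneath maps those calls onto POSIX-style file descriptors. All names and constants here (`pal_create_file`, `GENERIC_READ`, and so on) are hypothetical and simplified for illustration; they are not the names or signatures found in rotor_pal.h.

```python
import os
import tempfile

# Hypothetical constants, loosely modeled on the shape of Win32 file calls.
GENERIC_READ = 0x1
GENERIC_WRITE = 0x2
INVALID_HANDLE = -1


def pal_create_file(path, access):
    """Open a file; return a handle, or INVALID_HANDLE on failure."""
    if access == GENERIC_READ:
        flags = os.O_RDONLY
    elif access == GENERIC_WRITE:
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    else:
        flags = os.O_RDWR | os.O_CREAT
    try:
        return os.open(path, flags, 0o644)
    except OSError:
        return INVALID_HANDLE   # callers test a sentinel, not an exception


def pal_write_file(handle, data):
    """Write bytes; return (ok, bytes_written)."""
    try:
        return True, os.write(handle, data)
    except OSError:
        return False, 0


def pal_read_file(handle, size):
    """Read up to size bytes; return (ok, data)."""
    try:
        return True, os.read(handle, size)
    except OSError:
        return False, b""


def pal_close_handle(handle):
    """Close a handle; return True on success."""
    try:
        os.close(handle)
        return True
    except OSError:
        return False


# Code above the layer never touches the os module directly.
path = os.path.join(tempfile.mkdtemp(), "pal_demo.txt")
h = pal_create_file(path, GENERIC_WRITE)
ok, written = pal_write_file(h, b"hello")
pal_close_handle(h)

h = pal_create_file(path, GENERIC_READ)
ok_read, data = pal_read_file(h, 16)
pal_close_handle(h)

# Opening a path beneath a regular file fails and yields the sentinel.
missing = pal_create_file(os.path.join(path, "nope"), GENERIC_READ)
```

The design choice worth noticing is that the error convention belongs to the layer, not to the host system: a port to a new platform replaces the bodies of these functions while the code that calls them stays unchanged.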
The bootstrap sequence for the Rotor build process is interesting, and shows that the PAL is important for more than just the CLI runtime itself. The first thing to build on any platform is the PAL itself; obviously, this must be done using native libraries and tools. After the PAL has been successfully built, Rotor's own build tools are then compiled against the PAL, after which the CLI C# compiler can be built. By this point, we have a working C# compiler, and so the large number of C# files that are a part of this distribution can be compiled. And since C# uses the managed execution environment, the last step of the build process actually occurs when you run any of the programs that contain managed code -- the JIT compiler is invoked on your behalf!
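The ordering described above is a dependency chain, and it can be written down as one. The sketch below uses Python's standard `graphlib` module to derive a build order from the dependencies; the stage names are informal labels for the steps in the preceding paragraph, not actual targets in Rotor's build scripts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on. Stage names are
# informal labels for the steps described above, not real build targets.
stages = {
    "pal": set(),
    "build tools": {"pal"},
    "C# compiler": {"build tools"},
    "C# sources": {"C# compiler"},
}

# static_order() yields every stage after all of its dependencies.
build_order = list(TopologicalSorter(stages).static_order())
```

The final step, JIT compilation, is deliberately absent from the graph: it happens when a managed program runs, not when the distribution is built.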
The build tools used in this bootstrap sequence can be found in the tools directory, and are documented in the docs/buildtools directory. Once the build has been successfully executed and you are making modifications, you'll want to pay a visit to the tests directory, in order to take advantage of its PAL suite, as well as the general Rotor quality suites, which currently contain base IL tests, base verification tests, and some JIT verification tests.
That's it for the tour. There is no better way to learn about the CLI and C# standards than by browsing and building the Rotor sources. For those ready to take the next step -- modifying or extending the code -- the depth of the Rotor codebase will not disappoint. The entire Rotor team sincerely hopes that this project will provide a great foundation for whatever programming itches you want to scratch!
David Stutz is a tenured member of the Microsoft Research team, and is currently working on the team that is implementing the Microsoft Shared Source CLI.