WindowsDevCenter.com

Rotor Comes to Linux

by Shaun Bangay
07/01/2002

The Quest Begins

When it comes to Linux and open source issues, people slink, cringing, past my office for fear of accidentally triggering off an evangelic outburst. The last person to mention Microsoft in my presence is still receiving counselling. When a number of my colleagues announced their intention to conduct further investigations into the recently-released shared source C# compiler, code-named Rotor [1], I was appalled that they would be restricted to working on Windows or FreeBSD. "It already runs on one Unix base," I thought, "how hard could it be to make it compile under a different flavor?"

[1] Incidentally, as a paraglider pilot, the definition of rotor that I am most familiar with is: the wind turbulence behind obstacles that causes your wing to collapse, and you to crash and die.

I tracked down the release on Microsoft's Web site, surprisingly getting to the download page without my browser expiring. As anticipated, I am required to actively acknowledge a license agreement. After careful scrutiny, it appears that I am to be allowed to read the code without forfeiting all future rights to approach within 100 meters of a computer. (Although since writing this article, I have already had one warning from Microsoft about the manner in which I am complying with the terms of this license.)

I continue with the download. Tarred and gzipped ... this is looking promising. Perhaps someone is getting a feel for this open source business after all? Once expanded, the 10MB file grows to a 100MB monster (and finally reaches 900MB once the build is complete). At initial glance, it doesn't look too bad -- the use of autoconf is promising, and it is with optimism that I run a quick configure, followed by the instructed buildall.

This is when things stop looking so good. The first line of the build process seems to be trying to link C files into an object file. Justifiably, the system complains. At this point it is clear that some manual intervention is required. I check the documentation -- and start to get some idea of the way the system is structured.

Porting a PAL

The first part of the build process compiles a section known as the Platform Adaptation Layer (PAL). This essentially makes any other platform appear to be Windows -- at least from the point of view of the standard C library functions. The PAL comprises some 60 files -- and represents the area where the most effort needs to be expended when porting the system to a new platform.

I apply a bit of manual intervention at this point, grateful that compiling before linking has its benefits. Each file in the PAL demonstrates some quirks. Problems during this process fall into three categories:

  • Functions that have a slightly different format under Linux. These can quickly be adapted to the appropriate form by comparing definitions in the Linux man pages and those on a FreeBSD system.
  • Functions that don't seem to exist at all under Linux. Initially, I comment these lines out, on the assumption that they are obviously not important. Inevitably, they will be returned to active status later, as they are adapted to the Linux environment.
  • Header file ordering (for want of a better term). The PAL is supposed to provide the C functions as used under Windows; however, it must implement them using the C functions as used under Linux (in this case). Parts of the system may need to see one or the other or even both versions of these functions. There is an impressive piece of header file engineering in place, in the form of rotor_pal.h and palinternal.h, which rename, undefine, and redefine functions to support all permutations of requirements. The working of these is complicated by Linux having its own overlapping set of functions that sneak through some of the gaps, leaving one never quite sure of which function one is using or where it has been defined. I settle for the option of commenting out all offending lines. These will be brought back in on an individual basis as required.
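
The renaming idea behind these categories can be illustrated with a minimal sketch in C. This is not code from the Rotor sources: the name PAL_demo_stricmp is hypothetical, and the sketch assumes only that a PAL has to offer the Windows-style _stricmp while Linux supplies the POSIX strcasecmp.

```c
#include <strings.h>   /* strcasecmp: the POSIX spelling available on Linux */

/*
 * Hypothetical PAL-side implementation (the name is an assumption made for
 * this sketch). It is compiled where the native Linux declarations are
 * visible, and simply forwards to them.
 */
int PAL_demo_stricmp(const char *left, const char *right)
{
    return strcasecmp(left, right);
}

/*
 * Code built on top of the PAL only ever uses the Windows spelling; a
 * header-level rename points it at the PAL implementation above.
 */
#define _stricmp PAL_demo_stricmp
```

After the rename, a call such as _stricmp("Rotor", "ROTOR") compiles on Linux and returns 0. The difficulty described above arises when the native headers already define a function with the same name, so that it is no longer obvious which version a given call reaches.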

Once all of the pieces have compiled and linked, I try out the single example in the directory examples. After a bit of tweaking, it starts (and I finally find out who is responsible for calling main in a C program -- for the answer send a self-addressed, stamped envelope to ...), and stops, and then even prints out the debugging message I added. A bit of self-congratulation is applied at this point, and I explore further to see what is next.

Compliance Testing

At this point, I encounter something I haven't seen very often with open source systems: compliance tests that actually run the various pieces of code and check to see if they perform as expected. In this context, these are wonderful -- trying to build the rest of the system with the port of the PAL functions in an uncertain state would have been well-nigh impossible.

For interest, the number of test failures during the porting of the PAL is shown in Figure 1.

Figure 1. PAL Compliance Test Results (graph of test failures versus trial number).

The brief setback at trial 4 is a result of finally figuring out how the header files work, a process that initially broke more things than it fixed.

The compliance tests are comprehensive, and informative as to the nature of any problems. I have only spotted a single test with which I refuse to comply -- the one for random numbers that insists that no element may recur in a series of 10 random numbers.
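
The objection to the random number test can be made quantitative with a short calculation, sketched here in C. The article does not say what range of values the test draws from, so the range parameter below is an assumption, and probability_all_distinct is a name invented for this sketch.

```c
/*
 * Probability that `draws` independent, uniformly distributed values taken
 * from `range` possible values are all different from one another.
 */
double probability_all_distinct(unsigned long range, unsigned draws)
{
    double p = 1.0;
    for (unsigned i = 0; i < draws; i++) {
        if (i >= range)
            return 0.0;  /* more draws than values: a repeat is certain */
        /* The i-th draw must avoid the i values already seen. */
        p *= (double)(range - i) / (double)range;
    }
    return p;
}
```

If the generator produced one of 32768 values, 10 draws would contain a repeat in roughly 0.14% of runs; with only 100 possible values the figure rises to about 37%. Either way, a correct generator will sometimes fail a test that forbids any recurrence.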

Correcting some of the other problems makes me aware of incompatibilities between FreeBSD and Linux. The format string %05d for the printf function is fairly universally accepted as printing an integer, with leading zeroes. The string %05s is apparently somewhat more ambiguous: do you still add the leading zeroes (required for PAL compliance) or provide leading spaces (as assumed by Linux)? Other similar printing issues also keep manifesting in the test failures, which eventually prompts me to grab the source of these functions from that nearby FreeBSD system and to use these to replace calls to the native Linux print routines. Once again, a triumph for open source. The mix of GNU and Shared Source licenses should keep the lawyers employed for a while.
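
The ambiguity has a basis in the C standard, which defines the 0 flag only for numeric conversions and leaves its effect on %s undefined, so two C libraries may legitimately disagree. The following is a minimal sketch of how the zero-padded behavior could be obtained portably without relying on %05s; the function names are hypothetical and this is not the FreeBSD code mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Well-defined case: zero padding of an integer, as with "%05d". */
int format_zero_padded_int(char *out, size_t out_size, int value, int width)
{
    return snprintf(out, out_size, "%0*d", width, value);
}

/*
 * Hypothetical helper: right-align `s` in a field of `width` characters,
 * padding on the left with '0'. Like snprintf, it returns the length the
 * full result would have and truncates if `out` is too small.
 */
int format_zero_padded_string(char *out, size_t out_size,
                              const char *s, int width)
{
    size_t len = strlen(s);
    size_t pad = (width > 0 && (size_t)width > len) ? (size_t)width - len : 0;
    size_t pos = 0;

    if (out_size == 0)
        return (int)(pad + len);

    for (size_t i = 0; i < pad && pos + 1 < out_size; i++)
        out[pos++] = '0';
    for (size_t i = 0; i < len && pos + 1 < out_size; i++)
        out[pos++] = s[i];
    out[pos] = '\0';

    return (int)(pad + len);
}
```

With a 16-byte buffer, format_zero_padded_int for the value 42 and width 5 yields 00042, and format_zero_padded_string for "ab" and width 5 yields 000ab regardless of which C library is underneath.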

Other areas that require work are the networking functions and process management. Most of the problems with the networking functions involve translating the Linux error value into the value expected by the compliance test for that scenario. Process management requires some changes in signal names to their Linux equivalent, and some rather processor-specific register reporting. More challenging is the lack of a suspend option in the Linux pthreads package. This fortunately can be replaced by some juggling with signal handlers -- and has the beneficial side effect of uncovering a rather sneaky race condition in the process termination code.
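
The error translation for the networking functions can be sketched as a simple mapping. The sketch assumes that the values expected by the compliance tests are the Windows Sockets error numbers, which the article does not state explicitly; the DEMO_ names and the function name are invented here, and only a handful of codes are shown.

```c
#include <errno.h>

/* Windows Sockets error numbers; the DEMO_ names are local to this sketch. */
enum {
    DEMO_WSAEWOULDBLOCK  = 10035,
    DEMO_WSAEADDRINUSE   = 10048,
    DEMO_WSAECONNRESET   = 10054,
    DEMO_WSAETIMEDOUT    = 10060,
    DEMO_WSAECONNREFUSED = 10061,
    DEMO_UNMAPPED        = -1
};

/*
 * Translate a native errno value into the Windows-style code. An if chain is
 * used rather than a switch because EAGAIN and EWOULDBLOCK share a value on
 * Linux, and duplicate case labels would not compile.
 */
int demo_translate_socket_errno(int native_errno)
{
    if (native_errno == EAGAIN || native_errno == EWOULDBLOCK)
        return DEMO_WSAEWOULDBLOCK;
    if (native_errno == EADDRINUSE)
        return DEMO_WSAEADDRINUSE;
    if (native_errno == ECONNRESET)
        return DEMO_WSAECONNRESET;
    if (native_errno == ETIMEDOUT)
        return DEMO_WSAETIMEDOUT;
    if (native_errno == ECONNREFUSED)
        return DEMO_WSAECONNREFUSED;
    return DEMO_UNMAPPED;  /* caller decides how to report anything else */
}
```

The awkward part in practice is less the mapping itself than finding out which native error a given platform reports for each scenario, since that is where systems differ.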

Fixing the various problems is a process that extends over a number of weeks. Since each individual problem is relatively small and can be corrected in a short time, this process provides a delightful "hacking fix" between preparing lectures, marking tutorials, and terrorizing research students.

Onward with the Build

A number of outstanding issues still remain in the PAL: input equivalents of the print problems described earlier, still more networking constant re-labelling issues, and about three serious problems that will require substantial re-engineering (which I decide to defer until there appears to be justification for the effort).

By the time I complete this exercise, I have identified the pieces required to convince autoconf to build working makefiles, and the PAL is building to completion in an aesthetically pleasing manner. I now go back to the Rotor root directory and unleash the build process on the rest of the system.

To my horror, I discover that autoconf is merely a thin veneer over something more primeval. It is starting to look as though the PAL is required just so that nmake and the other utilities reminiscent of Visual C can be compiled. These then take over the rest of the compilation process and drive the construction of the C# compiler and interpreter.

Between some exciting debugging and repeatedly running off to find another machine from which to kill the processes that have gone berserk, draining my machine of every spare byte [2], I eventually get these tools compiled and begin to let them plough through the 100 billion lines [3] of code making up the bulk of the system.

[2] There must be a technical term for this operation.

[3] I exaggerate. Slightly. At last count I saw about 580 000 lines of C code, another 320 000 lines of header files, and 660 000 lines of C#.

I cringe as the compiler (good old g++) begins throwing up syntax errors. The prospect of individually hand-fixing that amount of code is daunting. Fortunately, closer observation shows that most of the problems fall into a small number of categories:

  • Warnings about using negative values with unsigned types, and casting pointers to various other pointer types. I ignore these; in the past, g++ has always been smarter than me in dealing with these, anyway. These messages do, however, seem to indicate a rather intimidating dependence on word size and address space of a particular architecture.
  • The preprocessor complaining about the way # and ## are used. [4] These are used to turn preprocessor tokens into strings and concatenate them. It seems recent versions of the preprocessor have imposed new restrictions on what can be used between the tokens, and this code violates them. Fortunately, the problems are confined to a small set of header files; an exhaustive search of all possible ways to arrange all of the non-alphanumeric symbols on the keyboard reveals a formulation that is acceptable to the preprocessor without altering the original intent of the header files.
    [4] I resist the urge to comment about the use of # following a single uppercase C.
  • The use of the variable name "or" as shorthand for object reference. I hadn't realized that this is a reserved word in C++, but g++ certainly did. A global search and replace is tempting, but I quickly realize that a two-letter string would occur in too many other places -- and while it is tempting to have every file copyrighted by the Microsoft Cobjrefpobjrefation, I eventually settle on the path of tedious supervised substitution.
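
For readers unfamiliar with the two preprocessor operators in the second category, here is a minimal sketch in C of uses that a strict preprocessor accepts. All macro and function names are invented for illustration. A paste whose result is not a single valid token, such as joining a period to an identifier, is the kind of construct stricter GCC preprocessors reject; whether that is exactly what tripped the Rotor headers is an assumption.

```c
#include <string.h>

#define STRINGIFY(token) #token                      /* token -> "token" */
#define STRINGIFY_EXPANDED(token) STRINGIFY(token)   /* expand first, then quote */
#define PASTE(left, right) left##right               /* two tokens -> one token */

#define DEMO_BUILD_NUMBER 42

/* PASTE(demo_get_, answer) forms the single identifier demo_get_answer. */
int PASTE(demo_get_, answer)(void)
{
    return DEMO_BUILD_NUMBER;
}

/* # quotes its argument as written, without expanding it. */
const char *demo_raw_name(void)
{
    return STRINGIFY(DEMO_BUILD_NUMBER);
}

/* The extra macro level lets the argument expand before it is quoted. */
const char *demo_expanded_value(void)
{
    return STRINGIFY_EXPANDED(DEMO_BUILD_NUMBER);
}

/* Returns 1 when the two spellings differ, which they do here. */
int demo_names_differ(void)
{
    return strcmp(demo_raw_name(), demo_expanded_value()) != 0;
}
```

Here demo_raw_name() yields the text DEMO_BUILD_NUMBER while demo_expanded_value() yields 42, which is why headers that build strings from macros usually need the two-level form.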

Houston, We Have a Binary

At this point, I start the build process and a few hours pass before I need to exercise any manual intervention. Executable files are starting to materialize, and I spot several that I need to reach my ultimate target: the hello world program mentioned as the second step in the quick start process.

The build seems to be jamming consistently, so I decide to short-circuit the process and jump directly to the goal. Invoking csc hello.cs immediately informs me that I am missing some DLLs. Missing DLLs on a Linux box! At this point in the past, previous software products (most notably an early commercial office package, purportedly for Linux) have been drag-and-dropped to the (physical) trash can. Strangely tolerant in this case, I apply a little manual intervention in the build process and the missing pieces begin to appear.

With these additional pieces present, the compiler now churns happily (it's quite snappy, actually) and disgorges something entitled hello.exe. [5] The application of another utility, clix, is required to produce the anticipated global greeting message. Running it as instructed produces ... nothing. No message printed, alas. On the other hand, no ominous error message requiring backtracking through two million lines of code. In fact, not even the command line prompt -- the interpreter has wandered off into never-never land.

[5] Tirades relating to .exe files on a Linux box have been deleted in the interests of sticking to the point.

At this point, I am on the verge of discarding the whole thing. The error is either in the compiler, meaning I might have to wade through the complex data structures involved in that sub-system, or in the interpreter, which means dealing with the virtual machine -- the part responsible for the bulk of the compilation time.

A few days later a synapse fires and I realize that the executable produced should be portable across all of the supported platforms. A number of my colleagues are subjected to a barrage of hello.exe, and they quickly confirm that the executable not only works on their systems, but is also byte-for-byte identical to their version of the executable program. Kudos to the compiler writers -- cross-platform compatibility on the first attempt.

Enter gdb, the GNU debugger. At this point, a number of things become clear:

  • The majority of the time taken to execute small executables results from pulling in various dynamic libraries (not all DLLs, though).
  • Some magic numbers, presumably used for signing binaries, are not cooperating. There is no way I'm reversing an encryption algorithm to find out what is missing, so for the moment, the Linux version of Rotor is ignoring these.
  • There are at least two race conditions in the system. One is in the process termination code as part of the PAL, which I find and correct. The other seems to manifest itself on dual processor systems (such as the one I have been using) and may have something to do with the invocation of the garbage collector. In the time-honored tradition, I'll leave this as an exercise to the reader.
  • At some point in the build process, C# programs are compiled and run. A deadlock in the interpreter obviously impacts negatively, hence current warnings in the Linux README file about building on dual-processor systems.

Eventually, I am triumphantly able to announce:

Hello World! 

(As copied and pasted from the output of a genuine C# program running on a single-processor Red Hat Linux 7.3 system.)

Conclusion

A number of issues still remain to be resolved, but the port is currently usable for its primary purpose: experimenting with the system. Issues of networking conformance in the PAL, and of security in the compiler/interpreter, will need to be resolved before serious work can be done using these aspects.

Having got to this point, I have realized that it is possible to port a system of this nature without needing an understanding of exactly what it does. I have invoked several of the applications, including the compiler and interpreter, with a very limited knowledge of C#, the nature of the intermediate language, compilation strategies, or execution environment. With the debugger, I have waded through morasses of code while still being able to identify the point at which things start misbehaving. Such successes I attribute to the nature of the programming; for the most part, sensible use of variable and function names makes the code quite readable. For code explicitly acknowledged as pre-release quality, I am rather impressed. In many cases, it is clear where provision has been made for adapting and building extensions to the system, in the form of debugging structures, comments around the appropriate pieces, and test suites for relevant modules.

Another significant advantage when performing this port has been the availability of the compliance tests. When such tests have been available in other systems, I have found them vaguely interesting but of limited usefulness. [6] Successfully porting the PAL would have been impossible without the comprehensive set of tests provided. In addition, the compliance tests have one unanticipated benefit. It is extremely rewarding to be able to make alterations to code and have one's changes immediately tested, and to receive confirmation in quantitative form that one is actually getting measurably closer to one's desired goal. Porting this code is fun.

[6] In one previous case while porting a virtual reality system to another language on a different architecture, the compliance tests indicated that the compiler was unable to distinguish between positive and negative numbers. I was somewhat amused to discover the system that I was porting (except for one minor module) continued working happily.

Where to Now?

The Linux port of Rotor is now publicly available. Thanks to the rapid response and assistance from Brian Jepson at the O'Reilly Network, both the code and announcement of its availability have been spread further than I would have been able to do personally. I have received positive feedback from a number of sources -- including offers of assistance from several members of the Rotor team at Microsoft. A number of people have offered to host a CVS archive to coordinate further work on the system. Patches are already starting to filter in, supporting Linux distributions and working environments that I wouldn't have been able to test myself. The benefits of an open source development process are already clear.


Shaun Bangay is employed as an associate professor in the Computer Science Department at Rhodes University, where he teaches courses in Computer Graphics, Networks and Operating Systems.

