Deep Inside C#: An Interview with Microsoft Chief Architect Anders Hejlsberg
by John Osborn
In July, O'Reilly editor John Osborn attended the Microsoft Professional Developers Conference, where he conducted the following interview with Anders Hejlsberg, Distinguished Engineer and Chief C# Language Architect, about Microsoft's .NET framework and the C# programming language. Anders Hejlsberg is also known for having designed Turbo Pascal, one of the first languages available for PCs. Anders licensed Turbo Pascal to Borland and later led the team that created Delphi, a highly successful visual design tool for building client-server applications. Also in attendance at the interview were Tony Goodhew, Microsoft C# product manager, and O'Reilly Windows editor Ron Petrusha.
Osborn: I've been looking at press stories about C# [pronounced "See sharp"] and notice that many of them seem to lead with the observation -- or perhaps the theory -- that C# is either a clone of or a Microsoft replacement for Java. If you could write the headlines, what would you like people to say about the language?
Hejlsberg: First of all, C# is not a Java clone. In the design of C#, we looked at a lot of languages. We looked at C++, we looked at Java, at Modula 2, C, and we looked at Smalltalk. There are just so many languages that have the same core ideas that we're interested in, such as deep object-orientation, object-simplification, and so on.
One of the key differences between C# and these other languages, particularly Java, is that we tried to stay much closer to C++ in our design. C# borrows most of its operators, keywords, and statements directly from C++. We have also kept a number of language features that Java dropped. Why are there no enums in Java, for example? I mean, what's the rationale for cutting those? Enums are clearly a meaningful concept in C++. We've preserved enums in C# and made them type-safe as well. In C#, enums are not just integers. They're actually strongly typed value types that derive from System.Enum in the .NET base-class library. An enum of type "foo" is not interchangeable with an enum of type "bar" without a cast. I think that's an important difference. We've also preserved operator overloading and type conversions. Our whole structure for namespaces is much closer to C++.
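Editor's Note: A minimal sketch of the enum behavior described above. The type names Color and Fruit are invented for illustration and stand in for the "foo" and "bar" of the interview; this is one way to observe the rule, not an excerpt from any Microsoft sample.

```csharp
using System;

// Two unrelated enum types (hypothetical names).
public enum Color { Red, Green }
public enum Fruit { Apple, Pear }

public static class EnumDemo
{
    public static Fruit ToFruit(Color c)
    {
        // Fruit f = c;   // would not compile: no implicit Color -> Fruit conversion
        return (Fruit)c;   // an explicit cast is required
    }

    // Enum types are value types whose base type is System.Enum.
    public static bool DerivesFromSystemEnum()
    {
        return typeof(Color).BaseType == typeof(Enum) && typeof(Color).IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine(ToFruit(Color.Green));      // prints "Pear"
        Console.WriteLine(DerivesFromSystemEnum());   // prints "True"
    }
}
```

Uncommenting the implicit assignment produces a compile-time error, which is the type safety the answer refers to.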
But beyond these more traditional language issues, one of our key design goals was to make the C# language component-oriented, to add to the language itself all of the concepts that you need when you write components. Concepts such as properties, methods, events, attributes, and documentation are all first-class language constructs. The work that we've done with attributes -- a feature used to add typed, extensible metadata to any object -- is completely new and innovative. I haven't seen it in any other programming language. And C# is the first language to incorporate XML comment tags that can be used by the compiler to generate readable documentation directly from source code.
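Editor's Note: A small sketch of the two features just mentioned, attributes and XML documentation comments. AuthorAttribute, Widget, and the author string are hypothetical names made up for this example; only Attribute, AttributeUsage, and GetCustomAttributes come from the .NET base-class library.

```csharp
using System;

// A hypothetical custom attribute: typed metadata that can be attached to classes.
[AttributeUsage(AttributeTargets.Class)]
public sealed class AuthorAttribute : Attribute
{
    private readonly string name;
    public AuthorAttribute(string name) { this.name = name; }
    public string Name { get { return name; } }
}

/// <summary>A component annotated with typed metadata.</summary>
[Author("Example Author")]
public class Widget { }

public static class AttributeDemo
{
    /// <summary>Returns the author recorded on a type, or null if none is attached.</summary>
    public static string AuthorOf(Type type)
    {
        object[] found = type.GetCustomAttributes(typeof(AuthorAttribute), false);
        return found.Length == 0 ? null : ((AuthorAttribute)found[0]).Name;
    }

    public static void Main()
    {
        Console.WriteLine(AuthorOf(typeof(Widget)));   // prints "Example Author"
    }
}
```

The `///` comments are ordinary comments to the reader, but the compiler can extract them into an XML documentation file when asked to.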
Another important concept is what I call "one-stop-shopping software." When you write code in C#, you write everything in one place. There is no need for header files, IDL files (Interface Definition Language), GUIDs and complicated interfaces. And once you can write code that is self-describing in this way, then you can start embedding your software, because it is a self-contained unit. Now you can slot it into ASP pages and you can host it in various environments where it just wasn't feasible before.
But going back to these key component concepts, there's been a lot of debate in the industry about whether languages should support properties or events. Sure, we can express these concepts by methods. We can have naming patterns like a "get" block or a "set" block that emulate the behavior of a property. We can have interfaces and adapters that implement an interface and forward to an object. It's all possible to do, just as it's possible to do object-oriented programming in C. It's just harder, and there's more housekeeping, and you end up having to do all this work in order to truly express your ideas. We just think the time is right for a language that makes it easier to create components. Developers are building software components these days. They're not building monolithic applications or monolithic class libraries. Everyone is building components that inherit from some base component provided by some hosting environment. These components override some methods and properties, and they handle some events, and put the components back in. It's key to have those concepts be first class.
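Editor's Note: To make the contrast with naming patterns concrete, here is a sketch of a component that exposes a property and an event as language constructs. Thermostat, Target, and TargetChanged are invented names, and the anonymous delegate syntax is from later versions of C# than the one discussed in this interview.

```csharp
using System;

public class Thermostat
{
    private int target;

    // An event is declared directly; no listener interface or adapter class is needed.
    public event EventHandler TargetChanged;

    // A property reads like a field to callers but runs code on get and set.
    public int Target
    {
        get { return target; }
        set
        {
            if (value == target) return;   // ignore no-op assignments
            target = value;
            if (TargetChanged != null) TargetChanged(this, EventArgs.Empty);
        }
    }
}

public static class PropertyDemo
{
    public static int CountNotifications()
    {
        int count = 0;
        Thermostat t = new Thermostat();
        t.TargetChanged += delegate { count++; };
        t.Target = 20;   // changes the value, raises the event
        t.Target = 20;   // same value, no event
        t.Target = 22;   // changes the value, raises the event
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountNotifications());   // prints "2"
    }
}
```

The same behavior could be written with GetTarget and SetTarget methods plus a listener interface; the property and event forms simply remove that housekeeping.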
Osborn: You gave an introduction to C# recently, and the first bullet on the first slide said, "The first component-oriented language in the C/C++ family."
Hejlsberg: Yes, it's one of my primary goals. We talk about how everything is an object, which is also very key. Languages like Smalltalk and Lisp have done this before, but at great cost. I think C# contains some pretty interesting innovations that make component development easier, such as its notions of boxing and unboxing. Boxing allows the value of any value type to be converted to an object, while unboxing allows the value of an object to be converted to a simple value type. It's not as though this hasn't happened before, but the way we've applied it to the language is pretty innovative.
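Editor's Note: A minimal sketch of boxing and unboxing as described in this answer. The class and method names are invented for the example.

```csharp
using System;

public static class BoxingDemo
{
    public static int RoundTrip(int value)
    {
        object boxed = value;   // boxing: the int is copied into an object
        return (int)boxed;      // unboxing: the value is copied back out
    }

    // The box holds a copy, so later changes to the original variable do not affect it.
    public static bool BoxIsACopy()
    {
        int n = 1;
        object boxed = n;
        n = 2;
        return (int)boxed == 1;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(42));            // prints "42"
        Console.WriteLine(BoxIsACopy());             // prints "True"
        Console.WriteLine(((object)42).GetType());   // prints "System.Int32"
    }
}
```

Because any value type can be boxed, an int can be passed wherever an object is expected, which is what lets the language treat everything as an object without making every integer a heap allocation.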
We've tried not to take an "ivory tower" approach to engineering C# and the .NET framework. We can't afford to rewrite all of our software. The industry just can't afford it, especially now when we're moving on Internet time. You've got to leverage what you have, and so I think interoperability is just key. We focused hard on giving programmers all of the right solutions for interoperating with Internet standards, such as HTTP, HTML, XML, and with existing Microsoft technologies, so you don't fall off a cliff the minute you find that something isn't provided by the new .NET environment, or when you realize you want to leverage some existing API or component. You've seen all the COM interoperability that we have built into the language and into the common runtime; you've seen how you can just import existing DLLs [Dynamic Link Libraries] using the DllImport attribute; and you've seen how even if that doesn't get you there, we have the notion of unsafe code. Unsafe code allows you to write inline C code with pointers, to do unsafe casts, and to pin down memory so it won't accidentally be garbage-collected.
There's been a lot of discussion about unsafe code, and people seem to think we're on drugs or something. I think it's a misunderstanding. Just because code is marked "unsafe" does not mean that it is unmanaged. Of course, we're not just throwing in unsafe pointers and leaving people vulnerable to downloading unsafe code over the Internet. Unsafe code is deeply tied into the security system. We give you the flexibility to stay within the managed code box and to get the job done without falling off the cliff and having to jump into a different language and a different programming model for native code. And by keeping you within the box, we can make the code much safer because the system understands what's going on. The fact that you write unsafe code doesn't actually mean that you're leaving the managed space. So your unsafe code becomes much more efficient.
Osborn: Tell me more about dealing with unsafe code in a managed environment.
Hejlsberg: Yes. One of the things that characterizes managed execution environments, like Smalltalk, Java, and the .NET common language runtime, is that they provide garbage collection, and to provide garbage collection, at least with modern garbage collectors -- your "mark and sweep" garbage collectors -- you need to understand more about the code that's executing than you do about traditional unmanaged code. In order to find the dead objects by exclusion, you need to be able to walk the stack, to chase down all the live roots, and to figure out which objects are alive and which ones weren't visited. However, in order to be able to do that, you need closer cooperation from the code you are executing. The code needs to be more descriptive. It needs to tell you how I am laying out the stack, where my local variables are located, and so on.
When you're writing unsafe code in C#, you have the ability to do things that aren't typesafe, like operate with pointers. The code, of course, gets marked unsafe, and will absolutely not execute in an untrusted environment. To get it to execute, you have to grant it trust, and if you don't, the code just won't run. In that respect, it's no different than other kinds of native code. The real difference is that it's still running within the managed space. The methods you write still have descriptive tables that tell you which objects are live, so you don't have to go across a marshalling boundary whenever you go into this code. Otherwise, when you go out to undescriptive, unmanaged code (like through the Java Native Interface, for example), you have to set a watermark or erect a barrier on the stack. You have to remarshal all the arguments out of the box. Once you're using objects, you have to be very careful about which ones you touch because the GC [Garbage Collector] is still running on a different thread. It might move the object if you haven't pinned it down correctly by using some obscure method to lock the object. If you forget to do that, you're just out of luck.
We've taken a different approach. We've said, "Let's integrate this into the language. Let's provide statements, like the fixed statement, that allow you to pin down the object cooperatively with the GC and integrate it." In that way we provide the best means of bringing all existing code forward, instead of just throwing it away. It's a different design form.
Osborn: So the memory that you're working with in unsafe code is in fact being watched by the garbage collector?
Hejlsberg: Yes, it is. But, caveat emptor, it's unsafe. You can obtain a pointer and you can do the wrong thing. But you can do that in native code, too.
Osborn: Another area of confusion, I think, is understanding where C# stops and the common runtime begins. What's the innovation in the C# language itself versus what it gets from the common runtime library?
Hejlsberg: Well, I think some of this confusion comes from the fact that when people talk about Java, they don't really know which is the language and which is the runtime. What do people mean when they say Java? Do they mean Java, the language, Java, the syntax, or do they mean Java, the platform? People lump these different aspects together. We've taken an approach that says we want to be a multilingual platform. We're going to build a platform that actually allows you to implement multiple programming languages and also have them share a common set of APIs (Application Programming Interfaces). Let's face it, some people like to program in COBOL, some people like to program in Basic, some like C++, and some will like C#, I hope. But we're not trying to tell you to forget everything you ever did. We're not saying, "Now that there's only one language, there shall be no further innovations in this race." We're saying that our industry advances by its flexibility. How did Java come about? It came about because there were programming languages before it and there will be programming languages after it. We want to build a platform where your preference for one language over another doesn't negate the whole value proposition. We want to create a platform where there can be innovation. Who's helping COBOL programmers today? Who's taking them to the Web? Only on the .NET platform can you embed Fujitsu COBOL in an ASP page. I mean it's truly revolutionary.
Osborn: Given the availability of multiple languages for the .NET platform, why would you choose C# over Visual Basic, C++, or even COBOL? What is it that makes C# so compelling?
Hejlsberg: First of all, with C# we were able to start with a clean sheet of paper, so to speak. We did not have any backward compatibility requirements, and that certainly made things simpler. And not just from an implementation standpoint, but also from a usage standpoint. For example, we only have one kind of class in C#, and it is always garbage-collected. Managed C++, on the other hand, has two because it has to preserve the non-garbage collected style of programming. So, C# simply has fewer concepts you have to understand.
Language is a funny thing: It's a matter of taste. Language is almost a religious thing, and it's a lifestyle choice for programmers. I mean, we realize that we can't walk out and say, "Here's a platform where you have one language base." Even if you could do everything in that platform with one language, some people may not like its syntax; they might like curly braces instead or some other block delimiter. That's what they're familiar with. That's what makes them feel at home and productive and enabled. And so our approach with C# has simply been to offer an alternative to C++ programmers who find that language too complicated and to Java programmers who miss certain features of C and C++ that were lost in the translation. We looked for ways to simplify C++ and then put the result on a multilingual platform that provides greater interoperability, and that gives you all of these component concepts, and so forth.
Goodhew: One of the interesting things that came out of our developer tracking study is that over 60 percent of all developers in the professional developer market use two or more languages to build their applications. And what that tells us, especially when we ask which tools programmers use, is that there isn't going to be one object-oriented programming language that is the be-all and end-all language that everyone will use. As Anders said earlier, people will want certain syntaxes for what they're doing or how they feel. It's a personal choice. And that's what the whole .NET platform is about, providing developers with a choice of languages in which to implement. I think we've done a pretty good job. You can basically do the same tasks in Visual Basic .NET and C#. Visual Basic is still viewed as more accessible to programmers. C# has more headroom and more power than VB does.
Osborn: Meaning that you can accomplish more with fewer statements in C#?
Hejlsberg: Well, meaning you have more power through the provision for unsafe code.
Osborn: So you can't write unsafe code in VB?
Hejlsberg: No, you cannot.
Goodhew: But basically, both languages can do the same thing. That's a fundamental change from where we were with Visual Studio 6. If you wanted to build a multi-threaded MTS object using Visual Studio 6.0 and you were a VB programmer, you couldn't. You'd have to use C++. Now, with the .NET framework, you can use whichever language you want.
Hejlsberg: It's the thing I talked about in my general session talk: The unification of programming models, which the .NET framework offers. In the evolution of languages and frameworks, we always seemed to end up marrying a programming language to a particular API and a particular form of programming. VB was about rapid application development and forms, MFC (Microsoft Foundation Classes) was about sub-classing, and ASP was about putting stuff in web pages. In each case, your choice of programming model always dictated your choice of programming language and your choice of available APIs. It added to your workload the burden of learning new languages and APIs every time you switched frameworks. We have really tried to unify all of that. We provide one API, one supporting visual design tool, and we give you the flexibility to choose whichever language works for you.
Osborn: I'm wondering what this does to the use of scripting languages such as VBScript and JScript?
Hejlsberg: One of the wonderful things the .NET framework has done for scripting languages is to make them compiled. Look at ASP+. Now, you're actually running real compiled code in your pages; it's not late-bound, dispatch look-ups where you don't see a runtime error until the user hits the page. ASP+ developers can use the full power of Visual Basic .NET instead of VBScript. And for the first time, they have the ability to use Perl, Python, and other popular languages if they so choose.
Hejlsberg: Yes, that's right.
Goodhew: The .NET framework allows scripting languages to be used as full-featured languages because they now have access to a true programming framework and to the same base-class APIs. You should look at what the guys who are doing the JScript implementation have accomplished. [Editor's Note: JScript is the Microsoft implementation of the ECMA 262 language specification (ECMAScript Edition 3). With only a few minor exceptions (to maintain backward compatibility), JScript is a full implementation of the ECMA standard.] So the .NET platform provides a common language framework, which is a huge benefit to script writers.
Osborn: We've talked about Java, C++, and scripting. I have heard a number of people here at the PDC argue that there really is no difference between .NET IL (IL is the Microsoft Intermediate Language that all compilers must produce to run in the .NET framework) and the Java byte code that is consumed by the Java Virtual Machine (JVM). It's clear from the talks you've given that you do not agree. Would you care to comment further on the distinction?
Hejlsberg: Sure. First of all, the idea of ILs is a very old idea. You could trace the concept back to the UCSD Pascal p-machine (an early implementation of Pascal for personal computers) or to Smalltalk. p-code is used by Basic and Visual Basic. Parts of Word, internally, use a p-code engine because it's more compact. So, p-code is nothing new.
I think the approach we've taken with the IL is interesting in that we give you options to control when compilation -- or translation, if you will -- of the IL to native code occurs. With managed C++, you can actually generate native code directly from source. Managed C++ can also generate IL, as can C# and VB. And when you install your code we give you the option to compile it at that point; to compile the IL to native at that point, so that when you run it there's no just-in-time compiler overhead. We also give you the option of running and compiling code dynamically, just-in-time compilation. And, of course, having an IL gives you many advantages, such as the ability to move to different CPU architectures and to introduce verifiability in type safety and then build the security system on top of that.
I think one of the key differences between our IL design and Java byte code specifically, is that we made the decision up-front to not have interpreters. Our code will always run native. So, even when you produce IL, you are never running an interpreter. We even have different styles of JITs. For the compact framework, we have the EconoJIT, as we call it, which is a very simple JIT. [Editor's Note: .NET Compact is a subset of the .NET framework designed to be ported to other devices and platforms.] For the desktop version we have a more full-fledged JIT, and we even have JITs that use the same back end as our C++ compiler. However, those take longer so you would only use them at install time.
When you make the decision up-front to favor execution of native code over interpretation, you are making a decision that strongly influences design of the IL. It changes which instructions are included, what type information is included, and how it is conveyed. If you look at the two ILs, you'll notice that they're quite different. In a sense, our IL is type-neutral. There's no information in the instructions that specifies the type of the arguments. Rather, that is inferred by what's been pushed on the stack. This approach makes the IL more compact. A JIT compiler needs to have that information anyway, so there's no reason to carry it along in the instructions. So you end up with some different design decisions, which in turn makes it easier to translate IL into native code.
Osborn: What distinction needs to be made between interpretation and the approach that you're describing?
Hejlsberg: At the core of an interpreter is a loop that fetches some bytes out of a p-code stream, which then falls into a big switch statement that says, "Oh, this was an ADD instruction, so it goes over here, but this wasn't" -- and so forth.
An interpreter emulates a CPU. We turn it upside down and we do one pass -- we always do one pass -- where we convert the instructions into machine code. Now, that machine code, in the case of EconoJIT, is actually very simple in that it just builds a list of calls and push instructions, and calls to runtime helpers. Then it sets off on that list instead. And, of course, that code executes much faster than interpreted code.
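Editor's Note: The difference between the two approaches can be sketched in a few lines. This is a toy model, not the EconoJIT: the three opcodes are invented, and a list of delegates stands in for the generated list of calls, since real machine-code generation is beyond a short example. The interpreter decodes every instruction on every run, while the translator decodes each instruction exactly once and then only executes the resulting list.

```csharp
using System;
using System.Collections.Generic;

public static class TinyVm
{
    // Hypothetical opcodes, invented for this sketch.
    public const byte OpPush = 0;   // followed by one operand byte
    public const byte OpAdd = 1;
    public const byte OpMul = 2;

    // Interpreter: a fetch loop feeding a big switch, repeated on every execution.
    public static int Interpret(byte[] code)
    {
        Stack<int> stack = new Stack<int>();
        int pc = 0;
        while (pc < code.Length)
        {
            switch (code[pc++])
            {
                case OpPush: stack.Push(code[pc++]); break;
                case OpAdd: stack.Push(stack.Pop() + stack.Pop()); break;
                case OpMul: stack.Push(stack.Pop() * stack.Pop()); break;
                default: throw new InvalidOperationException("Unknown opcode");
            }
        }
        return stack.Pop();
    }

    // One-pass translation: decode once, emit a list of calls to helpers.
    public static List<Action<Stack<int>>> Translate(byte[] code)
    {
        List<Action<Stack<int>>> steps = new List<Action<Stack<int>>>();
        int pc = 0;
        while (pc < code.Length)
        {
            switch (code[pc++])
            {
                case OpPush:
                {
                    int operand = code[pc++];   // decoded at translation time
                    steps.Add(s => s.Push(operand));
                    break;
                }
                case OpAdd: steps.Add(s => s.Push(s.Pop() + s.Pop())); break;
                case OpMul: steps.Add(s => s.Push(s.Pop() * s.Pop())); break;
                default: throw new InvalidOperationException("Unknown opcode");
            }
        }
        return steps;
    }

    // Running translated code involves no decoding at all.
    public static int Run(List<Action<Stack<int>>> steps)
    {
        Stack<int> stack = new Stack<int>();
        foreach (Action<Stack<int>> step in steps) step(stack);
        return stack.Pop();
    }

    public static void Main()
    {
        byte[] program = { OpPush, 2, OpPush, 3, OpAdd, OpPush, 4, OpMul };   // (2 + 3) * 4
        Console.WriteLine(Interpret(program));        // prints "20"
        Console.WriteLine(Run(Translate(program)));   // prints "20"
    }
}
```

The translated list can be run many times, or discarded and rebuilt later, which corresponds to the memory-constrained case mentioned below.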
Osborn: So, let me run through this: You're completely compiling the code. Then, when you're done, the bits are ready to run completely, though the point at which translation from IL to machine code occurs may vary.
Hejlsberg: Yes. But then we may, if it's in a memory-constrained environment on a small device, throw the code away after we've run it.
Osborn: Jumping back to the particulars of language syntax: I'm wondering whether C# includes built-in support for regular expressions. I didn't see that support in the language reference, but maybe it's somewhere else.
Hejlsberg: First of all, there's a regular expression class in the base-class libraries. We don't have any direct support in the language for regular expressions, but we do have some features that are actually very similar. It's not worth making a big deal about them, but, for example, we give you the ability to write verbatim string literals where you don't have to write two backslashes every time you want to specify one. It actually helps a whole bunch when you're writing regular expressions and when you're writing quotes within quotes. That's a little thing that helps, but clearly the core is in the .NET framework, which can be shared by all programming languages.
Osborn: There appears to be a difference in the way we should view namespaces in C# and Java. Are they conceptually the same or are they implemented differently?
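Editor's Note: A short sketch of verbatim string literals used with the base-class library's Regex class. The pattern and the helper name IsDecimal are chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // The same pattern written twice: once with escaped backslashes, once verbatim.
    public const string Regular = "\\d+\\.\\d+";
    public const string Verbatim = @"\d+\.\d+";

    // Inside a verbatim literal, a double quote is written by doubling it.
    public const string Quoted = @"He said ""hi""";

    public static bool IsDecimal(string text)
    {
        return Regex.IsMatch(text, "^" + Verbatim + "$");
    }

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim);   // prints "True"
        Console.WriteLine(Quoted);                // prints: He said "hi"
        Console.WriteLine(IsDecimal("3.14"));     // prints "True"
        Console.WriteLine(IsDecimal("314"));      // prints "False"
    }
}
```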
Hejlsberg: Conceptually, yes, but they are implemented very differently. In Java, package names are also a physical thing that dictates the directory structure of your source code files. In C#, we have a complete separation between physical packaging and logical naming, so whatever you call your namespaces has nothing to do with the actual physical packaging of your code. That gives you a lot more flexibility to package things together in physical distribution units without forcing you to also have a bunch of directories. In the language itself, there are clearly some differences. In Java, the packaging is also your physical structure, and because of this a Java source file has to be in the right directory and can only contain one public type or one public class. Since C# does not have that sort of marriage between physical and logical, you can name your source files anything you want. Each source file can contribute to multiple namespaces and can contain multiple public classes. Further, you can choose to write all of your sources in one big file if you like, or you can spread them across smaller files. Conceptually, what happens with C# at compilation is that you give the compiler all of the source files that make up your project and then it just goes off and figures out what to do.
Osborn: I have a question about generic programming: Do you think it is an important concept, one that ought to be part of an object-oriented language? And if so, what are your plans to make generic programming a part of C#?
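Editor's Note: As an illustration of the separation between logical naming and physical packaging, the single source file sketched below contributes to two namespaces and declares several public classes. The file could have any name and live in any directory. The Acme namespaces and their classes are hypothetical.

```csharp
using System;

namespace Acme.Geometry
{
    // Two public classes in the same file.
    public class Point { public int X; public int Y; }
    public class Size { public int Width; public int Height; }
}

namespace Acme.Text
{
    // A second namespace in the same file, using a type from the first.
    public static class Formatter
    {
        public static string Describe(Acme.Geometry.Point p)
        {
            return "(" + p.X + ", " + p.Y + ")";
        }
    }
}

public static class NamespaceDemo
{
    public static void Main()
    {
        Acme.Geometry.Point p = new Acme.Geometry.Point();
        p.X = 1;
        p.Y = 2;
        Console.WriteLine(Acme.Text.Formatter.Describe(p));   // prints "(1, 2)"
    }
}
```

The equivalent Java code would need at least three source files, one per public class, placed in directories that mirror the package names.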
Goodhew: Well, some of what we had hoped to include in the first release has been constrained because -- unlike what everyone believes about Microsoft -- we do not have unlimited resources. We had to make some hard decisions in terms of what is actually in this first release.
Osborn: How many people were involved in the development of C#?
Hejlsberg: The language design team consisted of four people. The compiler team had another five developers.
Petrusha: What about the framework?
Hejlsberg: That's much bigger. The whole company is involved.
Goodhew: In terms of the entire Visual Studio and .NET platform group, we are about a thousand-person division. That includes program management, developers, testers, all the build functions, the frameworks, the runtimes, the ASP programming models, and then all of the people like myself, the management overhead.
Hejlsberg: But with respect to the generics that you asked about, I definitely think generics are a very useful concept and you can certainly tell that from all the generics research that's taking place in academia and industry. Templates are one solution to the problem. In our internal discussions, we concluded that we wanted to do it right for this new platform. What we would really like is to have generics understood by the underlying runtime. This is different from how some of the generic prototypes have been built. Take Java's notion of "erasure," where there's really no knowledge of generics in the system. By having the common language runtime understand the concept of generics, multiple languages can share the functionality. You can write a generic class in C# over in one place and someone else using a different language can use it.
But making generics part of the runtime also enables you to do certain things much more efficiently. Instantiation of generics should ideally happen at runtime. With C++, instantiation of templates happens at compile time, and then you have two options: you can either let your code bloat or you can try, in the linker, to get rid of some of the bloat. But, if you have multiple applications, you can forget about it. You're just going to get bloated code.
If you push the knowledge of generics into the common language runtime, then the runtime can understand that when an application or a component asks for a list of "Foo's," it should first ask: "Do I already have an instantiation of a list of Foo?" If so, use that one. Indeed, if Foo is a reference type, and if we do the design right, we can share the instantiation for all reference types. For value types, such as ints and floats, we can create one instantiation per value type. But only when an application asks for it. We've done a lot of the design work and groundwork necessary to add generics to the runtime.
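Editor's Note: Generics were not part of the release discussed in this interview, so the sketch below uses the syntax that later versions of C# adopted. It shows only what is observable from a program, namely that the runtime keeps the type argument of a generic instantiation; whether machine code is shared between instantiations is an implementation detail that the example does not try to demonstrate. The class Foo is a placeholder.

```csharp
using System;
using System.Collections.Generic;

public class Foo { }

public static class GenericsDemo
{
    // The runtime type of a list still carries its element type.
    public static Type ElementTypeOf(object list)
    {
        return list.GetType().GetGenericArguments()[0];
    }

    // Different instantiations are distinct runtime types...
    public static bool InstantiationsAreDistinct()
    {
        return typeof(List<Foo>) != typeof(List<int>);
    }

    // ...built from one shared generic definition.
    public static bool ShareOneDefinition()
    {
        return typeof(List<Foo>).GetGenericTypeDefinition()
            == typeof(List<int>).GetGenericTypeDefinition();
    }

    public static void Main()
    {
        Console.WriteLine(ElementTypeOf(new List<Foo>()));   // prints "Foo"
        Console.WriteLine(ElementTypeOf(new List<int>()));   // prints "System.Int32"
        Console.WriteLine(InstantiationsAreDistinct());      // prints "True"
        Console.WriteLine(ShareOneDefinition());             // prints "True"
    }
}
```

Under an erasure-based design, the first two lines could not be written, because the element type would no longer exist at run time.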
It's interesting you asked earlier about the IL because deciding to add generics impacts the design of the IL. If the instructions in the IL embed type information -- if, for example, an Add instruction is not an Add, but is an Add int or an Add float or an Add double -- then you've baked the type into the instruction stream and the IL is not generic at that point. Our IL format is actually truly type neutral. And, by keeping it type neutral, we can add generics later and not get ourselves into trouble, at least not as much trouble. That's one of the reasons our IL looks different from Java byte code. We have type neutral IL. The Add instruction adds whatever the two things are on top of the stack. In a generic world, that could translate into different code when the generic is instantiated.
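Editor's Note: A toy illustration of a type-neutral Add. The three instructions and the output strings are invented and do not correspond to real IL opcodes. The single Add instruction carries no type; a one-pass translator tracks what has been pushed on the stack and selects the typed operation from that.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical instruction set: loads are typed, Add is not.
public enum Op { LoadInt, LoadDouble, Add }

public static class TypeInference
{
    // Walks the code once and reports which typed operation each instruction becomes.
    public static List<string> Select(Op[] code)
    {
        Stack<Type> types = new Stack<Type>();   // types currently on the evaluation stack
        List<string> selected = new List<string>();
        foreach (Op op in code)
        {
            switch (op)
            {
                case Op.LoadInt:
                    types.Push(typeof(int));
                    selected.Add("load int");
                    break;
                case Op.LoadDouble:
                    types.Push(typeof(double));
                    selected.Add("load double");
                    break;
                case Op.Add:
                {
                    Type right = types.Pop();
                    Type left = types.Pop();
                    if (left != right)
                        throw new InvalidOperationException("Operand types differ");
                    types.Push(left);   // the result has the operand type
                    selected.Add(left == typeof(int) ? "add int" : "add double");
                    break;
                }
            }
        }
        return selected;
    }

    public static void Main()
    {
        Op[] ints = { Op.LoadInt, Op.LoadInt, Op.Add };
        Op[] doubles = { Op.LoadDouble, Op.LoadDouble, Op.Add };
        Console.WriteLine(string.Join(", ", Select(ints)));      // prints "load int, load int, add int"
        Console.WriteLine(string.Join(", ", Select(doubles)));   // prints "load double, load double, add double"
    }
}
```

The same Add instruction yields different typed operations depending on its operands, which is the property that leaves room for generic code to be specialized when it is instantiated.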
Osborn: Is that available to all .NET languages?
Hejlsberg: Yes. Microsoft Research in Cambridge has created a generics version of the common language runtime and the C# compiler. We're looking at how to move that forward right now. It's not going to happen in the first release, that much we know, but we are working on making sure that we do things right for the first release so that generics fit into the picture.
Osborn: What are the planned release dates for C#, the .NET framework, and the next version of Visual Studio?
Goodhew: Well, we've brought the technology preview to the 6,500 attendees here at PDC. We expect to go to beta sometime in the fall (2000), and then we'll release when ready. One of the really exciting things we've done is take a good solid look at how the Windows 2000 launch release went and the way we involved key customers in the joint development and the joint deployment process. In the case of .NET framework and Visual Studio .NET, we will again work with customers to determine when the final product is ready to release. We're going to let them tell us when the product is ready. And, because we've got real customers involved in the process, we should get a much better product in terms of quality. The downside of that is the process then becomes a little indeterminate. This is a fundamental change. We are looking to hit a quality bar for the release of the product rather than just pick an arbitrary date and say we'll ship.
Osborn: So, instead of a code completion date, we're looking at a "ready-to-go" date?
Goodhew: Yes, that's right. I think developers will find the release of Visual Studio .NET to be one of the highest quality releases in Microsoft's history.
Osborn: You have submitted C# to ECMA. Is standardization really a serious objective? Would you like C# to be available on other platforms?
Hejlsberg: Absolutely. It's certainly our objective to present C# to the industry as a possible standard, which is why we've submitted it to ECMA. We certainly hope to gain support in ECMA for a process that will lead to a commonly designed language which has a common language infrastructure. And what I mean by a common infrastructure is the core set of class libraries this specification entails, such that if other companies using other platforms implement it, they could reasonably expect to find those classes available to their programs.
Goodhew: I might point out that we're taking a true open standards approach with ECMA. When and if ECMA actually arrives at a standard for C# and a common language infrastructure, the result will be available under ECMA's copyright and licensing policies, which are truly open. Any customer, and any person, will be able to license the ECMA C# standard, subset it, superset it, and they won't have to pay royalties. They'll be able to take it and go implement it on any platform or any device. We fully expect people to do that. That is something fundamentally different from our competitors who wandered around the standards bodies, looking for someone to rubber-stamp their proprietary languages.
Osborn: One question I heard at breakfast and lunch was: "How is portability really possible on a platform that doesn't have Microsoft COM baked into its infrastructure?"
Hejlsberg: It's completely possible. COM is not a must for standardization of C# and a common language infrastructure. Not at all. C# has a class model that is completely rich, whereas COM is just another view of how applications can interoperate. But there's nothing in C# or the core common runtime that says there must be COM, GUIDs, HRESULTs, AddRefs, or Releases. There's none of that. The .NET common language runtime completely eliminates that. But it gives you great interoperability with COM, which I will continue forever to think is super important for the reason I gave earlier. But it's not a prerequisite at all.
Goodhew: I think some of those comments were inspired by the initial version of the language reference that we provided to the public. Microsoft wrote it in Microsoft meetings in which we were thinking primarily in terms of Microsoft platforms. As a result, we make references to things like COM and DLLs in the spec when really a DLL is a specific case of the more general problem of how to invoke native code on a given platform. One of the benefits of going to a standards organization and working with people like IBM, with whom we worked on the SOAP specification, is that we ensure we don't make any such references that tie us or lock us into something like the COM framework in future versions of the specification.
As Anders said, COM interop and COM support are critically important to us and to existing Microsoft customers. I think we've done a great job supporting COM on the .NET platform. But people in the industry have been reading too much into our use of the words COM and DLL. They conclude that the .NET platform is for Windows platforms only, and that's absolutely incorrect.
Hejlsberg: And I think that just as COM interop is important to Microsoft and to customers who are building solutions on Microsoft platforms, a standardization of C# and the common language infrastructure should allow for other implementations to add meaningful interop with any platform on which they choose to implement the language.
Osborn: So you won't insist on there being a "pure C#" or a "pure .NET" implementation?
Hejlsberg: What's "pure?" How many "pure" Java applications really exist? I venture to guess very, very few. That's what the numbers I'm seeing suggest. Let's face it, people need to leverage their existing code. It's just not possible to require companies to throw everything away.
Osborn: No, I haven't.
Goodhew: Roger got the relevant section out of the Enterprise JavaBeans [EJB] specification, which talks about permissible vendor extensions. Not surprisingly, the vendor extensions include things like transaction management, which is fairly important in building enterprise systems, as well as security, messaging, and more. In an article, Sessions lists roughly eleven areas of functionality in which vendor-specific implementations are permitted. So, if you pick IBM WebSphere as your EJB implementation, the code that you write for your EJB application will inevitably lock you into WebSphere. This notion that Java is 100% pure and gives you 100% portability just isn't true. There's a great interview with James Gosling on IBM's developerWorks site in which he directly addresses this issue. He said, yeah, the whole write-once-run-anywhere, 100%-pure thing was a really goofy idea, and was more of a marketing thing. He says, in effect, "We didn't think we'd ever be able to deliver all that, and basically we haven't." Here's the inventor of the language saying that neither purity nor portability exists.
Osborn: Have we missed any untold great features or innovations in C# that you'd like to talk about?
Hejlsberg: There's one thing about the whole .NET framework and, by implication, about C# as well, that I'd like to mention, and that's the shift in how you build distributed applications. Not so long ago we were building two-tier client server apps, and then object protocols such as CORBA, IIOP, RMI, and DCOM came along. That type of programming is really the underpinning of an EJB, which is implemented as either CORBA or RMI underneath. We have learned to build these strongly connected distributed systems, but they don't scale. They just don't scale on the Web because they're stateful, they hold state on the server, and you can't just roll in another machine, plug it into the farm and have the thing replicate itself.
When we first sat down to design the .NET framework we took a step back and looked at what's actually happening on the Web. It's becoming this loosely connected, very distributed world, and we tried to understand what that does to your underlying programming model. And so we designed from the ground up with the assumption in place that distributed apps are built in a loosely connected, stateless fashion that gives you great scalability. You just scale out. You roll in more racks and plug them in. And once you make that fundamental assumption, it changes everything. It changes how you design your basic services, how you design all your messaging, and it even changes how you design your UI. It's a new programming model. We have chosen to use XML and SOAP as the means to make this model work. They are deeply integrated into .NET, and this integration is so core to every decision we have made in designing the .NET framework that it's not something you could just walk in and sprinkle on later.
Osborn: Could you point to one specific place where this would be evident to a programmer?
Hejlsberg: One fairly good example of this is how XML integrates with C#. We have this notion of "attributes" in C# that allows you to add declarative information to types and members. Just as you can say a member is public or private, you also want to be able to say this one's transacted, or this one's supposed to be a Web service, or this one is supposed to be serializable as XML. So we've added attributes to provide this generic mechanism, but then we utilize it in all of our Web services and XML infrastructure. We also give you the ability to put attributes on classes and on fields in your classes that say: "When this class goes to XML, it needs to become 'this' tagname in XML and it needs to go into 'this' XML namespace." You want to be able to say a specific field in one place becomes an element, and that another becomes an attribute. You also want to control the schema of the XML that goes out; control it where you're writing your class declaration, so that all of the additional declarative information is available. When attributes are properly used in this way to decorate your C# code, the system can simply turn a specific class into XML, send it over the wire, and when it comes back we can reconstitute the object on the other side. It's all done in one place. It's not like additional definition files or assorted infos and naming patterns. It's right there. It gives you statement completion when you build it in the IDE, and we can then provide you with higher-level tools that do the work for you.
I know I'm on a tangent here, but some of the infrastructure we provide is truly exciting. Simply because we have these attributes, you can ask our XML serialization infrastructure or our Web services infrastructure to translate any given class into XML. When you do, we'll actually take the schema for the class, the XSD schema, and we'll build a specialized parser that derives from our generic XML parser (which is part of the .NET base classes), and then override methods and add logic into the parser so that it is specialized for that schema. So we've instantiated a parser that at native code speed can rip through XML. If it's not correct, we'll give you a nice error message, which tells you precisely what went wrong. Then we cache it in our code-caching infrastructure, and it sits around until the next time a class with an identical schema comes by and it just goes, "Bam!" I mean, incredible, incredible throughput.
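[Editor's note: The attribute-driven XML mapping described above can be sketched roughly as follows. This is an illustration, not code from the interview. It assumes the XmlRoot, XmlAttribute, XmlElement, and XmlIgnore attributes and the XmlSerializer class from the System.Xml.Serialization namespace as they appear in released versions of .NET; the names available at the time of the interview may have differed. The PurchaseOrder type, its members, the XmlDemo helper, and the namespace URI are all hypothetical.]

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type: the class name, members, and namespace URI are
// assumptions made for this sketch only.
[XmlRoot("order", Namespace = "http://example.com/orders")]
public class PurchaseOrder
{
    // Becomes an XML attribute on the root element: <order id="...">
    [XmlAttribute("id")]
    public int Id;

    // Becomes a child element: <customer>...</customer>
    [XmlElement("customer")]
    public string Customer;

    // Left out of the XML entirely.
    [XmlIgnore]
    public string InternalNote;
}

public static class XmlDemo
{
    // Turns the object into XML text, guided only by the attributes above.
    public static string ToXml(PurchaseOrder order)
    {
        var serializer = new XmlSerializer(typeof(PurchaseOrder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    // Reconstitutes the object from XML text "on the other side".
    public static PurchaseOrder FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(PurchaseOrder));
        using (var reader = new StringReader(xml))
        {
            return (PurchaseOrder)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var original = new PurchaseOrder
        {
            Id = 42,
            Customer = "Contoso",
            InternalNote = "never serialized"
        };

        string xml = ToXml(original);
        Console.WriteLine(xml);

        PurchaseOrder copy = FromXml(xml);
        Console.WriteLine(copy.Id + " " + copy.Customer);
    }
}
```

[The point of the sketch is that the tag name, the XML namespace, and the element-versus-attribute choice all sit on the class declaration itself, with no separate mapping file. Whether and how the runtime generates and caches a specialized parser behind XmlSerializer is an implementation detail that this example does not demonstrate.]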
Osborn: So, there's a lot of really interesting engineering under the covers.
Hejlsberg: Yeah, and I just think that we're a generation ahead when it comes to the thinking in this space.
Osborn: Great. Thanks for your time.
Hejlsberg: You're welcome.
John Osborn is a senior editor with O'Reilly Media, Inc., responsible for Windows and .NET developer books, PDFs and other content.