C# and LINQ
Osborn: It took me two presentations to get it, but I realized that there's a set of innovations in C# that are interesting to look at in their own right, and [that the LINQ project] is really a layer on top of it. [We touched on that earlier, but] could you talk a little bit more about [the C# 3.0 language extensions that make LINQ possible]?
Hejlsberg: Yes, I think you do hit a good point there that is quite subtle. There are many ways we could've gone about this. And you can speculate: let's say that we all agree that, gosh, it would be great to have query inside your programming language. We're not the first to have this idea. So what is it that's different about our approach from those that have gone before?
If you compare it to, for example, embedded SQL or SQL/J or whatever, those extensions also give you the ability to put a query inside the programming language. But really what that approach is doing is just hosting one language within another. [SQL/J] is just hosting SQL in the middle of Java; embedded SQL is just hosting SQL in the middle of C, you know what I mean? With some escapes that take you from one world to the other, and some fairly crude binding mechanisms that go between the two worlds.
So [neither approach] really gives first-class treatment to the query language. It just puts one language within another. To me, a better approach is to identify and understand what it is, expressively, that is missing from, say, C#, in order for it to be rich enough that it could itself be the query language. And then isolate those features and add them to the language. But also, add them to the language in a way that does not bind us to a particular technology.
I mean, as I said in my talk yesterday, if we were to say that C# only works with SQL Server 2000 or whatever for queries, that'd be the death of C#. Not to imply that there's anything wrong with SQL Server 2000, of course, but as a programming language, you have to be one step removed from that. You need to always think about the class of problem, not the particular instance of the problem, right? Customers always come to you and say, "Here's the instance I want [solved]." And I always try to think, "Well, what is the class they're talking about?" You know what I'm saying? Effectively, the big difference in approach with C# 3.0 and the LINQ project is that there is complete separation between the features that we added to the language and the particular instances of use of those features, like DLinq [for relational databases], XLinq [for XML documents], and the standard query operators [for objects in memory].
And anyone could go write a different set of APIs if they feel that we didn't quite do it the best way, and all of the features in C# 3.0 are still relevant. So they stand on their own merit. But the synergy of the two, of course, is what drives the whole thing.
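To make that layering concrete, here is a loose Python analogue; it is not C# and not the LINQ API. It assumes two hypothetical helpers, `where` and `select`, written only against general-purpose language features (first-class functions and lazy generators). The same helpers then query in-memory objects and XML elements parsed with the standard-library `xml.etree.ElementTree`, without knowing which kind of source they are working on. The customer data is made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


# Hypothetical query operators, loosely modeled on LINQ's Where and Select.
# They rely only on general language features (lambdas, generators),
# not on any particular data source.
def where(source: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    return (item for item in source if predicate(item))


def select(source: Iterable[T], selector: Callable[[T], U]) -> Iterator[U]:
    return (selector(item) for item in source)


# Source 1: plain in-memory objects (hypothetical sample data).
@dataclass
class Customer:
    name: str
    city: str


customers = [Customer("Ana", "London"), Customer("Bo", "Paris"), Customer("Cy", "London")]

# Source 2: an XML document holding the same (hypothetical) data.
doc = ET.fromstring(
    "<customers>"
    "<customer name='Ana' city='London'/>"
    "<customer name='Bo' city='Paris'/>"
    "<customer name='Cy' city='London'/>"
    "</customers>"
)

# The same operators, given source-specific lambdas, query both sources.
from_objects = list(select(where(customers, lambda c: c.city == "London"), lambda c: c.name))
from_xml = list(
    select(where(doc.iter("customer"), lambda e: e.get("city") == "London"), lambda e: e.get("name"))
)

print(from_objects)  # ['Ana', 'Cy']
print(from_xml)      # ['Ana', 'Cy']
```

Pointing the operators at yet another source, such as rows from a database cursor, would only require another iterable; the operators themselves would not change, which is the kind of separation between language features and their uses that Hejlsberg describes.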
Osborn: I was hearing, in the C# presentation, a lot of "oohs" and "ahs" about extension methods, and some of the other innovations that you were going through. And [the extensions getting the accolades] didn't have anything to do with data per se.
Hejlsberg: Right. There are deep reasons for what we're doing, but these features have other good uses, too. Take lambda expressions, for example. The ability to create expression trees out of lambda expressions has lots of interesting uses in event notification systems, where you want to give the system a trigger, which is really a predicate, in some form that the system can reason about, and so forth.
Rules engines, constraint systems, there are all these other things that you can do with this stuff; we're just looking at the tip of the iceberg here. I think there's a whole bunch of stuff. In many ways, that's why so many people are interested in functional programming languages and why academia has so much energy going into that; there is a whole class of problems that all of a sudden become relevant once you have these capabilities in the programming language.
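In C#, the compiler can turn a lambda into an expression tree instead of an executable delegate. Python has no direct equivalent, so the following is only an analogue under stated assumptions: a hypothetical mini-library (`Field`, `Const`, `Binary`, `evaluate`, `fields_used` and `to_text` are illustrative names, not a real API) whose overloaded operators build a tree of nodes rather than computing a boolean. Because the trigger predicate arrives as data, the receiving system can both run it and inspect it, for example to find out which event fields it depends on.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Set

# Hypothetical mini expression-tree library (an illustration, not a real API).
# The comparison operators and '&' build nodes instead of evaluating immediately.
_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    ">": operator.gt,
    "<": operator.lt,
    "and": lambda a, b: a and b,
}


class Expr:
    def __gt__(self, other: Any) -> "Expr":
        return Binary(">", self, _wrap(other))

    def __lt__(self, other: Any) -> "Expr":
        return Binary("<", self, _wrap(other))

    def __and__(self, other: "Expr") -> "Expr":
        return Binary("and", self, other)


@dataclass(eq=False)
class Field(Expr):
    name: str  # refers to a field of the incoming event


@dataclass(eq=False)
class Const(Expr):
    value: Any


@dataclass(eq=False)
class Binary(Expr):
    op: str
    left: Expr
    right: Expr


def _wrap(value: Any) -> Expr:
    # Plain Python values become constant nodes in the tree.
    return value if isinstance(value, Expr) else Const(value)


def evaluate(expr: Expr, event: Mapping[str, Any]) -> Any:
    """Run the predicate against one event."""
    if isinstance(expr, Field):
        return event[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Binary):
        return _OPS[expr.op](evaluate(expr.left, event), evaluate(expr.right, event))
    raise TypeError(f"unknown node: {expr!r}")


def fields_used(expr: Expr) -> Set[str]:
    """Reason about the predicate: which event fields can affect its result?"""
    if isinstance(expr, Field):
        return {expr.name}
    if isinstance(expr, Binary):
        return fields_used(expr.left) | fields_used(expr.right)
    return set()


def to_text(expr: Expr) -> str:
    """Render the tree, e.g. for logging or translation into another query form."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Const):
        return repr(expr.value)
    if isinstance(expr, Binary):
        return f"({to_text(expr.left)} {expr.op} {to_text(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")


trigger = (Field("price") > 100) & (Field("qty") < 5)
print(to_text(trigger))                             # ((price > 100) and (qty < 5))
print(sorted(fields_used(trigger)))                 # ['price', 'qty']
print(evaluate(trigger, {"price": 120, "qty": 3}))  # True
```

A system holding `trigger` as a tree could, for instance, skip re-evaluation when an update touches neither `price` nor `qty`; with an opaque function it would have to run the predicate on every event.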
Osborn: One of the things you seemed to be saying in the presentations was that the approach that's built into C# has a lot of advantages over just plain SQL-type queries in terms of memory use and that sort of thing. Is that worth talking about?
Hejlsberg: Well, I think I was saying a couple of things there. First of all, SQL obviously only works for relational data, whereas the stuff we're doing here is much broader. It's a query language that works on objects, and those objects can represent relational data, or they can represent just in-memory instances of a class, or they can represent XML.
So in that sense, it's broader. In terms of the memory efficiencies of it, I think at that point I was talking about the XLinq API. You can think of XLinq as doing two things. It brings language-integrated query to XML, and it takes advantage of the ten years of experience that we have with the XML DOM, which is the foremost XML API that people use today in their applications.
And there are a bunch of design decisions that were made in the XML DOM that at the time seemed relevant, and then over time we realized that these are not important things to have. But unfortunately, because we have to have this particular feature, memory consumption doubles. It's like, you know, the ability to store unparsed entities in attributes; I know that this sounds like complete geek speak in XML terms, but it's like this completely esoteric feature that you need 0.001 percent of the time.
Yet the fact that you have to be able to represent it has a deep effect on how you internally structure your attribute storage. You can't just store an attribute as a string, because it now has to be a list of content elements and so forth, and stuff like that adds a bunch of overhead that we all pay for every day even though we never use it.
So, now that we have all this relevant information about how people use XML, we made some judicious choices that deeply shape the data structures that underlie the XLinq API. And we get some big benefits from that. I was citing savings of between 30 and 50 percent in memory consumption. And typically, you can equate that to better performance, too, because, hey, you use less memory, you're going to perform better, as a rule of thumb.
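The attribute-storage trade-off Hejlsberg mentions, keeping a value as a plain string versus as a list of content nodes, can be sketched in Python. The classes below are hypothetical and only loosely modeled on the DOM idea that an attribute node may hold child nodes such as text and entity references; they are not the actual internals of the XML DOM or XLinq. `deep_size` is a rough, CPython-specific estimate built on `sys.getsizeof`, so absolute numbers will vary by interpreter version; only the comparison is meaningful.

```python
import sys
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union


@dataclass
class Text:
    data: str


@dataclass
class EntityRef:
    name: str  # e.g. "amp" for &amp;


@dataclass
class GeneralAttr:
    """Hypothetical: value kept as a list of content nodes, so rare cases such
    as entity references can be represented. Every attribute pays for the list."""
    name: str
    children: List[Union[Text, EntityRef]] = field(default_factory=list)

    @property
    def value(self) -> str:
        return "".join(c.data if isinstance(c, Text) else f"&{c.name};" for c in self.children)


@dataclass
class SimpleAttr:
    """Hypothetical: value kept directly as a string, the common case."""
    name: str
    value: str


def deep_size(obj: object, seen: Optional[Set[int]] = None) -> int:
    """Rough estimate: sum sys.getsizeof over the objects reachable from obj."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(deep_size(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += deep_size(vars(obj), seen)  # instance attributes
    return size


simple = SimpleAttr("lang", "en-US")
general = GeneralAttr("lang", [Text("en-US")])
assert simple.value == general.value == "en-US"
print(deep_size(simple), deep_size(general))  # the general form is larger
```

Every attribute in a document built this way pays for the extra list and node objects, even though almost none of them ever contain an entity reference; that per-instance cost is the kind of overhead the simpler representation avoids.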