
3.6 Limitations of generics in C# and other languages

3.6.1 Lack of covariance and contravariance

In section 2.3.2, we looked at the covariance of arrays—the fact that an array of a reference type can be viewed as an array of its base type, or an array of any of the interfaces

it implements. Generics don’t support this—they are invariant. This is for the sake of

type safety, as we’ll see, but it can be annoying.

WHY DON’T GENERICS SUPPORT COVARIANCE?



Let’s suppose we have two classes, Animal and Cat, where Cat derives from Animal. In

the code that follows, the array code (on the left) is valid C# 2; the generic code (on

the right) isn’t:

Valid (at compile-time):            Invalid:

Animal[] animals = new Cat[5];      List<Animal> animals = new List<Cat>();
animals[0] = new Animal();          animals.Add(new Animal());



The compiler has no problem with the second line in either case, but the first line on

the right causes the error:

error CS0029: Cannot implicitly convert type
    'System.Collections.Generic.List<Cat>' to
    'System.Collections.Generic.List<Animal>'



This was a deliberate choice on the part of the framework and language designers. The

obvious question to ask is why this is prohibited—and the answer lies on the second

line. There is nothing about the second line that should raise any suspicion. After all,

List<Animal> effectively has a method with the signature void Add(Animal value):

you should be able to put a Turtle into any list of animals, for instance. However, the

actual object referred to by animals is a Cat[] (in the code on the left) or a List<Cat>

(on the right), both of which require that only references to instances of Cat are stored

in them. Although the array version will compile, it will fail at execution time. This was

deemed by the designers of generics to be worse than failing at compile time, which is

reasonable—the whole point of static typing is to find out about errors before the code

ever gets run.
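The execution-time failure of the array version can be observed directly. The following is a minimal sketch, assuming the Animal and Cat classes from the text are empty classes; the ArrayCovarianceDemo class and its method name are made up for illustration.

```csharp
using System;

public class Animal { }
public class Cat : Animal { }

public static class ArrayCovarianceDemo
{
    // Returns true if storing a plain Animal into a Cat[] (viewed as an
    // Animal[]) is rejected when the code runs.
    public static bool StoreFailsAtExecutionTime()
    {
        Animal[] animals = new Cat[5];    // compiles: arrays are covariant
        try
        {
            animals[0] = new Animal();    // compiles, but the CLR checks the element type
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;                  // the store was refused at execution time
        }
    }
}
```

The equivalent generic code never gets this far, because the assignment of a List<Cat> to a List<Animal> variable is rejected by the compiler.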

NOTE



So why are arrays covariant? Having answered the question about why

generics are invariant, the next obvious step is to question why arrays are

covariant. According to the Common Language Infrastructure Annotated

Standard (Addison-Wesley Professional, 2003), for the first edition the

designers wished to reach as broad an audience as possible, which included

being able to run code compiled from Java source. In other words, .NET has

covariant arrays because Java has covariant arrays—despite this being a

known “wart” in Java.



So, that’s why things are the way they are—but why should you care, and how can you

get around the restriction?



CHAPTER 3 Parameterized typing with generics



WHERE COVARIANCE WOULD BE USEFUL



Suppose you are implementing a platform-agnostic storage system,11 which could run

across WebDAV, NFS, Samba, NTFS, ReiserFS, files in a database, you name it. You may

have the idea of storage locations, which may contain sublocations (think of directories

containing files and more directories, for instance). You could have an interface like this:

public interface IStorageLocation
{
    Stream OpenForRead();
    ...
    IEnumerable<IStorageLocation> GetSublocations();
}



That all seems reasonable and easy to implement. The problem comes when your

implementation (FabulousStorageLocation for instance) stores its list of sublocations for any particular location as List<FabulousStorageLocation>. You might
expect to be able to either return the list reference directly, or possibly call AsReadOnly to avoid clients tampering with your list, and return the result, but that would
be an implementation of IEnumerable<FabulousStorageLocation> instead of an
IEnumerable<IStorageLocation>.

Here are some options:

• Make your list a List<IStorageLocation> instead. This is likely to mean you need to cast every time you fetch an entry in order to get at your implementation-specific behavior. You might as well not be using generics in the first place.

• Implement GetSublocations using the funky new iteration features of C# 2, as described in chapter 6. That happens to work in this example, because the interface uses IEnumerable<IStorageLocation>. It wouldn’t work if we had to return an IList<IStorageLocation> instead. It also requires each implementation to have the same kind of code. It’s only a few lines, but it’s still inelegant.

• Create a new copy of the list, this time as List<IStorageLocation>. In some cases (particularly if the interface did require you to return an IList<IStorageLocation>), this would be a good thing to do anyway: it keeps the list returned separate from the internal list. You could even use List<T>.ConvertAll to do it in a single line. It involves copying everything in the list, though, which may be an unnecessary expense if you trust your callers to use the returned list reference appropriately.

• Make the interface generic, with the type parameter representing the actual type of storage sublocation being represented. For instance, FabulousStorageLocation might implement IStorageLocation<FabulousStorageLocation>. It looks a little odd, but this recursive-looking use of generics can be quite useful at times.12

• Create a generic helper method (preferably in a common class library) that converts IEnumerator<TSource> to IEnumerator<TDest>, where TSource derives from TDest.

11 Yes, another one.

12 For instance, you might have a type parameter T with a constraint that any instance can be compared to another instance of T for equality; in other words, something like MyClass<T> where T : IEquatable<T>.
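Two of the options above, copying with ConvertAll and a generic helper method, can be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: the Name property, the VarianceWorkarounds class, and the CopyAsInterfaces and Upcast method names are invented here, and the helper works on IEnumerable<T> (via an iterator block) rather than on IEnumerator<T> as the last option describes.

```csharp
using System;
using System.Collections.Generic;

// Cut-down, hypothetical versions of the types discussed in the text.
public interface IStorageLocation
{
    string Name { get; }
}

public class FabulousStorageLocation : IStorageLocation
{
    private readonly string name;
    public FabulousStorageLocation(string name) { this.name = name; }
    public string Name { get { return name; } }
}

public static class VarianceWorkarounds
{
    // Copy option: build a new List<IStorageLocation> from the internal list.
    // Each element is converted with an implicit reference conversion.
    public static List<IStorageLocation> CopyAsInterfaces(
        List<FabulousStorageLocation> source)
    {
        return source.ConvertAll<IStorageLocation>(
            delegate(FabulousStorageLocation location) { return location; });
    }

    // Helper option: lazily view a sequence of TSource as a sequence of TDest.
    // The constraint guarantees that every TSource is also a TDest.
    public static IEnumerable<TDest> Upcast<TSource, TDest>(
        IEnumerable<TSource> source)
        where TSource : TDest
    {
        foreach (TSource item in source)
        {
            yield return item;
        }
    }
}
```

The copy is independent of the internal list, whereas the iterator-based helper copies nothing and reflects the internal list as it is when the caller iterates.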






When you run into covariance issues, you may need to consider all of these options

and anything else you can think of. It depends heavily on the exact nature of the situation. Unfortunately, covariance isn’t the only problem we have to consider. There’s

also the matter of contravariance, which is like covariance in reverse.

WHERE CONTRAVARIANCE WOULD BE USEFUL



Contravariance feels slightly less intuitive than covariance, but it does make sense.

Where covariance is about declaring that we will return a more specific object from a

method than the interface requires us to, contravariance is about being willing to

accept a more general parameter.

For instance, suppose we had an IShape interface13 that contained the Area property. It’s easy to write an implementation of IComparer<IShape> that sorts by area.
We’d then like to be able to write the following code:

IComparer<IShape> areaComparer = new AreaComparer();
List<Circle> circles = new List<Circle>();
circles.Add(new Circle(20));
circles.Add(new Circle(10));
circles.Sort(areaComparer);



That won’t work, though, because the Sort method on List<Circle> effectively takes
an IComparer<Circle>. The fact that our AreaComparer can compare any shape
rather than just circles doesn’t impress the compiler at all. It considers IComparer<Circle>
and IComparer<IShape> to be completely different types. Maddening, isn’t
it? It would be nice if the Sort method had this signature instead:

void Sort<S>(IComparer<S> comparer) where T : S



Unfortunately, not only is that not the signature of Sort, but it can’t be—the constraint is invalid, because it’s a constraint on T instead of S. We want a derivation type

constraint but in the other direction, constraining the S to be somewhere up the

inheritance tree of T instead of down.

Given that this isn’t possible, what can we do? There are fewer options this time

than before. First, you could create a generic class with the following declaration:

ComparisonHelper<TBase,TDerived> : IComparer<TDerived>
    where TDerived : TBase

You’d then create a constructor that takes (and stores) an IComparer<TBase> as a
parameter. The implementation of IComparer<TDerived> would just return the result
of calling the Compare method of the IComparer<TBase>. You could then sort the
List<Circle> by creating a new ComparisonHelper<IShape,Circle> that uses the
area comparison.

The second option is to make the area comparison class generic, with a derivation

constraint, so it can compare any two values of the same type, as long as that type

implements IShape. Of course, you can only do this when you’re able to change the

comparison class—but it’s a nice solution when it’s available.
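Both options can be sketched in a few lines. The code below is an illustration under assumptions: the text only says that IShape has an Area property and that Circle takes a single constructor argument, so the double type, the radius interpretation, and the GenericAreaComparer name are choices made here.

```csharp
using System;
using System.Collections.Generic;

public interface IShape
{
    double Area { get; }
}

public class Circle : IShape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public double Area { get { return Math.PI * radius * radius; } }
}

// Compares any two shapes by area.
public class AreaComparer : IComparer<IShape>
{
    public int Compare(IShape x, IShape y)
    {
        return x.Area.CompareTo(y.Area);
    }
}

// First option: adapt an IComparer<TBase> so it can be used wherever an
// IComparer<TDerived> is required.
public class ComparisonHelper<TBase, TDerived> : IComparer<TDerived>
    where TDerived : TBase
{
    private readonly IComparer<TBase> comparer;

    public ComparisonHelper(IComparer<TBase> comparer)
    {
        this.comparer = comparer;
    }

    public int Compare(TDerived x, TDerived y)
    {
        // TDerived converts implicitly to TBase because of the constraint.
        return comparer.Compare(x, y);
    }
}

// Second option: make the comparison class itself generic, constrained to shapes.
public class GenericAreaComparer<T> : IComparer<T> where T : IShape
{
    public int Compare(T x, T y)
    {
        return x.Area.CompareTo(y.Area);
    }
}
```

With these in place, circles.Sort(new ComparisonHelper<IShape, Circle>(new AreaComparer())) and circles.Sort(new GenericAreaComparer<Circle>()) both compile against a List<Circle>. Neither comparer handles null references; a production version would need to decide how to order them.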



13 You didn’t really expect to get through the whole book without seeing a shape-related example, did you?






Notice that the various options for both covariance and contravariance use more

generics and constraints to express the interface in a more general manner, or to provide generic “helper” methods. I know that adding a constraint makes it sound less

general, but the generality is added by first making the type or method generic. When

you run into a problem like this, adding a level of genericity somewhere with an

appropriate constraint should be the first option to consider. Generic methods (rather

than generic types) are often helpful here, as type inference can make the lack of variance invisible to the naked eye. This is particularly true in C# 3, which has stronger

type inference capabilities than C# 2.

NOTE



Is this really the best we can do?—As we’ll see later, Java supports covariance

and contravariance within its generics—so why can’t C#? Well, a lot of it

boils down to the implementation—the fact that the Java runtime

doesn’t get involved with generics; it’s basically a compile-time feature.

However, the CLR does support limited generic covariance and contravariance, just on interfaces and delegates. C# doesn’t expose this feature

(neither does VB.NET), and none of the framework libraries use it. The

C# compiler consumes covariant and contravariant interfaces as if they

were invariant. Adding variance is under consideration for C# 4,

although no firm commitments have been made. Eric Lippert has written

a whole series of blog posts about the general problem, and what might

happen in future versions of C#: http://blogs.msdn.com/ericlippert/archive/tags/Covariance+and+Contravariance/default.aspx.



This limitation is a very common cause of questions on C# discussion groups. The

remaining issues are either relatively academic or affect only a moderate subset of the

development community. The next one mostly affects those who do a lot of calculations (usually scientific or financial) in their work.



3.6.2 Lack of operator constraints or a “numeric” constraint

C# is not without its downside when it comes to heavily mathematical code. The need

to explicitly use the Math class for every operation beyond the simplest arithmetic and

the lack of C-style typedefs to allow the data representation used throughout a program to be easily changed have always been raised by the scientific community as barriers to C#’s adoption. Generics weren’t likely to fully solve either of those issues, but

there’s a common problem that stops generics from helping as much as they could

have. Consider this (illegal) generic method:

public T FindMean<T>(IEnumerable<T> data)
{
    T sum = default(T);
    int count = 0;
    foreach (T datum in data)
    {
        sum += datum;
        count++;
    }
    return sum/count;
}



Obviously that could never work for all types of data—what could it mean to add one

Exception to another, for instance? Clearly a constraint of some kind is called for…

something that is able to express what we need to be able to do: add two instances of T

together, and divide a T by an integer. If that were available, even if it were limited to

built-in types, we could write generic algorithms that wouldn’t care whether they were

working on an int, a long, a double, a decimal, and so forth. Limiting it to the built-in types would have been disappointing but better than nothing. The ideal solution
would have to also allow user-defined types to act in a numeric capacity, so you could
define a Complex type to handle complex numbers, for instance. That complex number could then store each of its components in a generic way as well, so you could
have a Complex<float>, a Complex<double>, and so on.14

Two related solutions present themselves. One would be simply to allow constraints on operators, so you could write a set of constraints such as

where T : T operator+ (T,T), T operator/ (T, int)



This would require that T have the operations we need in the earlier code. The other

solution would be to define a few operators and perhaps conversions that must be supported in order for a type to meet the extra constraint—we could make it the

“numeric constraint” written where T : numeric.

One problem with both of these options is that they can’t be expressed as normal

interfaces, because operator overloading is performed with static members, which

can’t implement interfaces. It would require a certain amount of shoehorning, in

other words.

Various smart people (including Eric Gunnerson and Anders Hejlsberg, who

ought to be able to think of C# tricks if anyone can) have thought about this, and with

a bit of extra code, some solutions have been found. They’re slightly clumsy, but they

work. Unfortunately, due to current JIT optimization limitations, you have to pick

between pleasant syntax (x=y+z) that reads nicely but performs poorly, and a method-based syntax (x=y.Add(z)) that performs without significant overhead but looks like a

dog’s dinner when you’ve got anything even moderately complicated going on.

The details are beyond the scope of this book, but are very clearly presented in an article on the matter at
http://www.lambda-computing.com/publications/articles/generics2/.
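To give a flavor of the method-based style, here is one possible sketch of a legal FindMean. It is not taken from the article referenced above: the ICalculator<T> interface, the DoubleCalculator class, and the Statistics class are all hypothetical names introduced for illustration, and the code assumes that default(T) is a suitable zero for the type being summed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface standing in for the missing operator constraint.
public interface ICalculator<T>
{
    T Add(T left, T right);
    T DivideByInt(T value, int divisor);
}

// One implementation per numeric type that needs to be supported.
public class DoubleCalculator : ICalculator<double>
{
    public double Add(double left, double right) { return left + right; }
    public double DivideByInt(double value, int divisor) { return value / divisor; }
}

public static class Statistics
{
    public static T FindMean<T>(IEnumerable<T> data, ICalculator<T> calculator)
    {
        T sum = default(T);   // assumed to be the additive identity for T
        int count = 0;
        foreach (T datum in data)
        {
            sum = calculator.Add(sum, datum);   // replaces sum += datum
            count++;
        }
        if (count == 0)
        {
            throw new InvalidOperationException("Cannot take the mean of no values");
        }
        return calculator.DivideByInt(sum, count);   // replaces sum/count
    }
}
```

The arithmetic now goes through interface calls instead of operators, which is exactly the readability cost described above: every + and / in the algorithm becomes a method call on a separate object.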

The two limitations we’ve looked at so far have been quite practical—they’ve been

issues you may well run into during actual development. However, if you’re generally

curious like I am, you may also be asking yourself about other limitations that don’t

necessarily slow down development but are intellectual curiosities. In particular, just

why are generics limited to types and methods?

14 More mathematically minded readers might want to consider what a Complex<Complex<double>> would
mean. You’re on your own there, I’m afraid.



3.6.3 Lack of generic properties, indexers, and other member types

We’ve seen generic types (classes, structs, delegates, and interfaces) and we’ve seen

generic methods. There are plenty of other members that could be parameterized.

However, there are no generic properties, indexers, operators, constructors, finalizers, or events. First let’s be clear about what we mean here: clearly an indexer can have

a return type that is a type parameter; List<T> is an obvious example. KeyValuePair<TKey,TValue> provides similar examples for properties. What you can’t have is

an indexer or property (or any of the other members in that list) with extra type

parameters. Leaving the possible syntax of declaration aside for the minute, let’s look

at how these members might have to be called:

SomeClass<string> instance = new SomeClass<string><Guid>("x");
int x = instance.SomeProperty<int>;
byte y = instance.SomeIndexer<byte>["key"];
instance.Click<byte> += ByteHandler;
instance = instance +<int> instance;



I hope you’ll agree that all of those look somewhat silly. Finalizers can’t even be called

explicitly from C# code, which is why there isn’t a line for them. The fact that we can’t

do any of these isn’t going to cause significant problems anywhere, as far as I can

see—it’s just worth being aware of it as an academic limitation.

The one exception to this is possibly the constructor. However, a static generic

method in the class is a good workaround for this, and the syntax with two lists of type

arguments is horrific.
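The static generic method workaround for constructors might look like the following sketch. The Wrapper<T> class and its Create method are hypothetical, invented here to show the shape of the pattern: the class has one type parameter, and the factory method introduces a second one that a constructor could not declare.

```csharp
using System;

public class Wrapper<T>
{
    private readonly T wrapped;

    // The constructor can only use the type parameters of the class itself.
    private Wrapper(T wrapped)
    {
        this.wrapped = wrapped;
    }

    public T Value { get { return wrapped; } }

    // A static generic method can introduce the extra type parameter (TSource)
    // that a "generic constructor" would have needed.
    public static Wrapper<T> Create<TSource>(TSource source,
                                             Converter<TSource, T> converter)
    {
        return new Wrapper<T>(converter(source));
    }
}
```

A call then reads Wrapper<int>.Create<string>("42", delegate(string text) { return int.Parse(text); }), which keeps the two lists of type arguments in separate places instead of stacking them after the class name.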

These are by no means the only limitations of C# generics, but I believe they’re the

ones that you’re most likely to run up against, either in your daily work, in community

conversations, or when idly considering the feature as a whole. In our next two sections we’ll see how some aspects of these aren’t issues in the two languages whose features are most commonly compared with C#’s generics: C++ (with templates) and Java

(with generics as of Java 5). We’ll tackle C++ first.



3.6.4 Comparison with C++ templates

C++ templates are a bit like macros taken to an extreme level. They’re incredibly powerful, but have costs associated with them both in terms of code bloat and ease of

understanding.

When a template is used in C++, the code is compiled for that particular set of template arguments, as if the template arguments were in the source code. This means that

there’s not as much need for constraints, as the compiler will check whether you’re

allowed to do everything you want to with the type anyway while it’s compiling the code

for this particular set of template arguments. The C++ standards committee has recognized that constraints are still useful, though, and they will be present in C++0x (the

next version of C++) under the name of concepts.

The C++ compiler is smart enough to compile the code only once for any given set

of template arguments, but it isn’t able to share code in the way that the CLR does with






reference types. That lack of sharing does have its benefits, though: it allows type-specific optimizations, such as inlining method calls for some type parameters but not

others, from the same template. It also means that overload resolution can be performed separately for each set of type parameters, rather than just once based solely

on the limited knowledge the C# compiler has due to any constraints present.

Don’t forget that with “normal” C++ there’s only one compilation involved, rather

than the “compile to IL” then “JIT compile to native code” model of .NET. A program

using a standard template in ten different ways will include the code ten times in a C++

program. A similar program in C# using a generic type from the framework in ten different ways won’t include the code for the generic type at all—it will refer to it, and the

JIT will compile as many different versions as required (as described in section 3.4.2) at

execution time.

One significant feature that C++ templates have over C# generics is that the template

arguments don’t have to be type names. Variable names, function names, and constant

expressions can be used as well. A common example of this is a buffer type that has the

size of the buffer as one of the template arguments, so a buffer<int,20> will always
be a buffer of 20 integers, and a buffer<double,35> will always be a buffer of 35 doubles.
This ability is crucial to template metaprogramming,15 an advanced C++ technique the

very idea of which scares me, but that can be very powerful in the hands of experts.

C++ templates are more flexible in other ways, too. They don’t suffer from the

problem described in 3.6.2, and there are a few other restrictions that don’t exist in

C++: you can derive a class from one of its type parameters, and you can specialize a

template for a particular set of type arguments. The latter ability allows the template

author to write general code to be used when there’s no more knowledge available

but specific (often highly optimized) code for particular types.

The same variance issues of .NET generics exist in C++ templates as well—an

example given by Bjarne Stroustrup16 is that there are no implicit conversions
between Vector<shape*> and Vector<circle*> with similar reasoning: in this case,

it might allow you to put a square peg in a round hole.

For further details of C++ templates, I recommend Stroustrup’s The C++

Programming Language (Addison-Wesley, 1991). It’s not always the easiest book to

follow, but the templates chapter is fairly clear (once you get your mind around C++

terminology and syntax). For more comparisons with .NET generics, look at the blog

post by the Visual C++ team on this topic: http://blogs.msdn.com/branbray/archive/2003/11/19/51023.aspx.

The other obvious language to compare with C# in terms of generics is Java, which

introduced the feature into the mainstream language for the 1.5 release,17 several

years after other projects had compilers for their Java-like languages.



15 http://en.wikipedia.org/wiki/Template_metaprogramming

16 The inventor of C++.

17 Or 5.0, depending on which numbering system you use. Don’t get me started.



3.6.5 Comparison with Java generics

Where C++ includes more of the template in the generated code than C# does, Java

includes less. In fact, the Java runtime doesn’t know about generics at all. The Java

bytecode (roughly equivalent terminology to IL) for a generic type includes some

extra metadata to say that it’s generic, but after compilation the calling code doesn’t

have much to indicate that generics were involved at all—and certainly an instance of

a generic type only knows about the nongeneric side of itself. For example, an

instance of HashSet<T> doesn’t know whether it was created as a HashSet<String> or
a HashSet<Object>. The compiler effectively just adds casts where necessary and performs more sanity checking. Here’s an example, starting with the generic Java code:

ArrayList<String> strings = new ArrayList<String>();

strings.add("hello");

String entry = strings.get(0);

strings.add(new Object());



and now the equivalent nongeneric code:

ArrayList strings = new ArrayList();

strings.add("hello");

String entry = (String) strings.get(0);

strings.add(new Object());



They would generate the same Java bytecode, except for the last line—which is valid

in the nongeneric case but caught by the compiler as an error in the generic version.

You can use a generic type as a “raw” type, which is equivalent to using

java.lang.Object for each of the type arguments. This rewriting—and loss of information—is called type erasure. Java doesn’t have user-defined value types, but you can’t

even use the built-in ones as type arguments. Instead, you have to use the boxed version: ArrayList<Integer> for a list of integers, for example.
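For contrast, the C# side of this comparison can be checked with a few lines of reflection. This is a small sketch (the RuntimeTypeDemo class and its method name are made up for illustration) showing that a constructed generic object in .NET does know its type argument at execution time, which is precisely the information that type erasure discards in Java.

```csharp
using System;
using System.Collections.Generic;

public static class RuntimeTypeDemo
{
    // Returns true if a List<string> instance, seen only as an object,
    // still knows that it is a List<string> and not a List<int>.
    public static bool ListKnowsItsElementType()
    {
        object strings = new List<string>();
        Type runtimeType = strings.GetType();

        return runtimeType == typeof(List<string>)
            && runtimeType != typeof(List<int>)
            && runtimeType.GetGenericArguments()[0] == typeof(string);
    }
}
```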

You may be forgiven for thinking this is all a bit disappointing compared with

generics in C#, but there are some nice features of Java generics too:

• The runtime doesn’t know anything about generics, so you can use code compiled using generics on an older version, as long as you don’t use any classes or methods that aren’t present on the old version. Versioning in .NET is much stricter in general: you have to compile using the oldest environment you want to run on. That’s safer, but less flexible.

• You don’t need to learn a new set of classes to use Java generics: where a nongeneric developer would use ArrayList, a generic developer just uses ArrayList<T>. Existing classes can reasonably easily be “upgraded” to generic versions.

• The previous feature has been utilized quite effectively with the reflection system: java.lang.Class<T> (the equivalent of System.Type) is generic, which allows compile-time type safety to be extended to cover many situations involving reflection. In some other situations it’s a pain, however.

• Java has support for covariance and contravariance using wildcards. For instance, ArrayList<? extends Base> can be read as “this is an ArrayList of some type that derives from Base, but we don’t know which exact type.”






My personal opinion is that .NET generics are superior in almost every respect,

although every time I run into a covariance/contravariance issue I suddenly wish I

had wildcards. Java with generics is still much better than Java without generics, but

there are no performance benefits and the safety only applies at compile time. If

you’re interested in the details, they’re in the Java language specification, or you

could read Gilad Bracha’s excellent guide to them at http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf.



3.7 Summary

Phew! It’s a good thing generics are simpler to use in reality than they are in description. Although they can get complicated, they’re widely regarded as the most important addition to C# 2 and are incredibly useful. The worst thing about writing code

using generics is that if you ever have to go back to C# 1, you’ll miss them terribly.

In this chapter I haven’t tried to cover absolutely every detail of what is and isn’t

allowed when using generics—that’s the job of the language specification, and it

makes for very dry reading. Instead, I’ve aimed for a practical approach, providing the

information you’ll need in everyday use, with a smattering of theory for the sake of

academic interest.

We’ve seen three main benefits to generics: compile-time type safety, performance,

and code expressiveness. Being able to get the IDE and compiler to validate your code

early is certainly a good thing, but it’s arguable that more is to be gained from tools providing intelligent options based on the types involved than the actual “safety” aspect.

Performance is improved most radically when it comes to value types, which no

longer need to be boxed and unboxed when they’re used in strongly typed generic

APIs, particularly the generic collection types provided in .NET 2.0. Performance with

reference types is usually improved but only slightly.

Your code is able to express its intention more clearly using generics—instead of a

comment or a long variable name required to describe exactly what types are

involved, the details of the type itself can do the work. Comments and variable names

can often become inaccurate over time, as they can be left alone when code is

changed—but the type information is “correct” by definition.

Generics aren’t capable of doing everything we might sometimes like them to do,

and we’ve studied some of their limitations in the chapter, but if you truly embrace

C# 2 and the generic types within the .NET 2.0 Framework, you’ll come across good

uses for them incredibly frequently in your code.

This topic will come up time and time again in future chapters, as other new features build on this key one. Indeed, the subject of our next chapter would be very

different without generics—we’re going to look at nullable types, as implemented

by Nullable<T>.



Saying nothing with nullable types

This chapter covers

• Motivation for null values
• Framework and runtime support
• Language support in C# 2
• Patterns using nullable types

Nullity is a concept that has provoked a certain amount of debate over the years. Is

a null reference a value, or the absence of a value? Is “nothing” a “something”? In

this chapter, I’ll try to stay more practical than philosophical. First we’ll look at why

there’s a problem in the first place—why you can’t set a value type variable to null

in C# 1 and what the traditional alternatives have been. After that I’ll introduce you

to our knight in shining armor, System.Nullable<T>, before we see how C# 2

makes working with nullable types a bit simpler and more compact. Like generics,

nullable types sometimes have some uses beyond what you might expect, and we’ll

look at a few examples of these at the end of the chapter.

So, when is a value not a value? Let’s find out.



4.1 What do you do when you just don’t have a value?

The C# and .NET designers don’t add features just for kicks. There has to be a real, significant problem to be fixed before they’ll go as far as changing C# as a language or

.NET at the platform level. In this case, the problem is best summed up in one of the

most frequently asked questions in C# and .NET discussion groups:

I need to set my DateTime1 variable to null, but the compiler won’t let me.

What should I do?

It’s a question that comes up fairly naturally—a simple example might be in an

e-commerce application where users are looking at their account history. If an order

has been placed but not delivered, there may be a purchase date but no dispatch

date—so how would you represent that in a type that is meant to provide the

order details?

The answer to the question is usually in two parts: first, why you can’t just use null

in the first place, and second, which options are available. Let’s look at the two parts separately—assuming that the developer asking the question is using C# 1.



4.1.1



Why value type variables can’t be null

As we saw in chapter 2, the value of a reference type variable is a reference, and the

value of a value type variable is the “real” value itself. A “normal” reference value is

some way of getting at an object, but null acts as a special value that means “I don’t

refer to any object.” If you want to think of references as being like URLs, null is (very

roughly speaking) the reference equivalent of about:blank. It’s represented as all

zeroes in memory (which is why it’s the default value for all reference types—clearing

a whole block of memory is cheap, so that’s the way objects are initialized), but it’s still

basically stored in the same way as other references. There’s no “extra bit” hidden

somewhere for each reference type variable. That means we can’t use the “all zeroes”

value for a “real” reference, but that’s OK: our memory is going to run out long

before we have that many live objects anyway.

The last sentence is the key to why null isn’t a valid value type value, though. Let’s

consider the byte type as a familiar one that is easy to think about. The value of a variable of type byte is stored in a single byte—it may be padded for alignment purposes,

but the value itself is conceptually only made up of one byte. We’ve got to be able to

store the values 0–255 in that variable; otherwise it’s useless for reading arbitrary

binary data. So, with the 256 “normal” values and one null value, we’d have to cope

with a total of 257 values, and there’s no way of squeezing that many values into a single byte. Now, the designers could have decided that every value type would have an

extra flag bit somewhere determining whether a value was null or a “real” value, but

the memory usage implications are horrible, not to mention the fact that we’d have to

check the flag every time we wanted to use the value. So in a nutshell, with value types



1 It’s almost always DateTime rather than any other value type. I’m not entirely sure why. It’s as if developers
inherently understand why a byte shouldn’t be null, but feel that dates are more “inherently nullable.”


