3.6 Limitations of generics in C# and other languages


3.6.1 Lack of covariance and contravariance

In section 2.3.2, we looked at the covariance of arrays: the fact that an array of a reference type can be viewed as an array of its base type, or as an array of any of the interfaces it implements. Generics don’t support this; they are invariant. This is for the sake of type safety, as we’ll see, but it can be annoying.

WHY DON’T GENERICS SUPPORT COVARIANCE?

Let’s suppose we have two classes, Animal and Cat, where Cat derives from Animal. In the code that follows, the array code (on the left) is valid C# 2; the generic code (on the right) isn’t:

Valid (at compile-time):            Invalid:
Animal[] animals = new Cat[5];      List<Animal> animals = new List<Cat>();
animals[0] = new Animal();          animals.Add(new Animal());

The compiler has no problem with the second line in either case, but the first line on the right causes the error:

error CS0029: Cannot implicitly convert type 'System.Collections.Generic.List<Cat>' to 'System.Collections.Generic.List<Animal>'

This was a deliberate choice on the part of the framework and language designers. The obvious question to ask is why this is prohibited, and the answer lies on the second line. There is nothing about the second line that should raise any suspicion. After all, List<Animal> effectively has a method with the signature void Add(Animal value), and you should be able to put a Turtle into any list of animals, for instance. However, the actual object referred to by animals is a Cat[] (in the code on the left) or a List<Cat> (on the right), both of which require that only references to instances of Cat are stored in them. Although the array version will compile, it will fail at execution time with an ArrayTypeMismatchException. This was deemed by the designers of generics to be worse than failing at compile time, which is reasonable: the whole point of static typing is to find out about errors before the code ever gets run.
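To make the execution-time failure concrete, here is a minimal sketch of the array half of the comparison. The Animal and Cat classes are empty placeholders, and the ArrayCovarianceDemo class and its method name are hypothetical, introduced only for this illustration.

```csharp
using System;

public class Animal { }
public class Cat : Animal { }

// Hypothetical demo class; the name is an assumption for illustration only.
public static class ArrayCovarianceDemo
{
    // Returns true if the store is rejected at execution time.
    public static bool StoringAnimalInCatArrayFails()
    {
        Animal[] animals = new Cat[5];   // compiles: arrays are covariant
        try
        {
            animals[0] = new Animal();   // compiles, but the array is really a Cat[]
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;                 // the CLR checks the element type on every store
        }
    }

    // The generic equivalent never reaches execution time:
    // List<Animal> list = new List<Cat>();   // error CS0029
}
```

The array version pays for its flexibility with a type check on each store, whereas the generic version is rejected before the program can run at all.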

NOTE  So why are arrays covariant? Having answered the question about why generics are invariant, the next obvious step is to question why arrays are covariant. According to the Common Language Infrastructure Annotated Standard (Addison-Wesley Professional, 2003), for the first edition the designers wished to reach as broad an audience as possible, which included being able to run code compiled from Java source. In other words, .NET has covariant arrays because Java has covariant arrays, despite this being a known “wart” in Java.

So, that’s why things are the way they are. But why should you care, and how can you get around the restriction?


CHAPTER 3

Parameterized typing with generics

WHERE COVARIANCE WOULD BE USEFUL

Suppose you are implementing a platform-agnostic storage system,[11] which could run across WebDAV, NFS, Samba, NTFS, ReiserFS, files in a database, you name it. You may have the idea of storage locations, which may contain sublocations (think of directories containing files and more directories, for instance). You could have an interface like this:

public interface IStorageLocation
{
    Stream OpenForRead();
    ...
    IEnumerable<IStorageLocation> GetSublocations();
}

That all seems reasonable and easy to implement. The problem comes when your implementation (FabulousStorageLocation for instance) stores its list of sublocations for any particular location as List<FabulousStorageLocation>. You might expect to be able to either return the list reference directly, or possibly call AsReadOnly to avoid clients tampering with your list, and return the result. But that would be an implementation of IEnumerable<FabulousStorageLocation> instead of an IEnumerable<IStorageLocation>.

Here are some options:

■ Make your list a List<IStorageLocation> instead. This is likely to mean you need to cast every time you fetch an entry in order to get at your implementation-specific behavior. You might as well not be using generics in the first place.

■ Implement GetSublocations using the funky new iteration features of C# 2, as described in chapter 6. That happens to work in this example, because the interface uses IEnumerable<IStorageLocation>. It wouldn’t work if we had to return an IList<IStorageLocation> instead. It also requires each implementation to have the same kind of code. It’s only a few lines, but it’s still inelegant.

■ Create a new copy of the list, this time as List<IStorageLocation>. In some cases (particularly if the interface did require you to return an IList<IStorageLocation>), this would be a good thing to do anyway: it keeps the list returned separate from the internal list. You could even use List<T>.ConvertAll to do it in a single line. It involves copying everything in the list, though, which may be an unnecessary expense if you trust your callers to use the returned list reference appropriately.

■ Make the interface generic, with the type parameter representing the actual type of storage sublocation being represented. For instance, FabulousStorageLocation might implement IStorageLocation<FabulousStorageLocation>. It looks a little odd, but this recursive-looking use of generics can be quite useful at times.[12]

■ Create a generic helper method (preferably in a common class library) that converts IEnumerator<TSource> to IEnumerator<TDest>, where TSource derives from TDest.

[11] Yes, another one.

[12] For instance, you might have a type parameter T with a constraint that any instance can be compared to another instance of T for equality; in other words, something like MyClass<T> where T : IEquatable<T>.
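Three of the options above (the iterator block, the copy via ConvertAll, and the generic helper method) can be sketched as follows. This is only an illustration under stated assumptions: OpenForRead is omitted and a Name property is added for brevity, the AddSublocation, CopySublocations and VarianceHelpers.Upcast names are hypothetical, and the helper works on IEnumerable<T> rather than IEnumerator<T> purely to keep the calling code short.

```csharp
using System.Collections.Generic;

public interface IStorageLocation
{
    string Name { get; }   // assumption: stands in for the elided members
    IEnumerable<IStorageLocation> GetSublocations();
}

public class FabulousStorageLocation : IStorageLocation
{
    private readonly string name;
    private readonly List<FabulousStorageLocation> sublocations =
        new List<FabulousStorageLocation>();

    public FabulousStorageLocation(string name) { this.name = name; }

    public string Name { get { return name; } }

    public void AddSublocation(FabulousStorageLocation child)
    {
        sublocations.Add(child);
    }

    // Iterator block: each element is converted implicitly as it is yielded.
    public IEnumerable<IStorageLocation> GetSublocations()
    {
        foreach (FabulousStorageLocation child in sublocations)
        {
            yield return child;
        }
    }

    // Copy: builds a separate List<IStorageLocation> in a single statement.
    public List<IStorageLocation> CopySublocations()
    {
        return sublocations.ConvertAll<IStorageLocation>(
            delegate(FabulousStorageLocation child) { return child; });
    }
}

public static class VarianceHelpers
{
    // Generic helper: valid for any TSource that derives from TDest.
    public static IEnumerable<TDest> Upcast<TSource, TDest>(
        IEnumerable<TSource> source) where TSource : TDest
    {
        foreach (TSource item in source)
        {
            yield return item;
        }
    }
}
```

The helper is the only one of the three that can live in a shared library; the other two have to be repeated in each implementation of the interface.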


When you run into covariance issues, you may need to consider all of these options and anything else you can think of. It depends heavily on the exact nature of the situation. Unfortunately, covariance isn’t the only problem we have to consider. There’s also the matter of contravariance, which is like covariance in reverse.

WHERE CONTRAVARIANCE WOULD BE USEFUL

Contravariance feels slightly less intuitive than covariance, but it does make sense. Where covariance is about declaring that we will return a more specific object from a method than the interface requires us to, contravariance is about being willing to accept a more general parameter.

For instance, suppose we had an IShape interface[13] that contained the Area property. It’s easy to write an implementation of IComparer<IShape> that sorts by area. We’d then like to be able to write the following code:

IComparer<IShape> areaComparer = new AreaComparer();
List<Circle> circles = new List<Circle>();
circles.Add(new Circle(20));
circles.Add(new Circle(10));
circles.Sort(areaComparer);

That won’t work, though, because the Sort method on List<Circle> effectively takes an IComparer<Circle>. The fact that our AreaComparer can compare any shape rather than just circles doesn’t impress the compiler at all. It considers IComparer<Circle> and IComparer<IShape> to be completely different types. Maddening, isn’t it? It would be nice if the Sort method had this signature instead:

void Sort<S>(IComparer<S> comparer) where T : S

Unfortunately, not only is that not the signature of Sort, but it can’t be: the constraint is invalid, because it’s a constraint on T instead of S. We want a derivation type constraint but in the other direction, constraining the S to be somewhere up the inheritance tree of T instead of down.

Given that this isn’t possible, what can we do? There are fewer options this time than before. First, you could create a generic class with the following declaration:

ComparisonHelper<TBase, TDerived> : IComparer<TDerived>
    where TDerived : TBase

You’d then create a constructor that takes (and stores) an IComparer<TBase> as a parameter. The implementation of IComparer<TDerived> would just return the result of calling the Compare method of the IComparer<TBase>. You could then sort the List<Circle> by creating a new ComparisonHelper<IShape, Circle> that uses the area comparison.

The second option is to make the area comparison class generic, with a derivation constraint, so it can compare any two values of the same type, as long as that type implements IShape. Of course, you can only do this when you’re able to change the comparison class, but it’s a nice solution when it’s available.
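Both options can be sketched in a few lines. The code below is one possible shape for them rather than a definitive implementation: the Circle class with its radius constructor and Radius property, and the GenericAreaComparer name, are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public interface IShape
{
    double Area { get; }
}

// Assumed shape type for the example: a circle defined by its radius.
public class Circle : IShape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public double Radius { get { return radius; } }
    public double Area { get { return Math.PI * radius * radius; } }
}

// The nongeneric comparer from the text: it can compare any two shapes.
public class AreaComparer : IComparer<IShape>
{
    public int Compare(IShape x, IShape y)
    {
        return x.Area.CompareTo(y.Area);
    }
}

// First option: wrap an IComparer<TBase> so it can be used as an IComparer<TDerived>.
public class ComparisonHelper<TBase, TDerived> : IComparer<TDerived>
    where TDerived : TBase
{
    private readonly IComparer<TBase> comparer;

    public ComparisonHelper(IComparer<TBase> comparer)
    {
        this.comparer = comparer;
    }

    public int Compare(TDerived x, TDerived y)
    {
        // The constraint gives an implicit conversion from TDerived to TBase.
        return comparer.Compare(x, y);
    }
}

// Second option: make the area comparison itself generic.
public class GenericAreaComparer<T> : IComparer<T> where T : IShape
{
    public int Compare(T x, T y)
    {
        return x.Area.CompareTo(y.Area);
    }
}
```

With these in place, circles.Sort(new ComparisonHelper<IShape, Circle>(new AreaComparer())) and circles.Sort(new GenericAreaComparer<Circle>()) both compile against a List<Circle>.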

[13] You didn’t really expect to get through the whole book without seeing a shape-related example, did you?


Notice that the various options for both covariance and contravariance use more generics and constraints to express the interface in a more general manner, or to provide generic “helper” methods. I know that adding a constraint makes it sound less general, but the generality is added by first making the type or method generic. When you run into a problem like this, adding a level of genericity somewhere with an appropriate constraint should be the first option to consider. Generic methods (rather than generic types) are often helpful here, as type inference can make the lack of variance invisible to the naked eye. This is particularly true in C# 3, which has stronger type inference capabilities than C# 2.
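As a rough illustration of the point about generic methods, the sketch below sorts any list whose element type implements IShape. The ShapeSorting class, the SortByArea method and the Circle type are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public interface IShape
{
    double Area { get; }
}

// Assumed shape type for the example.
public class Circle : IShape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public double Radius { get { return radius; } }
    public double Area { get { return Math.PI * radius * radius; } }
}

public static class ShapeSorting
{
    // The method is generic in the element type, so no conversion between
    // List<Circle> and List<IShape> is ever needed.
    public static void SortByArea<T>(List<T> shapes) where T : IShape
    {
        shapes.Sort(delegate(T x, T y) { return x.Area.CompareTo(y.Area); });
    }
}
```

A call such as ShapeSorting.SortByArea(circles) needs no explicit type argument: the compiler infers T as Circle from the argument, so the caller never sees the variance problem.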

NOTE  Is this really the best we can do? As we’ll see later, Java supports covariance and contravariance within its generics, so why can’t C#? Well, a lot of it boils down to the implementation: the fact that the Java runtime doesn’t get involved with generics; it’s basically a compile-time feature. However, the CLR does support limited generic covariance and contravariance, just on interfaces and delegates. C# doesn’t expose this feature (neither does VB.NET), and none of the framework libraries use it. The C# compiler consumes covariant and contravariant interfaces as if they were invariant. Adding variance is under consideration for C# 4, although no firm commitments have been made. Eric Lippert has written a whole series of blog posts about the general problem, and what might happen in future versions of C#: http://blogs.msdn.com/ericlippert/archive/tags/Covariance+and+Contravariance/default.aspx.

This limitation is a very common cause of questions on C# discussion groups. The remaining issues are either relatively academic or affect only a moderate subset of the development community. The next one mostly affects those who do a lot of calculations (usually scientific or financial) in their work.

3.6.2 Lack of operator constraints or a “numeric” constraint

C# is not without its downside when it comes to heavily mathematical code. The scientific community has always raised two barriers to C#’s adoption: the need to explicitly use the Math class for every operation beyond the simplest arithmetic, and the lack of C-style typedefs that would allow the data representation used throughout a program to be changed easily. Generics weren’t likely to fully solve either of those issues, but there’s a common problem that stops generics from helping as much as they could have. Consider this (illegal) generic method:

public T FindMean<T>(IEnumerable<T> data)
{
    T sum = default(T);
    int count = 0;
    foreach (T datum in data)
    {
        sum += datum;
        count++;
    }
    return sum/count;
}

Obviously that could never work for all types of data. What could it mean to add one Exception to another, for instance? Clearly a constraint of some kind is called for: something that is able to express what we need to be able to do, namely add two instances of T together and divide a T by an integer. If that were available, even if it were limited to built-in types, we could write generic algorithms that wouldn’t care whether they were working on an int, a long, a double, a decimal, and so forth. Limiting it to the built-in types would have been disappointing but better than nothing. The ideal solution would have to also allow user-defined types to act in a numeric capacity, so you could define a Complex type to handle complex numbers, for instance. That complex number could then store each of its components in a generic way as well, so you could have a Complex<float>, a Complex<double>, and so on.[14]

Two related solutions present themselves. One would be simply to allow constraints on operators, so you could write a set of constraints such as

where T : T operator+ (T,T), T operator/ (T, int)

This would require that T have the operations we need in the earlier code. The other solution would be to define a few operators and perhaps conversions that must be supported in order for a type to meet the extra constraint. We could make it the “numeric constraint” written where T : numeric.

One problem with both of these options is that they can’t be expressed as normal interfaces, because operator overloading is performed with static members, which can’t implement interfaces. It would require a certain amount of shoehorning, in other words.

Various smart people (including Eric Gunnerson and Anders Hejlsberg, who ought to be able to think of C# tricks if anyone can) have thought about this, and with a bit of extra code, some solutions have been found. They’re slightly clumsy, but they work. Unfortunately, due to current JIT optimization limitations, you have to pick between pleasant syntax (x=y+z) that reads nicely but performs poorly, and a method-based syntax (x=y.Add(z)) that performs without significant overhead but looks like a dog’s dinner when you’ve got anything even moderately complicated going on.

The details are beyond the scope of this book, but are very clearly presented at http://www.lambda-computing.com/publications/articles/generics2/ in an article on the matter.
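To give a flavor of the method-based style, here is one way FindMean could be made legal. This sketch is not taken from the article mentioned above; the ICalculator<T> interface, the DoubleCalculator struct and the Statistics class are all hypothetical names invented for the illustration.

```csharp
using System.Collections.Generic;

// Hypothetical interface standing in for the missing operator constraint.
public interface ICalculator<T>
{
    T Zero { get; }
    T Add(T x, T y);
    T DivideByCount(T total, int count);
}

// One implementation per numeric type; a struct avoids a heap allocation.
public struct DoubleCalculator : ICalculator<double>
{
    public double Zero { get { return 0.0; } }
    public double Add(double x, double y) { return x + y; }
    public double DivideByCount(double total, int count) { return total / count; }
}

public static class Statistics
{
    // The arithmetic is delegated to the calculator, so T itself needs no operators.
    public static T FindMean<T, TCalc>(IEnumerable<T> data, TCalc calc)
        where TCalc : ICalculator<T>
    {
        T sum = calc.Zero;
        int count = 0;
        foreach (T datum in data)
        {
            sum = calc.Add(sum, datum);
            count++;
        }
        // Note: an empty sequence reaches this line with count == 0.
        return calc.DivideByCount(sum, count);
    }
}
```

A call looks like Statistics.FindMean<double, DoubleCalculator>(values, new DoubleCalculator()), which shows the clumsiness the text describes: every operation becomes a method call on a helper object.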

The two limitations we’ve looked at so far have been quite practical: they’ve been issues you may well run into during actual development. However, if you’re generally curious like I am, you may also be asking yourself about other limitations that don’t necessarily slow down development but are intellectual curiosities. In particular, just why are generics limited to types and methods?

[14] More mathematically minded readers might want to consider what a Complex<Complex<double>> would mean. You’re on your own there, I’m afraid.

3.6.3 Lack of generic properties, indexers, and other member types

We’ve seen generic types (classes, structs, delegates, and interfaces) and we’ve seen generic methods. There are plenty of other members that could be parameterized. However, there are no generic properties, indexers, operators, constructors, finalizers, or events. First let’s be clear about what we mean here: clearly an indexer can have a return type that is a type parameter, and List<T> is an obvious example. KeyValuePair<TKey,TValue> provides similar examples for properties. What you can’t have is an indexer or property (or any of the other members in that list) with extra type parameters. Leaving the possible syntax of declaration aside for the minute, let’s look at how these members might have to be called:

SomeClass<string> instance = new SomeClass<string><Guid>("x");
int x = instance.SomeProperty<int>;
byte y = instance.SomeIndexer<byte>["key"];
instance.Click<byte> += ByteHandler;
instance = instance +<int> instance;

I hope you’ll agree that all of those look somewhat silly. Finalizers can’t even be called explicitly from C# code, which is why there isn’t a line for them. The fact that we can’t do any of these isn’t going to cause significant problems anywhere, as far as I can see. It’s just worth being aware of it as an academic limitation.

The one exception to this is possibly the constructor. However, a static generic method in the class is a good workaround for this, and the syntax with two lists of type arguments is horrific.
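The static generic method workaround might look something like the following sketch. The Labeled<T> class and its Create method are hypothetical, made up purely to show a factory method carrying the extra type parameter that a constructor can’t declare.

```csharp
using System;

// Hypothetical class used only to illustrate the workaround.
public class Labeled<T>
{
    private readonly T value;
    private readonly string description;

    private Labeled(T value, string description)
    {
        this.value = value;
        this.description = description;
    }

    public T Value { get { return value; } }
    public string Description { get { return description; } }

    // A constructor can't introduce TSource, but a static generic method can.
    public static Labeled<T> Create<TSource>(TSource source,
                                             Converter<TSource, T> converter)
    {
        return new Labeled<T>(converter(source), typeof(TSource).Name);
    }
}
```

The call site then reads Labeled<int>.Create<string>("hello", ...), which keeps the two lists of type arguments apart instead of stacking them after the new keyword.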

These are by no means the only limitations of C# generics, but I believe they’re the ones that you’re most likely to run up against, either in your daily work, in community conversations, or when idly considering the feature as a whole. In our next two sections we’ll see how some aspects of these aren’t issues in the two languages whose features are most commonly compared with C#’s generics: C++ (with templates) and Java (with generics as of Java 5). We’ll tackle C++ first.

3.6.4 Comparison with C++ templates

C++ templates are a bit like macros taken to an extreme level. They’re incredibly powerful, but have costs associated with them both in terms of code bloat and ease of understanding.

When a template is used in C++, the code is compiled for that particular set of template arguments, as if the template arguments were in the source code. This means that there’s not as much need for constraints, as the compiler will check whether you’re allowed to do everything you want to with the type anyway while it’s compiling the code for this particular set of template arguments. The C++ standards committee has recognized that constraints are still useful, though, and has proposed them for C++0x (the next version of C++) under the name of concepts.

The C++ compiler is smart enough to compile the code only once for any given set of template arguments, but it isn’t able to share code in the way that the CLR does with reference types. That lack of sharing does have its benefits, though: it allows type-specific optimizations, such as inlining method calls for some type parameters but not others, from the same template. It also means that overload resolution can be performed separately for each set of type parameters, rather than just once based solely on the limited knowledge the C# compiler has due to any constraints present.
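The C# side of that overload resolution point can be seen in a small sketch. The OverloadDemo class and its method names are hypothetical, chosen only for this illustration.

```csharp
// Hypothetical demo class showing that overloads are chosen once, at compile time.
public static class OverloadDemo
{
    public static string Describe(object value) { return "object"; }
    public static string Describe(string value) { return "string"; }

    public static string DescribeGeneric<T>(T value)
    {
        // T is unconstrained, so the compiler only knows that value converts
        // to object. The call is bound to Describe(object) for every T.
        return Describe(value);
    }
}
```

Calling OverloadDemo.Describe("hello") directly picks the string overload, but OverloadDemo.DescribeGeneric("hello") still ends up in Describe(object), because the choice was made when the generic method was compiled rather than per type argument.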

Don’t forget that with “normal” C++ there’s only one compilation involved, rather than the “compile to IL” then “JIT compile to native code” model of .NET. A program using a standard template in ten different ways will include the code ten times in a C++ program. A similar program in C# using a generic type from the framework in ten different ways won’t include the code for the generic type at all. It will refer to it, and the JIT will compile as many different versions as required (as described in section 3.4.2) at execution time.

One significant feature that C++ templates have over C# generics is that the template arguments don’t have to be type names. Variable names, function names, and constant expressions can be used as well. A common example of this is a buffer type that has the size of the buffer as one of the template arguments, so a buffer<int,20> will always be a buffer of 20 integers, and a buffer<double,35> will always be a buffer of 35 doubles. This ability is crucial to template metaprogramming,[15] an advanced C++ technique the very idea of which scares me, but that can be very powerful in the hands of experts.

C++ templates are more flexible in other ways, too. They don’t suffer from the problem described in 3.6.2, and there are a few other restrictions that don’t exist in C++: you can derive a class from one of its type parameters, and you can specialize a template for a particular set of type arguments. The latter ability allows the template author to write general code to be used when there’s no more knowledge available but specific (often highly optimized) code for particular types.

The same variance issues of .NET generics exist in C++ templates as well. An example given by Bjarne Stroustrup[16] is that there are no implicit conversions between Vector<shape*> and Vector<circle*>, with similar reasoning: in this case, it might allow you to put a square peg in a round hole.

For further details of C++ templates, I recommend Stroustrup’s The C++ Programming Language (Addison-Wesley, 1991). It’s not always the easiest book to follow, but the templates chapter is fairly clear (once you get your mind around C++ terminology and syntax). For more comparisons with .NET generics, look at the blog post by the Visual C++ team on this topic: http://blogs.msdn.com/branbray/archive/2003/11/19/51023.aspx.

The other obvious language to compare with C# in terms of generics is Java, which introduced the feature into the mainstream language for the 1.5 release,[17] several years after other projects had compilers for their Java-like languages.

[15] http://en.wikipedia.org/wiki/Template_metaprogramming

[16] The inventor of C++.

[17] Or 5.0, depending on which numbering system you use. Don’t get me started.

3.6.5 Comparison with Java generics

Where C++ includes more of the template in the generated code than C# does, Java includes less. In fact, the Java runtime doesn’t know about generics at all. The Java bytecode (roughly equivalent terminology to IL) for a generic type includes some extra metadata to say that it’s generic, but after compilation the calling code doesn’t have much to indicate that generics were involved at all, and certainly an instance of a generic type only knows about the nongeneric side of itself. For example, an instance of HashSet<T> doesn’t know whether it was created as a HashSet<String> or a HashSet<Object>.
