Modifying Ruby’s core classes and modules
It may be tempting to do something like this, in order to avoid the error:
class Regexp
  alias __old_match__ match    # B
  def match(string)
    __old_match__(string) || []
  end
end
This code first sets up an alias for match, courtesy of the alias keyword B. Then, the
code redefines match. The new match hooks into the original version of match
(through the alias) and then returns either the result of calling the original version or
(if that call returns nil) an empty array.
NOTE
An alias is a synonym for a method name. Calling a method by an alias
doesn’t involve any change of behavior or any alteration of the method-lookup process. The choice of alias name in the previous example is
based on a fairly conventional formula: the addition of the word old plus
the leading and trailing underscores. (A case could be made that the formula is too conventional and that you should create names that are less
likely to be chosen by other overriders who also know the convention!)
You can now do this:
/abc/.match("X")[1]
Even though the match fails, the program won’t blow up, because the failed match
now returns an empty array rather than nil. The worst you can do with the new match
is try to index an empty array, which is legal. (The result of the index operation will be
nil, but at least you’re not trying to index nil.)
The problem is that the person using your code may depend on the match operation to return nil on failure:
if regexp.match(string)
  # do something
else
  # do something else
end
Because an array (even an empty one) is true, whereas nil is false, returning an array
for a failed match operation means that the true/false test (as embodied in an
if/else statement) always succeeds, so the else branch never runs.
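The truthiness problem can be seen without modifying Regexp at all. The following is a minimal sketch: branch_for is a hypothetical helper invented for this illustration, and the `|| []` expression only mimics what the overridden match would return.

```ruby
# Hypothetical helper: reports which branch an if/else takes for a value.
def branch_for(value)
  if value
    :if_branch
  else
    :else_branch
  end
end

# A failed match as Ruby documents it, and as the override would report it.
failed_match_original = /abc/.match("X")         # nil
failed_match_patched  = /abc/.match("X") || []   # mimics the override's return value

puts branch_for(failed_match_original)   # else_branch
puts branch_for(failed_match_patched)    # if_branch
```

Code written against the documented behavior expects the first result; under the override it would silently get the second.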
Maybe changing Regexp#match so as not to return nil on failure is something your
instincts would tell you not to do anyway. And no one advocates doing it; it’s more that
some new Ruby users don’t connect the dots and therefore don’t see that changing a
core method in one place changes it everywhere.
Another common example, and one that’s a little more subtle (both as to what it
does and as to why it’s not a good idea), involves the String#gsub! method.
Licensed to sam kaplan
CHAPTER 13
Object individuation
THE RETURN VALUE OF STRING#GSUB! AND WHY IT SHOULD STAY THAT WAY
As you’ll recall, String#gsub! does a global replace operation on its receiver, saving
the changes in the original object:
>> string = "Hello there!"
=> "Hello there!"
>> string.gsub!(/e/, "E")
=> "HEllo thErE!"    # B
>> string
=> "HEllo thErE!"    # C
As you can see, the return value of the call to gsub! is the string object with the
changes made B. (And examining the object again via the variable string confirms
that the changes are indeed permanent C.)
Interestingly, though, something different happens when the gsub! operation
doesn’t result in any changes to the string:
>> string = "Hello there!"
=> "Hello there!"
>> string.gsub!(/zzz/, "xxx")
=> nil
>> string
=> "Hello there!"
There’s no match on /zzz/, so the string isn’t changed—and the return value of the
call to gsub! is nil.
Like the nil return from a match operation, the nil return from gsub! has the
potential to make things blow up when you’d rather they didn’t. Specifically, it means
you can’t use gsub! reliably in a chain of methods:
>> string = "Hello there!"
=> "Hello there!"
>> string.gsub!(/e/, "E").reverse!    # B
=> "!ErEht ollEH"    # C
>> string = "Hello there!"
=> "Hello there!"
>> string.gsub!(/zzz/, "xxx").reverse!
NoMethodError: undefined method `reverse!' for nil:NilClass    # D
This example does something similar (but not quite the same) twice. The first time
through, the chained calls to gsub! and reverse! B return the newly gsub!’d and
reversed string C. But the second time, the chain of calls results in a fatal error D:
the gsub! call didn’t change the string, so it returned nil—which means we called
reverse! on nil rather than on a string.
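The contrast between the bang and non-bang chains can be sketched as follows. The helper names bang_chain and non_bang_chain are hypothetical, introduced only for this illustration; the sketch relies on the fact that gsub (without the bang) always returns a string.

```ruby
# Hypothetical helper: in-place chain, as in the failing example above.
def bang_chain(string, pattern, replacement)
  string.gsub!(pattern, replacement).reverse!   # NoMethodError if gsub! returns nil
end

# Hypothetical helper: the same chain using the non-bang methods.
def non_bang_chain(string, pattern, replacement)
  string.gsub(pattern, replacement).reverse     # gsub always returns a string
end

puts non_bang_chain("Hello there!", /zzz/, "xxx")   # !ereht olleH

begin
  bang_chain("Hello there!".dup, /zzz/, "xxx")
rescue NoMethodError => e
  puts "Chain failed: #{e.class}"                   # Chain failed: NoMethodError
end
```

The non-bang chain leaves the original string untouched and works on copies, which may or may not be what a given program wants.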
The tap method
The tap method (callable on any object) performs the somewhat odd but potentially
useful task of executing a code block, yielding the receiver to the block, and returning
the receiver. It’s easier to show this than to describe it:
>> "Hello".tap {|string| puts string.upcase }.reverse
HELLO
=> "olleH"
Called on the receiver “Hello”, the tap method yields that string back to its code
block, as confirmed by the printing out of the uppercased version of the string. Then,
tap returns the entire string, so the reverse operation is performed on the string. If
you call gsub! on a string inside a tap block, it doesn’t matter whether it returns nil,
because tap returns the string. Be careful, though. Using tap to circumvent the nil
return of gsub! (or of other similarly behaving bang methods) can introduce complexities of its own, especially if you do multiple chaining where some methods perform
in-place operations and others return object copies.
One possible way of handling the inconvenience of having to work around the nil
return from gsub! is to take the view that it’s not usually appropriate to chain method
calls together too much anyway. And you can always avoid chain-related problems if
you don’t chain:
>> string = "Hello there!"
=> "Hello there!"
>> string.gsub!(/zzz/, "xxx")
=> nil
>> string.reverse!
=> "!ereht olleH"
Still, a number of Ruby users have been bitten by the nil return value, either because
they expected gsub! to behave like gsub (the non-bang version, which always returns
a new string, whether there’s been a change or not) or because they didn’t anticipate a
case where the string wouldn’t change. So gsub! and its nil return value became a
popular candidate for change.
The change can be accomplished like this:
class String
  alias __old_gsub_bang__ gsub!
  def gsub!(*args, &block)
    __old_gsub_bang__(*args, &block)
    self
  end
end
First, the original gsub! gets an alias; that will enable us to call the original version
from inside the new version. The new gsub! takes any number of arguments (the
arguments themselves don’t matter; we’ll pass them along to the old gsub!) and a
code block, which will be captured in the variable block. If no block is supplied—and
gsub! can be called with or without a block—block is nil.
Now, we call the old version of gsub!, passing it the arguments and reusing the
code block. Finally, the new gsub! does the thing it’s being written to do: it returns
self (the string), regardless of whether the call to __old_gsub_bang__ returned the
string or nil.
And now, the reasons not to do this.
Changing gsub! this way is probably less likely, as a matter of statistics, to get you in
trouble than changing Regexp#match is. Still, it’s possible that someone might write
code that depends on the documented behavior of gsub!, in particular on the returning of nil when the string doesn’t change. Here’s an example—and although it’s contrived (as most examples of this scenario are bound to be), it’s valid Ruby and
dependent on the documented behavior of gsub!:
>> states = { "NY" => "New York", "NJ" => "New Jersey",
              "ME" => "Maine" }    # B
=> {"NY"=>"New York", "NJ"=>"New Jersey", "ME"=>"Maine"}
>> string = "Eastern states include NY, NJ, and ME."    # C
=> "Eastern states include NY, NJ, and ME."
>> if string.gsub!(/\b([A-Z]{2})\b/) { states[$1] }    # D
>>   puts "Substitution occurred"
>> else
?>   puts "String unchanged"
>> end
Substitution occurred    # E
We start with a hash of state abbreviations and full names B. Then comes a string that
uses state abbreviations C. The goal is to replace the abbreviations with the full
names, using a gsub! operation that captures any two consecutive uppercase letters
surrounded by word boundaries (\b) and replaces them with the value from the hash
corresponding to the two-letter substring D. Along the way, we take note of whether
any such replacements are made. If any are, gsub! returns the new version of string. If
no substitutions are made, gsub! returns nil. The result of the process is printed out
at the end E.
The damage here is relatively light, but the lesson is clear: don’t change the documented behavior of core Ruby methods. Here’s another version of the states-hash
example, using sub! rather than gsub!. In this version, failure to return nil when the
string doesn’t change triggers an infinite loop. Assuming we have the states hash and
the original version of string, we can do a one-at-a-time substitution where each substitution is reported:
>> while string.sub!(/\b([A-Z]{2})\b/) { states[$1] }
>>   puts "Replacing #{$1} with #{states[$1]}..."
>> end
Replacing NY with New York...
Replacing NJ with New Jersey...
Replacing ME with Maine...
If string.sub! always returns a non-nil value (a string), then the while condition
will never fail, and the loop will execute forever.
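The runaway loop can be demonstrated safely with an iteration cap. This is a minimal sketch under stated assumptions: sub_returning_self! is a hypothetical stand-in that behaves like the overridden method without touching String#sub!, count_iterations and its limit parameter are invented for the illustration, and the blocks use the block parameter (the matched text) rather than $1.

```ruby
STATES = { "NY" => "New York", "NJ" => "New Jersey", "ME" => "Maine" }.freeze
ABBREVIATION = /\b([A-Z]{2})\b/

# Hypothetical stand-in for an overridden sub!: always returns the string,
# even when nothing was replaced. String#sub! itself is left alone.
def sub_returning_self!(string, pattern, &block)
  string.sub!(pattern, &block)
  string
end

# Runs a while loop whose condition is the given block, with a safety cap.
# Returns the number of times the loop body ran.
def count_iterations(string, limit: 10)
  iterations = 0
  while yield(string)
    iterations += 1
    break if iterations >= limit   # without the cap, an always-true condition never ends
  end
  iterations
end

text = "Eastern states include NY, NJ, and ME."

documented = count_iterations(text.dup) { |s| s.sub!(ABBREVIATION) { |abbr| STATES[abbr] } }
overridden = count_iterations(text.dup) { |s| sub_returning_self!(s, ABBREVIATION) { |abbr| STATES[abbr] } }

puts "Documented sub!: loop body ran #{documented} times"               # 3
puts "Always-self sub!: stopped only by the cap after #{overridden} iterations"   # 10
```

With the documented nil return, the loop ends on its own once the third abbreviation has been replaced; with the always-self version, only the cap stops it.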
What you should not do, then, is rewrite core methods so that they don’t do what
others expect them to do. There’s no exception to this. It’s something you should
never do, even though you can.
That leaves us with the question of how to change Ruby core functionality
safely. We’ll look at three techniques that you can consider: additive change, hook
or pass-through change, and per-object change. Only one of them is truly safe,
although all three are safe enough to use in many circumstances.
Along the way, we’ll look at custom-made examples as well as some examples from
the ActiveSupport library. ActiveSupport provides good examples of the first two
kinds of core change: additive and pass-through. We’ll start with additive.
13.2.2 Additive changes
The most common category of changes to built-in Ruby classes is the additive change:
adding a method that doesn’t exist. The benefit of additive change is that it doesn’t
clobber existing Ruby methods. The danger inherent in it is that if two programmers
write added methods with the same name, and both get included into the interpreter
during execution of a particular library or program, one of the two will clobber the
other. There’s no way to reduce that risk to zero.
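One way to lower the risk, though not to remove it, is to check whether a method already exists before adding it. The following is a minimal sketch; String#shout is a made-up method name used only for illustration.

```ruby
# Hypothetical additive change, guarded so that it never clobbers
# a shout method that some other code has already defined.
class String
  if method_defined?(:shout)
    warn "String#shout already exists; leaving the existing definition alone"
  else
    def shout
      upcase + "!"
    end
  end
end

puts "hello".shout   # HELLO! (assuming no other shout was defined first)
```

The guard only protects definitions that are already loaded. Code loaded later can still overwrite the method, which is why the risk never reaches zero.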
Added methods often serve the purpose of providing functionality that a large
number of people want. In other words, they’re not all written for specialized use in
one program. There’s safety in numbers: if people have been discussing a given
method for years, and if a de facto implementation of the method is floating around
the Ruby world, the chances are good that if you write the method or use an existing
implementation, you won’t collide with anything that someone else may have written.
Some of the methods you’ll see traded around on mailing lists and in blog posts
are perennial favorites.
SOME OLD STANDARDS: MAP_WITH_INDEX AND SINGLETON_CLASS
In chapter 10, you learned about enumerables, enumerators, and the with_index
method. In the days before with_index allowed indexes to be part of almost any enumerable iteration, we had only each_with_index; and people often asked that there
be added to the Enumerable module a map_with_index method, which would be similar to each_with_index (it would yield one element and one integer index number
on each iteration) but would return an array representing iterative executions of the
code block, as map does.
The method was never added, and it became a common practice for people to
write their own versions of it. A typical implementation might look like this:
class Array
  def map_with_index
    mapping = []    # B
    each_with_index do |e,i|    # C
      mapping << yield(e,i)    # D
    end
    mapping    # E
  end
end
The method starts by creating an array in which it will accumulate the mapping of the
self-array B. Then, it iterates over the array using each_with_index C. Each time
through, it yields the current element and the current index and saves the result to
the accumulator array mapping D. Finally, it returns the mapping E.
Here’s an example of map_with_index in action:
cardinals = %w{ first second third fourth fifth }
puts [1,2,3,4,5].map_with_index {|n,i|
  "The #{cardinals[i]} number is #{n}."
}
The output is
The first number is 1.
The second number is 2.
# etc.
In Ruby 1.9 the map_with_index scenario is handled by map.with_index. But even 1.9
didn’t start out with all the old favorite add-on methods. Another commonly implemented
method, and one which wasn’t part of the first 1.9 releases (it became built in as of Ruby 1.9.2), is Object#singleton_class.
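Before moving on to singleton_class, here is the map.with_index idiom mentioned above, as a minimal sketch. The describe_numbers helper and the CARDINALS constant are hypothetical names; the output matches the hand-rolled map_with_index example.

```ruby
CARDINALS = %w{ first second third fourth fifth }.freeze

# Same result as the hand-rolled map_with_index, using only built-in methods.
# map without a block returns an enumerator; with_index adds the index.
def describe_numbers(numbers)
  numbers.map.with_index { |n, i| "The #{CARDINALS[i]} number is #{n}." }
end

puts describe_numbers([1, 2, 3, 4, 5])
```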
It’s not unusual to want to grab hold of an object’s singleton class in a variable.
Once you do so, it’s possible to manipulate it from the outside, so to speak, in ways
that go beyond what you can do by entering the class-definition context. To get an
object’s singleton class as an object, you need a way to evaluate that class at least long
enough to assign it to a variable. The technique for doing this depends on three facts
you already know.
First, it’s possible to get into a class-definition block for a singleton class:
str = "Hello"
class << str
  # We're in str's singleton class!
end
Second, the actual value of any class-definition block is the value of the last expression
evaluated inside it. Third, the value of self inside a class-definition block is the class
object itself.
Putting all this together, we can write the singleton_class method as follows.
class Object
  def singleton_class
    class << self
      self
    end
  end
end
All this method does is open the singleton class of whatever object is calling it, evaluate self, and close the definition block. Because self in a class-definition block is the
class, in this case it’s the given object’s singleton class. The result is that you can now
grab any object’s singleton class.
You’ll see this method in use later, but even now you can test it and see the effect of
having a singleton class available in a variable. Given the previous definition of
singleton_class, here’s a testbed for it:
class Person    # B
end
david = Person.new    # C
def david.talk    # D
  puts "Hi"
end
dsc = david.singleton_class    # E
if dsc.instance_methods.include?(:talk)    # F
  puts "Yes, we have a talk method!"
end
First, we create a Person test class B as well as an instance of it C. Next, we “teach”
the object a new method: talk D. (It doesn’t matter what the method is called or
what it does; its purpose is to illustrate the workings of the singleton_class method.)
Now, we grab the singleton class of the object and store it in a variable E. Once
we’ve done this, we can, among other things, query the class as to its methods. In the
example, the class is queried as to whether it has an instance method called talk F.
The output from the program is a resounding
Yes, we have a talk method!
The singleton_class method thus lets you capture a singleton class and address it
programmatically the way you might address any other class object. It’s a handy technique, and you’ll see definitions of this method (possibly with a different name) in
many Ruby libraries and programs.
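As one illustration of addressing a singleton class from the outside, the following sketch defines a singleton method with a closure, something a def inside a class << object block can't do because def doesn't see surrounding local variables. The teach helper is hypothetical, and the sketch assumes a Ruby where singleton_class is available (built in since 1.9.2, or defined as shown earlier).

```ruby
class Person
end

# Hypothetical helper: adds a method to one object only, by way of its
# singleton class. The block is a closure, so it can use local variables
# from the code that calls teach.
def teach(object, method_name, &body)
  object.singleton_class.send(:define_method, method_name, &body)
  object
end

greeting = "Hi"
david = teach(Person.new, :talk) { greeting }
matz  = Person.new

puts david.talk                                              # Hi
puts david.singleton_class.instance_methods(false).inspect   # [:talk]
puts matz.respond_to?(:talk)                                 # false
```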
Another way to add functionality to existing Ruby classes and modules is with a passive hooking or pass-through technique.
13.2.3 Pass-through overrides
A pass-through method change involves overriding an existing method in such a way
that the original version of the method ends up getting called along with the new version. The new version does whatever it needs to do and then passes its arguments
along to the original version of the method. It relies on the original method to provide a return value. (As you know from the match and gsub! override examples, calling the original version of a method isn’t enough if you’re going to change the basic
interface of the method by changing its return value.)
You can use pass-through overrides for a number of purposes, including logging
and debugging:
class String
  alias __old_reverse__ reverse
  def reverse
    $stderr.puts "Reversing a string!"
    __old_reverse__
  end
end
puts "David".reverse
The output of this snippet is as follows:
Reversing a string!
divaD
The first line is printed to STDERR, and the second line is printed to STDOUT. The example depends on creating an alias for the original reverse and then calling that alias at
the end of the new reverse.
Aliasing and its aliases
In addition to the alias keyword, Ruby has a method called alias_method, which is
an instance method of Module (private before Ruby 2.5, public since). The upshot is that you can create an alias for
a method either like this:
class String
  alias __old_reverse__ reverse
end
or like this:
class String
  alias_method :__old_reverse__, :reverse
end
Because it’s a method and not a keyword, alias_method needs objects rather than
bare method names as its arguments. It can take symbols or strings. Note also that
the arguments to alias do not have a comma between them. Keywords get to do
things like that, but methods don’t.
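The two forms can be compared side by side without touching a core class. This is a minimal sketch; Greeter and its method names are hypothetical.

```ruby
# Hypothetical class used only to compare the two aliasing forms.
class Greeter
  def hello
    "Hello!"
  end

  alias hi hello                      # keyword: bare names, no comma
  alias_method :howdy, :hello         # method: symbols, separated by a comma
  alias_method "greetings", "hello"   # method: strings work too
end

g = Greeter.new
puts g.hi, g.howdy, g.greetings       # prints Hello! three times
```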
Here’s another example: hooking into the Hash#[]= method so as to do something
with the key and value being added to the hash while not interfering with the basic
process of adding them to the hash:
require "yaml"    # B
class Hash
  alias __old_set__ []=    # C
  def []=(key, value)
    __old_set__(key, value)    # D
    File.open("hash_contents", "w") do |f|    # E
      f.puts(self.to_yaml)
    end
    value
  end
end
The idea here is to write the hash out to a file in YAML format every time a key is set
with []=. YAML, which stands for “YAML Ain’t a Markup Language,” is a specification
for a data-serialization format. In other words, the YAML standard describes a text format for the representation of data. The YAML library in Ruby (and many other languages also have YAML libraries; YAML is not Ruby-specific) has facilities for serializing
data into YAML strings and turning YAML strings into Ruby objects.
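Both directions of that conversion can be sketched in a few lines. The yaml_round_trip helper is a hypothetical name; to_yaml and YAML.load come from Ruby's standard YAML library.

```ruby
require "yaml"

# Serializes a hash to a YAML string, then reads it back into a Ruby object.
def yaml_round_trip(hash)
  yaml_text = hash.to_yaml   # Ruby object -> YAML string
  YAML.load(yaml_text)       # YAML string -> Ruby object
end

states = { "NJ" => "New Jersey", "NY" => "New York" }
puts states.to_yaml
puts yaml_round_trip(states) == states   # true
```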
In order to intercept hash operations and save the hash in YAML format, we first
need to require the YAML extension B. Then, inside the Hash class, we create an alias
for the []= method C. Inside the new definition of []=, we start by calling the old
version of []=, via the __old_set__ alias D. At the end of the method, we return the
assigned value (which is the normal behavior of the original []= method). In between
lies the writing to file of the YAML serialization of the hash E.
To try the program, save it to a file and add the following sample code at the
bottom:
states = {}
states["NJ"] = "New Jersey"
states["NY"] = "New Yorrk"
puts File.read("hash_contents")
puts
states["NY"] = "New York"
puts File.read("hash_contents")
If you run the file, you’ll see two YAML-ized hashes printed out. The first has the
wrong spelling of York; the second has the corrected spelling. What you’re seeing are
two YAML serializations. The pass-through alteration of Hash#[]= has allowed for the
recording of the hash in various states, as serialized by YAML.
It’s possible to write methods that combine the additive and pass-through philosophies. Some examples from ActiveSupport will demonstrate how to do this.
ADDITIVE/PASS-THROUGH HYBRIDS
An additive/pass-through hybrid is a method that has the same name as an existing core
method, calls the old version of the method (so it’s not an out-and-out replacement),
and adds something to the method’s interface. In other words, it’s an override that
offers a superset of the functionality of the original method.
The ActiveSupport library, which is part of the Rails web application development
framework and includes lots of additions to Ruby core classes, features a number of
additive/pass-through hybrid methods. A good example is the to_s method of the
Time class. Unchanged, Time#to_s provides a nice human-readable string representing the time:
>> Time.now.to_s
=> "2008-08-25 07:41:40 -0400"
ActiveSupport adds to the method so that it can take an argument indicating a specific kind of formatting. For example, you can format a Time object in a manner
suitable for database insertion like this:
>> Time.now.to_s(:db)
=> "2008-08-25 07:46:25"
If you want the date represented as a number, ask for the :number format:
>> Time.now.to_s(:number)
=> "20080825074638"
The :rfc822 argument nets a time formatted in RFC822 style, the standard date format for dates in email headers. It’s similar to the Time#rfc822 method:
>> Time.now.to_s(:rfc822)
=> "Mon, 25 Aug 2008 07:46:41 -0400"
The various formats added to Time#to_s work by using strftime, which wraps the C library function of the same name and lets you format times in a large number of ways. So
there’s nothing in the modified Time#to_s that you couldn’t do yourself. The
optional argument is added for your convenience (and of course the database-friendly
:db format is of interest mainly if you’re using ActiveSupport in conjunction with an
object-relational library, such as ActiveRecord). The result is a superset of Time#to_s.
You can ignore the add-ons, and the method will work like it always did.
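The superset idea can be sketched with plain strftime, without loading ActiveSupport or overriding Time#to_s. Everything named here is an assumption made for illustration: format_time is a hypothetical helper, and the format strings in SKETCH_TIME_FORMATS were chosen to reproduce the sample outputs shown above rather than copied from ActiveSupport.

```ruby
# Format strings chosen to match the sample outputs; not ActiveSupport's own table.
SKETCH_TIME_FORMATS = {
  db:     "%Y-%m-%d %H:%M:%S",
  number: "%Y%m%d%H%M%S",
  rfc822: "%a, %d %b %Y %H:%M:%S %z"
}.freeze

# Hypothetical helper with a superset interface: with no format it behaves
# like plain to_s; with a known format name it delegates to strftime.
def format_time(time, format = nil)
  return time.to_s if format.nil?
  time.strftime(SKETCH_TIME_FORMATS.fetch(format))
end

time = Time.new(2008, 8, 25, 7, 46, 25, "-04:00")
puts format_time(time)            # 2008-08-25 07:46:25 -0400
puts format_time(time, :db)       # 2008-08-25 07:46:25
puts format_time(time, :number)   # 20080825074625
puts format_time(time, :rfc822)   # Mon, 25 Aug 2008 07:46:25 -0400
```

Callers that never pass a format get the familiar to_s result, which is the property that makes this kind of change a superset rather than a replacement.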
The kind of superset-driven override of core methods represented by ActiveSupport runs some risks: specifically, the risk of collision. Is it likely that you’ll end up
loading two libraries that both add an optional :db argument to Time#to_s? No; it’s
unlikely—but it’s possible. To some extent, a library like ActiveSupport is protected
by its high profile: if you load it, you’re probably familiar with what it does and will
know not to override the overrides. Still, it’s remotely possible that another library you
load might clash with ActiveSupport. As always, it’s difficult or impossible to reduce
the risk of collision to zero. You need to protect yourself by familiarizing yourself with
what every library does and by testing your code sufficiently.
The last major approach to overriding core Ruby behavior we’ll look at—and the
safest way to do it—is the addition of functionality on a strictly per-object basis, using
Object#extend.
13.2.4 Per-object changes with extend
Object#extend is a kind of homecoming in terms of topic flow. We’ve wandered to
the outer reaches of modifying core classes—and extend brings us back to the central
process at the heart of all such changes: changing the behavior of an individual
object. It also brings us back to an earlier topic from this chapter: the mixing of a
module into an object’s singleton class. That’s essentially what extend does.
ADDING TO AN OBJECT’S FUNCTIONALITY WITH EXTEND
Have another look at section 13.1.3 and in particular the Person example where we
mixed the Secretive module into the singleton classes of some Person objects. As a
reminder, the technique was this (where ruby is a Person instance):
class << ruby
  include Secretive
end
Here’s how the Person example would look, using extend instead of explicitly opening up the singleton class of the ruby object. Let’s also use extend for david (instead
of the singleton method definition with def):
module Secretive
  def name
    "[not available]"
  end
end
class Person
  attr_accessor :name
end
david = Person.new
david.name = "David"
matz = Person.new
matz.name = "Matz"
ruby = Person.new
ruby.name = "Ruby"
david.extend(Secretive)    # B
ruby.extend(Secretive)
puts "We've got one person named #{matz.name}, " +
     "one named #{david.name}, " +
     "and one named #{ruby.name}."
Most of this program is the same as the first version. The key difference is the use of
extend B, which has the effect of adding the Secretive module to the lookup paths
of the individual objects david and ruby by mixing it into their respective singleton
classes. That inclusion process happens when you extend a class object, too.
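The claim that extend mixes a module into the singleton class can be checked directly. This is a minimal sketch that repeats the Secretive and Person definitions so that it runs on its own; build_person is a hypothetical helper, and singleton_class is assumed to be available (built in since Ruby 1.9.2).

```ruby
module Secretive
  def name
    "[not available]"
  end
end

class Person
  attr_accessor :name
end

# Hypothetical helper: builds a Person and optionally makes it secretive.
def build_person(name, secretive: false)
  person = Person.new
  person.name = name
  person.extend(Secretive) if secretive
  person
end

david = build_person("David", secretive: true)
matz  = build_person("Matz")

puts david.name                                  # [not available]
puts matz.name                                   # Matz
puts david.singleton_class.include?(Secretive)   # true
puts matz.singleton_class.include?(Secretive)    # false
puts Person.include?(Secretive)                  # false
```

Only the extended object's singleton class reports the module; the Person class and other Person objects are unchanged.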
ADDING CLASS METHODS WITH EXTEND
If you write a singleton method on a class object, like so
class Car
  def self.makes
    %w{ Honda Ford Toyota Chevrolet Volvo }
  end
end
or like so
class Car
  class << self
    def makes
      %w{ Honda Ford Toyota Chevrolet Volvo }
    end
  end
end
or with any of the other notational variants available, you’re adding an instance
method to the singleton class of the class object. It follows that you can achieve this, in
addition to the other ways, by using extend:
module Makers
  def makes
    %w{ Honda Ford Toyota Chevrolet Volvo }
  end
end
class Car
  extend Makers
end
If it’s more appropriate in a given situation, you can extend the class object after it
already exists:
Car.extend(Makers)
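A short, self-contained sketch shows where the method ends up after extending an existing class. It repeats the Makers and Car definitions so that it runs on its own, and assumes singleton_class is available (built in since Ruby 1.9.2).

```ruby
module Makers
  def makes
    %w{ Honda Ford Toyota Chevrolet Volvo }
  end
end

class Car
end

Car.extend(Makers)

puts Car.makes.inspect
puts Car.singleton_class.include?(Makers)   # true: Makers is mixed into Car's singleton class
puts Car.new.respond_to?(:makes)            # false: instances of Car don't get the method
```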