Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (2.65 MB, 448 trang )
Figure 5-1. An iterator yielding to its invoking method
chars = "hello world".tap
.each_char
.tap
.to_a
.tap
.map {|c| c.succ } .tap
.sort
.tap
{|x|
{|x|
{|x|
{|x|
{|x|
puts
puts
puts
puts
puts
"original object: #{x.inspect}"}
"each_char returns: #{x.inspect}"}
"to_a returns: #{x.inspect}"}
"map returns: #{x.inspect}" }
"sort returns: #{x.inspect}"}
Another common function for iterators is automatic resource deallocation. The
File.open method can be used as an iterator, for example. It opens the named file,
creating a File object to represent it. If no block is associated with the invocation, it
simply returns the File object and leaves the responsibility for closing the file with the
calling code. If there is a block associated with the File.open call, however, it passes
the new File object to that block and then automatically closes the file when the block
returns. This ensures that files will always be closed and frees programmers from this
housekeeping detail. In this case, when a block is associated with the call to
File.open, the return value of method is not a File object but whatever value the block
returned.
5.3.1 Numeric Iterators
The core Ruby API provides a number of standard iterators. The Kernel method loop
behaves like an infinite loop, running its associated block repeatedly until the block
executes a return, break, or other statement that exits from the loop.
The Integer class defines three commonly used iterators. The upto method invokes its
associated block once for each integer between the integer on which it is invoked and
the integer which is passed as an argument. For example:
5.3 Iterators and Enumerable Objects | 131
4.upto(6) {|x| print x}
# => prints "456"
As you can see, upto yields each integer to the associated block, and it includes both
the starting point and the end point in the iteration. In general, n.upto(m) runs its block
m-n+1 times.
The downto method is just like upto but iterates from a larger number down to a smaller
number.
When the Integer.times method is invoked on the integer n, it invokes its block n times,
passing values 0 through n-1 on successive iterations. For example:
3.times {|x| print x }
# => prints "012"
In general, n.times is equivalent to 0.upto(n-1).
If you want to do a numeric iteration using floating-point numbers, you can use the
more complex step method defined by the Numeric class. The following iterator, for
example, starts at 0 and iterates in steps of 0.1 until it reaches Math::PI:
0.step(Math::PI, 0.1) {|x| puts Math.sin(x) }
5.3.2 Enumerable Objects
Array, Hash, Range, and a number of other classes define an each iterator that passes
each element of the collection to the associated block. This is perhaps the most commonly used iterator in Ruby; as we saw earlier, the for loop only works for iterating
over objects that have each methods. Examples of each iterators:
[1,2,3].each {|x| print x }
(1..3).each {|x| print x }
# => prints "123"
# => prints "123" Same as 1.upto(3)
The each iterator is not only for traditional “data structure” classes. Ruby’s IO class
defines an each iterator that yields lines of text read from the Input/Output object. Thus,
you can process the lines of a file in Ruby with code like this:
File.open(filename) do |f|
f.each {|line| print line }
end
# Open named file, pass as f
# Print each line in f
# End block and close file
Most classes that define an each method also include the Enumerable module, which
defines a number of more specialized iterators that are implemented on top of the
each method. One such useful iterator is each_with_index, which allows us to add line
numbering to the previous example:
File.open(filename) do |f|
f.each_with_index do |line,number|
print "#{number}: #{line}"
end
end
Some of the most commonly used Enumerable iterators are the rhyming methods
collect, select, reject, and inject. The collect method (also known as map) executes
132 | Chapter 5: Statements and Control Structures
its associated block for each element of the enumerable object, and collects the return
values of the blocks into an array:
squares = [1,2,3].collect {|x| x*x}
# => [1,4,9]
The select method invokes the associated block for each element in the enumerable
object, and returns an array of elements for which the block returns a value other than
false or nil. For example:
evens = (1..10).select {|x| x%2 == 0} # => [2,4,6,8,10]
The reject method is simply the opposite of select; it returns an array of elements for
which the block returns nil or false. For example:
odds = (1..10).reject {|x| x%2 == 0}
# => [1,3,5,7,9]
The inject method is a little more complicated than the others. It invokes the associated
block with two arguments. The first argument is an accumulated value of some sort
from previous iterations. The second argument is the next element of the enumerable
object. The return value of the block becomes the first block argument for the next
iteration, or becomes the return value of the iterator after the last iteration. The initial
value of the accumulator variable is either the argument to inject, if there is one, or
the first element of the enumerable object. (In this case, the block is invoked just once
for the first two elements.) Examples make inject more clear:
data = [2, 5, 3, 4]
sum = data.inject {|sum, x| sum + x }
floatprod = data.inject(1.0) {|p,x| p*x }
max = data.inject {|m,x| m>x ? m : x }
# => 14
(2+5+3+4)
# => 120.0 (1.0*2*5*3*4)
# => 5
(largest element)
See §9.5.1 for further details on the Enumerable module and its iterators.
5.3.3 Writing Custom Iterators
The defining feature of an iterator method is that it invokes a block of code associated
with the method invocation. You do this with the yield statement. The following
method is a trivial iterator that just invokes its block twice:
def twice
yield
yield
end
To pass argument values to the block, follow the yield statement with a commaseparated list of expressions. As with method invocation, the argument values may
optionally be enclosed in parentheses. The following simple iterator shows a use of
yield:
# This method expects a block. It generates n values of the form
# m*i + c, for i from 0..n-1, and yields them, one at a time,
# to the associated block.
def sequence(n, m, c)
5.3 Iterators and Enumerable Objects | 133
i = 0
while(i < n)
yield m*i + c
i += 1
end
end
# Loop n times
# Invoke the block, and pass a value to it
# Increment i each time
# Here is an invocation of that method, with a block.
# It prints the values 1, 6, and 11
sequence(3, 5, 1) {|y| puts y }
Nomenclature: yield and Iterators
Depending on your programming background, you may find the terms “yield” and
“iterator” confusing. The sequence method shown earlier is a fairly clear example of
why yield has the name it does. After computing each number in the sequence, the
method yields control (and yields the computed number) to the block, so that the block
can work with it. It is not always this clear, however; in some code it may seem as if it
is the block that is yielding a result back to the method that invoked it.
A method such as sequence that expects a block and invokes it multiple times is called
an iterator because it looks and behaves like a loop. This may be confusing if you are
used to languages like Java in which iterators are objects. In Java, the client code that
uses the iterator is in control and “pulls” values from the iterator when it needs them.
In Ruby, the iterator method is in control and “pushes” values to the block that wants
them.
This nomenclature issue is related to the distinction between “internal iterators” and
“external iterators,” which is discussed later in this section.
Here is another example of a Ruby iterator; it passes two arguments to its block. It is
worth noticing that the implementation of this iterator uses another iterator internally:
# Generate n points evenly spaced around the circumference of a
# circle of radius r centered at (0,0). Yield the x and y coordinates
# of each point to the associated block.
def circle(r,n)
n.times do |i|
# Notice that this method is implemented with a block
angle = Math::PI * 2 * i / n
yield r*Math.cos(angle), r*Math.sin(angle)
end
end
# This invocation of the iterator prints:
# (1.00, 0.00) (0.00, 1.00) (-1.00, 0.00) (-0.00, -1.00)
circle(1,4) {|x,y| printf "(%.2f, %.2f) ", x, y }
Using the yield keyword really is a lot like invoking a method. (See Chapter 6 for
complete details on method invocation.) Parentheses around the arguments are optional. You can use * to expand an array into individual arguments. yield even allows
you to pass a hash literal without the curly braces around it. Unlike a method
134 | Chapter 5: Statements and Control Structures
invocation, however, a yield expression may not be followed by a block. You cannot
pass a block to a block.
If a method is invoked without a block, it is an error for that method to yield, because
there is nothing to yield to. Sometimes you want to write a method that yields to a block
if one is provided but takes some default action (other than raising an error) if invoked
with no block. To do this, use block_given? to determine whether there is a block
associated with the invocation. block_given?, and its synonym iterator?, are Kernel
methods, so they act like global functions. Here is an example:
# Return an array with n elements
# If a block is given, also yield
def sequence(n, m, c)
i, s = 0, []
#
while(i < n)
#
y = m*i + c
#
yield y if block_given?
#
s << y
#
i += 1
end
s
#
end
of the form m*i+c
each element to the block
Initialize variables
Loop n times
Compute value
Yield, if block
Store the value
Return the array of values
5.3.4 Enumerators
An enumerator is an Enumerable object whose purpose is to enumerate some other
object. To use enumerators in Ruby 1.8, you must require 'enumerator'. In Ruby 1.9
(and also 1.8.7), enumerators are built-in and no require is necessary. (As we’ll see
later, the built-in enumerators have substantially more functionality than that provided
by the enumerator library.)
Enumerators are of class Enumerable::Enumerator. Although this class can be instantiated directly with new, this is not how enumerators are typically created. Instead, use
to_enum or its synonym enum_for, which are methods of Object. With no arguments,
to_enum returns an enumerator whose each method simply calls the each method of the
target object. Suppose you have an array and a method that expects an enumerable
object. You don’t want to pass the array object itself, because it is mutable, and you
don’t trust the method not to modify it. Instead of making a defensive deep copy of the
array, just call to_enum on it, and pass the resulting enumerator instead of the array
itself. In effect, you’re creating an enumerable but immutable proxy object for your
array:
# Call this method with an Enumerator instead of a mutable array.
# This is a useful defensive strategy to avoid bugs.
process(data.to_enum) # Instead of just process(data)
You can also pass arguments to to_enum, although the enum_for synonym seems more
natural in this case. The first argument should be a symbol that identifies an iterator
method. The each method of the resulting Enumerator will invoke the named method
5.3 Iterators and Enumerable Objects | 135
of the original object. Any remaining arguments to enum_for will be passed to that
named method. In Ruby 1.9, the String class is not Enumerable, but it defines three
iterator methods: each_char, each_byte, and each_line. Suppose we want to use an
Enumerable method, such as map, and we want it to be based on the each_char iterator.
We do this by creating an enumerator:
s = "hello"
s.enum_for(:each_char).map {|c| c.succ }
# => ["i", "f", "m", "m", "p"]
In Ruby 1.9 (and 1.8.7), it is usually not even necessary to use to_enum or enum_for
explicitly as we did in the previous examples. This is because the built-in iterator methods of Ruby 1.9 (which include the numeric iterators times, upto, downto, and step, as
well as each and related methods of Enumerable) automatically return an enumerator
when invoked with no block. So, to pass an array enumerator to a method rather than
the array itself, you can simply call the each method:
process(data.each_char)
# Instead of just process(data)
This syntax is even more natural if we use the chars alias in place of each_char. To map
the characters of a string to an array of characters, for example, just use .chars.map:
"hello".chars.map {|c| c.succ }
# => ["i", "f", "m", "m", "p"]
Here are some other examples that rely on enumerator objects returned by iterator
methods. Note that it is not just iterator methods defined by Enumerable that can return
enumerator objects; numeric iterators like times and upto do the same:
enumerator = 3.times
enumerator.each {|x| print x }
# An enumerator object
# Prints "012"
# downto returns an enumerator with a select method
10.downto(1).select {|x| x%2==0} # => [10,8,6,4,2]
# each_byte iterator returns an enumerator with a to_a method
"hello".each_byte.to_a
# => [104, 101, 108, 108, 111]
You can duplicate this behavior in your own iterator methods by returning
self.to_enum when no block is supplied. Here, for example, is a version of the twice
iterator shown earlier that can return an enumerator if no block is provided:
def twice
if block_given?
yield
yield
else
self.to_enum(:twice)
end
end
In Ruby 1.9, enumerator objects define a with_index method that is not available in the
Ruby 1.8 enumerator module. with_index simply returns a new enumerator that adds
an index parameter to the iteration. For example, the following returns an enumerator
that yields the characters of a string and their index within the string:
136 | Chapter 5: Statements and Control Structures
enumerator = s.each_char.with_index
Finally, keep in mind that enumerators, in both Ruby 1.8 and 1.9, are Enumerable
objects that can be used with the for loop. For example:
for line, number in text.each_line.with_index
print "#{number+1}: #{line}"
end
5.3.5 External Iterators
Our discussion of enumerators has focused on their use as Enumerable proxy objects.
In Ruby 1.9, (and 1.8.7, though the implementation is not as efficient) however, enumerators have another very important use: they are external iterators. You can use an
enumerator to loop through the elements of a collection by repeatedly calling the
next method. When there are no more elements, this method raises a StopIteration
exception:
iterator = 9.downto(1)
begin
print iterator.next while true
rescue StopIteration
puts "...blastoff!"
end
#
#
#
#
#
An enumerator as external iterator
So we can use rescue below
Call the next method repeatedly
When there are no more values
An expected, nonexceptional condition
Internal versus External Iterators
The “gang of four” define and contrast internal and external iterators quite clearly in
their design patterns book:*
A fundamental issue is deciding which party controls the iteration, the iterator or
the client that uses the iterator. When the client controls the iteration, the iterator
is called an external iterator, and when the iterator controls it, the iterator is an
internal iterator. Clients that use an external iterator must advance the traversal
and request the next element explicitly from the iterator. In contrast, the client
hands an internal iterator an operation to perform, and the iterator applies that
operation to every element....
External iterators are more flexible than internal iterators. It’s easy to compare
two collections for equality with an external iterator, for example, but it’s practically impossible with internal iterators…. But on the other hand, internal iterators
are easier to use, because they define the iteration logic for you.
In Ruby, iterator methods like each are internal iterators; they control the iteration and
“push” values to the block of code associated with the method invocation. Enumerators
have an each method for internal iteration, but in Ruby 1.9 and later, they also work
as external iterators—client code can sequentially “pull” values from an enumerator
with next.
* Design Patterns: Elements of Reusable Object-Oriented Software, by Gamma, Helm, Johnson, and
Vlissides (Addison-Wesley).
5.3 Iterators and Enumerable Objects | 137
External iterators are quite simple to use: just call next each time you want another
element. When there are no more elements left, next will raise a StopIteration exception. This may seem unusual—an exception is raised for an expected termination
condition rather than an unexpected and exceptional event. (StopIteration is a descendant of StandardError and IndexError; note that it is one of the only exception
classes that does not have the word “error” in its name.) Ruby follows Python in this
external iteration technique. By treating loop termination as an exception, it makes
your looping logic extremely simple; there is no need to check the return value of
next for a special end-of-iteration value, and there is no need to call some kind of
next? predicate before calling next.
To simplify looping with external iterators, the Kernel.loop method includes (in Ruby
1.9) an implicit rescue clause and exits cleanly when StopIteration is raised. Thus, the
countdown code shown earlier could more easily be written like this:
iterator = 9.downto(1)
loop do
print iterator.next
end
puts "...blastoff!"
# Loop until StopIteration is raised
# Print next item
Many external iterators can be restarted by calling the rewind method. Note, however,
that rewind is not effective for all enumerators. If an enumerator is based on an object
like a File which reads lines sequentially, calling rewind will not restart the iteration
from the beginning. In general, if new invocations of each on the underlying
Enumerable object do not restart the iteration from the beginning, then calling rewind
will not restart it either.
Once an external iteration has started (i.e., after next has been called for the first time),
an enumerator cannot be cloned or duplicated. It is typically possible to clone an enumerator before next is called, or after StopIteration has been raised or rewind is called.
Normally, enumerators with next methods are created from Enumerable objects that
have an each method. If, for some reason, you define a class that provides a next method
for external iteration instead of an each method for internal iteration, you can easily
implement each in terms of next. In fact, turning an externally iterable class that implements next into an Enumerable class is as simple as mixing in (with include—see
§7.5) a module like this:
module Iterable
include Enumerable
def each
loop { yield self.next }
end
end
# Define iterators on top of each
# And define each on top of next
Another way to use an external iterator is to pass it to an internal iterator method like
this one:
def iterate(iterator)
loop { yield iterator.next }
138 | Chapter 5: Statements and Control Structures
end
iterate(9.downto(1)) {|x| print x }
The earlier quote from Design Patterns alluded to one of the key features of external
iterators: they solve the parallel iteration problem. Suppose you have two Enumerable
collections and need to iterate their elements in pairs: the first elements of each collection, then the second elements, and so on. Without an external iterator, you must
convert one of the collections to an array (with the to_a method defined by
Enumerable) so that you can access its elements while iterating the other collection with
each.
Example 5-1 shows the implementation of three iterator methods. All three accept an
arbitrary number of Enumerable objects and iterate them in different ways. One is a
simple sequential iteration using only internal iterators; the other two are parallel
iterations and can only be done using the external iteration features of Ruby 1.9.
Example 5-1. Parallel iteration with external iterators
# Call the each method of each collection in turn.
# This is not a parallel iteration and does not require enumerators.
def sequence(*enumerables, &block)
enumerables.each do |enumerable|
enumerable.each(&block)
end
end
# Iterate the specified collections, interleaving their elements.
# This can't be done efficiently without external iterators.
# Note the use of the uncommon else clause in begin/rescue.
def interleave(*enumerables)
# Convert enumerable collections to an array of enumerators.
enumerators = enumerables.map {|e| e.to_enum }
# Loop until we don't have any more enumerators.
until enumerators.empty?
begin
e = enumerators.shift
# Take the first enumerator
yield e.next
# Get its next and pass to the block
rescue StopIteration
# If no more elements, do nothing
else
# If no exception occurred
enumerators << e
# Put the enumerator back
end
end
end
# Iterate the specified collections, yielding tuples of values,
# one value from each of the collections. See also Enumerable.zip.
def bundle(*enumerables)
enumerators = enumerables.map {|e| e.to_enum }
loop { yield enumerators.map {|e| e.next} }
end
# Examples of how these iterator methods work
a,b,c = [1,2,3], 4..6, 'a'..'e'
5.3 Iterators and Enumerable Objects | 139
sequence(a,b,c) {|x| print x}
# prints "123456abcde"
interleave(a,b,c) {|x| print x} # prints "14a25b36cde"
bundle(a,b,c) {|x| print x}
# '[1, 4, "a"][2, 5, "b"][3, 6, "c"]'
The bundle method of Example 5-1 is similar to the Enumerable.zip method. In Ruby
1.8, zip must first convert its Enumerable arguments to arrays and then use those arrays
while iterating through the Enumerable object it is called on. In Ruby 1.9, however,
the zip method can use external iterators. This makes it (typically) more efficient in
space and time, and also allows it to work with unbounded collections that could not
be converted into an array of finite size.
5.3.6 Iteration and Concurrent Modification
In general, Ruby’s core collection of classes iterate over live objects rather than private
copies or “snapshots” of those objects, and they make no attempt to detect or prevent
concurrent modification to the collection while it is being iterated. If you call the
each method of an array, for example, and the block associated with that invocation
calls the shift method of the same array, the results of the iteration may be surprising:
a = [1,2,3,4,5]
a.each {|x| puts "#{x},#{a.shift}" }
# prints "1,1\n3,2\n5,3"
You may see similarly surprising behavior if one thread modifies a collection while
another thread is iterating it. One way to avoid this is to make a defensive copy of the
collection before iterating it. The following code, for example, adds a method
each_in_snapshot to the Enumerable module:
module Enumerable
def each_in_snapshot &block
snapshot = self.dup
# Make a private copy of the Enumerable object
snapshot.each &block
# And iterate on the copy
end
end
5.4 Blocks
The use of blocks is fundamental to the use of iterators. In the previous section, we
focused on iterators as a kind of looping construct. Blocks were implicit to our discussion but were not the subject of it. Now we turn our attention to the block themselves.
The subsections that follow explain:
•
•
•
•
The syntax for associating a block with a method invocation
The “return value” of a block
The scope of variables in blocks
The difference between block parameters and method parameters
140 | Chapter 5: Statements and Control Structures
5.4.1 Block Syntax
Blocks may not stand alone; they are only legal following a method invocation. You
can, however, place a block after any method invocation; if the method is not an iterator
and never invokes the block with yield, the block will be silently ignored. Blocks are
delimited with curly braces or with do and end keywords. The opening curly brace or
the do keyword must be on the same line as the method invocation, or else Ruby interprets the line terminator as a statement terminator and invokes the method without
the block:
# Print the numbers 1 to 10
1.upto(10) {|x| puts x }
# Invocation and block on one line with braces
1.upto(10) do |x|
# Block delimited with do/end
puts x
end
1.upto(10)
# No block specified
{|x| puts x }
# Syntax error: block not after an invocation
One common convention is to use curly braces when a block fits on a single line, and
to use do and end when the block extends over multiple lines.This is not completely a
matter of convention, however; the Ruby parser binds { tightly to the token that precedes it. If you omit the parentheses around method arguments and use curly brace
delimiters for a block, then the block will be associated with the last method argument
rather than the method itself, which is probably not what you want. To avoid this case,
put parentheses around the arguments or delimit the block with do and end:
1.upto(3) {|x| puts x }
# Parens and curly braces work
1.upto 3 do |x| puts x end # No parens, block delimited with do/end
1.upto 3 {|x| puts x }
# Syntax Error: trying to pass a block to 3!
Blocks can be parameterized, just as methods can. Block parameters are separated with
commas and delimited with a pair of vertical bar (|) characters, but they are otherwise
much like method parameters (see §5.4.5 for details):
# The Hash.each iterator passes two arguments to its block
hash.each do |key, value|
# For each (key,value) pair in the hash
puts "#{key}: #{value}"
# Print the key and the value
end
# End of the block
It is a common convention to write the block parameters on the same line as the method
invocation and the opening brace or do keyword, but this is not required by the syntax.
5.4.2 The Value of a Block
In the iterator examples shown so far in this chapter, the iterator method has yielded
values to its associated block but has ignored the value returned by the block. This is
not always the case, however. Consider the Array.sort method. If you associate a block
with an invocation of this method, it will yield pairs of elements to the block, and it is
the block’s job to sort them. The block’s return value (–1, 0, or 1) indicates the ordering
5.4 Blocks | 141