Wednesday, December 02, 2009

each() in Perl can be dangerous

I'm a Java programmer by trade and one of the first things that is hammered into us (right after never using + for multiple string concatenation) is how to correctly iterate through a Map. Now the reason why is if you want to iterate through a Map's contents using keys only and then performing the value retrieval using its get() method then that is really inefficient. Java's hash implementations are good but considering the majority of the time keys are stored with their values in the backing datastructures; to get the pair out at the same time makes sense. So the first time you come to Perl you think
how on earth do I iterate through a hash?
and then you see the each() method. This gives rise to the following code:


while( my ($key, $value) = each(%my_hash)) {
print $key, '->', $value, "\n";
}


This all seems fine & dandy until we want to do something like this:


our %my_hash = (a=>1,b=>2);
my @elements = get_from_somewhere();
print "Starting\n";
OUTERLOOP: foreach my $el (@elements) {
while( my $key, $value) = each(%my_hash)) {
print $key, "\n";
if($key eq 'a') {
last OUTERLOOP;
}
}
}


Still seems okay right? Wrong! See the problem here comes from iteration of the EntrySet belonging to the hash & not to another data structure so on multiple invocations of this code the iteration of the map continues from where it left-off. So if the elements from the hash are returned in the order a,b & we execute it twice you will see:


Starting
a
Starting
b
a


To get to the correct expected output you need:


our %my_hash = (a=>1,b=>2);
my @elements = get_from_somewhere();
print "Starting\n";
OUTERLOOP: foreach my $el (@elements) {
foreach my $key (keys %my_hash) {
my $value = $my_hash{$key};
print $key, "\n";
if($key eq 'a') {
last OUTERLOOP;
}
}
}


So what's the solution? Well if I said never use each then I would be stupid & wrong. each makes sense. What I would recommend is if you might break out of a loop using a data structure where you may require early termination of a hash structure which is persistent between calls then do not use each. Or use one of the Iterator functions/modules in CPAN to provide a more Java like method of access to hashes; but then that wouldn't be Perl :).

2 comments:

Anonymous said...

Or if you do break out of an each loop, call keys %hash in void context to reset the iterator for subsequent calls.

Andy Y said...

Indeed good call smylers. One thing I would always be worried about is what that's doing behind the scenes and if you're using while(each()) for memory concerns is that going to cause just as many problems.

Really what I want is the iterator pattern with different strategies for cleaning up the hash. Something like natatime from List::MoreUtils is a nice way of doing it for Arrays :)