Iterators 101: make your data go “loopty-loop”

All developers constantly come into contact with data stored in arrays or objects. How you get hold of a particular chunk of data depends on the type of input you’re dealing with and the way you want to manage and output it. If you’re dealing with arrays, all is well and you can easily loop through them. In some cases, though, it isn’t that clear how you can get hold of your input and how you can actually iterate through it all.

A lot of people would say: “what’s all the fuss about, just write some conversion function that gets the input and stores it to an array”. Yes, that is true, but in a lot of cases this can turn really ugly. I know … I’ve done it myself a lot of times.

As a professional developer you always want to set a certain standard and make your code:

  • Readable
  • Easy to understand
  • Testable
  • Easy to refactor

Iterator

Definition

In the process of finding a best practice, I stumbled upon a mighty design pattern called “Iterator”. Wikipedia defines it as:

In object-oriented programming, the Iterator pattern is a design pattern in which iterators are used to access the elements of an aggregate object sequentially without exposing its underlying representation. An Iterator object encapsulates the internal structure of how the iteration occurs.

How it works

When you iterate through your data you usually write a while or for loop. Some might use a foreach. But you still want to use your objects because they’re intuitive. Do you really want to write some garbage code that converts them to arrays of data? NO is the answer to that one, but you kind of figured that out when reading the Wikipedia definition.

The primary goal of the iterator design pattern is to tell your iteration structure how to approach the data an object contains. So basically, you treat an object as a simple array. PHP (and other languages) make this possible through a small set of methods that a class is forced to provide when it implements the Iterator interface.

In PHP we owe all of this to a man called Marcus Börger, who made a PECL extension called “SPL” (Standard PHP Library). SPL contains the Iterator interface and a set of other useful classes. Before SPL became common PHP material, Marcus (also known as Helly) put all the information in his PHP home directory. Nowadays the PHP manual itself contains the information needed to use SPL, although it still tends to refer to certain pages in Marcus’ home directory. I would most definitely advise you to take a look at it, as it might even set your brains on fire. Please be careful though.

For the purpose of this article I’d like to refer you to the extract of the documentation regarding the Iterator interface. When you take a closer look at your common array iteration, there are a couple of conditions:

  • There is an index pointer that keeps track of the active array element
  • If you assign a certain value to your index pointer, you should make sure the offset exists
  • There should be a way to reset the index pointer when you want to go back to your first element
  • There should be a way to go to the next element

The interface

The Iterator interface has these bases covered and enforces the following methods:

Iterator extends Traversable {
/* Methods */
abstract public mixed Iterator::current ( void )
abstract public scalar Iterator::key ( void )
abstract public void Iterator::next ( void )
abstract public void Iterator::rewind ( void )
abstract public boolean Iterator::valid ( void )
}

  • Current: returns the current element in the iteration process
  • Key: returns the key of the current element. This is actually the value of the index pointer
  • Next: moves the index pointer to the next element
  • Rewind: rewinds the iteration to the first element by resetting the index pointer to zero
  • Valid: checks whether the current position is a valid offset. This method is also used to end the iteration process.
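
To see how these five methods fit together, here is roughly what a foreach does behind the scenes, written out as a plain for loop. This is only a sketch: it uses the ArrayIterator that ships with SPL as a stand-in for any object implementing Iterator, and the $seen buffer is an illustrative name.

```php
<?php
// Any object implementing Iterator works here; ArrayIterator ships with SPL.
$it = new ArrayIterator(array('x' => 10, 'y' => 20));

$seen = array();

// Roughly what foreach ($it as $key => $value) does for you.
for ($it->rewind(); $it->valid(); $it->next()) {
    $value  = $it->current();
    $key    = $it->key();
    $seen[] = "$key => $value";
}

echo implode(PHP_EOL, $seen) . PHP_EOL;
```

The loop stops as soon as valid() returns false, which is why that method doubles as the end-of-iteration check.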

An example

You guys probably want an example which demonstrates the usage. I could have figured one out myself, but instead I’ve just ripped one off the PHP manual. Check it out:

Source code

<?php
class myIterator implements Iterator {
    private $position = 0;
    private $array = array(
        "firstelement",
        "secondelement",
        "lastelement",
    );  
 
    public function __construct() {
        $this->position = 0;
    }
 
    function rewind() {
        var_dump(__METHOD__);
        $this->position = 0;
    }
 
    function current() {
        var_dump(__METHOD__);
        return $this->array[$this->position];
    }
 
    function key() {
        var_dump(__METHOD__);
        return $this->position;
    }
 
    function next() {
        var_dump(__METHOD__);
        ++$this->position;
    }
 
    function valid() {
        var_dump(__METHOD__);
        return isset($this->array[$this->position]);
    }
}
 
$it = new myIterator;
 
foreach($it as $key => $value) {
    var_dump($key, $value);
    echo "\n";
}

Output

string(18) "myIterator::rewind"
string(17) "myIterator::valid"
string(19) "myIterator::current"
string(15) "myIterator::key"
int(0)
string(12) "firstelement"

string(16) "myIterator::next"
string(17) "myIterator::valid"
string(19) "myIterator::current"
string(15) "myIterator::key"
int(1)
string(13) "secondelement"

string(16) "myIterator::next"
string(17) "myIterator::valid"
string(19) "myIterator::current"
string(15) "myIterator::key"
int(2)
string(11) "lastelement"

string(16) "myIterator::next"
string(17) "myIterator::valid"

Do you want an encore?

If you think you’ve reached iterator Valhalla by applying this design pattern, you’re dead wrong. It offers a general solution for some problems, but there are also several other Iterators which inherit from the regular one and offer you more benefits.

The ones I like in particular are:

  • ArrayIterator
  • DirectoryIterator
  • FilterIterator

Because I see myself as a nice guy I’ll cover them as well.

ArrayIterator

I like Iterators very much, but sometimes it’s such a drag to fill in all the obligatory methods. In a lot of cases the data that needs to be iterated resides in your object as an array. With the ArrayIterator you no longer need to do it manually. As you can see in the class definition below, it implements a set of other interfaces which add some bonus value to the package.

ArrayIterator implements Iterator , Traversable , ArrayAccess , SeekableIterator , Countable {
/* Methods */
mixed ArrayIterator::current ( void )
mixed ArrayIterator::key ( void )
void ArrayIterator::next ( void )
void ArrayIterator::rewind ( void )
void ArrayIterator::seek ( int $position )
bool ArrayIterator::valid ( void )
}

Because the ArrayIterator implements the ArrayAccess Interface you don’t always have to iterate your object, you can also treat it as an array by just getting an offset. There’s also counting support because of the Countable Interface.

Here’s a simple usage example:

<?php
$data = array('a'=>1,'b'=>2,'c'=>3);
$iterator = new ArrayIterator($data);
foreach($iterator as $key=>$value){
    echo "Key: $key - Value: $value ".PHP_EOL;
}

And here’s the output:

Key: a - Value: 1
Key: b - Value: 2
Key: c - Value: 3
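
The array-style access and counting mentioned earlier can be sketched like this. The data is the same illustrative array as in the usage example, and the expected results in the comments follow from it:

```php
<?php
$iterator = new ArrayIterator(array('a' => 1, 'b' => 2, 'c' => 3));

// ArrayAccess: read and write offsets as if it were a plain array.
echo $iterator['b'] . PHP_EOL;      // 2
$iterator['d'] = 4;
var_dump(isset($iterator['z']));    // bool(false)

// Countable: count() works directly on the object.
echo count($iterator) . PHP_EOL;    // 4

// SeekableIterator: jump straight to a zero-based position.
$iterator->seek(2);
echo $iterator->key() . ' => ' . $iterator->current() . PHP_EOL; // c => 3
```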

DirectoryIterator

In most circumstances your workable data will be stored in objects and arrays, but where does all that stuff originate from? Probably a database, a file or user input, or it might even be fetched from a web service. As a developer you’re responsible for populating your working datasets with the input available.

Reading from files and directories can be a particularly annoying chore. You can fiddle about with opendir() and readdir(), but I personally prefer the DirectoryIterator. It’s an easy-to-use class that extends the SplFileInfo class and implements the basic Iterator interface. You not only get an iteration model, but you can also treat your iteration elements as files or directories. The parent SplFileInfo class implements a set of useful methods that give you the necessary information on a file or directory. If you check out the class API below you’ll see what these advantages are:

DirectoryIterator extends SplFileInfo implements Iterator , Traversable {
/* Methods */
DirectoryIterator::__construct ( string $path )
DirectoryIterator DirectoryIterator::current ( void )
int DirectoryIterator::getATime ( void )
int DirectoryIterator::getCTime ( void )
string DirectoryIterator::getFilename ( void )
int DirectoryIterator::getGroup ( void )
int DirectoryIterator::getInode ( void )
int DirectoryIterator::getMTime ( void )
int DirectoryIterator::getOwner ( void )
string DirectoryIterator::getPath ( void )
string DirectoryIterator::getPathname ( void )
int DirectoryIterator::getPerms ( void )
int DirectoryIterator::getSize ( void )
string DirectoryIterator::getType ( void )
bool DirectoryIterator::isDir ( void )
bool DirectoryIterator::isDot ( void )
bool DirectoryIterator::isExecutable ( void )
bool DirectoryIterator::isFile ( void )
bool DirectoryIterator::isLink ( void )
bool DirectoryIterator::isReadable ( void )
bool DirectoryIterator::isWritable ( void )
string DirectoryIterator::key ( void )
void DirectoryIterator::next ( void )
void DirectoryIterator::rewind ( void )
bool DirectoryIterator::valid ( void )
}

A small example will show you how to use it in a simple way:

<?php
$iterator = new DirectoryIterator(dirname(__FILE__));
foreach($iterator as $fileInfo){
    if(!$fileInfo->isDot()){
        echo $fileInfo->getFilename() . PHP_EOL;
    }
}

So this example retrieves all files and directories located in the directory that contains the script. Every directory listing also contains two special entries: the dot (.) and the double dot (..), which refer to the current and the parent directory respectively. The little if statement in the example filters these entries out because they aren’t useful.
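
Because every element is also an SplFileInfo object, you can ask it more than just its name. The sketch below uses isDir() and getSize() from the API listing above; describeDirectory() is a hypothetical helper name, not part of SPL.

```php
<?php
/**
 * Hypothetical helper: describes each entry of a directory in one line.
 */
function describeDirectory($path)
{
    $lines = array();
    foreach (new DirectoryIterator($path) as $fileInfo) {
        if ($fileInfo->isDot()) {
            continue; // skip "." and ".."
        }
        if ($fileInfo->isDir()) {
            $lines[] = $fileInfo->getFilename() . ' (directory)';
        } else {
            $lines[] = $fileInfo->getFilename() . ' (' . $fileInfo->getSize() . ' bytes)';
        }
    }
    // The order in which entries come back is not guaranteed, so sort them.
    sort($lines);
    return $lines;
}

echo implode(PHP_EOL, describeDirectory(dirname(__FILE__))) . PHP_EOL;
```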

FilterIterator

The FilterIterator is cool for a very specific reason: it allows you to iterate over an existing iterator and adds value by letting you filter out elements that don’t belong in your data. The accept method actually takes care of the filtering part, but unfortunately it wasn’t included in the PHP.net API synopsis I found, which is shown below:

abstract FilterIterator extends IteratorIterator implements OuterIterator , Traversable , Iterator {
/* Methods */
mixed FilterIterator::current ( void )
Iterator FilterIterator::getInnerIterator ( void )
mixed FilterIterator::key ( void )
void FilterIterator::next ( void )
void FilterIterator::rewind ( void )
bool FilterIterator::valid ( void )
}

I think an example is in order here:

<?php
class MyFilterIterator extends FilterIterator
{
    public function __construct($data)
    {
        parent::__construct(new ArrayIterator($data));
    }
    public function accept()
    {
        return ($this->current()%2 == 0);
    }
}
 
$data = array(1,2,3,4,5);
$iterator = new MyFilterIterator($data);
foreach($iterator as $item){
    echo $item.PHP_EOL;
}

This example creates a custom FilterIterator that only allows even numbers and drops the odd ones by returning false in the accept method.

This is the output of the example script:

2
4

Added bonus

We’ve now seen how Iterators can help us, but as an added bonus we can combine our DirectoryIterator and our FilterIterator to create a custom iterator that filters a directory and only lists, for example, XML files. I must admit I didn’t figure this one out myself. Again I owe it to Marcus Börger, who mentions this in his FindFile & RegexFindFile examples.

So what are we gonna do? Well … we’re gonna make a custom iterator which extends the FilterIterator. The constructor will take a path and pass it to a “RecursiveDirectoryIterator”, wrapped in a “RecursiveIteratorIterator” so that subdirectories are actually traversed. The result will be passed to the parent FilterIterator. Using the accept method we’re gonna validate each item and only allow XML files using a regular expression.

<?php
class XMLDirectoryIterator extends FilterIterator
{
    public function __construct($path)
    {
        // RecursiveDirectoryIterator knows how to descend into subdirectories;
        // RecursiveIteratorIterator is what actually walks through them.
        parent::__construct(
            new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path))
        );
    }
    public function accept()
    {
        // Each element is an SplFileInfo object
        $file = $this->current();
        return preg_match('/^.*\.xml$/', $file->getFilename()) && is_readable($file->getPathname());
    }
}
 
$iterator = new XMLDirectoryIterator(dirname(__FILE__));
foreach($iterator as $item){
    echo $item->getFilename().PHP_EOL;
}

You might have noticed the RecursiveDirectoryIterator and the RecursiveIteratorIterator wrapped around it. The first one knows how to descend into subdirectories and the second one actually does the walking, so the script not only retrieves files in the current directory but also scans subdirectories for matching files. Pretty neat, huh!
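
As a variation on the same idea, the filter could take the extension as a parameter instead of hard-coding XML. This is only a sketch under a few assumptions: ExtensionFilterIterator and findByExtension() are hypothetical names, the bool return type needs PHP 7 or newer, and the inner iterator is wrapped in a RecursiveIteratorIterator so that subdirectories are really visited.

```php
<?php
/**
 * Hypothetical generalisation of the XML filter: keeps only files with a
 * given extension, including files in subdirectories.
 */
class ExtensionFilterIterator extends FilterIterator
{
    private $extension;

    public function __construct($path, $extension)
    {
        $this->extension = strtolower($extension);
        // RecursiveIteratorIterator is what actually walks into subdirectories.
        parent::__construct(
            new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path))
        );
    }

    public function accept(): bool
    {
        $file = $this->current();
        // isFile() also rules out the "." and ".." entries.
        return $file->isFile()
            && strtolower($file->getExtension()) === $this->extension;
    }
}

/**
 * Hypothetical helper: returns the sorted names of all matching files.
 */
function findByExtension($path, $extension)
{
    $names = array();
    foreach (new ExtensionFilterIterator($path, $extension) as $file) {
        $names[] = $file->getFilename();
    }
    sort($names);
    return $names;
}
```

Calling findByExtension(dirname(__FILE__), 'xml') would then return the names of all XML files in or below the script’s directory, regardless of upper or lower case in the extension.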
