ntoll.org

(Everything I say is false...)

Test Driven Development Cargo Cult

Monday 5th November 2012 (08:00AM)

The London Python Code Dojo is a great place to share ideas and collaborate with an enthusiastic and friendly group of peers. I'm going to tell the story of an interesting outcome from last Thursday's dojo - organised with great aplomb by Python core developer and Windows guru Tim Golden.

We departed from the usual democratic method of choosing a dojo task: Tim decided that as we regularly solved problems concerning board games it would be good to produce a "one true board" code library to re-use in future dojos. To this end Tim created a Github repository containing a single README.rst file containing requirements. We were to fork the repository, create a directory for our team's code and send a pull request after the dojo so posterity could have a good laugh at our solutions.

I found myself in a team with general all-round-nice-guy-and-man-about-the-technological-town Robert Rees. He's a hugely experienced, well-read and thoughtful senior developer / architect at The Guardian. He's also always ready to step up to share his successes, failures and discoveries with others (he opened the evening with an excellent lightning talk on functional programming with Python). Once again, he proved to have an excellent eye for a learning opportunity...

During the team discussion Robert suggested we work using strict test driven development (TDD): focus on a specific thing that needed implementing, write a test for just that thing, make it pass, then rinse and repeat until all requirements are met with no failing tests. We took it in turns to write code. At the end of each turn the "programmer" wrote a failing test for the next person in the group to solve, and so the cycle continued. Given that we had very clear requirements in the form of Tim's README file this was an excellent suggestion on Robert's part.

Thanks to Robert, what happened next was both funny and educational...

I set up the basic scaffolding for the project and Robert quickly created a "Board" class containing some stubbed out yet obvious methods and an associated test stub. Team-mate and Pygame core developer Rene Dudfield read out the next requirement,

"You should be able to put something at a coordinate on the board."

A suitable test was written and the next team member (another Nick) sat in the hot seat. Rather than implement a complex or complete solution to the requirement he was encouraged, in true TDD fashion, to do the minimum to make the test pass. So, he hard-coded the return value expected by the test, in this case the string "Foo".
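To make that concrete, here's a sketch of what such a deliberately minimal "solution" looks like (hypothetical names, not our actual dojo code):

```python
import unittest

class Board:
    """A deliberately minimal Board: just enough to pass the test below."""

    def place(self, x, y, thing):
        pass  # nothing is stored; no test demands it yet

    def get(self, x, y):
        # The minimum needed to make the test pass: a hard-coded value.
        return "Foo"

class TestBoard(unittest.TestCase):
    def test_put_something_at_a_coordinate(self):
        board = Board()
        board.place(1, 2, "Foo")
        self.assertEqual(board.get(1, 2), "Foo")
```

The test is green, yet the implementation stores nothing at all.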

It was at this point that I made the mistake of opening my big mouth (again). I said something along the lines of, "the problem with unit tests is that they're behaviourist in outlook: they don't ensure that the internal quality of the solution is any good - simply that the solution passes some tests" i.e. it displays such-and-such a behaviour.

I was referring to the behaviourist movement founded by J. B. Watson, in which things can only be defined in terms of observable behaviours - there is no room for introspection. When taken to an extreme, behaviourist philosophy claims that a person has no consciousness but is simply a thing that behaves. Of course, there are problems with this outlook. The obvious question of what is meant by "behaviour" is far more difficult to answer than it at first appears. More problematically, behaviourism seems to run counter to personal "internal" experience: pain, love, sadness and joy are common feelings yet may not be manifested in externally observable behaviour. Furthermore, behaviours associated with such feelings can be easily and convincingly aped (for example, by actors or con artists), which may even be a requirement for more complex interactions such as irony (where you act one way but mean something else, often the opposite of how you act).

Given that we were enjoying the silly exercise of making tests pass with the minimum of effort Robert stepped up with a new challenge: he would write a test to force the development of something sensible. This would show that test driven development inevitably leads to a good solution by forcing us to make the failing test pass.

We found Robert had written the following:

def test_pieces_are_placed_and_restored(self):
    width = 4000
    height = 4000
    board = Board(width, height)
    pieces = [(random.randint(0, width), random.randint(0, height),
              {"i": random.randint(0, 5000)}) for x in
              range(0, random.randint(0, 57000))]
    for piece in pieces:
        board.place(piece[0], piece[1], piece[2])
        self.assertEquals(board.get(piece[0], piece[1])["i"], piece[2])
    self.assertEquals(len(board.contents()), len(pieces))

His test creates a huge board, places random pieces at random locations on the board, checks that expected pieces are found in the correct location on the board and finally ensures the board's "contents" contain the same number of pieces that the test randomly created.

Surely we had to write a quality solution to make such a test pass?

I'm ashamed to say that we met Robert's challenge by intentionally creating a monster (including suggestions from Robert himself). At the top of our board module we wrote this:

import random
random.seed('robert rees was here')

Then we updated the Board class's get method to:

def get(self, x, y):
    if self.max_x == 4000:
        # Take that @rrees
        if not hasattr(self, 'rrees'):
            random.seed('robert rees was here')
            width = self.max_x
            height = self.max_y
            self.rrees = [(random.randint(0, width), random.randint(0, height),
                           {"i": random.randint(0, 5000)}) for x in
                          range(0, random.randint(0, 57000))]
            self.counter = 0
        result = self.rrees[self.counter]
        self.counter += 1
        return {'i': result[2]}
    else:
        return "Foo"

Our solution reset the random number generator's seed to "robert rees was here" so the resulting sequence of "random" numbers was deterministic (i.e. it would always be the same set of random-ish numbers - one of the reasons it's more correct to call this sort of thing a pseudo-random number generator). Then we copied the code from Robert's unit test to return the result that the test expected. The outer if statement ensured that we returned the correct result for tests that set the board to different sizes - Robert's test set the dimensions to 4000x4000 whereas another, as mentioned before, expected the result "Foo".
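The trick relies on a property anyone can verify for themselves: seeding Python's random module with a fixed value makes the subsequent "random" sequence completely reproducible. A minimal demonstration:

```python
import random

# Seed the pseudo-random number generator with a fixed value...
random.seed('robert rees was here')
first = [random.randint(0, 5000) for _ in range(5)]

# ...re-seed with the same value and the "random" sequence repeats exactly.
random.seed('robert rees was here')
second = [random.randint(0, 5000) for _ in range(5)]

assert first == second
```

Because our Board module re-seeded the generator with the same string the test used, it could replay exactly the "random" pieces the test was about to generate.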

This is, of course, an abomination.

However, it allowed us to play our parts in the ruse that started our "show-and-tell" presentation. We explained we had carefully linked tests to requirements in Tim's README file and talked through the programming cycle I explained earlier to ensure everyone had a go at writing code. We even audited the unit tests in front of the assembled dojo and ran the test suite so people could see that everything was working in accordance with the tests. It was only when we revealed the code for the actual Board class that the joke was discovered.

And so we get to the point of this blog post: test driven development is promoted as the one true way to great code. Our activity in the dojo shows that this is not the case. Although we were nefarious in our intent, a similar result (abominable code) could easily be produced through inexperience, laziness or lack of care and attention while still doing TDD.

The "test-driven" dogma is dangerous: it claims to improve the quality of code so long as you follow its modus operandi. At best this fosters a false sense of security; at worst it's simply a cargo cult.

I think having well tested code is a useful and often essential element of great software. I'm not saying "you shouldn't test code!" (you most definitely should). Rather, I'm concerned that TDD apologists do more harm than good by promoting a cargo cult. Testing software is a good thing and sometimes writing tests before implementing a feature is the right thing to do. But this is simply evidence of the most important element that leads to great code - an element that, unfortunately, is not mentioned by TDD - the wisdom of the programmer.

What do I mean?

I mean the state where knowledge, experience and trust coincide with the understanding and confidence to do the "right thing" rather than follow a list of precepts.