Write Fewer Tests in Python

SQLite has 1000X more test code than actual source code! That's phenomenal, but even ordinary companies commonly carry 5X-10X more test code than source. This seems like an area that programming languages and development tools should tackle: find ways to reduce the huge amount of extra code written for testing.

As an experiment, I wrote a simple skiplist in Python 3.5 in a stream-of-consciousness burst of sloppy coding. I used pylint to catch the obvious mistakes; it's about as good as any decent IDE at that. After some back and forth I had code that could be loaded into the interpreter.

I next used hypothesis to write tests with the unittest library. hypothesis is a Python implementation of QuickCheck, a brilliant library for automated property-based testing. Consider the following code to test insertion into a skiplist:

@given(st.lists(st.integers()))
def test_insert_integers(self, nums):
    sorted_nums = sorted(nums)
    self.assertEqual(0, len(self.skip))

    for i in nums:
        self.skip.insert(i)

    for i in sorted_nums:
        self.assertIn(i, self.skip)
    for i in self.skip:
        self.assertIn(i, sorted_nums)

    self.assertEqual(len(nums), len(self.skip))

The @given decorator tells hypothesis to generate random lists of integers to feed into this function. The function inserts every element into the skiplist and then asserts some properties. hypothesis found lots of interesting bugs in my code, printing a small example that triggered each error. Unfortunately, hypothesis doesn't play well with unittest: this single test method actually runs many generated tests, so setUp and tearDown don't run once per example. But that's easy to get around for now.
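The workaround I settled on isn't shown above; one common approach (a sketch, using a stand-in SkipList class of my own, not the article's actual code) is to do per-example setup inside the test body instead of relying on setUp:

```python
import unittest

from hypothesis import given
import hypothesis.strategies as st


class SkipList:
    # Minimal stand-in for the real skiplist; a real one has levels.
    def __init__(self):
        self._items = []

    def insert(self, x):
        self._items.append(x)
        self._items.sort()

    def __contains__(self, x):
        return x in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class TestSkipList(unittest.TestCase):
    @given(st.lists(st.integers()))
    def test_insert_integers(self, nums):
        # Build a fresh structure for each generated example, because
        # setUp runs once per test method, not once per example.
        skip = SkipList()
        for i in nums:
            skip.insert(i)
        for i in nums:
            self.assertIn(i, skip)
        self.assertEqual(len(nums), len(skip))
```

Moving construction into the test body makes each generated example independent, at the cost of a little duplication across test methods.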

The next step was to use Python 3.5 type annotations and mypy for static type checking. You can either write unit tests to check all of this, or let the tools handle it for you. The type annotations are a bit verbose, and the syntax for adding type declarations to fields is atrocious. Nevertheless, it works really well and caught a few corner cases in untested code. There is one problem, though: I can't find a way to add constraints to generic types. In my code, I want to say that the generic type T must support the __lt__ operator. Right now it seems to work anyway.
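For a sense of the verbosity, here is a sketch of what such annotations look like on a hypothetical node class (the names are mine, not the article's actual code). In 3.5 the only way to annotate a field is a `# type:` comment, which is the atrocious part:

```python
from typing import Generic, List, Optional, TypeVar

# Ideally T would be constrained to "supports __lt__",
# which is exactly the constraint that is hard to express.
T = TypeVar('T')


class Node(Generic[T]):
    def __init__(self, value: T, level: int) -> None:
        self.value = value  # type: T
        # Python 3.5 predates variable annotations (PEP 526, added in
        # 3.6), so field types must be declared in comments like these.
        self.forward = [None] * level  # type: List[Optional[Node[T]]]
```

The type comments are invisible at runtime; only mypy reads them, which is why untested code paths still get checked.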

Finally, I used PyContracts to write design-by-contract style code. It lets you attach additional constraints to your code. For example, when choosing how many levels a skiplist has, I added the constraint @contract(max_level='>0') to the constructor, which verifies the input. I didn't see many other opportunities for contracts, because a skiplist, like a list, is supposed to accept nearly anything. Contracts in .NET are fantastic; PyContracts is merely good enough, and it needs to play better with 3.5's typing syntax.

Despite all this, I found one bug that only a code review could catch. Searching a skiplist is supposed to be O(lg n), but I had failed to begin each level's search where the previous level left off. My search was correct, yet O(n). How could a test discover a bug that still produces correct results? It's really a performance issue that would only be revealed at large n.
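The fix can be sketched with a minimal skiplist of my own (not the article's actual code): when dropping down a level, keep walking from the current node instead of restarting from the head.

```python
import random


class _Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # one next-pointer per level


class SkipList:
    def __init__(self, max_level=16):
        self.max_level = max_level
        self.level = 1
        self.head = _Node(None, max_level)

    def _random_level(self):
        level = 1
        while level < self.max_level and random.random() < 0.5:
            level += 1
        return level

    def insert(self, value):
        # Record, per level, the last node before the insertion point.
        update = [self.head] * self.max_level
        node = self.head
        for lvl in reversed(range(self.level)):
            while node.forward[lvl] is not None and node.forward[lvl].value < value:
                node = node.forward[lvl]
            update[lvl] = node
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        new = _Node(value, new_level)
        for lvl in range(new_level):
            new.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = new

    def __contains__(self, target):
        node = self.head
        for lvl in reversed(range(self.level)):
            # The key point: `node` carries over when we drop a level,
            # so each level's scan resumes where the previous left off.
            # Restarting from self.head here would still be correct,
            # but O(n) -- the bug described above.
            while node.forward[lvl] is not None and node.forward[lvl].value < target:
                node = node.forward[lvl]
        node = node.forward[0]
        return node is not None and node.value == target
```

Both versions pass every correctness property, which is why only a review (or a timing benchmark on large inputs) exposes the difference.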

Overall, types, contracts, and hypothesis catch quite a few errors between them. The only real complaint is that these tools should cooperate better. If I state the type of a function parameter, then both PyContracts and hypothesis should use that information; if I add a contract, then hypothesis should use it to generate random input more efficiently. What techniques could I add next to better test my code?