Property based testing — A practical approach in Python with Hypothesis and Pandas
Categories: Coding best practices
Tags: testing code quality
What Property based tests are
Property based tests are a type of test that checks properties of our functions that should always be true. They allow us to define a group of inputs following some rules and check the property on all of them in a single test.
Let’s see how they work
Imagine that we want to test this function:
The function we want to test (implementation is not important for this article)
The function deletes the duplicates in the key_column of the df DataFrame. When a key value is duplicated, the conflict is resolved by checking the priority_column: priority_values tells us which record we should keep.
Well, this function should have some kind of algorithm inside that will:
- find all the duplicates in the key_column
- check the value they have in the priority_column
- choose the one that has the highest priority value based on priority_values and drop the rest of them.
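The original listing is not reproduced here, so the following is only a minimal sketch of what such a function could look like. The name `remove_duplicates`, the temporary `_rank` column, and the convention that earlier entries in `priority_values` have higher priority are all assumptions made for illustration.

```python
import pandas as pd


def remove_duplicates(df, key_column, priority_column, priority_values):
    """Hypothetical sketch: keep one row per key, preferring higher priority.

    Assumption: values that appear earlier in priority_values win, and
    values missing from priority_values are treated as lowest priority.
    """
    rank = {value: position for position, value in enumerate(priority_values)}
    # Translate each priority value into a sortable rank (lower is better).
    ranked = df.assign(
        _rank=df[priority_column].map(lambda value: rank.get(value, len(rank)))
    )
    # A stable sort keeps the original order among rows with equal rank.
    ranked = ranked.sort_values("_rank", kind="mergesort")
    # After sorting, the first row of each key is the one we want to keep.
    deduplicated = ranked.drop_duplicates(subset=key_column, keep="first")
    return deduplicated.drop(columns="_rank")


records = pd.DataFrame(
    {"key": ["a", "a", "b"], "priority": ["low", "high", "low"]}
)
result = remove_duplicates(records, "key", "priority", ["high", "low"])
print(result)
```

Sorting by rank and then keeping the first row per key is just one way to express the three steps; a group-by based version would work as well.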
I am going to use Property Based Testing to check one property that the result must satisfy: it shouldn’t have duplicates in the key_column. Never. It does not matter what the inputs are.
Let’s build our test!
Previous work: imports and definitions
I need to import some stuff from the Hypothesis library:
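The exact import list from the original listing is not available; a plausible set for a test that builds random DataFrames, assuming the pandas extra of Hypothesis is installed, might be:

```python
# Core decorator and settings object from Hypothesis.
from hypothesis import given, settings

# General-purpose strategies (text, regex, integers, ...).
from hypothesis import strategies as st

# Pandas-specific strategies live in an optional extra module.
from hypothesis.extra.pandas import column, data_frames
```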
I will define some variables that I need:
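The concrete names and values below are hypothetical placeholders, chosen only so that the later steps have something to refer to:

```python
# Hypothetical column names for the DataFrame under test.
KEY_COLUMN = "key"
PRIORITY_COLUMN = "priority"

# Hypothetical priority order, from highest to lowest priority.
PRIORITY_VALUES = ["high", "medium", "low"]

# Hypothetical regular expressions describing valid cell values.
# A small key space makes duplicated keys likely in random data.
KEY_REGEX = r"key_[0-4]"
PRIORITY_REGEX = r"(high|medium|low)"
```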
The @given decorator
This decorator allows us to define random variables based on a set of rules that we write using the Hypothesis notation. It’s very readable, so I guess that you will understand easily what I am defining here:
Basically, I have expressed that I need a random DataFrame with 2 columns, both of string type. To define what the values should look like, I have used regular expressions.
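As a rough sketch of such a definition, assuming the hypothetical column names `key` and `priority` and made-up regular expressions, a strategy for a two-column string DataFrame could be written with `data_frames`, `column` and `from_regex`. The dtype is left unset here so that Hypothesis infers it from the string elements.

```python
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames

# Hypothetical names and patterns, not taken from the original listing.
KEY_COLUMN = "key"
PRIORITY_COLUMN = "priority"

random_dataframes = data_frames(
    columns=[
        # fullmatch=True makes the whole string match the pattern.
        column(KEY_COLUMN, elements=st.from_regex(r"key_[0-4]", fullmatch=True)),
        column(
            PRIORITY_COLUMN,
            elements=st.from_regex(r"(high|medium|low)", fullmatch=True),
        ),
    ]
)
```

The same expression can be passed directly to `@given`, or stored in a variable like this one and reused across several tests.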
Let’s see how it works by executing it directly in the Python console with the Hypothesis example() method:
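A small sketch of that call, again with hypothetical column names and patterns. Because the data is random, every run prints something different, so no output is shown. `example()` is meant for interactive exploration; when called from a script Hypothesis emits a warning, which is silenced below.

```python
import warnings

from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames

# Hypothetical two-column strategy, similar to the one described above.
random_dataframes = data_frames(
    columns=[
        column("key", elements=st.from_regex(r"key_[0-4]", fullmatch=True)),
        column("priority", elements=st.from_regex(r"(high|low)", fullmatch=True)),
    ]
)

with warnings.catch_warnings():
    # Outside an interactive console, example() warns that it should
    # only be used for exploration; ignore that warning here.
    warnings.simplefilter("ignore")
    first = random_dataframes.example()
    second = random_dataframes.example()

print(first)
print(second)
```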
Example of our random DataFrame built using Hypothesis
Let’s execute it again:
Another execution of the Hypothesis code
In the first execution, it has generated an 8-row DataFrame with random values based on my regex. In the second execution, a 2-row DataFrame whose two rows happen to be equal. As you can see, each execution generates a random value based on our rules, with varying length and different content.
The test with the @given decorator
This basic test runs our function, passing it the random DataFrame, and checks that the result has no duplicates in the key_column:
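A minimal, self-contained sketch of such a test follows. The function `remove_duplicates` is a hypothetical stand-in for the real implementation, the constants and regular expressions are made up, and `max_examples=200` is an assumption chosen to match the number of runs mentioned in this article.

```python
import pandas as pd
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames

KEY_COLUMN = "key"
PRIORITY_COLUMN = "priority"
PRIORITY_VALUES = ["high", "medium", "low"]  # hypothetical priority order


def remove_duplicates(df, key_column, priority_column, priority_values):
    """Hypothetical stand-in for the function under test."""
    rank = {value: position for position, value in enumerate(priority_values)}
    ranked = df.assign(
        _rank=df[priority_column].map(lambda value: rank.get(value, len(rank)))
    )
    ranked = ranked.sort_values("_rank", kind="mergesort")
    return ranked.drop_duplicates(subset=key_column).drop(columns="_rank")


@settings(max_examples=200, deadline=None)
@given(
    df=data_frames(
        columns=[
            column(KEY_COLUMN, elements=st.from_regex(r"key_[0-4]", fullmatch=True)),
            column(
                PRIORITY_COLUMN,
                elements=st.from_regex(r"(high|medium|low)", fullmatch=True),
            ),
        ]
    )
)
def test_result_has_no_duplicated_keys(df):
    result = remove_duplicates(df, KEY_COLUMN, PRIORITY_COLUMN, PRIORITY_VALUES)
    # The property: no key may appear more than once in the result.
    assert not result[KEY_COLUMN].duplicated().any()
```

A test runner such as pytest would collect this function like any other test; calling `test_result_has_no_duplicated_keys()` directly also runs all the generated examples.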
When we execute it, the test is launched 200 times, with a different random value in each execution. Strictly speaking, it is not always exactly 200 times; you can read the article on how many times Hypothesis will run your test to understand it better.
We can put a print statement in our code to see in the console how it works:
My test with some traces for the console
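A compact sketch of a traced test, under the same assumptions as before. To keep it short, the function under test is replaced by a trivial stand-in named `remove_duplicates_stub`, and the run counter is a hypothetical helper added only to number the traces.

```python
import pandas as pd
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.pandas import column, data_frames


def remove_duplicates_stub(df, key_column):
    """Stand-in for the function under test: keeps the first row per key."""
    return df.drop_duplicates(subset=key_column)


run_counter = {"runs": 0}  # hypothetical helper to number the executions


@settings(max_examples=200, deadline=None)
@given(
    df=data_frames(
        columns=[
            column("key", elements=st.from_regex(r"key_[0-4]", fullmatch=True)),
            column("priority", elements=st.from_regex(r"(high|low)", fullmatch=True)),
        ]
    )
)
def test_no_duplicated_keys_with_traces(df):
    run_counter["runs"] += 1
    # Trace the generated input so each execution is visible in the console.
    print(f"--- run {run_counter['runs']}: {len(df)} input rows ---")
    print(df)
    result = remove_duplicates_stub(df, "key")
    assert result["key"].is_unique
```

When running under pytest, the `-s` flag disables output capturing so the traces of passing tests appear in the console.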
Obviously I cannot show all the output here, but this last screenshot should be enough to get the idea.
Last traces in my console after launching my test
It is cool, isn’t it?
When to use them
These tests are a great option to ensure a basic (or not so basic) property. They cover a lot of edge cases because they execute a lot of instances with random values. Each time your CI/CD pipeline runs, it will generate a new set of random executions.
But they have some limitations. I think that property-based tests complement normal unit tests, but they don’t replace them.
Firstly, because in most situations you won’t be able to define a property that is sufficient to ensure that your code works. For example, we cannot easily define a property to ensure that our function is choosing the right row in each case using the priority_values, but we have been able to define a more basic property that we know must be satisfied.
Another limitation is that when a Property Based Test fails, it is not easy to understand what is happening, because you need to investigate which concrete edge case is failing. It is a very good idea to write a unit test with the concrete case that caused the failure.
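As an illustration, pinning a case as a regular unit test could look like the sketch below. The DataFrame is a made-up example (two identical rows for the same key), not a real failure reported by Hypothesis, and `remove_duplicates` is again a hypothetical stand-in for the real function.

```python
import pandas as pd


def remove_duplicates(df, key_column, priority_column, priority_values):
    """Hypothetical stand-in for the function under test."""
    rank = {value: position for position, value in enumerate(priority_values)}
    ranked = df.assign(
        _rank=df[priority_column].map(lambda value: rank.get(value, len(rank)))
    )
    ranked = ranked.sort_values("_rank", kind="mergesort")
    return ranked.drop_duplicates(subset=key_column).drop(columns="_rank")


def test_identical_rows_for_same_key_are_collapsed():
    # Concrete edge case written out by hand, so a failure is easy to read.
    df = pd.DataFrame(
        {"key": ["a", "a", "b"], "priority": ["low", "low", "high"]}
    )
    result = remove_duplicates(df, "key", "priority", ["high", "low"])
    assert result["key"].is_unique
    assert sorted(result["key"]) == ["a", "b"]
```

Hypothesis also provides an `@example` decorator that attaches explicit inputs to a property test, which is another way to make sure a known edge case is always exercised.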
Conclusion
I encourage you to use them in your next development. Think about the small feature you are building and about the properties it should satisfy. Design simple Property Based Tests to ensure them. Even if you think you are checking very simple things, Property Based Tests often surprise us.
If you don’t work with Python, find a library in your preferred language to develop Property Based Tests.