Debugging Python Code

Learn some of the most common debugging techniques in this tutorial by Fabrizio Romano, an author, computer scientist, Reiki master, and teacher.

In the life of a professional coder, debugging and troubleshooting take up a significant amount of time. Even if you work on the most beautiful codebase ever written by man, there will still be bugs in it. This article will take you through some of the most common debugging techniques; however, this list is not exhaustive.

Debugging with print

This is probably the easiest technique of all. It’s not very effective, it cannot be used everywhere, and it requires access to both the source code and a terminal that will run it (and therefore show the results of the print function calls).

However, in many situations, this is still a quick and useful way to debug. For example, if you are developing a Django website and what happens on a page is not what you would expect, you can fill the view with prints and keep an eye on the console while you reload the page.
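As a minimal sketch of what such print-based instrumentation might look like (the values and labels here are made up for illustration, not from the original):

```python
total = 0
for n in [3, 5, 7]:
    total += n
    # label each value, and flush immediately in case output is buffered
    print('DEBUG n =', n, 'total =', total, flush=True)
```

Passing flush=True is handy when the process writes to a pipe or log file, where buffering can otherwise delay or hide the output you are waiting for.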

Debugging with a custom function

Having a custom function in a snippet that you can quickly grab and paste into the code, and then use to debug, can be very useful. The important thing is to write it in a way that it won’t leave anything behind when you eventually remove the calls and its definition; in other words, it should be completely self-contained.

Another good reason for this requirement is that it will avoid potential name clashes with the rest of the code. Here’s an example of such a function—custom.py:

def debug(*msg, print_separator=True):
    print(*msg)
    if print_separator:
        print('-' * 40)

debug('Data is ...')
debug('Different', 'Strings', 'Are not a problem')
debug('After while loop', print_separator=False)

In this case, a keyword-only argument is used to be able to print a separator, which is a line of 40 dashes.

The function is very simple: you redirect whatever is in msg to a call to print and, if print_separator is True, you print a line separator. Running the code will show the following:

$ python custom.py 
Data is ...
----------------------------------------
Different Strings Are not a problem
----------------------------------------
After while loop

As you can see, there is no separator after the last line. This is just one easy way to augment a simple call to the print function. Now, see how you can calculate a time difference between calls, using one of Python’s tricky features to your advantage—custom_timestamp.py:

from time import sleep

def debug(*msg, timestamp=[None]):
    print(*msg)
    from time import time  # local import
    if timestamp[0] is None:
        timestamp[0] = time()  #1
    else:
        now = time()
        print(' Time elapsed: {:.3f}s'.format(
            now - timestamp[0]))
        timestamp[0] = now  #2

debug('Entering nasty piece of code...')
sleep(.3)
debug('First step done.')
sleep(.5)
debug('Second step done.')

This is a bit trickier, but still quite simple. First, time is imported from the time module inside the debug function. This way, you avoid having to add the import outside of the function, where you might forget to remove it later.

Take a look at how timestamp is defined. It’s a list, which is a mutable object. Default argument values are evaluated only once, when Python defines the function, so the same list is shared across calls and retains its contents between them. Therefore, if you put a timestamp in it after each call, you can keep track of time without having to use an external global variable.
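A tiny, standalone sketch (not part of the original example) demonstrates this behavior:

```python
def count_calls(calls=[0]):
    # the default list is created once, when the function is defined,
    # so it is shared across calls and keeps its state
    calls[0] += 1
    return calls[0]

first = count_calls()   # 1
second = count_calls()  # 2
```

This is exactly why mutable default arguments are usually considered a pitfall; here, the pitfall is deliberately used as a feature.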

After printing the message and importing time, you can then inspect the content of the only item in timestamp. If it is None, you have no previous reference, therefore you set the value to the current time (#1).

On the other hand, if you have a previous reference, you can calculate the difference and insert the current time again in timestamp (#2). Running this code shows this result:

$ python custom_timestamp.py 
Entering nasty piece of code...
First step done.
 Time elapsed: 0.300s
Second step done.
 Time elapsed: 0.501s
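If you only need to time whole functions rather than arbitrary points in the code, a decorator can be a cleaner alternative. The following is a hedged sketch, not from the original (the timed name is made up), using time.perf_counter, which is better suited to measuring short intervals than time.time:

```python
from functools import wraps
from time import perf_counter, sleep

def timed(func):
    # hypothetical decorator that reports how long each call took
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = func(*args, **kwargs)
        print('{} took {:.3f}s'.format(
            func.__name__, perf_counter() - start))
        return result
    return wrapper

@timed
def step():
    sleep(0.1)
    return 'done'

result = step()
```

Like the debug function, the decorator is self-contained, so it is easy to paste in and remove once you are finished.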

Inspecting a traceback

A traceback gives you information about what went wrong in your application. Reading it carefully is a great help. Here’s a very small example—traceback_simple.py:

d = {'some': 'key'}
key = 'some-other'
print(d[key])

You have a dict and you try to access a key that isn’t in it. You should remember that this will raise a KeyError exception. Now run the code:

$ python traceback_simple.py 
Traceback (most recent call last):
  File "traceback_simple.py", line 3, in <module>
    print(d[key])
KeyError: 'some-other'

You can see that you get all the information you need: the module name, the line that caused the error (both the number and the instruction), and the error itself. With this information, you can go back to the source code and try and understand what’s going wrong.
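You can also capture the same text programmatically with the standard library’s traceback module, which is useful when you want to log the traceback instead of letting the program crash. A minimal sketch (not from the original):

```python
import traceback

d = {'some': 'key'}
try:
    print(d['some-other'])
except KeyError:
    # format_exc() returns the same text the interpreter
    # would print for the current exception
    tb_text = traceback.format_exc()

# tb_text can now be written to a log file instead of stderr
print(tb_text)
```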

Now create a more interesting example that builds on this and exercises a feature that is only available in Python 3. Imagine that you’re validating a dict, working on mandatory fields; therefore you expect them to be there. If not, you need to raise a custom ValidatorError that you’ll trap further upstream in the process that runs the validator (which is not shown here, it could be anything, really). It should be something like this—traceback_validator.py:

class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError. """

d = {'some': 'key'}
mandatory_key = 'some-other'
try:
    print(d[mandatory_key])
except KeyError:
    raise ValidatorError(
        '`{}` not found in d.'.format(mandatory_key))

You need to define a custom exception that is raised when the mandatory key isn’t there. Note that its body consists of its documentation string, so you don’t need to add any other statements.

Simply put, you define a dummy dict and try to access it using mandatory_key. You trap the KeyError and raise ValidatorError when that happens. The point of doing this is that you may also want to raise ValidatorError in other circumstances, not necessarily as a consequence of a mandatory key being missing. This technique allows you to run the validation in a simple try/except that only cares about ValidatorError.
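The upstream process that runs the validator isn’t shown in the original, but it might look roughly like this sketch (the validate helper is hypothetical):

```python
class ValidatorError(Exception):
    """Raised when validation fails."""

# hypothetical helper that wraps the dict access shown above
def validate(d, mandatory_key):
    try:
        return d[mandatory_key]
    except KeyError:
        raise ValidatorError(
            '`{}` not found in d.'.format(mandatory_key))

# upstream, the runner only needs to care about ValidatorError,
# regardless of which check inside the validator failed
try:
    validate({'some': 'key'}, 'some-other')
except ValidatorError as err:
    outcome = 'Validation failed: {}'.format(err)
```

Because every validation failure funnels through one exception type, the runner stays simple even as the validator grows more checks.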

The thing is, in Python 2, this code will only display the last exception (ValidatorError), which means that you would lose the information about the KeyError that precedes it. In Python 3, this behavior has changed, and exceptions are now chained so that you have a much better information report when something happens. The code produces this result:

$ python traceback_validator.py 
Traceback (most recent call last):
  File "traceback_validator.py", line 7, in <module>
    print(d[mandatory_key])
KeyError: 'some-other'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "traceback_validator.py", line 10, in <module>
    '`{}` not found in d.'.format(mandatory_key))
__main__.ValidatorError: `some-other` not found in d.

This is brilliant because you can see the traceback of the exception that led you to raise ValidatorError, as well as the traceback for the ValidatorError itself.
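Python 3 also lets you control the chain explicitly: raise X from Y sets the new exception’s __cause__ attribute, and raise X from None suppresses the chained report entirely. A small sketch (not from the original):

```python
class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError."""

d = {'some': 'key'}
try:
    try:
        d['some-other']
    except KeyError as err:
        # `from err` records the original exception as __cause__;
        # `from None` would hide the chained traceback instead
        raise ValidatorError('`some-other` not found in d.') from err
except ValidatorError as verr:
    cause = verr.__cause__

# cause is the original KeyError that triggered the ValidatorError
print(type(cause).__name__)
```

With an explicit from err, the report reads "The above exception was the direct cause of the following exception" instead of "During handling of the above exception, another exception occurred".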

Using the Python debugger

Another very effective way of debugging Python is to use the Python debugger: pdb. If you are addicted to the IPython console, you should definitely check out the ipdb library. ipdb augments the standard pdb interface in the same way that IPython augments the standard Python console.

There are several different ways of using this debugger, but the most common one consists of simply setting a breakpoint and running the code. When Python reaches the breakpoint, execution is suspended and you get console access to that point so that you can inspect all the names and so on. You can also alter data on the fly to change the flow of the program.
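Since Python 3.7 (PEP 553), you can also set a breakpoint with the built-in breakpoint() function, which calls pdb by default and is controlled by the PYTHONBREAKPOINT environment variable; setting it to "0" turns every breakpoint() call into a no-op, so instrumented code can still run unattended. A small sketch (the check function is made up for illustration):

```python
import os

# with PYTHONBREAKPOINT set to "0", breakpoint() does nothing,
# so this script runs to completion without pausing
os.environ['PYTHONBREAKPOINT'] = '0'

def check(value):
    breakpoint()  # would normally drop you into pdb here
    return value * 2

result = check(21)
```

In normal use you would leave the variable unset and let breakpoint() pause execution; disabling it is convenient in CI or when you forget to remove a breakpoint.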

As a toy example, pretend you have a parser that is raising a KeyError because a key is missing in a dict. The dict is from a JSON payload that you can’t control, and you want, for the time being, to work around the problem, since you’re interested in what comes afterward. See how you could intercept this moment, inspect the data, fix it, and get to the bottom of it, with ipdb—ipdebugger.py:

# d comes from a JSON payload we don't control
d = {'first': 'v1', 'second': 'v2', 'fourth': 'v4'}
# keys also comes from a JSON payload we don't control
keys = ('first', 'second', 'third', 'fourth')

def do_something_with_value(value):
    print(value)

for key in keys:
    do_something_with_value(d[key])

print('Validation done.')

As you can see, this code will break when key gets the value third, which is missing in the dict. Remember, you’re pretending that both d and keys come dynamically from a JSON payload that you don’t control, so you need to inspect them in order to fix d and get past the for loop. If you run the code as it is, you’ll get the following:

$ python ipdebugger.py 
v1
v2
Traceback (most recent call last):
  File "ipdebugger.py", line 10, in <module>
    do_something_with_value(d[key])
KeyError: 'third'

So you see that the key is missing from the dict, but every time you run this code, you may get a different dict or keys tuple. Hence, this information doesn’t really help you. Now inject a call to ipdb—ipdebugger_ipdb.py:

# d comes from a JSON payload we don't control
d = {'first': 'v1', 'second': 'v2', 'fourth': 'v4'}
# keys also comes from a JSON payload we don't control
keys = ('first', 'second', 'third', 'fourth')

def do_something_with_value(value):
    print(value)

import ipdb
ipdb.set_trace()  # we place a breakpoint here

for key in keys:
    do_something_with_value(d[key])

print('Validation done.')

If you now run this code, things get interesting (note that your output may vary a little):

$ python ipdebugger_ipdb.py
> /home/fab/srv/l.p/ch11/ipdebugger_ipdb.py(12)<module>()
     11 
---> 12 for key in keys:  # this is where the breakpoint comes
     13     do_something_with_value(d[key])

ipdb> keys  # let's inspect the keys tuple
('first', 'second', 'third', 'fourth')
ipdb> !d.keys()  # now the keys of d
dict_keys(['first', 'fourth', 'second'])  # we miss 'third'
ipdb> !d['third'] = 'something dark side...'  # let's put it in
ipdb> c  # ... and continue
v1
v2
something dark side...
v4
Validation done.

First, when you reach a breakpoint, you’re served a console that tells you where you are (the Python module) and the line that will be executed next.

You can, at this point, perform a bunch of exploratory actions, such as inspecting the code before and after the next line, printing a stacktrace, and interacting with the objects. In this case, first, inspect the keys tuple. After this, you should inspect the keys of d.

Have you noticed the exclamation mark prepended to d? It’s needed because d is a command in the pdb interface that moves the frame (d)own. In order to tell pdb that you’re not issuing a (d)own command, you put “!” in front of d.

After seeing the keys of d, you can see that third is missing, so you can put it in yourself. Finally, now that all the keys are in, type c, which means (c)ontinue.

pdb also gives you the ability to proceed with your code one line at a time using (n)ext, to (s)tep into a function for deeper analysis, and to handle breakpoints with (b)reak. For a complete list of commands, type (h)elp in the console.

You can see from the output that you can finally get to the end of the validation.

In the official Python documentation, there is a section dedicated to debugging and profiling, where you can read up about the bdb debugger framework, and about modules such as faulthandler, timeit, trace, tracemalloc, and, of course, pdb. Head to the standard library section in the documentation and you’ll find all this information very easily.