Profiling Python Code

1 minute read

One of my projects at work recently has been transitioning one of our pipelines to use a new Python class. Upon testing it out, I discovered that the new class is much slower than the old one. But the reason why wasn’t immediately obvious.

So, I looked into profiling the code. A profile is a set of statistics that describes how often and for how long various parts of the program executed.

I first tried cProfile, which returns stats on:

  • The total number of function calls
  • The total number of primitive (not induced by recursion) function calls
  • The number of calls per function
  • The time spent per function

cProfile can be called in-line on an individual function using simple syntax:

import cProfile

cProfile.run('2 + 2')
# or
cProfile.run('func_to_profile()')

Or in the command line for an entire script:

python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)

The results can be printed, or formatted into reports via the pstats module. There are lots of other fancy options if you choose to use cProfile as a class, or use it with other packages to create nifty tree graph visualizations.

In my case it looked like the slowdown was largely because pandas was running many built-in methods. It was tricky to diagnose exactly where this was happening, though, so I turned to the line_profiler package instead, which gives execution time information on line-by-line.

from line_profiler import LineProfiler

lp = LineProfiler()
lp_wrapper = lp(func_to_profile)
lp_wrapper(func_args)
lp.print_stats()

This was useful, and I was able to fix some obvious slowdowns using this method.

Tags:

Updated: