# Easy Proof Omitted

A sink for random thoughts & experiments

0%

Python has been always the ultimate glue language in various fields due to it's intuitive syntax and a huge community that support different high-level tools and libraries. But it's no secret that python is extremely slow compared to compiled languages like C.

Luckily, Python can easily interface with low-level languages to gain the best of both worlds. This post will show how to find and elevate some of the performance bottlenecks in your code.

In mathematics, the Fibonacci numbers, commonly denoted $$F_n$$, form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is,

\begin{align*} &\quad F_0=0, F_1= 1,\\ &F_n=F_{n-1} + F_{n-2}, &&\textbf{n} > 1 \end{align*}

The beginning of the sequence is thus: $$0,\;1,\;1,\;2,\;3,\;5,\;8,\;13,\;21,\;34,\;55,\;89,\;144,\; \ldots$$

Fibonacci numberWikipedia

For the demonstration purposes , A grossly inefficient implementation of Fibonacci sequence will be used as a toy example:

# Identify bottlenecks

Python has a couple of built-in profilers that can be used to benchmark the performance of your code: profile and cProfile. profile is written in pure python which in return adds a significant overhead and wastes more time than cProfile, which is a C extension with minimal overhead.

To run cProfile on your script, simply type:

To calculate the 40th fibonacci number in the sequence, the script spent 76.602 seconds in f(x) strongly hinting that this causes a serious bottleneck. 1

What can we do about it?

# Cython

Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.

In a nutshell, Cython will transpile our python code to C then compile it into a shared object.

First, Write the function in a new file fib.pyx providing as much type hints as possible

Create a new file setup.py with the following:

Finally run setup.py with the following arguments:

The previous step generated the required shared object and the necessary interfaces that integrates seamlessly with python using a regular import statement.

re-run cProfile

Went down from 76.602 seconds to 4.156 almost 18x speed-up.

Although this approach is much easier, but the generated C code is part of large generic boilerplate and results in a noticeable overhead. Therefore, it's highly recommended to provide as much type hints as possible before compilation process.

# ctypes

ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

First, Rewrite the performance critical functions in C or C++.

extern "C" tells the compiler to avoid mangling the function name. which will work with simple function prototypes, but one-to-one mapping between C++ classes and python classes is not possible without an interface library like pybind11.

Then compile it into a shared object:

Lastly we can call any function from the shared object inside python:

Running cProfile again yields a few extra system calls that loads the shared object:

We went from 76.602 seconds to 0.612 almost 125x speed-up!

1. obviously since it's the only function.↩︎