Unlocking Rust to Build a Better Python Lib

Over the last five years, the Rust programming language has exploded in popularity. It has been touted as the language that “took all the bad parts of its predecessors and got rid of them, but kept some of the good parts”. It focuses on memory safety and speed while also aiming to be a readable language. Rust is one of the most admired programming languages (it topped the “admired” ranking in the 2023 Stack Overflow Developer Survey), and many developers want to use it in their day jobs.


Source: https://survey.stackoverflow.co/2023/#technology-admired-and-desired

The proof is in the proverbial pudding: Rust is fast and safe, amongst other things, and it has found its way into the Linux kernel and other core computing systems. And of course, as we will discuss further in this article, Rust has also found its way into many of the leading Python libraries, including Bytewax. But before we go there, let’s cover some background on how languages can interoperate and why Python, in particular, is exceptionally friendly to other languages.

How do programming languages talk to one another?

The most common way for programming languages to interface is through a foreign function interface (FFI). The host runtime calls the foreign function through a bridge or binding layer that handles the communication between the two languages. This typically involves linking against shared libraries or using language-specific interoperability features.

Python loves talking to other languages

Python is a very flexible language in terms of interfacing with existing libraries in other languages. This is a feature, not a bug :).

Guido van Rossum, the creator of Python, documented many of his aspirations for the language as he developed it (https://www.python.org/doc/essays/foreword/): he wanted it to be readable, like ABC, an earlier language he had worked on, and to appeal to Unix/C hackers. In fact, Python was written in C so it could interface with C libraries, without carrying over C’s less desirable traits. It follows that, over the years, a lot of work went into letting Python call legacy C libraries and letting C code manipulate Python objects. Over many iterations, different tools and capabilities were developed to boost Python’s performance by leveraging C/C++. The best known is probably Cython, which was created to make writing C extensions for Python as easy as writing Python code and to let developers add static typing to Python code to improve performance.
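The standard library’s ctypes module is one of the simplest of these tools: it loads a shared C library at runtime and calls its functions directly, with no compiled glue code. The sketch below assumes a POSIX system (Linux or macOS) and calls the C standard library’s strlen; it only illustrates FFI from Python and is not how Bytewax itself binds to Rust.

```python
import ctypes
import ctypes.util

# Locate the C standard library (e.g. "libc.so.6" on glibc Linux). If it
# cannot be found, find_library returns None and CDLL(None) on POSIX opens the
# running process itself, whose already-loaded symbols include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call libc's strlen through the FFI and return a Python int."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"bytewax"))  # 7
```

Declaring argtypes and restype is the “binding layer” in miniature: without it, ctypes has to guess how to convert values between the two languages.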

Python has attracted users with the simplicity and readability of the language, as well as the power that came from using lower-level languages like C and C++. The scientific community in particular was drawn to Python because it could interface with existing libraries written in C, and because it was open source and much more generally applicable than the specialized tools previously available. The result has been a flywheel: Python became the de facto standard for scientific computing, and the ability to leverage other programming languages, for legacy or performance reasons, continues to drive adoption and to increase the number of interfaces to other languages.

Some examples:

Facebook’s Prophet - Uses Stan (written in C++) to provide fast forecast model fitting and prediction
SciPy - Uses C and Fortran libraries for scientific computing
NumPy - Uses C and Fortran libraries for scientific computing
Pandas - Uses Cython extensions to include C data types and functions
Matplotlib - C++ backend
PyTorch - C++ core
TensorFlow - C++ backend
Scikit-learn - Cython and C extensions, as well as C libraries
Psycopg2 - Wraps libpq, the C PostgreSQL client library

This list could easily get unwieldy. The point is that Python was built to reach into other languages, and the reason there is a Python client for almost anything is that leveraging other languages is so much easier from Python than from most other languages.

Why Rust over C++ or C?

If there is already so much prior art on leveraging C++ and C from Python, why would we use Rust from Python?

Trendiness and Excitement

Rust is new and exciting. This alone can be an enticing reason to use a new language. If everyone is doing it, there must be something in it. The wisdom of the crowds? The latest hype train.

Performance

Rust is performant, more performant than many languages, but it might not be as performant as C++, so this might not be the deciding reason. That being said, it is nearly as performant as C++ in most instances.

Developer Quality of Life

The tradeoff a developer makes for the performance of C++ is its difficult syntax and manual memory allocation and cleanup, which easily lead to headaches during development and in production. Rust has made design decisions that remove many of the headaches associated with C/C++ and improve developer quality of life. Many design decisions contribute to this; some of the most often mentioned are:

The Rust compiler and its type checker catch many potentially dangerous and annoying errors early, thanks to:
- Memory safety without garbage collection:
  - Ownership: each value has a single owner at a time, and the value is freed when its owner goes out of scope. This prevents common memory errors like dangling pointers, double frees and data races, and it is enforced at compile time.
  - Borrow checker: the borrow checker ensures that references don’t outlive the data they refer to. Mutable and immutable references must follow the borrowing rules: at any moment you can have either one mutable reference or any number of immutable references to a value.
- Strong static typing
- Concurrency safety

Why would Rust and Python live well together?

We have already covered some of the background of how we ended up with modern-day Python. Python is loved by many: it has long been one of the fastest-growing languages, and it now sits at the top of GitHub’s rankings of the most widely used languages.

Source: https://github.blog/news-insights/research/the-state-of-open-source-and-ai/

Personally, I was attracted to Python because it is fast to develop with, flexible and easy to read. But those same qualities are why it is considered slow, dangerous and hard to maintain. For this reason, Python has historically leaned on other languages, notably C and C++, for performance-critical tasks. As we saw above, Rust removes some of the difficulties of C and C++, which makes it appealing both to Python developers who want a speed boost but are wary of C or C++, and to library authors who want to stop fighting with C memory allocation.

Comparing Python and Rust Practically

Let’s look at a somewhat contrived, but real-ish, scenario that shows the differences between Rust and Python. Python’s IO performance is already relatively optimized, so the best place to see where Rust gives us a boost is computation. This is especially relevant in the scientific Python community, where heavy computation can lead to drastic differences in performance.

For our example, let’s compute Fibonacci numbers recursively. Although the number we compute is arbitrary, the same shape of work shows up in, for example, a function that recurses through a graph or network to compute some value.

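Our program takes a number n and computes the nth Fibonacci number, defined by F(n) = F(n-1) + F(n-2) with F(0) = 0 and F(1) = 1. The naive recursion keeps splitting each call in two until it reaches those base cases, then adds the results back up. For example:

F(4) = F(3) + F(2) = (F(2) + F(1)) + (F(1) + F(0)) = ((1 + 0) + 1) + (1 + 0) = 3

Because the same subproblems are recomputed over and over, the amount of work grows exponentially with n, which makes it a good CPU-bound benchmark.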
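To get a feel for how much work that is, here is a small plain-Python sketch that counts how many times the naive recursion calls itself. The helper call_count is our own illustration (not part of the benchmark); it is memoized with functools.lru_cache so that counting is instant even though the recursion it describes is not.

```python
from functools import lru_cache


def fib(n: int) -> int:
    """Naive recursive Fibonacci, the same algorithm as the benchmark."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def call_count(n: int) -> int:
    """How many fib() invocations the naive recursion makes for input n."""
    if n <= 1:
        return 1  # a base case is a single call
    # one call for n itself, plus everything its two sub-calls do
    return 1 + call_count(n - 1) + call_count(n - 2)


if __name__ == "__main__":
    for n in (10, 20, 30, 35):
        print(n, call_count(n))
```

The count works out to 2 * F(n + 1) - 1, so fib(35) makes 29,860,703 function calls, each of which is comparatively expensive in CPython.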

Python Fibonacci

import sys

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def main():
    n = int(sys.argv[1])
    result = fib(n)
    print(f"fib({n}) = {result}")

if __name__ == "__main__":
    main()

Python Performance

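Running this with /usr/bin/time and an input of 35 (the -l flag is the macOS/BSD spelling; GNU time on Linux uses -v for similar statistics):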

$ /usr/bin/time -l python fib.py 35
fib(35) = 9227465
        1.13 real     	1.11 user     	0.01 sys
            11669504  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                4301  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  39  involuntary context switches
         26052090528  instructions retired
          4248837208  cycles elapsed
        7300160  peak memory footprint
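The real figure above also includes starting the Python interpreter. If you want to time only the function call, a minimal sketch using the standard library’s time.perf_counter looks like this; time_fib is a hypothetical helper name, and absolute numbers will vary by machine and Python version.

```python
import time


def fib(n: int) -> int:
    """Naive recursive Fibonacci, identical to fib.py above."""
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


def time_fib(n: int) -> tuple[int, float]:
    """Return fib(n) and the wall-clock seconds spent in that call alone."""
    start = time.perf_counter()
    result = fib(n)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    result, elapsed = time_fib(30)
    print(f"fib(30) = {result} in {elapsed:.3f}s")
```

perf_counter is a monotonic, high-resolution clock, so it is the usual choice for measuring short intervals like this rather than time.time.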

Rust Fibonacci

use std::env;

fn fib(n: u64) -> u64 {
    if n <= 1 {
        n
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n: u64 = args[1].parse().expect("Please provide a valid integer");
    let result = fib(n);
    println!("fib({}) = {}", n, result);
}

Rust Performance

First, we need to compile it:

rustc -O fib.rs -o fib

And then run it with the same value as our Python script.

$ /usr/bin/time -l ./fib 35
fib(35) = 9227465
        0.08 real     	0.04 user     	0.00 sys
             1310720  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                 178  page reclaims
                  13  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  24  involuntary context switches
           269957255  instructions retired
           103605348  cycles elapsed
              918080  peak memory footprint

Comparing both the code and the output of the timed runs shows some of the reasons developers have started writing parts of their code in Rust. For this example, the Rust implementation was about 14 times faster (0.08 s versus 1.13 s of wall-clock time) and had roughly 1/8th of the peak memory footprint (about 0.9 MB versus 7.3 MB).
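One difference the two programs hide is the integer type. Python integers have arbitrary precision, while the Rust version uses u64, whose maximum is 2^64 - 1. The sketch below uses an iterative Fibonacci (a swapped-in technique, not the recursive version benchmarked above) to find the largest n whose result still fits in a u64.

```python
U64_MAX = 2**64 - 1  # largest value a Rust u64 can hold


def fib_iter(n: int) -> int:
    """Iterative Fibonacci using Python's arbitrary-precision ints."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def largest_n_fitting_u64() -> int:
    """Largest n such that fib_iter(n) still fits in a u64."""
    n = 0
    while fib_iter(n + 1) <= U64_MAX:
        n += 1
    return n


if __name__ == "__main__":
    n = largest_n_fitting_u64()
    print(n, fib_iter(n))  # 93 12200160415121876738
```

Past n = 93 the u64 version cannot return a correct answer: a build with optimizations (rustc -O) has overflow checks off by default and wraps silently, while a debug build panics. Python simply keeps growing the integer, which is part of why its arithmetic is slower.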

So how do we take advantage of our Rust Fibonacci function from Python? Enter PyO3. For calling Rust from Python, the ingeniously named PyO3 is the leading project, with extensive capabilities and fairly mature tooling for compiling the code into a Python library.

PyO3 works in two directions: you can expose a simple Rust function as a Python function, and you can also run a Python interpreter from Rust. Cool, right? Here is a simple example of how you would call a Rust function from Python.

First, the Rust side, which exposes our Fibonacci function to Python:

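use pyo3::prelude::*;

/// Compute the nth Fibonacci number; exposed to Python as `fibonacci_pyo3.fib`.
#[pyfunction]
fn fib(n: u64) -> PyResult<u64> {
    Ok(fib_recursive(n))
}

/// Helper function to calculate the Fibonacci number recursively (plain Rust, no Python types).
fn fib_recursive(n: u64) -> u64 {
    if n <= 1 {
        n
    } else {
        fib_recursive(n - 1) + fib_recursive(n - 2)
    }
}

/// Python module for fibonacci. The function name must match the library name in Cargo.toml.
/// This uses the `Bound` module API from PyO3 0.21+; older releases took `(_py: Python, m: &PyModule)`.
#[pymodule]
fn fibonacci_pyo3(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(fib, m)?)?;
    Ok(())
}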

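And correspondingly, the code to use the fibonacci_pyo3 module from Python, once the extension has been built and installed into the active environment (for example with maturin develop):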

import sys
import fibonacci_pyo3

def main():
    n = int(sys.argv[1])
    result = fibonacci_pyo3.fib(n)
    print(f"fib({n}) = {result}")

if __name__ == "__main__":
    main()

This is a fairly simple invocation of a Rust function from Python as a Python module, but PyO3 can do a lot more.

Both directions, calling Rust from Python and running Python from Rust, are how we combine Rust with Python at Bytewax.

Building Bytewax

Bytewax is a stateful stream processor that we built. The idea for the project came from our experience with machine learning, where you often need to process streams of data into features that the model can use at inference time. Many of the transformations that produce those features involve stateful operators: a count of users in a window of time, the running average value of a sensor, normalized clicks over a duration, and so on.
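To make “stateful” concrete: the operator has to remember something between events, separately for each key. Here is a plain-Python sketch of a per-sensor running average. It is not the Bytewax API; the (key, value) event shape and the RunningMean class are assumptions for illustration. A stream processor keeps this kind of state for you and also has to persist and recover it.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Per-key state: how many readings we have seen and their total."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        """Fold in one reading and return the new mean."""
        self.count += 1
        self.total += value
        return self.total / self.count


def running_average(events):
    """Yield (key, current_mean) for each (key, value) event, keeping state per key."""
    state = {}  # key -> RunningMean
    for key, value in events:
        mean = state.setdefault(key, RunningMean()).update(value)
        yield key, mean


if __name__ == "__main__":
    readings = [("sensor_a", 10.0), ("sensor_b", 4.0), ("sensor_a", 20.0)]
    for key, mean in running_average(readings):
        print(key, mean)
```

Each output depends on every earlier event for the same key, which is exactly what makes stateful operators harder to scale and recover than stateless maps and filters.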

In 2022, we were brainstorming how to solve this without directing people to Flink or Spark, which we believed offered a subpar developer experience for data science and machine learning. We landed on the idea of using a dataflow processing engine called Timely Dataflow to provide a different experience. Timely Dataflow was written in Rust, had a unique way of distributing work that lent itself to a nice developer experience, and was performant. Around this time, PyO3 was also gaining popularity, and we thought about how we could combine the two technologies to provide a Python-native experience built on Timely Dataflow.

We started by exposing the high-level Timely Dataflow Rust primitives in Python, while keeping most of the code we wrote to make the project production ready in Rust. This was a great way to bring a Python-native stream processor to the masses very quickly. We also thought this was how we could offer a more performant version of Bytewax, by keeping many of the input and output connectors, as well as the operators, in Rust.

Most of our users wanted to customize operations and connectors, and they valued this more than raw performance. We had already made some performance concessions around pickling objects and contending with the GIL. So we moved the API for inputs and outputs (connectors) and the core building blocks of operators into the Python layer, so that users could build more complex dataflows. Below is a sketch of how the Bytewax layers work together.


We have had great success using PyO3 to interface with Rust code, and I would highly recommend checking out the library and trying Rust yourself. That being said, there are still many Python gotchas to be aware of when developing a Python library with Rust internals:

You still have to deal with the GIL! You get more fine-grained control over the GIL, but you still need to manage it without breaking anything. There are some performance wins from being able to control when you release and reacquire the GIL.
Python types != Rust types. Managing the interface requires handling types appropriately.
Handling stack traces. Hunting down errors and debugging user code is one of the most important parts of the developer experience. Python exceptions can get swallowed up and you need to make sure you are passing the right exception and the reason for the exception back to the user.
Distributing the software. Since the code is compiled, you need to build it for every combination of operating system, architecture and Python version you support. A user cannot easily build from source without Rust and the Maturin build tool. I was once leading a workshop in a Google Colab notebook when they changed the distro overnight, and nothing worked for anyone in the session!
Modifying the underlying Rust code. In a similar vein to the point above, you can't easily modify the Rust code without building and releasing it yourself. The user will also have to know how to write Rust as well as work with PyO3, which is a greater barrier to contribution than plain Python code.
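On the stack-trace point: whatever language the engine is written in, the goal is that users see their own exception and where it came from, not a generic framework error. In pure Python the equivalent tool is exception chaining (raise ... from ...), which stores the original error as __cause__ so it is printed with its traceback. This sketch uses a hypothetical StepError and run_step to show the idea; it is not Bytewax’s actual error handling.

```python
class StepError(RuntimeError):
    """Hypothetical framework error that names the failing user step."""


def run_step(step_name, func, item):
    """Call a user-supplied function, re-raising failures with context attached."""
    try:
        return func(item)
    except Exception as err:
        # `from err` keeps the user's original exception (and its traceback)
        # as __cause__, so it is reported instead of being swallowed.
        raise StepError(f"step {step_name!r} failed on item {item!r}") from err


if __name__ == "__main__":
    print(run_step("double", lambda x: x * 2, 21))  # 42
    try:
        run_step("invert", lambda x: 1 / x, 0)
    except StepError as err:
        print(err, "<- caused by", repr(err.__cause__))
```

When the failure crosses a Rust boundary instead, the same rule applies: convert the error back into a Python exception that carries the original cause rather than replacing it.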

That's all for now!

Zander Matheson

The Rise of The Streaming Data Lakehouse

Other posts you may find interesting

Shift-Left Architecture with Bytewax for Real-Time Intelligence

Announcing the Bytewax Connector for S2!

🐝 +🔍 = ❤️ Bytewax and Lenses.io Integration Announcement