By Eric Burden | April 7, 2021

I’ve recently (since the beginning of 2021) been trying my hand learning and using Rust, and so far it has been a really good experience. Rust has a lot to recommend it, including top-notch tooling, inherent memory safety, and blazing speed. That last part comes from the fact that Rust is a compiled, systems programming language and was the inspiration for picking up Rust in the first place.

You see, I absolutely love R. In my opinion (admittedly biased), there is no
better set of tools available in any programming language for working directly
with data sets. And, if some other language does have a tool you need, R even
has support for leveraging those tools as well.^{1}.
The R ecosystem contains some truly excellent packages (especially
rmarkdown, knitr,
and shiny) for making the results of your analysis
available and approachable to others. Where R excels is in *ad hoc* data manipulation,
analysis, and reporting. Where R struggles (like other garbage-collected,
interpreted languages) is with regards to performance, especially in
high-throughput, large-scale settings. A major offset for these performance
penalties comes from leveraging functionality written in a lower-level language
like C, C++^{2}, Java^{3}, FORTRAN, etc.
Recently, Rust has begun to make headway in this space with the advent of the
extendr project.

Rust is a very attractive option, especially for developers and analysts who may
not be fluent in a systems programming language already. Rust and the Rust compiler
are designed such that memory-safety is enforced by default. From a practical
perspective, this means you are **much** less likely to get random, hard-to-diagnose
crashes in Rust code (as compared to clumsy C/C++) and are therefore much **more**
likely to be able to successfully leverage Rust, even if you are not an experienced
Rust developer. Personally, I also find the tooling around Rust to be much more
approachable that Java (Interminably nested directories and complicated build
files, anyone? No, thank you.) With `extendr`

, we now have a straightforward
path for writing Rust code and accessing it in our R packages. In order to explore
this strategy, I’ve written an R package I’m calling
rustbind, which is intended to be
something of a showcase and sample implementation for calling Rust functions
from R code. Let’s walk through just how that works.

## Adventures in Parallel Processing

To demonstrate the sorts of benefits (particularly speed gains) that can be found
by falling back to Rust code from R, we will tackle a problem that is difficult
to handle natively in R: parallelism. To do so, I’ve set up a somewhat artificial
benchmark that, while overly simplistic, can be representative of real problems
that benefit for parallel processing. That problem is calculating square roots
using the Babylonian Method.
The naive approach involves (for a given number `\(N\)`

):

- Guessing an initial value
`\(x_0\)`

. - Improving the guess by applying the formula
`\(x_1 = \frac{(x_0 + N/x_0)}{2}\)`

. This new`\(x_1\)`

is*closer*to`\(\sqrt{N}\)`

. - Iterate on step 2 until the difference between
`\(x_{n+1}\)`

and`\(x_n\)`

is small enough to satisfy you that your estimate is close enough.

Calculating square roots is deceptively computationally expensive (not as expensive
as cryptographic hashing, of course), especially if your margin of error is small.
Now, for a single value, the time taken to calculate a square root using this
algorithm is essentially negligible. However, if you need to perform this
operation on one **million** values, then it can become time consuming. In general,
in R, we rely on vectorization to speed up these kinds of calculations, but there
are cases in which this strategy is difficult or impossible to implement (processing
a data set where the calculation for one record depends on a value in a different
record, say). Implementing this algorithm in R and Rust, respectively, looks like
this:

### Naive Square Root Algorithm (R)

```
# R Code
naive_sqrt <- function(n) {
if (length(n) > 1) { stop("Vectors longer than one not supported") }
current_guess <- n
adjustment <- 1
error <- 0.000001
while (current_guess - adjustment > error) {
current_guess <- (current_guess + adjustment) / 2
adjustment <- n / current_guess
}
current_guess
}
```

Since this algorithm doesn’t support vectorized operations, let’s add a friendly error message.

### Naive Square Root Algorithm (Rust)

```
// Rust Code
fn naive_sqrt(n: &f64) -> f64 {
let mut current_guess = *n;
let mut adjustment = 1.0;
let error = 0.000001;
while current_guess - adjustment > error {
current_guess = (current_guess + adjustment) / 2.0;
adjustment = n / current_guess;
}
current_guess
}
```

So, you can see, these two algorithms are functionally (and almost lexically) identical.

### Parallelizing in R

For comparison, we will use two versions of the R function that can operate on
a vector of doubles and return their square roots. This first one is a simple
wrapper around `sapply`

:

```
# R Code
sapply_naive_sqrt <- function(n) {
sapply(n, naive_sqrt)
}
```

The parallel version is an almost equally simple wrapper around `future_sapply`

from the `future.apply`

package:

```
# R Code
future_apply_naive_sqrt <- function(n) {
future.apply::future_sapply(n, naive_sqrt)
}
```

Astute readers (and experienced users of `future.apply`

) will note that we’re not
tweaking the settings here, which could probably improve performance. If
you’re an R programmer, this is pretty straightforward code. Don’t worry, it gets
worse…

### Parallelizing in Rust

Since we’re ostensibly interested in Rust for multithreading (for this example, anyway), let’s skip over implementing the sequential operation in Rust and go straight for threads:

```
// Rust Code
fn multithreaded_naive_sqrt(floats: &[f64]) -> Vec<f64> {
// Calculate the number of threads we want to use, giving us the size of the
// chunks to break our input 'vector' into.
let thread_count = match floats.len() {
0 => return Vec::new(),
1..=1_000 => 1,
1_001..=10_000_000 => floats.len() / 1_000,
_ => 10_000,
};
let mut threads = Vec::with_capacity(thread_count);
let chunk_size = (floats.len() / thread_count) + 1;
// For each chunk of numbers (Rust calls them floats, R calls them doubles)...
for chunk in floats.chunks(chunk_size) {
// Pass that chunk to a thread, iterate over the items in the chunk,
// calculate the square root of each, return the result.
let chunk = chunk.to_vec();
let thread = std::thread::spawn(move || {
let sqrts: Vec<_> = chunk.iter().map(naive_sqrt).collect();
sqrts
});
threads.push(thread); // Keep our thread handles in a list
}
// Gather up all the results from all the threads, take the results from
// each thread, put them all in the output variable
let mut output: Vec<f64> = Vec::with_capacity(floats.len());
for thread in threads {
match thread.join() {
Ok(squared_floats) => output.extend(&squared_floats),
Err(e) => std::panic::resume_unwind(e),
}
}
output
}
```

That’s quite a bit more code, but we are implementing parallel processing using
only the standard library, so we can feel pretty good about it. Other than the
thread spawning, if you’re passingly familiar with Rust, you shouldn’t have any
trouble parsing this function. Now, as with most hard problems, it’s probably
better to let someone else solve it for you. There is a Rust crate,
`rayon`

, that provides a number of
functions that allow iterating over a vector in Rust in parallel fashion, with
much less work involved.

```
// Rust Code
fn rayon_naive_sqrt(floats: &[f64]) -> Vec<f64> {
floats.par_iter().map(naive_sqrt).collect()
}
```

That `par_iter()`

function provides a parallelized iterator, and the rest of it
should look pretty familiar. Clearly, this is just as easy to write as the
relevant R operation.

## Speed Test

Equipped with our algorithms and functions, we can compare performance. I am
skipping over the implementation details of actually getting to those Rust
functions from the R interface, that’s pretty well covered in the
README of my `rustbind`

GitHub
repository (`extendr`

makes it relatively easy). Suffice it to say, I am
calling all the relevant functions from the R interface and behchmarking
using the `rbenchmark`

package. The following code creates a sample data set of one million random
numbers, passes that dataset to each of the four named functions 100 times, and
calculates the average run time for each.

```
# R Code
test_data <- runif(1000000) * 1000000
rbenchmark::benchmark(
r_sapply = rustbind::sapply_naive_sqrt(test_data),
r_future_sapply = rustbind::future_apply_naive_sqrt(test_data),
rust_multithreaded = rustbind::multithreaded_naive_sqrt(test_data),
rust_rayon = rustbind::rayon_naive_sqrt(test_data)
)
```

So, what’s the difference? Pretty significant…

```
test replications elapsed relative user.self
2 r_future_sapply 100 402.377 366.464 400.041
1 r_sapply 100 326.108 297.002 324.171
3 rust_multithreaded 100 2.975 2.709 10.329
4 rust_rayon 100 1.098 1.000 10.223
```

There are other metrics we could look at here, but let’s focus on that ‘relative’
column. Our baseline is ‘rust_rayon’ with a value of ‘1.000’. ‘rust_multithreaded’ was
~2.7 times *slower*, which makes sense, there were definitely some improvements that
could be made to my hand-coded attempt at multithreading. Comparing to the R functions,
though, is where it gets really interesting. The sequential R operation ‘r_sapply’
took almost **300** times as long to run as ‘rust_rayon’, and the parallel operation
in R using `future.apply`

took ~**366** times as long to run. This makes the
parallel operation in R *slower* than the sequential operation. A bit of Googling
around showed me that this was not an uncommon finding, and while there are certainly
settings that can be tweaked to improve the `future.apply`

performance, doing so
seems *fiddly* and likely to incur unanticipated costs. Speaking of incurring costs,
consider how much more responsive a `plumbr`

API backed by expensive operations
in Rust could be… That translates to the potential for using smaller instances
in cloud environments, which can mean substantial cost savings at scale.

## Wrap-Up

So, as you can see, Rust runs *significantly* faster that R. This shouldn’t be at
all surprising. What might be surprising is (1) how much more approachable Rust is
as a systems language for implementing R functionality than C/C++/Java and (2) how
much help `extendr`

gives you in calling Rust functions from R packages. If you’re
looking for ways to speed up your R code, and you’ve been considering C/C++, I
encourage you to give Rust a second look. It could be well worth your time.