Auditing popular Rust crates: how a one-line unsafe has nearly ruined everything

Sergey "Shnatsel" Davidoff on 2018-07-19

Following the actix-web incident (which is fixed now, at least mostly) I decided to poke other popular Rust libraries and see what comes of it.

The good news is I’ve poked at 6 popular crates now, and I’ve got not a single actually exploitable vulnerability. I am impressed. When I poked popular C libraries a few years ago it quickly ended in tears. The bad news is I’ve found one instance that was not a security vulnerability by sheer luck, plus a whole slew of denial-of-service bugs. And I can’t fix all of them by myself. Read on to find out how I did it, and how you can help!

My workflow was roughly like this:

  1. See if the crate has been fuzzed yet to identify low-hanging fruit.
  2. If it has been fuzzed, check sanity of fuzzing harness.
  3. If something is amiss, fuzz the crate.
  4. In case fuzzing turns up no bugs, eyeball the unsafes and try to check them for memory errors.
  5. If no horrific memory errors turn up, try to replace whatever’s under unsafe with safe code without sacrificing performance.

Turns out Rust community is awesome and not only has excellent integration for all three practical fuzzers along with a quick start guide for each, but also a huge collection of fuzz targets that covers a great deal of popular crates. Ack! Getting low-hanging fruit at step 1 is foiled!

So I’ve started checking whether fuzzing targets were written properly. Specifically, I’ve started looking for stuff that could block fuzzing — like checksums. A lot of formats have them internally, and PNG has not one but two — crc32 in png format and adler32 in deflate. And lo and behold, none of the crates were actually disabling checksums when fuzzing! This means that random input from fuzzer was rejected early (random data does not have a valid checksum in it, duh) and never actually reached the interesting decoding bits. So I’ve opened PRs for disabling checksums during fuzzing in miniz_oxide, png, lodepng-rust, and ogg, and then fuzzed them with checksums disabled. This got me:

inflate crate was the first where fuzzing has turned up nothing at all, so I've started eyeballing its unsafes and trying to rewrite them into safe code. I've added a benchmarking harness and started measuring whether reverting back to safe code hurts performance. cargo bench was too noisy, but I've quickly discovered criterion which got me the precision I needed (did I mention Rust tooling is awesome?). I got lucky - there were two unsafes with two-line safe equivalent commented out, and reverting back to safe code created no measurable performance difference. Apparently the compiler got smarter since that code was written, so I've just reverted back to safe code.

This left just one unsafe with a single line in it. Spot the security vulnerability. I would have missed it if the crate maintainer hadn’t pointed it out. If you can’t, there are hints at the end of this post.

By sheer luck the rest of the crate just so happens to be structured in a way that never passes input parameters that trigger the vulnerability, so it is not really exploitable. Probably. I could not find a way to exploit it, and the crate maintainer assures me it’s fine. Perhaps we just haven’t figured out how to do it yet. After all, almost everything is exploitable if you try hard enough.

Sadly, simply replacing the unsafe .set_len() with .resize() regressed the decompression performance by 10%, so instead I've added an extra check preventing this particular exploit from happening, and then liberally sprinkled the function with asserts that panic on every other way this unsafe could go wrong that I could think of.

Is the function secure now? Well, maybe. Maybe not. Unless we either rewrite it in safe rust (or prove its correctness, which is a lot harder) we will never know.

The thing is, I’m pretty sure it’s possible to rewrite this in safe Rust without performance penalty. I’ve tried some local optimizations briefly, to no avail. Just like with high-level languages, writing fast safe Rust requires staying on the optimizer’s happy paths, and I have not found any documentation or tooling for doing that. https://godbolt.org/ that lets you inspect the LLVM IR as well as assembler and shows what line of Rust turned into what line of assembly, but you can’t feed your entire project to it. (As pointed out in comments, cargo-asm can do that to an entire project). And you also need tools to understand why a certain optimization was not applied by rustc. LLVM flags -Rpass-missed and -Rpass-analysis seem to be capable of doing that, but there is literally no documentation on them in conjunction with Rust.

Discussing the vulnerability further would be spoilerrific (seriously, try to locate it yourself), so I’ll leave further technical discussion until the end of the post. I want to say that I was very satisfied with how the crate maintainer reacted to the potential vulnerability — he seemed to take it seriously and investigated it promptly. Coming from C ecosystem it is refreshing to be taken seriously when you point out those things.

By contrast, nobody seems to care about denial of service vulnerabilities. In the 3 crates I’ve reported such vulnerabilities for, after 3 weeks not a single one was investigated or fixed by maintainers of those crates, or anyone else really. And the DoS bugs are not limited to panics that you can just isolate into another thread and forget about.

After not getting any reaction from crate maintainers for a while I tried fixing those bugs myself, starting with the png crate. In stark contrast to C, it is surprisingly easy to jump into an existing Rust codebase and start hacking on it, even if it does rather involved things like PNG parsing. I've fixed all the panics that fuzzers discovered based on nothing but debug mode backtraces, and I don't even know Rust all that well. Also, this is why there are 4 distinct panics listed for PNG crate: I've fixed one and kept fuzzing until I discovered the next one. lewton probably has many more panics in it, I just didn't got beyond the first one. Sadly, three weeks later my PR is still not merged, reinforcing the theme of "nobody cares about denial of service". And png still has a much nastier DoS bug that cannot be isolated in a thread.

(To be clear, this is not meant as bashing any particular person or team; there may be perfectly valid reasons for why it is so. But this does seem to be the trend throughout the ecosystem, and I needed some examples to illustrate it).

Also, shoutout to tungstenite — it was the only crate that did not exhibit any kinds of bugs when being fuzzed for the first time. Kudos.

Conclusions:

Originally I thought this would be a fun exercise for a few weekends, but the scope of the work quickly grew way beyond what I can hope to achieve alone. This is where you come in, though! Here’s a list of things you can try, in addition to the hard tooling tasks listed above:

  1. Fuzz all the things! It takes 15 minutes to set up per crate, there is no reason not to. Also, there is a trophy case.
  2. Fix bugs already discovered. For example: panic in lewton (easy), unbounded memory consumption in png (intermediate), lodepng memory leak (C-hard). You can also fuzz lewton afterwards to get more panics, just don’t forget to use ogg dependency from git. You can reuse my fuzz harnesses if you wish.
  3. Refactor unsafes in popular crates into safe code, ideally without sacrificing performance. For example, inflate crate has just one unsafe block remaining, png has two. There are many more crates like that out there.
  4. There are easy tasks on docs and tooling too: AFL.rs documentation is outdated and describes only version 0.3. Version 0.4 has added in-process fuzzing that’s ~10x faster, it needs to be mentioned. Also, AFL could use more Rusty integration with Cargo, closer to what cargo-fuzz does. Also, disabling checksums is a common pitfall that needs to be mentioned.

I’d love to keep fixing all the things, but at least in the coming month I will not able to dedicate any time to the project. I hope I’ve managed to at least lead by example.

And now, details on that vulnerability! If you haven’t found it yourself, here’s a hint: similar bugs in C libraries.

If you still haven’t found it, see the fix.

Spoilerrific discussion of the vulnerability below.

Vulnerable code from git history for reference

The function run_len_dist() does a fairly trivial thing: resizes a vector to fit a specified amount of data and copies data from element i to element i+dist until i+dist hits the end of the vector. For performance, contents of the vector are not initialized to zeroes when resizing, as it would have been done by vec.resize(); instead, vec.set_len() is used, creating a vector with a number of elements set to uninitialized memory at the end.

The function never checks that dist is not zero. Indeed, if you call it with dist set to 0, it will simply read uninitialized memory and write it right back, exposing memory contents in the output.

If this vulnerability were actually exploitable from the external API (which it isn’t, probably), inflate would have output contents of uninitialized memory in the decompressed output. inflate crate is used in png crate to decompress PNGs. So if png crate was used in a web browser (e.g. servo) to decode images, an attacker could pass a crafted PNG to the client, then read the decoded image using javascript. This lets the attacker read memory contents from the browser - cookies, passwords, you name it. This is not quite as bad as Heartbleed or Meltdown, but it's up there.

Sadly, regular fuzzing would not have discovered this vulnerability. If it were actually exploitable, at least one way to trigger it would involve setting several distinct bytes in the input to very specific values. And even the best current generation fuzzers cannot trigger any behavior that requires changing more than one byte simultaneously, except in rare cases or if you explicitly tell what consecutive byte strings it should try. And there is nothing in the code that would guide the fuzzers to these specific values.

Even if fuzzers did discover such an input by random chance, they would not have recognized it as a vulnerability, unless you do either of these things:

This just goes to show that fuzzing unsafe code does not actually guarantee absence of bugs.

Safe Rust, however, does guarantee absence of memory errors that lead to arbitrary code execution exploits and other unspeakable horrors. So let’s use it.

Join the discussion this article on Reddit