Negative Results

A few months ago, I decided to try my hand at writing a little bit of multithreaded Rust, running in Web Workers. While I have not met with success, I feel it is important to write up what I've done so others can save some time and sanity – plus, I think the story is at least mildly interesting. All code is open-source and all examples should be fairly easy to run.

Besides, if nothing else, you can take great schadenfreude in my hobbies.

Backstory

It's currently June 2023. In November 2020, I decided, hey, you know what would be cool?

A multithreaded falling-sand game.

So, naturally, the tool I choose for the front-end is HTML5. It's easy to package: I just have to throw a couple of files up on my web server and there we go. Then, for the core, I'll write the simulation in Rust, and run it on all the cores of my CPU with a shared-memory model. I know we've got atomics in JS and WASM now, so let's try to use them to build something simple. Heh. Heh, I tell you.

1st Attempt - wasm-bindgen and Why FFI Is Not Your Friend For Speed

This approach attempted to use wasm-bindgen to read/write shared memory in the web worker from WASM, generated from Rust, in a fairly naïve manner. But why not just set the memory directly from WASM?

The Core Problem

WebAssembly has one linear memory.¹ This is, by default, not shared. And if it is shared, all memory is shared - you can't just share a portion of it.

To share the memory, you also literally have to share the memory. Generally, you initialize your WASM in a web worker, and the memory for the worker is allocated there. So, if we want to have shared memory, we must send it to the worker in a message and pass it in to whatever is compiling our WASM. This isn't, or at least wasn't in 2020, supported by wasm-bindgen as far as I could tell.² So, what do we do?

To get something working, I simply called back out to JavaScript whenever I wanted to read or modify a value in the shared memory. The way this ended up working out was that, to read, say, the x-velocity of a falling particle of sand, I'd go something like Reflect::get_u32(Reflect::get(Reflect::get(world_obj, "velocity"), "x"), particle_index), which is a fairly direct translation of JS' world.velocity.x[particle_index]. I don't think the idiom translates very well, especially given that this Rust allocates two strings and a number as part of the FFI operation, which then need to get GC'd. I did try allocating the strings statically to speed things up, but it didn't help that much. I could peg every core of my computer and hit, maybe, 5fps doing nothing. So this approach was, as suspected, hot garbage: FFI is slow when you're doing a few million operations a second.
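For contrast, here is a minimal sketch of the access pattern this attempt could not have: the world laid out as flat arrays that Rust indexes directly. The World and Velocity types and their methods are hypothetical, made up for illustration and only loosely mirroring the JS object above; they are not the project's real code.

```rust
/// Hypothetical struct-of-arrays layout, loosely mirroring the JS world object.
pub struct Velocity {
    pub x: Vec<u32>,
    pub y: Vec<u32>,
}

/// Hypothetical stand-in for the shared world state.
pub struct World {
    pub velocity: Velocity,
}

impl World {
    /// Build a world with every particle at rest.
    pub fn new(particle_count: usize) -> Self {
        World {
            velocity: Velocity {
                x: vec![0; particle_count],
                y: vec![0; particle_count],
            },
        }
    }

    /// One bounds-checked load: no strings, no JS objects, nothing to GC.
    pub fn velocity_x(&self, particle_index: usize) -> u32 {
        self.velocity.x[particle_index]
    }

    /// One bounds-checked store, again without crossing the FFI boundary.
    pub fn set_velocity_x(&mut self, particle_index: usize, value: u32) {
        self.velocity.x[particle_index] = value;
    }
}
```

Each read here is a single indexed load, where the Reflect::get chain does two property lookups and an indexed get across the JS boundary for every value.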

In total, the render thread takes 128ms on Chrome and 59ms on Firefox to render out a 300x150 playfield. Aiming for a 120hz framerate, that gives us a budget of 8.3ms/frame, which is not particularly in the neighbourhood we need. Effectively all the time is spent creating and destroying our internal representation of a particle, which consists of world, thread_id, x, y, and maybe w/h if it references a real particle. The allocation and deallocation of one of these data structures takes about 30% of the processing time, and we usually end up creating a few of them as they're what we use to work with other particles as well. Another 20% of the time was spent reading data from the JS side of things, since we can't map the data we're working with in from the shared array buffer passed to the web-worker.

Rust interop with JS in this case has also proved rather awkward; while I'm sure it would work for other projects, for the sort of high-performance access we're looking at it's not suitable. Right now, WASM is more suited to the sort of workload where a little data is passed in to do a lot of work on, rather than a lot of data passed in to do a little work on.

One alternative might be to copy the raw memory in to the WASM process in the worker thread, thus avoiding the lookups. More sensibly, I think the best solution is just to avoid using WASM for this at all, and use Javascript or Typescript in the worker.

—Attempt 1's readme.md
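To put those numbers side by side, here is the frame-budget arithmetic as a small sketch. The function names are made up for illustration; the inputs are the figures quoted in the readme.

```rust
/// Milliseconds available per frame at a given refresh rate.
pub fn frame_budget_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// How many times over budget a measured frame time is.
pub fn times_over_budget(frame_ms: f64, refresh_hz: f64) -> f64 {
    frame_ms / frame_budget_ms(refresh_hz)
}
```

At 120hz the budget is about 8.3ms, so times_over_budget(128.0, 120.0) puts Chrome at roughly 15 times over, and times_over_budget(59.0, 120.0) puts Firefox at roughly 7 times over.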

Oh, and the kicker? Back on the main thread, it seems you can't paint shared array buffers directly to canvas - you need to copy them into a new ImageData() first, because ImageData will only accept non-shared array buffers. So our zero-copy goal is kinda hosed at this point, if we're being pure about it. Let's ignore that and continue on. It's certainly not an ill omen of things to come... right?

2nd Attempt - Can't Read That Here

This was a fairly intense yet short-lived branch, because I ran up square against the core problem mentioned above. Diving in, this was when I figured out what was happening, and why I couldn't pass in a chunk of shared memory directly as I'd first assumed I could. Or, rather, I can pass it in, but I can't read it out.

The issue is that [multiple linear memories are] not a value which is represented in linear memory. That thing which Rust and C++ are based around. So it's kind of a new concept for them, and they just... don't support it, according to this GitHub issue from 2019.

—Attempt 2's readme.md

So, now we know what we're up against, what do?

3rd Attempt - Can't Pass Array References

Yeah, that doesn't exist as a concept. You can't pass arrays from JS to WASM, because WASM only works on what is in linear memory. The array isn't in linear memory because we didn't copy it there, and we can only invoke functions and provide numbers as args to them.
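The usual way around "numbers only" is to pass an array as two numbers: an address in linear memory and a length. The sketch below shows that general pattern, not anything from this project. The function names (alloc_u32, free_u32, sum_u32) are hypothetical, and in a real WASM build they would also need to be exported (for example with no_mangle) so the JS side could call them. Note that the caller still has to copy the array into linear memory first, which is exactly the copy this project was trying to avoid.

```rust
/// Reserve `len` zeroed u32 slots and return their address.
/// To the JS side this pointer is just a number: an offset into linear memory.
pub extern "C" fn alloc_u32(len: usize) -> *mut u32 {
    let boxed: Box<[u32]> = vec![0u32; len].into_boxed_slice();
    // Leak the box; the caller owns the buffer until it calls free_u32.
    Box::into_raw(boxed) as *mut u32
}

/// Release a buffer previously returned by alloc_u32 with the same `len`.
///
/// # Safety
/// `ptr` and `len` must come from a single earlier alloc_u32 call.
pub unsafe extern "C" fn free_u32(ptr: *mut u32, len: usize) {
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))) }
}

/// Rebuild a slice from the two numbers we were handed and sum it.
///
/// # Safety
/// `ptr` must point at `len` initialised u32 values.
pub unsafe extern "C" fn sum_u32(ptr: *const u32, len: usize) -> u32 {
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}
```

The caller asks for a buffer, writes its data at the returned address, then calls the worker function with the address and the length.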

A few months burned reading and refactoring; moving on.

4th Attempt - Memory Synchronisation Issues

This brings us up to today, in mid-2023. I've managed to make my Rust compile - at least theoretically - with shared-memory multithreading support, by adding atomics and mutable-globals to the feature list and linking with --shared-memory on Nightly. bulk-memory and --import-memory allow for the import of our shared memory object from JS. 'Course… this doesn't actually work. And I have no idea why! The documentation I've read says it darn well should, but it doesn't. My threads are sharing memory too well now - non-shared local variables appear to be getting allocated over top of each other in shared memory.
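As a point of comparison, here is a minimal sketch of the intended model using native std threads instead of Web Workers and WASM. It says nothing about why the WASM build misbehaves; it only shows the split that was expected, where the grid is shared through atomics while each thread's locals stay on its own stack. The function name and the striping scheme are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawn `workers` threads (must be > 0) that each bump every cell in their
/// own stripe of a shared grid. Returns the final grid and, per worker, how
/// many cells that worker touched.
pub fn step_shared_grid(cells: usize, workers: usize) -> (Vec<u32>, Vec<u32>) {
    // The only shared state: one atomic per cell, behind an Arc.
    let grid: Arc<Vec<AtomicU32>> =
        Arc::new((0..cells).map(|_| AtomicU32::new(0)).collect());

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let grid = Arc::clone(&grid);
            thread::spawn(move || {
                // `touched` is a plain local: it lives on this thread's own
                // stack and is never visible to the other workers.
                let mut touched = 0u32;
                for i in (id..cells).step_by(workers) {
                    grid[i].fetch_add(1, Ordering::Relaxed);
                    touched += 1;
                }
                touched
            })
        })
        .collect();

    // Joining also makes every worker's writes visible to this thread.
    let counts: Vec<u32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let snapshot = grid.iter().map(|c| c.load(Ordering::Relaxed)).collect();
    (snapshot, counts)
}
```

With 10 cells and 4 workers, every cell ends up at 1 and each worker reports only its own count. That is the separation the WASM build appeared to lose.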

And this is where I give up. I can't figure this out. Save yourself some time and learn from my mistakes, and avoid using multithreaded Rust on the web. Even if someone hands you a fully-baked module, it's more trouble than it's worth - you'll wind up fiddling with it when something inevitably breaks, like browsers starting to require site-isolation headers. It's under-documented and very few people understand it. Certainly none that I can find, asking around on various forums and Discords over the years this has been ongoing. You will not be able to get help when things go wrong, and things will go wrong.

On the upside, it wasn't a total loss - we filed a few browser bugs. But at the same time, that shows no one's been poking around this area much.

¹: There are loose plans to allow multiple memory objects to be provided to WASM, but that is not high priority because nothing expects to operate on more than one memory. We have based all our technology on things which can be pointed to with a numerical pointer, and WebAssembly memories are named, not numbered. So currently, we only have one memory, which is the default one the number points into.

²: wasm-bindgen does have a proper mechanism to multithread things, but as far as I can tell it works by copying memory around which isn't what we want. If it does share memory, I can't figure out how.

tags: web dev, rust, wasm, multithreading, html5, negative result
