
Why Is a WebAssembly Function Almost 300 Times Slower Than the Same JS Function?

Finding the length of a line: 300× slower. First of all, I have read the answer to "Why is my WebAssembly function slower than the JavaScript equivalent?", but it has shed little light on the problem.

Solution 1:

Andreas describes a number of good reasons why the JavaScript implementation was initially observed to be 300 times faster. However, there are a number of other issues with your code.

  1. This is a classic 'micro benchmark', i.e. the code that you are testing is so small that the other overheads within your test loop are a significant factor. For example, there is an overhead in calling WebAssembly from JavaScript, which will factor into your results. What are you trying to measure: raw processing speed, or the overhead of the language boundary?
  2. Your results vary wildly, from 300× to 2×, due to small changes in your test code. Again, this is a micro benchmark issue. Others have seen the same when using this approach to measure performance; for example, this post claims wasm is 84× faster, which is clearly wrong!
  3. The current WebAssembly VM is very new and only an MVP (minimum viable product). It will get faster. Your JavaScript VM has had 20 years to reach its current speed. The performance of the JS <=> wasm boundary is being worked on and optimised right now.

For a more definitive answer, see the joint paper from the WebAssembly team, which outlines an expected runtime performance gain of around 30%.

Finally, to answer your point:

What's the point of WebAssembly if it does not optimise?

I think you have misconceptions about what WebAssembly will do for you. Based on the paper above, the runtime performance optimisations are quite modest. However, there are still a number of performance advantages:

  1. Its compact binary format and low-level nature mean that the browser can load, parse and compile the code much faster than JavaScript. It is anticipated that WebAssembly can be compiled faster than your browser can download it.
  2. WebAssembly has predictable runtime performance. With JavaScript, performance generally increases with each iteration as the code is further optimised. It can also decrease due to de-optimisation.

There are also a number of non-performance-related advantages.

For a more realistic performance measurement, take a look at:

Both are practical, production codebases.

Solution 2:

The JS engine can apply a lot of dynamic optimisations to this example:

  1. Perform all calculations with integers and only convert to double for the final call to Math.sqrt.

  2. Inline the call to the len function.

  3. Hoist the computation out of the loop, since it always computes the same thing.

  4. Recognise that the loop is left empty and eliminate it entirely.

  5. Recognise that the result is never returned from the testing function, and hence remove the entire body of the test function.

All but (4) still apply if you add up the results of all calls. With (5), the end result is an empty function either way.
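An ahead-of-time C compiler at -O3 is free to make the same kinds of rewrites, which matters for the C/Wasm side of such a benchmark too. The sketch below uses hypothetical names (`len2`, `bench_discarded`, `bench_returned`, `bench_varying`) and a squared length so that it does not need libm; the comments say which of the steps above each loop is exposed to, but whether a particular compiler actually applies them should be checked in its generated code rather than assumed.

```c
#include <stddef.h>

/* Squared distance between (x, y) and (x1, y1); sqrt is left out so the
   sketch does not need to link libm. */
static double len2(double x, double y, double x1, double y1) {
    double nx = x1 - x, ny = y1 - y;
    return nx * nx + ny * ny;
}

/* Result discarded: steps (4)/(5) let an optimizer delete the whole loop,
   so timing this function may measure nothing at all. */
void bench_discarded(size_t iters) {
    for (size_t i = 0; i < iters; i++)
        (void)len2(0.0, 0.0, 3.0, 4.0);
}

/* Result accumulated and returned: the work is now observable, but with
   constant arguments the call can still be inlined and its constant result
   hoisted out of the loop (steps 2 and 3). */
double bench_returned(size_t iters) {
    double c = 0.0;
    for (size_t i = 0; i < iters; i++)
        c += len2(0.0, 0.0, 3.0, 4.0);
    return c;
}

/* Arguments read from memory that changes per iteration: inlining is still
   possible, but the computation can no longer be hoisted or folded. */
double bench_varying(const double *a, size_t n) {
    double c = 0.0;
    for (size_t i = 0; i + 3 < n; i++)
        c += len2(a[i], a[i + 1], a[i + 2], a[i + 3]);
    return c;
}
```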

With Wasm an engine cannot do most of these steps, because it cannot inline across language boundaries (at least no engine does that today, AFAICT). Also, for Wasm it is assumed that the producing (offline) compiler has already performed relevant optimisations, so a Wasm JIT tends to be less aggressive than one for JavaScript, where static optimisation is impossible.
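The effect of the missing cross-boundary inlining can be imitated inside a single C program. In the sketch below, `imported_dist` is a hypothetical stand-in for a JS import called from Wasm: because it is read through a `volatile` function pointer, the compiler has to emit a real call on every iteration and cannot optimise across it, whereas `sum_direct` calls `dist2` directly and can have it inlined. This is only an analogy; a real JS/Wasm call typically also involves entry/exit wrappers and value conversion, which a C function-pointer call does not.

```c
#include <stddef.h>

/* Squared distance; small enough that an optimizer will normally inline it. */
static double dist2(double x, double y, double x1, double y1) {
    double nx = x1 - x, ny = y1 - y;
    return nx * nx + ny * ny;
}

typedef double (*dist_fn)(double, double, double, double);

/* Hypothetical stand-in for an imported JS function: the callee is only
   known through a volatile pointer, so every call is a real call. */
static dist_fn volatile imported_dist = dist2;

/* Direct call: the compiler can inline dist2 and optimise it together
   with the surrounding loop. */
double sum_direct(const double *a, size_t n) {
    double c = 0.0;
    for (size_t i = 0; i + 3 < n; i++)
        c += dist2(a[i], a[i + 1], a[i + 2], a[i + 3]);
    return c;
}

/* Opaque call: same arithmetic, but each iteration pays the call overhead
   and no optimisation can look inside the callee. */
double sum_opaque(const double *a, size_t n) {
    double c = 0.0;
    for (size_t i = 0; i + 3 < n; i++)
        c += imported_dist(a[i], a[i + 1], a[i + 2], a[i + 3]);
    return c;
}
```

Timing `sum_direct` against `sum_opaque` on the same large array gives a rough feel for the per-call cost, subject to the same micro-benchmark caveats discussed in Solution 1.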

Solution 3:

Serious answer

It seemed like the question's first point, "WebAssembly is far from a ready technology", actually did play a role in this: the performance of calling WASM from JS in Firefox was improved in late 2018. Running your benchmarks in a current Firefox or Chromium yields results like "Calling the WASM implementation from JS is 4 to 10 times slower than calling the JS implementation from JS". Still, it seems that engines don't inline across the WASM/JS border, and the overhead of having to make a call versus not having to make one is significant (as the other answers already pointed out).

Mocking answer

Your benchmarks are all wrong. It turns out that JS is actually 8-40 times (FF, Chrome) slower than WASM. WTF, JS is soo slooow.

Do I intend to prove that? Of course (not).

First, I re-implement your benchmarking code in C:

#include <math.h>
#include <stdint.h> /* int64_t */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double lengthC(double x, double y, double x1, double y1) {
    double nx = x1 - x;
    double ny = y1 - y;
    return sqrt(nx * nx + ny * ny);
}
double lengthArrayC(double* a, size_t length) {
    double c = 0;
    for (size_t i = 0; i < length; i++) {
        double a1 = a[i + 0];
        double a2 = a[i + 1];
        double a3 = a[i + 2];
        double a4 = a[i + 3];
        c += lengthC(a1,a2,a3,a4);
        c += lengthC(a2,a3,a4,a1);
        c += lengthC(a3,a4,a1,a2);
        c += lengthC(a4,a1,a2,a3);
    }
    return c;
}

#ifdef __wasm__
/* Wasm build only: the same kernel, but calling a length function imported from JS. */
__attribute__((import_module("js"), import_name("len")))
double lengthJS(double x, double y, double x1, double y1);
double lengthArrayJS(double* a, size_t length) {
    double c = 0;
    for (size_t i = 0; i < length; i++) {
        double a1 = a[i + 0];
        double a2 = a[i + 1];
        double a3 = a[i + 2];
        double a4 = a[i + 3];
        c += lengthJS(a1,a2,a3,a4);
        c += lengthJS(a2,a3,a4,a1);
        c += lengthJS(a3,a4,a1,a2);
        c += lengthJS(a4,a1,a2,a3);
    }
    return c;
}

__attribute__((import_module("bench"), import_name("now")))
double now();

__attribute__((import_module("bench"), import_name("result")))
void printtime(int benchidx, double ns);
#else
/* Native build: print results to stdout and read the clock locally. */
void printtime(int benchidx, double ns) {
    if (benchidx == 1) {
        printf("C: %f ns\n", ns);
    } else if (benchidx == 0) {
        printf("avoid the optimizer: %f\n", ns);
    } else {
        fprintf(stderr, "Unknown benchmark: %d\n", benchidx);
        exit(-1);
    }
}
double now() {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    } else {
        return sqrt(-1); /* NaN signals that the clock could not be read */
    }
}
#endif

#define iters 1000000
double a[iters+3];

int main() {
    int bigCount = 0;
    srand(now());
    for (size_t i = 0; i < iters + 3; i++)
         a[i] = (double)rand()/RAND_MAX*2e5-1e5;
    
    for (int i = 0; i < 10; i++) {
        double startTime, endTime;
        double c;
        startTime = now();
        c = lengthArrayC(a, iters);
        endTime = now();
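        /* Consume c so the optimizer cannot drop the benchmarked call. */
        bigCount = (bigCount + (int64_t)c) % 1000;
        /* Elapsed seconds -> ns per length call (4 calls per outer iteration). */
        printtime(1, (endTime - startTime) * 1e9 / iters / 4);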
#ifdef __wasm__
        startTime = now();
        c = lengthArrayJS(a, iters);
        endTime = now();
        bigCount = (bigCount + (int64_t)c) % 1000;
        printtime(2, (endTime - startTime) * 1e9 / iters / 4);
#endif
    }
    printtime(0, bigCount);
    return 0;
}

Compile it with clang 12.0.1:

clang -O3 -target wasm32-wasi --sysroot /opt/wasi-sdk/wasi-sysroot/ foo2.c -o foo2.wasm

And provide it with a length function from JS via imports:

"use strict";
(async (wasm) => {
    const wasmbytes = new Uint8Array(wasm.length);
    for (let i = 0; i < wasm.length; i++)
        wasmbytes[i] = wasm.charCodeAt(i);
    (await WebAssembly.instantiate(wasmbytes, {
        js: {
            len: function (x,y,x1,y1) {
                var nx = x1 - x;
                var ny = y1 - y;
                return Math.sqrt(nx * nx + ny * ny);
            }
        },
        bench: {
            now: () =>window.performance.now() / 1e3,
            result: (bench, ns) => {
                let name;
                if (bench == 1) { name = "C" }
                else if (bench == 2) { name = "JS" }
                else if (bench == 0) { console.log("Optimizer confuser: " + ns); /*not really necessary*/; return; }
                else { throw "unknown bench"; }
                console.log(name + ": " + ns + " ns");
            },
        },
    })).instance.exports._start();
})(atob('AGFzbQEAAAABFQRgBHx8fHwBfGAAAXxgAn98AGAAAAIlAwJqcwNsZW4AAAViZW5jaANub3cAAQViZW5jaAZyZXN1bHQAAgMCAQMFAwEAfAcTAgZtZW1vcnkCAAZfc3RhcnQAAwr2BAHzBAMIfAJ/An5BmKzoAwJ/EAEiA0QAAAAAAADwQWMgA0QAAAAAAAAAAGZxBEAgA6sMAQtBAAtBAWutNwMAQejbl3whCANAQZis6ANBmKzoAykDAEKt/tXk1IX9qNgAfkIBfCIKNwMAIAhBmKzoA2ogCkIhiKe3RAAAwP///99Bo0QAAAAAAGoIQaJEAAAAAABq+MCgOQMAIAhBCGoiCA0ACwNAEAEhBkGQCCsDACEBQYgIKwMAIQRBgAgrAwAhAEQAAAAAAAAAACECQRghCANAIAQhAyABIgQgAKEiASABoiIHIAMgCEGACGorAwAiAaEiBSAFoiIFoJ8gACAEoSIAIACiIgAgBaCfIAAgASADoSIAIACiIgCgnyACIAcgAKCfoKCgoCECIAMhACAIQQhqIghBmKToA0cNAAtBARABIAahRAAAAABlzc1BokQAAAAAgIQuQaNEAAAAAAAA0D+iEAICfiACmUQAAAAAAADgQ2MEQCACsAwBC0KAgICAgICAgIB/CyALfEQAAAAAAAAAACECQYDcl3whCBABIQMDQCACIAhBgKzoA2orAwAiBSAIQYis6ANqKwMAIgEgCEGQrOgDaisDACIAIAhBmKzoA2orAwAiBBAAoCABIAAgBCAFEACgIAAgBCAFIAEQAKAgBCAFIAEgABAAoCECIAhBCGoiCA0AC0ECEAEgA6FEAAAAAGXNzUGiRAAAAACAhC5Bo0QAAAAAAADQP6IQAkLoB4EhCgJ+IAKZRAAAAAAAAOBDYwRAIAKwDAELQoCAgICAgICAgH8LIAp8QugHgSELIAlBAWoiCUEKRw0AC0EAIAuntxACCwB2CXByb2R1Y2VycwEMcHJvY2Vzc2VkLWJ5AQVjbGFuZ1YxMS4wLjAgKGh0dHBzOi8vZ2l0aHViLmNvbS9sbHZtL2xsdm0tcHJvamVjdCAxNzYyNDliZDY3MzJhODA0NGQ0NTcwOTJlZDkzMjc2ODcyNGE2ZjA2KQ=='))

Now, calling the JS function from WASM is, unsurprisingly, a lot slower than calling the WASM function from WASM. (In fact, in the WASM→WASM case nothing is called at all: you can see the f64.sqrt being inlined into _start.)

(One last interesting data point is that WASM→WASM and JS→JS seem to have about the same cost: about 1.5 ns per inlined length(…) call on my E3-1280. Disclaimer: it's entirely possible that my benchmark is even more broken than the one in the original question.)
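The printed figures are per call of the length function: `printtime` receives the elapsed time of one `lengthArrayC`/`lengthArrayJS` run in seconds, converted to nanoseconds and divided by `iters` outer iterations times the 4 calls made in each. Working backwards from the 1.5 ns figure with `iters` = 10^6 (plain arithmetic on the numbers above, not an additional measurement):

```latex
t_{\text{call}} = \frac{(t_{\text{end}} - t_{\text{start}}) \cdot 10^{9}}{4 \cdot \text{iters}}\ \text{ns},
\qquad
t_{\text{end}} - t_{\text{start}} \approx 1.5\,\text{ns} \times 4 \times 10^{6} = 6 \times 10^{6}\,\text{ns} = 6\,\text{ms}.
```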

Conclusion

WASM isn't slow; crossing the border is. For now and the foreseeable future, don't put things into WASM unless they're a significant computational task. (And even then, it depends. Sometimes JS engines are really smart. Sometimes.)
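One practical reading of this conclusion is to make few, large calls across the border instead of many small ones. The C sketch below models the border with `volatile` function pointers and counts crossings; `boundary_scale_one`, `boundary_scale_all`, `scale_per_element`, `scale_batched` and `crossings` are hypothetical names. Passing a whole array in one call (for Wasm, typically via a buffer in the module's linear memory) turns n crossings into one.

```c
#include <stddef.h>

/* Counts how often the simulated border is crossed. */
static size_t crossings;

/* Hypothetical per-element entry point: one crossing per value. */
static double boundary_scale_one(double x) {
    crossings++;
    return x * 2.0;
}

/* Hypothetical batched entry point: one crossing for the whole array,
   with the loop running on the far side of the border. */
static void boundary_scale_all(double *xs, size_t n) {
    crossings++;
    for (size_t i = 0; i < n; i++)
        xs[i] *= 2.0;
}

/* Both entry points are reachable only through volatile pointers, so
   neither call can be inlined away, mimicking a JS/Wasm call. */
static double (*volatile scale_one)(double) = boundary_scale_one;
static void (*volatile scale_all)(double *, size_t) = boundary_scale_all;

/* Per-element style: n crossings, each paying the full call overhead. */
void scale_per_element(double *xs, size_t n) {
    for (size_t i = 0; i < n; i++)
        xs[i] = scale_one(xs[i]);
}

/* Batched style: a single crossing, so the per-call cost is paid once. */
void scale_batched(double *xs, size_t n) {
    scale_all(xs, n);
}
```

The same applies in the other direction: a Wasm loop that calls an imported JS function per element, like `lengthArrayJS` above, pays the crossing cost on every iteration.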
