Tuesday, July 30, 2013

SIMD on the Web July 2013 Update (ARM, JavaScript, and AVX-512)

Accessing the SIMD instruction sets available in both desktop and mobile CPUs can greatly speed up 3D graphics (and other) applications. Until recently, web programmers have been unable to access this part of the CPU, losing out on performance and battery savings. In the six months since I gave my talk, Bringing SIMD to the Web via Dart, a lot has happened. Read on for a complete status update on SIMD on the web, including the latest on ARM, JavaScript, and AVX-512 support.

A demonstration of the speed gain you get when using Float32x4 for 3D games can be seen in Web Languages and VMs: Fast Code is Always in Fashion. Now on to the status update:

1. Status of Dart implementation

Dart's implementation of the Float32x4, Uint32x4, and Float32x4List types is complete. The API may change slightly in the future, but any changes will be minor and easy to adapt to.

2. Status of Dart acceleration

Dart fully accelerates Float32x4, Uint32x4, and Float32x4List types on IA32, X64, and ARM (with NEON) CPUs. Thanks to Zachary Anderson for the ARM implementation.

3. Status of dart2js support

The Dart to JavaScript compiler fully supports the Float32x4, Uint32x4, and Float32x4List types. When compiled to JavaScript, the types are implemented in software and do not give speedups. A flag will be introduced allowing your program to detect when SIMD will be slow. If your program is executing on an ARM CPU without NEON support or via JavaScript, it is recommended that you use non-SIMD code paths. (Most current-generation smartphones have NEON support, but older ones may not.)
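To make the "use non-SIMD code paths" advice concrete, here is a minimal TypeScript sketch of choosing between two formulations of the same routine. The flag has not been named yet, so `simdIsAccelerated` is a hypothetical stand-in passed as a plain parameter, and `sumScalar`, `sumFourLane`, and `sum` are illustrative names, not part of any Dart or JavaScript API.

```typescript
// Plain scalar loop: the path to prefer when SIMD is emulated in software.
function sumScalar(values: Float32Array): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Four-lane formulation: four running partial sums, the shape a Float32x4
// accumulator would take. Written with ordinary numbers here, so this only
// illustrates the structure and is not accelerated by itself.
function sumFourLane(values: Float32Array): number {
  let l0 = 0;
  let l1 = 0;
  let l2 = 0;
  let l3 = 0;
  const limit = values.length - (values.length % 4);
  for (let i = 0; i < limit; i += 4) {
    l0 += values[i];
    l1 += values[i + 1];
    l2 += values[i + 2];
    l3 += values[i + 3];
  }
  let total = l0 + l1 + l2 + l3;
  // Handle the leftover elements that do not fill a whole group of four.
  for (let i = limit; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// simdIsAccelerated is a hypothetical flag; the real one is not yet specified.
function sum(values: Float32Array, simdIsAccelerated: boolean): number {
  return simdIsAccelerated ? sumFourLane(values) : sumScalar(values);
}
```

Calling `sum(data, false)` would take the scalar path, which is what you want on an ARM CPU without NEON or when running as compiled JavaScript.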

4. What about JavaScript acceleration?

I've proposed these types for ECMAScript 7. I've created a polyfill for those interested in what the API will look like. Time will tell how serious the ECMAScript members are about SIMD and how fast the JavaScript engines can implement support for it.
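For a rough feel of what a software-only version of such a type involves, here is a small TypeScript sketch. It is not the polyfill's code or the proposed API: the class name `SoftFloat32x4` and its `add` and `mul` methods are assumptions made for illustration. It does show why a software implementation gives no speedup, since every lane is still processed one at a time.

```typescript
// Hypothetical software stand-in for a four-lane 32-bit float value.
class SoftFloat32x4 {
  // Backing storage rounds each lane to 32-bit float precision.
  private readonly lanes: Float32Array;

  constructor(x: number, y: number, z: number, w: number) {
    this.lanes = new Float32Array([x, y, z, w]);
  }

  get x(): number { return this.lanes[0]; }
  get y(): number { return this.lanes[1]; }
  get z(): number { return this.lanes[2]; }
  get w(): number { return this.lanes[3]; }

  // Lane-wise addition: four separate scalar adds, not one instruction.
  add(other: SoftFloat32x4): SoftFloat32x4 {
    return new SoftFloat32x4(
      this.x + other.x,
      this.y + other.y,
      this.z + other.z,
      this.w + other.w
    );
  }

  // Lane-wise multiplication, again done one lane at a time.
  mul(other: SoftFloat32x4): SoftFloat32x4 {
    return new SoftFloat32x4(
      this.x * other.x,
      this.y * other.y,
      this.z * other.z,
      this.w * other.w
    );
  }
}
```

An engine with real SIMD support could map a value like this onto a single 128-bit register and perform each operation in one instruction, which is where the speedup would come from.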

5. What about AVX and AVX-512?

For those of you who don't follow the latest CPU instruction sets, AVX is the successor to SSE and has 256-bit wide registers (YMM). AVX-512 is a follow-up to AVX that adds 512-bit wide registers (ZMM) and doubles the number of register names available (32 instead of 16). Exciting stuff. AVX exists in the wild and I plan on implementing Float32x8 later this year. AVX-512 was only just announced and no chips support it (yet). Once AVX-512 becomes closer to reality, Dart will get a Float32x16 type.
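The type names follow directly from the register widths: a 32-bit float fits 4 times into a 128-bit SSE register, 8 times into a 256-bit YMM register, and 16 times into a 512-bit ZMM register. The short TypeScript sketch below spells out that arithmetic; `float32Lanes` is an illustrative helper, not an existing API.

```typescript
const FLOAT32_BITS = 32;

// Number of 32-bit float lanes that fit in a register of the given width.
function float32Lanes(registerBits: number): number {
  return registerBits / FLOAT32_BITS;
}

const registers = [
  { name: "SSE (XMM)", bits: 128 },
  { name: "AVX (YMM)", bits: 256 },
  { name: "AVX-512 (ZMM)", bits: 512 },
];

for (const r of registers) {
  // Builds names of the form Float32x<lanes> from the register width.
  console.log(r.name + ": Float32x" + float32Lanes(r.bits));
}
```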