Load 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
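The wording above matches the AVX intrinsic `_mm256_loadu_ps`; that name is inferred from the description, not stated here. A minimal sketch of an unaligned load, assuming a hypothetical helper `copy8_unaligned` and a scalar fallback for builds where AVX is not enabled (for example, without `-mavx` on GCC or Clang):

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of any library): copies 8 floats from src
   to out. Neither pointer needs to be 32-byte aligned. */
void copy8_unaligned(const float *src, float *out)
{
#ifdef __AVX__
    __m256 v = _mm256_loadu_ps(src);   /* unaligned 256-bit load */
    _mm256_storeu_ps(out, v);          /* unaligned 256-bit store */
#else
    for (size_t j = 0; j < 8; ++j)     /* scalar fallback when AVX is off */
        out[j] = src[j];
#endif
}
```

Passing a pointer such as `buf + 1` into a `float` array is valid here, since the load does not require any particular alignment.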
Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
Operation
FOR j := 0 to 3
    i := j*64
    dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
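The operation above is the element-wise addition performed by the AVX intrinsic `_mm256_add_pd` (name inferred from the description). As a rough sketch, the hypothetical helper `add4_pd` below applies it to four doubles; the scalar branch mirrors the pseudocode loop and is used when AVX is not enabled at compile time:

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[j] = a[j] + b[j] for j = 0..3. */
void add4_pd(const double *a, const double *b, double *dst)
{
#ifdef __AVX__
    __m256d va = _mm256_loadu_pd(a);               /* load 4 doubles from a */
    __m256d vb = _mm256_loadu_pd(b);               /* load 4 doubles from b */
    _mm256_storeu_pd(dst, _mm256_add_pd(va, vb));  /* add lanes, then store */
#else
    for (size_t j = 0; j < 4; ++j)                 /* same loop as the pseudocode */
        dst[j] = a[j] + b[j];
#endif
}
```

Each 64-bit lane is added independently, so no value carries over from one element to the next.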
3.
void _mm256_storeu_pd (double * mem_addr, __m256d a)
Description
Store 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
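One way the unaligned store might be used in practice is to add two arrays of arbitrary length. The sketch below assumes a hypothetical function `add_arrays`; it handles four doubles per iteration with `_mm256_loadu_pd`, `_mm256_add_pd`, and `_mm256_storeu_pd`, then finishes any leftover elements with a scalar loop. When AVX is not enabled, the scalar loop does all the work.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical function: out[i] = a[i] + b[i] for i = 0..n-1.
   None of the pointers needs to be 32-byte aligned. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* Vector part: 4 doubles (256 bits) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m256d sum = _mm256_add_pd(_mm256_loadu_pd(a + i),
                                    _mm256_loadu_pd(b + i));
        _mm256_storeu_pd(out + i, sum);  /* unaligned 256-bit store */
    }
#endif
    /* Scalar part: remaining elements, or all of them without AVX. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The scalar tail means `n` does not have to be a multiple of 4.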