Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
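Below is a minimal sketch of an aligned load. It assumes an x86-64 CPU with AVX and GCC or Clang: the target("avx") attribute and __builtin_cpu_supports are compiler extensions, not part of the intrinsic API. The helper name sum4_aligned is hypothetical and used only for illustration. The caller must supply 32-byte-aligned memory, for example with _Alignas(32).

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: sums the 4 doubles at p using one aligned 256-bit load.
   target("avx") (GCC/Clang extension) enables AVX code generation for this
   function only, so the file builds without -mavx; call it only on AVX CPUs. */
__attribute__((target("avx")))
double sum4_aligned(const double *p)
{
    /* _mm256_load_pd requires a 32-byte-aligned address. */
    assert(((uintptr_t)p % 32) == 0);

    __m256d v = _mm256_load_pd(p);   /* load 4 doubles in one instruction */

    double lanes[4];
    _mm256_storeu_pd(lanes, v);      /* unaligned store: lanes is not guaranteed 32-byte aligned */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

For example, with _Alignas(32) double buf[4] = {1.0, 2.0, 3.0, 4.0}; the call sum4_aligned(buf) returns 10.0. If the data cannot be guaranteed to be 32-byte aligned, _mm256_loadu_pd is the unaligned counterpart.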
__m256 _mm256_add_ps (__m256 a, __m256 b)
Description
Add packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
Operation
FOR j := 0 to 7
    i := j*32
    dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:256] := 0
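The Operation pseudocode is an element-wise loop over 8 lanes of 32 bits each. As a sketch, again assuming AVX with GCC or Clang, the scalar function below mirrors it lane by lane, and the AVX version does the same 8 additions with a single _mm256_add_ps. Both function names are hypothetical. The unaligned _mm256_loadu_ps and _mm256_storeu_ps are used so the arrays need no particular alignment.

```c
#include <immintrin.h>

/* Hypothetical scalar reference: the Operation pseudocode, one lane per iteration. */
void add8_ref(const float *a, const float *b, float *dst)
{
    for (int j = 0; j < 8; j++)
        dst[j] = a[j] + b[j];        /* dst[i+31:i] := a[i+31:i] + b[i+31:i], with i = j*32 */
}

/* Hypothetical AVX version: the same 8 additions in one instruction.
   target("avx") is a GCC/Clang extension; call only on AVX-capable CPUs. */
__attribute__((target("avx")))
void add8_avx(const float *a, const float *b, float *dst)
{
    __m256 va = _mm256_loadu_ps(a);  /* loadu: no alignment requirement */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(dst, _mm256_add_ps(va, vb));
}
```

The last pseudocode line, dst[MAX:256] := 0, only concerns the upper bits of wider (AVX-512) registers, which VEX-encoded instructions zero. It has no counterpart in this memory-level sketch.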
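3. What stream does: it writes data to memory while bypassing the cache (a non-temporal store), so the written data does not evict other useful cache lines.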
void _mm256_stream_pd (double * mem_addr, __m256d a)
Description
Store 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
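A sketch of a typical use of _mm256_stream_pd is filling a large buffer that will not be read again soon, where caching it would only push out useful data. It assumes AVX and GCC or Clang as in the earlier sketches, and fill_stream_pd is a hypothetical name. The trailing _mm_sfence() is a common pattern rather than something the description above mandates: non-temporal stores are weakly ordered, so a fence is used before other threads rely on the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: fill dst[0..n) with value using non-temporal stores.
   dst must be 32-byte aligned (required by _mm256_stream_pd), and n must be
   a multiple of 4 because each store writes 4 doubles.
   target("avx") is a GCC/Clang extension; call only on AVX-capable CPUs. */
__attribute__((target("avx")))
void fill_stream_pd(double *dst, size_t n, double value)
{
    assert(((uintptr_t)dst % 32) == 0 && n % 4 == 0);

    __m256d v = _mm256_set1_pd(value);   /* broadcast value to all 4 lanes */
    for (size_t i = 0; i < n; i += 4)
        _mm256_stream_pd(dst + i, v);    /* non-temporal store: hints the data need not be cached */

    /* Non-temporal stores are weakly ordered; sfence orders them before
       any later stores (e.g. a flag that another thread polls). */
    _mm_sfence();
}
```

For example, with static _Alignas(32) double buf[1024]; the call fill_stream_pd(buf, 1024, 2.5) sets every element to 2.5. Whether streaming beats an ordinary _mm256_store_pd depends on the buffer size relative to the cache; for small buffers that are read back soon, a regular store is usually the better choice.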