AVX Vectorization Study Notes (Part 1)

Basic operations with the AVX instruction set

Adding two arrays of double values element by element with the AVX instruction set

AVX intrinsics used

1.

__m256d _mm256_loadu_pd (double const * mem_addr)

Description

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr does not need to be aligned on any particular boundary.

Operation

dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0

2.

__m256d _mm256_add_pd (__m256d a, __m256d b)

Description

Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.

Operation

FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
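
The pseudocode above can be read as an ordinary C loop over four 64-bit lanes. The sketch below is only a scalar reference for what the intrinsic computes; `add4_reference` is a hypothetical name introduced for illustration, and the real `_mm256_add_pd` performs all four additions in a single instruction.

```c
#include <stddef.h>

/* Scalar reference for the pseudocode of _mm256_add_pd.
   add4_reference is a hypothetical helper, not part of any library.
   Each index j corresponds to bits [j*64+63 : j*64] of the 256-bit vector. */
void add4_reference(const double a[4], const double b[4], double dst[4])
{
    for (size_t j = 0; j < 4; j++)
    {
        dst[j] = a[j] + b[j];
    }
}
```

Calling `add4_reference` on `{1.0, 2.0, 3.0, 4.0}` and `{0.5, 0.5, 0.5, 0.5}` fills `dst` with `{1.5, 2.5, 3.5, 4.5}`.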

3.

void _mm256_storeu_pd (double * mem_addr, __m256d a)

Description

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.

Operation

MEM[mem_addr+255:mem_addr] := a[255:0]
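
To see the unaligned load and store working together, here is a minimal sketch that copies four doubles through a `__m256d` register. `copy4_unaligned` is a hypothetical helper name. The `target("avx")` attribute is an assumption that the compiler is GCC or Clang; it lets the function build without passing `-mavx` on the command line, and the code still needs a CPU with AVX support to run.

```c
#include <immintrin.h>

/* Hypothetical helper: copies src[0..3] to dst[0..3] via one 256-bit register.
   Neither pointer has to be 32-byte aligned, because the "u" variants of
   load and store are used.
   Assumption: GCC/Clang, so the target attribute enables AVX for this function. */
__attribute__((target("avx")))
void copy4_unaligned(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);  /* read 4 doubles (256 bits) */
    _mm256_storeu_pd(dst, v);          /* write the same 4 doubles  */
}
```

Passing `src + 1` or `dst + 1` is fine here, since those addresses are only 8-byte aligned and the unaligned intrinsics accept them.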

Without AVX vectorization

Source code

#include <stdio.h>

int main()
{
    double a[9] = {1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,2.1};
    double b[9] = {2.1,3.2,6.4,8.6,3.7,9.9,5.1,4.2,6.6};
    double c[9] = {0};

    /* Scalar baseline: one addition per loop iteration. */
    for (int i = 0; i < 9; i++)
    {
        c[i] = a[i] + b[i];
    }

    printf("this is c.\n");
    for (int i = 0; i < 9; i++)
    {
        printf("%lf\n", c[i]);
    }

    return 0;
}

Program output

this is c.
3.200000
5.400000
9.700000
13.000000
9.200000
16.500000
12.800000
13.000000
8.700000

With AVX vectorization

Source code

#include <stdio.h>
#include <immintrin.h>

int main()
{
    double a[9] = {1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,2.1};
    double b[9] = {2.1,3.2,6.4,8.6,3.7,9.9,5.1,4.2,6.6};
    double c[9] = {0};
    __m256d v0;
    __m256d v1;
    __m256d v2;
    int i = 0;

    /* Vector loop: handle 4 doubles per iteration while a full vector remains. */
    for (; i + 4 <= 9; i += 4)
    {
        v0 = _mm256_loadu_pd(a + i);
        v1 = _mm256_loadu_pd(b + i);
        v2 = _mm256_add_pd(v0, v1);
        _mm256_storeu_pd(c + i, v2);
    }

    /* Scalar tail: handle the leftover elements (here only index 8). */
    for (; i < 9; i++)
    {
        c[i] = a[i] + b[i];
    }

    printf("this is c with AVX.\n");
    for (int i = 0; i < 9; i++)
    {
        printf("%lf\n", c[i]);
    }

    return 0;
}
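
The program above hard-codes the array length 9. As a sketch of how the same load, add, store pattern might be wrapped for arrays of any length, the function below takes the length as a parameter. `add_arrays_avx` is a hypothetical name, and the `target("avx")` attribute assumes GCC or Clang (with other compilers, enable AVX through the compiler's own option instead).

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for i in [0, n).
   Assumption: GCC/Clang, so the target attribute enables AVX for this function. */
__attribute__((target("avx")))
void add_arrays_avx(const double *a, const double *b, double *c, size_t n)
{
    size_t i = 0;

    /* Vector part: 4 doubles per iteration while a full vector remains. */
    for (; i + 4 <= n; i += 4)
    {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(c + i, _mm256_add_pd(va, vb));
    }

    /* Scalar tail: the remaining n % 4 elements. */
    for (; i < n; i++)
    {
        c[i] = a[i] + b[i];
    }
}
```

With n = 9 the vector loop runs twice (indices 0 to 7) and the tail loop handles index 8. Writing the condition as `i + 4 <= n` also lets a length that is an exact multiple of 4 be processed entirely by the vector loop.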

Program output

this is c with AVX.
3.200000
5.400000
9.700000
13.000000
9.200000
16.500000
12.800000
13.000000
8.700000

Related links

Intel® Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting.