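Basic AVX Operations (Aligned-Memory Version)

Adding two arrays of double with the AVX instruction set.

Common ways to obtain aligned memory

The aligned AVX load and store intrinsics used here require mem_addr to be aligned on a 32-byte boundary; otherwise a general-protection exception may be generated.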

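1. memalign (glibc, declared in <malloc.h>; release with free):

`double* a = (double*)memalign(32, 9*sizeof(double));`

2. _mm_malloc (takes the size first, then the alignment; release with _mm_free):

`double* a = (double*)_mm_malloc(9*sizeof(double), 32);`

3. aligned_alloc (C11, declared in <stdlib.h>; release with free). C11 as published requires the size to be a multiple of the alignment, so rounding the 72 bytes of 9*sizeof(double) up to 96 is the portable choice:

`double* a = (double*)aligned_alloc(32, 9*sizeof(double));`

4. A GCC/Clang alignment attribute on a statically sized array:

`__attribute__ ((aligned(32))) double a[9] = {1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8,2.1};`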
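Whichever allocator is used, the 32-byte requirement can be checked directly on the returned address. The following is a minimal sketch, assuming a C11 toolchain; `is_aligned` and `alloc_doubles_32` are hypothetical helper names introduced only for illustration, and the size is rounded up to a multiple of 32 so that the `aligned_alloc` call also satisfies the stricter C11 wording.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if p is aligned to `alignment` bytes
   (alignment must be a power of two), otherwise 0. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate room for n doubles on a 32-byte boundary.
   The byte count is rounded up to a multiple of 32 before calling
   aligned_alloc. Returns NULL on failure; release with free(). */
static double *alloc_doubles_32(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + 31) & ~(size_t)31;
    return (double *)aligned_alloc(32, rounded);
}
```

A pointer for which `is_aligned(p, 32)` returns 1 is safe to pass to the aligned load and store intrinsics, as is `p + 4`, since four doubles occupy exactly 32 bytes.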

AVX intrinsics used

1.

`__m256d _mm256_load_pd (double const * mem_addr)`

Description

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

Operation

`dst[255:0] := MEM[mem_addr+255:mem_addr]`
`dst[MAX:256] := 0`

2.

`__m256d _mm256_add_pd (__m256d a, __m256d b)`

Description

Add packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.

Operation

FOR j := 0 to 3
	i := j*64
	dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
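To make the lane-wise behaviour concrete, here is a minimal sketch that loads four doubles from each of two aligned arrays, adds them with `_mm256_add_pd`, and writes the four sums back. The function name `add4` is hypothetical, and `_mm256_store_pd` (the ordinary aligned store) is an assumption that is not part of the intrinsics listed in this post. The `target("avx")` attribute is a GCC/Clang extension used so that the sketch builds without `-mavx`; it can be dropped when that flag is passed on the command line.

```c
#include <immintrin.h>

/* Hypothetical helper: out[k] = a[k] + b[k] for k = 0..3.
   All three pointers must be 32-byte aligned. */
static __attribute__((target("avx")))
void add4(const double *a, const double *b, double *out)
{
    __m256d va = _mm256_load_pd(a);        /* 4 doubles from a */
    __m256d vb = _mm256_load_pd(b);        /* 4 doubles from b */
    __m256d vc = _mm256_add_pd(va, vb);    /* one add per 64-bit lane */
    _mm256_store_pd(out, vc);              /* aligned store of the 4 sums */
}
```

Each of the four 64-bit lanes is added independently, which is what the `FOR j := 0 to 3` loop in the operation pseudocode describes.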

3. What stream does: it writes to memory directly, bypassing the cache

`void _mm256_stream_pd (double * mem_addr, __m256d a)`

Description

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

Operation

`MEM[mem_addr+255:mem_addr] := a[255:0]`
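A non-temporal store is mainly useful for data that will not be read again soon, such as a large output buffer. The sketch below fills a buffer with `_mm256_stream_pd` and then issues `_mm_sfence()`, because non-temporal stores are weakly ordered and code that hands the buffer to another thread typically fences first. The name `fill_stream` is hypothetical, and the sketch assumes the destination is 32-byte aligned and the element count is a multiple of 4. As above, `target("avx")` is a GCC/Clang extension that stands in for `-mavx`.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: set dst[0..n) to value using non-temporal stores.
   Assumes dst is 32-byte aligned and n is a multiple of 4. */
static __attribute__((target("avx")))
void fill_stream(double *dst, size_t n, double value)
{
    __m256d v = _mm256_set1_pd(value);   /* broadcast value to all 4 lanes */
    for (size_t i = 0; i < n; i += 4)
        _mm256_stream_pd(dst + i, v);    /* store with a non-temporal hint */
    _mm_sfence();                        /* order the streamed stores before later stores */
}
```

Reading the buffer back on the same thread returns the stored values; the fence matters when a different thread consumes the data.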

Sample program:

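#include <stdio.h>
#include <stdlib.h>     /* free */
#include <malloc.h>     /* memalign (glibc) */
#include <immintrin.h>  /* AVX intrinsics; compile with -mavx */

#define N 9

int main(void)
{
	/* All three arrays hold N doubles, so each allocation must be
	   N*sizeof(double); allocating only 4 doubles for b and c would
	   overflow those buffers. */
	double *a = (double *)memalign(32, N * sizeof(double));
	double *b = (double *)memalign(32, N * sizeof(double));
	double *c = (double *)memalign(32, N * sizeof(double));
	if (a == NULL || b == NULL || c == NULL)
		return 1;

	double af[N] = {1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 2.1};
	double bf[N] = {2.1, 3.2, 6.4, 8.6, 3.7, 9.9, 5.1, 4.2, 6.6};
	for (int i = 0; i < N; i++)
	{
		a[i] = af[i];
		b[i] = bf[i];
	}

	int i = 0;
	__m256d v0;
	__m256d v1;
	__m256d v2;
	/* Vector loop: 4 doubles per iteration. a+i, b+i and c+i stay
	   32-byte aligned because i advances by 4 doubles (32 bytes). */
	for (; i + 4 <= N; i += 4)
	{
		v0 = _mm256_load_pd(a + i);
		v1 = _mm256_load_pd(b + i);
		v2 = _mm256_add_pd(v0, v1);
		_mm256_stream_pd(c + i, v2);
	}
	/* Order the non-temporal stores before later stores; required if
	   another thread reads c, harmless in this single-threaded program. */
	_mm_sfence();
	/* Scalar tail: the remaining N % 4 elements. */
	for (; i < N; i++)
	{
		c[i] = a[i] + b[i];
	}

	printf("this is c.\n");
	for (int j = 0; j < N; j++)
	{
		printf("%lf\n", c[j]);
	}

	free(a);
	free(b);
	free(c);
	return 0;
}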

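Sample program output (expected; each value is a[i] + b[i] printed with %lf):

this is c.
3.200000
5.400000
9.700000
13.000000
9.200000
16.500000
12.800000
13.000000
8.700000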
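The vector-loop-plus-scalar-tail pattern from the sample program can be wrapped in a function that takes the length as a parameter. This is a sketch under stated assumptions rather than a definitive implementation: `add_arrays` is a hypothetical name, all three pointers are assumed to be 32-byte aligned, and `_mm256_store_pd` is used in place of the streaming store so the result can be read back immediately without a fence. The loop condition is written as `i + 4 <= n` because, with an unsigned `size_t` length, an expression such as `n - 4` would wrap around for `n < 4`.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for i = 0..n-1.
   a, b and c must be 32-byte aligned; n may be any length. */
static __attribute__((target("avx")))
void add_arrays(const double *a, const double *b, double *c, size_t n)
{
    size_t i = 0;
    /* Vector part: whole groups of 4 doubles. */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_load_pd(a + i);
        __m256d vb = _mm256_load_pd(b + i);
        _mm256_store_pd(c + i, _mm256_add_pd(va, vb));
    }
    /* Scalar tail: the remaining n % 4 elements. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With `n = 9` the vector loop handles elements 0 to 7 in two iterations and the scalar tail handles element 8, matching the sample program.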
