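Custom instructions
With the Altera Nios II embedded processor, system designers can accelerate time-critical software algorithms by adding custom instructions to the Nios II instruction set.
A custom instruction replaces a sequence of several standard instructions with a single instruction implemented in hardware.
Applications:
1. Optimizing the inner loops of digital signal processing (DSP) software.
2. Packet header processing.
3. Computation-intensive applications.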
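To make the idea of "many standard instructions per result" concrete, here is a minimal sketch of a bitwise software CRC-32. The function name crc32_bitwise and the CRC parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) are assumptions chosen for illustration; they are not necessarily the ones used by the design discussed in these notes.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a byte buffer.
 * Assumed parameters (illustrative only): reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        /* Eight iterations per input byte; each one needs a test,
         * a shift, and a conditional XOR on a general-purpose CPU. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The inner loop is the kind of instruction sequence that a hardware custom instruction can collapse into a single operation.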
Procedure:
The Nios II configuration wizard provides a graphical user interface for adding up to 256 custom instructions to the Nios II processor.
Implementing a custom instruction in Qsys
■ Opening the Component Editor
■ Adding the HDL Files
■ Configuring the Custom Instruction Signal Type
■ Setting Up the Custom Instruction Interfaces
■ Setting the Details
■ Saving and Adding the Custom Instruction
■ Generating the System and Compiling in the Quartus II Software
Code walkthrough (top down):
BSP
system.h
/*
* Custom instruction macros
*
*/
#define ALT_CI_CRC_0(A) __builtin_custom_ini(ALT_CI_CRC_0_N,(A))
#define ALT_CI_CRC_0_1(A,B) __builtin_custom_inii(ALT_CI_CRC_0_1_N,(A),(B))
#define ALT_CI_CRC_0_1_N 0x9 //1001
#define ALT_CI_CRC_0_2(A,B) __builtin_custom_inii(ALT_CI_CRC_0_2_N,(A),(B))
#define ALT_CI_CRC_0_2_N 0xa //1010
#define ALT_CI_CRC_0_3(A,B) __builtin_custom_inii(ALT_CI_CRC_0_3_N,(A),(B))
#define ALT_CI_CRC_0_3_N 0xb //1011
#define ALT_CI_CRC_0_4(n,A,B) __builtin_custom_inii(ALT_CI_CRC_0_4_N+(n&ALT_CI_CRC_0_4_N_MASK),(A),(B))
#define ALT_CI_CRC_0_4_N 0x0
#define ALT_CI_CRC_0_4_N_MASK ((1<<3)-1) //0111
#define ALT_CI_CRC_0_N 0x8 //1000
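The macros above differ only in the selector value passed as the first argument to the builtin. The following host-side sketch shows how ALT_CI_CRC_0_4 derives that selector from its n argument. The constants are copied from the excerpt; the helper crc_0_4_selector is hypothetical and exists only so the arithmetic can be checked without a Nios II toolchain (the __builtin_custom_* functions are only available when compiling for Nios II).

```c
/* Selector constants copied from the system.h excerpt. */
#define ALT_CI_CRC_0_N        0x8
#define ALT_CI_CRC_0_1_N      0x9
#define ALT_CI_CRC_0_2_N      0xa
#define ALT_CI_CRC_0_3_N      0xb
#define ALT_CI_CRC_0_4_N      0x0
#define ALT_CI_CRC_0_4_N_MASK ((1 << 3) - 1)

/* Hypothetical helper: returns the value that ALT_CI_CRC_0_4(n, A, B)
 * passes as the first argument of __builtin_custom_inii. */
int crc_0_4_selector(int n)
{
    /* Only the low three bits of n are kept, so n = 0..7 map to 0..7
     * and larger values wrap around. */
    return ALT_CI_CRC_0_4_N + (n & ALT_CI_CRC_0_4_N_MASK);
}
```

With these definitions, the fixed macros use selectors 0x8 to 0xb, while ALT_CI_CRC_0_4 covers selectors 0x0 to 0x7.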
Hardware implementation
CRC_Custom_Instruction.v
The input port n selects which instruction is executed.
assign address = (n==0)?3'b000 : ((n==1)|(n==2)|(n==3))?3'b001 : (n==4)?3'b100 : (n==5)?3'b101 : (n==6)?3'b110 : 3'b111;
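The nested conditional above is easier to check as a small software model. The sketch below mirrors the Verilog expression in C; the function name decode_address is hypothetical, and because the bit width of n is not given in these notes, any value not listed explicitly falls through to 3'b111, just as in the Verilog.

```c
/* Software model of the address decode in CRC_Custom_Instruction.v.
 * Returns the 3-bit address as an unsigned integer. */
unsigned decode_address(unsigned n)
{
    if (n == 0)
        return 0x0;              /* 3'b000 */
    if (n == 1 || n == 2 || n == 3)
        return 0x1;              /* 3'b001: n = 1, 2, 3 share one address */
    if (n == 4)
        return 0x4;              /* 3'b100 */
    if (n == 5)
        return 0x5;              /* 3'b101 */
    if (n == 6)
        return 0x6;              /* 3'b110 */
    return 0x7;                  /* 3'b111: every other value of n */
}
```

Note that n values 1, 2, and 3 all decode to the same address, so the component must distinguish them by other means that are not shown in this excerpt.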
CRC_Component.v
assign mux_result = (address == 3'b100)? result_xored[31:0] :
((address == 3'b101)? result_xored[63:32] :
((address == 3'b110)? result_xored[95:64] : result_xored[crc_width-1:96]));
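The multiplexer returns one 32-bit slice of the wide result_xored value per read. A minimal C model is sketched below, assuming crc_width is 128 so that the top slice [crc_width-1:96] is a full 32-bit word; for a narrower CRC the last word would be partial. The function name mux_result_model and the word-array representation are hypothetical.

```c
#include <stdint.h>

/* Software model of the read multiplexer in CRC_Component.v.
 * words[0] stands for result_xored[31:0], words[1] for [63:32],
 * words[2] for [95:64], and words[3] for [crc_width-1:96]
 * (assumed here to be a full 32-bit word). */
uint32_t mux_result_model(const uint32_t words[4], unsigned address)
{
    switch (address) {
    case 0x4: return words[0];   /* 3'b100 */
    case 0x5: return words[1];   /* 3'b101 */
    case 0x6: return words[2];   /* 3'b110 */
    default:  return words[3];   /* every other address */
    }
}
```

Under this model, software reads a CRC wider than 32 bits by issuing the instruction once per address, lowest word first.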
Comparison of the software and hardware implementations
******************************************************************************
Comparison between software and custom instruction CRC32
******************************************************************************
System specification
--------------------
System clock speed = 62.5 MHz
Number of buffer locations = 16
Size of each buffer = 65535 bytes
Initializing all of the buffers with pseudo-random data
-------------------------------------------------------
Initialization completed
Running the software CRC
------------------------
Completed
Running the optimized software CRC
----------------------------------
Completed
Running the custom instruction CRC
----------------------------------
Completed
Validating the CRC results from all implementations
---------------------------------------------------
All CRC implementations produced the same results
Processing time for each implementation
---------------------------------------
Software CRC = 9938.35 ms
Optimized software CRC = 6472.89 ms
Custom instruction CRC = 132.29 ms
Processing throughput for each implementation
---------------------------------------------
Software CRC = 0.84 Mbps
Optimized software CRC = 1.30 Mbps
Custom instruction CRC = 63.41 Mbps
Speedup ratio
-------------
Custom instruction CRC vs software CRC = 75.1
Custom instruction CRC vs optimized software CRC = 48.9
Optimized software CRC vs software CRC = 1.5
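The throughput and speedup figures in the log follow directly from the buffer sizes and processing times. The sketch below reproduces the arithmetic, assuming that "Mbps" means 10^6 bits per second and that the total data volume is 16 buffers of 65535 bytes each; the function names are hypothetical.

```c
/* Throughput in Mbps (10^6 bits per second) for total_bytes
 * processed in elapsed_ms milliseconds. */
double throughput_mbps(double total_bytes, double elapsed_ms)
{
    double bits = total_bytes * 8.0;
    double seconds = elapsed_ms / 1000.0;
    return bits / seconds / 1.0e6;
}

/* Speedup of an improved implementation over a baseline,
 * both given as processing times in milliseconds. */
double speedup(double baseline_ms, double improved_ms)
{
    return baseline_ms / improved_ms;
}
```

For example, throughput_mbps(16 * 65535, 132.29) evaluates to roughly 63.41 and speedup(9938.35, 132.29) to roughly 75.1, which agrees with the reported values for the custom instruction CRC.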