A Simple Performance Comparison of Native Code and Managed Code
[email protected]
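tag: managed code, native code, performance comparison
I came across an article online, "A Comparison of the Efficiency of Managed and Unmanaged Code" ( http://www.cnblogs.com/wuchang/archive/2006/12/07/584997.html ).
Its author took the reference code from the Intel Multi-Core Platform Coding Optimization Contest and ran a rough performance test in unmanaged C, managed C++, and C#. That comparison was unfair in several respects (explained below),
so I used the same contest reference code (Intel Multi-Core Platform Coding Optimization Contest, http://contest.intel.csdn.net )
to try my own comparison of native and managed code. A real comparison would need tests covering many areas,
such as integer arithmetic, floating-point arithmetic, memory access, business applications, games, and multimedia, so the results here are for reference only.
Code compared: C++, Delphi, C++ CLR, C#, and Java (the first two are native code, the last three are managed code). Test CPU: AMD64x2 3600+ (a dual-core CPU, but every test is single-threaded, and I checked CPU usage to confirm that no code was "secretly" optimized into parallel execution). Operating system: 32-bit Windows XP.
(2007.04.02: following AhBian's reply, I changed the C# implementation from a two-dimensional array to a one-dimensional array. Run time dropped from 5.64 s to 3.92 s, a speedup of about 43%. I also updated some of the related notes and the analysis of the conclusions. Thanks, AhBian :)
The original code published by the contest:
(The organizers later required the computation and output to be accurate to 7 decimal places, so the output code here has been adjusted accordingly.)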
/* compute the potential energy of a collection of */
/* particles interacting via pairwise potential    */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <windows.h>
#include <time.h>

#define NPARTS 1000
#define NITER 201
#define DIMS 3

int rand( void );
int computePot(void);
void initPositions(void);
void updatePositions(void);

double r[DIMS][NPARTS];
double pot;
double distx, disty, distz, dist;

int main() {
   int i;
   clock_t start, stop;

   initPositions();
   updatePositions();

   start=clock();
   for( i=0; i<NITER; i++ ) {
      pot = 0.0;
      computePot();
      if (i%10 == 0) printf("%5d: Potential: %10.7f \n", i, pot);
      updatePositions();
   }
   stop=clock();
   printf ("Seconds = %10.9f \n",(double)(stop-start)/ CLOCKS_PER_SEC);
   getchar();
}

void initPositions() {
   int i, j;

   for( i=0; i<DIMS; i++ )
      for( j=0; j<NPARTS; j++ )
         r[i][j] = 0.5 + ( (double) rand() / (double) RAND_MAX );
}

void updatePositions() {
   int i, j;

   for( i=0; i<DIMS; i++ )
      for( j=0; j<NPARTS; j++ )
         r[i][j] -= 0.5 + ( (double) rand() / (double) RAND_MAX );
}

int computePot() {
   int i, j;

   for( i=0; i<NPARTS; i++ ) {
      for( j=0; j<i-1; j++ ) {
         distx = pow( (r[0][j] - r[0][i]), 2 );
         disty = pow( (r[1][j] - r[1][i]), 2 );
         distz = pow( (r[2][j] - r[2][i]), 2 );
         dist = sqrt( distx + disty + distz );
         pot += 1.0 / dist;
      }
   }
   return 0;
}
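Adjustments to the test code:
For the tests I replaced pow(x,2) with x*x, which removes any effect from different libraries implementing pow differently.
(An implementation of pow(x,y) faces a dilemma: whether or not to special-case an integer y.
When y is an integer, a fast algorithm applies (usually exponentiation by squaring).
When y has a fractional part, another approach is needed; ignoring sign handling, the usual implementation is exp(ln(x)*y), which is considerably slower.
Some library designers make the caller choose between an intpow function (or an overload of pow) and pow, so that each case gets the fastest path.
Other libraries handle both cases inside pow and ask less of the caller (this is the more common choice).)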
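The two strategies described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: int_pow and real_pow are hypothetical names, not functions from any particular library, and real_pow handles positive x only, matching the "ignoring sign handling" caveat.

```cpp
#include <cassert>
#include <cmath>

// Integer exponent: exponentiation by squaring, O(log n) multiplications.
// int_pow is a hypothetical helper name used only for this sketch.
double int_pow(double x, unsigned n) {
    double result = 1.0;
    while (n != 0) {
        if (n & 1u) result *= x;  // this bit of the exponent is set
        x *= x;                   // square the base for the next bit
        n >>= 1;
    }
    return result;
}

// General real exponent, positive x only: x^y = exp(y * ln(x)).
// real_pow is likewise a hypothetical name.
double real_pow(double x, double y) {
    assert(x > 0.0);  // sign handling is deliberately left out
    return std::exp(y * std::log(x));
}
```

For y = 2 the integer path costs two multiplications (one of them by 1.0), while the general path pays for a logarithm and an exponential, which is why writing x*x by hand takes the library's choice out of the measurement.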
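To keep anything other than the computation from interfering, I wrote my own random-number function, which makes the results consistent and comparable (the outputs of all versions are exactly identical) and reproducible.
I also changed x/(double)RAND_MAX to x*(1.0/RAND_MAX), among other small changes (output takes almost no time).
After these changes computePot accounts for about 98% of the total run time, so the measurements reflect the real running time of the code fairly closely.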
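One reason a hand-written generator can give identical results in every language is that the output uses only bits 16 to 30 of the state, and those bits do not depend on whether the state is kept in 32 or 64 bits. The sketch below illustrates this with the same constants as the listings in this article. The names Lcg32, Lcg64, kRandMax and to_unit are assumptions made for the example, and unsigned types are used here so that wraparound is well defined in C++.

```cpp
#include <cassert>
#include <cstdint>

const int kRandMax = 0x7fff;  // same value the listings use for RAND_MAX

// 32-bit state with wrapping arithmetic.
struct Lcg32 {
    std::uint32_t state = 1;
    int next() {
        state = state * 214013u + 2531011u;
        return static_cast<int>((state >> 16) & kRandMax);
    }
};

// 64-bit state, the width of a C# or Java long.
struct Lcg64 {
    std::uint64_t state = 1;
    int next() {
        state = state * 214013u + 2531011u;
        return static_cast<int>((state >> 16) & kRandMax);
    }
};

// Scale a sample to [0, 1] by multiplying with a reciprocal,
// the x*(1.0/RAND_MAX) form mentioned in the text.
double to_unit(int sample) {
    assert(sample >= 0 && sample <= kRandMax);
    return sample * (1.0 / kRandMax);
}
```

Both generators return 41 on the first call and stay in step afterwards, because multiplication and addition modulo 2^64 agree with the modulo 2^32 result on the low 32 bits.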
First, the implementation in C++, the mainstay of native code:
/* compute the potential energy of a collection of */
/* particles interacting via pairwise potential    */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define NPARTS 1000
#define NITER 201
#define DIMS 3

int computePot(void);
void initPositions(void);
void updatePositions(void);

double r[DIMS][NPARTS];
double pot;
double distx, disty, distz, dist;

// const int RAND_MAX = 0x7fff;
class CMyRand
{
private:
    long _my_holdrand;
public:
    CMyRand():_my_holdrand(1){ }
    long Next()
    {
        long result = _my_holdrand * 214013 + 2531011;
        _my_holdrand = result;
        return ( (result >> 16) & RAND_MAX );
    }
};
CMyRand random;

int main() {
   int i;
   clock_t start, stop;

   initPositions();
   updatePositions();

   start=clock();
   for( i=0; i<NITER; i++ ) {
      pot = 0.0;
      computePot();
      if (i%10 == 0) printf("%5d: Potential: %20.7f \n", i, pot);
      updatePositions();
   }
   stop=clock();
   printf ("Seconds = %10.9f \n",(double)(stop-start)/ CLOCKS_PER_SEC);
   getchar();
}

void initPositions() {
   int i, j;

   for( i=0; i<DIMS; i++ )
      for( j=0; j<NPARTS; j++ )
         r[i][j] = 0.5 + ( random.Next() * (1.0/RAND_MAX) );
}

void updatePositions() {
   int i, j;

   for( i=0; i<DIMS; i++ )
      for( j=0; j<NPARTS; j++ )
         r[i][j] -= 0.5 + ( random.Next() * (1.0/RAND_MAX) );
}

int computePot() {
   int i, j;

   for( i=0; i<NPARTS; i++ ) {
      for( j=0; j<i-1; j++ ) {
         distx = (r[0][j] - r[0][i])*(r[0][j] - r[0][i]);
         disty = (r[1][j] - r[1][i])*(r[1][j] - r[1][i]);
         distz = (r[2][j] - r[2][i])*(r[2][j] - r[2][i]);
         dist = sqrt( distx + disty + distz );
         pot += 1.0 / dist;
      }
   }
   return 0;
}
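Code      Build environment    Run time, release build (seconds)
C++       vs2005               2.11 (/fp:fast)   2.45 (/fp:precise)   3.70 (/fp:strict)   2.359 (/fp:fast /arch:SSE2)
(VC2005 offers three floating-point compilation models, fast, precise, and strict, so I tested each of them, and I also measured the speed with SSE2 optimization enabled.)
Delphi code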
program potential_serial;

{$APPTYPE CONSOLE}

uses
  math, SysUtils;

const NPARTS = 1000;
const NITER = 201;
const DIMS = 3;

var r: array [0..DIMS-1] of array [0..NPARTS-1] of double;
var pot: double;
var distx, disty, distz, dist: double;

const RAND_MAX = $7fff;

type CMyRand = class(TObject)
private
  _my_holdrand: integer;
public
  constructor Create;
  function Next(): Integer;
end;

constructor CMyRand.Create();
begin
  _my_holdrand := 1;
end;

function CMyRand.Next(): Integer;
begin
  result := _my_holdrand * 214013 + 2531011;
  _my_holdrand := result;
  result := ( (result shr 16) and RAND_MAX );
end;

var random: CMyRand;

procedure initPositions();
var
  i, j: integer;
begin
  for i := 0 to DIMS-1 do
    for j := 0 to NPARTS-1 do
      r[i][j] := 0.5 + ( random.Next() * (1.0/RAND_MAX) );
end;

procedure updatePositions();
var
  i, j: integer;
begin
  for i := 0 to DIMS-1 do
    for j := 0 to NPARTS-1 do
      r[i][j] := r[i][j] - ( 0.5 + ( random.Next() * (1.0/RAND_MAX) ) );
end;

function computePot(): integer;
var
  i, j: integer;
begin
  for i := 0 to NPARTS-1 do
  begin
    for j := 0 to i-2 do
    begin
      distx := (r[0][j] - r[0][i])*(r[0][j] - r[0][i]);
      disty := (r[1][j] - r[1][i])*(r[1][j] - r[1][i]);
      distz := (r[2][j] - r[2][i])*(r[2][j] - r[2][i]);
      dist := sqrt( distx + disty + distz );
      pot := pot + 1.0 / dist;
    end;
  end;
  result := 0;
end;

const CLOCKS_PER_SEC = 1.0/(24*60*60);

// main
var
  i: integer;
  start, stop: TDateTime;
  tmpChar: Char;
begin
  random := CMyRand.Create();
  initPositions();
  updatePositions();

  start := now();
  for i := 0 to NITER-1 do
  begin
    pot := 0.0;
    computePot();
    if (i mod 10 = 0) then Writeln( format('%5d: Potential: %20.7f', [i, pot]) );
    updatePositions();
  end;
  stop := now;
  Writeln( format('Seconds = %10.9f', [(double(stop)-double(start))/CLOCKS_PER_SEC]) );
  read(tmpChar);
end.
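Code      Build environment    Run time, release build (seconds)
Delphi    TurboDelphi          3.93
Delphi    Delphi7              5.81
C++ in the managed environment
The code is exactly the same as the C++ code above; for this test VC2005 compiled that C++ code directly to managed code.
Code      Build environment    Run time, release build (seconds)
C++ CLR   vs2005               2.34
C# code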
using System;

static class Program
{
    static int RAND_MAX = 0x7fff;

    class CMyRand
    {
        private long _my_holdrand = 1;

        public long Next()
        {
            long result = _my_holdrand * 214013 + 2531011;
            _my_holdrand = result;
            return ( (result >> 16) & RAND_MAX );
        }
    };

    static CMyRand random = new CMyRand();

    static int NPARTS = 1000;
    static int NITER = 201;
    static int DIMS = 3;

    static double pot;
    static double distx, disty, distz, dist;
    static double[] r = new double[DIMS * NPARTS];
    static int CLOCKS_PER_SEC = 1000;

    static void Main()
    {
        int i;
        int start, stop;

        initPositions();
        updatePositions();

        start = Environment.TickCount;
        for (i = 0; i < NITER; i++)
        {
            pot = 0.0;
            computePot();
            if (i % 10 == 0) Console.WriteLine("{0}: Potential: {1:##########.#######}", i, pot);
            updatePositions();
        }
        stop = Environment.TickCount;
        Console.WriteLine("Seconds = {0:##########.#########}", (double)(stop - start) / CLOCKS_PER_SEC);
        Console.ReadLine();
    }

    static void initPositions()
    {
        for (int i = 0; i < DIMS; i++)
            for (int j = 0; j < NPARTS; j++)
                r[i + j * DIMS] = 0.5 + (random.Next() * (1.0 / RAND_MAX));
    }

    static void updatePositions()
    {
        for (int i = 0; i < DIMS; i++)
            for (int j = 0; j < NPARTS; j++)
                r[i + j * DIMS] -= 0.5 + (random.Next() * (1.0 / RAND_MAX));
    }

    static int computePot()
    {
        for (int i = 0; i < NPARTS; i++)
        {
            for (int j = 0; j < i - 1; j++)
            {
                distx = (r[0 + j * DIMS] - r[0 + i * DIMS]) * (r[0 + j * DIMS] - r[0 + i * DIMS]);
                disty = (r[1 + j * DIMS] - r[1 + i * DIMS]) * (r[1 + j * DIMS] - r[1 + i * DIMS]);
                distz = (r[2 + j * DIMS] - r[2 + i * DIMS]) * (r[2 + j * DIMS] - r[2 + i * DIMS]);
                dist = Math.Sqrt(distx + disty + distz);
                pot += 1.0 / dist;
            }
        }
        return 0;
    }
}
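Code      Build environment    Run time, release build (seconds)
C#        vs2005               3.92
(2007.04.02: following AhBian's reply, I changed the C# implementation from a two-dimensional array to a one-dimensional array; run time dropped from 5.64 s to 3.92 s.)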
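The index mapping behind that one-dimensional layout can be sketched in C++ as below. This is an illustration of the i + j*DIMS mapping used in the C# listing, not the author's code: flat_index, squared_distance, kDims and kParts are names assumed for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kDims = 3;      // x, y, z
const std::size_t kParts = 1000;  // number of particles

// Flat index of coordinate i (0..kDims-1) of particle j (0..kParts-1),
// following the i + j*DIMS mapping of the C# listing.
std::size_t flat_index(std::size_t i, std::size_t j) {
    assert(i < kDims && j < kParts);
    return i + j * kDims;
}

// Squared distance between particles a and b stored in one flat array.
double squared_distance(const std::vector<double>& r,
                        std::size_t a, std::size_t b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < kDims; ++i) {
        double d = r[flat_index(i, a)] - r[flat_index(i, b)];
        sum += d * d;
    }
    return sum;
}
```

With this mapping the three coordinates of one particle sit next to each other in memory, and every access goes through a single array with a single length.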
Java code
public class CMyRand {
    public static int RAND_MAX = 0x7fff;
    private long _my_holdrand = 1;

    public long Next()
    {
        long result = _my_holdrand * 214013 + 2531011;
        _my_holdrand = result;
        return ( (result >> 16) & RAND_MAX );
    }
}

///////////

import java.io.IOException;
import java.lang.*;

public final class Program {

    static CMyRand random = new CMyRand();

    static int NPARTS = 1000;
    static int NITER = 201;
    static int DIMS = 3;

    static double pot;
    static double distx, disty, distz, dist;
    static double[][] r = new double[DIMS][NPARTS];
    static int CLOCKS_PER_SEC = 1000;

    public static void main(String[] args) {
        int i;
        long start, stop;

        initPositions();
        updatePositions();

        start = System.currentTimeMillis();
        for (i = 0; i < NITER; i++)
        {
            pot = 0.0;
            computePot();
            if (i % 10 == 0) System.out.println("" + i + ": Potential: " + pot);
            updatePositions();
        }
        stop = System.currentTimeMillis();
        System.out.println("Seconds = " + ((double)(stop - start) / CLOCKS_PER_SEC));
        try {
            System.in.read();
        } catch (IOException e) {
            // TODO auto-generated catch block
            e.printStackTrace();
        }
    }

    static void initPositions()
    {
        for (int i = 0; i < DIMS; i++)
            for (int j = 0; j < NPARTS; j++)
                r[i][j] = 0.5 + (random.Next() * (1.0 / CMyRand.RAND_MAX));
    }

    static void updatePositions()
    {
        for (int i = 0; i < DIMS; i++)
            for (int j = 0; j < NPARTS; j++)
                r[i][j] -= 0.5 + (random.Next() * (1.0 / CMyRand.RAND_MAX));
    }

    static int computePot()
    {
        for (int i = 0; i < NPARTS; i++)
        {
            for (int j = 0; j < i - 1; j++)
            {
                distx = (r[0][j] - r[0][i]) * (r[0][j] - r[0][i]);
                disty = (r[1][j] - r[1][i]) * (r[1][j] - r[1][i]);
                distz = (r[2][j] - r[2][i]) * (r[2][j] - r[2][i]);
                dist = Math.sqrt(distx + disty + distz);
                pot += 1.0 / dist;
            }
        }
        return 0;
    }
}
Code      Build environment         Run time, release build (seconds)
Java      eclipse3.2.1 (jre1.6)     3.34
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Data comparison and analysis:
Code      Build environment         Run time, release build (seconds)
C++       vs2005                    2.11 (/fp:fast)   2.45 (/fp:precise)   3.70 (/fp:strict)   2.359 (/fp:fast /arch:SSE2)
Delphi    TurboDelphi               3.93 (Delphi7: 5.81)
C++ CLR   vs2005                    2.34
C#        vs2005                    3.92
Java      eclipse3.2.1 (jre1.6)     3.34
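Judging from these results, managed code now performs at the same level as native code; the differences are small.
C++ and managed C++ rank first among the native and the managed entries respectively.
The Delphi family has always been weak at floating-point optimization (Delphi7 does almost no optimization of floating-point code).
C# and Java pay some run-time cost for code safety and development speed (bounds checks on array accesses, for example), but the cost is not severe, and without these design factors they would run faster still. Managed C++, by this evidence, has a real reason to exist on the .NET platform.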
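To make the bounds-check cost concrete, here is a rough C++ analogy, given as a sketch under stated assumptions. std::vector::at validates each index and throws std::out_of_range on a bad one, while operator[] is not required to check anything. The function names are assumptions made for this example, and the code says nothing about how the CLR or a JVM actually implements or removes such checks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: at() validates every index before reading.
double sum_checked(const std::vector<double>& v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v.at(i);
    return s;
}

// Unchecked access: operator[] is not required to validate the index,
// so the caller must guarantee n <= v.size().
double sum_unchecked(const std::vector<double>& v, std::size_t n) {
    assert(n <= v.size());
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Returns true when a checked access at `index` is rejected.
bool checked_access_fails(const std::vector<double>& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The checked loop performs one extra comparison per element. In an inner loop as small as the one in computePot, an extra comparison on each of the twelve array reads is a plausible source of a measurable difference.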
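It has long been said that managed code can, in principle, match or even beat natively compiled code. Today managed and native code are already very close in performance. What is hard to believe is that the code generated by managed C++ is almost as fast as native C++. (I doubted it at first, so I inspected the program that managed C++ produced. It really does contain .NET bytecode, which is compiled further into native machine code only at run time.)
This was a big change of view for me. I knew that managed code might one day overtake native code, but I assumed that day was still far off; I did not expect it to come so soon!
PS: I also took part in this Intel optimization contest, using C++ with SSE2 optimizations. The single-threaded run time was 0.91 s, and the dual-core parallel version ran in 0.453 s. I have published the source code and report for that optimization on my blog. (My code ranked 2nd among the contestants. The 1st-place code is more than 20% faster than mine on the AMDx2 3600+ and slightly faster on a Core 2 4400, usually by no more than 1%. I would very much like to know how such extreme ("bt") optimizations could be done in managed code :)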