HighTechTalks DotNet Forums  

Managed vs Unmanaged Bare Bones Performance Test

Dotnet Framework (CLR) microsoft.public.dotnet.framework.clr


  #1  
adhingra
Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 04:52 PM

At our company we are at a decision point: choosing between managed and
unmanaged code on the basis of performance. I have read about this on
various blogs and other websites, then decided to run my own test, as I am
more concerned with basic performance at this point.

By basic I mean just the basic stuff inside the CLR, i.e. function call
cost, for loops, variable declaration, etc. Let us not consider GC, memory
allocation costs, etc.

To my surprise, the managed code generated in my test through C# lagged
considerably behind the code generated by the C++ compiler.

I was wondering if someone could take a quick look at this and tell me why
this is the case. I was under the assumption that, once JIT compilation has
happened, the CLR gives the same performance as a native C++ compiler (as we
are talking basic stuff only: no objects, just pure language constructs and
primitive data types).

I created two sample console applications (one in C#, the other in C++).
Both call a function, passing an int by value, from inside a for loop.
Nothing happens inside the function. I used the QueryPerformance... APIs for
measurement. (Code is pasted at the bottom of this posting.)

Here are the results (release mode, running from the console, with default
settings in the IDE):

C# Test for loop (50000 iterations) 0.000023931 s (23.9 microseconds)
C++ Test for loop (50000 iterations) 0.000000350 s (0.35 microseconds)

So it looks like the C++ compiler is roughly 68 times faster than the
managed CLR Jitter. And if I also remove the time taken by the QueryPerf...
APIs, then the difference is even greater.

Can anyone please elaborate?

Thanks
adhingra

===========================================
C# Code PROGRAM.CS
===========================================

using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;

namespace ConsoleApp
{
    class Program
    {
        // Win32 high-resolution timer APIs. Both return a Win32 BOOL
        // (32 bits), so they are declared as bool rather than short.
        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceCounter(out long x);
        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceFrequency(out long x);

        static long m_lStart = 0, m_lStop = 0, m_lFreq = 0;
        static long m_lOverhead = 0;
        static decimal m_mTotalTime = 0;

        static void Main(string[] args)
        {
            // get the performance counter frequency (ticks per second)
            QueryPerformanceFrequency(out m_lFreq);

            // record the overhead for calling the performance counter API
            QueryPerformanceCounter(out m_lStart);
            QueryPerformanceCounter(out m_lStop);

            m_lOverhead = m_lStop - m_lStart;

            Console.WriteLine("Starting with a simple For Loop calling a simple function");

            QueryPerformanceCounter(out m_lStart);
            for (int i = 0; i < 50000; i++)
            {
                Run(i);
            }
            QueryPerformanceCounter(out m_lStop);

            // elapsed time in counter ticks
            long lDiff = m_lStop - m_lStart;
            Console.WriteLine(lDiff);
            //Comment or Uncomment the overhead lines to see the times drop
            //
            //if (lDiff > m_lOverhead)
            //{
            //    lDiff = lDiff - m_lOverhead;
            //}

            // ticks / (ticks per second) = elapsed seconds
            m_mTotalTime = (decimal)lDiff / (decimal)m_lFreq;
            Console.WriteLine(m_mTotalTime);

            Console.WriteLine("Press Enter to Continue");
            Console.ReadLine();
        }

        // Deliberately empty: the test only measures the call itself.
        static void Run(int i)
        {
            //Console.WriteLine(i);
        }
    }
}


===============================================
C++ Code ConsoleApp.cpp
===============================================

// ConsoleApp.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
// Listed explicitly in case stdafx.h does not already pull them in.
#include <windows.h>   // LARGE_INTEGER, QueryPerformanceCounter/Frequency
#include <stdio.h>     // printf, getchar
#include <tchar.h>     // _tmain, _TCHAR

// Deliberately empty: the test only measures the call itself.
void Run(int i)
{
    //printf("%d\n",i);
}

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER m_start, m_stop, m_freq;
    // performance counter frequency (ticks per second)
    ::QueryPerformanceFrequency(&m_freq);

    //record the overhead for calling the performance counter API
    ::QueryPerformanceCounter(&m_start);
    ::QueryPerformanceCounter(&m_stop);

    LONGLONG m_overhead = m_stop.QuadPart - m_start.QuadPart;
    m_start.QuadPart = 0;
    m_stop.QuadPart = 0;

    printf("%s\n", "Starting with a simple For Loop calling a simple function");

    ::QueryPerformanceCounter(&m_start);
    for (int i = 0; i < 50000; i++)
    {
        Run(i);
    }
    ::QueryPerformanceCounter(&m_stop);

    // elapsed time in counter ticks; LONGLONG is 64-bit, so use %lld, not %d
    LONGLONG lDiff = m_stop.QuadPart - m_start.QuadPart;
    printf("%lld\n", lDiff);
    //Comment or Uncomment the overhead lines to see the times drop
    //
    //if (lDiff > m_overhead)
    //{
    //    lDiff = lDiff - m_overhead;
    //}

    // ticks / (ticks per second) = elapsed seconds
    double totalTime = ((double)lDiff) / ((double)m_freq.QuadPart);
    printf("%15.15f\n", totalTime);

    printf("%s", "Press Enter to Continue");

    getchar();
    return 0;
}



#2
Willy Denoyette [MVP]
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 05:22 PM

This kind of benchmark is meaningless.
The reason for the huge difference is that the C++ compiler hoists the loop, as it sees no
sensible reason to call an empty function 50000 times. The C# compiler does not do this; it
simply calls the function, which contains only a ret.
So what you are comparing is the time taken to return from one QueryPerformanceCounter call
plus the time to make the next QueryPerformanceCounter call, against the time taken to call
an empty function 50000 times.

Willy.
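
To illustrate the point above: once the loop is gone, the interval being timed is essentially the gap between two consecutive timer reads. A minimal C++ sketch, assuming std::chrono::steady_clock as a portable stand-in for QueryPerformanceCounter (the helper name timer_floor_ns is hypothetical, not from the thread), estimates that gap:

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;  // assumption: stand-in for QueryPerformanceCounter

// Smallest gap, in nanoseconds, seen between two back-to-back clock reads.
// Returns -1 if samples is 0. Intervals near this value mostly reflect the
// timer itself, not the code placed between the two reads.
long long timer_floor_ns(int samples) {
    long long best = -1;
    for (int s = 0; s < samples; ++s) {
        Clock::time_point t0 = Clock::now();
        Clock::time_point t1 = Clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

Printing timer_floor_ns(1000) from main gives a rough lower bound for this clock; a benchmark result of the same order as that bound is probably timing nothing but the timer.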






#3
Ben Voigt
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 05:38 PM

Quote:
This kind of benchmark is meaningless.
The reason for the huge difference is that the C++ compiler hoists the
loop, as it sees no sensible reason to call an empty function 50000 times,
the C# compiler does not do this, it simply calls the function which only
contains a ret.

Inlining and optimizing away a call to an empty function is well within the
capabilities of the CLR JIT.

Quote:
So what you are comparing is the time taken for a return from
QueryPerformanceCounter plus the time to call QueryPerformanceCounter,
against the time taken to call an empty function 50000 times.

Willy.







#4
Jon Skeet [C# MVP]
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 05:42 PM

Ben Voigt <rbv (AT) nospam (DOT) nospam> wrote:
Quote:
This kind of benchmark is meaningless.
The reason for the huge difference is that the C++ compiler hoists the
loop, as it sees no sensible reason to call an empty function 50000 times,
the C# compiler does not do this, it simply calls the function which only
contains a ret.

Inlining and optimizing away a call to an empty function is well within the
capabilities of the CLR JIT.

That was my thought too. I suspect it'll still perform the loop
iteration, however, whereas the C++ compiler may well have removed that
loop completely, which still means it's not a good benchmark.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


#5
Ben Voigt
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 05:43 PM

Quote:
Here are the results (for release mode running from console, with default
settings in the IDE)

C# Test for loop (50000 iterations) 0.000023931 (23.9 microseconds)
C++ Test for loop (50000 iterations) 0.000000350 (0.35 microseconds)

So it's like the C++ compiler is roughly 68 times faster than the managed
CLR Jitter. And if I also remove time taken for the QueryPerf... APIs then
the diff is even more

Did you actually measure the time for QueryPerf? OK, I see that you did.
Those are native Win32 APIs; C++ will call them much faster than C#.

0.35 microseconds is an extremely short time. Even 23 is too short for a
useful benchmark. Run more iterations. In fact, run 50000 iterations
first, ignoring the result, to force .NET to JIT-compile everything. Then
run a half billion or so iterations and compare the results.
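
The advice above (a discarded warm-up pass, then a much longer timed run) might be sketched in C++ as follows. The names step, run and benchmark are hypothetical, std::chrono::steady_clock stands in for QueryPerformanceCounter, and the loop body is an arbitrary integer update chosen only so that the result depends on every iteration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload: each result feeds the next iteration, so the loop
// cannot simply be discarded as dead code.
inline std::uint32_t step(std::uint32_t x) {
    return x * 1664525u + 1013904223u;  // linear congruential update
}

struct Result {
    double seconds;          // wall-clock time of the run
    std::uint32_t checksum;  // returned so the work stays observable
};

// Time `iterations` applications of step(), starting from 1.
Result run(std::uint64_t iterations) {
    std::uint32_t x = 1;
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) x = step(x);
    auto t1 = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>(t1 - t0).count(), x };
}

// Warm-up pass first (its timing is ignored), then the timed pass.
Result benchmark(std::uint64_t warmup, std::uint64_t iterations) {
    Result warm = run(warmup);
    Result timed = run(iterations);
    timed.checksum ^= warm.checksum;  // keep the warm-up pass observable too
    return timed;
}
```

For example, benchmark(50000, 500000000) mirrors the suggested 50000-iteration warm-up followed by half a billion timed iterations; dividing seconds by the iteration count gives an approximate per-iteration cost.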




#6
Ben Voigt
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 05:48 PM

"Jon Skeet [C# MVP]" <skeet (AT) pobox (DOT) com> wrote

Quote:
Ben Voigt <rbv (AT) nospam (DOT) nospam> wrote:
This kind of benchmark is meaningless.
The reason for the huge difference is that the C++ compiler hoists the
loop, as it sees no sensible reason to call an empty function 50000 times,
the C# compiler does not do this, it simply calls the function which only
contains a ret.

Inlining and optimizing away a call to an empty function is well within the
capabilities of the CLR JIT.

That was my thought too. I suspect it'll still perform the loop
iteration, however, whereas the C++ compiler may well have removed that
loop completely, which still means it's not a good benchmark.

Oh, and if it's desired not to have the loop optimized away, touch a
volatile variable from inside the function.
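
A minimal C++ sketch of the volatile suggestion, assuming a hypothetical global g_sink and helper call_many (neither is from the posted code). Because each volatile write counts as an observable side effect, the compiler has to keep one store per iteration even if it inlines Run:

```cpp
#include <cstdint>

// Hypothetical sink. Accesses to a volatile object are observable behaviour,
// so the compiler may not drop or merge these writes.
volatile std::uint32_t g_sink = 0;

void Run(int i) {
    g_sink = static_cast<std::uint32_t>(i);  // one volatile store per call
}

// Calls Run(0) .. Run(count - 1) and returns the value left in the sink.
std::uint32_t call_many(int count) {
    for (int i = 0; i < count; ++i) Run(i);
    return g_sink;  // volatile read of the last value written
}
```

This keeps the loop alive, but what gets timed is then the loop plus a store, not the bare call; the call itself may still be inlined.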




#7
adhingra
RE: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 06:10 PM

Sorry, I am late with my comments. Shortly after posting this, I realized
that the problem is with my test: the C++ compiler is optimizing the whole
thing away. (I looked at the disassembly.)

However, this does not make the benchmark obsolete: rather than measuring
performance, it actually measured the smartness of the two compilers. I did
some more research and talked to one of my colleagues here at work who is an
expert in C++. We even tried making the code do more, to fool the C++
compiler into actually calling the function, but the compiler is way too
smart. I was told the reason behind this extreme smartness is the "Whole
Program Optimization" offered by the VS 2005 linker.

If the function were in a different compilation unit (i.e. a different cpp
file), this would not have happened in VS2003, but 2005 is a different beast
with this whole program optimization. The linker no longer just combines
objs; it is more like an interpreter now, and smart enough to chop up objs.

But as Ben pointed out, inlining and optimizing are in the feature set of
the Jitter too.
I think I may know why the Jitter in managed code does not do it: the Jitter
compiles one function at a time and, due to time constraints, does not have
the luxury of checking the whole program to see whether the results of a
function are used anywhere or not.

However, I still think it should have jitted away an empty function.

Thanks All
adhingra

#8
Barry Kelly
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-19-2007, 07:06 PM

adhingra wrote:

I wish you wouldn't multipost.

Quote:
However this does not make the benchmark obsolete, rather than measuring the
performance, it actually measured the smartness of the two compilers.

It measured how good the C++ compiler is at doing nothing, versus the
.NET JIT compiler. I agree, C++ is good for nothing.

Quote:
I did
some more research and talked to one of my colleagues here at work who is an
expert with C++ and even try making the code do more so that I can fool the
C++ compiler to actually call the function. But the guy is way too smart and
I was told the reason behind this extreme smartness is "Whole Program
Optimization" offered by the VS 2005 Linker.

.NET necessarily does whole program optimization because compilation
happens so late; but it is constrained by the amount of time it has to
work with - compilation must occur quickly. Performance will improve
over time, when .NET adds techniques that are common in Java, such as
recompiling with more aggressive optimization after many iterations.

-- Barry

--
http://barrkel.blogspot.com/


#9
Jon Skeet [C# MVP]
RE: Managed vs Unmanaged Bare Bones Performance Test - 04-20-2007, 02:25 AM

adhingra <adhingra (AT) discussions (DOT) microsoft.com> wrote:
Quote:
I am late with my comments. Shortly after posting this, I realized that this
is a problem with my test as the C++ compiler is optimizing the whole thing
away. (Looked at the disassembly)

However this does not make the benchmark obsolete, rather than measuring the
performance, it actually measured the smartness of the two compilers.

It measures the smartness of the compilers in *one* particular
situation. Do you often run a loop which does nothing? I know I don't.

<snip>

Quote:
However I still think it should have jitted away an empty function.

I strongly suspect that it did, by inlining. It just didn't optimise
away the loop itself.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


#10
Jon Skeet [C# MVP]
Re: Managed vs Unmanaged Bare Bones Performance Test - 04-20-2007, 02:27 AM

Barry Kelly <barry.j.kelly (AT) gmail (DOT) com> wrote:

<snip>

Quote:
Performance will improve over time, when .NET adds techniques that
are common in Java, such as recompiling with more aggressive
optimization after many iterations.

It'll be interesting to see whether or not this ever happens. In Java,
it made a huge difference, because by having dynamic optimisation (and
de-optimisation) you can inline virtual methods until they're first
overridden. That's really important when the language makes methods
virtual by default, but not as important in a world which requires you
to specify that methods are virtual (which at least C# does - not sure
about VB.NET).

There are other improvements as well, of course, and it could improve
start-up time (one would hope) but the effects won't be quite as huge
as they were in the Java world.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

