#1
#2
At our company we are currently at a decision point between managed and unmanaged code, to be settled on the basis of performance. I have read about this on various blogs and other websites, then decided to run my own test, as I am more concerned with basic performance at this point. By basic I mean just the basic stuff inside the CLR, i.e. function call cost, for loops, variable declarations, etc. Let us not consider GC, memory allocation costs, etc.

To my surprise, the managed code generated from C# in my test lagged considerably behind the code generated by the C++ compiler. I was wondering if someone could take a quick look at this and tell me why this is the case. I was under the assumption that, once the JIT has run, the CLR gives the same performance as a native C++ compiler does (as we are talking about basic stuff only: no objects, just pure language constructs and primitive data types).

I created two sample console applications (one in C# and the other in C++). Both call a function from inside a for loop, passing an int by value. Nothing happens inside the function. I used the QueryPerformance.... APIs for measurement. (The code is pasted at the bottom of this posting.)

Here are the results (release mode, run from the console, with default settings in the IDE):

C# Test for loop (50000 iterations): 0.000023931 (23 microseconds)
C++ Test for loop (50000 iterations): 0.000000350 (0.35 microseconds)

So it's like the C++ compiler is about 20 times faster than the managed CLR jitter. And if I also remove the time taken by the QueryPerf...... APIs, the difference is even bigger. Can anyone please elaborate?
Thanks
adhingra

===========================================
C# Code PROGRAM.CS
===========================================
using System;
using System.Collections.Generic;
using System.Text;
using System.Runtime.InteropServices;

namespace ConsoleApp
{
    class Program
    {
        // API declarations for the high-resolution performance counter.
        // Both functions return a Win32 BOOL and fill in a 64-bit value.
        [DllImport("kernel32.dll")]
        extern static bool QueryPerformanceCounter(out long x);
        [DllImport("kernel32.dll")]
        extern static bool QueryPerformanceFrequency(out long x);

        static long m_lStart = 0, m_lStop = 0, m_lFreq = 0;
        static long m_lOverhead = 0;
        static decimal m_mTotalTime = 0;

        static void Main(string[] args)
        {
            // Get the counter frequency (ticks per second)
            QueryPerformanceFrequency(out m_lFreq);

            // Record the overhead of calling the performance counter API
            QueryPerformanceCounter(out m_lStart);
            QueryPerformanceCounter(out m_lStop);
            m_lOverhead = m_lStop - m_lStart;

            Console.WriteLine("Starting with a simple For Loop calling a simple function");
            QueryPerformanceCounter(out m_lStart);
            for (int i = 0; i < 50000; i++)
            {
                Run(i);
            }
            QueryPerformanceCounter(out m_lStop);

            long lDiff = m_lStop - m_lStart;
            Console.WriteLine(lDiff);

            // Comment or uncomment the overhead lines to see the times drop
            //
            //if (lDiff > m_lOverhead)
            //{
            //    lDiff = lDiff - m_lOverhead;
            //}

            // Convert ticks to seconds
            m_mTotalTime = ((Decimal)lDiff) / ((Decimal)m_lFreq);
            Console.WriteLine(m_mTotalTime);
            Console.WriteLine("Press Enter to Continue");
            Console.ReadLine();
        }

        static void Run(int i)
        {
            //Console.WriteLine(i);
        }
    }
}

===============================================
C++ Code ConsoleApp.cpp
===============================================
// ConsoleApp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <tchar.h>

void Run(int i)
{
    //printf("%d\n",i);
}

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER m_start, m_stop, m_freq;

    // Get the counter frequency (ticks per second)
    ::QueryPerformanceFrequency(&m_freq);

    // Record the overhead of calling the performance counter API
    ::QueryPerformanceCounter(&m_start);
    ::QueryPerformanceCounter(&m_stop);
    LONGLONG m_overhead = m_stop.QuadPart - m_start.QuadPart;
    m_start.QuadPart = 0;
    m_stop.QuadPart = 0;

    printf("%s\n", "Starting with a simple For Loop calling a simple function");
    QueryPerformanceCounter(&m_start);
    for (int i = 0; i < 50000; i++)
    {
        Run(i);
    }
    QueryPerformanceCounter(&m_stop);

    LONGLONG lDiff = m_stop.QuadPart - m_start.QuadPart;
    // LONGLONG is 64 bits wide, so it needs a 64-bit format specifier
    printf("%I64d\n", lDiff);

    // Comment or uncomment the overhead lines to see the times drop
    //
    //if (lDiff > m_overhead)
    //{
    //    lDiff = lDiff - m_overhead;
    //}

    // Convert ticks to seconds
    double totalTime = ((double)lDiff) / ((double)m_freq.QuadPart);
    printf("%15.15f\n", totalTime);
    printf("%s", "Press Enter to Continue");
    getchar();
    return 0;
}
#3
This kind of benchmark is meaningless. The reason for the huge difference is that the C++ compiler hoists the loop, as it sees no sensible reason to call an empty function 50000 times. The C# compiler does not do this; it simply calls the function, which only contains a ret.

So what you are comparing is the time taken to return from one QueryPerformanceCounter call plus the time to make the next one, against the time taken to call an empty function 50000 times.

Willy.
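A quick sanity check on the figures quoted in the thread supports this reading. The sketch below converts each total into time and approximate clock cycles per iteration. The helper names are made up for illustration, and the 3 GHz clock rate used in the note afterwards is purely an assumption, not something the original poster stated.

```cpp
#include <cassert>

// Average time per loop iteration, in nanoseconds.
double nanosPerIteration(double totalSeconds, long iterations) {
    assert(iterations > 0);
    return totalSeconds * 1e9 / static_cast<double>(iterations);
}

// Approximate clock cycles per iteration at an assumed clock rate in GHz.
// (1 GHz is one cycle per nanosecond.)
double cyclesPerIteration(double totalSeconds, long iterations, double assumedGHz) {
    return nanosPerIteration(totalSeconds, iterations) * assumedGHz;
}
```

With the thread's numbers, `nanosPerIteration(0.000023931, 50000)` comes to roughly 0.48 ns, while `nanosPerIteration(0.000000350, 50000)` comes to roughly 0.007 ns. At an assumed 3 GHz the latter is about 0.02 cycles per iteration, far less than any real call could take, which is consistent with the C++ loop not running at all.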
#4
"This kind of benchmark is meaningless. The reason for the huge difference is that the C++ compiler hoists the loop, as it sees no sensible reason to call an empty function 50000 times. The C# compiler does not do this; it simply calls the function, which only contains a ret."

Inlining and optimizing away a call to an empty function is well within the capabilities of the CLR JIT.
#5
"Here are the results (release mode, run from the console, with default settings in the IDE):

C# Test for loop (50000 iterations): 0.000023931 (23 microseconds)
C++ Test for loop (50000 iterations): 0.000000350 (0.35 microseconds)

So it's like the C++ compiler is about 20 times faster than the managed CLR jitter. And if I also remove the time taken by the QueryPerf...... APIs, the difference is even bigger."
#6
Ben Voigt <rbv (AT) nospam (DOT) nospam> wrote:

"This kind of benchmark is meaningless. The reason for the huge difference is that the C++ compiler hoists the loop, as it sees no sensible reason to call an empty function 50000 times. The C# compiler does not do this; it simply calls the function, which only contains a ret."

"Inlining and optimizing away a call to an empty function is well within the capabilities of the CLR JIT."

That was my thought too. I suspect it'll still perform the loop iteration, however, whereas the C++ compiler may well have removed that loop completely, which still means it's not a good benchmark.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
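One way to keep both the loop and the work alive in a native build is to make every result observable. The following is a minimal sketch, not the original poster's code: `work`, `g_sink` and `timeLoopSeconds` are hypothetical names, and it uses the portable `std::chrono::steady_clock` in place of QueryPerformanceCounter. Writing each result to a `volatile` variable obliges the compiler to perform every store, so the loop cannot be removed, although the call to `work` may still be inlined.

```cpp
#include <cassert>
#include <chrono>

// Volatile sink: every write to it is an observable side effect,
// so the compiler must keep all of the stores in the loop below.
volatile unsigned g_sink = 0;

// A small piece of real work (hypothetical; the thread's Run() was empty).
unsigned work(unsigned i) {
    return i * 31u + 7u;
}

// Runs the loop and returns the elapsed wall-clock time in seconds.
double timeLoopSeconds(unsigned iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i) {
        g_sink = work(i);  // one volatile store per iteration
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timeLoopSeconds(50000)` mirrors the iteration count used in the thread. A single run this short is still dominated by timer resolution and noise, so repeating the measurement and comparing several runs is advisable.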
#7
#8
"However, this does not make the benchmark obsolete; rather than measuring performance, it actually measured the smartness of the two compilers."

I did some more research and talked to one of my colleagues here at work who is an expert in C++, and even tried making the code do more so that I could fool the C++ compiler into actually calling the function. But the compiler is way too smart, and I was told the reason behind this extreme smartness is the "Whole Program Optimization" offered by the VS 2005 linker.
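For anyone who wants the native build to make a genuine call on every iteration, one portable trick is to call through a `volatile` function pointer. This is only a sketch under stated assumptions: `g_runPtr`, `g_calls` and `callLoop` are hypothetical names, and `Run` here increments a counter purely so the number of calls can be checked, unlike the empty function in the original test. Because the pointer is volatile, the compiler has to reload it before each call and cannot assume which function it refers to, so it cannot inline the target or drop the call.

```cpp
#include <cassert>

// Counts how often Run() was really executed (for verification only).
long g_calls = 0;

void Run(int) {
    ++g_calls;
}

// Volatile pointer: its value must be re-read on every use, so the
// compiler cannot prove the call target and has to emit an indirect call.
void (*volatile g_runPtr)(int) = Run;

void callLoop(int iterations) {
    assert(g_runPtr != nullptr);
    for (int i = 0; i < iterations; ++i) {
        g_runPtr(i);  // indirect call, performed on every iteration
    }
}
```

After `callLoop(50000)`, `g_calls` holds 50000. Compiler-specific attributes such as MSVC's `__declspec(noinline)` are another option, but they only prevent inlining of the function; what the optimizer does with the surrounding loop should still be confirmed in the disassembly.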
#9
I am late with my comments. Shortly after posting this, I realized that the problem is with my test, as the C++ compiler is optimizing the whole thing away (I looked at the disassembly). However, this does not make the benchmark obsolete; rather than measuring performance, it actually measured the smartness of the two compilers.

However, I still think the JIT should have optimized away an empty function.
#10
Performance will improve over time, when .NET adds techniques that are common in Java, such as recompiling with more aggressive optimization after many iterations. |