#1
[Diagram: slow path, memory input -> written back to heap -> output; fast path, memory input -> CPU register (Fast!) -> output]
#2
We ran into a pretty significant performance penalty when casting floats.

We've identified a code workaround that we wanted to pass along, but we were also wondering whether others have experience with this and whether there is a better solution.

I'd like to share our findings regarding the C# (float) cast. While converting double to float, we found several slowdowns, and we realized that the C# (float) cast can be costly if not used appropriately.

From my understanding, and from articles on the Net, the slowdown comes from writing the intermediate value back to memory, as follows. The extra trips are costly.

Otherwise, we will need to optimize our code by hand using the temporary variable technique shown in the example, and we have many instances of this kind of "inline" cast in our code.
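The example code mentioned in the post is not included in this copy of the thread. A minimal C# sketch of what the two forms might look like, assuming the workaround simply holds the cast result in a local variable before storing it in the array (the class and method names `CastForms`, `Log10Inline` and `Log10WithLocal` are hypothetical):

```csharp
using System;

static class CastForms
{
    // "Inline" cast: compute, cast and store in a single expression.
    internal static void Log10Inline(double[] src, float[] dst)
    {
        for (int i = 0; i < src.Length; i++)
        {
            dst[i] = (float)Math.Log10(src[i]);
        }
    }

    // Assumed "temporary variable" form: the cast result is held in a local
    // first, then stored. The C# semantics are identical to Log10Inline.
    internal static void Log10WithLocal(double[] src, float[] dst)
    {
        for (int i = 0; i < src.Length; i++)
        {
            float tmp = (float)Math.Log10(src[i]);
            dst[i] = tmp;
        }
    }

    static void Main()
    {
        double[] src = { 1.0, 10.0, 1000.0, 0.5 };
        float[] a = new float[src.Length];
        float[] b = new float[src.Length];
        Log10Inline(src, a);
        Log10WithLocal(src, b);

        // Print both results side by side; they should match element for element.
        for (int i = 0; i < a.Length; i++)
        {
            Console.WriteLine($"{a[i]} {b[i]}");
        }
    }
}
```

Both methods produce the same values, so any speed difference between them would come from the machine code the JIT generates, not from the language semantics.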
#3
Arnie <jefferyronaldarnett (AT) msn (DOT) com> wrote:
> We ran into a pretty significant performance penalty when casting floats.

To be honest, it doesn't really sound that significant to me. Read on...

> We've identified a code workaround that we wanted to pass along but also was wondering if others had experience with this and if there is a better solution.

snip

> I'd like to share findings regarding C# (float) cast. As we convert double to float, we found several slow down issues. We realized C# (float) cast can be costly if not used appropriately.

snip

> In my understanding and articles on the Net, the slow down comes from writing intermediate value back to memory as follows. The extra trips are costly.

I see no reason to believe that there's an extra value written to the *heap* (rather than the stack), and no reason why the JIT shouldn't use a register for the intermediate value without an explicit local variable.

snip

I have included a short but complete program below which uses an array of a million elements and iterates each method a thousand times. Here are the results on my laptop:

Log10Fast: 64489ms
Log10Slow: 70420ms
CopyFast: 3841ms
CopySlow: 4070ms

So your optimisation improves things by about 10% for the Log10 case and about 5% for the Copy case.

> Otherwise, we will need to optimize our code by hand using temporary variable technique as in the example. Well, we have many instances of this kind of "inline" casts in our code.

And have you any reason to believe that's *actually* the bottleneck in your code? Do you regularly convert a billion floats and care about 200ms of performance loss?

I don't understand why the results are as they are (it would be worth looking at the JITted, optimised code to find out) - but even so, I certainly wouldn't start micro-optimising all over the place. Find out where the *actual* bottleneck in your code is, and consider reducing readability/simplicity for the sake of performance just in the most significant parts. Don't start doing it all over the place, which sounds like the course of action you're considering at the moment.

-- 
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
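The benchmark program Jon mentions is not reproduced in this copy of the thread. A minimal C# sketch of that kind of harness, using the sizes he describes (a million elements, a thousand iterations per method) and `System.Diagnostics.Stopwatch` for timing; the class name `CastBenchmark`, the method names and the exact loop bodies are assumptions, not his code:

```csharp
using System;
using System.Diagnostics;

static class CastBenchmark
{
    // Sizes taken from the post; lower them for a quick run.
    const int Size = 1000000;
    const int Iterations = 1000;

    // Hypothetical "inline" form: cast and store in one expression.
    internal static void CopyInline(double[] src, float[] dst)
    {
        for (int i = 0; i < src.Length; i++)
            dst[i] = (float)src[i];
    }

    // Hypothetical "temporary variable" form: cast into a local, then store.
    internal static void CopyWithLocal(double[] src, float[] dst)
    {
        for (int i = 0; i < src.Length; i++)
        {
            float tmp = (float)src[i];
            dst[i] = tmp;
        }
    }

    // Runs one conversion method repeatedly and returns elapsed milliseconds.
    internal static long Time(Action<double[], float[]> method,
                              double[] src, float[] dst, int iterations)
    {
        method(src, dst); // warm-up call so JIT compilation is not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            method(src, dst);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        double[] src = new double[Size];
        for (int i = 0; i < Size; i++)
            src[i] = i + 1;
        float[] dst = new float[Size];

        Console.WriteLine("CopyInline: {0}ms", Time(CopyInline, src, dst, Iterations));
        Console.WriteLine("CopyWithLocal: {0}ms", Time(CopyWithLocal, src, dst, Iterations));
    }
}
```

Timings like these are only meaningful from an optimised (Release) build run outside the debugger, since attaching a debugger normally disables JIT optimisation. The delegate call adds a small per-call cost, which is negligible next to a million-element loop.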
#4
This is a mature system where they are "wringing" out the last bit of performance.

It is a scientific test instrument (a spectrum analyzer), so they are acquiring and converting extremely large chunks of data (waveforms). Some runs can acquire as much as 500MB of data at a time.

So they have "progressed" to the point where they are looking at the right optimization spots in their code. Casting from double[] is indeed 2x as slow without the optimization, which is quite different from the 5-10% case you demonstrated.

Again, the only thing they changed was assigning a local variable, hence their curiosity about what the C# compiler and the JIT compiler are doing. I think our premise is that, given this single change, there should be no performance difference if the compiler were taking advantage of every reasonable optimization. Time to look at the IL and see what is going on.