![]() | |
![]() |
| | Thread Tools | Search this Thread | Display Modes |
#1
| |||
| |||
|
|
I'm trying to write a parser for a Interbase stored procedure and I want to identify the fastest way to go through a string one char at a time. As I'm really new to both c# and dotNet, I thought I'll share my findings and ask for comments: I found 4 different methods of going over the string one char at a time, and my results seem somewhat unexpected! [Method 1 - read directly from string - 100% timing] for (int i=0;i<str_len;i++) c = TestString[i] [Method 2 - read using StringBuilder - 200% timing] for (int i=0;i<str_len;i++) c = TestBuilder[i] [Method 3 - read using an CharEnumerator - 512% timing] ch.Reset(); while (ch.MoveNext()) c = ch.Current; [Method 4 - read using a foreach loop - 170% timing] foreach (char c2 in TestString) c = c2 After doing all those tests I ended up with lots of questions: 1) Why is the "foreach" loop slower then the "for" loop? 2) Why is the CharEnumerator so slow? Am I using this class properly? 3) Is my test method fair? (see attachment for my code) |
#2
| |||
| |||
|
|
I think i can answer your first question. everytime you use "foreach" loop, it will create a iterator to loop thru each char in the string, and there is a overhead for create the iterator. In contrast, "for" loop access directly to each char in the string, so it is faster .. You should avoid using foreach loop. |
#3
| |||
| |||
|
|
Hi Cosmin, I think i can answer your first question. everytime you use "foreach" loop, it will create a iterator to loop thru each char in the string, and there is a overhead for create the iterator. In contrast, "for" loop access directly to each char in the string, so it is faster .. You should avoid using foreach loop. Hope its help, Ivan "Cosmin Prund" wrote: I'm trying to write a parser for a Interbase stored procedure and I want to identify the fastest way to go through a string one char at a time. As I'm really new to both c# and dotNet, I thought I'll share my findings and ask for comments: I found 4 different methods of going over the string one char at a time, and my results seem somewhat unexpected! [Method 1 - read directly from string - 100% timing] for (int i=0;i<str_len;i++) c = TestString[i] [Method 2 - read using StringBuilder - 200% timing] for (int i=0;i<str_len;i++) c = TestBuilder[i] [Method 3 - read using an CharEnumerator - 512% timing] ch.Reset(); while (ch.MoveNext()) c = ch.Current; [Method 4 - read using a foreach loop - 170% timing] foreach (char c2 in TestString) c = c2 After doing all those tests I ended up with lots of questions: 1) Why is the "foreach" loop slower then the "for" loop? 2) Why is the CharEnumerator so slow? Am I using this class properly? 3) Is my test method fair? (see attachment for my code) |
#4
| |||
| |||
|
|
Thanks Ivan, I'm learning something every day. It did not occur to me that the c# compiler will use a generic method for implementing the foreach loop on a string, I somehow expected it to be implemented using some compiler magic. I will certainly not use the foreach loop on a string (in this case the "for" loop can be just as easily implemented). |
#5
| |||
| |||
|
|
Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote: Thanks Ivan, I'm learning something every day. It did not occur to me that the c# compiler will use a generic method for implementing the foreach loop on a string, I somehow expected it to be implemented using some compiler magic. I will certainly not use the foreach loop on a string (in this case the "for" loop can be just as easily implemented). Just as easily, but IMO not as readably. If this is something which is going to involve database work, how sure are you that the looping itself is the bottleneck? I note that you're taking the string length first and putting it in a separate variable for your "for" loop by the way rather than using TestString.Length each time - have you evaluated the performance hit of that? Certainly for array access, it would actually be faster to do it the more readable way (without the extra variable). I can only repeat: unless you are *sure* you've found a bottleneck in your code, go for the most readable version. -- Jon Skeet - <skeet (AT) pobox (DOT) com http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too |
#6
| |||
| |||
|
|
The whole "application" I'm working on is an _exercise_ to learn dotNET and c#. As part of this exercise I wanted to find the fastest way to go over a string one char at a time. As all of my benchmarking suggested, the fastest way to do this is to use the "for" loop, not the "foreach" loop nor any other method. As I've sad, this has been a valuable learning experience for me, as I now know how the "foreach" loop is implemented and I've got a fairly good idea as to why the "for" loop is fast. |
|
Moving on toward my real goal (writing the "tokenizer" part of my parser), the "foreach" loop is pretty much unusable because I'm not really going to go over the string in one go - I'll go over part of the string, return a "token" and wait till the parser request one more token. |
|
What would _really_ help would be identifying the _fastest_ way to read those chars out of a string given an char index. As a matter of fact it doesn't have to be a "char" and "string" method - I'm willing to work with char arrays, byte arrays, anything that would improve performance. |
#7
| |||
| |||
|
|
The whole "application" I'm working on is an _exercise_ to learn dotNET and c#. As part of this exercise I wanted to find the fastest way to go over a string one char at a time. As all of my benchmarking suggested, the fastest way to do this is to use the "for" loop, not the "foreach" loop nor any other method. As I've sad, this has been a valuable learning experience for me, as I now know how the "foreach" loop is implemented and I've got a fairly good idea as to why the "for" loop is fast. Moving on toward my real goal (writing the "tokenizer" part of my parser), the "foreach" loop is pretty much unusable because I'm not really going to go over the string in one go - I'll go over part of the string, return a "token" and wait till the parser request one more token. What would _really_ help would be identifying the _fastest_ way to read those chars out of a string given an char index. As a matter of fact it doesn't have to be a "char" and "string" method - I'm willing to work with char arrays, byte arrays, anything that would improve performance. "Jon Skeet [C# MVP]" <skeet (AT) pobox (DOT) com> wrote in message news:MPG.1de6ed0b460045e298ca54 (AT) msnews (DOT) microsoft.com... Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote: Thanks Ivan, I'm learning something every day. It did not occur to me that the c# compiler will use a generic method for implementing the foreach loop on a string, I somehow expected it to be implemented using some compiler magic. I will certainly not use the foreach loop on a string (in this case the "for" loop can be just as easily implemented). Just as easily, but IMO not as readably. If this is something which is going to involve database work, how sure are you that the looping itself is the bottleneck? I note that you're taking the string length first and putting it in a separate variable for your "for" loop by the way rather than using TestString.Length each time - have you evaluated the performance hit of that? Certainly for array access, it would actually be faster to do it the more readable way (without the extra variable). I can only repeat: unless you are *sure* you've found a bottleneck in your code, go for the most readable version. -- Jon Skeet - <skeet (AT) pobox (DOT) com http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too |
#8
| |||
| |||
|
|
For the most part, I agree with everything said here about maintainability, readability, and only evaluating for peformance _where needed_. BUT...I strongly disagree with someone discouraging you to do some investigating on your own when learning something new...that's what makes a good programmer. |
#9
| |||
| |||
|
|
Cosmin, For the most part, I agree with everything said here about maintainability, readability, and only evaluating for peformance _where needed_. BUT...I strongly disagree with someone discouraging you to do some investigating on your own when learning something new...that's what makes a good programmer. All of the techniques discussed are tools, and knowing when/why/how to use each at the right time is extremely beneficial (even in this trivial case). And to be frank, MS has some great programmers, but they're not gods by any means - you should always confirm assumptions when possible, especially when there are a dozen and half ways to do the same thing. Anyway, I found the issue interesting and did some (extreme overkill) investigation myself.... The fastest way to do this, but only by a *fraction* of a millisecond, is by using unsafe code and pointer arithmetic (read: **KILLS** readability and maintainability by most C#/VB.NET programmers), but would be useful if you're needing to parse a lot of string data extremely fast (i.e., messaging applications). Please see my test below, I've uncommented the methods for my timer class (RealTimer), because I don't want to clutter up the posting with that code - it uses the P/Invoke to get to the Win32 performance counters for accurate timing, so just replace it with your own. Results: CharEnumerator method: 0.294171 ms For...Loop method: 0.002235 ms Unsafe method: 0.001956 ms Code <code edited out, see previous post Notes: Since we're really not doing anything with the strings, you need to turn off optimizations as well as do something (like assigning the char to an arbitrary variable as I did above) as the compiler will optimize out the entire operations. You can verify that your code is executing by openging up ILDASM (AWESOME tool) and view the resulting IL code. The instructions of interest for verification are: CharEnumerator method: callvirt instance char [mscorlib]System.CharEnumerator::get_Current() in the IL indicating that enumerator is retrieving the char at the current position, like wise for the other two tests: for...loop: callvirt instance char [mscorlib]System.String::get_Chars(int32) unsafe method: Quite a bit different and too much to post, but you'll see the instructions where the values are being added (add), loaded onto the stack (ldloc) and duplicated on the stack (dup)...another note on this in a minute. You'll notice there are a ton more instructions for the unsafe code, which is unfortunate, and it may be the case that this would not be more performant on some processors....just another thing to think about and keep in your "toolbox", so to speak. Also, a more thorough test would have these operations being performed in a tight loop, calculating the "average" time required to iterate through the string. Hope this helps, and encourages you to keep testing more... Later, Brandon Hamm |
#10
| |||
| |||
|
|
=?Utf-8?B?QnJhbmRvbiBIYW1t?= <lastname_at_roguewave_dot_com>> wrote: For the most part, I agree with everything said here about maintainability, readability, and only evaluating for peformance _where needed_. BUT...I strongly disagree with someone discouraging you to do some investigating on your own when learning something new...that's what makes a good programmer. There's nothing wrong with investigating something - so long as you're very well aware that the results are unlikely to be significant, and may well be invalid in other situations. I would personally prefer to investigate finding the most *readable* ways of doing things - learning patterns rather than micro-optimisation tips. |
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
| |