HighTechTalks DotNet Forums  

RE: Fast way to read string one char at a time

Dotnet Framework (Performance) microsoft.public.dotnet.framework.performance


Discuss RE: Fast way to read string one char at a time in the Dotnet Framework (Performance) forum.



Reply
 
Thread Tools Search this Thread Display Modes
  #1  
Old   
Ivan Wong
 
Posts: n/a

Default RE: Fast way to read string one char at a time - 11-16-2005 , 07:10 AM






Hi Cosmin,

I think i can answer your first question.
everytime you use "foreach" loop, it will create a iterator to loop thru
each char in the string, and there is a overhead for create the iterator. In
contrast, "for" loop access directly to each char in the string, so it is
faster .. You should avoid using foreach loop.

Hope its help,
Ivan

"Cosmin Prund" wrote:

Quote:
I'm trying to write a parser for a Interbase stored procedure and I want to
identify the fastest way to go through a string one char at a time.

As I'm really new to both c# and dotNet, I thought I'll share my findings
and ask for comments:

I found 4 different methods of going over the string one char at a time, and
my results seem somewhat unexpected!

[Method 1 - read directly from string - 100% timing]
for (int i=0;i<str_len;i++) c = TestString[i]


[Method 2 - read using StringBuilder - 200% timing]
for (int i=0;i<str_len;i++) c = TestBuilder[i]

[Method 3 - read using an CharEnumerator - 512% timing]
ch.Reset();
while (ch.MoveNext()) c = ch.Current;


[Method 4 - read using a foreach loop - 170% timing]
foreach (char c2 in TestString) c = c2

After doing all those tests I ended up with lots of questions:
1) Why is the "foreach" loop slower then the "for" loop?
2) Why is the CharEnumerator so slow? Am I using this class properly?
3) Is my test method fair? (see attachment for my code)




Reply With Quote
  #2  
Old   
Jon Skeet [C# MVP]
 
Posts: n/a

Default RE: Fast way to read string one char at a time - 11-16-2005 , 12:45 PM






Ivan Wong <IvanWong (AT) discussions (DOT) microsoft.com> wrote:
Quote:
I think i can answer your first question.
everytime you use "foreach" loop, it will create a iterator to loop thru
each char in the string, and there is a overhead for create the iterator. In
contrast, "for" loop access directly to each char in the string, so it is
faster .. You should avoid using foreach loop.
No, you *shouldn't* avoid using foreach. You should consider (and
investigate the performance) of the different looping methods *in the
places where it matters*. Most situations don't actually require the
most efficient code - making the code readable and maintainable is more
important.

See http://www.interact-sw.co.uk/iangblo...1/09/profiling for a
really good article on this topic.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


Reply With Quote
  #3  
Old   
Cosmin Prund
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-17-2005 , 06:02 AM



Thanks Ivan, I'm learning something every day.

It did not occur to me that the c# compiler will use a generic method for
implementing the foreach loop on a string, I somehow expected it to be
implemented using some compiler magic. I will certainly not use the foreach
loop on a string (in this case the "for" loop can be just as easily
implemented).

"Ivan Wong" <IvanWong (AT) discussions (DOT) microsoft.com> wrote

Quote:
Hi Cosmin,

I think i can answer your first question.
everytime you use "foreach" loop, it will create a iterator to loop thru
each char in the string, and there is a overhead for create the iterator.
In
contrast, "for" loop access directly to each char in the string, so it is
faster .. You should avoid using foreach loop.

Hope its help,
Ivan

"Cosmin Prund" wrote:

I'm trying to write a parser for a Interbase stored procedure and I want
to
identify the fastest way to go through a string one char at a time.

As I'm really new to both c# and dotNet, I thought I'll share my findings
and ask for comments:

I found 4 different methods of going over the string one char at a time,
and
my results seem somewhat unexpected!

[Method 1 - read directly from string - 100% timing]
for (int i=0;i<str_len;i++) c = TestString[i]


[Method 2 - read using StringBuilder - 200% timing]
for (int i=0;i<str_len;i++) c = TestBuilder[i]

[Method 3 - read using an CharEnumerator - 512% timing]
ch.Reset();
while (ch.MoveNext()) c = ch.Current;


[Method 4 - read using a foreach loop - 170% timing]
foreach (char c2 in TestString) c = c2

After doing all those tests I ended up with lots of questions:
1) Why is the "foreach" loop slower then the "for" loop?
2) Why is the CharEnumerator so slow? Am I using this class properly?
3) Is my test method fair? (see attachment for my code)






Reply With Quote
  #4  
Old   
Jon Skeet [C# MVP]
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-17-2005 , 02:40 PM



Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote:
Quote:
Thanks Ivan, I'm learning something every day.

It did not occur to me that the c# compiler will use a generic method for
implementing the foreach loop on a string, I somehow expected it to be
implemented using some compiler magic. I will certainly not use the foreach
loop on a string (in this case the "for" loop can be just as easily
implemented).
Just as easily, but IMO not as readably. If this is something which is
going to involve database work, how sure are you that the looping
itself is the bottleneck?

I note that you're taking the string length first and putting it in a
separate variable for your "for" loop by the way rather than using
TestString.Length each time - have you evaluated the performance hit of
that? Certainly for array access, it would actually be faster to do it
the more readable way (without the extra variable).

I can only repeat: unless you are *sure* you've found a bottleneck in
your code, go for the most readable version.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


Reply With Quote
  #5  
Old   
Cosmin Prund
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-18-2005 , 07:00 AM



The whole "application" I'm working on is an _exercise_ to learn dotNET and
c#. As part of this exercise I wanted to find the fastest way to go over a
string one char at a time. As all of my benchmarking suggested, the fastest
way to do this is to use the "for" loop, not the "foreach" loop nor any
other method. As I've sad, this has been a valuable learning experience for
me, as I now know how the "foreach" loop is implemented and I've got a
fairly good idea as to why the "for" loop is fast.

Moving on toward my real goal (writing the "tokenizer" part of my parser),
the "foreach" loop is pretty much unusable because I'm not really going to
go over the string in one go - I'll go over part of the string, return a
"token" and wait till the parser request one more token. What would _really_
help would be identifying the _fastest_ way to read those chars out of a
string given an char index. As a matter of fact it doesn't have to be a
"char" and "string" method - I'm willing to work with char arrays, byte
arrays, anything that would improve performance.

"Jon Skeet [C# MVP]" <skeet (AT) pobox (DOT) com> wrote

Quote:
Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote:
Thanks Ivan, I'm learning something every day.

It did not occur to me that the c# compiler will use a generic method for
implementing the foreach loop on a string, I somehow expected it to be
implemented using some compiler magic. I will certainly not use the
foreach
loop on a string (in this case the "for" loop can be just as easily
implemented).

Just as easily, but IMO not as readably. If this is something which is
going to involve database work, how sure are you that the looping
itself is the bottleneck?

I note that you're taking the string length first and putting it in a
separate variable for your "for" loop by the way rather than using
TestString.Length each time - have you evaluated the performance hit of
that? Certainly for array access, it would actually be faster to do it
the more readable way (without the extra variable).

I can only repeat: unless you are *sure* you've found a bottleneck in
your code, go for the most readable version.

--
Jon Skeet - <skeet (AT) pobox (DOT) com
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too



Reply With Quote
  #6  
Old   
Jon Skeet [C# MVP]
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-18-2005 , 12:43 PM



Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote:
Quote:
The whole "application" I'm working on is an _exercise_ to learn dotNET and
c#. As part of this exercise I wanted to find the fastest way to go over a
string one char at a time. As all of my benchmarking suggested, the fastest
way to do this is to use the "for" loop, not the "foreach" loop nor any
other method. As I've sad, this has been a valuable learning experience for
me, as I now know how the "foreach" loop is implemented and I've got a
fairly good idea as to why the "for" loop is fast.
Well, bear in mind that that may well change in the future -
performance is a bit like that.

You should also bear in mind that very, very few situations will
actually notice *any* difference between the two in terms of
performance, whereas the difference in readability can be quite
significant.

Quote:
Moving on toward my real goal (writing the "tokenizer" part of my parser),
the "foreach" loop is pretty much unusable because I'm not really going to
go over the string in one go - I'll go over part of the string, return a
"token" and wait till the parser request one more token.
In that case, you're not really doing what your subject implies. That's
okay, but it's worth noting because suddenly the readability criteria
change.

Quote:
What would _really_
help would be identifying the _fastest_ way to read those chars out of a
string given an char index. As a matter of fact it doesn't have to be a
"char" and "string" method - I'm willing to work with char arrays, byte
arrays, anything that would improve performance.
Bear in mind that as soon as you change data type, you may find that
foreach is now faster. One thing you almost *certainly* will find if
you use arrays is that:

int len = array.Length;
for (int i=0; i < len; i++)
{
DoSomethingWith(array[i]);
}

is slower than

for (int i=0; i < array.Length; i++)
{
DoSomethingWith(array[i]);
}

Or at least, it was last time I checked...

However, to go back to my main theme - your willingness to completely
change the way you work for the sake of *performance* is worrying - at
least unless you've actually proved you've got a performance problem.
Have you? Are things really working too slowly for you, in a real
situation with real data, running the release build? If not, you
shouldn't be worrying about the performance yet, and you certainly
shouldn't be sacrificing elegance and readability for the sake of
performance.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


Reply With Quote
  #7  
Old   
Brandon Hamm
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-22-2005 , 06:07 PM





"Cosmin Prund" wrote:

Quote:
The whole "application" I'm working on is an _exercise_ to learn dotNET and
c#. As part of this exercise I wanted to find the fastest way to go over a
string one char at a time. As all of my benchmarking suggested, the fastest
way to do this is to use the "for" loop, not the "foreach" loop nor any
other method. As I've sad, this has been a valuable learning experience for
me, as I now know how the "foreach" loop is implemented and I've got a
fairly good idea as to why the "for" loop is fast.

Moving on toward my real goal (writing the "tokenizer" part of my parser),
the "foreach" loop is pretty much unusable because I'm not really going to
go over the string in one go - I'll go over part of the string, return a
"token" and wait till the parser request one more token. What would _really_
help would be identifying the _fastest_ way to read those chars out of a
string given an char index. As a matter of fact it doesn't have to be a
"char" and "string" method - I'm willing to work with char arrays, byte
arrays, anything that would improve performance.

"Jon Skeet [C# MVP]" <skeet (AT) pobox (DOT) com> wrote in message
news:MPG.1de6ed0b460045e298ca54 (AT) msnews (DOT) microsoft.com...
Cosmin Prund <cosminREMOVE (AT) adicomsft (DOT) NOSPAM.ro> wrote:
Thanks Ivan, I'm learning something every day.

It did not occur to me that the c# compiler will use a generic method for
implementing the foreach loop on a string, I somehow expected it to be
implemented using some compiler magic. I will certainly not use the
foreach
loop on a string (in this case the "for" loop can be just as easily
implemented).

Just as easily, but IMO not as readably. If this is something which is
going to involve database work, how sure are you that the looping
itself is the bottleneck?

I note that you're taking the string length first and putting it in a
separate variable for your "for" loop by the way rather than using
TestString.Length each time - have you evaluated the performance hit of
that? Certainly for array access, it would actually be faster to do it
the more readable way (without the extra variable).

I can only repeat: unless you are *sure* you've found a bottleneck in
your code, go for the most readable version.

--
Jon Skeet - <skeet (AT) pobox (DOT) com
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too



Cosmin,

For the most part, I agree with everything said here about maintainability,
readability, and only evaluating for peformance _where needed_. BUT...I
strongly disagree with someone discouraging you to do some investigating on
your own when learning something new...that's what makes a good programmer.
All of the techniques discussed are tools, and knowing when/why/how to use
each at the right time is extremely beneficial (even in this trivial case).
And to be frank, MS has some great programmers, but they're not gods by any
means - you should always confirm assumptions when possible, especially when
there are a dozen and half ways to do the same thing. Anyway, I found the
issue interesting and did some (extreme overkill) investigation myself....

The fastest way to do this, but only by a *fraction* of a millisecond, is by
using unsafe code and pointer arithmetic (read: **KILLS** readability and
maintainability by most C#/VB.NET programmers), but would be useful if you're
needing to parse a lot of string data extremely fast (i.e., messaging
applications). Please see my test below, I've uncommented the methods for my
timer class (RealTimer), because I don't want to clutter up the posting with
that code - it uses the P/Invoke to get to the Win32 performance counters for
accurate timing, so just replace it with your own.

Results:

CharEnumerator method: 0.294171 ms
For...Loop method: 0.002235 ms
Unsafe method: 0.001956 ms

Code:

class StringIterTest
{
unsafe static void Main(string[] args)
{
string s1 = "A string of ## characters!";

//RealTimer timer = RealTimer.getTimer();

try
{
//timer.reset();

// The '.NET way'
char next;
System.CharEnumerator iter = s1.GetEnumerator();
while(iter.MoveNext())
{
//uncomment to verify
next = iter.Current;
//System.Console.Write(iter.Current);
}
}
finally
{
//double ms = timer.getMilliseconds();
//System.Console.WriteLine("CharEnumerator method: {0} ms",
ms.ToString("f6"));
}

string s2 = "A StrIng of ** CharacTers!";
try
{
//timer.reset();

// for...loop
char next;
for(int i=0; i<s2.Length; ++i)
{
//uncomment to verify
//System.Console.Write(s2[i]);
next = s2[i];
}
}
finally
{
//double ms = timer.getMilliseconds();
//System.Console.WriteLine("For...Loop method: {0} ms", ms.ToString("f6"));
}

string s3 = "A STRING OF @@ CHARACTERS!";

try
{
//timer.reset();

// YEA! Let's use pointers!
fixed(char *c = s3)
{
int i=0;
char *currentChar = c;
char next;

while(i < s3.Length)
{
next = *currentChar++;
// uncomment to verify
//System.Console.Write(next);
i++;
}
}
}
finally
{
//double ms = timer.getMilliseconds();
//System.Console.WriteLine("Unsafe method: {0} ms", ms.ToString("f6"));
}
}
}


Compile with:
csc /out:stringitertest.exe /t:exe /debug- /optimize- /warnaserror /w:4
/unsafe /main:StringIterTest /nologo main.cs

Notes:
Since we're really not doing anything with the strings, you need to turn off
optimizations as well as do something (like assigning the char to an
arbitrary variable as I did above) as the compiler will optimize out the
entire operations. You can verify that your code is executing by openging up
ILDASM (AWESOME tool) and view the resulting IL code. The instructions of
interest for verification are:

CharEnumerator method:
callvirt instance char [mscorlib]System.CharEnumerator::get_Current()

in the IL indicating that enumerator is retrieving the char at the current
position, like wise for the other two tests:

for...loop:
callvirt instance char [mscorlib]System.String::get_Chars(int32)

unsafe method:
Quite a bit different and too much to post, but you'll see the instructions
where the values are being added (add), loaded onto the stack (ldloc) and
duplicated on the stack (dup)...another note on this in a minute.

You'll notice there are a ton more instructions for the unsafe code, which
is unfortunate, and it may be the case that this would not be more performant
on some processors....just another thing to think about and keep in your
"toolbox", so to speak. Also, a more thorough test would have these
operations being performed in a tight loop, calculating the "average" time
required to iterate through the string.

Hope this helps, and encourages you to keep testing more...

Later,
Brandon Hamm




Reply With Quote
  #8  
Old   
Jon Skeet [C# MVP]
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-23-2005 , 02:02 AM



<=?Utf-8?B?QnJhbmRvbiBIYW1t?= <lastname_at_roguewave_dot_com>> wrote:
Quote:
For the most part, I agree with everything said here about maintainability,
readability, and only evaluating for peformance _where needed_. BUT...I
strongly disagree with someone discouraging you to do some investigating on
your own when learning something new...that's what makes a good programmer.
There's nothing wrong with investigating something - so long as you're
very well aware that the results are unlikely to be significant, and
may well be invalid in other situations.

I would personally prefer to investigate finding the most *readable*
ways of doing things - learning patterns rather than micro-optimisation
tips.

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


Reply With Quote
  #9  
Old   
Cosmin Prund
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-23-2005 , 05:16 AM



Interesting bits about the unsafe method. I did not think of using any
unsafe code.
Questions:

1) What are some "best practices" about using unsafe code? If possible I'd
like to stay away from any unsafe code, but that's not cast in stone.
2) Is there a tool / way to see JIT-ed code for a test program? I'm
interested in a few things: Does the JIT-er do any "inlining" of method
calls at JIT-ing time? How is unsafe pointer arithmetic transformed into x86
code?

Oh yes, and the difference between the "safe" and the "unsafe" method of
reading chars in your code is not as small as your suggesting, it's a 13%
increase in speed. Also you seem to be using Console.Write(...) to consume
the char's in both reading methods (and that takes constant time) so the
time difference between the reading part of both methods might be more
significant.


"Brandon Hamm" <lastname_at_roguewave_dot_com> wrote

Quote:
Cosmin,

For the most part, I agree with everything said here about
maintainability,
readability, and only evaluating for peformance _where needed_. BUT...I
strongly disagree with someone discouraging you to do some investigating
on
your own when learning something new...that's what makes a good
programmer.
All of the techniques discussed are tools, and knowing when/why/how to use
each at the right time is extremely beneficial (even in this trivial
case).
And to be frank, MS has some great programmers, but they're not gods by
any
means - you should always confirm assumptions when possible, especially
when
there are a dozen and half ways to do the same thing. Anyway, I found the
issue interesting and did some (extreme overkill) investigation myself....

The fastest way to do this, but only by a *fraction* of a millisecond, is
by
using unsafe code and pointer arithmetic (read: **KILLS** readability and
maintainability by most C#/VB.NET programmers), but would be useful if
you're
needing to parse a lot of string data extremely fast (i.e., messaging
applications). Please see my test below, I've uncommented the methods for
my
timer class (RealTimer), because I don't want to clutter up the posting
with
that code - it uses the P/Invoke to get to the Win32 performance counters
for
accurate timing, so just replace it with your own.

Results:

CharEnumerator method: 0.294171 ms
For...Loop method: 0.002235 ms
Unsafe method: 0.001956 ms

Code <code edited out, see previous post

Notes:
Since we're really not doing anything with the strings, you need to turn
off
optimizations as well as do something (like assigning the char to an
arbitrary variable as I did above) as the compiler will optimize out the
entire operations. You can verify that your code is executing by openging
up
ILDASM (AWESOME tool) and view the resulting IL code. The instructions of
interest for verification are:

CharEnumerator method:
callvirt instance char [mscorlib]System.CharEnumerator::get_Current()

in the IL indicating that enumerator is retrieving the char at the current
position, like wise for the other two tests:

for...loop:
callvirt instance char [mscorlib]System.String::get_Chars(int32)

unsafe method:
Quite a bit different and too much to post, but you'll see the
instructions
where the values are being added (add), loaded onto the stack (ldloc) and
duplicated on the stack (dup)...another note on this in a minute.

You'll notice there are a ton more instructions for the unsafe code, which
is unfortunate, and it may be the case that this would not be more
performant
on some processors....just another thing to think about and keep in your
"toolbox", so to speak. Also, a more thorough test would have these
operations being performed in a tight loop, calculating the "average" time
required to iterate through the string.

Hope this helps, and encourages you to keep testing more...

Later,
Brandon Hamm





Reply With Quote
  #10  
Old   
Cosmin Prund
 
Posts: n/a

Default Re: Fast way to read string one char at a time - 11-23-2005 , 05:16 AM



"Jon Skeet [C# MVP]" <skeet (AT) pobox (DOT) com> wrote

Quote:
=?Utf-8?B?QnJhbmRvbiBIYW1t?= <lastname_at_roguewave_dot_com>> wrote:
For the most part, I agree with everything said here about
maintainability,
readability, and only evaluating for peformance _where needed_. BUT...I
strongly disagree with someone discouraging you to do some investigating
on
your own when learning something new...that's what makes a good
programmer.

There's nothing wrong with investigating something - so long as you're
very well aware that the results are unlikely to be significant, and
may well be invalid in other situations.

I would personally prefer to investigate finding the most *readable*
ways of doing things - learning patterns rather than micro-optimisation
tips.
With all due respect Jon Skeet, you're turning my original question into
something it's not. My question was about raw speed and the raison different
implementations of what looks like the same thing are performing radically
different. I don't think anyone bases design decisions on
code-speed-profiling alone, but, as Brandon Hamm posted, knowing the toolset
is important.




Reply With Quote
Reply




Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.4
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.