HighTechTalks DotNet Forums  

Encoding difference in Vista breaks my app :(

Dotnet Framework (CLR) microsoft.public.dotnet.framework.clr


#1
Tim_Mac
Encoding difference in Vista breaks my app :( - 10-06-2006, 10:15 AM

hi,
just getting Vista up and running now for some dev testing, seems great so
far.
i have an app that uses MD5 hashing on the passwords. when i run the app in
Vista, the hashing code below gives a different result to what XP/Server
2003 computes, it is probably because i'm using GetString on binary content
but i'm sure i got this code off an MS sample somewhere...

public static string EncryptMd5(string text)
{
    UTF8Encoding encoder = new UTF8Encoding();
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
    byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
    return encoder.GetString(hashedBytes);
}

when this runs on Vista i get some extra unprintable characters included in
the string, which aren't there when i run it XP. when i debug in Vista,
these additional characters are interleaved between the XP set of
characters, rendered as squares in the debug window. not sure how they will
appear in this post but i'll include them here anyway:
Vista: ��E�Y����j��\fg
XP: EYj g

When i look at the individual bytes before GetString is called, the
unprintable ones are above int 179 which seems to be where the ASCII table
goes beyond normally used characters. ref: http://www.lookuptables.com/

as a short-term hack i can regex out anything above 179 but i would really
like to understand it!

any help is greatly appreciated.
tim


#2
Tim_Mac
Re: Encoding difference in Vista breaks my app :( - 10-06-2006, 10:57 AM

i've done some more testing and found that it wasn't safe to discard
anything above ASCII 179.
a working version is to open up the hashed string into a Char array, and
then check if the integer value of each one is 65533. If it is, the
character should be discarded because it would not exist if you run the same
code on Server 2003 or XP.

/// <summary>
/// This function hashes the text to MD5, in binary format
/// </summary>
public static string EncryptMd5Binary(string text)
{
    UTF8Encoding encoder = new UTF8Encoding();
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
    byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
    string hashedString = encoder.GetString(hashedBytes);
    string result = "";
    // strip out any chars that int to 65533 (U+FFFD), this only happens
    // when running on vista
    foreach (char c in hashedString.ToCharArray())
        if ((int)c != 65533)
            result += c;
    return result;
}

can anyone explain what the difference is?
thanks
tim


#3
Jon Skeet [C# MVP]
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 02:59 AM

Tim_Mac <tim.mackey (AT) community (DOT) nospam> wrote:
Quote:
just getting Vista up and running now for some dev testing, seems great so
far.
i have an app that uses MD5 hashing on the passwords. when i run the app in
Vista, the hashing code below gives a different result to what XP/Server
2003 computes, it is probably because i'm using GetString on binary content
but i'm sure i got this code off an MS sample somewhere...

public static string EncryptMd5(string text)
{
    UTF8Encoding encoder = new UTF8Encoding();
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
    byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
    return encoder.GetString(hashedBytes);
}

Well, that code is broken to start with. ComputeHash returns arbitrary
binary data, which is unlikely to be a valid UTF-8 encoded string.

You should use something like base64 encoding to convert *arbitrary*
binary data into a string.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
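
To illustrate the point about arbitrary binary data, here is a minimal sketch comparing the two conversions. The class and method names are made up for this example, and MD5.Create() is used in place of the MD5CryptoServiceProvider from the original post. It only checks whether the bytes survive a trip to text and back; it does not try to reproduce what XP or Vista did with invalid input.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class HashToText
{
    // Hashes the UTF-8 bytes of the input with MD5 (16 raw bytes).
    public static byte[] Md5Bytes(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }

    // Base64 maps every byte sequence to text and back without loss.
    public static bool Base64RoundTrips(byte[] data)
    {
        string encoded = Convert.ToBase64String(data);
        return Convert.FromBase64String(encoded).SequenceEqual(data);
    }

    // Decoding bytes as UTF-8 and re-encoding them only gives back the
    // original bytes if they happened to be valid UTF-8 to begin with.
    public static bool Utf8RoundTrips(byte[] data)
    {
        string decoded = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(decoded).SequenceEqual(data);
    }
}
```

Calling Base64RoundTrips on any hash should return true, while Utf8RoundTrips returns false as soon as the hash contains a byte sequence that is not valid UTF-8.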

--
Jon Skeet - <skeet (AT) pobox (DOT) com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too


#4
Jon Skeet [C# MVP]
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 03:00 AM

Tim_Mac <tim.mackey (AT) community (DOT) nospam> wrote:
Quote:
i've done some more testing and found that it wasn't safe to discard
anything above ASCII 179.

Just as another point - there's no such thing as ASCII 179. ASCII is a
7-bit encoding, so its values stop at 127.
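
As a small sketch of what 7-bit means in practice (the AsciiRange class is a hypothetical name for this example): only byte values 0 to 127 are ASCII, and .NET's ASCII decoder substitutes a question mark for anything above that.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class AsciiRange
{
    // ASCII defines code points 0..127 only.
    public static bool IsAscii(byte b)
    {
        return b <= 0x7F;
    }

    // Decodes a single byte with .NET's ASCII encoding. Bytes above
    // 0x7F are not ASCII and come back as the replacement "?".
    public static string DecodeAsAscii(byte b)
    {
        return Encoding.ASCII.GetString(new byte[] { b });
    }
}
```

Here DecodeAsAscii(65) gives "A", while DecodeAsAscii(179) gives "?" because 179 lies outside the ASCII range.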



#5
Tim_Mac
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 06:24 AM

hey guys thanks for the clarification. i'll have to keep the code
operational because i can't re-encode the passwords with a correct
algorithm, without knowing the passwords.

jon i suppose you mean 'arbitrary' in a technical sense that i'm not aware
of. since it's MD5 it always returns an identical hash of a given string.
according to the SDK, UTF8Encoding.GetString "Decodes a sequence of bytes
into a string", which is what i want, and it has worked correctly for years
except for this new platform difference between Vista and previous windows
versions.

i have another version of the code which yields Hex digits, and is obviously
much safer to use but i'll have to live with the current setup.
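
The hex version mentioned above was not posted, so the following is only a guess at what such a routine might look like. The Md5Hex class and HashToHex method are hypothetical names, and MD5.Create() stands in for MD5CryptoServiceProvider.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Md5Hex
{
    // Returns the MD5 hash of the UTF-8 bytes of the input as
    // 32 lowercase hex digits.
    public static string HashToHex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            // BitConverter.ToString yields "5F-4D-CC-..."; drop the dashes.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Because every byte becomes exactly two hex digits, the result does not depend on how any decoder treats invalid input.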

thanks again for your help,
tim


#6
Chris Mullins
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 02:57 PM

"Tim_Mac" <tim.mackey (AT) community (DOT) nospam> wrote
Quote:
[Broken Code on Vista]

public static string EncryptMd5(string text)
{
    UTF8Encoding encoder = new UTF8Encoding();
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
    byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
    return encoder.GetString(hashedBytes);
}

I'm afraid your code is broken, and it's got naught to do with Vista. The
bytes being returned from ComputeHash aren't UTF8 (or even UTF16) bytes.
They're just random bytes.

Try using:
public static string EncryptMd5(string text)
{
    UnicodeEncoding encoder = new UnicodeEncoding();
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
    byte[] hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(text));
    return Convert.ToBase64String(hashedBytes);
}

Also, was there a specific reason you were using the UTF8 encoder to get the
byte array? The native encoding is UTF16 (aka: Unicode Encoding), and if
there's no compelling reason to use something else, you should just use
that.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins




#7
Jon Skeet [C# MVP]
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 04:51 PM

Tim_Mac <tim.mackey (AT) community (DOT) nospam> wrote:
Quote:
hey guys thanks for the clarification. i'll have to keep the code
operational because i can't re-encode the passwords with a correct
algorithm, without knowing the passwords.

jon i suppose you mean 'arbitrary' in a technical sense that i'm not aware
of.

I mean that it's binary data with no significance as far as the UTF-8
encoding is concerned. It could be *any* binary data. Not every
sequence of binary data is a valid UTF-8 string.
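
One way to see this for a given byte array is to decode it with a strict UTF8Encoding instance. This is only a sketch: the StrictUtf8 class and IsValidUtf8 method are hypothetical names, and it relies on the UTF8Encoding(bool, bool) constructor whose second argument, throwOnInvalidBytes, makes the decoder throw instead of quietly substituting or dropping bytes.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StrictUtf8
{
    // No byte order mark on encode; throw on invalid bytes when decoding.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Returns true only if the whole array decodes as well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            // The decoder met a byte sequence that is not valid UTF-8.
            return false;
        }
    }
}
```

For example, the bytes 0x68 0x69 ("hi") pass, while 0xC3 0x28 fails because 0xC3 announces a two-byte sequence and 0x28 is not a continuation byte. A 16-byte hash is likely to contain something of that kind.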

Quote:
since it's MD5 it always returns an identical hash of a given string.
according to the SDK, UTF8Encoding.GetString "Decodes a sequence of bytes
into a string", which is what i want, and it has worked correctly for years
except for this new platform difference between Vista and previous windows
versions.

It didn't work "correctly". It may have done something repeatable when
presented with a sequence of bytes which was not a valid UTF-8 encoded
string, but I don't believe that behaviour was documented, and I don't
think it's unreasonable to change it in Vista.

It's like relying on the results of GetHashCode from one framework
version (or even one run) to another, or accessing the UI from a
different thread: you may get away with it for a while, but that
doesn't mean the code is correct, or that you should rely on it working
in the future.

Quote:
i have another version of the code which yields Hex digits, and is obviously
much safer to use but i'll have to live with the current setup.

I would start migrating code away from the flawed behaviour ASAP, if I
were you. You may need to support two formats or something like that
for a while, but relying on *just* the band-aid could well cause more
problems down the line.



#8
Tim_Mac
Re: Encoding difference in Vista breaks my app :( - 10-07-2006, 06:48 PM

hi jon,
you're absolutely right. i can implement a correct solution in parallel and
add it in to a second password column in the database. eventually everyone
will have logged in to the site and i can disband the old code and badly
hashed passwords.
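
A rough sketch of how that parallel scheme could work at login time follows. Everything here is hypothetical: the UserRecord shape, the VerifyAndUpgrade name, and the choice to clear the old value once the new one is stored. The two hash routines are passed in as delegates so the sketch makes no assumption about what the legacy function returns on a given platform.

```csharp
using System;

// Hypothetical record with one column per hash format.
public sealed class UserRecord
{
    public string LegacyHash { get; set; }  // produced by the old routine
    public string NewHash { get; set; }     // null until the user next logs in
}

// Hypothetical helper class, for illustration only.
public static class PasswordMigration
{
    // Checks the password against whichever hash is stored. On a
    // successful legacy match, stores the new hash and drops the old one.
    public static bool VerifyAndUpgrade(
        UserRecord user,
        string password,
        Func<string, string> legacyHash,
        Func<string, string> newHash)
    {
        if (user.NewHash != null)
        {
            // Already migrated: only the new format counts.
            return string.Equals(user.NewHash, newHash(password), StringComparison.Ordinal);
        }

        if (!string.Equals(user.LegacyHash, legacyHash(password), StringComparison.Ordinal))
        {
            return false;
        }

        // The plain-text password is only available at this moment,
        // so this is the place to compute and store the new hash.
        user.NewHash = newHash(password);
        user.LegacyHash = null;
        return true;
    }
}
```

Once no rows with a legacy value remain, the old routine can be deleted. Accounts that never log in again would need a password reset instead.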

thanks for the convincing argument!

tim

