Do I misunderstand something? This is in C#.

When I read some UTF-8 text from an Access notes field using ADO.NET, I get the UTF-8 bytes back as individual characters held in a UTF-16 C# string. In this example I'm going to use the Hebrew letter het, 'ח'. This is U+05D7 in Unicode and D7 97 in UTF-8.

I can convert the U+05D7 character held in a C# string to its corresponding UTF-8 bytes easily:

    // requires: using System.Text;
    UnicodeEncoding unicode = new UnicodeEncoding();
    UTF8Encoding utf8 = new UTF8Encoding();
    string het = "ח";
    byte[] UnicodeHet = unicode.GetBytes(het);                         // UTF-16 bytes
    byte[] UTF8Bytes = Encoding.Convert(unicode, utf8, UnicodeHet);    // D7 97

UTF8Bytes is then written to the database.

When I read this back from the database I get two characters that represent the UTF-8 bytes, held in a UTF-16 C# string. I can convert these back to the het character using the following code:

    UnicodeEncoding unicode = new UnicodeEncoding();
    UTF8Encoding utf8 = new UTF8Encoding();
    Encoding local = Encoding.GetEncoding(1252);
    string utf8het = "׳—"; // Normally read from the database but hardcoded here
    byte[] utf8hetbytes = local.GetBytes(utf8het);                     // expected: D7 97
    byte[] utf8result = Encoding.Convert(utf8, unicode, utf8hetbytes);
    string result = unicode.GetString(utf8result);

If the code page for the machine is set to 1252 this works correctly. E.g. if the value in the database was the het character 'ח', local.GetBytes returns the UTF-8 bytes D7 97, which are then correctly decoded to U+05D7.

Problem: if I subsequently change the code page of the machine to Hebrew,

    byte[] utf8hetbytes = local.GetBytes(utf8het);

starts returning 3F 97. 3F is '?', which generally means the character could not be translated. Why?

If I switch to using the machine's default code page instead, it always works. Unfortunately, it appears that the rest of the (poor) code requires 1252.

Am I wrong in assuming that if I explicitly get the 1252 encoding, it should not be affected by the code page of the machine? It looks like I am faced with a fairly major rework because of this.

Is there another way to get the two UTF-8 bytes held in a C# string into a byte array without going through a code page?

Thanks in advance,
Alex
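
The numbers in the post (D7 97 going in, 3F 97 coming out) can be reproduced at the byte level. The sketch below is in Python rather than C#, only because the code-page behaviour is easy to check there. It assumes, without confirmation from the post, that the database layer turns the stored bytes into text using the machine's ANSI code page. `read_as_text` and `recover_bytes` are hypothetical helpers standing in for the ADO.NET read and for `local.GetBytes`; they are not real APIs.

```python
# Sketch under a stated assumption: the database read is modelled as
# "decode the stored bytes with the machine's ANSI code page".
# That model is an assumption, not something the post confirms.

HET = "\u05d7"  # Hebrew letter het, U+05D7


def read_as_text(stored: bytes, machine_code_page: str) -> str:
    """Hypothetical stand-in for reading the notes field as a string."""
    return stored.decode(machine_code_page)


def recover_bytes(text: str, code_page: str) -> bytes:
    """Rough analogue of Encoding.GetBytes: unmappable characters become '?'."""
    return text.encode(code_page, errors="replace")


stored = HET.encode("utf-8")  # the two UTF-8 bytes D7 97

# The same two bytes become different characters under different code pages.
seen_on_1252 = read_as_text(stored, "cp1252")  # U+00D7, U+2014
seen_on_1255 = read_as_text(stored, "cp1255")  # U+05F3, U+2014

ok = recover_bytes(seen_on_1252, "cp1252")       # same code page both ways
broken = recover_bytes(seen_on_1255, "cp1252")   # U+05F3 is not in 1252
matched = recover_bytes(seen_on_1255, "cp1255")  # same code page both ways

print(stored.hex(), ok.hex(), broken.hex(), matched.hex())
print(matched.decode("utf-8") == HET)
```

Under that assumption, the 1252 encoder behaves the same on both machines. What changes is the string it is handed: on a Hebrew (1255) machine the byte D7 arrives as U+05F3, which has no 1252 equivalent and so is replaced by 3F. That would also be consistent with the observation that using the machine's default code page always works, since the encode step then mirrors the decode step.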