#2
Hi all -

A KB article indicates that Char.IsPunctuation is the .NET "equivalent" of the CRT's isXpunct functions (e.g., iswpunct). However, I've found significant differences in their behavior.

As a test, I ran each function over the first 1000 or so Unicode characters and got the results that follow. The list identifies the characters for which the two functions returned different results and shows what the .NET method said. I'm sure there are other differences later in the character set.

So far, I haven't seen any documentation of the specific differences. I wonder if any exists.

Thanks for any pointers.

Regards,
Jeff

--------
IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
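Jeff did not post his test code. A minimal sketch of the CRT half of such a comparison, assuming a standard C++ toolchain, might look like the following. The helper names are hypothetical. Note that iswpunct returns a nonzero int rather than a strict 1 for a match, so it should be compared against zero before being set beside a .NET bool. Results outside ASCII depend on the current locale and CRT, so they may not reproduce the list above.

```cpp
#include <cassert>
#include <cwctype>
#include <vector>

// Hypothetical helper (not from the thread): normalizes iswpunct's int
// result to a bool. The CRT only promises "nonzero" for a match.
bool crt_is_punct(int code_point) {
    return std::iswpunct(static_cast<std::wint_t>(code_point)) != 0;
}

// Hypothetical helper: collects every code point in [0, limit) that the
// C runtime classifies as punctuation under the current locale.
std::vector<int> crt_punct_code_points(int limit) {
    assert(limit >= 0);
    std::vector<int> hits;
    for (int cp = 0; cp < limit; ++cp) {
        if (crt_is_punct(cp)) {
            hits.push_back(cp);
        }
    }
    return hits;
}
```

Calling crt_punct_code_points(1000) gives the CRT's side of the sweep; the .NET side would be produced separately and the two lists compared.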
#3
Well, technically, the ones that .NET is not marking as punctuation are NOT punctuation. In what sentence do you use > or = or < or << or $ as punctuation?

You might check out Char.IsWhiteSpace to take out some of the weird control characters.

What exactly are you trying to accomplish?

Robin S.
-----------------------------------------
"Jeff Pek (Autodesk)" <jeff.pek (AT) nospam (DOT) autodesk.com> wrote in message news:%233seADaAHHA.5060 (AT) TK2MSFTNGP02 (DOT) phx.gbl...
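Robin's point can be seen in the ASCII rows of Jeff's list. The nine characters shown with ".NET says: False" are $ + < = > ^ ` | ~, which Unicode classifies as symbols (currency, math, modifier) rather than punctuation, while C's ispunct accepts every printable ASCII character that is neither alphanumeric nor a space. The sketch below illustrates that split for ASCII only; the function names are hypothetical, and it only reproduces the answers reported in the thread rather than calling any .NET code.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// ASCII characters for which the list in the thread shows ".NET says: False".
static const char kAsciiSymbols[] = "$+<=>^`|~";

// Hypothetical helper: C-library view of ASCII punctuation.
bool crt_style_is_punct_ascii(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return uc < 128 && std::ispunct(uc) != 0;
}

// Hypothetical helper: ASCII-only approximation of the answers the thread
// reports for Char.IsPunctuation (C punctuation minus the symbol characters).
bool dotnet_style_is_punct_ascii(char c) {
    if (!crt_style_is_punct_ascii(c)) {
        return false;
    }
    return std::strchr(kAsciiSymbols, c) == nullptr;
}

// Counts the ASCII characters accepted by a predicate.
int count_ascii(bool (*pred)(char)) {
    int n = 0;
    for (int i = 0; i < 128; ++i) {
        if (pred(static_cast<char>(i))) ++n;
    }
    return n;
}

// Sanity checks drawn from the list in the thread.
void check_against_thread_list() {
    assert(dotnet_style_is_punct_ascii('!'));   // (33)  .NET says: True
    assert(!dotnet_style_is_punct_ascii('$'));  // (36)  .NET says: False
    assert(!dotnet_style_is_punct_ascii('~'));  // (126) .NET says: False
}
```

Beyond ASCII this table-free approach does not apply, since both classifications then depend on Unicode data and, for the CRT, on the locale.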
#4
I agree. The issue here is that there is some existing C++ code that I'm trying to refactor and use within a C# library. I'd like to have equivalent functionality; this is one important aspect of accomplishing that. I could use a C++/CLI module to ensure equivalent behavior, but I'd like to avoid that.

Thanks for the response.

Jeff

"RobinS" <RobinS (AT) NoSpam (DOT) yah.none> wrote in message news:K5GdnTRyv4VX7dLYnZ2dnUVZ_qGdnZ2d (AT) comcast (DOT) com...
#7
Once you hit Unicode land, I think determining punctuation is difficult. There is a good answer though:

Stringprep - http://www.ietf.org/rfc/rfc3454.txt

Stringprep addresses case folding, whitespace, prohibited characters, bidirectional validity, and normalization form.

An example profile is nameprep, which is how Internationalized Domain Names work:
http://tools.ietf.org/html/rfc3491

Another example profile is "resourceprep", which is part of the XMPP standard:
http://www.xmpp.org/internet-drafts/...ceprep-03.html

For example, this profile prohibits all characters in:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9

It specifies Unicode normalization form KC, and that bidirectional checking must be performed.

--
Chris Mullins
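To make the "prohibited characters" step of a stringprep profile more concrete, here is a minimal sketch of a range-table lookup. It covers only two of the tables Chris lists (C.1.2, non-ASCII space characters, and C.2.1, ASCII control characters), and the ranges are transcribed here as an assumption that should be checked against Appendix C of RFC 3454. The names are hypothetical, and the sketch does not attempt the mapping, NFKC normalization, or bidirectional checks that a real profile also requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Inclusive range of Unicode code points.
struct Range {
    char32_t first;
    char32_t last;
};

// Assumed contents of RFC 3454 tables C.2.1 and C.1.2; verify against the
// RFC before relying on them. The other tables are omitted.
static const Range kProhibited[] = {
    {0x0000, 0x001F}, {0x007F, 0x007F},  // C.2.1: ASCII control characters
    {0x00A0, 0x00A0}, {0x1680, 0x1680},  // C.1.2: non-ASCII space characters
    {0x2000, 0x200B}, {0x202F, 0x202F},
    {0x205F, 0x205F}, {0x3000, 0x3000},
};

// True if the code point falls inside any range of the table above.
bool is_prohibited(char32_t cp) {
    assert(cp <= 0x10FFFF);  // must be a valid Unicode code point
    for (const Range& r : kProhibited) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}

// Index of the first prohibited code point in the text, or -1 if none.
std::ptrdiff_t first_prohibited(const std::u32string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (is_prohibited(text[i])) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

A linear scan is enough for a table this small; a full set of tables would more likely be sorted and searched with a binary search.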
#8
I should add that there is an open-source C# implementation of stringprep that is part of libidn. This implementation is a bit memory hungry, and not exactly tuned for optimal performance, but it works.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins

"Chris Mullins" <cmullins (AT) yahoo (DOT) com> wrote in message news:u$qVDStAHHA.2276 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
#9
| |||
| |||
|
|
Thanks, all. This is all good stuff. What I was trying to do was to mimic the behavior of iswpunct (and therefore the existing code). P/Invoking iswpunct seems reasonable, provided I know that the DLL is going to be there. |
|
- jp

"Chris Mullins" <cmullins (AT) yahoo (DOT) com> wrote in message news:uY19mXtAHHA.144 (AT) TK2MSFTNGP02 (DOT) phx.gbl...
I should add there is an open-source C# implementation of stringprep that is part of libidn. This implementation is a bit memory hungry, and not exactly tuned for optimal performance, but it works.
--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins

"Chris Mullins" <cmullins (AT) yahoo (DOT) com> wrote in message news:u$qVDStAHHA.2276 (AT) TK2MSFTNGP03 (DOT) phx.gbl...
Once you hit Unicode land, I think determining punctuation is difficult. There is a good answer though: Stringprep - http://www.ietf.org/rfc/rfc3454.txt

Stringprep addresses case folding, whitespace, prohibited characters, bidirectional validity, and normalization form.

An example profile is nameprep, which is how Internationalized Domain Names work: http://tools.ietf.org/html/rfc3491

Another example profile is "resourceprep", which is part of the XMPP standard: http://www.xmpp.org/internet-drafts/...ceprep-03.html

For example, this profile prohibits all characters in:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9

It specifies Unicode normalization form KC, and that bidirectional checking must be performed.
--
Chris Mullins

"Jeff Pek (Autodesk)" <jeff.pek (AT) nospam (DOT) autodesk.com> wrote in message news:%233seADaAHHA.5060 (AT) TK2MSFTNGP02 (DOT) phx.gbl...
Hi all - A KB article indicates that Char.IsPunctuation is the "equivalent" of the CRT's isXpunct (e.g., iswpunct) function in .NET. However, I've found significant differences in their behaviors. As a test, I ran each function through the first 1000 or so Unicode characters, and found the results that follow. It identifies the characters for which the 2 functions returned different results, and shows what the .NET method said. I'm sure there are other differences later on in the character set.
|
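On the P/Invoke idea mentioned in this reply, one possible way to avoid depending on a particular CRT DLL being present is to export a thin wrapper from a native DLL that ships with the application. The following is a minimal sketch under that assumption; the function name `crt_is_punct` and the `EXPORT` macro are hypothetical and are not part of any existing library.

```c
#include <wchar.h>
#include <wctype.h>

/* Assumption: built as a small native DLL that ships with the application.
   On non-Windows compilers the macro expands to nothing. */
#if defined(_WIN32)
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

/* Takes one UTF-16 code unit (what a .NET char holds) and returns exactly
   1 or 0, so the managed caller never sees the CRT's raw nonzero flag. */
EXPORT int crt_is_punct(unsigned short code_unit) {
    return iswpunct((wint_t)code_unit) != 0;
}
```

On the .NET side this would be declared with a `DllImport` attribute naming whichever DLL hosts the function. The result still depends on the CRT version and the active locale, so it mimics the existing native behavior only when both sides are built against the same runtime.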