HighTechTalks DotNet Forums  

viewing Cyrillics in PDF

Dotnet Internationalization microsoft.public.dotnet.internationalization


#1 aa - 03-11-2006, 03:51 AM

A UK company produced safety data sheets in Russian, in PDF format.
I viewed these files on their computer, which was running English W2K, and they
were OK.

However, on my russified W2K with the system code page 866, I am getting
gobbledegook from the upper part of the 1252 code table.

What should I do to view them correctly?



#2 Mihai N. - 03-12-2006, 05:48 AM

Quote:
However on my russified w2k with the system code page 866 I am getting that
gobbledegook from the upper part of 1252 coding table

I am not sure what you mean by "russified".
For a Russian system, the system (ANSI) code page is 1251, not 866.
Can you explain a bit about what you have there?


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email


#3 aa - 03-12-2006, 10:50 AM

I have W2K bought in Russia and described as russified. I do not know exactly
what that means, but all the system messages are in Russian.
Russian is set as the default in Control Panel --> Languages.

As to 866 versus 1251: 866 is what the chcp command returns at the
DOS prompt.
If English is set as the default, chcp returns 850, not 1252.
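
The 866/1251 and 850/1252 pairs are consistent with each other: chcp reports the console (OEM) code page, while GUI programs such as Notepad use the separate ANSI code page. The following is a minimal Python sketch, not taken from the posts, that decodes one set of bytes under all four code pages to show how the same file can look right in one place and like gobbledegook in another. The labels in the table are descriptive assumptions about typical Windows defaults.

```python
# Labels are descriptive only; the pairing of OEM and ANSI code pages
# per locale is the usual Windows default, stated here as an assumption.
CODE_PAGES = {
    "cp866": "OEM Russian (console, chcp)",
    "cp1251": "ANSI Cyrillic (GUI programs)",
    "cp850": "OEM Western European (console, chcp)",
    "cp1252": "ANSI Western European (GUI programs)",
}


def decode_everywhere(raw: bytes) -> dict:
    """Decode the same bytes under each code page listed above."""
    return {name: raw.decode(name) for name in CODE_PAGES}


if __name__ == "__main__":
    # Bytes as a GUI program on a Russian system would store the word.
    raw = "Привет".encode("cp1251")
    for name, text in decode_everywhere(raw).items():
        print(f"{name:7} {CODE_PAGES[name]:38} {text}")
```

Only the cp1251 row comes back as readable Russian; the cp1252 row shows accented Latin letters, which matches the symptom described above.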

#4 aa - 03-16-2006, 04:14 AM

Any more ideas on that?

#5 Mihai N. - 03-17-2006, 03:07 AM

Quote:
Any more ideas on that?

Not much.

But I would check the following:
- what version of PDF it is (see File-Document properties in Acrobat)
- whether it was created with Adobe Acrobat or some cheap clone (also in
Document properties)
- whether the fonts are embedded

I am asking because Acrobat got better and better from version to version at
handling foreign languages, and I have seen clones do poorly in this area.

Mihai

#6 aa - 03-19-2006, 05:05 AM

Thanks, Mihai.

1. File-Document properties say:
PDF producer Acrobat PDFWriter 3.02 for Windows
PDF version 1,2 (Acrobat 3,x)
I am viewing it in Acrobat 6.

2. I do not know what it was made with. I only know that the file was
created probably 5 years ago by a reputable company which is unlikely to use
clones.

3. I do not know if the fonts are embedded. How can I tell?
File-Document properties - Fonts - "Fonts used in this document" says that
4 fonts are used in this document:
Arial, Arial Bold, WLCyrillicSans and WLCyrillicSans Bold (none shows
correctly). Encoding: ANSI.

If you have time to look at it, I uploaded the file here:
www.andrei.plus.com/hkw/ccp.pdf
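
On the question of how to tell whether fonts are embedded: one rough way, outside Acrobat, is to look inside the file itself. The sketch below is a crude heuristic in Python, not a PDF parser. It assumes an old, uncompressed file structure (as in PDF 1.2, where font dictionaries sit in plain text) and simply searches the raw bytes for /BaseFont names and for /FontFile, /FontFile2 or /FontFile3 entries, which are the keys a font descriptor uses to point at an embedded font program. The function name is made up for illustration.

```python
import re


def embedded_font_hints(pdf_bytes: bytes) -> dict:
    """Crude scan of raw PDF bytes for font names and embedded font programs."""
    # Font dictionaries name their font with a /BaseFont entry.
    names = re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", pdf_bytes)
    base_fonts = sorted({name.decode("latin-1") for name in names})
    # An embedded font program is referenced by one of the /FontFile keys.
    any_embedded = re.search(rb"/FontFile[23]?\b", pdf_bytes) is not None
    return {"base_fonts": base_fonts, "any_embedded": any_embedded}


if __name__ == "__main__":
    # Hand-written fragment standing in for a real file's contents.
    sample = (
        b"<< /Type /Font /Subtype /TrueType /BaseFont /WLCyrillicSans "
        b"/Encoding /WinAnsiEncoding >>"
    )
    print(embedded_font_hints(sample))
```

For a real file, the bytes would come from reading it in binary mode, for example `open("ccp.pdf", "rb").read()`. If no /FontFile key turns up, the viewer has to substitute a locally installed font, which is one way the same PDF can look different from machine to machine.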

#7 Mihai N. - 03-20-2006, 01:40 AM

Quote:
1. File-Document properties say:
PDF producer Acrobat PDFWriter 3.02 for Windows
PDF version 1,2 (Acrobat 3,x)

Wow, this is really old! Acrobat 3 was released in 1996!
And I think support for languages outside Latin-1 was only added in version 4.


Quote:
3. I do not know if the fonts are embedded. How do I know this?
File-Document properties - Fonts - Fonts used in this documents says that
4 fonts are used in this document:
Arial, Arial Bold, WLCyrillicSans and WLCyrillicSans Bold. (None shows
correctly). Encoding ANSI

Probably WLCyrillicSans is not a Unicode font (and most likely the Arial used in
the document is not Unicode either).
I cannot view it correctly on English XP with Acrobat 7 either.

Is it OK if I manage to recover the text out of the thing, let's say as a Word
document? Or do you really need to see the original PDF?

#8 aa - 03-20-2006, 01:24 PM

Thanks, Mihai.
I definitely remember seeing it correctly when I was in the UK several years
ago. I think it was on a W98 or W2000 computer.

Yes, it is OK to recover the text out of the thing as a Word document. Actually,
I copied and pasted it into Word, with the same effect.

I also copied and pasted it into Notepad, saved the result as an HTML file both
as ANSI and as UTF-8, and played with charset=windows-1251 and utf-8, but
nothing helped.

#9 Mihai N. - 03-21-2006, 03:30 AM

Quote:
Thanks, Mihai
I definitely remember seeing it correctly when I was in the UK several
years ago. I think it was W98 computer or w2000.

I believe you. But there are so many things that matter:
the Acrobat version, language, installed fonts, installed code pages,
OS version, OS language, temperature, wind direction, etc. :-)

OK, see here: http://www.mihai-nita.net/mix/ccpfix.zip

Secret: copy-paste into Notepad and save as 1252 (for instance on a US system),
then open the file as 1251 (for instance on a Russian system).
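
The two Notepad steps amount to turning the garbled characters back into their original bytes and then reading those bytes with the Cyrillic code page. A minimal Python sketch of that idea follows. It assumes Windows-1251 is the code page the text was really written in (the Russian ANSI code page mentioned earlier in the thread), and the function name is made up for illustration.

```python
def fix_mojibake(garbled: str, wrong: str = "cp1252", right: str = "cp1251") -> str:
    """Undo text that was decoded with the wrong single-byte code page."""
    # Step 1, "save as 1252": turn the characters back into the original bytes.
    raw = garbled.encode(wrong, errors="replace")
    # Step 2, "open as 1251": read those same bytes as Cyrillic.
    return raw.decode(right, errors="replace")


if __name__ == "__main__":
    # Accented Latin letters like these are what pasted Cyrillic tends to look like.
    print(fix_mojibake("Ïðèâåò"))
```

Characters that have no place in the wrong code page are replaced with a question mark instead of raising an error, so a stray symbol does not stop the whole conversion.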

#10 aa - 03-22-2006, 01:24 AM

Thanks a lot, Mihai, it looks OK now.
I am going to use your technique to recover another 50 files like that.
When you talk about a US system and a Russian system, does it mean that
I can get this by setting English (US) and then Russian as the default in
Control Panel ---> Languages and Standards?
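
For a batch of about 50 files, the same recoding could be scripted instead of switching the system default back and forth. The Python sketch below is one possible approach, under stated assumptions: each document's text has been pasted into Notepad and saved as UTF-8 (saving as ANSI on a Russian system would likely lose the accented Latin characters), the files sit in one folder, and Windows-1251 is the real encoding. The folder names `pasted` and `recovered` and the function name are hypothetical.

```python
from pathlib import Path


def recover_folder(src_dir: Path, dst_dir: Path) -> list:
    """Recode every .txt file in src_dir and write UTF-8 copies to dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.txt")):
        # "utf-8-sig" drops the byte order mark Notepad puts at the start.
        garbled = src.read_text(encoding="utf-8-sig")
        # Back to the original bytes (1252), then read them as Cyrillic (1251).
        raw = garbled.encode("cp1252", errors="replace")
        fixed = raw.decode("cp1251", errors="replace")
        dst = dst_dir / src.name
        dst.write_text(fixed, encoding="utf-8")
        written.append(dst)
    return written


if __name__ == "__main__":
    # "pasted" and "recovered" are hypothetical folder names.
    for path in recover_folder(Path("pasted"), Path("recovered")):
        print("wrote", path)
```

The originals are left untouched and the results go to a separate folder, so a wrong guess about the code page can be retried without pasting everything again.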