HighTechTalks DotNet Forums  

Removing BOF from a utf8 file

Dotnet Internationalization microsoft.public.dotnet.internationalization


aa

Removing BOF from a utf8 file - 10-02-2004, 12:08 PM






I have a UTF-8 PHP file, edited with Notepad (W2K), which suddenly started
showing a gap at the top of the page in IE6.
Viewing the source via View Source shows a square at the beginning of
the file, which, I guess, is the BOM.
How do I remove it?



Jochen Kalmbach

Re: Removing BOF from a utf8 file - 10-02-2004, 12:23 PM






aa wrote:

Quote:
I have a UTF-8 PHP file, edited with Notepad (W2K), which suddenly
started showing a gap at the top of the page in IE6.
Viewing the source via View Source shows a square at the
beginning of the file, which, I guess, is the BOM.
How do I remove it?

Open it in Notepad and save it as "ANSI".

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/


aa

Re: Removing BOF from a utf8 file - 10-02-2004, 12:58 PM



Quote:
Open it in Notepad and save it as "ANSI".

Then I will lose all the non-ANSI data?



Jochen Kalmbach

Re: Removing BOF from a utf8 file - 10-02-2004, 01:07 PM



aa wrote:

Quote:
Open it in Notepad and save it as "ANSI".

Then I will lose all the non-ANSI data?

Yes.

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
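
To illustrate the data loss confirmed above, here is a minimal Python sketch. It assumes "ANSI" means Windows-1252 (the actual ANSI code page depends on the system locale), and the sample string is hypothetical. It only approximates what an editor does when saving to a code page that lacks some of the characters.

```python
# Assumption: "ANSI" is Windows-1252 here; the real code page depends on the locale.
text = "café Привет"  # hypothetical sample: one Latin-1 letter plus six Cyrillic letters

# Characters that exist in Windows-1252 survive; all others are replaced with "?".
ansi_bytes = text.encode("cp1252", errors="replace")
round_trip = ansi_bytes.decode("cp1252")

print(round_trip)
```

The "é" survives because Windows-1252 contains it, while the Cyrillic letters do not come back after the round trip.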


Michael (michka) Kaplan [MS]

Re: Removing BOF from a utf8 file - 10-02-2004, 02:11 PM



The BOM is not visible in Internet Explorer whenever either of the following holds:

a) IE recognizes the file format (which is to say, usually), or

b) the code point is in the font as a ZERO WIDTH NO-BREAK SPACE (which is,
again, to say usually)

You can try right-clicking on the page and checking the encoding in the
[unlikely] event that neither (a) nor (b) is true.


--
MichKa [MS]
NLS Collation/Locale/Keyboard Development
Globalization Infrastructure and Font Technologies
Windows International Division

This posting is provided "AS IS" with
no warranties, and confers no rights.






Joerg Jooss

Re: Removing BOF from a utf8 file - 10-02-2004, 03:10 PM



"aa" <aa (AT) virgin (DOT) net> wrote in message
news:O9X$LnJqEHA.1576 (AT) TK2MSFTNGP12 (DOT) phx.gbl...
Quote:
I have a UTF-8 PHP file, edited with Notepad (W2K), which suddenly started
showing a gap at the top of the page in IE6.
Viewing the source via View Source shows a square at the beginning of
the file, which, I guess, is the BOM.
How do I remove it?

Certain editors, such as SciTE, let you save UTF-8 files either with or
without a BOM.

Cheers,

--
Joerg Jooss
joerg.jooss (AT) gmx (DOT) net
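
If no BOM-aware editor is at hand, the BOM can also be removed at the byte level, which leaves the rest of the UTF-8 data untouched. The following is a minimal Python sketch under that assumption; `strip_utf8_bom` is a hypothetical helper name, and the demo file and its content are made up for illustration.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file, if present.

    Returns True if a BOM was removed, False if the file had none.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".php")
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"<?php echo 'hi'; ?>")

removed = strip_utf8_bom(demo_path)
with open(demo_path, "rb") as f:
    cleaned = f.read()
os.remove(demo_path)

print(removed, cleaned)
```

Working on raw bytes avoids any re-encoding step, so non-ASCII characters in the file are not at risk.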




aa

Re: Removing BOF from a utf8 file - 10-03-2004, 06:27 AM



Thanks,
right-clicking --> Encoding shows Unicode (UTF-8).
However, the file in question is a PHP file which includes another UTF-8 PHP
file at the very beginning, using the PHP include statement.
That second file, I guess, has its own BOM, which ends up somewhere after
the first BOM, and that might render it visible in the browser.
In a non-Unicode text editor this shows up as ï»¿

Some time ago I ran across a similar problem with ASP, but I cannot remember how
I got around it.







Michael (michka) Kaplan [MS]

Re: Removing BOF from a utf8 file - 10-03-2004, 09:37 PM



Those are the bytes of a BOM -- and that is what they would look like if the file were not
detected as UTF-8 (which is not an issue in IE, by your own admission).

If you are combining files in something and nothing is removing the
superfluous BOM, then make sure you view it with a font that recognizes it as
a ZERO WIDTH NO-BREAK SPACE. I know that it can read Unicode in UTF-8 (since
you say there are many international characters in the file?).

In other words, nothing you have described so far should cause a problem.
Eventually you may need to ask the question in a more relevant forum for
the responsible technology (PHP?).


--
MichKa [MS]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure and Font Technologies
Windows International Division

This posting is provided "AS IS" with
no warranties, and confers no rights.
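
The two points discussed here (what the BOM bytes look like when not read as UTF-8, and what happens when files that each carry a BOM are combined) can be sketched in Python. This is an illustration under assumptions: Windows-1252 stands in for the "non-Unicode" editor, and the byte strings are hypothetical stand-ins for the including and included files.

```python
import codecs

bom = codecs.BOM_UTF8  # the three bytes EF BB BF

# Read as Windows-1252, the three bytes become three visible characters.
as_cp1252 = bom.decode("cp1252")

# Read as UTF-8, they are the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

# Hypothetical output of a page whose first action is to include another
# file, where both files were saved with a BOM.
included = bom + b"<p>included</p>"
page = bom + included + b"<p>main</p>"

# A BOM-aware decoder ("utf-8-sig") strips only the first BOM;
# the second one stays in the text as U+FEFF.
decoded = page.decode("utf-8-sig")

print(repr(as_cp1252), repr(as_utf8), repr(decoded))
```

Whether the leftover U+FEFF is visible then depends on the renderer and font, as described above.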











Jeremy Pullicino

Re: Removing BOF from a utf8 file - 10-11-2004, 07:45 AM



Is the BOM the special hex bytes found at the beginning of UTF-8 files when
saved with Notepad?

Are UTF-8 text files with no BOM valid UTF-8? If so, how can my application
detect whether a file is in UTF-8 or ANSI?

Jeremy.





Joerg Jooss

Re: Removing BOF from a utf8 file - 10-11-2004, 04:25 PM



Jeremy Pullicino wrote:
Quote:
Is the BOM the special hex numbers found at the beginning of UTF-8
files when saved with Notepad?

Yes. Notepad always prepends the BOM.

Quote:
Are UTF-8 text files with no BOM valid UTF-8?

Yes. A UTF-8 BOM is optional.

Quote:
If so, how can my
application detect that a file is in UTF-8 or ANSI?

That's impossible. Even a BOM is a valid (though most likely meaningless)
character sequence in ANSI (and the next question would be: what's ANSI?
Windows-1252? Windows-1250?).

Cheers,

--
Joerg Jooss
www.joergjooss.de
news (AT) joergjooss (DOT) de
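
Certainty is out of reach, as noted above, but an application can still make an educated guess. The following is a minimal Python sketch of one common heuristic, not a definitive detector: `guess_encoding` is a hypothetical helper, and the `cp1252` fallback is an assumption about what "ANSI" means on a given system.

```python
import codecs


def guess_encoding(data, fallback="cp1252"):
    """Guess the encoding of raw bytes. The result is a guess, not a proof.

    1. A leading UTF-8 BOM suggests UTF-8 with signature.
    2. Bytes that decode cleanly as strict UTF-8 are assumed to be UTF-8.
    3. Anything else is assumed to be the (assumed) ANSI code page.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")  # strict mode raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


with_bom = guess_encoding(codecs.BOM_UTF8 + b"abc")
utf8_no_bom = guess_encoding("café".encode("utf-8"))
ansi = guess_encoding("café".encode("cp1252"))
ascii_only = guess_encoding(b"plain text")

print(with_bom, utf8_no_bom, ansi, ascii_only)
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII decodes identically either way. An ANSI file whose bytes happen to form valid UTF-8 would be misidentified, which is the ambiguity described in the reply above.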



