
brizrobbo
New User
Apr 28, 2009, 11:42 PM
Help with weird windows 1252 character
Hi Folks,

I'm writing a routine that will automatically replace any "known" Windows-1252 characters with an equivalent HTML-encoded character (using a mapping I have specified myself). I thought I had it nailed, until I parsed all of our existing HTML pages (thousands, spanning 10 years of development). Then I came across a weird phenomenon: a character that "seems" to be an e acute, but I don't really know what it is!

On our Sun box, it shows up in a PuTTY terminal as:

    <h3>Communiqué</h3>

If I "vi" it, it shows up like this:

    <h3>Communiqu\303\251</h3>

In my Windows text editor (TextPad, file encoding is UTF-8), it shows up like this:

    <h3>Communiqué</h3>

And if I run it through a Perl script using Devel::Peek, I get this information for "two" characters:

    SV = PVIV(0x238efc) at 0x18ebab0
      REFCNT = 2
      FLAGS = (IOK,POK,pIOK,pPOK)
      IV = 195
      PV = 0x1909694 "195"\0
      CUR = 3
      LEN = 4
    SV = PVIV(0x238f0c) at 0x18ebabc
      REFCNT = 2
      FLAGS = (IOK,POK,pIOK,pPOK)
      IV = 169
      PV = 0x190a7b4 "169"\0
      CUR = 3
      LEN = 4

That's it! Two characters for what I thought was one Windows-1252 e acute. I admit that my character encoding knowledge is rudimentary, but I don't understand this at all. Is it possible for one character to be represented by two characters? What am I missing?

Any pointers are gratefully appreciated.
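
For reference, the two values in the dump can be checked with a short sketch. It is written in Python purely for illustration (the script in the post is Perl), the variable names are made up, and it assumes that 195 and 169 are the raw byte values read from the file. Under that assumption, it compares how the same two bytes decode as UTF-8 and as Windows-1252:

```python
# Byte values taken from the Devel::Peek dump (IV = 195 and IV = 169).
# Assumption: these are the raw bytes as stored in the HTML file.
raw = bytes([195, 169])

# The same two bytes written in octal, as vi displayed them: \303\251.
assert raw == b"\303\251"

# Decoded as UTF-8, the pair forms a single character: U+00E9 (e acute).
as_utf8 = raw.decode("utf-8")

# Decoded as Windows-1252, each byte becomes its own character
# (U+00C3 and U+00A9), which is why two "characters" appear.
as_cp1252 = raw.decode("cp1252")

# In Windows-1252 itself, e acute is the single byte 0xE9 (233).
cp1252_bytes = "\u00e9".encode("cp1252")

print(len(as_utf8), hex(ord(as_utf8)))
print(len(as_cp1252), [hex(ord(c)) for c in as_cp1252])
print(list(cp1252_bytes))
```

If the assumption holds, this would suggest the file is stored as UTF-8 rather than Windows-1252, so the two values are the two bytes of one e acute and not two separate Windows-1252 characters.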