Help with weird Windows-1252 character

 



brizrobbo
New User

Apr 28, 2009, 11:42 PM


Hi Folks,

I'm writing a routine that automatically replaces any "known" Windows-1252 characters with an equivalent HTML-encoded character (from a mapping I have specified myself). I thought I had it nailed until I parsed all of our existing HTML pages (thousands, spanning 10 years of development). Then I came across a weird phenomenon: a character that "seems" to be an e acute, but I don't really know what it is!
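A minimal sketch of the kind of routine described above, assuming the input is a raw byte string read from a Windows-1252 file. The table `%CP1252_TO_ENTITY` and the sub `replace_cp1252` are hypothetical names, and the mapping lists only a handful of entries for illustration; it is not the poster's actual code.

```perl
use strict;
use warnings;

# Hypothetical lookup table: Windows-1252 byte value => HTML entity.
# Only a few entries are shown; a real table would be much longer.
my %CP1252_TO_ENTITY = (
    0x91 => '&lsquo;',
    0x92 => '&rsquo;',
    0x93 => '&ldquo;',
    0x94 => '&rdquo;',
    0x96 => '&ndash;',
    0x97 => '&mdash;',
    0xE9 => '&eacute;',
);

# Replace every high byte that has a table entry; leave all others untouched.
sub replace_cp1252 {
    my ($bytes) = @_;
    $bytes =~ s{([\x80-\xFF])}{
        my $code = ord($1);
        exists $CP1252_TO_ENTITY{$code} ? $CP1252_TO_ENTITY{$code} : $1;
    }ge;
    return $bytes;
}

print replace_cp1252("Communiqu\xE9"), "\n";   # Communiqu&eacute;
```

A byte-by-byte lookup like this only works if the file really is Windows-1252. In a UTF-8 file the same e acute is stored as two bytes, neither of which is 0xE9, so the routine would not recognise it.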

On our Sun box, it shows up in a PuTTY terminal as:

<h3>Communiqué</h3>

If I "vi" it, it shows up like this:

<h3>Communiqu\303\251</h3>
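The `\303\251` that vi displays is a pair of octal byte escapes. As a small check, the sketch below converts them to decimal and hex using Perl's built-in `oct`; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The octal escapes exactly as vi displayed them.
my @octal = ('303', '251');

# oct() interprets a string of digits as an octal number.
my @decimal = map { oct($_) } @octal;
my @hex     = map { sprintf('%02X', $_) } @decimal;

print "decimal: @decimal\n";   # decimal: 195 169
print "hex:     @hex\n";       # hex:     C3 A9
```

So vi is showing two bytes, 0xC3 and 0xA9, where a Windows-1252 file would hold the single byte 0xE9 for an e acute.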

In my Windows text editor (TextPad, file encoding is UTF-8), it shows up like this:

<h3>Communiqué</h3>

And if I run it through a Perl script using Devel::Peek, I get this information for "two" characters:

SV = PVIV(0x238efc) at 0x18ebab0
  REFCNT = 2
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 195
  PV = 0x1909694 "195"\0
  CUR = 3
  LEN = 4
SV = PVIV(0x238f0c) at 0x18ebabc
  REFCNT = 2
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 169
  PV = 0x190a7b4 "169"\0
  CUR = 3
  LEN = 4

That's it! Two characters for what I thought was one Windows-1252 e acute.

I admit my character encoding knowledge is rudimentary, but I don't understand this at all. Is it possible for one character to be represented by two characters? What am I missing?
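One way to explore the question is to decode the two observed bytes (195 and 169, i.e. 0xC3 0xA9) under both encodings with the core `Encode` module. This is a sketch assuming the file is stored as UTF-8 rather than Windows-1252, which is what the vi and Devel::Peek output suggest; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes from the file, as reported by vi and Devel::Peek.
my $bytes = "\xC3\xA9";

# Interpreted as UTF-8, the two bytes form a single character.
my $as_utf8 = decode('UTF-8', $bytes);

# Interpreted as Windows-1252, each byte is a character of its own.
my $as_cp1252 = decode('cp1252', $bytes);

printf "UTF-8:  %d character(s): %s\n",
    length($as_utf8),
    join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $as_utf8));
printf "cp1252: %d character(s): %s\n",
    length($as_cp1252),
    join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $as_cp1252));

# UTF-8:  1 character(s): U+00E9
# cp1252: 2 character(s): U+00C3 U+00A9
```

U+00E9 is the e acute. In UTF-8 every character above U+007F is stored as two or more bytes, so one character showing up as two bytes is expected for a UTF-8 file. If the pages are a mix of encodings, decoding each file with the correct encoding before doing any replacement is likely to be more reliable than matching raw bytes.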

Any pointers are greatly appreciated.

 
 

