(cgi) UNICODE postings

 



yapp
User

Jul 5, 2002, 3:59 AM

Post #1 of 8 (1763 views)

Hello,

On my forum, I use a module that escapes characters into HTML escape sequences. For example, < becomes &lt;. Besides the usual characters, I also convert all other non-ASCII characters, such as é and ñ.

This approach clashes with Unicode language sets, like Russian and Arabic. Those languages use two bytes per character, and Perl's s/// only checks one byte at a time.

What can I do about this? Limit the replacement (not my favourite idea) to &, < and >? Or is there a way to detect that someone posted in Unicode, so my s/// can handle it?

As an attachment, I've included the file that converts the characters. (Windows users: use WordPad to read the file, since it uses Unix 'lf' line endings.)
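
The first option raised above (escaping only &, < and >) can be sketched as follows. This is a minimal illustration, not the attached EscapeASCII.pm: the name `escape_minimal` is hypothetical, and the input is assumed to be raw UTF-8 bytes straight from the form. Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte-wise s/// that only looks for these three ASCII characters cannot cut a character in half.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the attached module):
# escapes only the three characters that are special in HTML text.
sub escape_minimal {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $text =~ s/([&<>])/$entity{$1}/g;
    return $text;
}

# "\xD0\x96" is the UTF-8 byte sequence for the Cyrillic letter Zhe.
my $post = "\xD0\x96 <B>bold</B> & more";

# The two Cyrillic bytes pass through untouched; only the markup is escaped.
print escape_minimal($post), "\n";
```

The trade-off is that the page must then declare the same character encoding the visitor's browser used when submitting, because the non-ASCII bytes are sent back exactly as received.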

Yet Another Perl Programmer

_________________________________
~~> www.codingdomain.com (http://www.codingdomain.com) <~~
More than 3500 X-Forum downloads: http://www.codingdomain.com/cgi-perl/downloads/x-forum
Attachments: EscapeASCII.pm (4.31 KB)


davorg
Thaumaturge / Moderator

Jul 5, 2002, 5:06 AM

Post #2 of 8 (1757 views)
Re: [yapp] (cgi) UNICODE postings

s/// should handle Unicode characters ok, but you might need to put "use utf8" at the top of your program.

Might be worth taking a look at perldoc perlunicode and perldoc utf8.
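
For s/// to work on characters rather than bytes, the input generally has to be decoded into a character string first. The sketch below assumes the form data arrives as UTF-8 bytes and that the core Encode module is available (it ships with Perl 5.8 and later); the helper name `escape_non_ascii` is hypothetical and only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turns every non-ASCII character into a numeric entity.
sub escape_non_ascii {
    my ($chars) = @_;
    $chars =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $chars;
}

my $bytes = "\xD0\x96\xD0\xB8";        # UTF-8 bytes for two Cyrillic letters
my $chars = decode('UTF-8', $bytes);   # now a string of two characters

print length($bytes), " bytes, ", length($chars), " characters\n";  # 4 bytes, 2 characters
print escape_non_ascii($chars), "\n";                               # &#1046;&#1080;
```

After decoding, the regex sees each Cyrillic letter as one character, so the substitution never splits a multi-byte sequence.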

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


yapp
User

Jul 6, 2002, 2:22 AM

Post #3 of 8 (1755 views)
Re: [davorg] (cgi) UNICODE postings

Thank you for your quick response.

I wonder, is there any way to see a user posted in utf8?

I have a strong feeling that 'use utf8' only works on newer Perl versions, and forces the use of utf8. My program also runs on Perl 5.005_03. And not all users will post Unicode messages...

You explained that s/// should handle this, but apparently it doesn't. Maybe the source of my module (attached to the first post) could explain the source of this problem.
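
One possible reading of the question above is "can the script tell whether the submitted bytes are valid UTF-8?". A minimal sketch of such a check follows. It assumes the core Encode module, which is not available on Perl 5.005_03, and the function name `looks_like_utf8` is hypothetical. Note that plain ASCII is also valid UTF-8, and that a short Latin-1 string can occasionally pass by accident, so this is a heuristic rather than proof.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the byte string decodes cleanly as UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() may modify its argument when a CHECK value is given
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

print looks_like_utf8("\xD0\x96 hello")   ? "utf8\n" : "not utf8\n";  # Cyrillic Zhe as UTF-8
print looks_like_utf8("caf\xE9 au lait") ? "utf8\n" : "not utf8\n";  # Latin-1 e-acute
```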


(This post was edited by yapp on Jul 6, 2002, 2:24 AM)


davorg
Thaumaturge / Moderator

Jul 6, 2002, 7:13 AM

Post #4 of 8 (1750 views)
Re: [yapp] (cgi) UNICODE postings


In Reply To
I wonder, is there any way to see a user posted in utf8?


I'm sorry, but I'm not entirely sure what you mean by that question.


In Reply To
I have a strong feeling that 'use utf8' does only work on new perl versions, and forces the use of utf8. My program also runs at Perl 5.005_03.


utf8 was included in Perl 5.005_03 - see http://www.perldoc.com/perl5.005_03/lib/utf8.html.


In Reply To
And, not all users will post unicode messages...


Ah, but that's the joy of UTF8. The standard ASCII character set is a subset of UTF8, so if you're using UTF8 and your users don't post multi-byte characters then it makes no difference to the way your program works.
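
The subset property can be checked directly. The sketch below assumes the core Encode module (Perl 5.8 and later) and uses made-up sample strings: an ASCII-only string encodes to exactly the same bytes, while other characters take two or more bytes each.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $ascii = "Hello, forum!";
my $same  = encode('UTF-8', $ascii) eq $ascii;
print $same ? "ASCII text is unchanged\n" : "ASCII text changed\n";

# Non-ASCII characters need more than one byte, and not always two.
my $zhe  = "\x{0416}";   # Cyrillic capital Zhe
my $euro = "\x{20AC}";   # Euro sign
print length(encode('UTF-8', $zhe)),  "\n";   # 2
print length(encode('UTF-8', $euro)), "\n";   # 3
```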


In Reply To
You explained that s/// should handle this, but maybe the source of my module (attached with first post) could explain the source of this problem.. apparently, it doesn't do that.


Your code makes no difference. I'm pretty sure it should all just work if you use the utf8 module.



yapp
User

Jul 7, 2002, 1:55 AM

Post #5 of 8 (1747 views)
Re: [davorg] (cgi) UNICODE postings

Oh boy! I'll check this out ASAP. Thanks!



yapp
User

Jul 8, 2002, 2:48 PM

Post #6 of 8 (1742 views)
Re: [davorg] (cgi) UNICODE postings

ehhh...

Didn't really work.

I get an error message instead.

Please check it yourself: (open the attachment at post 1, and include the use utf8)

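Here is some sample test code:

Code
 
use HTML::EscapeASCII;   # the module attached to post 1

# Sample input: a few non-ASCII characters, markup, and an existing entity
my $message = "aaaa é ¥ ñ <B>bbbb</B> &amp; hallo...";

# Escapes $message in place
FormatFieldHTML($message);

print $message;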


Honestly, I find this FormatFieldHTML of mine very ugly too. I created it a year ago, when I was a very happy VB6 programmer.


(This post was edited by yapp on Jul 8, 2002, 2:49 PM)


davorg
Thaumaturge / Moderator

Jul 9, 2002, 1:27 AM

Post #7 of 8 (1738 views)
Re: [yapp] (cgi) UNICODE postings


In Reply To
ehhh...

Didn't really work

I'll get an error message instead.


Would be useful if you told me what error message you got :)


In Reply To
Please check it yourself: (open the attachment at post 1, and include the use utf8)

Here is some sample text code:

Code
 
use HTML::EscapeASCII;

my $message = "aaaa é ¥ ñ <B>bbbb</B> &amp; hallo...";

FormatFieldHTML($message);

print $message;



Tried that. Seemed to work ok for me. The output I got was:

Code
aaaa &eacute; &yen; &ntilde; &lt;B&gt;bbbb&lt;/B&gt; &amp;amp; hallo...


What should I be expecting to see?

Isn't there something on the CPAN that already does this? HTML::Entities, perhaps.
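
As a rough illustration of the HTML::Entities suggestion, here is a minimal sketch. It assumes the module is installed from CPAN (it is not part of core Perl). The sample string mirrors the test message above, with the non-ASCII characters written as hex escapes; that those were the characters in the original input is an assumption based on the output shown.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# e-acute, yen sign and n-tilde written as escapes so the source stays pure ASCII
my $message = "aaaa \x{E9} \x{A5} \x{F1} <B>bbbb</B> &amp; hallo...";

# Default behaviour: escapes markup characters and all non-ASCII characters.
my $full = encode_entities($message);
print "$full\n";

# Restricted behaviour: the second argument lists the only characters to escape.
my $minimal = encode_entities("caf\x{E9} <B>", '<>&');
print length($minimal), "\n";
```

The second form corresponds to limiting the replacement to &, < and >, which leaves all other characters exactly as they were submitted.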



yapp
User

Aug 20, 2002, 1:47 PM

Post #8 of 8 (1722 views)
Re: [davorg] (cgi) UNICODE postings

Oh boy... I completely forgot to reply to this topic. I'll spare you the details.

This is the error message I get:

Malformed UTF-8 character (unexpected continuation byte 0xa2) in regexp compilation at x-modules/HTML/EscapeASCII.pm line 160.

Could you still help me out with this problem? I run with the -w switch and "use strict" BTW.
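
A plausible explanation for this message, offered as a guess from the error text alone: 'use utf8' tells Perl that the source file itself is UTF-8, so a literal high byte such as 0xA2 (the cent sign in Latin-1) typed directly into a regex is no longer a valid character. One way around that is to keep the source pure ASCII and write such characters as hex escapes. The table and the name `escape_latin1` below are hypothetical, not taken from EscapeASCII.pm.

```perl
use strict;
use warnings;

# Hypothetical excerpt of an entity table, keyed by hex escapes
# instead of literal high-bit characters in the source file.
my %entity = (
    "\xA2" => '&cent;',
    "\xA5" => '&yen;',
    "\xE9" => '&eacute;',
    "\xF1" => '&ntilde;',
);

sub escape_latin1 {
    my ($text) = @_;
    $text =~ s/([\xA2\xA5\xE9\xF1])/$entity{$1}/g;
    return $text;
}

print escape_latin1("caf\xE9 costs 50\xA2"), "\n";   # caf&eacute; costs 50&cent;
```

This sketch treats its input as one character per byte (Latin-1). Raw UTF-8 bytes would need to be decoded into characters first, since values like 0xA2 also occur as continuation bytes inside multi-byte sequences.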


(This post was edited by yapp on Aug 20, 2002, 1:50 PM)

 
 

