disallowing certain strings inside expressions

 



LKM
Novice

Apr 25, 2002, 5:49 AM

Post #1 of 5 (5192 views)
disallowing certain strings inside expressions

I'm trying to remove some objects from a PDF, namely those containing the phrase '/Type /Catalog'. All objects start with two numbers and 'obj' and end with 'endobj'. I tried this expression:

$text =~ s/\d+\D\D?\d+\D\D?obj.*?\/Type.\/Catalog.*?endobj//sg;

The problem is that if it doesn't find '/Type /Catalog' in one object, it goes right ahead and looks at the next object, jumping over the first object's 'endobj'. If it then finds '/Type /Catalog' followed by 'endobj' there, it deletes both objects. So what I think I need is some way to tell it to reject any match that has 'endobj' between 'obj' and '/Type /Catalog'. Is there any way to write a regular expression that explicitly disallows some string from appearing inside something like '.*'?

Any help is greatly appreciated.
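
One common way to say "any characters, but never the string endobj" is to guard each character with a negative lookahead: (?:(?!endobj).)*? in place of .*?. The sketch below is only an illustration under assumptions, not a tested PDF tool: the sample data and the sub name strip_catalog are made up, and \s+ is assumed as the separator between the two object numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every object that contains /Type /Catalog.
sub strip_catalog {
    my ($text) = @_;
    $text =~ s{
        \d+ \s+ \d+ \s+ obj      # object header such as "1 0 obj" (assumed layout)
        (?: (?!endobj) . )*?     # any characters, but never step onto "endobj"
        /Type \s* /Catalog       # the phrase that marks the object for removal
        .*? endobj               # rest of this same object
    }{}gsx;
    return $text;
}

my $pdf = "1 0 obj << /Type /Page >> endobj\n"
        . "2 0 obj << /Foo /Bar /Type /Catalog >> endobj\n"
        . "3 0 obj << /Type /Font >> endobj\n";

print strip_catalog($pdf);   # objects 1 and 3 survive, object 2 is gone
```

Because the lookahead is checked before every character, the middle part of the pattern can never run past the end of the object in which the match started.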


fashimpaur
User

Apr 25, 2002, 9:13 AM

Post #2 of 5 (5189 views)
Re: [LKM] disallowing certain strings inside expressions

LKM,

Try this:


Code
  


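use English;    # provides $MATCH as a readable alias for $&

# Two objects back to back; only the first contains /Type /Catalog.
my $string = "34objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj33obj123ZXQT /Foo /Bar PPGHREW123endobj";
# [^\/]* matches a run of non-slash characters, so /Type /Catalog must be
# the first thing starting with a slash after "obj".
$string =~ s/\d{2}obj[^\/]*(\/Type \/Catalog){1}(.*?)endobj//gs;
print "\$MATCH: $MATCH\n";    # the text of the last successful match
print $string;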

See if this does what you expect.

Good Luck,
Dennis

$a="c323745335d3221214b364d545".
"a362532582521254c3640504c3729".
"2f493759214b3635554c3040606a0",
print unpack"u*",pack "h*",$a,"\n\n";


LKM
Novice

Apr 25, 2002, 9:41 AM

Post #3 of 5 (5185 views)
Re: [fashimpaur] disallowing certain strings inside expressions

I haven't been able to test it yet, but I *think* that won't do the trick. The problem occurs when the /Type /Catalog is in the second object, so I think your code will start in the first object, run over its endobj, go through the second object, find /Type /Catalog there, find the endobj and delete both objects. I'm not sure what [^\/]* does, though, so I might be wrong. I'll try it out tomorrow when I get to work.

Thanks for your help!
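
For anyone else unsure about [^\/]*: it is a negated character class, matching any run of characters other than a slash. A small sketch of where it stops, using strings modeled on the test data in this thread (the sub name run_after_obj is made up for the example):

```perl
use strict;
use warnings;

# Hypothetical helper: return what [^\/]* consumes right after the first "obj".
sub run_after_obj {
    my ($text) = @_;
    return $text =~ /obj([^\/]*)/ ? $1 : undef;
}

# The run ends at the first slash, whatever follows it.
print "<", run_after_obj("34objGGTIDWWED/Type /Catalog xyz endobj"), ">\n";  # <GGTIDWWED>
print "<", run_after_obj("33obj123ZXQT /Foo /Bar endobj"), ">\n";            # <123ZXQT >
```

So obj[^\/]*\/Type \/Catalog can only succeed when /Type /Catalog is the first slash-prefixed name after obj. As long as every object contains at least one slash, the match cannot run on into a later object.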


fashimpaur
User

Apr 25, 2002, 10:09 AM

Post #4 of 5 (5183 views)
Re: [LKM] disallowing certain strings inside expressions

LKM,

Here is a more advanced test:


Code
  

use English;
my $string = "34objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj33obj123ZXQT /Foo /Bar PPGHREW123endobj".
"32objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj31obj123ZXQT /Foo /Bar PPGHREW123endobj".
"30objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj29obj123ZXQT /Foo /Bar PPGHREW123endobj".
"28objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj27obj123ZXQT /Foo /Bar PPGHREW123endobj".
"26objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj25obj123ZXQT /Foo /Bar PPGHREW123endobj".
"24objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj23obj123ZXQT /Foo /Bar PPGHREW123endobj".
"22objGGTIDWWED/Type /Catalog 23ddswwe3389tendobj21obj123ZXQT /Foo /Bar PPGHREW123endobj";

$string =~ s/\d{2}obj[^\/]*(\/Type \/Catalog){1}(.*?)endobj//gs;
print join("endobj\n", (split ("endobj", $string)));
print "endobj";

print "\n\nReal \$string Value: <".$string.">";



...and this was the result:


Quote


33obj123ZXQT /Foo /Bar PPGHREW123endobj
31obj123ZXQT /Foo /Bar PPGHREW123endobj
29obj123ZXQT /Foo /Bar PPGHREW123endobj
27obj123ZXQT /Foo /Bar PPGHREW123endobj
25obj123ZXQT /Foo /Bar PPGHREW123endobj
23obj123ZXQT /Foo /Bar PPGHREW123endobj
21obj123ZXQT /Foo /Bar PPGHREW123endobj

Real $string Value: <33obj123ZXQT /Foo /Bar PPGHREW123endobj31obj123ZXQT /Foo /Bar PPGHREW123endobj29obj123ZXQT /Foo /Bar PPGHREW123endobj27obj123ZXQT /Foo /Bar PPGHREW123endobj25obj123ZXQT /Foo /Bar PPGHREW123endobj23obj123ZXQT /Foo /Bar PPGHREW123endobj21obj123ZXQT /Foo /Bar PPGHREW123endobj>


Please ignore how I split it to print it neatly. This was just to show that this does in fact work for embedded /Type /Catalog objects as well as leading ones. So, when you get to work tomorrow, you can be more excited that you are one step closer to solving your programming task.

Glad to help,

Dennis



LKM
Novice

Apr 25, 2002, 10:27 AM

Post #5 of 5 (5180 views)
Re: [fashimpaur] disallowing certain strings inside expressions

Yeah, you're right! I still don't understand *why* it's working though. I've tried it using BBEdit, and I've found another problem: If the object looks like this:

obj GGTIDWWED/Foo /Bar /Type /Catalog /Foo /Bar 23ddswwe3389tendobj

it doesn't get found. I think I understand your regular expression now: you allow any character except /. That doesn't work for this example though, since there might be a / inside the string before /Type /Catalog.

I think I'll have to do it using two expressions, first to get the object and then to look inside it. Thanks anyway!

Lucas
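
The two-expression idea can be sketched roughly like this: an outer substitution isolates one object at a time, and an inner match decides whether to drop it. This is only an illustration under assumptions: the sub name drop_catalog_objects is made up, and the object header is assumed to be two digits directly followed by obj, as in the test strings in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: step 1 isolates each object, step 2 looks inside it.
sub drop_catalog_objects {
    my ($text) = @_;
    # .*? is lazy, so each match ends at the first "endobj" after its "obj".
    $text =~ s{(\d{2}obj.*?endobj)}{
        my $object = $1;                               # one complete object
        $object =~ m{/Type /Catalog} ? '' : $object;   # drop it or keep it as is
    }gse;
    return $text;
}

# The problem case from above (with an assumed object number 34 in front),
# followed by an object that should be kept.
my $objects = "34obj GGTIDWWED/Foo /Bar /Type /Catalog /Foo /Bar 23ddswwe3389tendobj"
            . "33obj123ZXQT /Foo /Bar PPGHREW123endobj";

print drop_catalog_objects($objects), "\n";   # 33obj123ZXQT /Foo /Bar PPGHREW123endobj
```

Since the inner match only ever sees a single object, it does not matter which other names come before /Type /Catalog inside it.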
