This is annoying me. I'm sure I'm f@#ing it up, please correct me.

 



rmarc
New User

Apr 4, 2014, 10:07 PM

Post #1 of 6 (9575 views)


Code
cat /tmp/stuff | perl -e '
my %urlhash;
while (<STDIN>) {
    my ($url) = $_ =~ /https?:\/\/(\S+)\//;
    print "$url\n";
    if ($urlhash{$url}) {
        $urlhash{$url}++;
    } else {
        $urlhash{$url} = 1;
    }
} '

ib.adnxs.com/ttj?ttjb=1&bdref=http%3A%2F%2Fnym1.b.adnxs.com%2Fif%3Fenc%3DuB6F61G4nj-4HoXrUbieP39qvHSTGMQ_uB6F61G4nj-4HoXrUbieP4lItL9PFAlrSeCjkBlH1Hsauj5TAAAAAFyWHwBAAgAAQAIAAAIAAABnK78AFycFAAAAAQBVU0QAVVNEAKAAWAImEwAADJ4AAgQCAQIAAIwAeyTbagAAAAA.%26cnd%3D%25217iWHmAjcib0BEOfW_AUYACCXzhQwADimphhABEjABFDcrH5YAGDJBGgAcAB4AIABAIgBAJABAZgBAaABAagBA7ABALkBuB6F61G4nj_BAbgehetRuJ4_yQEltGaNUfL7P9kBAAAAAAAA8D_gAQD1AQAAAAA.%26ccd%3D%2521rwYRPwjcib0BEOfW_AUYl84UIAQ.%26udj%3Duf%2528%2527a%2527%252C%2B54560%252C%2B1396619802%2529%253Buf%2528%2527r%2527%252C%2B12528487%252C%2B1396619802%2529%253B%26vpid%3D78%26apid%3D223406%26referrer%3Dhttp%253A%252F%252Fwww.chancese.com%252Findex.php%253Foption%253Dcom_content%2526view%253Darticle%2526id%253D1367%253AThe-Disadvantages-of-Working-as-a-Team%2526catid%253D27%2526Itemid%253D67%26ct%3D0%26dlo%3D1&id=2263489&cb=[CACHEBUSTER]&pubclick=http://nym1.b.adnxs.com/click?uB6F61G4nj-4HoXrUbieP39qvHSTGMQ_uB6F61G4nj-4HoXrUbieP4lItL9PFAlrSeCjkBlH1Hsauj5TAAAAAFyWHwBAAgAAQAIAAAIAAABnK78AFycFAAAAAQBVU0QAVVNEAKAAWAImEwAADJ4DAQQCAQIAAIwAfSQAawAAAAA./cnd=%21rwYRPwjcib0BEOfW_AUYl84UIAQ./referrer=http%3A%2F%2Fwww.chancese.com%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D1367%3AThe-Disadvantages-of-Working-as-a-Team%26catid%3D27%26Itemid%3D67


It works fine if I do this:

Code
cat /tmp/stuff | perl -e '
my %urlhash;
while (<STDIN>) {
    my ($url) = $_ =~ /https?:\/\/(\S+)\/t/;
    print "$url\n";
    if ($urlhash{$url}) {
        $urlhash{$url}++;
    } else {
        $urlhash{$url} = 1;
    }
} '
ib.adnxs.com

Here’s /tmp/stuff:


Code
173.234.12.237 - - [04/Apr/2014:08:56:58 -0500] "GET http://ib.adnxs.com/ttj?ttjb=1&bdref=http%3A%2F%2Fnym1.b.adnxs.com%2Fif%3Fenc%3DuB6F61G4nj-4HoXrUbieP39qvHSTGMQ_uB6F61G4nj-4HoXrUbieP4lItL9PFAlrSeCjkBlH1Hsauj5TAAAAAFyWHwBAAgAAQAIAAAIAAABnK78AFycFAAAAAQBVU0QAVVNEAKAAWAImEwAADJ4AAgQCAQIAAIwAeyTbagAAAAA.%26cnd%3D%25217iWHmAjcib0BEOfW_AUYACCXzhQwADimphhABEjABFDcrH5YAGDJBGgAcAB4AIABAIgBAJABAZgBAaABAagBA7ABALkBuB6F61G4nj_BAbgehetRuJ4_yQEltGaNUfL7P9kBAAAAAAAA8D_gAQD1AQAAAAA.%26ccd%3D%2521rwYRPwjcib0BEOfW_AUYl84UIAQ.%26udj%3Duf%2528%2527a%2527%252C%2B54560%252C%2B1396619802%2529%253Buf%2528%2527r%2527%252C%2B12528487%252C%2B1396619802%2529%253B%26vpid%3D78%26apid%3D223406%26referrer%3Dhttp%253A%252F%252Fwww.chancese.com%252Findex.php%253Foption%253Dcom_content%2526view%253Darticle%2526id%253D1367%253AThe-Disadvantages-of-Working-as-a-Team%2526catid%253D27%2526Itemid%253D67%26ct%3D0%26dlo%3D1&id=2263489&cb=[CACHEBUSTER]&pubclick=http://nym1.b.adnxs.com/click?uB6F61G4nj-4HoXrUbieP39qvHSTGMQ_uB6F61G4nj-4HoXrUbieP4lItL9PFAlrSeCjkBlH1Hsauj5TAAAAAFyWHwBAAgAAQAIAAAIAAABnK78AFycFAAAAAQBVU0QAVVNEAKAAWAImEwAADJ4DAQQCAQIAAIwAfSQAawAAAAA./cnd=%21rwYRPwjcib0BEOfW_AUYl84UIAQ./referrer=http%3A%2F%2Fwww.chancese.com%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D1367%3AThe-Disadvantages-of-Working-as-a-Team%26catid%3D27%26Itemid%3D67/clickenc= HTTP/1.0" 200 1458 
"http://nym1.b.adnxs.com/if?enc=uB6F61G4nj-4HoXrUbieP39qvHSTGMQ_uB6F61G4nj-4HoXrUbieP4lItL9PFAlrSeCjkBlH1Hsauj5TAAAAAFyWHwBAAgAAQAIAAAIAAABnK78AFycFAAAAAQBVU0QAVVNEAKAAWAImEwAADJ4AAgQCAQIAAIwAeyTbagAAAAA.&cnd=%217iWHmAjcib0BEOfW_AUYACCXzhQwADimphhABEjABFDcrH5YAGDJBGgAcAB4AIABAIgBAJABAZgBAaABAagBA7ABALkBuB6F61G4nj_BAbgehetRuJ4_yQEltGaNUfL7P9kBAAAAAAAA8D_gAQD1AQAAAAA.&ccd=%21rwYRPwjcib0BEOfW_AUYl84UIAQ.&udj=uf%28%27a%27%2C+54560%2C+1396619802%29%3Buf%28%27r%27%2C+12528487%2C+1396619802%29%3B&vpid=78&apid=223406&referrer=http%3A%2F%2Fwww.chancese.com%2Findex.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D1367%3AThe-Disadvantages-of-Working-as-a-Team%26catid%3D27%26Itemid%3D67&ct=0&dlo=1" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; FunWebProducts; .NET CLR 1.1.4322)"



(This post was edited by Laurent_R on Apr 5, 2014, 8:27 AM)


Laurent_R
Veteran / Moderator

Apr 5, 2014, 8:22 AM

Post #2 of 6 (9555 views)
Re: [rmarc] This is annoying me. I'm sure I'm f@#ing it up, please correct me.

Hi,
I have heavily edited your post to add code tags, line breaks and indentation in the code, to try to make it more readable. Please use code tags next time you post here.

Even after reformatting the post, I still don't understand what you want. What is your problem? What is your question?


rmarc
New User

Apr 5, 2014, 8:30 AM

Post #3 of 6 (9547 views)
Re: [Laurent_R] This is annoying me. I'm sure I'm f@#ing it up, please correct me.

I'm just trying to parse an Apache log. In this case, I'm trying to extract the hostname from the GET request, if there is one, and then count how often each hostname occurs.

It works for many lines, but not for all of them, and I'm confused as to why. It seems that it's not matching the slash in cases like the one I posted.
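
For the counting step on its own, here is a minimal sketch. It assumes the hostnames have already been extracted; the @hosts list is made up for illustration, with undef standing in for a line where the regex found no URL. Perl creates a missing hash key on the first ++, so the if/else around the counter is not strictly needed.

```perl
use strict;
use warnings;

# Hypothetical hostnames, as if already extracted from log lines;
# undef stands for a line where the regex did not match.
my @hosts = ('ib.adnxs.com', undef, 'nym1.b.adnxs.com', 'ib.adnxs.com');

my %count;
for my $host (@hosts) {
    next unless defined $host;   # skip lines without a URL
    $count{$host}++;             # a missing key starts out undef; ++ makes it 1
}

# Most frequent hosts first, ties broken alphabetically.
for my $host (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$count{$host} $host\n";
}
```

Skipping undefined values also avoids the "uninitialized value" warnings that a failed match would otherwise trigger under warnings.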

R. Marc


BillKSmith
Veteran

Apr 5, 2014, 8:32 AM

Post #4 of 6 (9546 views)
Re: [rmarc] This is annoying me. I'm sure I'm f@#ing it up, please correct me.

I suspect that your problem is the greedy match: \S+ also matches slashes, so it runs on to the last slash in the URL. Try the non-greedy form:

Code
my ($url) = $_ =~ /https?:\/\/(\S+?)\//;
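
To see the difference side by side, here is a minimal sketch using a shortened, made-up log line (the URL is a stand-in, not the real data from /tmp/stuff). The patterns are the same two as above, only written with m{...} delimiters so the slashes need no escaping.

```perl
use strict;
use warnings;

# Hypothetical, shortened log line; the real one in /tmp/stuff is much longer.
my $line = '1.2.3.4 - - [04/Apr/2014:08:56:58 -0500] '
         . '"GET http://ib.adnxs.com/ttj?pubclick=http://nym1.b.adnxs.com/click/x= HTTP/1.0" 200 1458';

# Greedy: \S+ takes as much as it can, then backtracks only until a slash
# fits, so the capture ends at the LAST slash of the whitespace-free run.
my ($greedy) = $line =~ m{https?://(\S+)/};

# Non-greedy: \S+? takes as little as it can, so the capture ends at the
# FIRST slash after the hostname.
my ($lazy) = $line =~ m{https?://(\S+?)/};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With this sample line, the greedy capture is ib.adnxs.com/ttj?pubclick=http://nym1.b.adnxs.com/click and the non-greedy capture is ib.adnxs.com.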

Good Luck,
Bill


rmarc
New User

Apr 5, 2014, 8:37 AM

Post #5 of 6 (9544 views)
Re: [BillKSmith] This is annoying me. I'm sure I'm f@#ing it up, please correct me.

Beautiful.

I would have thought an explicit match would override the greed, but I'm for what works.
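
For what it's worth, the explicit \/ does constrain the match, just not to the first slash: a greedy \S+ happily matches slashes too, so it only gives characters back until the last slash fits. Another way to pin the capture to the hostname, sketched here under the assumption that a hostname never contains a slash, is to exclude slashes from the capture itself. The sample request is made up.

```perl
use strict;
use warnings;

# Hypothetical sample request, not taken from /tmp/stuff.
my $request = 'GET http://example.com/a/b/c?next=http://other.example/d HTTP/1.0';

# [^/\s]+ matches neither slashes nor whitespace, so the capture cannot
# run past the end of the hostname, greedy or not.
my ($host) = $request =~ m{https?://([^/\s]+)};

print "$host\n";
```

This variant also matches URLs that have no slash after the hostname. Note that a port number, if present, would be part of the capture.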

Thanks.

R. Marc


FishMonger
Veteran / Moderator

Apr 5, 2014, 9:16 AM

Post #6 of 6 (9543 views)
Re: [rmarc] This is annoying me. I'm sure I'm f@#ing it up, please correct me.


Quote
I'm just trying to parse an apache log.


Rather than rolling your own parser, why not use one that has been tested by tens of thousands of people?

Apache::Log::Parser - Parser for Apache Log (common, combined, and any other custom styles by LogFormat).
http://search.cpan.org/~tagomoris/Apache-Log-Parser-0.02/lib/Apache/Log/Parser.pm

 
 

