Help with Regex

 



CodingNub
New User

Oct 20, 2013, 11:38 AM

Post #1 of 12 (1061 views)
Help with Regex

Hello folks,

I need help with a regex. I may also be approaching the task entirely wrong, so I will describe what I am trying to accomplish and what I have come up with so far:

I am running a command on a server, and the output of this command has two numbers in it. I need to grab these two numbers and just print them out. This sounded simple at first, but once I realized I may be too much of a novice for it, I decided to get some help!

When you run the command in an ssh session on the server itself, the output looks like this:

Unnamed[1] Unnamed[2]
--------------------------------- ---------------------------------
3639090.0 2837835.73400000000


Here is the code I have come up with so far, minus the regex I believe I need to finish this task. The first number in the output is the total disk space and the second number is the space currently in use:


Code
#!/usr/bin/perl 

use strict;
use warnings;
use List::Util qw(max);

my $rundsm = `dsmadmc -id=reports -pa=reports "SELECT Sum(VOLUMES.EST_CAPACITY_MB), Sum(VOLUMES.EST_CAPACITY_MB*volumes.pct_utilized/100) FROM VOLUMES VOLUMES WHERE (VOLUMES.DEVCLASS_NAME='DISK')" | grep " "`;
my @totalspace = $rundsm =~ #regex to pull first number
my @diskused = $rundsm =~ #regex to pull second number

print "Message: Total Disk Space available\n",
"Statistic: @totalspace\n";

print "Message: Actual Disk in Use\n",
"Statistic: @diskused\n";

exit 0;


Any help you folks could offer to get this accomplished would be awesome!


BillKSmith
Veteran

Oct 20, 2013, 1:08 PM

Post #2 of 12 (1055 views)
Re: [CodingNub] Help with Regex

You do not seem to know the meaning of the $ or @ at the beginning of a Perl variable. Please read the Perl documentation 'perldata'. (At your command line, type perldoc perldata)
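To make the point about sigils concrete, here is a small sketch. The sample string and the variable names are made up for illustration; in your script the text would come from dsmadmc. A match in list context returns its captures, so an array (@) collects all of them, while scalars ($) inside parentheses receive one capture each.

Code
```perl
use strict;
use warnings;

# Hypothetical sample line; the real text would come from dsmadmc.
my $line = "3639090.0 2837835.73400000000";

# @ declares an array: it receives every capture from the match.
my @captures = $line =~ /(\S+)\s+(\S+)/;

# $ declares a scalar: the parentheses on the left keep the match in
# list context, so each scalar receives one capture.
my ($total, $used) = $line =~ /(\S+)\s+(\S+)/;

# Without the parentheses the match runs in scalar context and only
# reports success (1) or failure (empty string).
my $matched = $line =~ /(\S+)\s+(\S+)/;

print "array:   @captures\n";
print "scalars: $total and $used\n";
print "matched: $matched\n";
```

Since you expect exactly one value for each statistic, a pair of scalars is probably the better fit than two arrays.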

Your external function 'dsmadmc' appears to be a database interface. You probably can (and should) do the whole job in perl using the module DBI as your db interface.

Numbers can come in a surprisingly large number of formats. The safest way to match all numbers and nothing else is to use the module Regexp::Common.

It is actually easier to match both your numbers with one Regex than it is to do them separately.

I divided up your long command line to fit the page. The string sent to the shell should be exactly the same as yours.


Code
#!/usr/bin/perl 
use strict;
use warnings;
#use List::Util qw(max);
use Regexp::Common;
use Readonly;
Readonly::Scalar my $NUMBER => qr/$RE{num}{real}/;
Readonly::Scalar my $SELECT => q["SELECT]
.q[ Sum(VOLUMES.EST_CAPACITY_MB),]
.q[ Sum(VOLUMES.EST_CAPACITY_MB*volumes.pct_utilized/100)]
.q[ FROM VOLUMES VOLUMES WHERE (VOLUMES.DEVCLASS_NAME='DISK')"]
;
my $rundsm = `dsmadmc -id=reports -pa=reports $SELECT | grep " "`;

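# Require only whitespace between the two numbers; a greedy .+ would
# let the second capture shrink to a single digit.
my ($totalspace, $diskused) = $rundsm =~ /($NUMBER)\s+($NUMBER)/;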
print "Message: Total Disk Space available\n", "Statistic: $totalspace\n";
print "Message: Actual Disk in Use\n", "Statistic: $diskused\n";
exit 0;

Good Luck,
Bill


CodingNub
New User

Oct 20, 2013, 1:42 PM

Post #3 of 12 (1053 views)
Re: [BillKSmith] Help with Regex

BillKSmith,

Thank you so much for the help! I will definitely keep learning more and read up on perldata! I recently purchased the O'Reilly book to help me learn Perl! I am still pretty green.

When I tried to use your code I got this error on my AIX system:


Can't locate Regexp/Common.pm in @INC (@INC contains: /usr/opt/perl5/lib/5.8.8/aix-thread-multi /usr/opt/perl5/lib/5.8.8 /usr/opt/perl5/lib/site_perl/5.8.8/aix-thread-multi /usr/opt/perl5/lib/site_perl/5.8.8 /usr/opt/perl5/lib/site_perl .) at ./disk_usage.pl line 5.
BEGIN failed--compilation aborted at ./disk_usage.pl line 5.


In the environment I work in, we are not allowed to change anything on these machines, or I could lose my job. Since I cannot install the Regexp::Common module, how else could this be done with default modules?


BillKSmith
Veteran

Oct 20, 2013, 3:31 PM

Post #4 of 12 (1048 views)
Re: [CodingNub] Help with Regex

If we can assume that the numbers always conform to the format in your example you can use:

Code
Readonly::Scalar my $NUMBER => qr/\d+\.\d+/;


You will have to add code to handle special cases as they arise. Can the numbers be negative? No digits before the decimal point? After? No decimal point at all? Very large (or small) numbers in floating point? One of the two numbers missing? Both missing? I doubt that this list is complete.
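As a rough sketch of how several of those cases could be covered with core Perl only, the pattern below accepts an optional sign, a missing integer or fractional part, and an optional exponent. The pattern and the helper name looks_like_my_number are my own illustration, not something taken from a module, and it is surely not a complete answer to every number format either.

Code
```perl
use strict;
use warnings;

# Optional sign, then either digits with an optional fraction or a
# bare fraction such as ".5", then an optional exponent.
my $NUMBER = qr/[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?/;

# Returns 1 when the whole string is one number in that format.
sub looks_like_my_number {
    my ($text) = @_;
    return $text =~ /\A$NUMBER\z/ ? 1 : 0;
}

for my $sample ('3639090.0', '-12', '.5', '7.', '1.5e6', 'abc', '1.2.3') {
    printf "%-10s %s\n", $sample,
        looks_like_my_number($sample) ? 'ok' : 'rejected';
}
```

The missing-number cases are a separate matter: check whether the match succeeded before using the captured values.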
Good Luck,
Bill


Kenosis
User

Oct 20, 2013, 4:19 PM

Post #5 of 12 (1046 views)
Re: [CodingNub] Help with Regex

Given your output, the following regex will capture both values:

Code
use strict; 
use warnings;

my $rundsm = <<END;
Unnamed[1] Unnamed[2]
--------------------------------- ---------------------------------
3639090.0 2837835.73400000000
END

my ( $totalspace, $diskused ) = $rundsm =~ /(.+)\s+(\S+)$/;
print "totalspace: $totalspace\ndiskused: $diskused";

Output:

Code
totalspace: 3639090.0 
diskused: 2837835.73400000000

Hope this helps!


CodingNub
New User

Oct 21, 2013, 7:37 AM

Post #6 of 12 (1032 views)
Re: [Kenosis] Help with Regex

Kenosis,

Given the restrictions of my AIX system, your example seems to work best, because we do not have the Readonly module installed either.

I do have a quick question. When I run it on my server against our live output, the numbers don't actually start at the beginning of a new line, as my poorly formatted output makes it look.

The first number starts underneath the top output and about 20 spaces to the right. Then the second number comes right after. When I run your code, I get the second number in the variable I would expect, but the first variable is a zero. Since I am very new to regex, I assume it's because it's trying to grab the beginning of a line? If not, maybe just a small change needs to be made?


Kenosis
User

Oct 22, 2013, 9:08 AM

Post #7 of 12 (1021 views)
Re: [CodingNub] Help with Regex

Hi CodingNub.

You said,

Quote
The first number starts underneath the top output and about 20 spaces to the right. Then the second number comes right after.


My apologies, but this is difficult for me to imagine, and it would be best to see the actual output, so the regex could be adjusted.

Can you either paste that text into a code block or attach it in a text file? Either would be helpful, in this case.


BillKSmith
Veteran

Oct 22, 2013, 10:06 AM

Post #8 of 12 (1017 views)
Re: [CodingNub] Help with Regex

Readonly is never required. It is only a tool that makes the intent clearer to human readers. It also detects errors that attempt to change the value of the protected variable.

It now sounds like you intend to apply the regex to multiple lines. You expect it to fail on all but the right one. False positive matches can be worse than no match. The design of a regex depends very much on how it is used. The more we know, the more we can help.
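One way to guard against false positives, sketched under the assumption that the line of interest contains exactly two numbers and nothing else, is to test each line separately with a fully anchored regex. The sub name find_number_pair and the sample text are made up for this illustration.

Code
```perl
use strict;
use warnings;

my $NUMBER = qr/\d+(?:\.\d+)?/;

# Returns the two numbers from the first line that holds exactly two
# numbers and nothing else, or an empty list if no line qualifies.
sub find_number_pair {
    my ($output) = @_;
    for my $line (split /\n/, $output) {
        if ($line =~ /\A\s*($NUMBER)\s+($NUMBER)\s*\z/) {
            return ($1, $2);
        }
    }
    return;
}

# Made-up sample that mimics the layout posted in this thread.
my $sample = join "\n",
    '                       Unnamed[1]                  Unnamed[2]',
    '--------------------------------- ---------------------------------',
    '                        3639090.0         2811992.81100000000',
    '';

my ($totalspace, $diskused) = find_number_pair($sample);
if (defined $totalspace) {
    print "Total: $totalspace\nUsed: $diskused\n";
}
else {
    print "No line with exactly two numbers was found\n";
}
```

Because the header and the dashed line cannot satisfy the anchored pattern, they are skipped instead of producing a wrong match.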
Good Luck,
Bill


CodingNub
New User

Oct 22, 2013, 10:50 AM

Post #9 of 12 (1011 views)
Re: [Kenosis] Help with Regex


Code
                       Unnamed[1]                            Unnamed[2] 
--------------------------------- ---------------------------------
3639090.0 2811992.81100000000


That is the output of the command. If I was unclear before, I need a regex to grab both numbers and print them out whenever I run this script.

When I run this code:


Code
#!/usr/bin/perl  
use strict;
use warnings;

my $rundsm = `dsmadmc -id=reports -pa=reports "SELECT Sum(VOLUMES.EST_CAPACITY_MB), Sum(VOLUMES.EST_CAPACITY_MB*volumes.pct_utilized/100) FROM VOLUMES VOLUMES WHERE (VOLUMES.DEVCLASS_NAME='DISK')" | grep " "`;
my ($totalspace, $diskused) = $rundsm =~ /(.)\s+(\S+)$/;
print "Message: Total Disk Space available\n", "Statistic: $totalspace\n";
print "Message: Actual Disk in Use\n", "Statistic: $diskused\n";
exit 0;


It gives me the second number, but the first number is a zero. Also, when I tried Bill's previous method, it errored on Readonly and Regexp::Common, as the machine has neither installed and I cannot install anything. I am hoping that, given the output, a slight change to the regex will make it work as intended!

Thanks for all the help Ken and Bill!


Kenosis
User

Oct 22, 2013, 1:53 PM

Post #10 of 12 (1004 views)
Re: [CodingNub] Help with Regex

Try the following, 'more forgiving' regex:

Code
/(\S+)\s+(\S+)\s*$/
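A likely reason for the zero: the script in post #9 uses (.) where the regex in post #5 had (.+). A lone dot captures exactly one character, which here would be the final 0 of 3639090.0. The sketch below compares the two on a sample line; the amount of spacing in the sample is an assumption.

Code
```perl
use strict;
use warnings;

# Sample data line; the exact spacing is an assumption.
my $line = "                        3639090.0        2811992.81100000000\n";

# (.) captures a single character: the one just before the whitespace.
my ($one_char, $second_a) = $line =~ /(.)\s+(\S+)$/;

# (\S+) captures the whole run of non-whitespace characters, and \s*
# tolerates trailing blanks before the end of the string.
my ($whole, $second_b) = $line =~ /(\S+)\s+(\S+)\s*$/;

print "with (.):   '$one_char' and '$second_a'\n";
print "with (\\S+): '$whole' and '$second_b'\n";
```

Using \S+ for both captures also means leading spaces on the line never end up in the first value.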



FishMonger
Veteran / Moderator

Oct 22, 2013, 2:57 PM

Post #11 of 12 (996 views)
Re: [Kenosis] Help with Regex

Another option would be to use the split function and load the data into an array instead of the scalar.


Code
#!/usr/bin/perl 

use strict;
use warnings;
use Data::Dumper;

my @rundsm = <DATA>; # simulates the OP's dsmadmc call
my ($totalspace, $diskused) = (split(/\s+/, $rundsm[-1]))[-2,-1];

print Dumper ($totalspace, $diskused);


# data for the simulated dsmadmc call
__DATA__
Unnamed[1] Unnamed[2]
--------------------------------- ---------------------------------
3639090.0 2811992.81100000000


output:
$VAR1 = '3639090.0';
$VAR2 = '2811992.81100000000';


If you wanted to use your original $rundsm var instead of the array I used, you would need to use an additional split() function.
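A minimal sketch of that two-step split, assuming the numbers sit on the last non-blank line of the output. The sub name last_two_fields and the stand-in string are only for illustration.

Code
```perl
use strict;
use warnings;

# Splits the captured output into lines, then takes the last two
# whitespace-separated fields of the final non-blank line.
sub last_two_fields {
    my ($output) = @_;
    my @lines = grep { /\S/ } split /\n/, $output;
    return unless @lines;
    return (split ' ', $lines[-1])[-2, -1];
}

# Stand-in for the string returned by the backticks.
my $rundsm = "Unnamed[1] Unnamed[2]\n"
           . "--------- ---------\n"
           . "   3639090.0   2811992.81100000000\n";

my ($totalspace, $diskused) = last_two_fields($rundsm);
print "totalspace: $totalspace\ndiskused: $diskused\n";
```

The special pattern ' ' makes split ignore leading whitespace, so indented output does not produce an empty first field.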


BillKSmith
Veteran

Oct 22, 2013, 3:28 PM

Post #12 of 12 (994 views)
Re: [CodingNub] Help with Regex

I already told you Readonly is not necessary! The Regexp::Common module will match almost any valid number format. This power may not be needed. My second regex will match the numbers in your example. You can generalize it in any of several ways if necessary.

Code
#!/usr/bin/perl 
use strict;
use warnings;
my $NUMBER = qr/\d+\.\d+/;
my $SELECT = q["SELECT]
.q[ Sum(VOLUMES.EST_CAPACITY_MB),]
.q[ Sum(VOLUMES.EST_CAPACITY_MB*volumes.pct_utilized/100)]
.q[ FROM VOLUMES VOLUMES WHERE (VOLUMES.DEVCLASS_NAME='DISK')"]
;
#my $rundsm = `dsmadmc -id=reports -pa=reports $SELECT | grep " "`;
my $rundsm =

" Unnamed[1] "
." Unnamed[2]\n"
." ---------------------------------"
."---------------------------------\n"
." 3639090.0"
." 2811992.81100000000\n"
;

my ($totalspace, $diskused) = $rundsm =~ /($NUMBER).+?($NUMBER)/;
print "Message: Total Disk Space available\n", "Statistic: $totalspace\n";
print "Message: Actual Disk in Use\n", "Statistic: $diskused\n";
exit 0;

Good Luck,
Bill

 
 

