split string into dates

 



alferic
Novice

Apr 7, 2014, 12:59 AM

Post #1 of 7 (11539 views)
split string into dates

Hello Gurus,

I need your valuable ideas on how to resolve my regex issue.

I have date strings from an Excel file, which may be in any of the following formats:

3122014_50437_AM ##read as Mar 12, 2014 5:04:37 AM
112014_112532_PM ##read as Jan 1,2014 11:25:32 PM
12152013_63524_PM ##read as Dec 15, 2014 6:35:24 PM

I want the Perl script to split each such string into pieces so that I can pass them to timegm(). I tried the regex below, but it does not handle every combination of date strings.

Here's a portion of my script.

#the $mydate variable may contain any of the above date string combinations.

my ($mmddyyyy, $hhmmss, $ampm) = split /\_/, $mydate;

my ($mm, $dd, $yy) = $mmddyyyy =~ /\b(\d{2})(\d{2})(\d{4})\b/;

my ($hr, $min, $ss) = $hhmmss =~ /\b(\d{2})(\d{2})(\d{2})\b/;

print "$mm/$dd/$yy $hr:$min:$ss\n";

$datetime = timegm($ss, $min, $hr, $dd, $mm-1, $yy);

print "$datetime\n";


(This post was edited by alferic on Apr 7, 2014, 1:04 AM)


BillKSmith
Veteran

Apr 7, 2014, 7:24 AM

Post #2 of 7 (11530 views)
Re: [alferic] split string into dates [In reply to]

Either your examples are not correct or you have not given enough to infer the format correctly. None of your examples use a zero in the month or the day field. If this were always true, we could not parse 1212014_.... There is no way to know if this means Jan 21, 2014 or Dec 1, 2014. Also note that example 3 appears to have an error in the year.
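To make the ambiguity concrete, here is a minimal sketch that lists every way an unpadded date string can be read. It assumes the layout is month, day, four-digit year with no zero padding, and the helper name date_readings is made up for this illustration.

```perl
use strict;
use warnings;

# Return every (month, day, year) reading of an unpadded MDYYYY digit string.
# The last four digits are taken as the year; the rest is split every possible way.
sub date_readings {
    my ($digits) = @_;
    my ($md, $year) = $digits =~ /^(\d{2,4})(\d{4})$/ or return;
    my @readings;
    for my $len (1 .. length($md) - 1) {
        my $month = substr $md, 0, $len;
        my $day   = substr $md, $len;
        next if $month =~ /^0/ || $day =~ /^0/;   # assumption: no zero padding
        next if $month > 12 || $day > 31;
        push @readings, [ $month, $day, $year ];
    }
    return @readings;
}

for my $s (qw(1212014 3122014 12152013)) {
    my @r = date_readings($s);
    printf "%-9s %d reading(s): %s\n", $s, scalar @r,
        join ' or ', map { "$_->[0]/$_->[1]/$_->[2]" } @r;
}
```

Under these assumptions 1212014 has two readings (1/21/2014 and 12/1/2014), while the two date parts taken from the examples each have exactly one.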

Your regex assumes that every day and month is written with two digits. It will not match your first two examples.

I do not currently have access to Excel. Please verify that all your examples are correct and add the two dates above (Jan 21, 2014 and Dec 1, 2014).
Good Luck,
Bill


BillKSmith
Veteran

Apr 7, 2014, 10:09 AM

Post #3 of 7 (11524 views)
Re: [alferic] split string into dates [In reply to]

The following code should match all valid dates. (Ambiguities are resolved arbitrarily by the order in which the regex tries its alternatives: when both readings are valid, the single-digit month wins.) It correctly fails to match most invalid dates. (It assumes that all months have 31 days, so it will incorrectly match patterns such as Feb 31.)


Code
use strict; 
use warnings;
use Time::Local qw(timegm);
my @given = (
'3122014_50437_AM', ##read as Mar 12, 2014 5:04:37 AM
'112014_112532_PM', ##read as Jan 1,2014 11:25:32 PM
'12152013_63524_PM', ##read as Dec 15, 2014 6:35:24 PM
);
my @Month = qw(xxx Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $YR = qr/19[7-9][0-9]|20[0-9][0-9]/;
my $MO = qr/[1-9]|1[0-2]/;
my $DY = qr/[1-9]|[12][0-9]|3[01]/;
my $HR = qr/0?[1-9]|1[0-2]/;
my $MN = qr/0?[0-9]|[1-5][0-9]/; # 00 is a valid minute
my $SC = qr/0?[0-9]|[1-5][0-9]/; # 00 is a valid second
my $AP = qr/[AP]M/i;


foreach (@given) {
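# Copy the captures into lexicals at once: a later successful match would reset $1..$7.
if (my ($mo, $dy, $yr, $hr, $min, $ss, $ap) = /($MO)($DY)($YR)_($HR)($MN)($SC)_($AP)/) {
printf "%3s %2d, %4d %2d:%02d:%02d %2s\n", $Month[$mo], $dy, $yr, $hr, $min, $ss, $ap;
# 12-hour to 24-hour clock: 12 AM is hour 0, 12 PM is hour 12.
$hr = $hr % 12 + (uc $ap eq 'PM' ? 12 : 0);
# timegm takes a zero-based month and accepts a four-digit year.
my $datetime = timegm($ss, $min, $hr, $dy, $mo - 1, $yr);
print "$datetime\n";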
}
else{
warn "No date found in '$_'\n";
}
}
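The 31-day assumption could be tightened with a separate check after the regex has matched. The following is only a sketch of one way to do that; the helper names is_leap_year and is_valid_day are hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Days per month for a non-leap year; index 0 is a placeholder so that
# month numbers 1..12 can be used directly.
my @DAYS_IN_MONTH = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

sub is_leap_year {
    my ($year) = @_;
    return ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
}

# Hypothetical helper: true if the day exists in that month and year.
sub is_valid_day {
    my ($month, $day, $year) = @_;
    return 0 if $month < 1 || $month > 12 || $day < 1;
    my $max = $DAYS_IN_MONTH[$month];
    $max++ if $month == 2 && is_leap_year($year);
    return $day <= $max ? 1 : 0;
}

for my $case ([2, 31, 2014], [2, 29, 2012], [2, 29, 2014]) {
    my ($m, $d, $y) = @$case;
    printf "%d/%d/%d is %s\n", $m, $d, $y,
        is_valid_day($m, $d, $y) ? 'valid' : 'invalid';
}
```

Calling such a check with the captured month, day, and year before timegm would let the script reject Feb 31 with its own warning.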


Update: Fixed code error unrelated to regex.
Good Luck,
Bill

(This post was edited by BillKSmith on Apr 8, 2014, 4:25 AM)


Laurent_R
Veteran / Moderator

Apr 7, 2014, 10:12 AM

Post #4 of 7 (11524 views)
Re: [alferic] split string into dates [In reply to]

In addition to the ambiguities pointed out by Bill, your post does not seem to describe any problem or ask any question.


alferic
Novice

Apr 8, 2014, 1:43 AM

Post #5 of 7 (11477 views)
Re: [BillKSmith] split string into dates [In reply to]

Hi Bill,

Thanks for your solution. I appreciate it. I believe your code should work well, but I was able to find a workaround last night. I used the repetition quantifier {n,m}. Here is what I did:

my ($mm, $dd, $yy) = $mmddyyyy =~ /\b(\d{1,2})(\d{0,1})(\d{4})\b/;

my ($hr, $min, $ss) = $hhmmss =~ /\b(\d{1,2})(\d{2})(\d{2})\b/;

I will apply your code in case I encounter issues with the combinations
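For reference, a small sketch that runs the date pattern above against the date part of the three example strings. The wrapper name workaround_split is made up for this illustration.

```perl
use strict;
use warnings;

# The {n,m} workaround pattern from the post, applied to the date part only.
sub workaround_split {
    my ($mmddyyyy) = @_;
    my @parts = $mmddyyyy =~ /\b(\d{1,2})(\d{0,1})(\d{4})\b/;
    return @parts;   # empty list when the pattern does not match
}

for my $s (qw(3122014 112014 12152013)) {
    my @p = workaround_split($s);
    if (@p) {
        print "$s -> month='$p[0]' day='$p[1]' year='$p[2]'\n";
    }
    else {
        print "$s -> no match\n";
    }
}
```

The pattern can consume at most seven digits, so the eight-digit 12152013 does not match at all. The greedy first group also turns 3122014 into month 31, day 2, and leaves 112014 with month 11 and an empty day.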


alferic
Novice

Apr 8, 2014, 1:49 AM

Post #6 of 7 (11474 views)
Re: [alferic] split string into dates [In reply to]

Bill,

My workaround does not seem to work after a few combinations.

Your code seems to work.


BillKSmith
Veteran

Apr 8, 2014, 9:55 AM

Post #7 of 7 (11250 views)
Re: [alferic] split string into dates [In reply to]

You do not seem to understand. You cannot truly test either program without an accurate set of input data. Prepare test cases by making an Excel spreadsheet that outputs a sample of every special case that you can think of. (As a minimum, I would use the first and last day of every month. Include a February for both a leap and a non-leap year. Also add at least one pair of dates which could be ambiguous.)
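The real samples have to come from Excel, but the expected strings can be drafted in Perl for comparison. This is a rough sketch that assumes the layout is unpadded month, day, and hour with two-digit minutes and seconds, as the three posted examples suggest; make_case is a hypothetical helper.

```perl
use strict;
use warnings;

my @DAYS = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Build a test string in the assumed layout M D YYYY _ H MM SS _ AM|PM,
# with month, day and hour unpadded and minute and second zero-padded.
sub make_case {
    my ($month, $day, $year, $hour24, $min, $sec) = @_;
    my $ampm   = $hour24 < 12 ? 'AM' : 'PM';
    my $hour12 = $hour24 % 12 || 12;
    return sprintf '%d%d%d_%d%02d%02d_%s',
        $month, $day, $year, $hour12, $min, $sec, $ampm;
}

my @cases;
for my $year (2012, 2014) {                          # one leap year, one non-leap year
    for my $month (1 .. 12) {
        my $last = $DAYS[$month];
        $last = 29 if $month == 2 && $year % 4 == 0; # sufficient for 2012 and 2014
        push @cases, make_case($month, 1,     $year, 5,  4,  37);
        push @cases, make_case($month, $last, $year, 23, 25, 32);
    }
}
# One ambiguous pair: Jan 21 and Dec 1 both start with 1212014.
push @cases, make_case(1, 21, 2014, 0, 0, 0), make_case(12, 1, 2014, 12, 0, 0);

print "$_\n" for @cases;
```

If the strings Excel produces differ from this list, the assumed layout is wrong and the regex needs to change with it.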

It is equally important that you reject invalid strings. (Note that your regex will match any string of five to eight digits and try to parse it. I bet that there are many of these in your spreadsheet that are not dates!)
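A short sketch of the difference, using only the date part of the string. The loose pattern is the {n,m} workaround posted earlier in the thread; the strict one is an anchored variant built from the same month, day, and year ranges as my code, and the two sub names are invented for this example.

```perl
use strict;
use warnings;

# Loose pattern from the earlier workaround: it only counts digits.
my $LOOSE = qr/\b(\d{1,2})(\d{0,1})(\d{4})\b/;

# Stricter sketch: anchored, with range-limited month, day and year.
my $STRICT = qr/^([1-9]|1[0-2])([1-9]|[12][0-9]|3[01])(19[7-9][0-9]|20[0-9][0-9])$/;

sub loose_accepts  { my ($s) = @_; return $s =~ $LOOSE  ? 1 : 0 }
sub strict_accepts { my ($s) = @_; return $s =~ $STRICT ? 1 : 0 }

for my $s (qw(99999 1234567 3122014)) {
    printf "%-8s loose=%-8s strict=%s\n", $s,
        (loose_accepts($s)  ? 'match' : 'no match'),
        (strict_accepts($s) ? 'match' : 'no match');
}
```

The loose pattern accepts all three strings, while the strict one accepts only 3122014.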

On the other hand, my regex matches only valid dates. I KNOW it is not 100% perfect, but probably good enough. Test it!

I can help with the testing, but only if you provide quality data.
Good Luck,
Bill

 
 

