Difficult Regular Expression, help needed.

 



erodriguez
New User

Feb 19, 2003, 5:12 PM

Post #1 of 6 (7591 views)

I am having trouble constructing a regular expression.
I am pulling articles out of a database and trying to strip unnecessary content from them.

Each content item has this format:

Display Title = CRLF
Subtitle = CRLF
Subtitle2 = CRLF
Subtitle3 = CRLF
Article content here...


As you can see, a display title and subtitles 1 through 3 are prepended to each article. There may or may not be text after each "=", but there are always carriage returns. I need to strip this title information from the main article content, and so far my efforts have fallen short. Part of my problem is that the main article content also contains CRLFs.

Any help is appreciated.

- Eric R.


davorg
Thaumaturge / Moderator

Feb 20, 2003, 1:51 AM

Post #2 of 6 (7580 views)
Re: [erodriguez] Difficult Regular Expression, help needed.

There are simpler approaches than using regular expressions.

Assuming that you have one of these records in $rec, you can use code like this to split it into its sections.

Code
my ($title, $sub1, $sub2, $sub3, $content)
    = split /\n/, $rec, 5;

You can then attack the various parts individually.
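
A minimal sketch of how the limit argument behaves, assuming each of the four header lines ends in exactly one newline. The record below is made up for illustration and is not real data:

```perl
use strict;
use warnings;

# Hypothetical record: four header lines, then content that itself has newlines.
my $rec = join "\n",
    'Display Title = Some title',
    'Subtitle = Some subtitle',
    'Subtitle2 =',
    'Subtitle3 =',
    "First paragraph.\nSecond paragraph.";

# The limit of 5 stops split after four cuts, so the fifth field
# keeps every newline that remains in the content.
my ($title, $sub1, $sub2, $sub3, $content) = split /\n/, $rec, 5;

print "$content\n";
```

If the header fields are separated by more than one newline, the field boundaries shift and this simple form no longer lines up.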

--
Dave Cross, Perl Hacker, Trainer and Writer
http://www.dave.org.uk/
Get more help at Perl Monks


erodriguez
New User

Feb 20, 2003, 6:28 PM

Post #3 of 6 (7573 views)
Re: [davorg] Difficult Regular Expression, help needed.

I thought about using split, but the content contains newlines also, and the header (title information) contains a variable number of newlines.

- Eric


(This post was edited by erodriguez on Feb 20, 2003, 6:34 PM)


davorg
Thaumaturge / Moderator

Feb 21, 2003, 1:45 AM

Post #4 of 6 (7566 views)
Re: [erodriguez] Difficult Regular Expression, help needed.


In Reply To
I thought about using split, but the content contains newlines also,

Yes, that's why I used the third parameter to split.


In Reply To
and the header (title information) contains a variable number of newlines.

That might be more of a problem. Perhaps you could show us some real sample data.



erodriguez
New User

Feb 22, 2003, 10:11 AM

Post #5 of 6 (7559 views)
Re: [davorg] Difficult Regular Expression, help needed.

Here is an example of some of the article content:

----------------------------------------------------------------------------

Display Title = Why Good Executives Get Fired

Subtitle = Career HQ: Online Resource

Subtitle2 =

Subtitle3 =

About three years ago, we began to observe two significant and disturbing trends. First, a number of association executives commonly accepted as successful, and in many cases officially recognized by their colleagues as models for the profession, were getting fired. At the same time, we noted that a number of associations, particularly large trade groups, were increasingly turning to their own subject experts for leadership, rather than hiring from the association management profession. It occurred to us that it would be important to find out why.

</P><P>

The conclusions we drew in this article are based on ongoing research for the Foundation. This research project seeks to identify the competencies executives will require in the 21st century, based on more than 600 case studies in the client files of our consulting practice. This research relies not only on opinion survey work but also upon expert observations about the facts of real situations. That perspective is important because while you may not like what you read, you cannot easily dismiss its truth.

more content...

----------------------------------------------------------------------------



As you can see, the content starts after the newlines that follow Subtitle3.



-Thanks...


(This post was edited by erodriguez on Feb 22, 2003, 10:12 AM)


kencl
User

Jul 12, 2004, 6:01 AM

Post #6 of 6 (7333 views)
Re: [erodriguez] Difficult Regular Expression, help needed.

# Limit of 2: a later "Subtitle3" in the article body will not cut the content short.
my (undef, $content) = split /Subtitle3[^\r\n]*[\r\n]+/, $rec, 2;
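
An alternative sketch, not taken from the posts in this thread: strip all four header lines with one anchored substitution. It assumes the record is in $rec, that it starts with the header labels in the order shown in post #5, and that each header line may be followed by any number of CR/LF characters. The sample string is abbreviated from that post, and the CRLF line endings in it are an assumption.

```perl
use strict;
use warnings;

# Abbreviated sample in the layout from post #5, with CRLF line endings
# and blank lines between header fields (assumed).
my $rec = "Display Title = Why Good Executives Get Fired\r\n\r\n"
        . "Subtitle = Career HQ: Online Resource\r\n\r\n"
        . "Subtitle2 =\r\n\r\n"
        . "Subtitle3 =\r\n\r\n"
        . "About three years ago, we began to observe two trends.\r\n\r\n"
        . "The conclusions we drew are based on ongoing research.\r\n";

# Each header is: label, optional text up to end of line, then one or
# more CR/LF characters. \A anchors the match to the start of the record,
# so a "Subtitle3" later in the article body is left alone.
(my $content = $rec) =~ s{
    \A \s*
    Display\ Title \s* = [^\r\n]* [\r\n]+
    Subtitle       \s* = [^\r\n]* [\r\n]+
    Subtitle2      \s* = [^\r\n]* [\r\n]+
    Subtitle3      \s* = [^\r\n]* [\r\n]+
}{}x;

print $content;
```

If a record does not begin with all four headers in this order, the substitution does nothing and $content is left equal to $rec, which makes malformed records easy to spot.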
__

>> If you can't control it, improve it, correlate it or disseminate it with PERL, it doesn't exist!

 
 

