string concatenation in Perl - a easy one

 



dilbert
User

Feb 16, 2011, 1:54 PM

Post #1 of 7 (1009 views)

hello dear friends

many many thanks for running this great site - i love it!

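Code
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;

my $te = HTML::TableExtract->new;

# getstore returns the HTTP status code (always true), so test it with is_success
my $status = getstore('http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=100&s=2750', 'temp.html');
is_success($status) or die "Unable to get page: $status";

$te->parse_file('temp.html');

my ($table) = $te->tables;
defined $table or die 'No table found in page';

for my $row ( $table->rows ) {
    cleanup(@$row);
    print "@$row\n";
}

# collapse runs of whitespace in each cell (modifies the cells in place)
sub cleanup {
    for ( @_ ) {
        next unless defined;
        s/\s+/ /g;
    }
}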


LWP is just great, it has many powerful options. One question regarding the spider logic. See this page:

http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=20

It shows the following:

Treffer 1 - 20 von insgesamt 6150
> that means: hits 1 to 20 of 6150 in total

Here is my question: how can I make the script fetch all (!) the pages, beginning with the first page and ending with the last one?

See the range:

http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=20&s=0 -> the first 20 results
http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=20&s=6140 -> the last page of results

How do I put this into the GET argument so that the whole range is covered?


Code
from.... ?q=e&a=20&s=0  
to ... ?q=e&a=20&s=6140


Well, this is a question of string concatenation.

In PHP we can do something like the following for a similar task:


Code
$number_array = array("123", "43567", "9287", "3323");
for ($i = 0; $i < count($number_array); $i++) {

$new_url = $orig_url . $number_array[$i];

/* do something with the new url */
}


How can this be done in Perl for the example mentioned above?


Code
from.... ?q=e&a=20&s=0  
to ... ?q=e&a=20&s=6140





and here see some results:


Quote
lfd. Nr. Schul- nummer Schulname Stra�e PLZ Ort Telefon Fax Schulart Webseite
2751 8787 Mittelschule Lindau (Bodensee) - Aeschach� Anheggerstr. 18 88131� Lindau�Aeschach 08382/944555 08382/944554 Volksschulen www.hs-aeschach.de
2752 8789 Volksschule Lindau (Bodensee) - Hoyren�(Grundschule) Hoyerbergstr. 33 88131� Lindau� 08382/944581 08382/944582 Volksschulen www.vs-lindau-hoyren.de/
2753 8790 Volksschule Lindau (Bodensee) - Reutin-Zech�(Grundschule) Schulstr. 23 88131� Lindau� 08382/975261 08382/975262 Volksschulen
2754 8791 Mittelschule Lindau (Bodensee) - Reutin� Schulstr. 23 88131� Lindau�Reutin 08382/975264 08382/975265 Volksschulen www.hs-reutin.de
2755 8799 Volksschule Lindau (Bodensee) - Oberreitnau�(Grundschule) Hepachstr. 9 88131� Lindau�Oberreitnau 08382/944591 08382/944592 Volksschulen



Well, I have to get all the pages, so I need to create an appropriate spider logic that uses string concatenation to fetch them all.

I look forward to a hint.

Many thanks in advance!


rovf
Veteran

Feb 17, 2011, 9:09 AM

Post #2 of 7 (996 views)
Re: [dilbert] string concatenation in Perl - a easy one

It's not clear to me at which point you got stuck. Maybe you can ask the question more briefly and more to the point?

For concatenation in Perl, you have two options:

(1) Using the concatenation operator, which is a dot:

$a="foo"; $b="bar";
$c=$a.$b; # foobar

(2) Using interpolation:

$x="here $a, there $b"; # here foo, there bar
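
Applied to the paging question from the first post, either form can build one URL per results page. What follows is only a sketch: the page size of 20 and the last offset of 6140 are taken from the first post, `page_urls` is a hypothetical helper name, and nothing is actually downloaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL from the first post; the offset is appended to the "s" parameter.
my $base = 'http://192.68.214.70/km/asps/schulsuche.asp?q=e&a=20&s=';

# Hypothetical helper: returns one URL for each offset 0, $step, 2*$step, ...
# up to and including $last.
sub page_urls {
    my ($base_url, $step, $last) = @_;
    my @urls;
    for (my $offset = 0; $offset <= $last; $offset += $step) {
        push @urls, $base_url . $offset;      # (1) the dot operator
        # push @urls, "$base_url$offset";     # (2) interpolation, same result
    }
    return @urls;
}

my @urls = page_urls($base, 20, 6140);
print scalar(@urls), " URLs, from $urls[0] to $urls[-1]\n";
```

Each element of `@urls` could then be handed to `getstore` or `get` in turn.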


dilbert
User

Feb 17, 2011, 2:19 PM

Post #3 of 7 (987 views)
Re: [rovf] string concatenation in Perl - a easy one

hello rovf,

many many thanks for the hint - solved this ...

By the way, one last question about the results of the parsed dataset (German language).

Here is an example with one little thing left: German has special characters, and they are not recognized correctly. See the following lines from a result:



Quote
lfd. Nr. Schul- nummer Schulname Stra�e PLZ Ort Telefon Fax Schulart Webseite
1 0401 M�dchenrealschule Marienburg,�Abenberg, der Di�zese Eichst�tt Marienburg 1 91183� Abenberg� 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg�(Grundschule) G�ss�belstr. 2 91183� Abenberg� 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
3 6913 Mittelschule Abenberg� G�ss�belstr. 2 91183� Abenberg� 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
4 0402 Johann-Turmair-Realschule�Staatliche Realschule Abensberg Stadionstra�e 46 93326� Abensberg� 09443/9143-0,12,13 09443/914330 Realschulen www.rs-abensberg.de
5 3041 Cabrini-Schule Offenstetten, Priv. F�rderzentrum�F�rderschwerp. geist.Entwickl. d. Kath.Jugendf�rs. Am Schmiedweiher 8 93326� Abensberg�Offenstetten 09443/9188-3 09443/918855 Volksschulen zur sonderp�dog. F�rderung www.cabrinischule.de
6 3074 Private Berufsschule zur sonderp�d. F�rderung,�F�rderschwerpunkt Lernen, Abensberg Regensburger Stra�e 60 93326� Abensberg� 09443/709191 09443/709193 Berufsschulen zur sonderp�dog. F�rderung www.berufsschule-abensberg.de



In the following lines I have added the correct characters:


Quote
lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg, Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg (Grundschule) Güssübelstr. 2 91183 Abenberg


See some of the corrections in bold.


Well, how can we rewrite the regex to get around the issue with the special characters?

Any hint on this?
db1

see the code:


Code
sub processData() { 
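# offsets start at 0, so stop before $range reaches $total_records
while ( $range < $total_records) {
# getstore returns the HTTP status code (always true), so test it with is_success
is_success( getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html') ) or die 'Unable to get page';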
$te->parse_file('processing.html');
my ($table) = $te->tables;
for my $row ( $table->rows ) {
cleanup(@$row);
print OUTFILE "@$row\n";
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
$te = HTML::TableExtract->new;
}
}

sub cleanup() {
for ( @_ ) {
s/\s+/ /g;
}
}


any idea!?

look forward!


rovf
Veteran

Feb 18, 2011, 3:38 AM

Post #4 of 7 (977 views)
Re: [dilbert] string concatenation in Perl - a easy one

It looks to me more like an encoding problem when reading the characters (i.e. the text which you are reading is using a different encoding from what you are expecting). Anyway, if you need to translate certain characters in a string, use the tr operator:




Code
$string =~ tr/SET1/SET2/


This replaces each character of $string that occurs in SET1 with the character at the same position in SET2.
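
To illustrate both points, here is a minimal sketch. It assumes the page is served as ISO-8859-1 (Latin-1); that is only a guess based on the garbled output and should be checked against the Content-Type header of the real page. The sample byte strings are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might arrive from the server: "Strasse" spelled with
# the sharp s, where \xDF is the single Latin-1 byte for that letter
# (assumed encoding).
my $bytes = "Stra\xDFe";

# Decode the bytes into Perl characters, then encode them as UTF-8 for output.
my $text = decode('ISO-8859-1', $bytes);
my $utf8 = encode('UTF-8', $text);
print "$utf8\n";

# tr maps characters one for one, so it can only replace a character with
# a single other character, e.g. umlauts with their base letters.
(my $ascii = decode('ISO-8859-1', "M\xE4dchen")) =~ tr/\xE4\xF6\xFC/aou/;
print "$ascii\n";
```

If the encoding guess is right, decoding on input and encoding on output keeps the umlauts intact, and tr is only needed when plain ASCII is wanted.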

Ronald


dilbert
User

Feb 19, 2011, 1:50 AM

Post #5 of 7 (961 views)
Re: [rovf] string concatenation in Perl - a easy one

hello rovf

Many thanks for the quick reply. The main part of the parser is solved and the thing runs very nicely. One thing is left!

One last question regarding the parsing: is there any chance to get some separators between the columns of the table
( http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20 )? Note that in the end I want to store the data in a MySQL database, so it would be great to have some separators (commas, tabs or something else). Tab-separated or comma-separated values
are handy formats to work with.

( here the data out of the following site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20 )

lfd. Nr. Schul- nummer Schulname Straße PLZ Ort Telefon Fax Schulart Webseite
1 0401 Mädchenrealschule Marienburg,Abenberg, der Diözese Eichstätt Marienburg 1 91183 Abenberg 09178/509210 Realschulen mrs-marienburg.homepage.t-online.de
2 6581 Volksschule Abenberg(Grundschule) Güssübelstr. 2 91183 Abenberg 09178/215 09178/905060 Volksschulen home.t-online.de/home/vs-abenberg
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg Regensburger Straße 60 93326 Abensberg 09443/709191 09443/709193 Berufsschulen zur sonderpädog. Förderung www.berufsschule-abensberg.de


Well, I need to have those lines divided into separate columns. Take the second record:

name: Volksschule Abenberg(Grundschule)
street: Güssübelstr. 2
postal-code and town: 91183 Abenberg
telephone and fax: 09178/215 09178/905060
type of school: Volksschulen
website: home.t-online.de/home/vs-abenberg

Or even better, could the postal code and town be divided into two separate columns?
Question: is this possible?
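
A sketch of one way the postal code and town could be separated, assuming the field always starts with a five-digit German postal code followed by whitespace. `split_postal_town` is a hypothetical helper name used only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits a field such as "91183 Abenberg" into
# postal code and town. Assumes a five-digit postal code at the start.
sub split_postal_town {
    my ($field) = @_;
    if ($field =~ /^\s*(\d{5})\s+(.*?)\s*$/) {
        return ($1, $2);
    }
    return ('', $field);    # no postal code found: keep the text as the town
}

my ($postal, $town) = split_postal_town('91183 Abenberg');
print "postal=$postal town=$town\n";
```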

By the way, see the first and the sixth record (here I only show the names of the schools):

1 0401 Mädchenrealschule Marienburg,Abenberg,
6 3074 Private Berufsschule zur sonderpäd. Förderung, Förderschwerpunkt Lernen, Abensberg

Those have commas inside the name. Does this make it difficult to write a parser that creates CSV format?

Any idea how to do this in Perl? If it is possible, that would be just great!

Many thanks for a hint regarding this little issue. Besides this, all is great and fascinating!



see here the code

Code
 


#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use LWP::Simple;
use Cwd;
use POSIX qw(strftime);


my $te = HTML::TableExtract->new;
my $total_records = 0;
my $suchbegriffe = "e";
my $treffer = 50;
my $range = 0;
my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
my $processdir = "processing";
my $counter = 50;
my $displaydate = "";
my $percent = 0;

&workDir();
chdir $processdir;
&processURL();
print "\nPress <enter> to continue\n";
<>;
$displaydate = strftime('%Y%m%d%H%M%S', localtime);
# three-argument open with an error check
open(OUTFILE, '>', "webdata_for_${suchbegriffe}_$displaydate.txt") or die "Cannot open output file: $!";
&processData();
close OUTFILE;
print "Finished processing $total_records records...\n";
print "Processed data saved to $ENV{HOME}/$processdir/webdata_for_${suchbegriffe}_$displaydate.txt\n";
unlink 'processing.html';
exit 0;

sub processURL() {
print "\nProcessing $url_to_process$suchbegriffe&a=$treffer&s=$range\n";
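# getstore returns the HTTP status code (always true), so test it with is_success
is_success( getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'tempfile.html') ) or die 'Unable to get page';

# read the saved page and pull the total hit count out of the "Treffer" line
open( FH, '<', 'tempfile.html' ) or die "Cannot open tempfile.html: $!";
while( <FH> ) {
if( $_ =~ /^.*?(Treffer <b>)(\d+)( - )(\d+)(<\/b> \w+ \w+ <b>)(\d+).*/ ) {
$total_records = $6;
print "Total records to process is $total_records\n";
}
}
close FH;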
unlink 'tempfile.html';
}

sub processData() {
# offsets start at 0, so stop before $range reaches $total_records
while ( $range < $total_records) {
is_success( getstore("$url_to_process$suchbegriffe&a=$treffer&s=$range", 'processing.html') ) or die 'Unable to get page';
$te->parse_file('processing.html');
my ($table) = $te->tables;
for my $row ( $table->rows ) {
cleanup(@$row);
print OUTFILE "@$row\n";
}
$| = 1;
print "Processed records $range to $counter";
print "\r";
$counter = $counter + 50;
$range = $range + 50;
$te = HTML::TableExtract->new;
}
}

sub cleanup() {
for ( @_ ) {
s/\s+/ /g;
}
}

sub workDir() {
# Use home directory to process data
chdir or die "$!";
if ( ! -d $processdir ) {
mkdir ("$ENV{HOME}/$processdir", 0755) or die "Cannot make directory $processdir: $!";
}
}


rovf,
what do you think, is it possible to add some separators?
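
On the separator question, a minimal sketch. It assumes each row is already a list of cells, as `$table->rows` delivers them in the script above. `csv_line` is a hypothetical helper written for this illustration, and the sample row is a shortened, made-up variant of record 6. A CPAN module such as Text::CSV covers more edge cases than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns a list of cells into one CSV line.
# Cells containing a comma or a double quote are wrapped in double quotes,
# and embedded double quotes are doubled, so commas inside names do no harm.
sub csv_line {
    my @cells = @_;
    for my $cell (@cells) {
        $cell = '' unless defined $cell;
        $cell =~ s/\s+/ /g;              # same cleanup as in the script above
        if ($cell =~ /[",]/) {
            $cell =~ s/"/""/g;
            $cell = qq{"$cell"};
        }
    }
    return join(',', @cells);
}

# Shortened sample row with a comma inside the school name.
my @row  = ('6', '3074', 'Private Berufsschule, Abensberg', 'Berufsschulen');
my $line = csv_line(@row);
print "$line\n";

# Tab-separated output is simpler still, as long as no cell contains a tab.
my $tsv = join("\t", @row);
print "$tsv\n";
```

In the script above, `print OUTFILE "@$row\n";` would then become `print OUTFILE csv_line(@$row), "\n";`.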


dilbert
User

Feb 20, 2011, 2:06 PM

Post #6 of 7 (926 views)
Re: [dilbert] string concatenation in Perl - a easy one

hello rovf,

many thanks for the pm - i agree.

By the way, I have some good news. I now have a second script.

It is a good addition, since the script mentioned above does not do the separation.

Again, here is the full story: I am currently working out how to parse a site with a table (containing 6150 records), see http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=20, in order to get a good result
with comma-separated values. For that I have to make use of the Text::CSV module.

I have had very good results with the script mentioned above. It ran great! It fetches the data from the page http://192.68.214.70/km/asps/schulsuche.asp?q=n&a=20 - but note that the data are not separated!

And now I have a second script, which can produce the CSV format. I want to combine it with the spider logic.
rovf, can you give me some hints and help me combine the two scripts into one?



See the code here. It has no spider logic, so we need to combine the two parts. Can you help? I look forward to it!



Code
#!/usr/bin/perl 
use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;
use Text::CSV;

my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20';
defined $html or die 'Unable to get page'; # get returns undef on failure
$html =~ tr/\r//d; # strip carriage returns
$html =~ s/&nbsp;/ /g; # expand spaces

my $te = HTML::TableExtract->new;
$te->parse($html);

my @cols = qw(
rownum
number
name
phone
type
website
);

my @fields = qw(
rownum
number
name
street
postal
town
phone
fax
type
website
);

my $csv = Text::CSV->new({ binary => 1 });

foreach my $ts ($te->table_states) {
foreach my $row ($ts->rows) {

# trim leading/trailing whitespace from base fields
s/^\s+//, s/\s+$// for @$row;

# load the fields into the hash using a "hash slice"
my %h;
@h{@cols} = @$row;

# derive some fields from base fields, again using a hash slice
@h{qw/name street postal town/} = split /\n+/, $h{name};
@h{qw/phone fax/} = split /\n+/, $h{phone};

# trim leading/trailing whitespace from derived fields
s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

$csv->combine(@h{@fields});
print $csv->string, "\n";
}
}



rovf
Veteran

Feb 22, 2011, 3:59 AM

Post #7 of 7 (907 views)
Re: [dilbert] string concatenation in Perl - a easy one

I think to get help here, you need to ask precise questions that show where you got stuck. Just explaining your problem and then asking "Can you help?" at best gets you answers such as "Yes, maybe". We don't even know whether you need help with a Perl-specific problem, or whether it is a design problem.

 
 

