


dilbert
User

Feb 16, 2018, 3:32 AM


Views: 13824
Since my PHP parser attempts failed, I need a Perl approach

Hello dear Perl gurus,


I tried to retrieve the contents of a div from an external site with PHP and XPath. Below is the story so far, followed by my very first steps towards a Perl approach to this problem (below the PHP explanations).


What happened:
It is sometimes a bit tricky, so I made several attempts with various approaches in PHP. Now I want to try Perl.


The PHP story, as it goes:

This is an excerpt from the page, showing the relevant code. Note: I tried to add all - also to add @ on the class and a at the end of my query. After that, I use saveHTML() to get it. See my test:



goal: i need the following data:

Version:
Last updated:
Active installations:
Tested up to:

see for example the following - view-source:https://wordpress.or...wp-job-manager/

Version: 1.29.3
Last updated: 5 days ago
Active installations: 100,000+

I want to have a little database that runs locally, holding this data for my favorite plugins. So I want to fetch the data automatically with a cron job.
After the PHP trials, I would like to know how to do this in Perl instead.



btw: this is my XPath: //*[@id="post-15991"]/div[4]/div[1]
this is the URL: https://wordpress.org/plugins/wp-job-manager/



see the subsequent code:


Code
<?php 

$remote = "https://wordpress.org/plugins/participants-database/";
$doc = new DOMDocument();
@$doc->loadHTMLFile($remote);
$xpath = new DOMXpath($doc);
$node = $xpath->query('//*[@id="post-519"]/div[4]/div[1]/ul/li[2]');
echo $node->item(0)->nodeValue;

?>


Output: instead of the value, I get this. See the results:


Code
martin@linux-3645:~/dev/php> php p20.php 
PHP Notice: Trying to get property of non-object in /home/martin/dev/php/p20.php on line 8
martin@linux-3645:~/dev/php> php p20.php



background:


My way to get the XPath was to use Google Chrome. I have a webpage I want to get some data from:



E.g. the HTML lines:

Code
<li>Requires WordPress Version: <strong>4.3.1</strong></li>
<li>Tested up to: <strong>4.9.2</strong></li>



Background: I need the data from all my favorite plugins and want to have it in a DB or a calc sheet. So there are approx. 70 pages to scrape.
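
For the calc-sheet variant, something like the following could work. It is only a minimal sketch using core Perl: the column names and the helpers `csv_field` and `csv_line` are my own assumptions for illustration, and the sample row just reuses the wp-job-manager values quoted in this post. Every field is quoted because values such as 100,000+ contain a comma.

```perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes and double any embedded quotes.
# (Hypothetical helper, not part of any module.)
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Build one CSV line from a list of values. (Hypothetical helper.)
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Assumed column layout for the spreadsheet.
my @columns = ('Plugin', 'Version', 'Last updated', 'Active installations');

# Sample row with the wp-job-manager values from this post;
# in a real run there would be one hash per scraped plugin page.
my @rows = (
    {
        'Plugin'               => 'wp-job-manager',
        'Version'              => '1.29.3',
        'Last updated'         => '5 days ago',
        'Active installations' => '100,000+',
    },
);

my $csv = csv_line(@columns);
$csv .= csv_line(@{$_}{@columns}) for @rows;

print $csv;
```

Redirecting the output to a file (for example `perl plugins.pl > plugins.csv`, file names assumed) would give something a calc sheet can open directly.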

See here the full XPath for the example:


Code
//*[@id="post-15991"]/div[4]/div[1]



and job-board-manager:


Code
//*[@id="post-519"]/div[4]/div[1]/ul/li[1] 
//*[@id="post-519"]/div[4]/div[1]/ul/li[2]
//*[@id="post-519"]/div[4]/div[1]/ul/li[3]
//*[@id="post-519"]/div[4]/div[1]/ul/li[7]


I used this method (from "Is there a way to get the xpath in google chrome?"):

Quote
Right click "inspect" on the item you are trying to find the xpath
Right click on the highlighted area on the console.
Go to Copy xpath


see the subsequent code:


Code
 
<?php

include('simple_html_dom');
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_html($url);
$text = array();
// use a separate loop variable so the $text array is not overwritten
foreach($html->find('DIV[class="widget plugin-meta"]') as $element) {
$text[] = $element->plaintext;
}
print_r($text);

?>








Code
 
martin@linux-3645:~/dev/php> php p100.php

PHP Warning: include(simple_html_dom): failed to open stream: No such file or directory in /home/martin/dev/php/p100.php on line 4
PHP Warning: include(): Failed opening 'simple_html_dom' for inclusion (include_path='.:/usr/share/php5:/usr/share/php5/PEAR') in /home/martin/dev/php/p100.php on line 4
PHP Fatal error: Call to undefined function file_get_html() in /home/martin/dev/php/p100.php on line 6
martin@linux-3645:~/dev/php>





The idea:


I am trying to parse the site with Perl (installed via perlbrew) and XML::LibXML.



Code
 
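use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();

# recover => 2 lets libxml2 carry on over sloppy real-world HTML without warnings
my $doc = $parser->load_html(location => "http://www.example.com/", recover => 2);
# '*xPath*' is a placeholder for the real XPath expression
foreach my $x ($doc->findnodes('*xPath*')) {
...
}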



I think this code should give me a first step towards a working model.
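
To make the idea a bit more concrete, here is a minimal sketch of how the extraction might look with XML::LibXML. It does not fetch anything: the HTML fragment is a hypothetical one that I modelled on the `<li>` lines and the "widget plugin-meta" class mentioned above, so the real page may well be structured differently. Instead of the positional XPath with the post id (which differs per plugin), it picks every `<li>` inside the meta div and splits the text at the first colon.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical fragment modelled on the <li> lines quoted in this post;
# the markup of the real plugin page is an assumption.
my $html = <<'HTML';
<html><body>
<div class="widget plugin-meta">
<ul>
<li>Version: <strong>1.29.3</strong></li>
<li>Last updated: <strong>5 days ago</strong></li>
<li>Active installations: <strong>100,000+</strong></li>
<li>Tested up to: <strong>4.9.2</strong></li>
</ul>
</div>
</body></html>
HTML

# Parse from a string here; load_html(location => $url, ...) would fetch a page instead.
my $doc = XML::LibXML->load_html(string => $html, recover => 2);

my %meta;
for my $li ($doc->findnodes('//div[contains(@class, "plugin-meta")]//li')) {
    # textContent includes the text of the nested <strong>, e.g. "Version: 1.29.3"
    my ($label, $value) = $li->textContent =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/s;
    next unless defined $label;
    $meta{$label} = $value;
}

printf "%-22s %s\n", $_, $meta{$_} for sort keys %meta;
```

Matching on the label text rather than on `li[2]` or `li[7]` should keep working if the list order differs between plugins, but that is an assumption I have not checked against all 70 pages.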


(This post was edited by dilbert on Feb 16, 2018, 6:48 AM)



