Information Extraction from PDF

 



lovboy
New User

May 24, 2010, 2:18 PM

Post #1 of 3 (4033 views)

Hi Guys,

This is my first post here and, as that suggests, I am very new to Perl.

I am looking for an elegant design for the following problem. I would appreciate it if you could guide me through it and point me to the right modules and libraries.

Task: I am trying to extract information from a PDF page like this one (page 872): http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/UCM071436.pdf

I need to produce the output in this format:

Drug Name    Approval number    Patent number
---------    ---------------    -------------
ABC          020977             5034394
XYZ          020977             5089500

How do you think I should approach this problem?

Thanks,
lovboy


Bianca
User

Jun 7, 2010, 1:23 AM

Post #2 of 3 (3975 views)
Re: [lovboy] Information Extraction from PDF

Have you tried CAM::PDF? http://search.cpan.org/~cdolan/CAM-PDF-1.52/lib/CAM/PDF.pm


deepeshtronics
Novice

Jul 31, 2010, 10:10 AM

Post #3 of 3 (3767 views)
Re: [lovboy] Information Extraction from PDF

Hi,

I would like to help you with this.
Follow the steps below:

1] Install the following modules on your machine:

CAM::PDF
Compress::Raw::Zlib
Text::PDF (version 0.29)

2] Convert your PDF file into a text file using the code below:

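#!/usr/bin/perl

use warnings;
use strict;

use CAM::PDF;

# The PDF file to convert is given as the first command-line argument.
my $file_name = shift or die "Usage: $0 file.pdf\n";
my $pdf = CAM::PDF->new($file_name)
    or die "Could not open '$file_name': $CAM::PDF::errstr\n";

# Print the text of every page, in order.
for my $page (1 .. $pdf->numPages()) {
    my $text = $pdf->getPageText($page);
    print $text if $text;
}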

Run the script with the following command:
perl script_name pdf_name > output_text_file_name

Once you are done with the above steps, please let me know. Based on the pattern of the input in the converted text file, we can pick out the relevant information with another Perl script and write it to a separate output file.
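
For illustration, here is a minimal sketch of what that second script might look like. It is only a guess at the approach: the real layout of the converted text is not known at this point, so the sketch assumes (hypothetically) that each record sits on one line as a drug name, a 6-digit approval number, and a 7-digit patent number, matching the sample output in the first post. The names `$record`, `parse_line`, and `@sample` are made up for this example, and the pattern will almost certainly need adjusting once the actual text file has been inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED line layout: <drug name> <6-digit approval no.> <7-digit patent no.>
# This is a hypothetical pattern; change it to fit the real converted text.
my $record = qr/^\s*(\S.*?)\s+(\d{6})\s+(\d{7})\b/;

# Return [name, approval, patent] for a matching line, undef otherwise.
sub parse_line {
    my ($line) = @_;
    return undef unless $line =~ $record;
    return [ $1, $2, $3 ];
}

# Stand-in input built from the example rows in the first post, plus one
# line that should be skipped. For real use, loop over <> instead.
my @sample = (
    "Some page heading",
    "ABC 020977 5034394",
    "XYZ 020977 5089500",
);

printf "%-12s %-18s %s\n", 'Drug Name', 'Approval number', 'Patent number';
for my $line (@sample) {
    my $rec = parse_line($line) or next;   # ignore lines that do not match
    printf "%-12s %-18s %s\n", @$rec;
}
```

To run it over the converted file, the loop over `@sample` could be replaced with `while (my $line = <>) { ... }` and the script invoked as `perl script_name output_text_file_name > result_file_name`.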

Thanks
