Category: Perl/CGI
Parsing parts of an HTML file?

I have a huge webpage (over 300 KB of just links), and I want to parse just pieces of it onto a template or another page. Basically, I want to put comments, anchors, or some other marker in the big HTML file to tell a CGI parsing script where to start and where to stop parsing. I also want it to work with variables, so that it can parse Part A or Part B rather than the entire page. I have found many scripts that use CGI and SSI to parse entire webpages, but nothing that will parse custom-defined parts of a page. Is this possible? If so, could somebody please point me toward a script that already does this, or some code I could use to start writing one?

To help you visualize what I want to do: I want to use a CGI script to parse out different parts of this lyrics page (www.smasonline.com/lyrics/list.html), so I can divide it into sections for each letter of the alphabet.


If you could help me, I would be forever grateful.

Thanks in advance,
Sancho

What you could do is something like this
(not guaranteed to work, and you'd definitely have to test it):



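#!/usr/bin/perl

use strict;
use warnings;
use LWP::Simple;   # exports get()

my $addr = "http://www.somewhere.com/";

# A CGI script must send a header before any other output
print "Content-type: text/html\n\n";

my $html = get($addr);
die "Could not fetch $addr\n" unless defined $html;

my $printing = 0;   # true while we are between the two marker comments

foreach my $line (split /\n/, $html) {
    # Look for a marker comment on this line, ignoring padding spaces
    if ($line =~ /<!--\s*(.*?)\s*-->/) {
        if    ($1 eq "LIST START") { $printing = 1; }
        elsif ($1 eq "LIST END")   { $printing = 0; }
    }
    # split() removed the newlines, so put one back
    print "$line\n" if $printing;
}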


Note: you have to put comments (e.g. <!--LIST START--> and <!--LIST END-->) where the content or links start and end.
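The same marker idea could be extended to named sections, so that a script can pull out "Part A" or "Part B" on request. The following is only a rough sketch: the marker format (<!--A START--> ... <!--A END-->), the helper name extract_section, and the sample links are all made up for illustration, and it works on a string rather than fetching a real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between <!--NAME START--> and
# <!--NAME END-->, or undef if that pair of markers is not found.
sub extract_section {
    my ($html, $name) = @_;
    my $q = quotemeta $name;   # protect any regex metacharacters in the name
    if ($html =~ /<!--\s*$q START\s*-->(.*?)<!--\s*$q END\s*-->/s) {
        return $1;
    }
    return;   # undef when called in scalar context
}

# Made-up sample page with two marked sections
my $page = join "",
    "<html><body>\n",
    "<!--A START-->\n<a href=\"a1.html\">Song A1</a>\n<!--A END-->\n",
    "<!--B START-->\n<a href=\"b1.html\">Song B1</a>\n<!--B END-->\n",
    "</body></html>\n";

# In a CGI script this value could come from the query string instead
my $want = 'B';
my $section = extract_section($page, $want);
print defined $section ? $section : "No section named $want\n";
```

Matching against the whole string with the /s modifier lets a section span many lines, which the line-by-line loop handles with a flag instead. It uses only core Perl, so no extra modules are needed.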

What exactly are you trying to do here?

Do you want to split the whole page into a group of pages or just print out the content within the <!-- LIST ... --> comments?

By the way, it is LWP::Simple that you want here :).

If you want to parse HTML documents, there are a few modules out there which can help you.

I want to be able to split the page into lots of smaller pages, but I want a script that will do it for me. I want to keep maintaining the big webpage full of links and have a script split it into smaller pages using comments. I want a page for each letter of the alphabet.

That way, when I get new lyrics, I can just update the big page and all the other pages will include the new lyrics as well, because they are just parsing what's in between the comments. The idea is to use the big HTML file the same way I would use a database, except that I just want to pull things from it instead of searching it or anything like that.

I know this all sounds confusing, sorry. Hopefully you will understand what I mean.

As far as modules go, I can't use them. Thanks for the idea though. The site is hosted by a crappy web hosting company, so I can't change anything like that, or use PHP or anything useful besides Perl and SSI.

I have tried the script you posted, mr_ego. Thanks for pointing me in the right direction. But I know very little about Perl. I've always just used other people's scripts and never took the time to learn any language. Anyway, I set up my own web server to test it on temporarily. I always get a 500 error, and when I check the Apache error log, I get "Syntax error on line 23 of EOF". Anybody got any ideas how to fix this, or what I'm doing wrong?

I have posted this same question in multiple forums, and you guys are the first people that even responded. Thanks a lot :)
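For the "one page per letter" goal, one possible approach is to collect every marked section in a single pass and then write each one out separately. This is a minimal sketch using only core Perl, assuming markers of the form <!--A START--> and <!--A END-->; the function name split_sections and the sample links are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect every <!--NAME START--> ... <!--NAME END-->
# pair in one pass. Returns a hash of section name => text between markers.
sub split_sections {
    my ($html) = @_;
    my %sections;
    # \1 requires the END marker to carry the same name as the START marker
    while ($html =~ /<!--\s*(\w+) START\s*-->(.*?)<!--\s*\1 END\s*-->/gs) {
        $sections{$1} = $2;
    }
    return %sections;
}

# Made-up sample standing in for the big links page
my $page = "<!--A START-->\n<a href=\"a1.html\">Song A1</a>\n<!--A END-->\n"
         . "<!--B START-->\n<a href=\"b1.html\">Song B1</a>\n<!--B END-->\n";

my %sections = split_sections($page);
foreach my $name (sort keys %sections) {
    # Each section could be written to its own file here, e.g. "$name.html"
    print "Section $name:$sections{$name}";
}
```

Run from the command line after each update of the big page, a script along these lines could regenerate the smaller pages as static files, so nothing has to be parsed on every request.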









