You could use an XML parser. Alternatively, something like this...
Code:
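#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl parser.pl file tag
my ($file, $tag) = @ARGV;
die "usage: perl parser.pl file tag\n" unless defined $tag;

open(my $fh, '<', $file) or die "couldn't open $file: $!";
my @lines = <$fh>;
close($fh);

foreach (@lines) {
    # \Q..\E quotes the tag name and \b stops "b" from matching "<body>";
    # .*? with /g prints every <tag>...</tag> pair found on the line
    while (/<\Q$tag\E\b[^>]*>(.*?)<\/\Q$tag\E>/g) { print $1."\n"; }
}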
will search a file line by line for a tag and print out the text between the opening and closing tags (both have to be on the same line).
For example...
"perl parser.pl what.html title"
will output the text between the <title> and </title> tags in the what.html file
"perl parser.pl blah.html b"
will output all the bold text in the file (i.e. all the text between <b> and </b>)
Obviously that's pretty basic, but you could play around with it until it does what you need.
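For instance, one thing you might want is for it to cope with tags that are split over several lines. Here is a rough sketch of one way to do that: read the whole file into a single string and let the regex match across newlines. The extract_tag sub is just a name I made up for illustration, and it still won't handle nested tags of the same name (a real parser is the better choice for that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text between every <tag ...> and </tag> pair in $text.
# (extract_tag is a hypothetical helper name, not a library function.)
sub extract_tag {
    my ($text, $tag) = @_;
    my @found;
    # /s lets . match newlines, so a pair may span several lines;
    # .*? is non-greedy, so each pair is captured separately.
    while ($text =~ /<\Q$tag\E\b[^>]*>(.*?)<\/\Q$tag\E>/gs) {
        push @found, $1;
    }
    return @found;
}

# Only run as a script when a file and a tag are given.
if (@ARGV == 2) {
    my ($file, $tag) = @ARGV;
    open(my $fh, '<', $file) or die "couldn't open $file: $!";
    local $/;              # slurp mode: read the whole file at once
    my $text = <$fh>;
    close($fh);
    print "$_\n" for extract_tag($text, $tag);
}
```

You run it the same way as before, e.g. "perl parser.pl what.html title".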