28 Jan 2003, 17:31
#1
Registered User
Join Date: Apr 2002
Location: Leeds, but looking for a way to escape
Posts: 128
Reading headers in PHP
I am trying to create a little PHP script to parse through a list of URLs and check that each of them is a live website - i.e. not a 404.
I have the parsing and list-control part of the script sorted, but I can't seem to find any functions to open a specific URL and get the headers back.
Does anybody know if/how this can be done?
EDIT: Once I can get the headers for the site back, I plan to check whether they contain "HTTP/1.0 404 Not Found".
__________________
SELECT everything FROM everywhere WHERE something = something_else;
> 42
28 Jan 2003, 23:52
#2
Albatross!
Join Date: Mar 2000
Location: Oslo
Posts: 14
fopen()
Try reading up on fopen();
it can read more than files...
(unless safe mode or allow_url_fopen restrictions get in the way)
Otherwise you could use the cURL lib.
For a newbie fopen() is easier, and since you don't yet know how to open a file or URL, I would guess this is your function.
www.php.net/fopen
www.php.net/fread
www.php.net/fclose <-- because you should close what you open... good practice
www.php.net/fwrite <-- to write to a file
www.php.net/curl
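For the 404 check specifically, here is a minimal sketch (assuming allow_url_fopen is enabled, with www.example.com standing in for one of your URLs): the http:// wrapper makes fopen() return false when the server answers with an error status, so a false return is a quick-and-dirty dead-link test.
Code:
<?php
// Crude liveness check: PHP's http:// wrapper makes fopen() fail
// (return false) when the server responds with 4xx/5xx.
$url = "http://www.example.com/";   // hypothetical test URL
$fp = @fopen($url, "r");            // @ silences the warning on failure
if ($fp) {
    echo "$url is alive\n";
    fclose($fp);                    // always close what you open
} else {
    echo "$url looks dead (404 or unreachable)\n";
}
?>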
Chriso
__________________
.........................
Any kiddie in school can love like a fool,
But Hating, my boy, is an Art.
-- Ogden Nash
29 Jan 2003, 12:30
#3
Forever Delayed
Join Date: Sep 2000
Location: www.netgamers.org
Posts: 1,475
And of course, publish your scripts if you can - that way you show other people how to do it. I publish every PHP thing I write, unless it would disadvantage my employers.
M.
__________________
Firefly Oper and General l4m3r - "I Do Stuff"
O2 Rip-off campaign
<vampy> plus i hate people ... i despise humanity as a whole
pablissimo "I'm still geting over the fact you just posted a pic of your own vomit"
29 Jan 2003, 12:50
#4
humble ex-n00b
Join Date: May 2001
Posts: 51
This is a web class pulled from the PHP help.
I added my own functions around it some time ago and dug this chunk of code out for you. Hope it helps.
[edit: strictly speaking the function shouldn't be called "GetHeaderAsText", it's more GetHeadersAsArray, but hey ]
satanis.
Code:
<?php
# demos the GetPageAsText function
#$pagetext = GetPageAsText("www.yahoo.com", "/", 80);
#echo $pagetext;

# demos the GetHeaderAsText function
# using /index.htm since the request was for a 404 detector
$headers = GetHeaderAsText("www.yahoo.com", "/index.htm", 80);
echo $headers["status"];
echo $headers["Content-Type"];

# GetHeaderAsText - despite the name it returns the response headers
# as an array, keyed by header name, plus "status" for the status line
function GetHeaderAsText($thehost, $theurlpath, $theport)
{
    $file = new GetWebObject($thehost, $theport, $theurlpath);
    return $file->get_header();
}

# GetPageAsText (refer to the PHP documentation the GetWebObject class
# was lifted from)
# $thehost is the "www.somedomain.com" part
# $theurlpath is the path, e.g. "/index.htm" or "/dynamicpage.php?var1=a&var2=b"
# $theport is the port for the connecting socket, typically 80 for the web
function GetPageAsText($thehost, $theurlpath, $theport)
{
    $file = new GetWebObject($thehost, $theport, $theurlpath);
    return $file->get_content();
}

class GetWebObject
{
    var $host = "";
    var $port = "";
    var $path = "";
    var $header = array();
    var $content = "";

    # PHP 4 style constructor - fetches the page straight away
    function GetWebObject($host, $port, $path)
    {
        $this->host = $host;
        $this->port = $port;
        $this->path = $path;
        $this->fetch();
    }

    function fetch()
    {
        $fp = fsockopen($this->host, $this->port);
        if (!$fp) {
            die("Could not connect to host.");
        }
        $header_done = false;

        # build a minimal HTTP/1.0 request by hand
        $request  = "GET " . $this->path . " HTTP/1.0\r\n";
        $request .= "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n";
        $request .= "Host: " . $this->host . "\r\n";
        $request .= "Connection: Close\r\n\r\n";
        fputs($fp, $request);

        # the first line is the status line, e.g. "HTTP/1.0 404 Not Found"
        $line = fgets($fp, 1024);
        $this->header["status"] = trim($line);

        while (!feof($fp)) {
            $line = fgets($fp, 1024);
            if ($header_done) {
                $this->content .= $line;
            } elseif ($line == "\r\n") {
                # a blank line separates the headers from the body
                $header_done = true;
            } else {
                $data = explode(": ", $line, 2);
                if (count($data) == 2) {
                    $this->header[$data[0]] = trim($data[1]);
                }
            }
        }
        fclose($fp);
    }

    function get_header()
    {
        return $this->header;
    }

    function get_content()
    {
        return $this->content;
    }
}
?>
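To turn the class above into the 404 detector asked for in the first post, a rough usage sketch (the paths here are hypothetical examples; note the class die()s if a host is unreachable, which would stop a long run):
Code:
<?php
// Hypothetical wrapper: flag any path whose status line carries a 404.
$paths = array("/index.htm", "/no-such-page.htm");  // example paths
foreach ($paths as $path) {
    $headers = GetHeaderAsText("www.yahoo.com", $path, 80);
    if (strstr($headers["status"], "404")) {
        echo "$path is a 404\n";
    } else {
        echo "$path looks OK (" . $headers["status"] . ")\n";
    }
}
?>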
__________________
think you live in a free country?
wrong - its a democracy - and the majority disagree with you.
31 Jan 2003, 00:18
#5
Registered User
Join Date: Apr 2002
Location: Leeds, but looking for a way to escape
Posts: 128
Well, this is my report. In the end I went down the Perl route, as getting PHP to handle it was a nightmare from both a functionality and a timing point of view: fsockopen was taking 1 min+ per URL (more on that below), and with a list of 2000+ URLs it was going to take a while.
So I bit the bullet and wrote the following in Perl:
Code:
#! d:\development\perl\bin\perl.exe
use LWP::Simple;

print "URL Checker\n\n";

@Array = (
    #URLS HERE
);

# open the output files once, instead of on every loop iteration
open(GOOD_URLS, ">>goodurls.txt") or die "Can't open goodurls.txt: $!";
open(BAD_URLS,  ">>badurls.txt")  or die "Can't open badurls.txt: $!";

foreach my $array_element (@Array)
{
    # LWP::Simple's get() returns undef on any failure, 404s included
    my $content = get($array_element);
    if (defined $content)
    {
        print "$array_element\n";
        print GOOD_URLS "$array_element\n";
    }
    else
    {
        print BAD_URLS "$array_element\n";
    }
}

close(GOOD_URLS);
close(BAD_URLS);
Which does (in a dirty way) what I need - admittedly it doesn't handle reading the URLs in from a file, but that could easily be added.
At the end of the day, not bad for 30 mins of research by somebody who knew nothing about Perl.
EDIT: At the time of posting I hadn't tried tyriel's code posted above; I'll keep that for future reference, ty.
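On the fsockopen timing: the default socket timeout is usually 60 seconds, which would explain the 1 min+ per dead URL. fsockopen() takes an optional fifth argument that caps the connect wait, so a sketch like this (hypothetical host, 5-second limit) would have made dead hosts fail fast:
Code:
<?php
// Minimal sketch: the 5th argument to fsockopen() is a connect
// timeout in seconds, so unreachable hosts give up after 5s
// instead of the default_socket_timeout (typically 60s).
$host = "www.example.com";   // hypothetical host
$fp = @fsockopen($host, 80, $errno, $errstr, 5);
if (!$fp) {
    echo "$host: connect failed ($errstr)\n";
} else {
    echo "$host: reachable\n";
    fclose($fp);
}
?>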
__________________
SELECT everything FROM everywhere WHERE something = something_else;
> 42
2 Feb 2003, 06:13
#6
Albatross!
Join Date: Mar 2000
Location: Oslo
Posts: 14
Quote:
Originally posted by BuddhistPunk
Well, this is my report. In the end I went down the Perl route, as getting PHP to handle it was a nightmare from both a functionality and a timing point of view: fsockopen was taking 1 min+ per URL, and with a list of 2000+ URLs it was going to take a while.
So I bit the bullet and wrote the following in Perl:
::::SNIP::::
Even though I am a great fan of PHP, I have to admit that URL loading is done faster in Perl.
The exception is using the cURL lib; but seeing that it's an independent library, you might as well call it from Perl rather than PHP.
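A minimal sketch of the cURL route, assuming the extension is compiled into your PHP (the URL is a hypothetical example): CURLOPT_NOBODY turns the request into a HEAD, so only the headers come back, and CURLOPT_TIMEOUT caps the wait per URL.
Code:
<?php
// Status check via the cURL extension: HEAD request plus a hard
// timeout, so a 2000-URL list doesn't stall on dead hosts.
$url = "http://www.example.com/";       // hypothetical test URL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, 1);         // HEAD request, skip the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // don't echo to stdout
curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // give up after 10 seconds
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
echo ($status == 404) ? "dead link\n" : "status $status\n";
?>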
Nice to see you got it running
Chriso
__________________
.........................
Any kiddie in school can love like a fool,
But Hating, my boy, is an Art.
-- Ogden Nash
2 Feb 2003, 12:54
#7
Guest
Quote:
Originally posted by BuddhistPunk
So I bit the bullet and wrote the following in Perl:
Perl rocks, so no need to be sad about using it
Quote:
Originally posted by BuddhistPunk
Which does (in a dirty way) what I need - admittedly it doesn't handle reading the URLs in from a file, but that could easily be added.
Code:
open(FILE, $ARGV[0]) or die "Can't open file $ARGV[0]: $!";
chomp(@Array = <FILE>);   # strip the trailing newlines so the URLs are clean
close(FILE);
should do it ... then call the program with the file containing the URLs as the first parameter (i.e. 'perl urlchecker.pl urls.txt').
Quote:
Originally posted by BuddhistPunk
At the end of the day, not bad for 30 mins of research by somebody who knew nothing about Perl.
If that's a first, I'm impressed