
Planetarion Forums > Non Planetarion Discussions > Programming and Discussion

28 Jan 2003, 17:31   #1
BuddhistPunk
Registered User
 
Join Date: Apr 2002
Location: Leeds, but looking for a way to escape
Posts: 128
Reading headers in PHP

I am trying to create a little PHP script that goes through a list of URLs and checks that each one is a live website, i.e. does not return a 404.

I have the parsing and list-handling part of the script sorted, but I can't seem to find any functions to open a specific URL and get its headers.

Does anybody know if/how this can be done?

EDIT: Once I can get the headers back for a site, I plan to check whether they contain "HTTP/1.0 404 Not Found".
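Matching the literal string "HTTP/1.0 404 Not Found" will miss servers that answer with HTTP/1.1 or a different reason phrase. A minimal sketch, assuming you already have the first line of the response in hand, is to pull out the three-digit status code and compare that instead. The helper names status_code and is_dead are hypothetical, and treating every 4xx/5xx as "dead" is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: extract the numeric status code from an HTTP status line
// such as "HTTP/1.0 404 Not Found" or "HTTP/1.1 200 OK".
// Returns null if the line does not look like a status line.
function status_code($status_line)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', trim($status_line), $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: a URL counts as "dead" if no status line came back
// or the server answered with a 4xx/5xx code. 3xx redirects count as alive.
function is_dead($status_line)
{
    $code = status_code($status_line);
    return $code === null || $code >= 400;
}
```

For example, is_dead("HTTP/1.0 404 Not Found") is true, while is_dead("HTTP/1.1 301 Moved Permanently") is false, because a redirect still means something is there.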
__________________
SELECT everything FROM everywhere WHERE something = something_else;
> 42
28 Jan 2003, 23:52   #2
Breed
Albatross!
 
Join Date: Mar 2000
Location: Oslo
Posts: 14
fopen()

Try reading up on fopen().

It can open more than just local files...
(unless you are running in safe mode)

Otherwise you could use the cURL library.

For a newbie fopen() is easier, and since you don't know how to open a file or URL yet, I'd guess it's the function for you.

www.php.net/fopen
www.php.net/fread
www.php.net/fclose <-- because you should close what you open; it's good programming
www.php.net/fwrite <-- to write to file
www.php.net/curl
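A minimal sketch of the fopen() route in a modern PHP, assuming allow_url_fopen is enabled: opening an http:// URL fills the local variable $http_response_header, whose first element is the status line. The ignore_errors context option lets fopen() succeed on a 404 so the status is still visible, and method HEAD asks for headers only. The helper name first_header_line is hypothetical.

```php
<?php
// Hypothetical helper: return the first response header line for $url
// (e.g. "HTTP/1.1 404 Not Found"), or null if nothing came back.
// Assumes allow_url_fopen = On.
function first_header_line($url, $timeout = 10)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'          => 'HEAD',   // headers only, no page body
            'timeout'         => $timeout, // seconds before a read gives up
            'ignore_errors'   => true,     // still open the stream on 4xx/5xx
            'follow_location' => 0,        // report redirects (3xx) as-is
        ),
    ));
    $fp = @fopen($url, 'r', false, $context);
    if ($fp === false) {
        return null;                       // DNS failure, refused connection, etc.
    }
    // The http:// wrapper populates $http_response_header in this scope.
    $line = isset($http_response_header[0]) ? $http_response_header[0] : null;
    fclose($fp);
    return $line;
}
```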

Chriso
__________________
.........................
Any kiddie in school can love like a fool,
But Hating, my boy, is an Art.
-- Ogden Nash
29 Jan 2003, 12:30   #3
Mong
Forever Delayed
 
Join Date: Sep 2000
Location: www.netgamers.org
Posts: 1,475
And of course, publish your scripts if you can, so other people can see how it's done. I publish every PHP thing I do, unless it would disadvantage my employers.

M.
__________________
Firefly Oper and General l4m3r - "I Do Stuff"

O2 Rip-off campaign

<vampy> plus i hate people ... i despise humanity as a whole

pablissimo "I'm still getting over the fact you just posted a pic of your own vomit"
29 Jan 2003, 12:50   #4
tyriel
humble ex-n00b
 
Join Date: May 2001
Posts: 51
This is a web class pulled from the PHP help.

I added my own functions around it some time ago and dug this chunk of code out for you. Hope it helps.

[edit: strictly speaking the function shouldn't be called "GetHeaderAsText"; it's more GetHeadersAsArray, but hey]

satanis.

Code:
<?php

# this just demos the GetPageAsText function
#$pagetext = GetPageAsText("www.yahoo.com","/",80);
#echo $pagetext;

# this just demos the GetHeaderAsText function
# using /index.htm since the request was for a 404 detector
$headers = GetHeaderAsText("www.yahoo.com","/index.htm",80);

 echo $headers["status"];
 echo $headers["Content-Type"];


function GetHeaderAsText($thehost,$theurlpath,$theport)
{
	$file = new GetWebObject($thehost, $theport, $theurlpath);
	return $file->get_header();
}

# GetPageAsText (see the PHP documentation the GetWebObject class was lifted from)
# $thehost is the "www.somedomain.com" part
# $theurlpath is the path part of the URL, with a leading slash: "/index.htm" or "/dynamicpage.php?var1=a&var2=b"
# $theport is the port for the connecting socket; typical web = 80
function GetPageAsText($thehost,$theurlpath,$theport)
{
	$file = new GetWebObject($thehost, $theport, $theurlpath);
	return $file->get_content();
}

class GetWebObject
{
 var $host  = "";
 var $port  = "";
 var $path   = "";
 var $header = array();
 var $content = "";

 # PHP 5+ constructor name; PHP 8 no longer treats a method named after the class as its constructor
 function __construct($host, $port, $path)
 {
   $this->host = $host;
   $this->port = $port;
   $this->path = $path;
   $this->fetch();
 }

 function fetch()
 {
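   # 10-second connect timeout, so one dead host can't stall a long URL list
   $fp = fsockopen($this->host, $this->port, $errno, $errstr, 10);

   if(!$fp)
   {
     # record the failure instead of die()-ing, so a batch run can carry on
     $this->header["status"] = "Could not connect to host: $errstr";
     return;
   }
   stream_set_timeout($fp, 10);   # don't block forever on a slow read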

   $header_done=false;

   $request = "GET ".$this->path." HTTP/1.0\r\n";
   $request .= "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n";
   $request .= "Host: ".$this->host."\r\n";
   $request .= "Connection: Close\r\n\r\n";

   fputs ($fp, $request);

   $line = fgets ($fp, 4096);   # the status line, e.g. "HTTP/1.1 200 OK"
   $this->header["status"] = trim($line);

   while (!feof($fp))
   {
     $line = fgets ( $fp, 4096 );   # 128 bytes would split long header lines in two
     if($header_done)
     { $this->content .= $line;}
     else
     {
       if($line == "\r\n")
       { $header_done=true;}
       else
       {
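         # limit to 2 pieces so values containing ": " stay intact; trim the CRLF
         $data = explode(": ", $line, 2);
         if (count($data) == 2)
         { $this->header[$data[0]] = trim($data[1]); }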
       }
     }
   }

   fclose ($fp);
 }

 function get_header()
 { return($this->header);}

 function get_content()
 { return($this->content);}
}

?>
__________________
think you live in a free country?

wrong - it's a democracy - and the majority disagree with you.
31 Jan 2003, 00:18   #5
BuddhistPunk
Registered User
 
Join Date: Apr 2002
Location: Leeds, but looking for a way to escape
Posts: 128
Well, here's my report. In the end I went down the Perl route, as getting PHP to handle this was a nightmare from either a functionality or a timing point of view (fsockopen was taking 1 min+ per URL, and with a list of 2000+ URLs it was going to take a while).

In the end, I bit the bullet and wrote the following in Perl:

Code:
#! d:\development\perl\bin\perl.exe

use strict;
use warnings;
use LWP::Simple;

print "URL Checker \n\n";

my @Array = (
#URLS HERE
);

# open both log files once, in append mode, instead of on every pass of the loop
open(my $good_urls, '>>', 'goodurls.txt') or die "Can't open goodurls.txt: $!";
open(my $bad_urls,  '>>', 'badurls.txt')  or die "Can't open badurls.txt: $!";

foreach my $url (@Array)
{
   # get() returns undef when the request fails (404, DNS error, timeout...)
   my $content = get($url);

   if (defined $content)
   {
	   print "$url\n";
	   print $good_urls "$url\n";
   }
   else
   {
	   print $bad_urls "$url\n";
   }
}

close($good_urls);
close($bad_urls);
Which does (in a dirty way) what I need. Admittedly it doesn't handle reading the URLs in from a file, but that could easily be added.

At the end of the day, not bad for 30 minutes' research by somebody who knew nothing about Perl.

EDIT: At the time of posting I hadn't tried tyriel's code above; I'll keep it for future reference, ty.
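On the fsockopen() timing: without a timeout argument, fsockopen() waits up to the default_socket_timeout setting (60 seconds by default) for a connection, which may account for the 1 min+ per URL, and a GET also downloads each page body. A minimal sketch under those assumptions passes a connect timeout as fsockopen()'s fifth argument, caps reads with stream_set_timeout(), and sends HEAD so only the headers come back. The helper names build_head_request and head_status are hypothetical, and the timeout may not cover a slow DNS lookup.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.0 HEAD request for $path on $host.
// HEAD asks the server for the status line and headers only, not the page body.
function build_head_request($host, $path)
{
    return "HEAD " . $path . " HTTP/1.0\r\n"
         . "Host: " . $host . "\r\n"
         . "Connection: close\r\n\r\n";
}

// Hypothetical helper: send the HEAD request with explicit timeouts and return the
// trimmed status line (e.g. "HTTP/1.1 404 Not Found"), or null if the host could
// not be reached or sent nothing back before the timeout.
function head_status($host, $path = "/", $port = 80, $timeout = 5)
{
    // 5th argument = connect timeout in seconds
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return null;
    }
    stream_set_timeout($fp, $timeout);   // cap how long each read may block
    fwrite($fp, build_head_request($host, $path));
    $status = fgets($fp, 1024);
    fclose($fp);
    return $status === false ? null : trim($status);
}
```

For example, head_status("www.yahoo.com", "/index.htm") returns the status line or null; a null, or a code of 400 and up, would go on the bad list.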
__________________
SELECT everything FROM everywhere WHERE something = something_else;
> 42
2 Feb 2003, 06:13   #6
Breed
Albatross!
 
Join Date: Mar 2000
Location: Oslo
Posts: 14
Quote:
Originally posted by BuddhistPunk
Well, here's my report. In the end I went down the Perl route, as getting PHP to handle this was a nightmare from either a functionality or a timing point of view (fsockopen was taking 1 min+ per URL, and with a list of 2000+ URLs it was going to take a while).

In the end, I bit the bullet and wrote the following in Perl:
::::SNIP::::
Even though I'm a great fan of PHP, I have to admit that URL loading is done faster in Perl.
The exception is using the cURL library, but since it's an independent library, you might as well load it from Perl rather than PHP.
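For the cURL route, a minimal PHP sketch, assuming the cURL extension is installed: CURLOPT_NOBODY makes it a HEAD-style request, and the two timeout options stop one slow host from holding up a long list. The helper name url_status_curl is hypothetical, and following redirects to report the final status is a design choice you may not want.

```php
<?php
// Hypothetical helper: return the HTTP status code for $url using PHP's cURL
// extension, or null if no response was received (DNS error, refused, timeout).
function url_status_curl($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);             // no body: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // don't echo the response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max seconds to connect
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // max seconds for the whole request
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // report the status after redirects
    $ok = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // 0 if nothing came back
    curl_close($ch);
    return ($ok === false || $code == 0) ? null : (int) $code;
}
```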

Nice to see you got it running

Chriso
__________________
.........................
Any kiddie in school can love like a fool,
But Hating, my boy, is an Art.
-- Ogden Nash
2 Feb 2003, 12:54   #7
Pariah
Guest
 
Posts: n/a
Quote:
Originally posted by BuddhistPunk
In the end, I bit the bullet, and wrote the following in perl :
Perl rocks, so no need to be sad about using it

Quote:
Originally posted by BuddhistPunk
Which does (in a dirty way) what I need. Admittedly it doesn't handle reading the URLs in from a file, but that could easily be added.
Code:
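open(my $fh, '<', $ARGV[0]) or die "Can't open file $ARGV[0]: $!";
my @Array = <$fh>;    # one URL per line; Perl variable names are case-sensitive
close($fh);
chomp(@Array);        # strip the trailing newlines so LWP gets clean URLs
should do it, in place of the hard-coded @Array list. Then call the program with the file containing the URLs as the first parameter, i.e. 'perl urlchecker.pl urls.txt'.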


Quote:
Originally posted by BuddhistPunk
At the end of the day, not bad for 30 minutes' research by somebody who knew nothing about Perl.
If that's a first, I'm impressed.
All times are GMT +1.

