Go4Expert


Bhullarz 18Jan2009 04:42

Content Copy of webpage
 
Using PHP, how can we collect the text data of a webpage? I don't have access to the server hosting the page I want to copy.

shabbir 18Jan2009 10:18

Re: Content Copy of webpage
 
Something like this:

PHP Code:

function getTitle($url)
{
    $doc = new DOMDocument(); // Create a new DOMDocument object in $doc
    $doc->loadHTML(file_get_contents($url)); // Load the contents of our desired website into $doc
    $a = $doc->getElementsByTagName("title"); // Get all <title> elements and store the node list in $a
    return $a->item(0)->nodeValue; // Return the text of the first <title>
}

It will get the title of the page.
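
The original question asked for the page text rather than the title, and the same DOMDocument approach can be stretched to cover that. What follows is only a sketch under some assumptions: `htmlToText()` is a name made up for this example, and "text" is taken to mean the text content of `<body>` with `<script>` and `<style>` removed. Text from adjacent block elements may run together, because no whitespace is inserted between them.

```php
<?php
// Hypothetical helper (not from the thread): return the readable text of an HTML string.
function htmlToText(string $html): string
{
    if (trim($html) === '') {
        return ''; // nothing to parse; also avoids loadHTML() rejecting an empty string
    }

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return '';
    }

    // Remove <script> and <style> elements; their contents are not page text.
    foreach (['script', 'style'] as $tag) {
        // Copy the live node list into an array before removing nodes from the tree.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Prefer <body>; fall back to the root element if there is no body.
    $root = $doc->getElementsByTagName('body')->item(0) ?? $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $root->textContent));
}
```

For a live page the call would look like `echo htmlToText(file_get_contents($url));`, assuming the download itself succeeds.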

Bhullarz 23Jan2009 12:33

Re: Content Copy of webpage
 
Code:

$url = "http://www.go4expert.com";
$doc = new DOMDocument; // Create a new DOMDocument object in $doc
$doc->loadHTML(file_get_contents($url)); // Load the contents of our desired website into $doc
$a = $doc->getElementsByTagName("title"); // Get all <title> elements and store the node list in $a
echo $a->item(0)->nodeValue;

It didn't work for me. What is the error here?
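
The thread never says which error appeared, so the following is guesswork. Common reasons for this kind of snippet to fail are that `allow_url_fopen` is disabled (so `file_get_contents()` cannot open URLs), that the download fails, or that `loadHTML()` prints a flood of warnings on invalid markup. A minimal sketch that checks each step and reports which one failed might look like this; `fetchTitle()` and `titleFromHtml()` are hypothetical names, not functions from the thread or from PHP itself.

```php
<?php
// Hypothetical helper: extract the <title> text from an HTML string.
// Returns null and sets $error when something goes wrong.
function titleFromHtml(string $html, ?string &$error = null): ?string
{
    if (trim($html) === '') {
        $error = 'empty document';
        return null;
    }

    // Keep parser warnings about invalid markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        $error = 'HTML could not be parsed';
        return null;
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        $error = 'no <title> element found';
        return null;
    }
    return trim($titles->item(0)->nodeValue);
}

// Hypothetical helper: download $url and return its title, reporting the failing step.
function fetchTitle(string $url, ?string &$error = null): ?string
{
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $error = 'allow_url_fopen is disabled, so file_get_contents() cannot open URLs';
        return null;
    }
    $html = @file_get_contents($url); // @ hides the warning; the result is checked below
    if ($html === false) {
        $error = "could not download $url";
        return null;
    }
    return titleFromHtml($html, $error);
}
```

Calling `fetchTitle($url, $error)` and printing `$error` when the result is `null` should at least narrow down which step is failing.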

RandiR 2Mar2009 00:08

Re: Content Copy of webpage
 
Use the repro command or script SS_WebPageToText in biterscripting (biterscripting.com for free download).

1. To copy the web page as it is to a local file, use the following command.

repro "URL" > file.txt

The URL must begin with http : / / .

2. To extract only the plain text from the web page and store it in a local file, use the following script.

script SS_WebPageToText.txt page("URL") > file.txt

This script is available at biterscripting.com / WW_WebPageToText.html .

Randi

shabbir 2Mar2009 00:52

Re: Content Copy of webpage
 
RandiR, it looks like the page you are referring to does not exist.

RandiR 2Mar2009 03:46

Re: Content Copy of webpage
 
Shabbir:

I had to add spaces in the URLs; otherwise, I guess, this site does not allow links. Just remove the spaces to access the pages. In general, all these pages are at biterscripting . com .

Randi

RandiR 2Mar2009 04:24

Re: Content Copy of webpage
 
Shabbir:

Oops, my mistake. The correct URL for the script SS_WebPageToText is (with spaces inserted)


biterscripting.com / SS_WebPageToText.html

Randi

(I had typed WW instead of SS.)

shabbir 2Mar2009 09:18

Re: Content Copy of webpage
 
Links will be allowed once your post count goes up, and yes, it looks correct now.

P455w0rd_Cr4kz 2Mar2009 10:15

Re: Content Copy of webpage
 
Why don't you use a website copier, so you can browse offline? Google for webripper/httrack/webcow.
There are plenty out there; you can also specify which file types you want to download (txt, php, html, etc.).

Bhullarz 3Mar2009 06:32

Re: Content Copy of webpage
 
Quote:

Originally Posted by P455w0rd_Cr4kz (Post 43699)
Why don't you use a website copier, so you can browse offline? Google for webripper/httrack/webcow.
There are plenty out there; you can also specify which file types you want to download (txt, php, html, etc.).

Bro! We are trying to download the content of the page, not the page itself, so these tools are of no use in this case. It's like copying the text of one file and pasting it into another file.


All times are GMT +5.5.