Content Copy of webpage

Discussion in 'PHP' started by Bhullarz, Jan 17, 2009.

  1. Bhullarz

    Bhullarz New Member

    Using PHP, how can we collect the text data of a webpage? I don't have access to the server that hosts the webpage I want to copy.
     
  2. shabbir

    shabbir Administrator Staff Member

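    Something like this:

    PHP:
    function getTitle($url)
    {
        $html = file_get_contents($url); // Download the page's HTML (http:// URLs need allow_url_fopen)
        if ($html === false || $html === '') {
            return null; // Download failed or the page was empty
        }
        $doc = new DOMDocument(); // Create a new DOMDocument object in $doc
        libxml_use_internal_errors(true); // Don't print warnings about malformed HTML
        $doc->loadHTML($html); // Parse the HTML of our desired website into $doc
        libxml_clear_errors();
        $titles = $doc->getElementsByTagName("title"); // Get all <title> elements as a DOMNodeList
        return $titles->length > 0 ? $titles->item(0)->nodeValue : null;
    }
    It will return the text of the page's <title> element, or null if the page can't be read or has no title.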
     
  3. Bhullarz

    Bhullarz New Member

    Code:
    $url = "http://www.go4expert.com";
    $doc = new DOMDocument; // Create a new DOMDocument object in $doc
    $doc->loadHTML(file_get_contents($url)); // Load the contents of our desired website into $doc
    $a = $doc->getElementsByTagName("title"); // Get all <title> elements (a DOMNodeList)
    echo $a->item(0)->nodeValue;

    It didn't work for me. What is the error here?
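    The thread never says what went wrong, so this is only a guess at the usual suspects: file_get_contents() can only open http:// URLs when allow_url_fopen is enabled in php.ini (many shared hosts turn it off), the request itself can fail and return false, and loadHTML() prints warnings on malformed markup even when parsing succeeds. A minimal sketch of a fetch step that reports the reason instead of passing false on to loadHTML(); fetchPage is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Sketch: read a page and fail with a readable reason instead of silently
// handing false to loadHTML(). fetchPage() is a hypothetical helper, not a
// PHP built-in.
function fetchPage(string $url): string
{
    // file_get_contents() can only open http(s):// URLs when allow_url_fopen is on.
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled; try the cURL extension instead');
    }

    $html = @file_get_contents($url); // @ hides the warning; the reason is reported below
    if ($html === false) {
        $error = error_get_last(); // still records the suppressed warning
        throw new RuntimeException('Could not read ' . $url . ': ' . ($error['message'] ?? 'unknown error'));
    }
    return $html;
}
```

    Usage: call $html = fetchPage("http://www.go4expert.com"); inside try/catch and print $e->getMessage() on failure, then pass $html to loadHTML(). If allow_url_fopen is off, the cURL extension (curl_init() / curl_exec()) is the usual alternative.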
     
  4. RandiR

    RandiR New Member

    Use the repro command or script SS_WebPageToText in biterscripting (biterscripting.com for free download).

    1. To copy the web page as it is to a local file, use the following command.

    repro "URL" > file.txt

    The URL must begin with http://.

    2. To extract only the plain text from the web page and store it in a local file, use the following command.

    script SS_WebPageToText.txt page("URL") > file.txt

    This script is available at biterscripting.com / WW_WebPageToText.html .

    Randi
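    For readers who want to stay in PHP, both steps can be approximated without an extra tool. This is a rough sketch, not a port of biterscripting: savePage and savePageText are hypothetical helper names, and strip_tags() is a blunt instrument that removes the tags but keeps whatever was inside <script> and <style> blocks.

```php
<?php
// Rough PHP counterparts of the two biterscripting steps above; not a port of
// that tool. savePage() and savePageText() are hypothetical helper names.

// Step 1: copy the page as-is to a local file.
function savePage(string $url, string $file): void
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not read $url");
    }
    file_put_contents($file, $html);
}

// Step 2: keep only the text. strip_tags() removes the tags but keeps the
// contents of <script> and <style> blocks, so this is a crude approximation.
function savePageText(string $url, string $file): void
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not read $url");
    }
    // Turn entities such as &amp; back into characters after stripping tags.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    file_put_contents($file, $text);
}
```

    Usage: savePageText("http://www.go4expert.com", "file.txt"); (http:// URLs again require allow_url_fopen).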
     
  5. shabbir

    shabbir Administrator Staff Member

    RandiR, it looks like the page you are referring to does not exist.
     
  6. RandiR

    RandiR New Member

    Shabbir:

    I had to add spaces in the URLs because, I guess, this site does not allow links otherwise. Just remove the spaces to access the pages. In general, all these pages are at biterscripting . com .

    Randi
     
  7. RandiR

    RandiR New Member

    Shabbir:

    Oops, my mistake. The correct URL for the script SS_WebPageToText is (with spaces inserted):

    biterscripting.com / SS_WebPageToText.html

    Randi

    (I had typed WW instead of SS.)
     
  8. shabbir

    shabbir Administrator Staff Member

    Links will be allowed once your post count goes up, though. And yes, now it looks correct.
     
  9. P455w0rd_Cr4kz

    P455w0rd_Cr4kz Member

    Why don't you use a website copier, so you can browse offline? Google for webripper/httrack/webcow.
    There are plenty out there, and you can also specify which files you want to download {txt, php, html, etc.}.
     
  10. Bhullarz

    Bhullarz New Member

    Bro! We are trying to download the content of the page, not the page itself, so these tools are of no use in this case. It's like copying the text data of one file and pasting it into another file.
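    Since the goal is the page's text rather than a mirror of the site, one way to get it with the DOMDocument approach from earlier in the thread is to collect the text nodes under <body> and skip script, style and noscript blocks. A sketch under those assumptions: pageText is a hypothetical helper that works on an HTML string, so fetch the page first with file_get_contents() or cURL. Joining text nodes with spaces keeps adjacent paragraphs from running together, but it also splits a word that is broken across inline tags, such as wor<b>d</b>. Also, loadHTML() assumes ISO-8859-1 when the page has no charset declaration, so UTF-8 pages without one may come out garbled.

```php
<?php
// Sketch: pull the readable text out of an HTML string with DOMDocument.
// Assumption: "text data" means text under <body>, excluding <script>,
// <style> and <noscript>, with whitespace runs collapsed.
// pageText() is a hypothetical helper name.
function pageText(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Every text node under <body> that is not inside a script/style/noscript block.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//body//text()[not(ancestor::script or ancestor::style or ancestor::noscript)]');

    $parts = [];
    foreach ($nodes as $node) {
        $parts[] = $node->nodeValue;
    }
    // Join with spaces so text from adjacent blocks does not run together,
    // then collapse whitespace runs into single spaces.
    return trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));
}
```

    Usage: file_put_contents("file.txt", pageText(file_get_contents($url))); writes the page's text to a local file, which is the "copy the text into another file" step described above.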
     
