
How to scrape a website in Node.js

Discussion in 'Programming' started by Ashish Bisht, Sep 19, 2016.

  1. Ashish Bisht

    Hey friends, I have a question for the JavaScript development forum: how do you scrape a website using Node.js, including fetching, parsing, and extracting the content of a web page? Please read and share your views on scraping website content.
     
  2. richaj247

    Web scraping is a data extraction technique for pulling data and information from a website. Node.js is well suited to web scraping.
    Here are the three main steps of web scraping:
    1. Getting the HTML source code from the website
    2. Making sense of the HTML content, finding the information, and extracting it
    3. Moving the finalized information to storage (text file, database, etc.)
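
    The three steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes Node.js 18 or later (for the built-in `fetch`), the function names (`fetchHtml`, `extractTitle`, `extractLinks`, `saveResult`, `scrape`) are hypothetical, and the regular expressions are a deliberate simplification that only copes with simple markup. A real scraper would normally use a proper HTML parser for step 2.

```javascript
const fs = require('node:fs/promises');

// Step 1: get the HTML source code from the website.
// Assumes Node.js 18+, where fetch is available as a global.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Step 2: find the information and extract it.
// Simplification: regular expressions stand in for a real HTML parser.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

function extractLinks(html) {
  const links = [];
  const pattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Step 3: move the extracted information to storage (a JSON text file here).
async function saveResult(filePath, data) {
  await fs.writeFile(filePath, JSON.stringify(data, null, 2), 'utf8');
}

// Ties the three steps together for one page.
async function scrape(url, filePath) {
  const html = await fetchHtml(url);
  const result = {
    url,
    title: extractTitle(html),
    links: extractLinks(html),
  };
  await saveResult(filePath, result);
  return result;
}
```

    To try it, call something like `scrape('https://example.com/', 'result.json')` (both arguments are placeholders) and inspect the file it writes.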
     
  3. Emily Williamson

    Hi Ashish,


    The web scraper is going to be very minimalistic. The basic flow will be as follows:
    1. Launch a web server
    2. Visit a URL on our server that activates the web scraper
    3. The scraper will make a request to the website we want to scrape
    4. The request will capture the HTML of the website and pass it along to our server
    5. We will traverse the DOM and extract the information we want
    6. Next, we will format the extracted data into the format we need
    7. Finally, we will save this formatted data into a JSON file on our machine
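
    One possible way to wire this flow together, using only Node.js built-in modules, is sketched below. Everything specific in it is an assumption made for illustration: the `/scrape` route, the `TARGET_URL` and `OUTPUT_FILE` values, the port, and the helper names `formatPage` and `createScraperServer`. It also assumes Node.js 18 or later for the global `fetch`, and it replaces real DOM traversal (step 5) with simple regular expressions, so it only handles straightforward markup.

```javascript
const http = require('node:http');
const fs = require('node:fs/promises');

// Hypothetical defaults; replace with the site and file you actually want.
const TARGET_URL = 'https://example.com/';
const OUTPUT_FILE = 'output.json';

// Steps 5 and 6: extract the wanted information and format it.
// Simplification: regular expressions stand in for real DOM traversal.
function formatPage(html) {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const headings = [];
  const headingPattern = /<h1[^>]*>([\s\S]*?)<\/h1>/gi;
  let match;
  while ((match = headingPattern.exec(html)) !== null) {
    // Drop any nested tags so only the heading text remains.
    headings.push(match[1].replace(/<[^>]+>/g, '').trim());
  }
  return { title: title ? title[1].trim() : null, headings };
}

// Step 1 creates the server; step 2 is a visit to /scrape on it.
function createScraperServer(targetUrl = TARGET_URL, outputFile = OUTPUT_FILE) {
  return http.createServer(async (req, res) => {
    if (req.url !== '/scrape') {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not found. Visit /scrape to start the scraper.');
      return;
    }
    try {
      // Steps 3 and 4: request the target site and capture its HTML.
      const response = await fetch(targetUrl);
      const html = await response.text();
      const data = formatPage(html);
      // Step 7: save the formatted data into a JSON file.
      await fs.writeFile(outputFile, JSON.stringify(data, null, 2), 'utf8');
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(data));
    } catch (err) {
      res.writeHead(502, { 'Content-Type': 'text/plain' });
      res.end(`Scrape failed: ${err.message}`);
    }
  });
}

// To launch it (step 1), uncomment the next line; 8081 is an arbitrary port.
// createScraperServer().listen(8081);
```

    With the server listening, opening `http://localhost:8081/scrape` in a browser would trigger the scrape and write the JSON file to the current directory.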
     
