Go4Expert

lionaneesh 31Jan2011 21:40

How to get source of a Web Page in C
 
Note: The following source only works with servers that do not use chunked transfer encoding. If a server sends its response with Transfer-Encoding: chunked, this program will not handle it properly.

I assume basic knowledge of the UNIX sockets API and the C language as prerequisites.
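
To make the chunked-encoding caveat above concrete, here is a minimal sketch of how a client could check whether a response uses chunked encoding before trying to print it. The helper name is_chunked is hypothetical (it is not part of the article's program), and the sketch assumes the whole header block is already in a null-terminated buffer.

```c
#include <string.h>
#include <strings.h>    /* strncasecmp() (POSIX) */

/* Hypothetical helper, not part of the original program.
 * Returns 1 if the header block of `response` has a Transfer-Encoding
 * header whose value mentions "chunked", 0 otherwise.
 * `response` must be a null-terminated string. */
int is_chunked(const char *response)
{
    static const char name[] = "Transfer-Encoding:";
    const size_t name_len = sizeof name - 1;
    const char *end = strstr(response, "\r\n\r\n");  /* end of the headers */
    const char *line = response;

    if (end == NULL)
        end = response + strlen(response);           /* headers only, no body */

    while (line < end) {
        const char *eol = strstr(line, "\r\n");
        const char *v;

        if (eol == NULL || eol > end)
            eol = end;

        /* Header names are case-insensitive, so compare without case. */
        if ((size_t)(eol - line) >= name_len &&
            strncasecmp(line, name, name_len) == 0) {
            for (v = line + name_len; v + 7 <= eol; v++)
                if (strncasecmp(v, "chunked", 7) == 0)
                    return 1;
        }

        if (eol == end)
            break;
        line = eol + 2;                              /* skip the "\r\n" */
    }
    return 0;
}
```

A caller could run this on the received buffer and print a warning instead of the page when it returns 1.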


Source


Code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>             // close()
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define RESPONSE_RECV_LIMIT 3000                        // maximum number of response bytes to read
#define SOURCE_START_IDENTIFIER "<!DOCTYPE"             // the page source is assumed to start here
#define SOURCE_START_IDENTIFIER2 "<html>"               // fallback marker, used when no DOCTYPE is found
#define FILENAME "/"                                    // ENTER THE FILENAME HERE
#define PORT        "80"                                // default HTTP port

int main(int argc , char *argv[])
{
        if(argc != 2)
        {
                fprintf(stderr,"Usage: %s hostname\n",argv[0]);
                return(1);
        }

        char response[RESPONSE_RECV_LIMIT+1];  // + 1 is for the terminating null
        char *source;
        int sockfd,err;
        struct addrinfo *p,hints,*res;
        char request[512];

        // snprintf never writes past the buffer; a hostname that does not fit is rejected
        if(snprintf(request,sizeof request,"GET %s HTTP/1.1\r\nHost: %s\r\n\r\n",FILENAME,argv[1]) >= (int)sizeof request)
        {
                fprintf(stderr,"Hostname too long\n");
                return(1);
        }

        // print the request we are making

        printf("%s\n\n",request);

        memset(&hints,0,sizeof hints);

        hints.ai_socktype=SOCK_STREAM;

        hints.ai_family=AF_UNSPEC;

        if ((err = getaddrinfo(argv[1],PORT, &hints, &res)) != 0)
        {
                fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
                return 1;
        }

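        for(p=res;p!=NULL;p=p->ai_next)
        {
                if( ( sockfd = socket(p->ai_family,p->ai_socktype,p->ai_protocol) ) == -1)
                {
                        perror("client: socket");
                        continue;       // try the next address
                }

                if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
                {
                        perror("client: connect");
                        close(sockfd);
                        continue;       // try the next address
                }

                break;  // connected: stop here and keep this socket
        }

        if(p == NULL)
        {
                fprintf(stderr,"client: failed to connect\n");
                freeaddrinfo(res);
                return(1);
        }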

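        freeaddrinfo(res); // the address list is no longer needed once connected

        if(send(sockfd,request,strlen(request),0) != (ssize_t)strlen(request))
        {
                fprintf(stderr,"send: could not send the whole request\n");
                close(sockfd);
                return(1);
        }

        // a single recv() may return only part of a large response
        ssize_t received = recv(sockfd,response,RESPONSE_RECV_LIMIT,0);

        if(received == -1)
        {
                perror("recv");
                close(sockfd);
                return(1);
        }

        if(received == 0)
        {
                fprintf(stderr,"recv: connection closed by the server\n");
                close(sockfd);
                return(1);
        }

        response[received] = '\0'; // recv() does not null-terminate, but strstr() needs it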

        close(sockfd); // we dont need it any more

//        printf("%s",response); // for debugging purposes

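        // the page source starts at the DOCTYPE, or at <html> if there is none
        source = strstr(response,SOURCE_START_IDENTIFIER);

        if(source == NULL)
        {
                source = strstr(response,SOURCE_START_IDENTIFIER2);
        }

        if(source == NULL)
        {
                fprintf(stderr,"No page source found in the response\n");
                return(1);
        }

        printf("%s\n",source);
        return(0);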
}
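
The program above calls recv() once and locates the page by searching for "<!DOCTYPE" or "<html>". As a rough sketch of an alternative (not the article's method), the two helpers below keep reading until the server closes the connection and then find the body by looking for the blank line that ends the HTTP headers. The names recv_all and find_body are hypothetical. The sketch also assumes the request carries a "Connection: close" header, because otherwise an HTTP/1.1 server may keep the connection open and the loop would block.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call recv() repeatedly until the buffer is full,
 * the peer closes the connection, or an error occurs.
 * Stores at most cap - 1 bytes and always null-terminates on success.
 * Returns the number of bytes stored, or -1 on error. */
ssize_t recv_all(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    if (cap == 0)
        return -1;

    while (total < cap - 1) {
        ssize_t n = recv(fd, buf + total, cap - 1 - total, 0);
        if (n == -1)
            return -1;          /* errno is set by recv() */
        if (n == 0)
            break;              /* peer closed the connection */
        total += (size_t)n;
    }

    buf[total] = '\0';
    return (ssize_t)total;
}

/* Hypothetical helper: return a pointer to the first byte after the
 * blank line ("\r\n\r\n") that separates headers from body, or NULL
 * if the separator has not been received. */
const char *find_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");
    return sep ? sep + 4 : NULL;
}
```

In main(), recv_all(sockfd, response, sizeof response) could stand in for the single recv() call, and find_body(response) for the two strstr() searches. Splitting on the blank line also works for pages that do not start with either marker.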

Compiling:
Code:

gcc getSource.c -o getSource

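Sample

I am providing a sample using Apache running on my own machine.

You can see its response headers here. Note that it sends a Content-Length header rather than using chunked encoding. The 400 Bad Request itself is expected, because the hand-typed request has no Host header: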

Code:

aneesh@aneesh-laptop:~/articles/C/getSrc$ telnet 127.0.0.1 80
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
GET / HTTP/1.1
HTTP/1.1 400 Bad Request
Date: Mon, 31 Jan 2011 16:04:45 GMT
Server: Apache/2.2.14 (Ubuntu)
Vary: Accept-Encoding
Content-Length: 301
Connection: close
Content-Type: text/html; charset=iso-8859-1
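
The Content-Length header in the response above tells a client how many body bytes to expect, which is what makes non-chunked responses easy to read. The following is a small sketch of extracting that value from a header block. The function name content_length is hypothetical, and the sketch assumes the headers are already in a null-terminated buffer.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>    /* strncasecmp() (POSIX) */

/* Hypothetical helper: return the value of the Content-Length header
 * found in `headers`, or -1 if the header is missing or malformed.
 * Only the part before the blank line ("\r\n\r\n") is searched. */
long content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;
    const char *end = strstr(headers, "\r\n\r\n");
    const char *line = headers;

    if (end == NULL)
        end = headers + strlen(headers);

    while (line < end) {
        if ((size_t)(end - line) >= name_len &&
            strncasecmp(line, name, name_len) == 0) {
            char *stop;
            /* strtol() skips the optional spaces before the number */
            long value = strtol(line + name_len, &stop, 10);
            if (stop == line + name_len || value < 0)
                return -1;      /* no digits, or a negative length */
            return value;
        }

        line = strstr(line, "\r\n");    /* move on to the next header line */
        if (line == NULL)
            break;
        line += 2;
    }
    return -1;
}
```

With that number known, a client can keep calling recv() until it has read exactly that many bytes after the blank line.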

Output:
Code:

aneesh@aneesh-laptop:~/articles/C/getSrc$ ./getSource 127.0.0.1
GET / HTTP/1.1
Host: 127.0.0.1



<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>

Hey guys, stay tuned: I am working hard on adding chunked-data support to it, and maybe I'll write another article on it.
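
For readers curious what chunked handling involves, here is a rough sketch (not the author's planned implementation) of decoding a chunked body that is already fully in memory. Each chunk is a hexadecimal size line, then that many bytes of data, then "\r\n"; a chunk of size 0 ends the body. The function name decode_chunked is hypothetical, and the sketch assumes a text body with no embedded null bytes and ignores trailers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode the chunked body `in` (null-terminated)
 * into `out`, which holds `out_cap` bytes. The result is null-terminated.
 * Returns the decoded length, or -1 on malformed input or if `out`
 * is too small. Note that strtoul() is more lenient than the HTTP
 * grammar (it accepts leading spaces and a "0x" prefix). */
long decode_chunked(const char *in, char *out, size_t out_cap)
{
    size_t written = 0;

    if (out_cap == 0)
        return -1;

    for (;;) {
        char *stop;
        unsigned long size = strtoul(in, &stop, 16);  /* chunk size is hex */
        const char *eol;

        if (stop == in)
            return -1;                  /* no hex digits found */

        eol = strstr(stop, "\r\n");     /* also skips any chunk extension */
        if (eol == NULL)
            return -1;
        in = eol + 2;

        if (size == 0)
            break;                      /* last chunk */

        if (size >= out_cap - written)
            return -1;                  /* keep one byte for the null */
        if (strlen(in) < size + 2)
            return -1;                  /* body is truncated */

        memcpy(out + written, in, size);
        written += size;
        in += size;

        if (in[0] != '\r' || in[1] != '\n')
            return -1;                  /* chunk data must end with CRLF */
        in += 2;
    }

    out[written] = '\0';
    return (long)written;
}
```

For example, the input "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n" decodes to the 11-byte string "hello world".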

lionaneesh 2Feb2011 11:51

Re: How to get source of a Web Page in C
 
Thanks for accepting my article.
I hope you guys like it!

nicolerisse 18Feb2011 20:01

Re: How to get source of a Web Page in C
 
I don't accept it...

lionaneesh 18Feb2011 21:00

Re: How to get source of a Web Page in C
 
Quote:

Originally Posted by nicolerisse (Post 79540)
I don't accept it...

Sorry, but what is it that you can't accept?
Please be more descriptive in your posts.

somay 6May2011 08:57

Re: How to get source of a Web Page in C
 
Socket programming in C and C++.
TCP/IP programming also.

somay 6May2011 08:59

socket programming
 
Socket programming in C and C++.
TCP/IP programming also.


All times are GMT +5.5.