How to get source of a Web Page in C

Discussion in 'C' started by lionaneesh, Jan 31, 2011.

  1. lionaneesh

    Note: The following source works only with servers that do not use chunked transfer encoding. Responses sent with "Transfer-Encoding: chunked" will not be handled properly by this code.

    I assume basic knowledge of the UNIX sockets API and the C language as prerequisites.
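
To check whether a given server falls under the limitation in the note, one option is to look for a "Transfer-Encoding: chunked" header in the response. The following is only a sketch: the function name is_chunked is made up for illustration, and it assumes the response is already held in a null-terminated buffer.

```c
#include <string.h>
#include <strings.h>	/* strncasecmp (POSIX) */

/* Hypothetical helper: returns 1 if the header block of `response`
 * contains a Transfer-Encoding header whose value mentions "chunked",
 * and 0 otherwise. `response` must be null-terminated. */
int is_chunked(const char *response)
{
	static const char name[] = "Transfer-Encoding:";
	const char *end = strstr(response, "\r\n\r\n");	/* end of headers */
	const char *line = response;

	if (end == NULL)
		end = response + strlen(response);

	while (line < end) {
		const char *eol = strstr(line, "\r\n");
		const char *v;

		if (eol == NULL || eol > end)
			eol = end;

		/* header names are case-insensitive */
		if ((size_t)(eol - line) >= sizeof name - 1 &&
		    strncasecmp(line, name, sizeof name - 1) == 0) {
			/* scan the header value for the token "chunked" */
			for (v = line + sizeof name - 1; v + 7 <= eol; v++)
				if (strncasecmp(v, "chunked", 7) == 0)
					return 1;
		}

		if (eol == end)
			break;
		line = eol + 2;	/* skip the CRLF */
	}
	return 0;
}
```

Only the header block is searched, so a page whose body happens to contain the word "chunked" is not misreported.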


    Source


    Code:
    #include<stdio.h>
    #include<netdb.h>
    #include<sys/types.h>
    #include<sys/socket.h>
    #include<arpa/inet.h>
    #include<string.h>
    #include<unistd.h>	// for close()
    
    #define RESPONSE_RECV_LIMIT 3000
    #define SOURCE_START_IDENTIFIER "<!DOCTYPE"
    #define SOURCE_START_IDENTIFIER2 "<html>" 		// fallback marker, used when no <!DOCTYPE is found
    #define FILENAME "/"		 		// path to request from the server
    #define PORT	"80"			 		// default HTTP port
    
    int main(int argc , char *argv[])
    {
    	if(argc != 2)
    	{
    		printf("Usage: %s hostname\n",argv[0]);
    		return(0);
    	}
    
    	char response[RESPONSE_RECV_LIMIT+1];  // + 1 is for null
    	char *source;
    	int sockfd,newfd,err;
    	char ip[INET6_ADDRSTRLEN];
    	struct addrinfo *p,hints,*res;
    	int len,len_s;
    	int yes=1;
    	struct sockaddr_storage their_addr;
    	socklen_t addr_size;
    	void *addr;
    	char *ver;
    	char request[100];
    
    	// snprintf never writes past the end of request, even for a long hostname
    	snprintf(request,sizeof request,"GET %s HTTP/1.1\r\nHost: %s\r\n\r\n",FILENAME,argv[1]);
    
    	// print the request we are making
    
    	printf("%s\n\n",request);
    
    	memset(&hints,0,sizeof hints);
    
    	hints.ai_socktype=SOCK_STREAM;
    
    	hints.ai_family=AF_UNSPEC;
    
    	if ((err = getaddrinfo(argv[1],PORT, &hints, &res)) != 0)
    	{
    		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
    		return 1;
    	}
    
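    	// try each address returned by getaddrinfo until one connects
    	for(p=res;p!=NULL;p=p->ai_next)
    	{
    		if( ( sockfd = socket(p->ai_family,p->ai_socktype,p->ai_protocol) ) == -1)
    		{
    			perror("client: socket");
    			continue;
    		}

    		if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
    		{
    			close(sockfd);
    			perror("client: connect");
    			continue;
    		}

    		break; // connected, stop looking
    	}

    	if(p == NULL)
    	{
    		fprintf(stderr,"client: failed to connect\n");
    		freeaddrinfo(res);
    		return 1;
    	}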
    
    	if(send(sockfd,request,strlen(request),0) < (ssize_t)strlen(request))
    	{
    		perror("send");
    	}
    
    	freeaddrinfo(res);
    
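    	len = recv(sockfd,response,RESPONSE_RECV_LIMIT,0);

    	if(len <= 0)
    	{
    		if(len < 0)
    			perror("recv");
    		else
    			fprintf(stderr,"recv: connection closed by server\n");
    		close(sockfd);
    		return(1);
    	}

    	response[len] = '\0'; // null-terminate so strstr() is safe

    	close(sockfd); // we don't need it any more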
    
    //	printf("%s",response); // for debugging purposes
    
    	source = strstr(response,SOURCE_START_IDENTIFIER);
    
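    	if(source == NULL)
    	{
    		source = strstr(response,SOURCE_START_IDENTIFIER2);
    	}
    	if(source == NULL)
    	{
    		// neither marker was found, so there is nothing to print
    		fprintf(stderr,"Could not find the start of the page source\n");
    		return(1);
    	}
    	printf("%s\n",source);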
    	return(0);
    }
    
    Compiling :-
    Code:
    gcc getSource.c -o getSource 
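
The program above finds the page by searching for "<!DOCTYPE" or "<html>", which misses pages that start with anything else (for example "<HTML>" or "<html lang=...>"). Since HTTP headers always end with an empty line (CRLF CRLF), an alternative is to split the response there. This is a minimal sketch, not part of the program above; the name find_body is an assumption made for illustration.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the first byte after the
 * blank line that ends the HTTP headers, or NULL if the header
 * terminator is not present in `response` (null-terminated). */
const char *find_body(const char *response)
{
	const char *sep = strstr(response, "\r\n\r\n");

	return sep ? sep + 4 : NULL;	/* skip the 4 terminator bytes */
}
```

In the program, this could stand in for the two strstr() calls, for example source = (char *)find_body(response);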
    

    Sample



    Here is a sample run against Apache on my own machine.

    You can see the headers the server sends here:

    Code:
    aneesh@aneesh-laptop:~/articles/C/getSrc$ telnet 127.0.0.1 80
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
    GET / HTTP/1.1

    HTTP/1.1 400 Bad Request
    Date: Mon, 31 Jan 2011 16:04:45 GMT
    Server: Apache/2.2.14 (Ubuntu)
    Vary: Accept-Encoding
    Content-Length: 301
    Connection: close
    Content-Type: text/html; charset=iso-8859-1
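
The headers above include Content-Length: 301, which tells the client how many body bytes to expect. Because the program makes a single recv() call limited to RESPONSE_RECV_LIMIT bytes, reading this value is one way to notice that a page did not arrive completely. The sketch below is an illustration under that assumption; content_length is a made-up name, not part of the program.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>	/* strncasecmp (POSIX) */

/* Hypothetical helper: returns the value of the Content-Length header
 * found in the header block of `response` (null-terminated), or -1 if
 * the header is missing or does not start with a digit. */
long content_length(const char *response)
{
	static const char name[] = "Content-Length:";
	const char *end = strstr(response, "\r\n\r\n");	/* end of headers */
	const char *line = response;

	if (end == NULL)
		end = response + strlen(response);

	while (line < end) {
		if (strncasecmp(line, name, sizeof name - 1) == 0) {
			const char *v = line + sizeof name - 1;

			while (*v == ' ' || *v == '\t')	/* optional whitespace */
				v++;
			if (*v < '0' || *v > '9')
				return -1;
			return strtol(v, NULL, 10);
		}
		line = strstr(line, "\r\n");	/* move to the next header line */
		if (line == NULL)
			break;
		line += 2;
	}
	return -1;
}
```

If the returned value is larger than the number of body bytes actually received, the response was cut short and more recv() calls would be needed.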
    
    Output :-
    Code:
    aneesh@aneesh-laptop:~/articles/C/getSrc$ ./getSource 127.0.0.1
    GET / HTTP/1.1
    Host: 127.0.0.1



    <html><body><h1>It works!</h1>
    <p>This is the default web page for this server.</p>
    <p>The web server software is running but no content has been added, yet.</p>
    </body></html>
    
    
    Stay tuned, guys: I am working on adding support for chunked data, and I may write another article about it.
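
For readers curious what handling chunked data involves: a chunked body is a series of chunks, each made of a hexadecimal size line, CRLF, that many bytes of data, and another CRLF, ending with a chunk of size 0. The following is a rough sketch of a decoder and not the author's planned implementation. The name decode_chunked is hypothetical, and the code assumes the whole body is already in a null-terminated buffer, so it cannot handle binary data containing null bytes.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes the chunked body `in` (null-terminated)
 * into `out`, which has room for `out_cap` bytes including the final
 * null. Returns the decoded length, or -1 if the input is malformed,
 * truncated, or does not fit. Trailer headers are ignored. */
long decode_chunked(const char *in, char *out, size_t out_cap)
{
	size_t used = 0;

	for (;;) {
		char *stop;
		unsigned long size;
		const char *eol;

		if (!isxdigit((unsigned char)*in))
			return -1;		/* size line must start with a hex digit */
		size = strtoul(in, &stop, 16);

		eol = strstr(stop, "\r\n");	/* also skips chunk extensions */
		if (eol == NULL)
			return -1;
		in = eol + 2;

		if (size == 0)			/* last chunk */
			break;

		/* reject chunks that overflow `out` or run past the input */
		if (size >= out_cap - used || strlen(in) < size + 2)
			return -1;

		memcpy(out + used, in, size);
		used += size;
		in += size;

		if (in[0] != '\r' || in[1] != '\n')
			return -1;		/* chunk data must end with CRLF */
		in += 2;
	}

	if (used >= out_cap)
		return -1;
	out[used] = '\0';
	return (long)used;
}
```

For example, the input "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n" decodes to "hello world".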
     
  2. lionaneesh

    Thanks for accepting my article.
    I hope you guys like it!
     
  3. nicolerisse

    I don't accept it...
     
  4. lionaneesh

    Sorry, but what is it that you can't accept?
    Please be descriptive in your posts.
     
  5. somay

    Socket programming in C and C++.
    TCP/IP programming also.
     
     
