How to get source of a Web Page in C

Author: lionaneesh
This article shows how to get the source of a web page in C.
Note: The following source only works with servers that do not use chunked transfer encoding. Responses sent with Transfer-Encoding: chunked will not be handled properly by this source.

I assume basic knowledge of the UNIX sockets API and the C language as prerequisites.
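To make the note about chunked encoding concrete, here is a minimal sketch of how a program could check whether a response uses it. The helper names is_chunked and find_nocase are made up for this illustration and are not part of the program in this article. The sketch assumes the whole header section has already been received into a null-terminated buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle within the first n bytes of hay.
   Returns a pointer to the match, or NULL if there is none. */
static const char *find_nocase(const char *hay, size_t n, const char *needle)
{
	size_t m = strlen(needle);
	if (m == 0 || m > n)
		return NULL;
	for (size_t i = 0; i + m <= n; i++) {
		size_t j = 0;
		while (j < m && tolower((unsigned char)hay[i + j]) ==
		                tolower((unsigned char)needle[j]))
			j++;
		if (j == m)
			return hay + i;
	}
	return NULL;
}

/* Returns 1 if the header section of response contains a
   Transfer-Encoding header whose value mentions "chunked", else 0. */
int is_chunked(const char *response)
{
	/* The headers end at the first blank line (CRLF CRLF). */
	const char *end = strstr(response, "\r\n\r\n");
	size_t hdr_len = end ? (size_t)(end - response) : strlen(response);

	/* Header lines follow the status line, so each starts after a CRLF. */
	const char *field = find_nocase(response, hdr_len, "\r\ntransfer-encoding:");
	if (field == NULL)
		return 0;

	/* Only look at the rest of that one header line. */
	const char *value = field + 2;              /* skip the leading CRLF */
	const char *eol = strstr(value, "\r\n");
	size_t line_len = eol ? (size_t)(eol - value) : strlen(value);
	return find_nocase(value, line_len, "chunked") != NULL;
}
```

A caller could run is_chunked(response) right after receiving the data and print a warning instead of the page when it returns 1.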


Source


Code:
#include<stdio.h>
#include<string.h>
#include<unistd.h>		// close()
#include<netdb.h>
#include<sys/types.h>
#include<sys/socket.h>
#include<arpa/inet.h>

#define RESPONSE_RECV_LIMIT 3000
#define SOURCE_START_IDENTIFIER "<!DOCTYPE"
#define SOURCE_START_IDENTIFIER2 "<html>" 		// fallback marker, used when no DOCTYPE is found
#define FILENAME "/"		 		// ENTER THE FILENAME HERE
#define PORT	"80"			 		// default HTTP port

int main(int argc , char *argv[])
{
	if(argc != 2)
	{
		printf("Usage: %s hostname\n",argv[0]);
		return(0);
	}

	char response[RESPONSE_RECV_LIMIT+1];  // + 1 is for the terminating null
	char *source;
	int sockfd = -1,err,len;
	struct addrinfo *p,hints,*res;
	char request[512];

	// snprintf never writes past the buffer, even for a very long hostname
	len = snprintf(request,sizeof request,"GET %s HTTP/1.1\r\nHost: %s\r\n\r\n",FILENAME,argv[1]);
	if(len < 0 || (size_t)len >= sizeof request)
	{
		fprintf(stderr,"Hostname too long\n");
		return 1;
	}

	// print the request we are making

	printf("%s\n\n",request);

	memset(&hints,0,sizeof hints);

	hints.ai_socktype=SOCK_STREAM;

	hints.ai_family=AF_UNSPEC;

	if ((err = getaddrinfo(argv[1],PORT, &hints, &res)) != 0)
	{
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
		return 1;
	}

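	for(p=res;p!=NULL;p=p->ai_next)
	{
		if( ( sockfd = socket(p->ai_family,p->ai_socktype,p->ai_protocol) ) == -1)
		{
			perror("client: socket");
			continue;	// try the next address
		}

		if (connect(sockfd, p->ai_addr, p->ai_addrlen) == -1)
		{
			close(sockfd);
			perror("client: connect");
			continue;
		}

		break;	// connected: stop here so the socket is not overwritten
	}

	if(p == NULL)	// every address failed
	{
		fprintf(stderr,"client: failed to connect to %s\n",argv[1]);
		freeaddrinfo(res);
		return 1;
	}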

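	// cast to ssize_t so that a return value of -1 is not compared as unsigned
	if(send(sockfd,request,strlen(request),0) != (ssize_t)strlen(request))
	{
		perror("send");
		close(sockfd);
		freeaddrinfo(res);
		return 1;
	}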

	freeaddrinfo(res);

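	// recv returns -1 on error and 0 if the server closed the connection
	ssize_t received = recv(sockfd,response,RESPONSE_RECV_LIMIT,0);
	if( received <= 0 )
	{
		if(received < 0) perror("recv");
		close(sockfd);
		return(1);
	}
	response[received] = '\0'; // recv does not null-terminate; strstr needs it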

	close(sockfd); // we dont need it any more

//	printf("%s",response); // for debugging purposes

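	source = strstr(response,SOURCE_START_IDENTIFIER);

	if(source == NULL)
	{
		source = strstr(response,SOURCE_START_IDENTIFIER2);
	}
	if(source == NULL)	// neither marker found; passing NULL to %s is undefined
	{
		fprintf(stderr,"Could not find the start of the page source\n");
		return(1);
	}
	printf("%s\n",source);
	return(0);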
}
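The program above finds the page source by searching for "<!DOCTYPE" or "<html>", which misses pages that start with anything else (for example a lowercase doctype). As an alternative sketch, assuming the response follows the usual HTTP layout of headers, a blank line, then the body, one could look for the CRLF CRLF separator instead. The function name find_body is hypothetical and not part of the original source.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first byte of the body in a null-terminated
   HTTP response, or NULL if the end of the headers was not found. */
const char *find_body(const char *response)
{
	const char *sep = strstr(response, "\r\n\r\n");
	if (sep == NULL)
		return NULL;
	return sep + 4;   /* skip the CRLF CRLF separator */
}
```

In the program above this could stand in for the two strstr calls, for example source = (char *)find_body(response), with the same NULL check before printing.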
Compiling :-
Code:
gcc getSource.c -o getSource

Sample

I am providing a sample run against Apache on my own server.

You can see the server's settings here:

Code:
aneesh@aneesh-laptop:~/articles/C/getSrc$ telnet 127.0.0.1 80
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 400 Bad Request
Date: Mon, 31 Jan 2011 16:04:45 GMT
Server: Apache/2.2.14 (Ubuntu)
Vary: Accept-Encoding
Content-Length: 301
Connection: close
Content-Type: text/html; charset=iso-8859-1
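The headers above include Content-Length: 301, which tells a client how many body bytes to expect when the server is not using chunked encoding. Here is a small sketch of how that value could be read from a response held in a null-terminated buffer. The function name parse_content_length is an assumption for illustration, and the sketch only handles a plain decimal value on a single header line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the Content-Length value found in the header section of a
   null-terminated HTTP response, or -1 if it is missing or not a number. */
long parse_content_length(const char *response)
{
	static const char name[] = "content-length:";
	const size_t name_len = sizeof name - 1;
	const char *line = strstr(response, "\r\n");   /* end of the status line */

	while (line != NULL) {
		line += 2;                                  /* start of the next header line */
		if (line[0] == '\r' && line[1] == '\n')     /* blank line: headers are done */
			break;

		/* Header names are case-insensitive, so compare in lowercase. */
		size_t i = 0;
		while (i < name_len && tolower((unsigned char)line[i]) == name[i])
			i++;
		if (i == name_len) {
			char *end;
			long value = strtol(line + name_len, &end, 10);
			return (end != line + name_len && value >= 0) ? value : -1;
		}
		line = strstr(line, "\r\n");
	}
	return -1;
}
```

With the length known, a client can keep receiving until it has that many bytes after the blank line that ends the headers.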
Output :-
Code:
aneesh@aneesh-laptop:~/articles/C/getSrc$ ./getSource 127.0.0.1
GET / HTTP/1.1
Host: 127.0.0.1



<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>
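The sample page above is small enough to arrive in a single recv call. For larger pages one recv may return only part of the response, so a client usually receives in a loop. The following is a rough sketch of such a loop; the function name recv_all is made up for this example, and it assumes the server closes the connection when it is done (for instance because the request asked for "Connection: close"), otherwise the loop would block waiting for more data.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads from fd until the peer closes the connection, an error occurs,
   or cap - 1 bytes are stored. The buffer is always null-terminated.
   Returns the number of bytes stored, or -1 if recv() failed. */
ssize_t recv_all(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	if (cap == 0)
		return -1;
	while (total < cap - 1) {
		ssize_t n = recv(fd, buf + total, cap - 1 - total, 0);
		if (n < 0) {
			buf[total] = '\0';
			return -1;
		}
		if (n == 0)          /* peer closed the connection */
			break;
		total += (size_t)n;
	}
	buf[total] = '\0';
	return (ssize_t)total;
}
```

In the program above, a call such as recv_all(sockfd, response, sizeof response) could take the place of the single recv, keeping the same buffer and size limit.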
Hey guys, stay tuned: I am working hard on adding chunked data support to this, and maybe I'll write another article on it.
lionaneesh's Avatar, Join Date: Mar 2010
Invasive contributor
Thanks for accepting my article.
I hope you guys like it!
Scripting like this
nicolerisse's Avatar
Banned
I don´t accept it...
lionaneesh's Avatar, Join Date: Mar 2010
Invasive contributor
Quote:
Originally Posted by nicolerisse View Post
I don´t accept it...
Sorry, but what is it that you can't accept?
Please be descriptive in your posts.
somay's Avatar, Join Date: Apr 2011
Newbie Member
socket programming in c and c++
Tcp/ip programming also