Go4Expert


pradeep 17Apr2013 16:44

Access Remote URLs in Python With urllib2
 
Python's urllib2 library contains functions that enable programmers to access remote URLs and that handle operations such as HTTP Basic Authentication, cookies and redirects. It plays a role similar to Perl's LWP or the XMLHttpRequest object used in ASP.

The library lets you add HTTP headers to requests, read response data and headers, handle errors, and so on. Although urllib2 is not limited to HTTP, this article covers only HTTP. I'll explain and demonstrate the usage of urllib2 with a few examples. I assume the reader has a basic understanding of URLs or, simply put, of how the web works.
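Adding request headers is mentioned above but not demonstrated. The following is a minimal sketch of it. It uses Python 3's urllib.request, which took over urllib2's Request class. The User-Agent value 'my-test-client/0.1' is a hypothetical placeholder. Nothing is sent over the network; the code only builds and inspects the request object.

```python
from urllib.request import Request

# Headers can be given to the constructor or added afterwards.
# 'my-test-client/0.1' is a hypothetical User-Agent string.
req = Request('http://www.go4expert.com/',
              headers={'User-Agent': 'my-test-client/0.1'})
req.add_header('Accept', 'text/html')

# Request stores names via str.capitalize(), so 'User-Agent' becomes 'User-agent'.
print(req.get_header('User-agent'))
print(sorted(req.header_items()))
```

The header names are normalised when they are stored. Look them up in their capitalised form ('User-agent'), not as originally typed.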

Using urllib2 To Fetch Remote URLs



The code snippet below is the simplest possible example: it fetches a URL and prints the data it receives, nothing fancy.

Code: Python

import urllib2

res = urllib2.urlopen('http://www.go4expert.com')

print res.read()
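
The urlopen/read/info calls shown above can be tried without a network connection. The sketch below uses Python 3's urllib.request with a data: URL, which urllib.request decodes locally (Python 3.4+). The data: URL and the fetch() helper are illustrative assumptions. It reads the response body and the response headers.

```python
from urllib.request import urlopen

# A data: URL is decoded locally by urllib.request (Python 3.4+),
# so this runs without any network access. Hypothetical example URL.
URL = 'data:text/plain;charset=utf-8,Hello%2C%20urllib'


def fetch(url):
    """Return (body_text, content_type, headers) for the given URL (hypothetical helper)."""
    with urlopen(url) as res:
        headers = res.info()  # response headers, an email.message.Message (or subclass)
        raw = res.read()      # response body as bytes
    charset = headers.get_content_charset() or 'utf-8'
    return raw.decode(charset), headers.get_content_type(), headers


body, content_type, headers = fetch(URL)
print(content_type, headers['Content-Length'])
print(body)
```

With a real http:// URL, the same helper works unchanged. In that case res.info() returns an http.client.HTTPMessage, which is a subclass of email.message.Message.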


As you may be aware, most HTTP requests on the internet use either the GET method or the POST method. The next code snippet shows how to make GET and POST requests.

Code: Python

import urllib2
import urllib

## that's the GET request
res = urllib2.urlopen('http://www.go4expert.com?printable=yes')

print res.read()

## post requests contain data, so here's how to make a POST request
params = {}

## set post parameters
params['article_id'] = 134
params['text_str'] = 'my_test_program'

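## URL-encode the parameters (application/x-www-form-urlencoded format)
data = urllib.urlencode(params)

## passing data as the second argument makes urllib2 send a POST request
req = urllib2.Request('http://www.go4expert.com/post', data)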

## make request
res = urllib2.urlopen(req)

## print output
print res.read()
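
The difference between the two requests comes down to where the urlencoded parameters go. For GET they go in the query string, and for POST they go in the body. The sketch below shows this with Python 3's urllib.parse.urlencode and urllib.request.Request, the successors of urllib.urlencode and urllib2.Request. It only builds the requests and does not send them; the URLs are reused from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: the parameters travel in the query string of the URL itself.
query = urlencode({'printable': 'yes'})
get_req = Request('http://www.go4expert.com/?' + query)

# POST: the same urlencode() output goes in the request body instead.
# Python 3 requires the body to be bytes, hence .encode().
params = {'article_id': 134, 'text_str': 'my_test_program'}
data = urlencode(params).encode('ascii')
post_req = Request('http://www.go4expert.com/post', data)

# Request chooses the method from whether a body is present; nothing is sent here.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```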


You might also need to access URLs that are protected by a username and password, i.e. HTTP Basic Authentication. The next code snippet shows how to supply the credentials.

Code: Python

import urllib2

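## create the auth handler object
basic_auth = urllib2.HTTPBasicAuthHandler()
## arguments: realm, URI (host:port), username, password
## the realm must match the one the server announces in its 401 response
basic_auth.add_password('Protected', 'www.go4expert.com:80', 'admin', 'Wow#123')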

## create opener using auth handler
opener = urllib2.build_opener(basic_auth)

urllib2.install_opener(opener)

## make request (the handler answers the server's 401 challenge)
res = urllib2.urlopen('http://www.go4expert.com/protected')

## print output
print res.read()
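
If you do not know the exact realm name the server uses, a password manager that accepts None as the realm is one option. The sketch below assumes Python 3's urllib.request; the same class also exists in urllib2 as HTTPPasswordMgrWithDefaultRealm. It registers the example credentials for any realm under the given URL prefix. It then checks offline the lookup that the handler performs when challenged. The actual opener.open call is left commented out because it needs a real protected server.

```python
from urllib.request import (HTTPBasicAuthHandler,
                            HTTPPasswordMgrWithDefaultRealm, build_opener)

# None as the realm: use these credentials whatever realm the server announces
# for URLs under this prefix. Credentials are the example values from above.
password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.go4expert.com/', 'admin', 'Wow#123')

auth_handler = HTTPBasicAuthHandler(password_mgr)
opener = build_opener(auth_handler)
# opener.open('http://www.go4expert.com/protected') would now answer a 401
# challenge with these credentials (not executed here: needs a real server).

# The lookup the handler performs when challenged, checked offline:
print(password_mgr.find_user_password('Protected', 'http://www.go4expert.com/protected'))
print(password_mgr.find_user_password('Protected', 'http://example.com/'))
```

URLs outside the registered prefix get (None, None), so the credentials are not sent to other hosts.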


Error Handling



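The code snippet below shows how to handle the exceptions raised by urllib2. HTTPError means the server answered with an error status such as 404. It is a subclass of URLError, which covers failures such as the server being unreachable, so HTTPError has to be caught first.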

Code: Python

import urllib2

req = urllib2.Request('http://www.go4expert.com/not_found')

try:
    res = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print 'The server returned an error.'
    print 'Error code - ', e.code
except urllib2.URLError, e:
    print 'Could not reach server.'
    print 'Reason - ', e.reason
else:
    print 'Looks good'
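
The same pattern in Python 3 uses urllib.error.HTTPError and urllib.error.URLError. To exercise every branch without a network, the sketch below relies on three assumptions. The HTTP error comes from a hypothetical fake_404 stand-in that raises HTTPError directly. The URL error comes from an unknown URL scheme, which urlopen rejects locally. The success case uses a data: URL. check_url() is likewise a hypothetical helper.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, opener=urlopen):
    """Return a status string for url (hypothetical helper mirroring the urllib2 example).

    HTTPError is a subclass of URLError, so it has to be caught first.
    """
    try:
        with opener(url) as res:
            res.read()
    except HTTPError as e:
        return 'The server returned an error. Error code - %d' % e.code
    except URLError as e:
        return 'Could not reach server. Reason - %s' % e.reason
    else:
        return 'Looks good'


def fake_404(url):
    # Hypothetical stand-in for a server answering 404, so the example runs offline.
    raise HTTPError(url, 404, 'Not Found', None, None)


print(check_url('http://www.go4expert.com/not_found', opener=fake_404))
print(check_url('foo://no-such-scheme'))  # unknown scheme -> URLError, raised locally
print(check_url('data:text/plain,ok'))
```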


