Access Remote URLs in Python With urllib2

Discussion in 'Python' started by pradeep, Apr 17, 2013.

    Python's urllib2 library provides functions that let programmers access remote URLs, and it takes care of details such as HTTP Basic Authentication, cookies, and redirects. It is Python's rough equivalent of Perl's LWP or the XMLHTTP objects used from ASP.

    The library lets you add HTTP headers to requests, read response data and headers, and handle errors. Although urllib2 is not limited to HTTP, this article covers only HTTP. I'll explain and demonstrate the usage of urllib2 with a few examples, assuming that the reader has a basic understanding of URLs and, put simply, of how the web works.

    Using urllib2 To Fetch Remote URLs



    The snippet below fetches a URL and prints the data received. There is nothing fancy about it; this is the simplest example:

    Code:
    import urllib2
    
    res = urllib2.urlopen('http://www.go4expert.com')
    
    print res.read()
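
    The introduction mentions adding HTTP headers to requests and reading response headers. A minimal sketch of both follows. The header values and the fetch() helper are assumptions made for illustration, and the import fallback is only there so the same code also runs where urllib2 has been renamed to urllib.request (Python 3).

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # Python 3 keeps the same names here

# Hypothetical header values, chosen only for illustration.
headers = {
    'User-Agent': 'my_test_program/1.0',
    'Accept': 'text/html',
}

# Headers can be passed when the Request is created...
req = urllib2.Request('http://www.go4expert.com/', headers=headers)
# ...or added to it afterwards.
req.add_header('Accept-Language', 'en')


def fetch(request):
    """Send the request; return (status code, Content-Type header, body)."""
    res = urllib2.urlopen(request)
    try:
        return res.getcode(), res.info().get('Content-Type'), res.read()
    finally:
        res.close()
```

    Calling fetch(req) performs the actual network request. Note that Request stores header names in capitalized form (for example 'User-agent'), which only matters when reading them back with get_header().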
    
    As you may be aware, most HTTP requests on the internet use either the GET method or the POST method. The next snippet shows how to make both kinds of request.

    Code:
    import urllib2
    import urllib
    
    ## a GET request: the parameters travel in the URL's query string
    res = urllib2.urlopen('http://www.go4expert.com/?printable=yes')
    
    print res.read()
    
    ## post requests contain data, so here's how to make a POST request
    params = {}
    
    ## set post parameters
    params['article_id'] = 134
    params['text_str'] = 'my_test_program'
    
    ## encode the data to percent format
    data = urllib.urlencode(params)
    
    req = urllib2.Request('http://www.go4expert.com/post',data)
    
    ## make request
    res = urllib2.urlopen(req)
    
    ## print output
    print res.read()
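
    To see what urlencode produces and how urllib2 decides between GET and POST, the sketch below builds both kinds of request without sending them. The parameter values are illustrative, and the import fallback is an assumption added so the code also runs on Python 3, where the body must be bytes.

```python
try:
    import urllib2                       # Python 2
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2     # Python 3
    from urllib.parse import urlencode

# A list of pairs keeps the parameter order predictable.
params = [('article_id', 134), ('text_str', 'my test program')]
query = urlencode(params)   # 'article_id=134&text_str=my+test+program'

# GET: the encoded parameters go into the URL itself.
get_req = urllib2.Request('http://www.go4expert.com/?' + query)

# POST: the same string travels as the request body
# (Python 3 needs it as bytes, hence encode()).
post_req = urllib2.Request('http://www.go4expert.com/post',
                           query.encode('ascii'))

print(get_req.get_method())    # GET
print(post_req.get_method())   # POST
```

    The presence of a data argument is what switches the request from GET to POST; passing either object to urllib2.urlopen() would send it.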
    
    You may also need to access URLs that are protected with a username and password, i.e. HTTP Basic Authentication. The next snippet shows how to supply the credentials; the first argument to add_password() is the realm name announced by the server.

    Code:
    import urllib2
    
    ## create the auth handler object
    basic_auth = urllib2.HTTPBasicAuthHandler()
    basic_auth.add_password('Protected', 'www.go4expert.com:80', 'admin', 'Wow#123')
    
    ## create opener using auth handler
    opener = urllib2.build_opener(basic_auth)
    
    urllib2.install_opener(opener)
    
    ## make request
    res = urllib2.urlopen('http://www.go4expert.com/protected')
    
    ## print output
    print res.read()
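
    If the realm name is not known in advance, one option is a password manager with a default realm. This is a sketch of that variation, not the approach used in the snippet above; the fetch_protected() helper is hypothetical, and the import fallback is only there so the code also runs on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3

# With realm=None the credentials apply to any realm under the given URL.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.go4expert.com/', 'admin', 'Wow#123')

basic_auth = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(basic_auth)


def fetch_protected(url='http://www.go4expert.com/protected'):
    """Fetch a protected page through the opener (hypothetical helper)."""
    # opener.open() uses the handler without installing it globally.
    res = opener.open(url)
    try:
        return res.read()
    finally:
        res.close()
```

    Using opener.open() directly keeps the credentials local to this opener, whereas install_opener() makes every later urllib2.urlopen() call use them.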
    

    Error Handling



    The snippet below shows how to handle exceptions when using urllib2. HTTPError is a subclass of URLError, so it must be caught first.

    Code:
    import urllib2
    
    req = urllib2.Request('http://www.go4expert.com/not_found')
    
    try:
        res = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        print 'The server returned an error.'
        print 'Error code - ', e.code
    except urllib2.URLError as e:
        print 'Could not reach server.'
        print 'Reason - ', e.reason
    else:
        print 'Looks good'
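
    The same handling can be wrapped in a reusable function. The sketch below is one possible arrangement: describe_error() and fetch() are hypothetical helpers, the timeout value is an arbitrary choice, and the exceptions at the end are built by hand so the messages can be seen without making a request. On Python 3 the two exception classes are defined in urllib.error but are also reachable through urllib.request, which the import fallback relies on.

```python
import io

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3


def describe_error(exc):
    """Turn a urllib2 exception into a short message (hypothetical helper)."""
    # HTTPError must be tested first: it is a subclass of URLError.
    if isinstance(exc, urllib2.HTTPError):
        return 'The server returned an error. Error code - %d' % exc.code
    if isinstance(exc, urllib2.URLError):
        return 'Could not reach server. Reason - %s' % exc.reason
    raise exc


def fetch(url, timeout=10):
    """Return (body, None) on success or (None, message) on failure."""
    try:
        res = urllib2.urlopen(url, timeout=timeout)
    except urllib2.URLError as e:   # also catches HTTPError
        return None, describe_error(e)
    try:
        return res.read(), None
    finally:
        res.close()


# Offline demonstration with hand-built exceptions (no request is made).
not_found = urllib2.HTTPError('http://www.go4expert.com/not_found', 404,
                              'Not Found', {}, io.BytesIO(b''))
unreachable = urllib2.URLError('connection refused')
print(describe_error(not_found))
print(describe_error(unreachable))
```

    Passing a timeout to urlopen() keeps a slow or silent server from blocking the program indefinitely; a timed-out connection attempt is typically reported as a URLError.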
    
     