Access Remote URLs in Python With urllib2

Discussion in 'Python' started by pradeep, Apr 17, 2013.

    Python's urllib2 library contains functions which enable programmers to access remote URLs, helping out with operations like HTTP Basic Authentication, cookies, redirects, etc. It's Python's equivalent to Perl's LWP or ASP's XMLHttpRequest.

    The library allows you to add HTTP headers to requests, read response data and headers, handle errors, etc. Although urllib2 is not limited to HTTP, we'll only be covering HTTP in this article. I'll try to explain and demonstrate the usage of urllib2 with a few examples, so I am assuming that the reader has a basic understanding of URLs or, simply put, how the web works.

    Using urllib2 To Fetch Remote URLs

    The code snippet below simply fetches a URL and prints out the received data. Nothing fancy about it, just the simplest example:

    import urllib2
    res = urllib2.urlopen('http://www.go4expert.com')
    print res.read()
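
The introduction mentions adding HTTP headers to requests, so here is a minimal sketch of that, under a few assumptions. The header values (`my-test-program/1.0`, `text/html`) are made up for illustration, and the import fallback is my addition so the snippet also runs on Python 3, where urllib2 was split into `urllib.request` and `urllib.error`. The request is only built and inspected here, not sent.

```python
try:
    import urllib2                       # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2     # Python 3 fallback (assumption, see above)

# Headers can be passed when the Request is created...
req = urllib2.Request('http://www.go4expert.com',
                      headers={'User-Agent': 'my-test-program/1.0'})
# ...or added afterwards, one at a time
req.add_header('Accept', 'text/html')

# Request stores header names capitalized, so 'User-Agent' becomes 'User-agent'
print(req.header_items())
print(req.has_header('User-agent'))

# res = urllib2.urlopen(req)   # uncomment to actually send the request
```

Passing `req` to `urlopen` instead of a plain URL string is what makes the extra headers go out with the request.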

    As you might be aware, most HTTP requests running over the internet can be distinguished as using either the GET method or the POST method. urllib2 sends a POST whenever data is passed along with the request, and a GET otherwise. In the next code snippet we'll see how to make GET & POST requests.

    import urllib2
    import urllib
    ## that's the GET request, parameters go in the query string
    res = urllib2.urlopen('http://www.go4expert.com/?printable=yes')
    print res.read()
    ## post requests contain data, so here's how to make a POST request
    params = {}
    ## set post parameters
    params['article_id'] = 134
    params['text_str'] = 'my_test_program'
    ## encode the data to percent format
    data = urllib.urlencode(params)
    req = urllib2.Request('http://www.go4expert.com/post',data)
    ## make request
    res = urllib2.urlopen(req)
    ## print output
    print res.read()
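
To see the difference between the two request types without touching the network, the following sketch builds one GET and one POST request from the same parameters and inspects them. `build_requests` and `BASE_URL` are hypothetical names of mine, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2                       # Python 2, as used in this article
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2     # Python 3 fallback (assumption)
    from urllib.parse import urlencode

BASE_URL = 'http://www.go4expert.com/post'   # same example URL as above

def build_requests(params):
    """Return a (GET request, POST request) pair for the same parameters."""
    query = urlencode(params)
    # GET: the parameters travel in the URL and there is no request body
    get_req = urllib2.Request(BASE_URL + '?' + query)
    # POST: the parameters travel in the body (must be bytes on Python 3)
    post_req = urllib2.Request(BASE_URL, query.encode('ascii'))
    return get_req, post_req

# A list of pairs keeps the parameter order predictable
get_req, post_req = build_requests([('article_id', 134),
                                    ('text_str', 'my_test_program')])
print(get_req.get_method())
print(get_req.get_full_url())
print(post_req.get_method())
```

Nothing is sent until one of these objects is handed to `urllib2.urlopen`.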

    Now, you might need to access URLs which are protected by a username and password, i.e. HTTP Basic Authentication. In the next code snippet we'll look at how to supply the username and password.

    import urllib2
    ## create the auth handler object
    basic_auth = urllib2.HTTPBasicAuthHandler()
    basic_auth.add_password('Protected', 'www.go4expert.com:80', 'admin', 'Wow#123')
    ## create opener using auth handler
    opener = urllib2.build_opener(basic_auth)
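    ## make request through the opener (the URL is only an example, use the protected page you need)
    res = opener.open('http://www.go4expert.com/')
    ## print output
    print res.read()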
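
HTTPBasicAuthHandler only sends the credentials after the server answers with a 401 challenge for a matching realm. An alternative that is sometimes used, and not part of the approach above, is to build the Authorization header by hand and attach it to the request up front. The sketch below shows that under stated assumptions: `basic_auth_header` is a hypothetical helper of mine, the URL is only an example, and the import fallback is added so it also runs on Python 3.

```python
import base64

try:
    import urllib2                       # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2     # Python 3 fallback (assumption)

def basic_auth_header(username, password):
    """Build the value of a Basic 'Authorization' header."""
    # Basic auth is simply base64("username:password")
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')

req = urllib2.Request('http://www.go4expert.com/')
req.add_header('Authorization', basic_auth_header('admin', 'Wow#123'))
print(req.get_header('Authorization'))

# res = urllib2.urlopen(req)   # uncomment to actually send the request
```

Keep in mind that base64 is an encoding and not encryption, so over plain HTTP the password is readable by anyone who can see the traffic.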

    Error Handling

    The code snippet below shows how to handle exceptions when using urllib2. HTTPError is a subclass of URLError, so it has to be caught first.

    import urllib2
    req = urllib2.Request('http://www.go4expert.com/not_found')
    try:
        res = urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        ## the server was reached, but answered with an error status
        print 'The server returned an error.'
        print 'Error code - ', e.code
    except urllib2.URLError as e:
        ## the server could not be reached at all
        print 'Could not reach server.'
        print 'Reason - ', e.reason
    else:
        print 'Looks good'
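
The order of the two except clauses matters because HTTPError derives from URLError. The sketch below illustrates this offline by constructing the two exception objects by hand instead of making a real request. `describe` is a hypothetical helper of mine, the error details ('Not Found', 'connection refused') are invented sample values, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2                       # Python 2, as used in this article
    HTTPError, URLError = urllib2.HTTPError, urllib2.URLError
except ImportError:
    from urllib.error import HTTPError, URLError   # Python 3 fallback (assumption)

def describe(error):
    """Turn a urllib2 exception into the messages used in the example above."""
    # HTTPError must be tested first because it is a subclass of URLError
    if isinstance(error, HTTPError):
        return 'The server returned an error. Error code - %d' % error.code
    if isinstance(error, URLError):
        return 'Could not reach server. Reason - %s' % error.reason
    return 'Unexpected error - %r' % (error,)

# Hand-made exceptions standing in for what urlopen() would raise
not_found = HTTPError('http://www.go4expert.com/not_found', 404, 'Not Found', None, None)
unreachable = URLError('connection refused')

print(issubclass(HTTPError, URLError))
print(describe(not_found))
print(describe(unreachable))
```

If the URLError check came first, it would also swallow every HTTPError and the status code would never be reported.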