Parallel Processing in Python Scripts

Discussion in 'Python' started by pradeep, May 22, 2012.

    At times a batch job can be completed faster by processing in parallel with fork instead of threads, which need much more careful planning. Fork, on the other hand, is easy to implement, although it might not be as efficient or flexible as threads.

    Forking is an important part of *nix design; by forking, shells let us chain many commands together using pipes. The basic idea of forking is to create a clone of the current process. The newly created process is called a child process, and the two processes can tell themselves apart by checking the return value of the fork() call: it is 0 in the child and the child's process id in the parent.

    In this article we'll see how to implement forking in Python scripts.

    Forking in Python



    Forking starts with the fork() system call, which creates a clone of the running process. The sample code below should help in understanding it better.

    Code:
    #!/usr/bin/python
    
    import os, sys, time
    
    ## the following call clones the current process
    ## in the parent it returns the id of the newly created child process
    ## in the child it returns 0
    pid = os.fork()
    
    if pid == 0:
        print("I am child")
        ## simulate some processing
        time.sleep(5)
        print("Bye from child")
        ## exit when finished
        sys.exit(0)
    else:
        print("I am parent")
        ## now wait for the child to finish and clean up
        os.waitpid(pid, 0)
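
    The parent can also learn how the child finished, because os.waitpid() returns a status word along with the pid. The sketch below is only an illustration: the helper name run_child is made up for this example, and it uses os._exit() in the child so that the child leaves immediately without running any of the parent's cleanup code.

```python
import os

def run_child(exit_code):
    """Fork a child that exits with exit_code; return the code the parent observes.

    run_child is a hypothetical helper written for this illustration.
    """
    pid = os.fork()
    if pid == 0:
        # child: os._exit() ends the process at once, skipping cleanup handlers
        os._exit(exit_code)
    # parent: waitpid() returns (pid, status); status encodes how the child ended
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was killed by a signal instead of exiting normally

if __name__ == "__main__":
    print("child exited with %s" % run_child(3))
```

    A non-zero exit code is the usual way for a child to tell the parent that its part of the batch job failed.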
    

    Moving Ahead



    Once you fully understand the basic example, you can experiment with your own ideas for parallel processing. I have tried an example in which the parent process forks one child process per URL to fetch the URLs in a list.

    Code:
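    import os, sys
    
    try:
        from urllib.request import urlretrieve  ## Python 3
    except ImportError:
        from urllib import urlretrieve  ## Python 2
    
    urls = ['http://docs.python.org/library/time.html','http://www.go4expert.com','http://whatanindianrecipe.com']
    
    children = []
    for uri in urls:
        pid = os.fork()
    
        if pid == 0:
            ## child: fetch one url into a temporary file, then exit
            print("fetching %s" % uri)
            url_data = urlretrieve(uri)
            sys.exit(0)
    
        ## parent: remember the child's pid and move on to the next url
        children.append(pid)
    
    ## wait for all child processes; a single os.wait() reaps only one child
    for pid in children:
        os.waitpid(pid, 0)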
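
    In the example above the children cannot hand anything back to the parent, since each process has its own copy of the data after a fork. One possible way around this is the pipe mentioned at the start of the article. The sketch below is an assumption of how that could look, not part of the original example: fork_with_result is a hypothetical helper, and len stands in for real work such as a download. It is meant for small results only.

```python
import os

def fork_with_result(work, arg):
    """Run work(arg) in a forked child and return its result to the parent as text.

    fork_with_result is a hypothetical helper written for this illustration.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: it only writes, so close the read end first
        os.close(read_fd)
        status = 1
        try:
            os.write(write_fd, str(work(arg)).encode("utf-8"))
            status = 0
        finally:
            # leave at once; the write end is closed when the process exits
            os._exit(status)
    # parent: it only reads, so close the write end or read() never sees EOF
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not stay a zombie
    return b"".join(chunks).decode("utf-8")

if __name__ == "__main__":
    print(fork_with_result(len, "hello"))
```

    Closing the unused end of the pipe in each process matters: if the parent kept the write end open, its read loop would wait forever for data that never comes.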
    
    You can do much bigger things; your imagination is the limit. Enjoy coding in Python.

    References



    http://docs.python.org/library/os.html
     
    Last edited by a moderator: May 22, 2012