Go4Expert


pradeep 28Feb2008 13:03

Use Parallel Processing For Faster Perl Scripts
 

Introduction



We often run scripts such as newsletter mailers and backup jobs that take quite a lot of time, which makes us look for ways to speed them up. One way is to run some operations in parallel, for example sending email to 20 subscribers at once in the newsletter mailer instead of mailing them one by one. Implementing parallel processing can make a big difference in the time taken to run the script.

Here we'll see how to implement parallel processing in Perl. For this purpose we'll use Parallel::ForkManager, a powerful object-oriented CPAN module. You can download it from http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/ForkManager.pm or install it using the CPAN shell.

How To Use Parallel::ForkManager



Since the module is fully object-oriented, we first need to create an instance of Parallel::ForkManager, specifying the maximum number of parallel processes to fork.

Code: Perl

my $pm = Parallel::ForkManager->new(50);


Be careful when choosing the number of parallel processes; you would not want to crash the server. You can also change this number later in your code, like this:

Code: Perl

$pm->set_max_procs($max_processes);


A new parallel process is forked with the start method, and the point at which that process ends is marked with the finish method. This is usually done inside a loop. Let's see an example to get a better idea.

Code: Perl

for (1..100)
{
    $pm->start and next; ## parent forks a child and moves on to the next iteration
    ## your code for parallel processing goes here
    ## do your stuff
    $pm->finish; ## end point of the parallel process
}


You may also need the method wait_all_children, which forces the parent process to wait until all the child processes have finished executing.
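
To tie the three methods together, here is a minimal sketch of a complete script. The run_jobs helper and the "odd jobs fail" rule are made up for illustration. The sketch also relies on two features the text above does not cover: the optional identifier argument to start and the run_on_finish callback, which lets the parent see each child's exit code.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: run each job in its own child, at most $max_procs
# at a time, and return a hash of job name => child exit code.
sub run_jobs {
    my ($max_procs, @jobs) = @_;
    my %exit_code_for;

    my $pm = Parallel::ForkManager->new($max_procs);

    # Called in the parent each time a child has been reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident) = @_;
        $exit_code_for{$ident} = $exit_code;
    });

    for my $job (@jobs) {
        # start() returns the child's PID in the parent (true, so "next")
        # and 0 in the child (false, so the code below runs).
        $pm->start($job) and next;

        # Child: placeholder work. Odd-numbered jobs pretend to fail.
        my $failed = $job % 2;
        $pm->finish($failed);    # the child exits here with this status
    }

    $pm->wait_all_children;      # block until every child has been reaped
    return %exit_code_for;
}

my %result = run_jobs(3, 1 .. 6);
printf "job %d exited with %d\n", $_, $result{$_}
    for sort { $a <=> $b } keys %result;
```

Because the callback runs in the parent, it is a convenient place to count failures or log which items need a retry.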

A Simple Example



The best example, I feel, is sending out newsletters to your subscribers or registered users. Say your site has around 10,000 registered users and you want to send them a newsletter every week. All the email ids are in a database, and the newsletter text is kept in a file. Let's see how to go about writing such a program using parallel forking.

Code: Perl

use strict;
use warnings;
use DBI;
use Parallel::ForkManager;

## connect to database and query the database, we will have a statement handle $stmt

## open the newsletter text file and get the contents in the variable $mail_text

my $pm = Parallel::ForkManager->new(50);
while(my($email_id) = $stmt->fetchrow_array())
{
    $pm->start and next; ## parent forks a child and moves on to the next address

    ## child: pipe one message to sendmail; -t takes the recipient from the To: header
    open(SM,'|/usr/sbin/sendmail -t') or die "Cannot run sendmail: $!";
    print SM "To: $email_id\n";
    print SM "Subject: Newsletter\n\n";
    print SM $mail_text;
    close(SM);
    $pm->finish; ## end point of the child process
}

$pm->wait_all_children; ## wait for the child processes


Unfortunately, we won't be able to use database handles inside the child processes; I tried to do so, but it didn't work. You can read about the limitations of the module here: http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/ForkManager.pm#BUGS_AND_LIMITATIONS
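
One possible way around this limitation, sketched here under some assumptions, is to read every address in the parent before forking, so that the children never touch the database handle. The deliver_all helper and the $deliver callback are hypothetical names introduced for illustration. In the real script the address list would come from the statement handle, and the callback would pipe the message to sendmail.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: deliver to every address with at most $max_procs
# children at once. Returns the number of children that exited with 0.
sub deliver_all {
    my ($addresses, $deliver, $max_procs) = @_;
    my $ok = 0;

    my $pm = Parallel::ForkManager->new($max_procs);

    # Runs in the parent for each reaped child; counts the successes.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code) = @_;
        $ok++ if $exit_code == 0;
    });

    for my $email_id (@$addresses) {
        $pm->start and next;

        # Child: no database access here, only the delivery itself.
        my $sent = eval { $deliver->($email_id); 1 };
        $pm->finish($sent ? 0 : 1);
    }

    $pm->wait_all_children;
    return $ok;
}

# In the real script the list would be fetched in the parent first, e.g.
#   my @addresses = map { $_->[0] } @{ $stmt->fetchall_arrayref };
# and the database handle disconnected before any child is forked.
my @addresses = ('a@example.com', 'b@example.com', 'c@example.com');

# Stand-in for the sendmail pipe: just report what would be sent.
my $sent = deliver_all(\@addresses, sub { print "would mail $_[0]\n" }, 2);
print "$sent of ", scalar(@addresses), " deliveries succeeded\n";
```

If a child really does need the database, the usual advice is to open a fresh connection inside the child rather than reuse the parent's handle.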

amlan_das 22Mar2008 13:32

Re: Use Parallel Processing For Faster Perl Scripts
 
Code: PERL

## Changed Above Code A Little
use Parallel::ForkManager;
use DBI;
use strict;

## connect to database and query the database, we will have a statement handle $stmt
## open the newsletter text file and get the contents in the variable $mail_text

my $pm = Parallel::ForkManager->new(50);
while(my($email_id) = $stmt->fetchrow_array())
{
  $pm->start and next;
  open(SM,'|/usr/sbin/sendmail -t');   # -t takes the recipient from the To: header
  print SM "To: $email_id\n";          # must match the loop variable $email_id
  print SM "Subject: Newsletter\n\n";
  print SM "$mail_text";
  close(SM);
  $pm->finish;
}
$pm->wait_all_children; ## wait for the child processes


