I have to write a Perl script for a project I am working on and I am kind of stuck, so I would appreciate any feedback. I am working with an epistolary novel (a novel made up of letters written by different characters), and I am supposed to separate the letters by character, so that I end up with one file per character containing all of that character's letters.

To start, I want to extract all of the letters written by a character named Anna Howe. I noticed that her letters begin with Jan., Febr., or March followed by a date, and end with a line containing <p class="left"> and ANNA HOWE. Usually there is something between the <p class="left"> tag and her name, such as "yours truly" or "your sister".

So I am trying to match the beginning and the ending of each letter with regular expressions, and then have the script write everything between those two patterns to a file named "Anna". This is the script that I wrote, but it doesn't work:
Code:
#!/usr/bin/perl -w
#Attempting to extract Anna's letters from Clarissa Text.

$input = "Clarissa_CleanText_Vol1";
$output = "Anna";

open(INPUT, '<', $input) || die("couldn't open $input: $!");
open(OUTPUT, '>', $output) || die("couldn't open $output: $!");


@curr_letter_lines = (); # lines of the letter currently being collected

while (defined($inline = <INPUT>)) {
    if (begin_letter($inline) eq "yes") {
        # a date line starts a new letter and discards anything collected so far
        @curr_letter_lines = ($inline);
    } elsif (end_letter($inline) eq "yes") {
        # signature line: write out what was collected (the signature line itself is not added)
        print_curr_letter_lines(\@curr_letter_lines, $output);
        @curr_letter_lines = ();
    } else {
        push(@curr_letter_lines, $inline);
    }
}

close(INPUT);
close(OUTPUT);

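sub print_curr_letter_lines {
    # first argument: reference to the array of lines to write;
    # $output is not used here, the lines go to the already-open OUTPUT filehandle
    my ($ref_to_curr_letter_lines_arr, $output) = @_;
    foreach my $line (@{$ref_to_curr_letter_lines_arr}) {
        print OUTPUT $line;
    }
}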


sub begin_letter {

    my ($all_letters) = @_;

    my ($want_begin_anna) = "no";

    if ($all_letters =~ /^Jan\.|Febr\.|March \d{2}\./) {

        $want_begin_anna = "yes";
    }
    return $want_begin_anna;
}


sub end_letter {

    my ($all_letters) = @_;

    my ($want_end_anna)    = "no";

    if ($all_letters =~ /<p class="left">.*?ANNA HOWE\./) {

        $want_end_anna = "yes";

    }
    return $want_end_anna;
}
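
One thing worth checking in the script above is the begin pattern. In a regex, | binds more loosely than anything else, so in /^Jan\.|Febr\.|March \d{2}\./ the ^ applies only to Jan\. and the \d{2}\. applies only to March. Any line that contains Febr. anywhere would therefore count as the start of a letter. The following is a minimal sketch of the same idea with the months grouped, not a definitive fix. The helper name extract_letters, the one-or-two-digit day, the decision not to require a period after the name, and the sample lines are all assumptions made for illustration; none of them are taken from the actual Clarissa text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group the month names so that ^ and the day number apply to every alternative.
# Assumption: the day has one or two digits.
my $begin_re = qr/^(?:Jan\.|Febr\.|March)\s+\d{1,2}\b/;

# Assumption: the signature line contains the tag and the name in capitals.
my $end_re = qr/<p class="left">.*?ANNA HOWE/;

# extract_letters is a hypothetical helper. It takes a list of lines and
# returns one string per letter that starts at a date line and ends at a
# line matching the signature pattern (the signature line is included).
sub extract_letters {
    my @lines = @_;
    my (@letters, @current);
    my $in_letter = 0;

    for my $line (@lines) {
        if ($line =~ $begin_re) {
            # A new date line starts a fresh candidate letter; whatever was
            # collected since the previous date line was not signed by Anna.
            @current   = ($line);
            $in_letter = 1;
        }
        elsif ($in_letter) {
            push @current, $line;
            if ($line =~ $end_re) {
                push @letters, join('', @current);
                @current   = ();
                $in_letter = 0;
            }
        }
        # Lines seen while no letter is open are skipped.
    }
    return @letters;
}

# Made-up sample lines, only to exercise the helper.
my @sample = (
    "Jan. 10.\n",
    "A letter from someone else.\n",
    "<p class=\"left\">CL. HARLOWE.\n",
    "Febr. 27.\n",
    "I am extremely concerned, my dearest friend.\n",
    "<p class=\"left\">Your ever affectionate ANNA HOWE.\n",
    "Text between letters.\n",
);

my @letters = extract_letters(@sample);
print scalar(@letters), " letter(s) found\n";
print $_ for @letters;
```

In the real script the lines would come from the INPUT filehandle instead of @sample, and each returned letter could be printed to OUTPUT. Note that a date line inside the body of a letter would restart the collection, so the begin pattern may need tightening once it is run against the full text.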

Last edited by shabbir; 6Jul2012 at 08:29.. Reason: Code blocks