Go4Expert


sheen 8Apr2012 03:32

extract columns by matching ids in two files
 
Code: Cpp
Hello,

I want to extract rows from file2 into file3 by matching the ids in file1 against the first column of file2. The extracted rows should be in the same order as the ids in file1.

For example:

file1.txt
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1



The output, file3.txt, should look like this:

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9

Please suggest how to do this.

Thanks,
/S

dearvivekkumar 12Apr2012 11:05

Re: extract columns by matching ids in two files
 
Code:

/*
file1.txt
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1



output file3.txt should be printed in this way

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9
*/

#include <fstream>
#include <string>
#include <vector>
#include <map>

void ExtractCol()
{
        do
        {
                /*
                * Open file1 and collect its data line by line in a vector of strings.
                */
                std::fstream file;
                file.open("file1.txt", std::ios::in);
                if(!file)
                {
                        break;
                }
                std::vector<std::string> file1Data;
                std::string line("");
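                // Loop on getline itself: testing eof() first would push a
                // spurious empty entry after the last id.
                while(std::getline(file, line))
                {
                        if(!line.empty())
                                file1Data.push_back(line);
                }
                file.close();
                file.clear(); // reset the eof/fail flags before reusing the stream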

                /*
                * Open file2 and collect its data in a string-to-string map.
                * The first word of each line in file2 acts as the key
                * and the rest of the line is stored as its value.
                */
                file.open("file2.txt", std::ios::in);
                if(!file)
                        break;

                typedef std::pair<std::string, std::string> strstrpair;
                typedef std::map<std::string, std::string> strstrmap;
                strstrmap file2Data;
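                while(std::getline(file, line))
                {
                        if(line.empty())
                                continue;
                        size_t found = line.find_first_of(" ");
                        // A line without a space is an id with no columns after it.
                        std::string rest = (found == std::string::npos) ? std::string() : line.substr(found + 1);
                        file2Data.insert(strstrpair(line.substr(0, found), rest));
                }
                file.close();
                file.clear(); // reset the eof/fail flags before reusing the stream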

                /*
                * Prepare the data for file3.
                * Walk the ids in file1 order and, for each id that is also
                * the first word of a line in file2, emit that line.
                */
                std::string file3Data("");
                for(std::vector<std::string>::iterator it = file1Data.begin(); it != file1Data.end(); ++it)
                {
                        strstrmap::iterator it2;
                        it2 = file2Data.find(*it);
                        if(it2 != file2Data.end())
                        {
                                file3Data.append(*it);
                                file3Data.append(" ");
                                file3Data.append(it2->second);
                                file3Data.append("\n");
                        }
                }

                /*
                * finally create file 3.
                */
                file.open("file3.txt", std::ios::out|std::ios::trunc);
                if(!file)
                        break;
                file.write(file3Data.c_str(), file3Data.length());
                file.close();
        }while(false);
}

int main()
{
        ExtractCol();
        return 0;
}


ccharley 30Apr2012 08:32

Re: extract columns by matching ids in two files
 
Hello Sheen,

Perl could solve this problem with code like that below. Notice the $trie variable (pronounced 'try'). Starting with Perl 5.10, I believe, the regex engine compiles an alternation of literal strings into a trie. Matching cost is then roughly constant, O(1), with respect to the number of alternatives, so it scales well.

My code builds a trie of the alternating values in file1. Then it reads file2, and if the beginning of any line matches the trie, it prints that line from file2. Note that the lines come out in file2's order, which happens to match file1's order in your example. If you want the result in a third file, simply open a file for writing and print there. My example just prints to STDOUT (the console window).

Chris

Code:

#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $file1 = <<EOF;
1823
607
R2A9
802
771
EOF

my $file2 = <<EOF;
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1
EOF

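my $trie;
{
        local $/;    # slurp mode: read all of file1 in one go
        open my $fh, "<", \$file1 or die $!;
        # quotemeta keeps any regex metacharacters in an id literal
        $trie = join "|", map { quotemeta } split /\n/, <$fh>;
        close $fh or die $!;
}

open my $fh, "<", \$file2 or die $!;
# Require whitespace or end of string after the id, so id "18" cannot match "1823 ..."
/^(?:$trie)(?:\s|\z)/ && print  while <$fh>;
close $fh or die $!;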

The output is:

Code:

C:\Old_Data\perlp>perl t.pl
1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9


ccharley 30Apr2012 20:20

Re: extract columns by matching ids in two files
 
Oh, just saw that you were looking for a Cpp solution.