extract columns by matching ids in two files

sheen's Avatar, Join Date: Apr 2012
Newbie Member
Hello,

I want to extract rows from file2 into file3 by matching the ids in file1 against the first column of file2. The extracted rows should appear in the same order as the ids in file1.

For example:

file1.txt
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1



The output, file3.txt, should look like this:

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9

Please suggest something.

Thanks,
/S
dearvivekkumar's Avatar, Join Date: Feb 2012
Go4Expert Member
Code:
/*
file1.txt 
1823
607
R2A9
802
771

file2.txt
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1



output file3.txt should be printed in this way

1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9
*/

#include <fstream>
#include <string>
#include <vector>
#include <map>

void ExtractCol()
{
	do
	{
		/*
		 * Open file1 and collect its ids, one per line, in a vector of strings.
		 */
		std::fstream file;
		file.open("file1.txt", std::ios::in);
		if(!file)
		{
			break;
		}
		std::vector<std::string> file1Data;
		std::string line;
		// Loop on getline() itself: testing eof() before reading would
		// push a spurious empty entry after the last line.
		while(std::getline(file, line))
		{
			if(!line.empty())
				file1Data.push_back(line);
		}
		file.close();

		/*
		 * Open file2 and collect its data in a string-to-string map.
		 * The first word of each line in file2 acts as the key
		 * and the rest of the line is stored as its value.
		 */
		file.clear(); // reset eof/fail flags from the previous read (needed before C++11)
		file.open("file2.txt", std::ios::in);
		if(!file)
			break;

		typedef std::pair<std::string, std::string> strstrpair;
		typedef std::map<std::string, std::string> strstrmap;
		strstrmap file2Data;
		while(std::getline(file, line))
		{
			if(line.empty())
				continue;
			// A line without a space is an id with no columns: find() returns
			// npos, so the key is the whole line and the value stays empty.
			size_t found = line.find(' ');
			std::string rest = (found == std::string::npos) ? std::string() : line.substr(found + 1);
			file2Data.insert(strstrpair(line.substr(0, found), rest));
		}
		file.close();

		/*
		 * Prepare the data for file3.
		 * For each id in file1, in file1 order, copy the line of file2
		 * whose first word equals that id.
		 */
		std::string file3Data("");
		for(std::vector<std::string>::iterator it = file1Data.begin(); it != file1Data.end(); ++it)
		{
			strstrmap::iterator it2;
			it2 = file2Data.find(*it);
			if(it2 != file2Data.end())
			{
				file3Data.append(*it);
				file3Data.append(" ");
				file3Data.append(it2->second);
				file3Data.append("\n");
			}
		}

		/*
		 * Finally, create file3.
		 */
		file.clear(); // reset eof/fail flags from the previous read (needed before C++11)
		file.open("file3.txt", std::ios::out|std::ios::trunc);
		if(!file)
			break;
		file.write(file3Data.c_str(), file3Data.length());
		file.close();
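	}while(false);
}

// Entry point so the listing builds as a standalone program.
int main()
{
	ExtractCol();
	return 0;
}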
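
The same idea can also be written against generic streams, which makes it easy to try without touching the disk. This is only a sketch: the function name extractColumns and its parameters are made up for illustration, and it assumes the id is the first whitespace-separated field of each line. It indexes the second input by id, then walks the first input in order and emits the matching rows unchanged.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of the listing above): copies to `out` every
// row of `table` whose first field appears in `ids`, in the order of `ids`.
void extractColumns(std::istream& ids, std::istream& table, std::ostream& out)
{
    // Index the table rows by their first whitespace-separated field.
    std::unordered_map<std::string, std::string> rows;
    std::string line;
    while (std::getline(table, line))
    {
        std::istringstream fields(line);
        std::string key;
        if (fields >> key)            // false for blank lines, which are skipped
            rows.emplace(key, line);  // if a key repeats, the first row is kept
    }

    // Walk the ids in their own order and emit the matching rows.
    while (std::getline(ids, line))
    {
        std::istringstream fields(line);
        std::string id;
        if (!(fields >> id))
            continue;                 // skip blank lines in the id list
        auto hit = rows.find(id);
        if (hit != rows.end())
            out << hit->second << '\n';
    }
}
```

To use it on the files from the question, pass a std::ifstream opened on file1.txt, another opened on file2.txt, and a std::ofstream opened on file3.txt. Ids that have no row in the second input are silently dropped, which matches the behaviour of the listing above.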
ccharley's Avatar, Join Date: Apr 2012
Newbie Member
Hello Sheen,

Perl could solve this problem with code like that below. Notice the $trie variable (pronounced 'try'). Starting with Perl 5.10, I believe, the regex engine compiles an alternation of literal strings into a trie. Matching then takes time that depends on the length of the id rather than on the number of alternatives, so it scales well.

My code joins the values in file1 into one alternation, which Perl turns into the trie. Then it reads file2 and, if the beginning of a line matches the trie, prints that line. The lines come out in file2 order, which happens to be the same as file1 order for this sample. If you want the result in a third file, simply open a file for writing and print there. My example just prints to STDOUT (the console window).

Chris

Code:
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my $file1 = <<EOF;
1823
607
R2A9
802
771
EOF

my $file2 = <<EOF;
1823 1 2 4
22 11 4 29
607 12 3 3
R2A9 34 4 9
D33 2 1 0
802 30 8 1
771 3 0 9
3RE 6 3 1
EOF

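my $trie;
{
	local $/;	# slurp mode: read the whole of file1 in one go
	open my $fh, "<", \$file1 or die $!;
	# Escape each id so regex metacharacters are matched literally.
	$trie = join "|", map { quotemeta($_) } split /\n/, <$fh>;
	close $fh or die $!;
}

open my $fh, "<", \$file2 or die $!;
# (?!\S) requires the id to end at whitespace, so 1823 does not match 18234.
/^(?:$trie)(?!\S)/ && print  while <$fh>;
close $fh or die $!;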
The output is:

Code:
C:\Old_Data\perlp>perl t.pl
1823 1 2 4
607 12 3 3
R2A9 34 4 9
802 30 8 1
771 3 0 9
ccharley's Avatar, Join Date: Apr 2012
Newbie Member
Oh, just saw that you were looking for a Cpp solution.
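
For what it's worth, the Perl idea translates to C++ fairly directly. The following is only a sketch under my own assumptions: TrieNode, addId, containsId and filterByIds are names invented here, the trie is hand-rolled rather than taken from any library, and the id is taken to be the first whitespace-separated field of each line. Like the Perl script, it prints matches in the order of the second input.

```cpp
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal trie over id strings (all names are illustrative).
struct TrieNode
{
    std::map<char, std::unique_ptr<TrieNode>> next;
    bool isId = false; // true if the path from the root spells a complete id
};

void addId(TrieNode& root, const std::string& id)
{
    TrieNode* node = &root;
    for (char c : id)
    {
        std::unique_ptr<TrieNode>& child = node->next[c];
        if (!child)
            child.reset(new TrieNode);
        node = child.get();
    }
    node->isId = true;
}

// Lookup visits one node per character, so its cost depends on the length
// of the id rather than on how many ids are stored.
bool containsId(const TrieNode& root, const std::string& id)
{
    const TrieNode* node = &root;
    for (char c : id)
    {
        auto it = node->next.find(c);
        if (it == node->next.end())
            return false;
        node = it->second.get();
    }
    return node->isId;
}

// Writes every line of `table` whose first field is a known id.
// Output follows the order of `table`, not the order of `ids`.
void filterByIds(std::istream& ids, std::istream& table, std::ostream& out)
{
    TrieNode root;
    std::string line;
    while (std::getline(ids, line))
        if (!line.empty())
            addId(root, line);

    while (std::getline(table, line))
    {
        std::istringstream fields(line);
        std::string firstField;
        if (fields >> firstField && containsId(root, firstField))
            out << line << '\n';
    }
}
```

Because the whole first field is looked up, an id such as 1823 does not match a line that starts with 18234. If the rows must follow file1 order instead, a map keyed by id is the simpler tool.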