1. We have moved from vBulletin to XenForo and you are viewing the site in the middle of the move. Though the functional aspect of everything is working fine, we are still working on other changes including the new design on Xenforo.
    Dismiss Notice

Intersection of two strings

Discussion in 'C++' started by CrazyGal, Jul 22, 2009.

  1. CrazyGal

    CrazyGal New Member

    Hi,

    Could someone tell me if there's a better way of doing this? I was asked this question in an interview and I wrote a piece of code of O(n2).

    Code:
    // This function returns a new string of the common alphabets found in a and b
    char* intersection(char*a, char* b)
    {
       static char temp[20];
       int i = 0;
       if ( (a==NULL) || (b==NULL))
            exit(0);
       else
        {
             while ( a != NULL)
             {
                   if(strchr(b, *a))
                   {
                        temp[i] = *a;
                        i++;
                        a++;
                   }
             }  
        }
    
        return temp;
    }
    
    Thanks,
    Tina
     
    Last edited by a moderator: Jul 22, 2009
  2. CrazyGal

    CrazyGal New Member

    Resending the query with code tags.

    Intersection of two strings
    Hi,

    Could someone tell me if there's a better way of doing this? I was asked this question in an interview and I wrote a piece of code of O(n2).

    // This function returns a new string of the common alphabets found in a and b


    Thanks,
    Tina
     
  3. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    I think this will work too :
    Code:
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <set>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        string a = "abcdk57";
        string b = "abcdek5";
        string c;
    
        set_intersection(
            a.begin(), a.end(),
            b.begin(), b.end(),
            back_inserter(c));
    
        cout << c << '\n';
    }
    
     
  4. CrazyGal

    CrazyGal New Member

    Yea, true. But could there be some way which doesn't use the standard calls?

    Thanks,
    Tina
     
  5. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Hi Tina,

    So, you want it without using STL calls.
    Now, I have quite a few things to discuss :

    (1) What exactly do you want your func to return ?? Please answer the example :
    Let StringA = "THIS IS COOL"
    And String B = "THIS IS NOT COOL"
    What do you want your func to return ??

    I think if you want to return "Intersection", you should return : "CHILOST" (only the common characters). But, you code (though has a small error) returns "THIS IS COOL".

    (2) The second thing I wanna say is, the error in your code :

    Look carefully :

    Code:
    char* intersection(char*a, char* b)
    {
        static char temp[20];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            exit(0);
        else
        {
            while ( a != NULL)
            {
                if(strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                    [B][COLOR="Red"]a++; // here is the error[/COLOR][/B]
                }
            }
        }
        return temp;
    }
    You see the error ? When a character in a is not found in b, the program goes into an infinite loop !! So, the correct code will be :


    Code:
    char* intersection(char*a, char* b)
    {
        static char temp[20];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            exit(0);
        else
        {
            while ( a != NULL)
            {
                if(strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                }
                a++;
            }
        }
        return temp;
    }
    (3) The third thing, I think I have a better code .. (probably O(m+n))

    Code:
    #include <cstring>
    #include <stdio.h>
    #include <ctime>
    
    char* Intersection_1(char* a, char* b)
    {
        static char temp[300];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            return NULL;
        else
        {
            while ( *a != '\0')
            {
                if (strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                }
                a++;
            }
        }
    
        return temp;
    }
    
    char* Intersection_2(char* StringA, char* StringB)
    {
        static char C[300];
        char* pC = C;
        static char Table[2][200];
        int EndOfA = strchr(StringA, '\0')-StringA;
        int EndOfB = strchr(StringB, '\0')-StringB;
    
        for (int i=0; i < EndOfA; ++i)   (*(*Table+StringA[i]))++;
        for (int i=0; i < EndOfB; ++i)   (*(*(Table+1)+StringB[i]))++;
    
        for (int i=32; i < 127; ++i)
        {
            if ( (*(*Table+i)) && (*(*(Table+1)+i)) )
                *(pC++) = i;
        }
        *pC = '\0';
    
        return C;
    }
    
    
    int main()
    {
        clock_t t1, t2;
    
        char A[200] = "THIS PROGRAM IS WRITTEN BY SASWAT PADHI. THIS IS THE FIRST TEST STRING. *&%$*%#*$%#(%^)(UFSUDEIRTGITuffifiiiugiu length = 125";
        char B[200] = "SGSYEDBSDJDISYD#**EFHHWSKDLJD_))@#**** RANDOM DATA IN THIS STRING **** YEOYRHWEJQPWEI[QIWEQJWDBKDFB,SDBLHSYWERYWOYR8423RH LENGTH = 134";
        char* C;
        float diff;
    
        t1 = clock();
        for (int i = 0; i < 100000; ++i)       C = Intersection_1(A,B);
        t2 = clock();
        diff = ((float)t2 - (float)t1) / 1000.0F;
        printf("\nC = \"%s\"\nAlgo 1 --> Running Time for 100000 iterations : %lf sec\n", C, diff);
    
    
        t1 = clock();
        for (int i = 0; i < 100000; ++i)       C = Intersection_2(A,B);
        t2 = clock();
        diff = ((float)t2 - (float)t1) / 1000.0F;
        printf("\n\nC = \"%s\"\nAlgo 2 --> Running Time for 100000 iterations : %lf sec\n", C, diff);
    
        return 0;
    }
    The program compares two LONG strings 1 lakh (10^5) times. The output :

    Code:
    C = "THIS PROGRAM IS WRITTEN BY SASWAT PADHI THIS IS THE FIRST TEST STRING **#*#
    )FSDEIRTGIT  = 12"
    Algo 1 ([B]Your Algo[/B])--> Running Time for 100000 iterations : 1.218000 sec
    
    
    C = ")*12=ABDFGHIMNOPRSTWY"
    Algo 2 ([B]My Algo[/B])--> Running Time for 100000 iterations : 0.344000 sec
     
    Last edited: Jul 24, 2009
    shabbir likes this.
  6. CrazyGal

    CrazyGal New Member

    Hi,

    Thanks for your response. Its a good one. I see you're trying to build a histogram. Yes, I need to print only the common characters. So, we need to have a huge array capable of holding all the ASCII characters. Doesn't it take more memory? Or it's something we need to sacrifice to getter better performance?

    Please let me know.

    Thanks for writing to me.

    Tina
     
  7. xpi0t0s

    xpi0t0s Mentor

    It's not that huge, ASCII only constitutes 128 characters, or 256 if you count all 8-bit possibilities of extended ASCII.

    There is often a space/performance tradeoff in optimising code. You can do stuff really quick and take up loads of memory, or you can do stuff in virtually no memory and take ages, and there are often points in between.

    So you can reduce the histogram from Saswat's 400 chars to just 32 bytes as follows:
    set all 32 bytes to zero
    scan the first string and for each character present, take the character code and split it in two: the lower 3 bits and the upper 5 bits.
    The upper 5 bits divided by 8 gives you the index into the 32-byte array.
    The lower 3 bits identify one of the 8 bits in that byte.
    Set the relevant bit (1 << that_3_bit_number) to 1.

    Scan the second string and again split the character code into two parts (5 and 3 bits).
    Check the relevant bit and if it's 1 then you have a duplicate character - output it.

    This prints duplicates, of course, so if string 1 is THIS IS COOL and string 2 is XPIOTOS then the program output will be IOTOS - the duplicate O's may be a problem. If not then this is your solution.

    If it is then you can have a second array of 32 bytes, and when you parse the 2nd string update the 2nd array in the same way as when you parsed the 1st string.
    To pick out updates scan both sets of 256 bits and where there is a 1 in both, you have a common character. Output then would be IOST (in some order).

    Or you could just have a single array of 256 ints, initially zero, scan the first string, set relevant flag to 1, scan the 2nd string and shift the relevant flag left by 1. Then scan the ints and where a flag is 2 you have a character that was in both strings.
     
  8. mayjune

    mayjune New Member

    @Crazygal
    how's it n^2? As theres only one while loop, no nested loops nothing... how can it be n^2 then?
    Please clarify....

    @saswat
    why is the iteration from 37 to 130? I didn't get it.... is it just for this particular string? or otherwise..?
     
  9. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    2:47 AM !?? Don't you sleep mayjune ?? :p

    CrazyGal is right, it's O(n^2), but it would be better to say it's O(n*m).
    You see the use of strchr(), which searches for a character inside string b. It performs 'm' operations : matches the character with each char of string b.
    As strchr() is executed in each iteration of the "while" loop, total number of operations is n*m.
    Hence, the time complexity is O(n*m). :)

    Oops, the loop should be from 32 to 127. (Edited my post)
    The idea is to scan the range of "printable" ASCII chars only ;)
     
  10. CrazyGal

    CrazyGal New Member

    Hi,

    Thanks for your response xpi0t0s. It seems little complicated for me to get what you're trying to say. I'll try to code your algorithm and figure out if I could understand the logic.

    Thanks again.

    Tina
     
  11. xpi0t0s

    xpi0t0s Mentor

    No, because O syntax doesn't work that way. O(n^2) is correct, because if all lengths are set to n and n increases, the time for the algorithm increases along the line n*n. So if n increases from 4 to 5 to 6, then the time increases according to a y=x^2 relationship from 16 to 25 to 36.

    The "histogram" algorithm is O(n) even though there are 2 parses and you might think it's O(2n), but again that's misunderstanding what the O syntax is designed for. As n increases, the time for the algorithm only increases linearly, e.g. if n increases by 5% then the overall time increases by 5%.

    Correct, and that's why it's O(n^2).

    See http://en.wikipedia.org/wiki/Big_O_notation for more information than you ever wanted to know about Big O notation.
     
  12. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    I don't agree.
    O(n^2) is applicable when there is one parameter(n) that affects the time complexity of your algo. But, when there are more than one variable say m,n,p .. you can't say O(mnp) = O(n^3). ( See the complexity of (m x n)*(n x p) matrix multiplication : Complexity of matrix algebra(Wikipedia) )

    Tina's algorithm has two parameters : m, n which represent lengths of A and B respectively. So, the time complexity should be O(mn). And, I know O(mn) algos exist. You can just google "O(mn) algorithm".

    O(mn) = O(n) if m is constant.
    O(mn) = O(n^2) if m == n and m is not an independent variable

    I mentioned in my post, that it's O(m+n).
    Now, the explanation above justifies that.

    I saw the link and found something related at : Multiple Variables.
    Please go through.

    PS: Please go through the attached PDF as well. It deals exclusively with multiple variables.
     

    Attached Files:

  13. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Still waiting for your reply, xpi0t0s.
     
  14. xpi0t0s

    xpi0t0s Mentor

    Sorry, didn't know one was expected.
    Yep, I was wrong, you were right, I bow to your superior intellect.
     
  15. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Now, that sounds sarcastic.
    You are the mentor here.

    It's not about intellect. I didn't develop some path breaking algo or something -- I just wrote what I had understood from the theories.

    I have just passed my 12th grade. I expected a reply from your side, because we (me, mayjune and many others here) consider you much more experienced in programming, than we people. I just wanted to be sure that what I have understood was not a misunderstanding. There are many things in C++ that I didn't know (or misunderstood) and you have clarified them beautifully.

    So, don't take that -vely. I didn't want to hear from you that I am right and you are wrong.
     
  16. xpi0t0s

    xpi0t0s Mentor

    OK fair enough. I was operating on the basis of guesswork, some basic knowledge of O(x) syntax and a quick look at the Wiki article, so I thought O(m*n) didn't exist and was equivalent to O(n^2). I shouldn't really try to talk authoritatively on a subject I don't really know.
     
  17. mayjune

    mayjune New Member

    its ok xpi0t0s, one mistake out of 10000000 is allowed, since i have never seen you making any mistakes before :p :D
     

Share This Page