Intersection of two strings

Discussion in 'C++' started by CrazyGal, Jul 22, 2009.

  1. CrazyGal

    CrazyGal New Member

    Joined:
    May 24, 2009
    Messages:
    21
    Likes Received:
    0
    Trophy Points:
    0
    Hi,

    Could someone tell me if there's a better way of doing this? I was asked this question in an interview and I wrote a piece of code of O(n2).

    Code:
    // This function returns a new string of the common alphabets found in a and b
    char* intersection(char*a, char* b)
    {
       static char temp[20];
       int i = 0;
       if ( (a==NULL) || (b==NULL))
            exit(0);
       else
        {
             while ( a != NULL)
             {
                   if(strchr(b, *a))
                   {
                        temp[i] = *a;
                        i++;
                        a++;
                   }
             }  
        }
    
        return temp;
    }
    
    Thanks,
    Tina
     
    Last edited by a moderator: Jul 22, 2009
  2. CrazyGal

    CrazyGal New Member

    Joined:
    May 24, 2009
    Messages:
    21
    Likes Received:
    0
    Trophy Points:
    0
    Resending the query with code tags.

    Intersection of two strings
    Hi,

    Could someone tell me if there's a better way of doing this? I was asked this question in an interview and I wrote a piece of code of O(n2).

    // This function returns a new string of the common alphabets found in a and b


    Thanks,
    Tina
     
  3. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    I think this will work too :
    Code:
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <set>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        string a = "abcdk57";
        string b = "abcdek5";
        string c;
    
        set_intersection(
            a.begin(), a.end(),
            b.begin(), b.end(),
            back_inserter(c));
    
        cout << c << '\n';
    }
    
     
  4. CrazyGal

    CrazyGal New Member

    Joined:
    May 24, 2009
    Messages:
    21
    Likes Received:
    0
    Trophy Points:
    0
    Yea, true. But could there be some way which doesn't use the standard calls?

    Thanks,
    Tina
     
  5. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    Hi Tina,

    So, you want it without using STL calls.
    Now, I have quite a few things to discuss :

    (1) What exactly do you want your func to return ?? Please answer the example :
    Let StringA = "THIS IS COOL"
    And String B = "THIS IS NOT COOL"
    What do you want your func to return ??

    I think if you want to return "Intersection", you should return : "CHILOST" (only the common characters). But, you code (though has a small error) returns "THIS IS COOL".

    (2) The second thing I wanna say is, the error in your code :

    Look carefully :

    Code:
    char* intersection(char*a, char* b)
    {
        static char temp[20];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            exit(0);
        else
        {
            while ( a != NULL)
            {
                if(strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                    [B][COLOR="Red"]a++; // here is the error[/COLOR][/B]
                }
            }
        }
        return temp;
    }
    You see the error ? When a character in a is not found in b, the program goes into an infinite loop !! So, the correct code will be :


    Code:
    char* intersection(char*a, char* b)
    {
        static char temp[20];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            exit(0);
        else
        {
            while ( a != NULL)
            {
                if(strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                }
                a++;
            }
        }
        return temp;
    }
    (3) The third thing, I think I have a better code .. (probably O(m+n))

    Code:
    #include <cstring>
    #include <stdio.h>
    #include <ctime>
    
    char* Intersection_1(char* a, char* b)
    {
        static char temp[300];
        int i = 0;
        if ( (a==NULL) || (b==NULL))
            return NULL;
        else
        {
            while ( *a != '\0')
            {
                if (strchr(b, *a))
                {
                    temp[i] = *a;
                    i++;
                }
                a++;
            }
        }
    
        return temp;
    }
    
    char* Intersection_2(char* StringA, char* StringB)
    {
        static char C[300];
        char* pC = C;
        static char Table[2][200];
        int EndOfA = strchr(StringA, '\0')-StringA;
        int EndOfB = strchr(StringB, '\0')-StringB;
    
        for (int i=0; i < EndOfA; ++i)   (*(*Table+StringA[i]))++;
        for (int i=0; i < EndOfB; ++i)   (*(*(Table+1)+StringB[i]))++;
    
        for (int i=32; i < 127; ++i)
        {
            if ( (*(*Table+i)) && (*(*(Table+1)+i)) )
                *(pC++) = i;
        }
        *pC = '\0';
    
        return C;
    }
    
    
    int main()
    {
        clock_t t1, t2;
    
        char A[200] = "THIS PROGRAM IS WRITTEN BY SASWAT PADHI. THIS IS THE FIRST TEST STRING. *&%$*%#*$%#(%^)(UFSUDEIRTGITuffifiiiugiu length = 125";
        char B[200] = "SGSYEDBSDJDISYD#**EFHHWSKDLJD_))@#**** RANDOM DATA IN THIS STRING **** YEOYRHWEJQPWEI[QIWEQJWDBKDFB,SDBLHSYWERYWOYR8423RH LENGTH = 134";
        char* C;
        float diff;
    
        t1 = clock();
        for (int i = 0; i < 100000; ++i)       C = Intersection_1(A,B);
        t2 = clock();
        diff = ((float)t2 - (float)t1) / 1000.0F;
        printf("\nC = \"%s\"\nAlgo 1 --> Running Time for 100000 iterations : %lf sec\n", C, diff);
    
    
        t1 = clock();
        for (int i = 0; i < 100000; ++i)       C = Intersection_2(A,B);
        t2 = clock();
        diff = ((float)t2 - (float)t1) / 1000.0F;
        printf("\n\nC = \"%s\"\nAlgo 2 --> Running Time for 100000 iterations : %lf sec\n", C, diff);
    
        return 0;
    }
    The program compares two LONG strings 1 lakh (10^5) times. The output :

    Code:
    C = "THIS PROGRAM IS WRITTEN BY SASWAT PADHI THIS IS THE FIRST TEST STRING **#*#
    )FSDEIRTGIT  = 12"
    Algo 1 ([B]Your Algo[/B])--> Running Time for 100000 iterations : 1.218000 sec
    
    
    C = ")*12=ABDFGHIMNOPRSTWY"
    Algo 2 ([B]My Algo[/B])--> Running Time for 100000 iterations : 0.344000 sec
     
    Last edited: Jul 24, 2009
    shabbir likes this.
  6. CrazyGal

    CrazyGal New Member

    Joined:
    May 24, 2009
    Messages:
    21
    Likes Received:
    0
    Trophy Points:
    0
    Hi,

    Thanks for your response. Its a good one. I see you're trying to build a histogram. Yes, I need to print only the common characters. So, we need to have a huge array capable of holding all the ASCII characters. Doesn't it take more memory? Or it's something we need to sacrifice to getter better performance?

    Please let me know.

    Thanks for writing to me.

    Tina
     
  7. xpi0t0s

    xpi0t0s Mentor

    Joined:
    Aug 6, 2004
    Messages:
    3,009
    Likes Received:
    203
    Trophy Points:
    63
    Occupation:
    Senior Support Engineer
    Location:
    England
    It's not that huge, ASCII only constitutes 128 characters, or 256 if you count all 8-bit possibilities of extended ASCII.

    There is often a space/performance tradeoff in optimising code. You can do stuff really quick and take up loads of memory, or you can do stuff in virtually no memory and take ages, and there are often points in between.

    So you can reduce the histogram from Saswat's 400 chars to just 32 bytes as follows:
    set all 32 bytes to zero
    scan the first string and for each character present, take the character code and split it in two: the lower 3 bits and the upper 5 bits.
    The upper 5 bits divided by 8 gives you the index into the 32-byte array.
    The lower 3 bits identify one of the 8 bits in that byte.
    Set the relevant bit (1 << that_3_bit_number) to 1.

    Scan the second string and again split the character code into two parts (5 and 3 bits).
    Check the relevant bit and if it's 1 then you have a duplicate character - output it.

    This prints duplicates, of course, so if string 1 is THIS IS COOL and string 2 is XPIOTOS then the program output will be IOTOS - the duplicate O's may be a problem. If not then this is your solution.

    If it is then you can have a second array of 32 bytes, and when you parse the 2nd string update the 2nd array in the same way as when you parsed the 1st string.
    To pick out updates scan both sets of 256 bits and where there is a 1 in both, you have a common character. Output then would be IOST (in some order).

    Or you could just have a single array of 256 ints, initially zero, scan the first string, set relevant flag to 1, scan the 2nd string and shift the relevant flag left by 1. Then scan the ints and where a flag is 2 you have a character that was in both strings.
     
  8. mayjune

    mayjune New Member

    Joined:
    Jun 14, 2009
    Messages:
    814
    Likes Received:
    33
    Trophy Points:
    0
    Occupation:
    Student
    Location:
    Pune,Delhi
    @Crazygal
    how's it n^2? As theres only one while loop, no nested loops nothing... how can it be n^2 then?
    Please clarify....

    @saswat
    why is the iteration from 37 to 130? I didn't get it.... is it just for this particular string? or otherwise..?
     
  9. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    2:47 AM !?? Don't you sleep mayjune ?? :p

    CrazyGal is right, it's O(n^2), but it would be better to say it's O(n*m).
    You see the use of strchr(), which searches for a character inside string b. It performs 'm' operations : matches the character with each char of string b.
    As strchr() is executed in each iteration of the "while" loop, total number of operations is n*m.
    Hence, the time complexity is O(n*m). :)

    Oops, the loop should be from 32 to 127. (Edited my post)
    The idea is to scan the range of "printable" ASCII chars only ;)
     
  10. CrazyGal

    CrazyGal New Member

    Joined:
    May 24, 2009
    Messages:
    21
    Likes Received:
    0
    Trophy Points:
    0
    Hi,

    Thanks for your response xpi0t0s. It seems little complicated for me to get what you're trying to say. I'll try to code your algorithm and figure out if I could understand the logic.

    Thanks again.

    Tina
     
  11. xpi0t0s

    xpi0t0s Mentor

    Joined:
    Aug 6, 2004
    Messages:
    3,009
    Likes Received:
    203
    Trophy Points:
    63
    Occupation:
    Senior Support Engineer
    Location:
    England
    No, because O syntax doesn't work that way. O(n^2) is correct, because if all lengths are set to n and n increases, the time for the algorithm increases along the line n*n. So if n increases from 4 to 5 to 6, then the time increases according to a y=x^2 relationship from 16 to 25 to 36.

    The "histogram" algorithm is O(n) even though there are 2 parses and you might think it's O(2n), but again that's misunderstanding what the O syntax is designed for. As n increases, the time for the algorithm only increases linearly, e.g. if n increases by 5% then the overall time increases by 5%.

    Correct, and that's why it's O(n^2).

    See http://en.wikipedia.org/wiki/Big_O_notation for more information than you ever wanted to know about Big O notation.
     
  12. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    I don't agree.
    O(n^2) is applicable when there is one parameter(n) that affects the time complexity of your algo. But, when there are more than one variable say m,n,p .. you can't say O(mnp) = O(n^3). ( See the complexity of (m x n)*(n x p) matrix multiplication : Complexity of matrix algebra(Wikipedia) )

    Tina's algorithm has two parameters : m, n which represent lengths of A and B respectively. So, the time complexity should be O(mn). And, I know O(mn) algos exist. You can just google "O(mn) algorithm".

    O(mn) = O(n) if m is constant.
    O(mn) = O(n^2) if m == n and m is not an independent variable

    I mentioned in my post, that it's O(m+n).
    Now, the explanation above justifies that.

    I saw the link and found something related at : Multiple Variables.
    Please go through.

    PS: Please go through the attached PDF as well. It deals exclusively with multiple variables.
     

    Attached Files:

  13. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    Still waiting for your reply, xpi0t0s.
     
  14. xpi0t0s

    xpi0t0s Mentor

    Joined:
    Aug 6, 2004
    Messages:
    3,009
    Likes Received:
    203
    Trophy Points:
    63
    Occupation:
    Senior Support Engineer
    Location:
    England
    Sorry, didn't know one was expected.
    Yep, I was wrong, you were right, I bow to your superior intellect.
     
  15. SaswatPadhi

    SaswatPadhi ~ Б0ЯИ Τ0 С0δЭ ~

    Joined:
    May 5, 2009
    Messages:
    1,342
    Likes Received:
    55
    Trophy Points:
    0
    Occupation:
    STUDENT !
    Location:
    Orissa, INDIA
    Home Page:
    http://www.crackingforfun.blogspot.com
    Now, that sounds sarcastic.
    You are the mentor here.

    It's not about intellect. I didn't develop some path breaking algo or something -- I just wrote what I had understood from the theories.

    I have just passed my 12th grade. I expected a reply from your side, because we (me, mayjune and many others here) consider you much more experienced in programming, than we people. I just wanted to be sure that what I have understood was not a misunderstanding. There are many things in C++ that I didn't know (or misunderstood) and you have clarified them beautifully.

    So, don't take that -vely. I didn't want to hear from you that I am right and you are wrong.
     
  16. xpi0t0s

    xpi0t0s Mentor

    Joined:
    Aug 6, 2004
    Messages:
    3,009
    Likes Received:
    203
    Trophy Points:
    63
    Occupation:
    Senior Support Engineer
    Location:
    England
    OK fair enough. I was operating on the basis of guesswork, some basic knowledge of O(x) syntax and a quick look at the Wiki article, so I thought O(m*n) didn't exist and was equivalent to O(n^2). I shouldn't really try to talk authoritatively on a subject I don't really know.
     
  17. mayjune

    mayjune New Member

    Joined:
    Jun 14, 2009
    Messages:
    814
    Likes Received:
    33
    Trophy Points:
    0
    Occupation:
    Student
    Location:
    Pune,Delhi
    its ok xpi0t0s, one mistake out of 10000000 is allowed, since i have never seen you making any mistakes before :p :D
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice