Hash Tables Tutorial for Complete Beginners

Discussion in 'C' started by Trinity, Sep 2, 2012.

  1. Trinity

    Trinity New Member

    Joined:
    Nov 23, 2011
    Messages:
    28
    Likes Received:
    11
    Trophy Points:
    3
    Occupation:
    Software Engineer

    What are Hash Tables?



    Hash table is a data structure to store key value pairs for all situations. So it is like a table with each entry has a key and a value corresponding to it. So, it is an efficient way of mapping and accessing data.

    An abstract illustration of hash table:
    Code:
    [key 1] -> Data 1
    [key 2] -> Data 2
    [key 3] -> Data 3
    
    The key can be of any type, i.e. integer, float, char, string, etc.
    A common example of hash table could be - A directory storing the name of states with its std code which we are going to implement in this article.

    How Hash Tables work?



    When coming to how the hash tables work, we talk about three main following things:

    • Hash Function - Hash Function is a function which leverages accessing and searching data in the hash table. It could be any algorithm customised and optimised as per a particular scenario. It should be a logic which takes our key as the input, and generally gives out the value corresponding to it in the hash table. However, with the same input key to the logic, output should be same every time.
    • Key Value Pair - The Key-Value pair is the basic essence of the hash table data structure. For the suited situations, it makes searching really efficient and optimised.
    • Collisions - Collision is when for one key, we have more than one data value to be stored. A hash table would never guarantee that for every key, there would be just one data value corresponding to it. This is because generally a hash table would have a fixed number of keys. Any new entry after the size of hash table has to be accommodated causing the collision. However, a good hash table would always try to avoid collisions and use efficient ways to handle collisions, and at the same time optimize the size of the table. This introduces the trade off.
    This surface outs another important feature of a good hash table which is uniform distribution.

    Hash Tables Implementation in C



    Here I have tried to implement a simple hash table in C. The scenario is: A telephone directory storing std codes for each state.

    All the states would be used as key and its std code as its value. Here is the general depiction of our hash table:
    Code:
    [PB] -> 176
    [JK] -> 172
    [HP] -> 177
    ...
    
    To think of, basic implementation data structure taken is an array of std codes. For each state, a two character code is defined. For each state two character code, we use a hash function to compute the index, which can be used to determine/store the std code in the array.

    To start with, here is our basic data structure of a fixed size i.e. an array
    Code:
    int code[TOTAL];
    
    We need a hash function for computing index out of the state. The algorithm used is as follows:

    • Each state is a key
    • Each key is stored as a two uppercase character string code.
    • Compute a value corresponding to each character by subtracting value '65' from its ASCII value. In this article, this value for each character will be called as its character number.
    • Now multiply the character number of the second character by 26 and add it to the character number of the first character. The formulae is:
      Code:
      val = (state[1] *26 ) + state[0]
      
      Note, the value 26 is a magic number taken to reduce the collisions. However, the trade off is, it will increase the size of the hash table.
    • Now this value is the index in our std code array which will store the std code for that particular state.

    Here is how our hash function looks like:
    Code:
    int hashForKey(char* state)
    {
        int index;
    
        if (strlen(state) == 0)
            return 0;
    
        //state string has to be just 2 chars
        if (strlen(state) != 2)
        {
            printf("Invalid State string\n");
            return -1;
        }
        index = (state[0] - ASCII_A) + ((state[1] - ASCII_A) * MAGICNUM) ;
        if (index < 0)
        {
            printf("Error in hash function\n");
            return -1;
        }
        return index;
    }
    
    Now retrieving/storing can happen with the help of this hash function. The max entries this hash table can store is 26*26, a limit, defined and possible with our English Character Set.

    Complete Code For Hash Tables



    Here is the complete implementation program:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define TOTAL (26*26)
    #define MAGICNUM 26
    #define ASCII_A 65
    
    //Global array for value
    int code[TOTAL];
    
    //Hash function for key
    int hashForKey(char* state)
    {
        int index;
    
        if (strlen(state) == 0)
            return 0;
    
        //state string has to be just 2 chars
        if (strlen(state) != 2)
        {
            printf("Invalid State string\n");
            return -1;
        }
        index = (state[0] - ASCII_A) + ((state[1] - ASCII_A) * MAGICNUM) ;
        if (index < 0)
        {
            printf("Error in hash function\n");
            return -1;
        }
        return index;
    }
    
    void createEntry(char* state, int codeNum)
    {
        int index = hashForKey(state);
        if (index < 0)
        {
            printf("Error creating entry\n");
        }
    
        code[index] = codeNum;
    }
    
    void deleteEntry(char* state)
    {
        int index = hashForKey(state);
        if (index < 0)
        {
            printf("Error deleting entry\n");
        }
        code[index] = 0;
    }
    
    void initHashTable ()
    {
        int i;
        for (i = 0; i < TOTAL; i++)
            code[i] = 0;
    }
    
    int getCode( char *state)
    {
        int index = hashForKey(state);
        if (index < 0)
        {
            printf("Error getting state code\n");
        }
    
        return code[index];
    }
    
    int main()
    {
        char state[10][3] = {""};
        int codeNum[10];
        int i;
        int codeObtained = 0;
    
        for (i = 0; i < 10; i+=2)
        {
            printf ("Enter state (should be 2 chars in upper case)\n");
            scanf("%s", state[i]);
    
            printf ("Enter its code (integral - 3 numbers)\n");
            scanf("%d", &codeNum[i]);
    
            createEntry(state[i], codeNum[i]);
        }
    
        for (i = 9; i >=0; i--)
        {
            codeObtained = getCode(state[i]);
            printf("%s \t %d\n", state[i], codeObtained);
        }
    
        return 0;
    }
    
    Output of program:
    Code:
          0
    KA      800
          0
    WB      144
          0
    JK      167
          0
    HP      177
          0
    PB      176
    
    
    However, this program is just a simple implementation of a hash table to give a general idea to a beginner. There is a huge scope of improvement and optimisation. Once, you understand it, a few limitations in context of hash table, which can be worked upon are:
    • A better magic number, which can reduce the fixed size of hash table considerably.
    • To handle collisions if any
    • A better algorithm to have the states as key in a more optimised way.

    Conclusion



    For this article, I would say we do not have any conclusion, instead it is the beginning. A beginning to learn an entirely new data structure which has a really high scope, can be complicated and optimised at the same time for many scenarios.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice