What are Hash Tables?
Hash table is a data structure to store key value pairs for all situations. So it is like a table with each entry has a key and a value corresponding to it. So, it is an efficient way of mapping and accessing data.
An abstract illustration of hash table:
Code:
[key 1] -> Data 1 [key 2] -> Data 2 [key 3] -> Data 3
A common example of hash table could be - A directory storing the name of states with its std code which we are going to implement in this article.
How Hash Tables work?
When coming to how the hash tables work, we talk about three main following things:
- Hash Function - Hash Function is a function which leverages accessing and searching data in the hash table. It could be any algorithm customised and optimised as per a particular scenario. It should be a logic which takes our key as the input, and generally gives out the value corresponding to it in the hash table. However, with the same input key to the logic, output should be same every time.
- Key Value Pair - The Key-Value pair is the basic essence of the hash table data structure. For the suited situations, it makes searching really efficient and optimised.
- Collisions - Collision is when for one key, we have more than one data value to be stored. A hash table would never guarantee that for every key, there would be just one data value corresponding to it. This is because generally a hash table would have a fixed number of keys. Any new entry after the size of hash table has to be accommodated causing the collision. However, a good hash table would always try to avoid collisions and use efficient ways to handle collisions, and at the same time optimize the size of the table. This introduces the trade off.
Hash Tables Implementation in C
Here I have tried to implement a simple hash table in C. The scenario is: A telephone directory storing std codes for each state.
All the states would be used as key and its std code as its value. Here is the general depiction of our hash table:
Code:
[PB] -> 176 [JK] -> 172 [HP] -> 177 ...
To start with, here is our basic data structure of a fixed size i.e. an array
Code:
int code[TOTAL];
- Each state is a key
- Each key is stored as a two uppercase character string code.
- Compute a value corresponding to each character by subtracting value '65' from its ASCII value. In this article, this value for each character will be called as its character number.
- Now multiply the character number of the second character by 26 and add it to the character number of the first character. The formulae is:
Note, the value 26 is a magic number taken to reduce the collisions. However, the trade off is, it will increase the size of the hash table.Code:val = (state[1] *26 ) + state[0]
- Now this value is the index in our std code array which will store the std code for that particular state.
Here is how our hash function looks like:
Code:
int hashForKey(char* state)
{
int index;
if (strlen(state) == 0)
return 0;
//state string has to be just 2 chars
if (strlen(state) != 2)
{
printf("Invalid State string\n");
return -1;
}
index = (state[0] - ASCII_A) + ((state[1] - ASCII_A) * MAGICNUM) ;
if (index < 0)
{
printf("Error in hash function\n");
return -1;
}
return index;
}
Complete Code For Hash Tables
Here is the complete implementation program:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TOTAL (26*26)
#define MAGICNUM 26
#define ASCII_A 65
//Global array for value
int code[TOTAL];
//Hash function for key
int hashForKey(char* state)
{
int index;
if (strlen(state) == 0)
return 0;
//state string has to be just 2 chars
if (strlen(state) != 2)
{
printf("Invalid State string\n");
return -1;
}
index = (state[0] - ASCII_A) + ((state[1] - ASCII_A) * MAGICNUM) ;
if (index < 0)
{
printf("Error in hash function\n");
return -1;
}
return index;
}
void createEntry(char* state, int codeNum)
{
int index = hashForKey(state);
if (index < 0)
{
printf("Error creating entry\n");
}
code[index] = codeNum;
}
void deleteEntry(char* state)
{
int index = hashForKey(state);
if (index < 0)
{
printf("Error deleting entry\n");
}
code[index] = 0;
}
void initHashTable ()
{
int i;
for (i = 0; i < TOTAL; i++)
code[i] = 0;
}
int getCode( char *state)
{
int index = hashForKey(state);
if (index < 0)
{
printf("Error getting state code\n");
}
return code[index];
}
int main()
{
char state[10][3] = {""};
int codeNum[10];
int i;
int codeObtained = 0;
for (i = 0; i < 10; i+=2)
{
printf ("Enter state (should be 2 chars in upper case)\n");
scanf("%s", state[i]);
printf ("Enter its code (integral - 3 numbers)\n");
scanf("%d", &codeNum[i]);
createEntry(state[i], codeNum[i]);
}
for (i = 9; i >=0; i--)
{
codeObtained = getCode(state[i]);
printf("%s \t %d\n", state[i], codeObtained);
}
return 0;
}
Code:
0
KA 800
0
WB 144
0
JK 167
0
HP 177
0
PB 176
- A better magic number, which can reduce the fixed size of hash table considerably.
- To handle collisions if any
- A better algorithm to have the states as key in a more optimised way.
Conclusion
For this article, I would say we do not have any conclusion, instead it is the beginning. A beginning to learn an entirely new data structure which has a really high scope, can be complicated and optimised at the same time for many scenarios.