getopt Example - How to Access & Parse Command Line Arguments

Discussion in 'C' started by poornaMoksha, Oct 10, 2011.

  1. poornaMoksha

    You have probably run many Linux commands with arguments, for example:

    Code:
    cp -r /home/user/Desktop/abc /home/abc
    Code:
    rm -rf abc
    In the above examples, 'cp' and 'rm' are the names of binaries (written in C), while everything after the name is the list of command line arguments passed to the respective C program.

    Ever wondered how these command line arguments are handled in the code? Let's discuss it here.


    Accessing command line arguments



    You have probably seen main() declared with one of these signatures:

    Code:
    int main(int argc, char *argv[])
    OR
    int main(int argc, char **argv)
    
    The first parameter, argc, is the number of command line arguments (including the name of the binary). The second parameter, argv, is an array of pointers, each holding the address of one argument string.

    So, if we know how to walk the array of pointers 'argv', we can easily access all the command line arguments. Let's understand this through an example:

    Code:
    #include<stdio.h>
    
    int main(int argc, char *argv[])
    {
        int n = 0;
    
        while(n < argc)
        {
            printf("\n Argument number [%d] is [%s]\n", n, argv[n]);
            n++;
        }
    
        return 0;
    }
    In the above piece of code, we iterate over each argument and print it. We keep on iterating until all the arguments are printed.

    Here is the output :

    Code:
    $ ./cmdline 
    
     Argument number [0] is [./cmdline]
    Since the only command line argument was './cmdline', the program printed just one argument. This shows that the name of the binary is included in the list of command line arguments passed to main().

    Let's give some more arguments:

    Code:
    $ ./cmdline testarg1 testarg2 testarg3
    
     Argument number [0] is [./cmdline]
    
     Argument number [1] is [testarg1]
    
     Argument number [2] is [testarg2]
    
     Argument number [3] is [testarg3]
    Here we gave 3 more arguments in addition to the name of the binary, and our program printed all of them.

    Now, let us try once more with a more familiar style of arguments (like the ones we pass to Linux commands):

    Code:
    $ ./cmdline -a abc -r -d pqr
    
     Argument number [0] is [./cmdline]
    
     Argument number [1] is [-a]
    
     Argument number [2] is [abc]
    
     Argument number [3] is [-r]
    
     Argument number [4] is [-d]
    
     Argument number [5] is [pqr]
    The output was as expected!

    So, now we have a basic idea of how to access command line arguments inside the code.

    Parsing the command line arguments



    Now that you understand how to access the command line arguments, can you think of a parsing mechanism?

    Let's try the most basic one. Suppose we want to write a program which expects 3 arguments in the form:

    <name of the binary> <operation> <val1> <val2>

    <operation> : could be any one of 'add', 'subtract', 'multiply', 'divide'
    <val1> : First numeric value
    <val2> : Second numeric value

    Here is the logic :

    Code:
    #include<stdio.h>
    #include<string.h>
    #include<stdbool.h>
    #include<stdlib.h>
    
    int main(int argc, char *argv[])
    {
        int n = 1; // n =1, since we do not want to iterate over the name of binary.
        bool add=false, subtract=false, multiply=false, divide=false;
        int val1 = 0, val2 = 0;
    
        if(argc != 4)
        {
            printf("\n Usage : <binary-name> <operation> <val1> <val2>\n");
            return 1;
        }
    
            if(!strncmp("add",argv[n], sizeof("add")))
            {
                add = true;
            }                                
            else if(!strncmp("subtract",argv[n], sizeof("subtract")))
            {
                subtract = true;
            }                                
            else if(!strncmp("divide",argv[n], sizeof("divide")))
            {
                divide = true;
            }                                
            else if(!strncmp("multiply",argv[n],sizeof("multiply")))
            {
                multiply = true;
            }  
            else
            {
                printf("\n Wrong option \n");
                return 1;
            }       
    
        n++;
    
        val1 = atoi(argv[n]); // atoi is used to convert the command line arg from string to integer
        n++; 
        val2 = atoi(argv[n]);
    
        if(add)
        {
            printf("\n The request operation was to add and the result is [%d]\n", val1+val2);
            return 0;
        }                      
        if(subtract)
        {
            printf("\n The request operation was to subtract and the result is [%d]\n", val1-val2);
            return 0;
        }
        if(multiply)
        {
            printf("\n The request operation was to multiply and the result is [%d]\n", val1*val2);
            return 0;
        }
        if(divide)
        {
            if(val2 == 0)
            {
                printf("\n The request operation was to divide, but ignoring the request as val2 is zero\n");
                return 1;
            }
            printf("\n The request operation was to divide and the result is [%d]\n", val1/val2);
            return 0;
        }
    
        return 0;
    }
    In the above piece of code :
    • First, the logic makes sure that the number of arguments is exactly what we expect; otherwise it prints the usage and returns an error.
    • Next, it reads the second command line argument (the first being the name of the binary) to find out which operation the user wants.
    • Then it reads the two value arguments and converts them to integers.
    • Finally, based on the requested operation, it carries out the operation and prints the result. Since the values are integers, 'divide 2 3' prints 0.
    I tried all the operations and here is the output :

    Code:
    $ ./cmdline add 2 3
    
     The request operation was to add and the result is [5]
    $ ./cmdline divide 2 3
    
     The request operation was to divide and the result is [0]
    $ ./cmdline multiply 2 3
    
     The request operation was to multiply and the result is [6]
    $ ./cmdline subtract 2 3
    
     The request operation was to subtract and the result is [-1]

    A practical problem



    Parsing command line arguments seems good so far. But what if the user gives the command line arguments in the following order:

    Code:
     ./cmdline 20 10 add 
    Let's run the command and see the output. Here is what I got:

    Code:
    $ ./cmdline 20 10 add
    
     Wrong option
    So, the program reported 'Wrong option'. In other words, the logic did not find a valid operation in the argument at index 1, because that argument is 20 here, which is val1.

    One may argue that the program behaved correctly, since the operation name should be the first argument after the binary name. But what if I want to give my users the flexibility to write the three arguments (after the binary name) in any order they want?

    Any ideas about the question I brought up?

    Solution 1 - My Logic



    Well, I know that with the current logic this becomes difficult.

    One idea that comes to mind is to make the binary run in the following way:

    <binary-name> -o <operation> -v1 <value1> -v2 <value2>

    This sounds good, and yes, this is the way standard Linux commands are used. Let's tweak our code to make it compatible with this kind of command line arguments. Here is the logic:

    Code:
    #include<stdio.h>
    #include<string.h>
    #include<stdbool.h>
    #include<stdlib.h>
    
    int main(int argc, char *argv[])
    {
        int n = 1; // n =1, since we do not want to iterate over the name of binary.
        bool add=false, subtract=false, multiply=false, divide=false;
        int val1 = 0, val2 = 0;
    
        if(argc != 7)
        {
            printf("\n Usage : <binary-name> -o <operation> -v1 <val1> -v2 <val2>\n");
            return 1;
        }
    
        while( n < argc)
        {
        if(!strncmp("-o",argv[n],sizeof("-o")))
        {
            n++;
            if(!strncmp("add",argv[n], sizeof("add")))
            {
                add = true;
            }                                
            else if(!strncmp("subtract",argv[n], sizeof("subtract")))
            {
                subtract = true;
            }                                
            else if(!strncmp("divide",argv[n], sizeof("divide")))
            {
                divide = true;
            }                                
            else if(!strncmp("multiply",argv[n],sizeof("multiply")))
            {
                multiply = true;
            }  
            else
            {
                printf("\n Wrong option \n");
                return 1;
            }
            n++;
        }
        else if(!strncmp("-v1",argv[n],sizeof("-v1")))
        {
            n++;
            val1 = atoi(argv[n]);
            n++;
        }
        else if(!strncmp("-v2",argv[n],sizeof("-v2")))
        {
            n++;
            val2 = atoi(argv[n]);
            n++;
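        }
        else
        {
            // Unknown argument: report it instead of looping forever.
            printf("\n Usage : <binary-name> -o <operation> -v1 <val1> -v2 <val2>\n");
            return 1;
        }
        }     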
    
        if(add)
        {
            printf("\n The request operation was to add and the result is [%d]\n", val1+val2);
            return 0;
        }                      
        if(subtract)
        {
            printf("\n The request operation was to subtract and the result is [%d]\n", val1-val2);
            return 0;
        }
        if(multiply)
        {
            printf("\n The request operation was to multiply and the result is [%d]\n", val1*val2);
            return 0;
        }
        if(divide)
        {
            if(val2 == 0)
            {
                printf("\n The request operation was to divide, but ignoring the request as val2 is zero\n");
                return 1;
            }
            printf("\n The request operation was to divide and the result is [%d]\n", val1/val2);
            return 0;
        }
    
        return 0;
    }
    In the above logic :
    1. I tweaked the code so that it now accepts arguments in the standard '-option value' style.
    2. With this logic, users can specify the arguments in any order.
    3. I then ran the code with the arguments supplied in different orders.
    Here is the output :

    Code:
    $ ./cmdline 
    
     Usage : <binary-name> -o <operation> -v1 <val1> -v2 <val2> 
    $ ./cmdline -v1 20 -o add -v2 10
    
     The request operation was to add and the result is [30]
    $ ./cmdline -v1 20 -o multiply -v2 10
    
     The request operation was to multiply and the result is [200]
    $ ./cmdline -v1 20 -v2 10 -o multiply
    
     The request operation was to multiply and the result is [200]
    $ ./cmdline -v1 20 -v2 10 -o divide
    
     The request operation was to divide and the result is [2]
    $ ./cmdline -v2 10 -o subtract -v1 20
    
     The request operation was to subtract and the result is [10]

    Solution 2 - Using getopt() function



    Well, the C library already provides a function that achieves what solution 1 above did: getopt().

    This function is declared in unistd.h and its signature is as follows:

    Code:
    int getopt( int argc, char *const argv[], const char *optstring );
    The first two arguments are the same ones that main() receives, while the third is a string written in a special format that tells getopt() which option letters are valid and which of them expect a value (some options take a value, like '-o <value>', while others are plain flags, like '-a').

    An option that expects a value is followed by a ':' in the optstring.
    So, an optstring of "a:b:c" signifies that -a expects a value, -b expects a value, and -c does not.

    Each time this function is called, it returns the next option character (or -1 once all options have been processed) and sets some global variables.

    * optarg -- A pointer to the current option's argument, if there is one.
    * optind -- The index of the next argv element to process when getopt() is called again.
    * optopt -- The option character that caused an error (an unknown option, or an option whose value is missing).

    In our case, all three options expect a value, so we use the optstring "o:x:y:"

    Let's look at the code now:

    Code:
    #include<stdio.h>
    #include<string.h>
    #include<stdbool.h>
    #include<stdlib.h>
    #include<unistd.h>
    
    int main(int argc, char *argv[])
    {
    //    int n = 1; // n =1, since we do not want to iterate over the name of binary.
        bool add=false, subtract=false, multiply=false, divide=false;
        int val1 = 0, val2 = 0;
    
        if(argc != 7)
        {
            printf("\n Usage : <binary-name> -o <operation> -x <val1> -y <val2>\n");
            return 1;
        }
    
        int opt = 0;
        opt = getopt( argc, argv, "o:x:y:");
        while( opt != -1 ) 
        {
           //printf("\n opt = [%d]\n",opt);
             //printf("\n optind = [%d]\n",optind);
            //printf("\n Inside while loop\n");
            switch( opt ) 
            {
                case 'o':
                    //printf("\n 'o' detected \n");
                    if(!strncmp("add",optarg, sizeof("add")))
                    {
                        add = true;
                    }
                    else if(!strncmp("subtract",optarg, sizeof("subtract")))
                    {
                        subtract = true;
                    }
                    else if(!strncmp("divide",optarg, sizeof("divide")))
                    {
                        divide = true;
                    }
                    else if(!strncmp("multiply",optarg,sizeof("multiply")))
                    {
                        multiply = true;
                    }
                    else
                    {
                        //printf("\n Wrong option \n");
                        return 1;
                    }
                    break;
                case 'x':
                    //printf("\n 'x' detected \n");
                    val1 = atoi(optarg);
                    //printf("\n val1 = [%d]\n",val1);
                    break;
                case 'y':
                    //printf("\n 'y' detected \n");
                    val2 = atoi(optarg);
                    //printf("\n val2 = [%d]\n",val2);
                    break;
                case '?':
                    //printf("\n '?' detected \n");
                    //printf("\n Usage : <binary-name> -o <operation> -x <val1> -y <val2>\n");
                    return 1;
    
                default :
                     //printf("\n Wrong argument passed \n"); //shouldn't ideally get here
                     return 1;
           }  
           opt = getopt( argc, argv, "o:x:y:");
           //printf("\n opt = [%d]\n",opt);
           //printf("\n A loop for while complete\n");
        }                  
        
    
        if(add)
        {
            printf("\n The request operation was to add and the result is [%d]\n", val1+val2);
            return 0;
        }                      
        if(subtract)
        {
            printf("\n The request operation was to subtract and the result is [%d]\n", val1-val2);
            return 0;
        }
        if(multiply)
        {
            printf("\n The request operation was to multiply and the result is [%d]\n", val1*val2);
            return 0;
        }
        if(divide)
        {
            if(val2 == 0)
            {
                printf("\n The request operation was to divide, but ignoring the request as val2 is zero\n");
                return 1;
            }
            printf("\n The request operation was to divide and the result is [%d]\n", val1/val2);
            return 0;
        }
    
        return 0;
    }
    In the above code :
    • We used the getopt() function instead of our own parsing code.
    • This function does exactly what we intended.
    • Each time we call it, it returns the option character (like 'x') and points 'optarg' at the value supplied for that option.
    • This way, the function makes our work easy.
    Let's look at the output:

    Code:
    $ ./cmdline -o add -x 20 -y 10
    
     The request operation was to add and the result is [30]
    $ ./cmdline -o divide -x 20 -y 10
    
     The request operation was to divide and the result is [2]
    $ ./cmdline  -x 20 -y 10 -o add
    
     The request operation was to add and the result is [30]

    Conclusion



    To conclude, this article explained how command line arguments are accessed in code and how the getopt() function helps us parse them easily. getopt() is used in many of the command line utilities that we use in our day-to-day Linux work on the terminal.

    Stay tuned for more!!!!
     
  2. Alex.Gabriel

    Good article. Very good.
     
  3. poornaMoksha

    Thanks!!
     
