Interview with Dmitriy Vyukov - the author of Relacy Race Detector (RRD)

    Abstract
    Introduction
    The interview text
    References


    Abstract



    This is an interview with Dmitriy Vyukov, the author of Relacy Race Detector (RRD), a tool intended for verifying parallel applications. In this article you will learn about the history of RRD's creation, its basic capabilities, and also about some similar tools and how they differ from RRD.

    Introduction



    We draw your attention to this interview with the author of the Relacy Race Detector (RRD) verifier for testing multi-threaded algorithms. The prospects of using RRD and other tools for testing parallel applications, along with related topics, are discussed in the article.

    The questions are asked by:

    Andrey Nikolaevich Karpov. One of the founders of the "Program Verification Systems" company, he is engaged in developing static code analysis tools. He participates in the development of the Viva64 and VivaMP tools for testing 64-bit and parallel applications and maintains the open library VivaCore intended for parsing C/C++ code.

    The questions are answered by:

    Dmitriy Sergeevich Vyukov. A developer of high-performance C/C++ software in the field of client/server systems and network servers. In his spare time he develops innovative synchronization algorithms, programming models for multi-core processors, and systems for multi-threaded code verification. He is the author of the Relacy Race Detector (RRD) tool.

    The interview text



    Hello, Dmitriy. Please tell us a few words about yourself. In what field do you work, and what projects do you participate in?

    To the best of my ability I'm involved in everything related to multi-threading and parallelism: scalable synchronization algorithms, programming models for multi-core processors, multi-threaded code verification and so on. I publish my work on synchronization algorithms in the Scalable Synchronization Algorithms group (http://www.viva64.com/go.php?url=195). I have also developed, and now maintain, the multi-threaded code verification tool Relacy Race Detector (RRD) (http://www.viva64.com/go.php?url=194).

    What prompted you to create the Relacy Race Detector verifier?

    RRD appeared rather spontaneously. There were three preconditions for its creation.

    The first one: I develop synchronization algorithms, and testing and error localization in them is a very serious problem. Errors occur very rarely or do not occur at all on some computers (for example, on computers with fewer than 4 processors or with a certain OS version). And when an error does occur regularly, it is often very hard to understand its cause (i.e. at what moment and what exactly goes wrong). This led to the idea that it would be good to have some "tools" for solving the problem.

    The second precondition: over the time I have spent on synchronization algorithms I have accumulated a set of methods which I used for testing and locating errors. One of the main methods is inserting a large number of lines like the one shown below into the program code:

    Code:
    if ((rand() % 1000) == 0) Sleep(rand() % 10);
    and then stress-testing the program. This method allows far more different thread interleavings to be exercised. This is, in fact, the basic principle behind RRD's operation.
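
    To make the idea concrete, here is a minimal stand-alone sketch of that stress-testing approach in portable C++ (it does not use RRD; the thread count, iteration count and delay parameters are arbitrary choices for this example):
    Code:
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>
    std::atomic<int> counter(0);
    // occasionally give up the time slice to perturb the thread interleaving
    void random_delay()
    {
      if ((std::rand() % 1000) == 0)
        std::this_thread::sleep_for(std::chrono::milliseconds(std::rand() % 10));
    }
    void worker()
    {
      for (int i = 0; i != 100000; ++i)
      {
        random_delay();
        int x = counter.load(std::memory_order_relaxed);
        random_delay();
        // deliberately racy read-modify-write: increments made by the other
        // thread between the load above and this store are lost
        counter.store(x + 1, std::memory_order_relaxed);
      }
    }
    int main()
    {
      std::thread t1(worker), t2(worker);
      t1.join();
      t2.join();
      // without the injected delays the lost-update bug can stay hidden for
      // a long time; with them the final value usually falls well below 200000
      std::printf("counter = %d\n", counter.load());
    }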

    The third precondition appeared when I finally understood how I could assemble all my methods into an automatic testing tool, how I could instrument a program in a simple way, and how I could make the tool highly effective. The rest was easy: the first working prototype (which really did find one deliberately introduced error) was ready by that night. Although, of course, turning RRD into a more or less acceptable tool took much more time.

    Please tell us about RRD in more detail. What principles and algorithms is it based on? In what areas can it be used most effectively?

    RRD is a dynamic verification tool that does not store states. It is intended, first of all, for testing multi-threaded algorithms (synchronization algorithms, multi-threaded data structures and so on). For the user, working with RRD looks like this. First, the algorithm being tested is implemented. The implementation can be expressed in terms of C++09 synchronization primitives, POSIX threads (pthread), Win32 API, C#/.NET, or Java. However, you use these APIs not directly but through "wrappers" provided by RRD; the syntax is nearly the same, but there are some differences. Once the tested algorithm is implemented, you need to implement one or several unit tests for it. After that you can launch them, and RRD will see to the effective execution of the tests, that is, as many different thread interleavings as possible will be checked. During the execution of each interleaving RRD performs a lot of different correctness checks, including both the user's asserts and invariants and the basic built-in checks: data races, accesses to freed memory, double frees, memory leaks, deadlocks, livelocks, incorrect use of the API (for example, recursive acquisition of a non-recursive mutex) and so on. When an error is detected, RRD shows a detailed history of the execution that led to the error. With such a history you can easily locate the error (the history contains such details as deviations from the sequentially consistent order, occurrences of the ABA problem, spurious wake-ups on condition variables, etc.).

    The large number of built-in checks, and the thoroughness with which RRD carries them out, allow you in most cases to avoid writing any user checks in the code at all. For example, if we are testing a reader-writer mutex, it is enough to create several threads which acquire the mutex for writing and modify the same variable. If the mutex algorithm does not provide mutual exclusion, the race on the protected variable will be detected automatically; if the algorithm is prone to deadlock or livelock, RRD will find that out automatically as well. But if we are testing a producer-consumer queue and the queue must guarantee FIFO order of messages, we will have to program this check manually.
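
    To give an idea of what such a manual check can look like, here is a rough sketch of an RRD unit test that asserts FIFO order. The tiny single-producer/single-consumer ring buffer is only an illustrative stand-in for a real queue under test and is not part of RRD; the rl::test_suite, rl::atomic, rl::var, RL_ASSERT, rl::simulate and "($)" constructs are the same ones that appear in the examples later in this article:
    Code:
    // illustrative single-producer/single-consumer ring buffer (not part of RRD);
    // capacity checks are omitted because the test enqueues only 3 items
    struct spsc_queue
    {
      rl::atomic<unsigned> head; // next slot to read
      rl::atomic<unsigned> tail; // next slot to write
      rl::var<int> buf[8];
      void enqueue(int value)
      {
        unsigned t = tail($).load(rl::memory_order_relaxed);
        buf[t % 8]($) = value;
        tail($).store(t + 1, rl::memory_order_release);
      }
      bool try_dequeue(int& value)
      {
        unsigned h = head($).load(rl::memory_order_relaxed);
        if (h == tail($).load(rl::memory_order_acquire))
          return false; // queue is currently empty
        value = buf[h % 8]($);
        head($).store(h + 1, rl::memory_order_release);
        return true;
      }
    };
    // thread 0 produces, thread 1 consumes and checks the order manually
    struct fifo_test : rl::test_suite<fifo_test, 2>
    {
      spsc_queue q;
      void before()
      {
        q.head($) = 0;
        q.tail($) = 0;
      }
      void thread(unsigned thread_index)
      {
        if (0 == thread_index)
        {
          for (int i = 0; i != 3; ++i)
            q.enqueue(i);
        }
        else
        {
          // whatever the consumer manages to dequeue must come out in exactly
          // the order it was enqueued; in some interleavings the queue may
          // simply be empty, and the check is then vacuous
          int expected = 0;
          int value;
          while (q.try_dequeue(value))
            RL_ASSERT(value == expected++);
        }
      }
    };
    int main()
    {
      rl::simulate<fifo_test>();
    }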

    Now a few words about RRD's internals and the algorithms used in it. RRD instruments all accesses to variables and synchronization primitives and all API calls. This allows it to insert all the necessary checks and also to fully control thread switching. RRD contains 3 thread schedulers (you choose the scheduler when launching a test).

    The simplest scheduler is the so-called random scheduler. After each elementary action performed by the program (an access to a variable or a synchronization primitive, or an API call) the scheduler chooses a thread at random and switches control to it. This scheduler is good for preliminary testing of an algorithm: it does not provide an exhaustive check but works very quickly.

    The second scheduler performs a full search of the possible thread interleavings (the full search scheduler), but its disadvantage is a very long verification time. In practice it can be used only for small tests.

    The last, third, scheduler is the most interesting and useful: the so-called context-bound scheduler. It performs a systematic search of thread interleavings but checks only those interleavings in which the total number of voluntary switches does not exceed some defined bound. Because of this it provides a good compromise between the quality of the check and the running time. I should also mention that all the schedulers are fair; this allows you to test formally non-terminating algorithms, i.e. algorithms containing loops that can potentially repeat infinitely.
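
    In code, choosing a scheduler looks roughly like the sketch below (the exact names of the parameters structure, its fields and the scheduler constant should be treated as assumptions and checked against the RRD headers and the bundled examples):
    Code:
    int main()
    {
      rl::test_params params;                  // options structure (name assumed)
      params.context_bound = 2;                // voluntary-switch limit per interleaving (field assumed)
      params.search_type =                     // scheduler selection (field and constant assumed)
          rl::fair_context_bound_scheduler_type;
      rl::simulate<some_test>(params);         // some_test: a test_suite such as the one shown below
    }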

    Under what conditions is RRD distributed?

    RRD can be used for free for non-commercial development with open source code, for educational purposes, for academic development with non-patented results, and also for personal non-commercial use. For all other uses RRD must be paid for. However, there can be special cases; for example, I took part in some preliminary negotiations about providing special licenses for the development of the Linux kernel (there are some tricky points concerning patented algorithms and commercialization) and also for the development of Intel Threading Building Blocks (which is distributed under a dual license, one of which is commercial).

    Can you recommend some additional resources relating to RRD? Where can one download RRD?

    The main resource devoted to RRD is located here:
    http://www.viva64.com/go.php?url=194

    You can download the latest version of the library there, find materials on RRD, and ask questions. The RRD distribution kit includes several examples which help in mastering RRD.

    Perhaps you are familiar with many other verifiers of parallel applications. Does none of them really implement the diagnostics that RRD offers? In what ways do they differ from RRD?

    Of course, before creating RRD I studied many verification tools (Intel Thread Checker, Chord, Zing, Spin, RacerX, CheckFence, Sober, Coverity Thread Analyzer, CHESS, KISS, PreFast, Prefix, FxCop) hoping to find what I needed for my purposes. But most of these tools are intended, so to say, for developers of end applications and not for developers of synchronization algorithms and parallelism-support libraries. None of them provided the level of detail and accuracy with respect to relaxed memory order [*] that I needed. Figuratively speaking, if the mentioned tools can verify a program which uses OpenMP, then RRD can verify the implementation of OpenMP itself.

    [*] Note. Relaxed Memory Order (RMO) is a mode of working with memory in which the processor uses all its means of caching and dynamic reordering of instructions and does not try to satisfy any requirements on the order in which operands are accessed and stored in main memory. Sometimes this mode is called the "relaxed memory model".
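
    A classic illustration of what a relaxed memory model permits (a minimal sketch using C++09-style atomics, independent of RRD): under sequential consistency at least one of the two threads below must observe the other's store, but with relaxed operations (and store buffering on real hardware) both r1 and r2 may end up equal to 0:
    Code:
    #include <atomic>
    #include <cstdio>
    #include <thread>
    std::atomic<int> x(0);
    std::atomic<int> y(0);
    int r1, r2;
    void thread1()
    {
      x.store(1, std::memory_order_relaxed);
      r1 = y.load(std::memory_order_relaxed);
    }
    void thread2()
    {
      y.store(1, std::memory_order_relaxed);
      r2 = x.load(std::memory_order_relaxed);
    }
    int main()
    {
      std::thread t1(thread1), t2(thread2);
      t1.join();
      t2.join();
      // "r1 = 0, r2 = 0" is a legal outcome here, although it would be
      // impossible under a sequentially consistent memory model
      std::printf("r1 = %d, r2 = %d\n", r1, r2);
    }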

    You have mentioned a lot of different tools. Could you tell us briefly about them? Perhaps many readers haven't even heard of most of these tools.

    I should say that I haven't become closely acquainted with most of them (installation, launching the samples, using them in my own projects). I studied them only briefly, because I could tell from their general descriptions that they were not what I wanted, so it was pointless to study them further. That's why I can hardly tell end users anything interesting about them, but still:

    I can tell you about the Spin tool, which comes close to RRD in some respects, and I know that it has been used for verifying some synchronization algorithms for the Linux kernel and for Threading Building Blocks. Spin is perhaps the oldest and most thorough tool of this kind; its roots go back to the early 1980s, several books have been written about it, and I'm very pleased that it is still being developed. Spin includes a lot of checking variants - dynamic checking with and without storing states, full and partial (for very large programs) checking of the program model and so on; it's just impossible to list them all. The compiler for Promela (the language used by Spin) and the verifier (Protocol ANalyzer, "pan" in Spin's terms) have a lot of switches controlling different aspects of operation (test mode, level of output detail, memory limits etc.), and there are also some GUI front-ends. In a word, if you need something special, you are likely to find it in Spin.

    The process of working with Spin is in itself similar to working with RRD: a test is described in the special language Promela (a PRocess MEta LAnguage); you then compile it and at the output you receive a source file in C, which must be compiled by a C compiler to get the verifier. Then you launch the verifier, and when an error is detected it creates a file with a thorough description of the error and the execution history. From this file you can then generate a Postscript file for browsing, or use it to "replay" the execution history. As you can see, the process of working with Spin is a bit more complicated than with RRD - well, such is its status :).

    There is a logical question: why wasn't I satisfied with Spin? Firstly, there is the special language Promela for describing tests; on the one hand this is not such a fundamental issue, but on the other hand I sometimes catch myself being too lazy even to carry out the minimal code instrumentation that RRD requires. And when rewriting a program manually into another language, we still risk testing something entirely different. Secondly, there is the sequentially consistent memory model; here nothing can be said in Spin's defense - support for relaxed memory access (the "relaxed memory model") is simply indispensable in a verifier of synchronization algorithms. Thirdly, there is the absence of built-in support for such specific things as the Win32 API calls WaitForMultipleObjects() or SignalObjectAndWait(), spurious wake-ups on POSIX condition variables, waits with time-outs and so on. The sum of all these factors made me turn away from Spin.

    However, I will emphasize once more that the tool is very worthy. The main site of the project is http://www.viva64.com/go.php?url=196 .

    Could you give some code examples to make the principles of RRD's operation clearer and to show how it differs from other tools?

    Here is a simple example involving mutual exclusion based on a spin-mutex (I will give the first example in C++09 syntax and the second in RRD syntax to show the difference):

    Code:
    std::atomic<int> mutex;
    int data;
    void thread1()
    {
      // simple spin-mutex
      while (mutex.exchange(1, std::memory_order_acquire))
        std::this_thread::yield();
      data = 1;
      mutex.store(0, std::memory_order_release);
    }
    void thread2()
    {
      // simple spin-mutex
      while (mutex.exchange(1, std::memory_order_acquire))
        std::this_thread::yield();
      data = 2;
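      // the relaxed (rather than release) unlock below does not order the
      // write to 'data' before it - this is the race RRD will report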
      mutex.store(0, std::memory_order_relaxed);
    }
    This example contains a so-called data race of type 2. What is characteristic of type-2 data races is that the conflicting accesses to the problem variable are not adjacent in any thread interleaving; nevertheless they conflict with each other because of possible reordering of memory accesses under relaxed memory ordering (here, thread2's relaxed unlock does not order the preceding write to data before the mutex release). RRD will detect this race and show in the resulting history exactly which reorderings took place.
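
    For reference, the race disappears if thread2 releases the mutex with release semantics, exactly as thread1 does (a minimal corrected variant, reusing the mutex and data declarations from the example above):
    Code:
    void thread2_fixed()
    {
      // simple spin-mutex
      while (mutex.exchange(1, std::memory_order_acquire))
        std::this_thread::yield();
      data = 2;
      // the release store orders the write to 'data' before the unlock,
      // so the two writes to 'data' no longer race
      mutex.store(0, std::memory_order_release);
    }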

    Here is a more complex example: a lock-free stack (written in RRD syntax; the main namespace used by RRD is "rl"; also note the required instrumentation of the code in the form of "($)"):

    Code:
    struct node
    {
      rl::atomic<node*> next;
      rl::var<void*> data;
    };
    struct stack
    {
      rl::atomic<node*> head;
    };
    void push(stack* s, void* data)
    {
      node* n = RL_NEW(node);
      n->data($) = data;
      node* next = s->head($).load(rl::memory_order_relaxed);
      for (;;)
      {
        n->next($).store(next, rl::memory_order_relaxed);
        if (s->head($).compare_exchange_weak(
                next, n, rl::memory_order_release))
          break;
      }
    }
    void* pop(stack* s)
    {
      node* n = s->head($).load(rl::memory_order_relaxed);
      for (;;)
      {
        if (0 == n)
          return 0;
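        // reading n->next here is what RRD later reports as an access to
        // freed memory: another thread may already have popped and freed n
        // (no safe memory reclamation scheme is used in this example)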
        node* next = n->next($).load(rl::memory_order_relaxed);
        if (s->head($).compare_exchange_weak(
                n, next, rl::memory_order_acquire))
          break;
      }
      void* data = n->data($);
      RL_DELETE(n);
      return data;
    }
    And here is a unit test for RRD:
    Code:
    // template parameter "2" defines the number of threads in the test
    struct test : rl::test_suite<test, 2>
    {
      stack s;
      
      // executed in a single thread
      // before the threads' main functions run
      void before()
      {
        s.head($) = 0;
      }
      // the main function of the threads
      void thread(unsigned /*thread_index*/)
      {
        push(&s, (void*)1);
        void* data = pop(&s);
        RL_ASSERT(data == (void*)1);
      }
    };
    int main()
    {
      rl::simulate<test>();
    }
    If we launch the program, we'll see the following output (I've removed the per-thread execution histories; the first number in each line is the global sequence number of the operation, used to correlate with the per-thread histories, and the second number is the index of the thread):
    Code:
    struct test
    ACCESS TO FREED MEMORY (access to freed memory)
    iteration: 2
    execution history:
    [0] 1: [BEFORE BEGIN]
    [1] 1: <0023DEA0> atomic store, value=00000000, (prev value=00000000), order=seq_cst, in test::before, main.cpp(70)
    [2] 1: [BEFORE END]
    [3] 1: memory allocation: addr=0023CB78, size=52, in push, main.cpp(34)
    [4] 1: <0023CB9C> store, value=00000001, in push, main.cpp(35)
    [5] 1: <0023DEA0> atomic load, value=00000000, order=relaxed, in push, main.cpp(36)
    [6] 0: memory allocation: addr=0023CE80, size=52, in push, main.cpp(34)
    [7] 0: <0023CEA4> store, value=00000001, in push, main.cpp(35)
    [8] 1: <0023CB78> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [9] 0: <0023DEA0> atomic load, value=00000000, order=relaxed, in push, main.cpp(36)
    [10] 0: <0023CE80> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [11] 1: <0023DEA0> CAS fail [SPURIOUSLY] orig=00000000, cmp=00000000, xchg=0023CB78, order=release, in push, main.cpp(40)
    [12] 0: <0023DEA0> CAS succ orig=00000000, cmp=00000000, xchg=0023CE80, order=release, in push, main.cpp(40)
    [13] 1: <0023CB78> atomic store, value=00000000, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [14] 0: <0023DEA0> atomic load, value=0023CE80, order=relaxed, in pop, main.cpp(47)
    [15] 1: <0023DEA0> CAS fail orig=0023CE80, cmp=00000000, xchg=0023CB78, order=release, in push, main.cpp(40)
    [16] 1: <0023CB78> atomic store, value=0023CE80, (prev value=00000000), order=relaxed, in push, main.cpp(39)
    [17] 0: <0023CE80> atomic load, value=00000000, order=relaxed, in pop, main.cpp(52)
    [18] 1: <0023DEA0> CAS succ orig=0023CE80, cmp=0023CE80, xchg=0023CB78, order=release, in push, main.cpp(40)
    [19] 1: <0023DEA0> atomic load, value=0023CB78, order=relaxed, in pop, main.cpp(47)
    [20] 0: <0023DEA0> CAS fail orig=0023CB78, cmp=0023CE80, xchg=00000000, order=acquire, in pop, main.cpp(53)
    [21] 1: <0023CB78> atomic load, value=0023CE80, order=relaxed, in pop, main.cpp(52)
    [22] 1: <0023DEA0> CAS succ orig=0023CB78, cmp=0023CB78, xchg=0023CE80, order=acquire, in pop, main.cpp(53)
    [23] 1: <0023CB9C> load, value=00000001, in pop, main.cpp(56)
    [24] 1: memory deallocation: addr=0023CB78, in pop, main.cpp(57)
    [25] 0: ACCESS TO FREED MEMORY (access to freed memory), in pop, main.cpp(52)
    From this output we can see that while checking the second thread interleaving, RRD detected an access to freed memory. From the history we can see that thread 1 pops an element off the stack and frees it, and after that thread 0 accesses this element.

    What can you say about the new tool VivaMP? Do you consider it relevant right now, given that OpenMP technology is used by only a small number of developers at present?

    I think you are not being quite sincere when you say that OpenMP is used by a small number of developers. Of course, everything is relative, but I think I am very close to the truth when I say that OpenMP is the most widespread library for parallelism support in production code. Firstly, it is a relatively old and proven technology supported by most commercial and non-commercial organizations, with a lot of independent implementations. Secondly, it is rather simple and solves its task well.

    And of course, being the developer of my own tool for verifying multi-threaded code, I find such tools very relevant and necessary, especially now when everyone has a computer with a multi-core processor on their desk. Based on these two points I can say that VivaMP is an indispensable tool for developers who are just starting out in parallel programming. But VivaMP will be useful for more experienced developers as well, because no one is safe from either "stupid" mistakes (inattention, copy-paste) or "clever" mistakes, and VivaMP will always "cover your back" with its rigor and computational power. I know a lot of examples where multi-threaded code developed by experts and reviewed by many people worked for years, and then serious errors were found in it that had been causing hangs and crashes. Most of those errors were, or could have been, detected by verification tools such as VivaMP.

    As far as the technical side is concerned, VivaMP is a static verification tool. What I like about static verification is that you don't have to write unit tests: the tool checks the target code by itself. The issue is not the need to write some additional code, but that, once again, the human factor comes in: a developer must decide which tests are necessary, how exactly they should work, and so on, and the quality of the check will depend directly on the quality of the unit tests. When using VivaMP there is no such problem; there is only the code being checked and the tool. I think it is quite a powerful approach.

    You showed interest in the open code analysis library VivaCore created by our company OOO "Program Verification Systems". What is the reason for this, and can the library help in improving RRD?

    The idea was to eliminate the need for manual instrumentation of the code, that is, to write my own code preprocessor on the basis of the VivaCore library so that it could insert all those notorious "($)" in the right places and the user could test his "real" code directly. But preliminary investigation showed that this would require a lot of resources, and unfortunately I had to give up the idea.

    How are you planning to improve RRD?

    Well, I always have a lot of plans :). On the RRD site there is a TODO/Feature List in which I state my plans and ideas for the further development of RRD. The most essential and interesting improvements are support for thread-local storage (TSS/TLS) with wrappers for POSIX and Win32, support for UNIX signals and different types of hardware interrupts, optimization of the partial-order reduction algorithm and parallelization of the library's operation, periodic saving at checkpoints, detection of "dead" (untested) code, and modeling of program characteristics related to performance and scalability. At the moment, though, the library's development is, so to say, demand-driven, that is, driven by the needs of its users. That's why I will be glad to receive responses and ideas from readers on this matter.

    What would you like to say to those of our readers who are just beginning to master parallel technologies?

    About parallel technologies I can say the same thing as about any other new technology: experiment more, try to solve simple tasks and watch what you get, and if you don't succeed, put forward hypotheses and check them, create new ones and check them, and so on. Only practice and feedback can make you a professional. And of course, don't be "squeamish" about automatic code verification tools; they are like an expert standing behind you and watching over your shoulder. Of course you can do without them, but they will still save you a lot of time.

    Thank you for the interview and for the interesting and detailed answers.

    Thank you. I wish you and our readers every success in your work.

    Conclusion

    We would like to thank Dmitriy once again for the interesting conversation and the account of tools for verifying parallel applications. In the References section at the end of the article you can find a list of resources devoted to RRD and some other similar tools.

    References


    1. Anthony Williams. Peterson's lock with C++0x atomics. http://www.viva64.com/go.php?url=192
    2. Q&A with a TBB Junkie. http://www.viva64.com/go.php?url=193
    3. Relacy Race Detector. http://www.viva64.com/go.php?url=194
    4. Scalable Synchronization Algorithms. http://www.viva64.com/go.php?url=195
    5. Spin - Formal Verification. http://www.viva64.com/go.php?url=196
    6. Evgeniy Ryzhkov. VivaMP - a tool for OpenMP. http://www.viva64.com/art-3-2-655899033.html
    7. Andrey Karpov. Testing parallel programs. http://www.viva64.com/art-3-2-548461019.html
    8. Open library VivaCore for parsing and analyzing C/C++ code. http://www.viva64.com/vivacore-library/
     
