Using PEAR Text_Diff to compare text files

Discussion in 'PHP' started by pradeep, May 6, 2007.

    When you need to compare text files on UNIX, the usual tool is the diff program. This program, included by default in almost all UNIX distributions, compares two files line by line and displays the changes between them in a number of different output formats.

    Though diff is originally a command-line utility, packages replicating its functionality are available for most development environments and languages, including Perl, JSP, and PHP. And so we come to Text_Diff, a PEAR package that makes it possible to compare file contents in the PHP environment and render the output in various formats.

    This tutorial will demonstrate this class in action, illustrating how you can use it to dynamically compare file contents with PHP and render the results as a Web page. I'll assume here that you have a working Apache and PHP installation and that the PEAR Text_Diff class has been correctly installed.

    Note: You can install the PEAR Text_Diff package directly from the Web, either by downloading it manually or by using the PEAR installer (pear install Text_Diff).

    Setting up test files



    Before writing any code, it's necessary to set up the test files we'll be using in this tutorial. These are two simple files, with some deliberate differences that Text_Diff should be able to pick up on. Snippet A is the first file, named data1.txt.

    Snippet A
    Code:
    apple
    banana
    cantaloupe
    drumstick
    enchilada
    fig
    grape
    horseradish
    
    And Snippet B is the second file, named data2.txt.

    Snippet B
    Code:
    apple
    bat
    cantaloupe
    drumstick
    enchilada
    fig
    peach
    pear
    
    
    
    zebra
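
If you would rather create the two test files from a script than by hand, something like the following should work. This is only a sketch: write_test_files() is a hypothetical helper (it is not part of Text_Diff), and it assumes the target directory is writable.

```php
<?php
// Hypothetical helper, not part of Text_Diff: writes the two sample files
// used in this tutorial into the given directory.
function write_test_files($dir)
{
    $data1 = array('apple', 'banana', 'cantaloupe', 'drumstick',
                   'enchilada', 'fig', 'grape', 'horseradish');
    // Note the three empty lines before "zebra", as in Snippet B.
    $data2 = array('apple', 'bat', 'cantaloupe', 'drumstick',
                   'enchilada', 'fig', 'peach', 'pear', '', '', '', 'zebra');

    // One entry per line, each line terminated by a newline.
    file_put_contents($dir . '/data1.txt', implode("\n", $data1) . "\n");
    file_put_contents($dir . '/data2.txt', implode("\n", $data2) . "\n");
}

// Assumption: the files should live next to this script.
write_test_files(__DIR__);
```

Adjust the directory to wherever your comparison scripts will look for data1.txt and data2.txt.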

    Performing basic comparison



    Having set up the files, let's begin with a simple illustration of how Text_Diff works. Start with the script in Snippet C.

    Snippet C
    PHP:
    <?php
    // adjust file paths as per your local configuration!
    include_once "Text/Diff.php";
    include_once "Text/Diff/Renderer.php";

    // define files to compare
    $file1 = "data1.txt";
    $file2 = "data2.txt";

    // perform diff, print output
    $diff = new Text_Diff(file($file1), file($file2));
    $renderer = new Text_Diff_Renderer();
    echo $renderer->render($diff);
    ?>
    This is fairly simple at first glance. There are two basic classes in the Text_Diff package: Text_Diff, which actually performs the comparison and returns the diff output; and Text_Diff_Renderer, which formats that output so it is easy to read. The Text_Diff object, in particular, must be initialized with the actual contents (and not the locations) of the two files to be compared.
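
Because the comparison works on file contents rather than paths, it can help to read the files defensively before handing them over. The sketch below uses only PHP's built-in file() function; read_lines() is a hypothetical helper name, not something Text_Diff provides, and stripping the line endings is my own choice here rather than a requirement of the package.

```php
<?php
// Hypothetical helper, not part of Text_Diff: reads a file into an array
// of lines with the line endings removed, or throws if it cannot be read.
function read_lines($path)
{
    // Check first so that file() does not emit a warning on a bad path.
    $lines = is_readable($path) ? file($path, FILE_IGNORE_NEW_LINES) : false;
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $lines;
}

// Possible usage with the tutorial's files (requires Text_Diff):
// $diff = new Text_Diff(read_lines('data1.txt'), read_lines('data2.txt'));
```

By default, file() keeps the trailing newline on every element; the FILE_IGNORE_NEW_LINES flag drops it. Which behaviour you want depends on how the rendered output will be displayed, so treat the flag as an option to experiment with.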

    The script begins by using PHP's file() function to read each file into an array of lines, and passes the two arrays to the Text_Diff object. The Text_Diff_Renderer object is then used to render the result in standard diff format, producing output which should be familiar to any UNIX developer:
    Code:
    2c2
    <banana
    ---
    >bat
    7,8c7,12
    <grape
    <horseradish
    ---
    >peach
    >pear
    >
    >
    >
    >zebra
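
One thing to watch when sending this kind of output to a browser: the < and > markers are also HTML metacharacters. A minimal sketch of escaping the plain-text diff before display is shown below. diff_to_html() is a hypothetical helper, and the sample string simply reuses the first change from the output above.

```php
<?php
// Hypothetical helper, not part of Text_Diff: escapes plain-text diff
// output and wraps it in <pre> so a browser shows it verbatim.
function diff_to_html($diffText)
{
    return '<pre>' . htmlspecialchars($diffText, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Sample input taken from the first change in the output above.
$html = diff_to_html("2c2\n<banana\n---\n>bat\n");
```

In a real script you would pass the string returned by the renderer's render() method instead of the sample text. This only makes sense for the plain-text formats; output that deliberately contains markup should not be escaped this way.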

    Making differences easier to read



    Now, the output above is not particularly easy to read unless you have lots of experience decoding diff results. That's why Text_Diff comes with a couple of options to reformat this output into something more readable. These options are implemented as subclasses of Text_Diff_Renderer and make it possible to view comparison results in either unified or inline format.

    The following script (Snippet D) modifies the previous example to demonstrate unified format:

    Snippet D
    PHP:
    <html>
    <head></head>
    <body>
        <pre>
        <?php
        // adjust file paths as per your local configuration!
        include_once "Text/Diff.php";
        include_once "Text/Diff/Renderer.php";
        include_once "Text/Diff/Renderer/unified.php";

        // define files to compare
        $file1 = "data1.txt";
        $file2 = "data2.txt";

        // perform diff, print output
        $diff = new Text_Diff(file($file1), file($file2));
        $renderer = new Text_Diff_Renderer_unified();
        echo $renderer->render($diff);
        ?>
        </pre>
    </body>
    </html>
    Notice the call to the appropriate child class when initializing the renderer.

    And here's the output:
    Code:
    @@ -1,8 +1,12 @@
    apple
    -banana
    +bat
    cantaloupe
    drumstick
    enchilada
    fig
    -grape
    -horseradish
    +peach
    +pear
    +
    +
    +
    +zebra
    A quick explanation is in order here: in the unified format, the line beginning with @@ gives the ranges being compared (here, 8 lines starting at line 1 of the first file and 12 lines starting at line 1 of the second). After that, the plus (+) prefix indicates added lines, the minus (-) prefix indicates deleted lines, and no prefix indicates unchanged lines. Comparing the output above with the original files, it's fairly easy to see how the diff output reflects which lines have changed and what the changes are.
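
To make the prefix rules concrete, here is a small sketch that tallies the lines of a unified diff by prefix. summarize_unified() is a hypothetical helper written for this illustration; it assumes the text contains only @@ hunk headers and content lines, as in the output above, and no ---/+++ file header lines.

```php
<?php
// Hypothetical helper, not part of Text_Diff: counts added, deleted,
// and unchanged lines in unified-format diff text.
function summarize_unified($text)
{
    $counts = array('added' => 0, 'deleted' => 0, 'unchanged' => 0);
    foreach (explode("\n", rtrim($text, "\n")) as $line) {
        if (strncmp($line, '@@', 2) === 0) {
            continue; // hunk header, not a content line
        }
        $prefix = substr($line, 0, 1);
        if ($prefix === '+') {
            $counts['added']++;
        } elseif ($prefix === '-') {
            $counts['deleted']++;
        } else {
            $counts['unchanged']++;
        }
    }
    return $counts;
}
```

Fed the unified output shown above, this would report 7 added, 3 deleted, and 5 unchanged lines, which agrees with the hunk header: 3 + 5 = 8 lines in the first file and 7 + 5 = 12 in the second.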

    Of course, it's possible to make it even more user-friendly -- and that's precisely what inline formatting tries to accomplish. In this format, strikethroughs are used to visually indicate which characters and lines have changed. Snippet E shows you how to use it.

    Snippet E
    PHP:
    <html>
    <head></head>
    <body>
        <pre>
        <?php
        // adjust file paths as per your local configuration!
        include_once "Text/Diff.php";
        include_once "Text/Diff/Renderer.php";
        include_once "Text/Diff/Renderer/inline.php";

        // define files to compare
        $file1 = "data1.txt";
        $file2 = "data2.txt";

        // perform diff, print output
        $diff = new Text_Diff(file($file1), file($file2));
        $renderer = new Text_Diff_Renderer_inline();
        echo $renderer->render($diff);
        ?>
        </pre>
    </body>
    </html>
    And here's the output:

    apple
    <strike>banana</strike>bat
    cantaloupe
    drumstick
    enchilada
    fig
    <strike>grape</strike>
    <strike>horseradish</strike>peach
    pear



    zebra

    And that's about it for this tutorial. Hopefully you now have a clear idea of how Text_Diff can be used to rapidly and efficiently compare files in the PHP environment and how the output can be formatted for easy readability. Happy coding!
     
