Data Compression In Java

Discussion in 'Java' started by pradeep, Mar 24, 2007.

    Java's standard input/output (I/O) library includes classes for reading and writing streams of compressed data. You can compress and decompress any text or binary data going to or from any I/O stream, whether it is backed by a file or by something else (e.g., a servlet output stream). In this article, you'll see how to compress data streams in Java using the GZIP and Zip formats.

    Data compressing classes



    The data compression classes are built on top of the generic I/O streams. Note that they are not part of the character stream hierarchy (Reader and Writer); they are based on the byte streams InputStream and OutputStream, because the compression library works with bytes rather than characters. You can still mix the two kinds of stream by wrapping a byte stream in a character stream with InputStreamReader or OutputStreamWriter.
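The wrapping described above can be sketched as follows. This is a minimal illustration rather than code from the article: the class name GzipTextBridge and its two helper methods are made up for the example, UTF-8 is an assumed encoding, and the data is kept in memory instead of a file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original listings.
class GzipTextBridge {
    // Characters -> UTF-8 bytes (OutputStreamWriter) -> GZIP (GZIPOutputStream) -> byte array.
    static byte[] compressText(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(sink), StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer finishes the GZIP stream
        return sink.toByteArray();
    }

    // Byte array -> GZIP decoding (GZIPInputStream) -> UTF-8 characters (InputStreamReader).
    static String decompressText(byte[] compressed) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "Gr\u00fc\u00dfe from java.util.zip";
        byte[] packed = compressText(original);
        System.out.println(decompressText(packed).equals(original));
    }
}
```

Naming the charset explicitly on both sides keeps the round trip independent of the platform default encoding.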

    Here are some of the classes that you may use when working with compressed data streams:

    • DeflaterOutputStream: This is the base class for all data compression classes.
    • CheckedInputStream: For any InputStream, this class maintains a checksum of the data read, which you can retrieve with the method getChecksum().
    • CheckedOutputStream: For any OutputStream, this class maintains a checksum of the data written, which you can retrieve with the method getChecksum().
    • ZipOutputStream: This is a subclass of DeflaterOutputStream; its main purpose is compressing data in Zip format.
    • GZIPOutputStream: This is a subclass of DeflaterOutputStream; it compresses data in GZIP format.
    • InflaterInputStream: This is the base class for decompressing data.
    • ZipInputStream: This subclass of InflaterInputStream can decompress Zip format data.
    • GZIPInputStream: This subclass of InflaterInputStream can decompress GZIP format data.

    There are many data compression algorithms, but the GZIP and Zip formats are among the most frequently used. This is why they are implemented in the standard java.util.zip package.
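The two base classes, DeflaterOutputStream and InflaterInputStream, can also be used directly when you do not need the GZIP or Zip framing. The following is a minimal in-memory sketch; the class name DeflateRoundTrip and its methods are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, not part of the original listings.
class DeflateRoundTrip {
    // Compress a byte array with the default Deflater settings.
    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        }
        return sink.toByteArray();
    }

    // Decompress data produced by deflate().
    static byte[] inflate(byte[] compressed) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "aaaaaaaaaabbbbbbbbbbaaaaaaaaaa".getBytes(StandardCharsets.US_ASCII);
        byte[] restored = inflate(deflate(original));
        System.out.println(Arrays.equals(original, restored));
    }
}
```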

    GZIP encoding



    GZIP, the simpler of the two formats, is ideal when you have only one data stream to compress. In Listing A, I compress and then decompress a file using classes from the java.util.zip package.

    Listing A
    Code:
    import java.io.*;
    import java.util.zip.*;
    
    public class GZIP {
          public void compress() throws IOException {
                // first compress inputfile.txt into out.gz
                BufferedReader in = new BufferedReader(
                      new FileReader("inputfile.txt"));
                BufferedOutputStream out = new BufferedOutputStream(
                      new GZIPOutputStream(new FileOutputStream("out.gz")));
                int c;
                // write(int) stores only the low 8 bits of each character, so this
                // loop is safe for single-byte text only; read through a
                // FileInputStream instead for binary or multi-byte data
                while ((c = in.read()) != -1) out.write(c);
                in.close();
                out.close(); // also finishes the GZIP stream
                
                // now decompress our new file
                BufferedReader in2 = new BufferedReader(new InputStreamReader(
                      new GZIPInputStream(new FileInputStream("out.gz"))));
                String s;
                while ((s = in2.readLine()) != null)
                      System.out.println(s);
                in2.close();
          }
    }
    To use the data compression classes, you wrap them around the I/O stream you need by passing that stream to their constructors. Listing A mixes byte and character streams: the input stream in is based on the Reader class, while the constructor of GZIPOutputStream accepts only streams based on OutputStream, not Writer. For the same reason, when the compressed file is read back, the GZIPInputStream is wrapped in an InputStreamReader to turn it into a character-based Reader.
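Because Listing A reads its input through a Reader, it suits plain text. For arbitrary binary files, a byte-oriented variant may be preferable. The sketch below is one way to write it, assuming Java 7 or later for try-with-resources and java.nio.file; the class name GzipFiles, its methods, and the temporary file names are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original listings.
class GzipFiles {
    // Copy all bytes from in to out using a fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Compress source into target in GZIP format, byte for byte.
    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            copy(in, out);
        }
    }

    // Decompress a GZIP file written by compress().
    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gzip-demo");
        Path plain = dir.resolve("inputfile.txt");
        Path packed = dir.resolve("out.gz");
        Path restored = dir.resolve("restored.txt");
        Files.write(plain, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        compress(plain, packed);
        decompress(packed, restored);
        System.out.println(Arrays.equals(Files.readAllBytes(plain), Files.readAllBytes(restored)));
    }
}
```

The try-with-resources blocks close every stream even when an exception is thrown, which the explicit close() calls in Listing A do not guarantee.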

    Zip compression



    The library for handling the Zip format has much more to offer than the one for GZIP. It's easy to compress any number of files into one archive, and there is even a dedicated class, ZipFile, for reading Zip files. Java uses the standard Zip format, so any compression or archive utility will be able to read your compressed data. Listing B has the same structure as Listing A, but the number of files is not limited.

    Listing B
    Code:
    import java.io.*;
    import java.util.*;
    import java.util.zip.*;
    
    public class Zip {
          public void compress() throws IOException {
                FileOutputStream f = new FileOutputStream("out.zip");
                CheckedOutputStream csum = new CheckedOutputStream(f, new CRC32());
                ZipOutputStream out = new ZipOutputStream(
                      new BufferedOutputStream(csum));
                out.setComment("Here is how we compressed in Java");
                
                // now adding files -- any number with putNextEntry() method
                BufferedReader in = new BufferedReader(new FileReader("1.txt"));
                out.putNextEntry(new ZipEntry("1.txt"));
                int c;
                // write(int) keeps only the low 8 bits, so this suits single-byte text
                while ((c = in.read()) != -1) out.write(c);
                in.close();
                // closing finishes the archive and flushes all data through csum
                out.close();
    
                // printing a checksum calculated with CRC32
                System.out.println("Checksum: " + csum.getChecksum().getValue());
    
                // Now decompress archive
                FileInputStream fi = new FileInputStream("out.zip");
                CheckedInputStream csumi = new CheckedInputStream(fi, new CRC32());
                ZipInputStream in2 = new ZipInputStream(
                      new BufferedInputStream(csumi));
                ZipEntry ze;
                while ((ze = in2.getNextEntry()) != null) {
                      System.out.println("Extracting file " + ze);
                      int x;
                      while ((x = in2.read()) != -1)
                            System.out.write(x);
                      System.out.println();
                }
                System.out.println("Checksum extracted: " + csumi.getChecksum().getValue());
                in2.close();
          }
    }
    Note the use of the CheckedOutputStream and CRC32 classes. They give you a checksum of your data so that you can verify its integrity, that is, check that the data has not been changed or corrupted. CRC32 is a well-known 32-bit algorithm for checking data integrity.
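As a small illustration of this kind of integrity check, the sketch below computes a CRC-32 over a byte array through a CheckedInputStream and shows that flipping a single bit changes the value. The class name ChecksumDemo and the method crc32Of are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper class, not part of the original listings.
class ChecksumDemo {
    // Read the whole array through a CheckedInputStream and return the CRC-32 of the bytes read.
    static long crc32Of(byte[] data) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(
                new ByteArrayInputStream(data), new CRC32())) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // every read updates the running checksum
            }
            return in.getChecksum().getValue();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "payload to protect".getBytes(StandardCharsets.US_ASCII);
        long expected = crc32Of(original);

        byte[] damaged = original.clone();
        damaged[0] ^= 0x01; // flip one bit to simulate corruption

        System.out.println("intact matches:  " + (crc32Of(original) == expected));
        System.out.println("damaged matches: " + (crc32Of(damaged) == expected));
    }
}
```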

    For each file added to the archive, you must call the method putNextEntry() with the corresponding ZipEntry object. ZipEntry holds the additional information stored for each file in a Zip archive, such as the file name, compressed and uncompressed sizes, CRC checksum, comment, and compression method. The java.util.zip classes do not let you set a password for a Zip archive, even though the Zip file format itself allows it.
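To see the information a ZipEntry carries, you can write a small archive and read its entries back with ZipFile from the same package. The sketch below assumes Java 7 or later and uses a temporary file; the class name ZipEntryInfo, its methods, and the entry names are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original listings.
class ZipEntryInfo {
    // Write one archive entry per map item: key = entry name, value = text content.
    static void writeArchive(Path archive, Map<String, String> files) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey()));
                out.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
    }

    // Return the uncompressed size recorded for each entry in the archive.
    static Map<String, Long> entrySizes(Path archive) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("1.txt", "first file");
        files.put("2.txt", "second file, a little longer");
        Path archive = Files.createTempFile("entries", ".zip");
        writeArchive(archive, files);

        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                System.out.println(e.getName() + ": size=" + e.getSize()
                        + ", compressed=" + e.getCompressedSize()
                        + ", crc=" + Long.toHexString(e.getCrc()));
            }
        }
    }
}
```

ZipFile is used for reading here because it takes the sizes and CRC from the archive's central directory, so they are available without first reading each entry's data.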

    Adoption



    You can compress and decompress any data stream, not just a file. Data compression is widely used, for example, on servlet output streams by many servlet applications and application servers (servlet containers), because GZIP is also a standard content encoding for transferring data over HTTP.
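The same classes work with any OutputStream or InputStream. In the sketch below, a ByteArrayOutputStream stands in for a destination such as a servlet output stream; the servlet API itself is deliberately left out, and the class name GzipToAnyStream and its methods are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original listings.
class GzipToAnyStream {
    // Write body to any OutputStream in GZIP format.
    static void writeCompressed(OutputStream target, byte[] body) throws IOException {
        try (GZIPOutputStream gzip = new GZIPOutputStream(target)) {
            gzip.write(body);
        } // closing writes the GZIP trailer and also closes target
    }

    // Read GZIP data from any InputStream and return the decompressed bytes.
    static byte[] readCompressed(InputStream source) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(source)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder page = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            page.append("<li>repetitive markup compresses well</li>\n");
        }
        byte[] body = page.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeCompressed(sink, body);
        System.out.println(body.length + " bytes -> " + sink.size() + " bytes");

        byte[] restored = readCompressed(new ByteArrayInputStream(sink.toByteArray()));
        System.out.println("restored " + restored.length + " bytes");
    }
}
```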
     
