HIPI - Hadoop Image Processing Framework

Tools and Example Programs

The HIPI distribution includes several programs, located in the tools subdirectory, that are useful in their own right and also serve as examples for creating your own HIPI programs.

tools/hibImport

This is a simple program and a good place to start exploring HIPI. hibImport creates a HipiImageBundle (HIB) from a folder of images on your local file system. A HIB is the key input file to the HIPI framework and represents a collection of images stored on the Hadoop Distributed File System (HDFS).
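Writing the HIB itself depends on HIPI's own classes and is not reproduced here. The following is a hedged sketch of the step that comes before it: gathering the image files from a local folder. The class name ImageFolderScan is hypothetical. Accepting only JPEG and PNG files, matched by extension and without recursing into subfolders, is also an assumption and not hibImport's documented behavior.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class ImageFolderScan {
    // Assumed filter: only JPEG and PNG files are treated as bundle candidates.
    static boolean isImageFile(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".jpg") || name.endsWith(".jpeg") || name.endsWith(".png");
    }

    // Lists candidate images directly inside 'folder' (no recursion), sorted by
    // path so that repeated imports add images in the same order.
    static List<Path> listImages(Path folder) throws IOException {
        List<Path> images = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isImageFile(p)) {
                    images.add(p);
                }
            }
        }
        Collections.sort(images);
        return images;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listImages(folder)) {
            System.out.println(p);
        }
    }
}
```

Sorting the paths keeps the import order stable between runs. That makes bundles built from the same folder easier to compare.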

tools/hibInfo

The hibInfo tool queries basic information about a HIB, such as the image count, the spatial dimensions of individual images, image metadata stored at the time of HIB creation, and image EXIF data. It can also extract individual images as stand-alone JPEG or PNG files.
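For common formats such as JPEG and PNG, an image's dimensions are stored in its header, so they can usually be read without decoding any pixels. The sketch below shows this idea with the standard javax.imageio API. It assumes each image is available as its encoded JPEG or PNG bytes. It does not use HIPI's classes, and the class name ImageInfoSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class ImageInfoSketch {
    // Returns the format name and spatial dimensions of an encoded image,
    // e.g. "png 640x480". Only header information is requested from the reader,
    // so the standard readers do not need to decode the pixel data.
    static String describe(byte[] encoded) throws IOException {
        try (ImageInputStream iis =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            if (iis == null) {
                throw new IOException("could not open image stream");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("unrecognized image format");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true);
                return reader.getFormatName() + " " + reader.getWidth(0) + "x" + reader.getHeight(0);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ImageInfoSketch <image file>");
            return;
        }
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```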

tools/hibDump

Similar to hibInfo, but this is a MapReduce/HIPI program that extracts basic information about the images in a HIB. It does this using multiple parallel map tasks (one mapper for each image in the HIB) and writes the results to a text file on the HDFS in a single reduce task.
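The data flow of hibDump has two stages. One map call runs per image, and then a single reducer writes one text file. That flow can be simulated sequentially in plain Java to make it concrete. This is not Hadoop or HIPI code. ImageMeta is a hypothetical stand-in for the per-image information a mapper receives. Keying the output by an image id is an assumption made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DumpFlowSketch {
    // Hypothetical stand-in for the per-image information a map task sees.
    static final class ImageMeta {
        final String id;
        final int width;
        final int height;

        ImageMeta(String id, int width, int height) {
            this.id = id;
            this.width = width;
            this.height = height;
        }
    }

    // "Map": called once per image, emits one (key, value) pair.
    static Map.Entry<String, String> map(ImageMeta m) {
        return new SimpleEntry<>(m.id, m.width + "x" + m.height);
    }

    // "Shuffle" plus a single "reduce": every pair reaches the one reducer,
    // grouped and sorted by key, and is written out as a line of one text file.
    static List<String> runJob(List<ImageMeta> images) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (ImageMeta m : images) {
            Map.Entry<String, String> kv = map(m);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String value : e.getValue()) {
                lines.add(e.getKey() + "\t" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<ImageMeta> images = Arrays.asList(
            new ImageMeta("img2", 640, 480), new ImageMeta("img1", 800, 600));
        runJob(images).forEach(System.out::println);
    }
}
```

A single reducer receives every key in sorted order, so the output lands in one file with one line per image. With several reducers, Hadoop would instead write one output file per reducer.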

tools/hibDownload

This is a MapReduce/HIPI program that creates a HIB from a set of images located on the Internet. It highlights some of the more subtle parts of HIPI and the Hadoop framework and is a valuable tool for creating input HIBs. It is also designed to work seamlessly with the Yahoo Flickr Creative Commons 100M (YFCC100M) research dataset.
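The description above does not say how hibDownload divides its input among map tasks. One common pattern for this kind of job splits a list of image URLs into contiguous, near-equal chunks, with one chunk per task, so the tasks can download in parallel. The sketch below uses that pattern. It is an assumption and is not taken from HIPI's source. The class name UrlSplitSketch and the numTasks parameter are hypothetical, and the example URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UrlSplitSketch {
    // Splits 'urls' into numTasks contiguous chunks whose sizes differ by at
    // most one, so each (hypothetical) download task gets a similar share.
    static List<List<String>> split(List<String> urls, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        int base = urls.size() / numTasks;
        int extra = urls.size() % numTasks; // the first 'extra' chunks get one more URL
        List<List<String>> chunks = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < numTasks; i++) {
            int size = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(urls.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://example.com/a.jpg", "http://example.com/b.jpg",
            "http://example.com/c.jpg", "http://example.com/d.jpg",
            "http://example.com/e.jpg");
        List<List<String>> chunks = split(urls, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("task " + i + ": " + chunks.get(i));
        }
    }
}
```

Contiguous chunks keep each task's share of the list in input order. When there are more tasks than URLs, the trailing chunks are simply empty.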

tools/hibToJpeg

This is a MapReduce/HIPI program that extracts the images within a HIB as individual JPEG files written to the HDFS. This program illustrates many important features of the HIPI API (e.g., how to process the images in a HIB according to the MapReduce programming model). It is also a useful tool to verify that a HIB has been properly created.
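Writing one JPEG per image raises two practical questions. The first is how to name the files so that parallel map tasks do not collide. The second is how to handle images with an alpha channel, which JPEG cannot store. The sketch below addresses both under stated assumptions. Naming each file after a SHA-256 hash of the image's encoded bytes is a hypothetical scheme and not necessarily what hibToJpeg does. The sketch also writes to a local directory through javax.imageio rather than to the HDFS. The class name JpegExportSketch is hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.imageio.ImageIO;

final class JpegExportSketch {
    // Hypothetical naming scheme: hex SHA-256 of the image's encoded bytes plus
    // ".jpg". Tasks need no coordination, and distinct images get distinct
    // names with overwhelming probability.
    static String outputName(byte[] encoded) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(encoded);
            return String.format("%064x", new BigInteger(1, digest)) + ".jpg";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // JPEG has no alpha channel, so draw the image onto an opaque RGB canvas first.
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Writes one decoded image as a stand-alone JPEG in 'dir' and returns its path.
    static Path writeJpeg(BufferedImage img, byte[] encoded, Path dir) throws IOException {
        Files.createDirectories(dir);
        Path out = dir.resolve(outputName(encoded));
        if (!ImageIO.write(toRgb(img), "jpg", out.toFile())) {
            throw new IOException("no JPEG writer available");
        }
        return out;
    }
}
```

Content-based names have a side effect: byte-identical images in a bundle map to the same file, so duplicates collapse into one output. Whether that is desirable depends on how the extracted files are used.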

tools/covar

This is a MapReduce/HIPI program that implements the experiment described in the paper "The Principal Components of Natural Images" by Hancock et al. (1992). It computes the principal components of natural image patches (the eigenvectors of the covariance matrix computed over a large set of small image patches). This is a good starting point for learning how to build more complex HIPI programs, and it also illustrates HIPI's ability to interface with OpenCV.
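The core computation can be sketched in plain Java. This version extracts every size-by-size patch from a grayscale image and flattens each patch into a vector. It then forms the sample covariance C = (1/N) * sum_k (x_k - mu)(x_k - mu)^T of those vectors and finds the first principal component by power iteration. The sketch does not use HIPI or OpenCV. Power iteration is swapped in for a full eigendecomposition. Any patch windowing or normalization applied in the paper or in covar is omitted, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class PatchPcaSketch {
    // Extracts every size x size patch (stride 1) from a grayscale image and
    // flattens each one row-major into a vector of length size*size.
    static double[][] patches(double[][] img, int size) {
        if (size <= 0 || size > img.length || size > img[0].length) {
            throw new IllegalArgumentException("patch size must fit inside the image");
        }
        int rows = img.length - size + 1;
        int cols = img[0].length - size + 1;
        double[][] out = new double[rows * cols][size * size];
        int n = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                for (int i = 0; i < size; i++) {
                    for (int j = 0; j < size; j++) {
                        out[n][i * size + j] = img[r + i][c + j];
                    }
                }
                n++;
            }
        }
        return out;
    }

    // Sample covariance of the vectors: C = (1/N) * sum_k (x_k - mu)(x_k - mu)^T.
    static double[][] covariance(double[][] xs) {
        int n = xs.length;
        int d = xs[0].length;
        double[] mu = new double[d];
        for (double[] x : xs) {
            for (int i = 0; i < d; i++) mu[i] += x[i] / n;
        }
        double[][] c = new double[d][d];
        for (double[] x : xs) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    c[i][j] += (x[i] - mu[i]) * (x[j] - mu[j]) / n;
                }
            }
        }
        return c;
    }

    // Power iteration: repeatedly apply C and renormalize. For a covariance
    // matrix this converges to the eigenvector of the largest eigenvalue (the
    // first principal component) when that eigenvalue is strictly largest and
    // the start vector is not orthogonal to it.
    static double[] leadingEigenvector(double[][] c, int iterations) {
        int d = c.length;
        double[] v = new double[d];
        for (int i = 0; i < d; i++) v[i] = 1.0 / (i + 1); // arbitrary non-constant start
        normalize(v);
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) w[i] += c[i][j] * v[j];
            }
            if (!normalize(w)) return v; // C maps v to zero; nothing left to iterate on
            v = w;
        }
        return v;
    }

    // Scales v to unit length in place; returns false if v is the zero vector.
    private static boolean normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        if (norm == 0) return false;
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return true;
    }

    public static void main(String[] args) {
        // Tiny synthetic "image": a horizontal ramp plus a small checkerboard.
        double[][] img = new double[8][8];
        for (int r = 0; r < 8; r++) {
            for (int c = 0; c < 8; c++) img[r][c] = c + ((r + c) % 2 == 0 ? 0.25 : -0.25);
        }
        double[] pc1 = leadingEigenvector(covariance(patches(img, 3)), 200);
        System.out.println(Arrays.toString(pc1));
    }
}
```

Only the leading component is computed here. Further components could be found by deflation, which subtracts lambda * v * v^T from C (with lambda = v^T C v) and iterates again, or by using a library eigensolver such as the one in OpenCV.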