HIPI - Hadoop Image Processing Framework


hibImport is a tool for creating a HipiImageBundle (HIB) from a folder of images on your local file system. Note that this tool does not use the MapReduce framework, but does write to the Hadoop Distributed File System (HDFS).


Compile hibImport by executing the following command in the HIPI tools directory (see our general notes on setting up HIPI on your system):
$> cd tools
$> gradle hibImport:jar
If successful, this command will create hibImport.jar at the following location:
$> ls hibImport/build/libs


Run hibImport by executing the hibImport.sh script located in the tools directory. As with all of the tools scripts, running it without any arguments shows its usage:
$> ./hibImport.sh
usage: hibImport.jar [options] <image directory> <output HIB>
 -f,--force   force overwrite if output HIB already exists
hibImport takes two arguments. The first is the path to a directory of images on the local file system. The second is the HDFS path of the output HIB, which is created once the program has finished. You may optionally pass -f or --force, which causes the destination HIB to be overwritten if it already exists.
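When you re-import the same directory repeatedly, a small wrapper can save some typing. The following is only a sketch, not part of HIPI: the function name reimport_hib and the HIB_IMPORT variable are hypothetical, and it assumes hibImport.sh sits in the current directory unless HIB_IMPORT says otherwise. It places --force before the two positional arguments, following the usage line above.

```shell
# Path to the hibImport.sh script; override HIB_IMPORT if you run from another directory.
# (HIB_IMPORT is a name made up for this sketch, not a HIPI setting.)
HIB_IMPORT="${HIB_IMPORT:-./hibImport.sh}"

# reimport_hib <image directory> <output HIB>
# Hypothetical helper: rejects a missing local directory, then runs the
# import with --force so that an existing HIB is overwritten.
reimport_hib() {
  if [ "$#" -ne 2 ]; then
    echo "usage: reimport_hib <image directory> <output HIB>" >&2
    return 2
  fi
  if [ ! -d "$1" ]; then
    echo "not a directory: $1" >&2
    return 1
  fi
  "$HIB_IMPORT" --force "$1" "$2"
}
```

For example, reimport_hib ~/Desktop/Tigers tigers.hib would rebuild tigers.hib even if it already exists on HDFS.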


For this example, suppose the directory ~/Desktop/Tigers on the local file system contains four images (three JPEGs and one PNG). The following command would generate a HIB named tigers.hib on HDFS consisting of this set of images:
$> ./hibImport.sh ~/Desktop/Tigers tigers.hib
Input image directory: /Users/hipiuser/Desktop/Tigers
Output HIB: tigers.hib
Overwrite HIB if it exists: false
 ** added: 1.jpg
 ** added: 2.jpg
 ** added: 3.jpg
 ** added: 4.png
Created: tigers.hib and tigers.hib.dat
You can verify that HDFS was updated correctly by listing the contents of your HDFS working directory (by default, your HDFS home directory, where relative paths such as tigers.hib resolve):
$> hadoop fs -ls
Found 2 items
-rw-r--r--   1 user group         80 2015-03-11 16:55 tigers.hib
-rw-r--r--   1 user group   16493828 2015-03-11 16:55 tigers.hib.dat
Note that hibImport has actually created two files: tigers.hib and tigers.hib.dat. Every HIB has this two-file structure. The tigers.hib file is an index into the data file and stores byte offsets to the beginning of each image segment. The tigers.hib.dat file contains the image data itself. HIPI expects both of these files to be present and named consistently.
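Because a HIB is only usable when both files are present, it can be handy to check for the pair in one step. The sketch below is an illustration rather than a HIPI tool: the function name hib_complete is hypothetical, and it relies only on the standard hadoop fs -test -e command, which exits with status 0 when the given path exists. It checks that both files exist; it does not inspect their contents.

```shell
# hib_complete <HIB path on HDFS>
# Hypothetical helper: succeeds only if both the index file (<name>)
# and the data file (<name>.dat) exist on HDFS.
hib_complete() {
  hib="$1"
  for f in "$hib" "$hib.dat"; do
    # "hadoop fs -test -e" returns 0 if the path exists, non-zero otherwise.
    if ! hadoop fs -test -e "$f"; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  echo "ok: $hib and $hib.dat"
}
```

Running hib_complete tigers.hib after the import above should report both files; if either one has been moved or renamed on its own, the function names the missing file and returns a non-zero status.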


Read about tools/hibInfo, which lets you query basic information about a HIB and verify its integrity.