HIPI - Hadoop Image Processing Framework

tools/hibImport

hibImport is a tool for creating a HipiImageBundle (HIB) from a folder of images on your local file system. Note that this tool does not use the MapReduce framework, but does write to the Hadoop Distributed File System (HDFS).

Compiling

Compile hibImport by executing the following command in the HIPI tools directory (see our general notes on setting up HIPI on your system):
$> cd tools
$> gradle hibImport:jar
        
If successful, this command will create hibImport.jar at the following location:
$> ls hibImport/build/libs
hibImport.jar
        

Usage

Run hibImport by executing the hibImport.sh script located in the tools directory. As with all of the tools scripts, running it without any arguments shows its usage:
$> ./hibImport.sh
usage: hibImport.jar [options] <image directory> <output HIB>
 -f,--force   force overwrite if output HIB already exists
        
hibImport takes two arguments. The first is the path to a directory of images on the local file system. The second is the HDFS path of the output HIB, which is created once the program has finished. You may optionally pass the -f (or --force) option before these arguments, which causes the destination HIB to be overwritten if it already exists.
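As a rough illustration of the argument order described above, the following sketch wraps the script in a small shell function that checks the local image directory before invoking it. The names import_hib and HIBIMPORT are hypothetical and are not part of HIPI; the only thing taken from the usage message is that options precede the two positional arguments.

```shell
#!/bin/sh
# Hypothetical wrapper; import_hib and HIBIMPORT are illustrative names,
# not part of HIPI. HIBIMPORT defaults to the script in the tools directory.
HIBIMPORT="${HIBIMPORT:-./hibImport.sh}"

# import_hib [-f|--force] <image directory> <output HIB>
import_hib() {
    force=""
    if [ "$1" = "-f" ] || [ "$1" = "--force" ]; then
        force="-f"
        shift
    fi
    if [ "$#" -ne 2 ]; then
        echo "usage: import_hib [-f] <image directory> <output HIB>" >&2
        return 2
    fi
    # The first argument must be a directory on the local file system.
    if [ ! -d "$1" ]; then
        echo "error: '$1' is not a local directory" >&2
        return 1
    fi
    # Options come before the positional arguments, as in the usage line.
    if [ -n "$force" ]; then
        "$HIBIMPORT" -f "$1" "$2"
    else
        "$HIBIMPORT" "$1" "$2"
    fi
}
```

With this function sourced from the tools directory, import_hib -f ~/Desktop/Tigers tigers.hib would be expected to overwrite an existing tigers.hib, while omitting -f leaves the overwrite decision to hibImport's default.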

Example

For this example, suppose the directory ~/Desktop/Tigers on the local file system contains four images (three JPEGs and one PNG). The following command generates a HIB named tigers.hib on the HDFS containing this set of images:
$> ./hibImport.sh ~/Desktop/Tigers tigers.hib
Input image directory: /Users/hipiuser/Desktop/Tigers
Output HIB: tigers.hib
Overwrite HIB if it exists: false
 ** added: 1.jpg
 ** added: 2.jpg
 ** added: 3.jpg
 ** added: 4.png
Created: tigers.hib and tigers.hib.dat
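
One way to sanity-check a run like the one above is to compare the number of "added" lines in the output with the number of files in the input directory. The sketch below assumes that the output is written to standard output in the format shown in this example, and that the directory contains only image files; count_added and count_files are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the " ** added: <file>" output format
# shown in the example above.

# count_added: read hibImport output on stdin, print the number of added images.
count_added() {
    grep -c '^ *\*\* added: '
}

# count_files <dir>: print the number of regular files directly inside <dir>.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example use (not executed here):
#   ./hibImport.sh ~/Desktop/Tigers tigers.hib > import.log
#   [ "$(count_added < import.log)" -eq "$(count_files ~/Desktop/Tigers)" ] \
#       && echo "every file in the directory was added"
```

For the Tigers example, both counts would be expected to equal four.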
        
You can verify that the HDFS was updated correctly by listing the contents of your HDFS working directory (by default your HDFS home directory, which is where the relative path tigers.hib resolves):
$> hadoop fs -ls
Found 2 items
-rw-r--r--   1 user group         80 2015-03-11 16:55 tigers.hib
-rw-r--r--   1 user group   16493828 2015-03-11 16:55 tigers.hib.dat
	      
Note that hibImport has actually created two files: tigers.hib and tigers.hib.dat. All HIBs have this structure. The tigers.hib file is an index into the data file and stores the byte offset of the beginning of each image segment. The tigers.hib.dat file contains the image data itself. HIPI expects both files to be present and named consistently.
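
Because both files must be present, a script that moves or copies HIBs may want to confirm that the pair is intact. The following minimal sketch uses the standard hadoop fs -test -e command to check for the index file and its .dat companion; hib_pair_exists is a hypothetical helper name, and the check covers only file presence, not the contents of either file.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both the index file and the
# matching data file (<name>.dat) exist on HDFS.
# hib_pair_exists <HDFS path to the .hib index file>
hib_pair_exists() {
    hadoop fs -test -e "$1" && hadoop fs -test -e "$1.dat"
}

# Example use (not executed here):
#   hib_pair_exists tigers.hib && echo "tigers.hib and tigers.hib.dat found"
```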

Next

Read about tools/hibInfo, which lets you query basic information about a HIB and verify its integrity.