HIPI - Hadoop Image Processing Framework


hibToJpeg is a MapReduce/HIPI program that extracts the images in a HipiImageBundle (HIB) as a set of individual JPEGs stored on the HDFS.


Compile hibToJpeg by executing the following Gradle command in the HIPI tools directory (see our general notes on setting up HIPI on your system):
$> cd tools
$> gradle hibToJpeg:jar


Run hibToJpeg by executing the hibToJpeg.sh script located in the tools directory. Running it without any arguments shows its usage:
$> ./hibToJpeg.sh
Usage: hibToJpeg.jar <input HIB> <output directory>
hibToJpeg takes two arguments. The first argument is the path to a HIB on the HDFS. The second is the path to a directory on the HDFS that will be created once the program has finished. The images in the input HIB will be stored as individual JPEGs in this directory.
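
The two-argument contract described above can be checked before a job is submitted. The following is a minimal sketch, not part of HIPI: the function name run_hib_to_jpeg is hypothetical, and refusing to run when the output directory already exists is a conservative assumption rather than documented hibToJpeg behavior. It relies only on the standard hadoop fs -test command and assumes it is called from the tools directory.

```shell
# Hypothetical wrapper (not part of HIPI): validate both arguments, then run hibToJpeg.sh.
run_hib_to_jpeg() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: run_hib_to_jpeg <input HIB> <output directory>" >&2
    return 2
  fi
  input_hib=$1
  output_dir=$2

  # 'hadoop fs -test -e' exits 0 when the path exists on the HDFS.
  if ! hadoop fs -test -e "$input_hib"; then
    echo "Input HIB not found on the HDFS: $input_hib" >&2
    return 1
  fi

  # Conservative assumption: refuse to reuse an output directory that already exists.
  if hadoop fs -test -d "$output_dir"; then
    echo "Output directory already exists on the HDFS: $output_dir" >&2
    return 1
  fi

  # Assumes the current directory is the HIPI tools directory, where hibToJpeg.sh lives.
  ./hibToJpeg.sh "$input_hib" "$output_dir"
}
```

With this function defined, run_hib_to_jpeg download.hib download_extract would behave like the direct script call, but stop early with a message if either path looks wrong.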


Let's use the download.hib file that was created in the hibDownload example:
$> ./hibToJpeg.sh download.hib download_extract
After the program finishes, we can inspect the newly created download_extract directory on the HDFS:
$> hadoop fs -ls download_extract
Found 14 items
-rw-r--r--   1 hipiuser supergroup    2520970 2015-08-14 17:05 download_extract/01.jpg
-rw-r--r--   1 hipiuser supergroup    3437882 2015-08-14 17:05 download_extract/02.jpg
-rw-r--r--   1 hipiuser supergroup    1814641 2015-08-14 17:05 download_extract/03.jpg
-rw-r--r--   1 hipiuser supergroup    2474565 2015-08-14 17:05 download_extract/04.jpg
-rw-r--r--   1 hipiuser supergroup    2255832 2015-08-14 17:05 download_extract/05.jpg
-rw-r--r--   1 hipiuser supergroup    3705417 2015-08-14 17:05 download_extract/06.jpg
-rw-r--r--   1 hipiuser supergroup     955944 2015-08-14 17:05 download_extract/07.jpg
-rw-r--r--   1 hipiuser supergroup    1073753 2015-08-14 17:05 download_extract/08.jpg
-rw-r--r--   1 hipiuser supergroup     325019 2015-08-14 17:05 download_extract/09.jpg
-rw-r--r--   1 hipiuser supergroup     412266 2015-08-14 17:05 download_extract/10.jpg
-rw-r--r--   1 hipiuser supergroup    1259357 2015-08-14 17:05 download_extract/11.jpg
-rw-r--r--   1 hipiuser supergroup     262038 2015-08-14 17:05 download_extract/12.jpg
-rw-r--r--   1 hipiuser supergroup          0 2015-08-14 17:05 download_extract/_SUCCESS
-rw-r--r--   1 hipiuser supergroup         96 2015-08-14 17:05 download_extract/part-r-00000
Note that the source filenames are preserved. This is possible because the source path is stored as metadata along with each image in the HIB. Next, copy these images to your local file system using hadoop fs -copyToLocal and open them using an image viewer. This is what 07.jpg should look like:
[Image: 07.jpg]
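
The copy step mentioned above can be sketched as a small shell function. This is an illustrative helper, not part of HIPI: the name fetch_jpegs and the local directory are assumptions. It uses the standard hadoop fs -copyToLocal command with a quoted glob so that only the JPEGs are copied and the _SUCCESS and part-r-00000 files are left behind.

```shell
# Hypothetical helper (not part of HIPI): copy extracted JPEGs from the HDFS to a local directory.
fetch_jpegs() {
  hdfs_dir=$1
  local_dir=$2

  # Create the local destination if it does not exist yet.
  mkdir -p "$local_dir" || return 1

  # The glob is quoted so that 'hadoop fs' expands it against the HDFS,
  # not the local shell. Only files ending in .jpg are copied.
  hadoop fs -copyToLocal "$hdfs_dir/*.jpg" "$local_dir"
}
```

For the example in this section, fetch_jpegs download_extract ./download_extract should place the twelve JPEGs in a local download_extract directory, where they can be opened with any image viewer.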


Read about a HIPI program that computes the Principal Components of Natural Image Patches. This reproduces a famous computer vision result that originally used only 15 grayscale images. MapReduce and HIPI make it easy to study these types of statistical properties over much larger image collections.