===============================================================
 Blosc: A blocking, shuffling and lossless compression library
===============================================================

:Author: Francesc Alted
:Contact: francesc@blosc.org
:URL: http://www.blosc.org
:Gitter: |gitter|
:Travis CI: |travis|
:Appveyor: |appveyor|

.. |gitter| image:: https://badges.gitter.im/Blosc/c-blosc.svg
        :alt: Join the chat at https://gitter.im/Blosc/c-blosc
        :target: https://gitter.im/Blosc/c-blosc?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |travis| image:: https://travis-ci.org/Blosc/c-blosc.svg?branch=master
        :target: https://travis-ci.org/Blosc/c-blosc

.. |appveyor| image:: https://ci.appveyor.com/api/projects/status/3mlyjc1ak0lbkmte?svg=true
        :target: https://ci.appveyor.com/project/FrancescAlted/c-blosc/branch/master

What is it?
===========

Blosc [1]_ is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call. Blosc is the first compressor (that I'm aware of)
that is meant not only to reduce the size of large datasets on-disk or
in-memory, but also to accelerate memory-bound computations.

It uses the blocking technique (as described in [2]_) to reduce
activity on the memory bus as much as possible. In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there. It also leverages, if available, SIMD
instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs
to accelerate the compression / decompression process as much as
possible.
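
As a rough illustration of the blocking idea (this is not Blosc's
actual implementation), the sketch below walks a buffer in fixed-size
blocks and hands each block to a callback, which is where a real
library would compress or decompress it. The names
``process_in_blocks``, ``block_fn`` and ``count_bytes`` are
hypothetical and exist only for this example; choosing a block size
that fits in cache is left to the caller.

```c
#include <stddef.h>

/* Hypothetical per-block callback: receives one block and its length. */
typedef void (*block_fn)(const unsigned char *block, size_t len, void *ctx);

/* Walk `src` in chunks of at most `blocksize` bytes and call `fn` on
   each chunk. Returns the number of blocks visited (0 if blocksize
   is 0 or the buffer is empty). */
size_t process_in_blocks(const unsigned char *src, size_t nbytes,
                         size_t blocksize, block_fn fn, void *ctx)
{
    size_t nblocks = 0;
    size_t offset = 0;

    if (blocksize == 0) {
        return 0;
    }
    while (offset < nbytes) {
        size_t len = nbytes - offset;
        if (len > blocksize) {
            len = blocksize;   /* the last block may be shorter */
        }
        fn(src + offset, len, ctx);
        offset += len;
        nblocks++;
    }
    return nblocks;
}

/* Example callback: accumulates the total number of bytes seen. */
void count_bytes(const unsigned char *block, size_t len, void *ctx)
{
    (void)block;
    *(size_t *)ctx += len;
}
```

Because each block is handled independently, the same loop structure
also lends itself to distributing blocks across threads.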

Blosc is actually a metacompressor, meaning that it can use a
range of compression libraries for performing the actual
compression/decompression. Right now, it comes with integrated support
for BloscLZ (the original one), LZ4, LZ4HC, Snappy and Zlib. Blosc
comes with full sources for all compressors, so if it does not
find the libraries installed on your system, it will compile them from
the included sources and integrate them into the Blosc library
anyway. That means you can rely on having all supported
compressors available in Blosc on all supported platforms.

You can see some benchmarks of Blosc performance in [3]_.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for
details.

.. [1] http://www.blosc.org
.. [2] http://blosc.org/docs/StarvingCPUs-CISE-2010.pdf
.. [3] http://blosc.org/synthetic-benchmarks.html

Meta-compression and other advantages over existing compressors
===============================================================

C-Blosc is not like other compressors: it should rather be called a
meta-compressor, because it can use different compressors
and filters (programs that generally improve compression ratio). Still,
it can also be called a compressor, because it already comes with
several compressors and filters and so can work as one out of the box.

Currently C-Blosc comes with support for BloscLZ, a compressor heavily
based on FastLZ (http://fastlz.org/), LZ4 and LZ4HC
(https://github.com/Cyan4973/lz4), Snappy
(https://github.com/google/snappy) and Zlib (http://www.zlib.net/), as
well as highly optimized (they can use SSE2 or AVX2 instructions, if
available) shuffle and bitshuffle filters (for info on how and why
shuffling works, see slide 17 of
http://www.slideshare.net/PyData/blosc-py-data-2014). However,
different compressors or filters may be added in the future.
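
To give an idea of what a byte shuffle does, here is a minimal scalar
sketch in plain C. It is not the optimized code that ships with
C-Blosc; the function names ``shuffle_bytes`` and ``unshuffle_bytes``
are hypothetical, and the sketch assumes the buffer holds exactly
``nelems`` elements of ``typesize`` bytes each. The shuffle gathers
the first byte of every element, then the second byte of every
element, and so on. For typed data whose values are close to each
other, this tends to place similar bytes next to each other, which
usually helps the compressor that runs afterwards.

```c
#include <stddef.h>

/* Rearrange `nelems` elements of `typesize` bytes so that byte 0 of
   every element comes first, then byte 1 of every element, etc.
   `src` and `dest` must not overlap and must each hold
   typesize * nelems bytes. */
void shuffle_bytes(const unsigned char *src, unsigned char *dest,
                   size_t typesize, size_t nelems)
{
    size_t i, j;
    for (j = 0; j < typesize; j++) {
        for (i = 0; i < nelems; i++) {
            dest[j * nelems + i] = src[i * typesize + j];
        }
    }
}

/* Inverse of shuffle_bytes: restore the original element layout. */
void unshuffle_bytes(const unsigned char *src, unsigned char *dest,
                     size_t typesize, size_t nelems)
{
    size_t i, j;
    for (j = 0; j < typesize; j++) {
        for (i = 0; i < nelems; i++) {
            dest[i * typesize + j] = src[j * nelems + i];
        }
    }
}
```

For two 4-byte elements ``11 22 33 44`` and ``55 66 77 88`` (hex), the
shuffled buffer is ``11 55 22 66 33 77 44 88``.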

C-Blosc is in charge of coordinating the different compressors and
filters so that they can leverage the blocking technique (described
above) as well as multi-threaded execution (if several cores are
available) automatically. As a result, every compressor and filter
works at very high speeds, even if it was not initially designed
for blocking or multi-threading.

Other advantages of Blosc are:

* Meant for binary data: can take advantage of the type size
  meta-information for improved compression ratio (using the
  integrated shuffle and bitshuffle filters).

* Small overhead on non-compressible data: only a maximum of (16 + 4 *
  nthreads) additional bytes over the source buffer length are needed
  to compress *any kind of input*.

* Maximum destination length: contrary to many other compressors,
  both compression and decompression routines support a maximum
  length for the destination buffer.

When taken together, all these features set Blosc apart from other
similar solutions.
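
The overhead bound quoted in the list above can be turned into a small
helper for sizing destination buffers. This is only a sketch of the
arithmetic: ``max_compressed_size`` is a hypothetical name, not part
of the Blosc API, and the formula is taken verbatim from the bullet
above rather than from the library sources.

```c
#include <stddef.h>

/* Upper bound on the compressed size of `nbytes` of input when using
   `nthreads` threads, following the (16 + 4 * nthreads) overhead
   figure stated in this README. Hypothetical helper, not Blosc API. */
size_t max_compressed_size(size_t nbytes, size_t nthreads)
{
    return nbytes + 16 + 4 * nthreads;
}
```

For example, a 1000-byte buffer compressed with 4 threads would need a
destination of at most 1000 + 16 + 4 * 4 = 1032 bytes under this
bound.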

Compiling your application with a minimalistic Blosc
====================================================

The minimal Blosc consists of the following files (in the `blosc/ directory
<https://github.com/Blosc/c-blosc/tree/master/blosc>`_)::

    blosc.h and blosc.c        -- the main routines
    shuffle*.h and shuffle*.c  -- the shuffle code
    blosclz.h and blosclz.c    -- the blosclz compressor

Just add these files to your project in order to use Blosc. For
information on compression and decompression routines, see `blosc.h
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_.

To compile using GCC (4.9 or higher recommended) on Unix:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread

Using Windows and MINGW:

.. code-block:: console

   $ gcc -O3 -mavx2 -o myprog myprog.c -Iblosc blosc\*.c

Using Windows and MSVC (2013 or higher recommended):

.. code-block:: console

   $ cl /Ox /Femyprog.exe /Iblosc myprog.c blosc\*.c

In the `examples/ directory
<https://github.com/Blosc/c-blosc/tree/master/examples>`_ you can find
more hints on how to link your app with Blosc.

I have not yet tried to compile this with compilers other than GCC,
clang, MINGW, Intel ICC or MSVC. Please report your experiences with
your own platforms.

Adding support for other compressors with a minimalistic Blosc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The official CMake files (see below) for Blosc try hard to include
support for LZ4, LZ4HC, Snappy and Zlib inside the Blosc library, so
using them is just a matter of calling the appropriate
`blosc_set_compressor() API call
<https://github.com/Blosc/c-blosc/blob/master/blosc/blosc.h>`_. See
an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

Having said this, it is also easy to use a minimalistic Blosc and just
add the symbols HAVE_LZ4 (will include both LZ4 and LZ4HC),
HAVE_SNAPPY and HAVE_ZLIB during compilation, as well as the
appropriate libraries. For example, to compile a minimalistic
Blosc with added Zlib support, do:

.. code-block:: console

   $ gcc -O3 -msse2 -o myprog myprog.c blosc/*.c -Iblosc -lpthread -DHAVE_ZLIB -lz

In the `bench/ directory
<https://github.com/Blosc/c-blosc/tree/master/bench>`_ there are a couple
of Makefiles (one for UNIX and the other for MinGW) with more
complete build examples, like switching between libraries or
internal sources for the compressors.

Supported platforms
~~~~~~~~~~~~~~~~~~~

Blosc is meant to support all platforms where a C89-compliant C
compiler can be found. The most thoroughly tested ones are Intel
(Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as
the IBM Blue Gene Q embedded "A2" processor are reported to work too.

Compiling the Blosc library with CMake
======================================

Blosc can also be built, tested and installed using CMake_. Although
this procedure might seem a bit more involved than the one described
above, it is the most general because it allows integrating
compressors other than BloscLZ, either from libraries or from internal
sources. Hence, serious library developers are encouraged to build
Blosc this way.

The following procedure describes the "out of source" build.

Create the build directory and move into it:

.. code-block:: console

   $ mkdir build
   $ cd build

Now run the CMake configuration and optionally specify the installation
directory (e.g. '/usr' or '/usr/local'):

.. code-block:: console

   $ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..

CMake allows configuring Blosc in many different ways, like preferring
internal or external sources for compressors or enabling/disabling
them. Please note that configuration can also be performed using the UI
tools provided by CMake_ (ccmake or cmake-gui):

.. code-block:: console

   $ ccmake ..      # run a curses-based interface
   $ cmake-gui ..   # run a graphical interface

Build, test and install Blosc:

.. code-block:: console

   $ cmake --build .
   $ ctest
   $ cmake --build . --target install

The static and dynamic versions of the Blosc library, together with
the header files, will be installed into the specified
CMAKE_INSTALL_PREFIX.

.. _CMake: http://www.cmake.org

Once you have compiled your Blosc library, you can easily link your
apps with it as shown in the `examples/ directory
<https://github.com/Blosc/c-blosc/blob/master/examples>`_.

Adding support for other compressors (LZ4, LZ4HC, Snappy, Zlib) with CMake
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The CMake files in Blosc are configured to automatically detect other
compressors like LZ4, LZ4HC, Snappy or Zlib by default. So as long as
the libraries and the header files for these libraries are accessible,
they will be used by default. See an `example here
<https://github.com/Blosc/c-blosc/blob/master/examples/many_compressors.c>`_.

*Note on Zlib*: the library should be easily found on UNIX systems.
On Windows, you can help CMake find it by setting the
environment variable 'ZLIB_ROOT' to where the zlib 'include' and 'lib'
directories are. Also, make sure that the Zlib DLL library is in your
'\\Windows' directory.

However, the full sources for LZ4, LZ4HC, Snappy and Zlib have been
included in Blosc too. So, in general, you should not worry about not
having (or CMake not finding) the libraries on your system, because in
that case their sources will be automatically compiled for you. That
effectively means that you can be confident of having complete
support for all the supported compression libraries on all supported
platforms.

If you want to force Blosc to use external libraries instead of
the included compression sources:

.. code-block:: console

   $ cmake -DPREFER_EXTERNAL_LZ4=ON ..

You can also disable support for some compression libraries:

.. code-block:: console

   $ cmake -DDEACTIVATE_SNAPPY=ON ..

Mac OSX troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~

If you run into compilation troubles when using Mac OSX, please make
sure that you have installed the command line developer tools. You
can always install them with:

.. code-block:: console

   $ xcode-select --install

Wrapper for Python
==================

Blosc has an official wrapper for Python. See:

https://github.com/Blosc/python-blosc

Command line interface and serialization format for Blosc
=========================================================

Blosc can be used from the command line by using Bloscpack. See:

https://github.com/Blosc/bloscpack

Filter for HDF5
===============

For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the blosc/hdf5 project in:

https://github.com/Blosc/hdf5

Mailing list
============

There is an official mailing list for Blosc at:

blosc@googlegroups.com
http://groups.google.es/group/blosc

Acknowledgments
===============

See THANKS.rst.

----

**Enjoy data!**