===========================
Unmaintained Package Notice
===========================

Unfortunately, and due to a lack of resources, the Blosc Development Team
is unable to maintain this package anymore. During the last 10 years we
managed to find resources (even if in a quite irregular way) to develop
what we think is a nice package for handling compressed data containers,
especially tabular data. Regrettably, in recent years we have not found
enough sponsorship to continue maintaining this package.

For those who depend on bcolz, a fork is welcome, and we will do our best
to advise possible new maintainers. Indeed, if we manage to get some
decent grants via Blosc (https://blosc.org/pages/donate/), our umbrella
project, we would be glad to reconsider the maintenance of bcolz. But
again, we would be very open to, and supportive of, this project getting
a new maintenance team.

Finally, thanks to all the people who used and contributed in one way or
another to bcolz; it has been a nice ride! Let's hope it still has a
bright future ahead.

The Blosc Development Team

bcolz: columnar and compressed data containers
==============================================

.. image:: https://badges.gitter.im/Blosc/bcolz.svg
   :alt: Join the chat at https://gitter.im/Blosc/bcolz
   :target: https://gitter.im/Blosc/bcolz?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

:Version: |version|
:Travis CI: |travis|
:Appveyor: |appveyor|
:Coveralls: |coveralls|
:And...: |powered|

.. |version| image:: https://img.shields.io/pypi/v/bcolz.png
   :target: https://pypi.python.org/pypi/bcolz
.. |travis| image:: https://img.shields.io/travis/Blosc/bcolz.png
   :target: https://travis-ci.org/Blosc/bcolz
.. |appveyor| image:: https://img.shields.io/appveyor/ci/FrancescAlted/bcolz.png
   :target: https://ci.appveyor.com/project/FrancescAlted/bcolz/branch/master
.. |powered| image:: http://b.repl.ca/v1/Powered--By-Blosc-blue.png
   :target: http://blosc.org
.. |coveralls| image:: https://coveralls.io/repos/Blosc/bcolz/badge.png
   :target: https://coveralls.io/r/Blosc/bcolz

.. image:: docs/bcolz.png

bcolz provides columnar, chunked data containers that can be
compressed either in memory or on disk. Column storage allows for
efficiently querying tables, as well as for cheap column addition and
removal. It is based on `NumPy <http://www.numpy.org>`_, and uses it
as the standard data container to communicate with bcolz objects, but
it also comes with support for import/export facilities to/from
`HDF5/PyTables tables <http://www.pytables.org>`_ and `pandas
dataframes <http://pandas.pydata.org>`_.
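
As a quick, hedged illustration of what such containers look like (a
minimal sketch using the classic ``bcolz.carray``/``bcolz.ctable``
constructors; see the online manual for the authoritative signatures):

.. code-block:: python

    import numpy as np
    import bcolz

    # A compressed, chunked, in-memory array built from a NumPy array.
    a = bcolz.carray(np.arange(1000000))

    # A compressed, columnar table built from a couple of NumPy columns.
    t = bcolz.ctable(columns=[np.arange(10), np.linspace(0, 1, 10)],
                     names=['i', 'x'])

    print(repr(a))   # shows shape, dtype, chunking and compression info
    print(repr(t))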

bcolz objects are compressed by default, not only to reduce
memory/disk storage, but also to improve I/O speed. The compression
process is carried out internally by `Blosc <http://blosc.org>`_, a
high-performance, multithreaded meta-compressor that is optimized for
binary data (although it works just fine with text data too).

bcolz can also use `numexpr <https://github.com/pydata/numexpr>`_
internally (it does so by default if numexpr is installed) or `dask
<https://github.com/dask/dask>`_ to accelerate many vector and query
operations (although it can use pure NumPy for doing so too).
numexpr/dask can optimize memory usage and use multithreading for the
computations, so it is blazing fast. This, in combination with
carray/ctable disk-based, compressed containers, can be used for
performing out-of-core computations efficiently, but most importantly
*transparently*.
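
As a rough sketch of that out-of-core workflow (the ``mytable``
directory name is just an example, and the exact ``ctable``/``open``/
``where`` signatures should be double-checked against the online
manual):

.. code-block:: python

    import numpy as np
    import bcolz

    # Persist a compressed, columnar table on disk.
    ct = bcolz.ctable(columns=[np.arange(10000), np.random.rand(10000)],
                      names=['i', 'x'], rootdir='mytable', mode='w')

    # Re-open the on-disk container and run a filtered query; the
    # expression is evaluated chunk by chunk (via numexpr when it is
    # available), so the whole table never has to be decompressed into
    # memory at once.
    ct = bcolz.open('mytable')
    hits = [row.i for row in ct.where('(x > 0.9) & (i < 5000)')]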

Just to whet your appetite, here is an example with real data, where
bcolz is already fulfilling the promise of accelerating memory I/O by
using compression:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb

Rationale
---------

By using compression, you can deal with more data using the same
amount of memory, which is very good in itself. But in case you are
wondering about the price to pay in terms of performance, you should
know that nowadays memory access is the most common bottleneck in many
computational scenarios, and that CPUs spend most of their time waiting
for data. Hence, having data compressed in memory can reduce the
stress on the memory subsystem as well.

Furthermore, columnar means that tabular datasets are stored in
column-wise order, and this turns out to offer better opportunities to
improve the compression ratio. This is because data tends to expose
more similarity in elements that sit in the same column rather than
those in the same row, so compressors generally do a much better job
when data is aligned in such column-wise order. In addition, when you
have to deal with tables with a large number of columns and your
operations only involve some of them, column-wise storage tends to be
much more effective because it minimizes the amount of data that
travels to CPU caches.

So, the ultimate goal of bcolz is not only to reduce the memory needs
of large arrays/tables, but also to make bcolz operations go faster
than those on a traditional data container like those in NumPy or
pandas. That is actually already the case in some real-life scenarios
(see the notebook above), but it will become even more noticeable in
combination with forthcoming, faster CPUs integrating more cores and
wider vector units.

Requisites
----------

- Python >= 2.7 or >= 3.5
- NumPy >= 1.8
- Cython >= 0.22 (just for compiling the beast)
- C-Blosc >= 1.8.0 (optional, as the internal Blosc will be used by default)

Optional:

- numexpr >= 2.5.2
- dask >= 0.9.0
- pandas
- tables (pytables)

Building
--------

There are different ways to compile bcolz, depending on whether you
want to link with an already installed Blosc library or not.

Compiling with an installed Blosc library (recommended)
........................................................

Python and Blosc-powered extensions have a difficult relationship when
compiled using GCC, which is why using an external C-Blosc library is
recommended for maximum performance (for details, see
https://github.com/Blosc/python-blosc/issues/110).

Go to https://github.com/Blosc/c-blosc/releases and download and
install the C-Blosc library. Then, you can tell bcolz where the
C-Blosc library is in a couple of ways:

Using an environment variable:

.. code-block:: console

    $ BLOSC_DIR=/usr/local     (or "set BLOSC_DIR=\blosc" on Win)
    $ export BLOSC_DIR         (not needed on Win)
    $ python setup.py build_ext --inplace

Using a flag:

.. code-block:: console

    $ python setup.py build_ext --inplace --blosc=/usr/local

Compiling without an installed Blosc library
............................................

bcolz also ships with the Blosc sources, so, assuming that you have a
C++ compiler installed, just do:

.. code-block:: console

    $ python setup.py build_ext --inplace

That's all. You can proceed to the testing section now.

Note: the requirement for the C++ compiler is just for the Snappy
dependency. The other components of Blosc are pure C (including the
LZ4 and Zlib libraries).

Testing
-------

After compiling, you can quickly check that the package is sane by
running::

    $ PYTHONPATH=.   (or "set PYTHONPATH=." on Windows)
    $ export PYTHONPATH  (not needed on Windows)
    $ python -c "import bcolz; bcolz.test()"  # add `heavy=True` if desired

Installing
----------

Install it as a typical Python package::

    $ pip install -U .

Optionally, install the additional dependencies::

    $ pip install .[optional]

Documentation
-------------

You can find the online manual at:

http://bcolz.blosc.org

but of course, you can always access docstrings from the console
(i.e. ``help(bcolz.ctable)``).

Also, you may want to look at the bench/ directory for some examples
of use.

Resources
---------

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Home of Blosc compressor:
http://blosc.org

User's mail list:
http://groups.google.com/group/bcolz (bcolz@googlegroups.com)

An `introductory talk (20 min)
<https://www.youtube.com/watch?v=-lKV4zC1gss>`_ about bcolz at
EuroPython 2014. `Slides here
<http://blosc.org/docs/bcolz-EuroPython-2014.pdf>`_.

License
-------

Please see ``BCOLZ.txt`` in the ``LICENSES/`` directory.

Share your experience
---------------------

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

**Enjoy Data!**