---------
Tutorials
---------

This section has been moved to the IPython notebook `tutorials`_.

.. _tutorials: https://github.com/Blosc/bcolz/blob/master/docs/tutorials.ipynb

Tutorial on carray objects
==========================

This section has been moved to the IPython notebook `tutorial_carray`_.

.. _tutorial_carray: https://github.com/Blosc/bcolz/blob/master/docs/tutorial_carray.ipynb

Tutorial on ctable objects
==========================

This section has been moved to the IPython notebook `tutorial_ctable`_.

.. _tutorial_ctable: https://github.com/Blosc/bcolz/blob/master/docs/tutorial_ctable.ipynb

Writing bcolz extensions
========================

Did you like bcolz but couldn't find exactly the functionality you were
looking for? You can write an extension and implement complex operations on
top of bcolz containers.

Before you start writing your own extension, let's look at some
examples of real projects built on top of bcolz:

- `Bquery`: a query and aggregation framework. Among other things, it
  provides group-by functionality for bcolz containers. See
  https://github.com/visualfabriq/bquery

- `Bdot`: provides big dot products (by making your RAM bigger on the
  inside). Supports ``matrix . vector`` and ``matrix . matrix`` for
  most common numpy numeric data types. See
  https://github.com/tailwind/bdot

Though not an extension itself, `Dask` is worth mentioning. Dask
plays nicely with bcolz and provides multi-core execution on
larger-than-memory datasets using blocked algorithms and task
scheduling. See https://github.com/dask/dask.

In addition, bcolz interacts well with `itertools`, `Pytoolz` and
`Cytoolz`, which might already offer the performance and
functionality you are after.

In the next section we will go through all the steps needed to write
your own extension on top of bcolz.

How to use bcolz as part of the infrastructure
----------------------------------------------

Go to the root directory of bcolz. Inside ``docs/my_package/`` you will
find a small example extension.

Before you can run this example you will need to install the following
packages: run ``pip install cython``, ``pip install numpy`` and ``pip
install bcolz``. If you prefer the Conda package management system,
run ``conda install cython numpy bcolz`` instead
and you should be ready to go. See ``requirements.txt``:

.. literalinclude:: my_package/requirements.txt
    :language: python
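
As a quick sanity check before building, the sketch below reports which
of the required packages the import system cannot find. It is only an
illustration: the helper name ``missing_packages`` is hypothetical, and
it assumes the import names ``Cython``, ``numpy`` and ``bcolz`` (the
``cython`` package is imported with a capital C).

```python
import importlib.util

# Import names assumed for the three packages listed above.
REQUIRED = ["Cython", "numpy", "bcolz"]


def missing_packages(names=REQUIRED):
    """Return the names in `names` that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the packages can be located; it does not
prove that they import cleanly or that their versions are compatible.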

Once you have those packages installed, change your working directory
to ``docs/my_package/`` (see the `package example
<https://github.com/Blosc/bcolz/tree/master/docs/my_package>`_) and run
``python setup.py build_ext --inplace`` from the terminal. If
everything ran smoothly you should see a binary file
``my_extension/example_ext.so`` next to the ``.pyx`` file.

If you have any problems compiling these extensions, please make sure
you have a recent version of bcolz, as old versions (pre 0.8) don't
contain the necessary ``.pxd`` file, which provides a Cython interface
to the carray Cython module.
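
If compilation fails, the version requirement above is easy to check
from Python. The following is a minimal sketch, not part of the example
package: the helper names are hypothetical, the 0.8 threshold comes from
the paragraph above, and it assumes the installed version is exposed as
the string ``bcolz.__version__``.

```python
import re

# From the text: releases before 0.8 do not ship the .pxd file.
MIN_VERSION = (0, 8)


def version_tuple(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def has_pxd_interface(version):
    """True if `version` is at least MIN_VERSION."""
    return version_tuple(version) >= MIN_VERSION


def installed_bcolz_version():
    """Return bcolz.__version__, or None if bcolz cannot be imported."""
    try:
        import bcolz
    except ImportError:
        return None
    return bcolz.__version__
```

For instance, ``has_pxd_interface(installed_bcolz_version())`` could be
called once bcolz is known to be installed. Only the major and minor
components are compared, so development suffixes are ignored.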

The ``setup.py`` file is where you tell the compiler the
name of your package, the location of external libraries (in case you
want to use them), compiler directives and so on. See the `bcolz setup.py
<https://github.com/Blosc/bcolz/blob/master/setup.py>`_ as a possible
reference for a more complete example. As your project grows in
complexity you might be interested in adding other options to your
``Extension`` object, e.g. ``include_dirs``, a list of
directories to search for C/C++ header files your code might
depend on.

See ``my_package/setup.py``:

.. literalinclude:: my_package/setup.py
    :language: python
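
To make the ``include_dirs`` option mentioned above concrete, here is a
hedged sketch of how an ``Extension`` object might be built with extra
header directories. It is not the shipped ``my_package/setup.py``: the
helper ``make_extension`` and the directory ``vendor/include`` are
hypothetical, and it assumes ``setuptools`` is available.

```python
from setuptools import Extension


def make_extension(extra_include_dirs=()):
    """Build the Extension for the example module (illustrative only)."""
    return Extension(
        "my_extension.example_ext",                # dotted module name
        sources=["my_extension/example_ext.pyx"],  # Cython source file
        include_dirs=list(extra_include_dirs),     # searched for C/C++ headers
    )


# "vendor/include" is a made-up directory used purely for illustration.
ext = make_extension(["vendor/include"])
print(ext.name, ext.include_dirs)
```

In a real ``setup.py`` the resulting object would be passed to
``setup(ext_modules=[...])``. A common entry for ``include_dirs`` is the
path returned by ``numpy.get_include()`` when the Cython code uses the
NumPy C API.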

The ``.pyx`` file is where the Cython code
implementing the extension lives. In the example below, the function
returns the sum of all integers inside the carray.

See ``my_package/my_extension/example_ext.pyx``.

Keep in mind that carrays are great for sequential access, but random
access is highly likely to trigger decompression of a different chunk
for each randomly accessed value.

For more information about Cython visit http://docs.cython.org/index.html

.. literalinclude:: my_package/my_extension/example_ext.pyx
    :language: python
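
The cost of random access mentioned above can be illustrated with a toy
model. The sketch below is not how bcolz is implemented: it assumes a
hypothetical ``ToyChunkedArray`` that compresses fixed-size chunks with
``zlib`` and keeps only the last decompressed chunk in memory, then
counts how many decompressions a full sum needs in sequential versus
shuffled order.

```python
import random
import zlib
from array import array


class ToyChunkedArray:
    """Toy stand-in for a compressed, chunked container (not bcolz itself)."""

    def __init__(self, values, chunklen):
        self.chunklen = chunklen
        # Compress each fixed-size run of 64-bit integers separately.
        self.chunks = [
            zlib.compress(array("q", values[i:i + chunklen]).tobytes())
            for i in range(0, len(values), chunklen)
        ]
        self.decompressions = 0
        self._cached_index = None
        self._cached_chunk = None

    def __getitem__(self, i):
        index, offset = divmod(i, self.chunklen)
        if index != self._cached_index:
            # Cache miss: inflate the whole chunk to read a single value.
            chunk = array("q")
            chunk.frombytes(zlib.decompress(self.chunks[index]))
            self._cached_chunk = chunk
            self._cached_index = index
            self.decompressions += 1
        return self._cached_chunk[offset]


def count_decompressions(order, n=1000, chunklen=100):
    """Sum all values in the given index order; return (total, decompressions)."""
    toy = ToyChunkedArray(list(range(n)), chunklen)
    total = sum(toy[i] for i in order)
    return total, toy.decompressions


sequential = list(range(1000))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

seq_total, seq_count = count_decompressions(sequential)
rnd_total, rnd_count = count_decompressions(shuffled)
print(seq_total, seq_count)
print(rnd_total, rnd_count)
```

Both orders produce the same sum, but the sequential pass inflates each
of the ten chunks exactly once, while the shuffled pass inflates a chunk
again almost every time the index jumps to a different one.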

Let's test our extension:

>>> import bcolz
>>> import my_extension.example_ext as my_mod
>>> c = bcolz.carray([i for i in range(1000)], dtype='i8')
>>> my_mod.my_function(c)
499500
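
The result above can be cross-checked without the extension, since the
integers 0 to n - 1 add up to n(n - 1)/2. The sketch below uses two
hypothetical helpers, ``expected_sum`` and ``reference_sum``; the latter
is a plain-Python stand-in for the compiled function, not its actual
source.

```python
def expected_sum(n):
    """Closed-form sum of the integers 0, 1, ..., n - 1."""
    return n * (n - 1) // 2


def reference_sum(values):
    """Plain-Python stand-in for the extension: add every element in order."""
    total = 0
    for value in values:
        total += value
    return total


print(expected_sum(1000))          # 499500
print(reference_sum(range(1000)))  # 499500
```

Comparing the extension's output against such a reference on a few
small inputs is a cheap way to catch off-by-one errors in the Cython
loop.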