# Brief comparison between standalone zlib vs zlib in Blosc 

This notebook compares the performance of a standalone zlib compression library against the one that runs inside Blosc.

The run below has been executed on a machine with a Xeon E3-1240 v3 @ 3.40GHz with 4 physical cores and hyperthreading support.

In [1]:
import numpy as np
import bcolz
from bcolz.utils import human_readable_size

## Store an array in bcolz

In [2]:
# Use a more or less random dataset
np.random.seed(11)
a = np.random.random_integers(0, 1000, 100*1000*1000)
human_readable_size(a.nbytes)

'762.94 MB'

In [3]:
print("bcolz version:", bcolz.__version__)
bcolz.cparams.setdefaults(cname='zlib', clevel=3, shuffle=1)

('bcolz version:', '1.0.1.dev35')


In [4]:
# Create a dataset (compress)
chunks = (100*1000,) # 800 KB cache size
%time ca = bcolz.carray(a, chunklen=chunks[0])
ca

CPU times: user 8.56 s, sys: 2.32 s, total: 10.9 s
Wall time: 1.88 s


carray((100000000,), int64)
 nbytes := 762.94 MB; cbytes := 129.95 MB; ratio: 5.87
 cparams := cparams(clevel=3, shuffle=1, cname='zlib', quantize=0)
 chunklen := 100000; chunksize: 800000; blocksize: 131072
[921 703 80 ..., 506 453 366]

In [5]:
# Get a numpy array (decompress)
%time a2 = ca[:]

CPU times: user 1.6 s, sys: 344 ms, total: 1.95 s
Wall time: 395 ms


Looking at the wall and cpu times, we see that zlib can use multiple threads thanks to Blosc machinery.

## Store the array using zlib in HDF5 in-memory

In [6]:
import h5py
import tempfile
import operator

# Some utilities to use HDF5 files in memory
def h5fmem(**kwargs):
 """Convenience function to create an in-memory HDF5 file."""

 # need a file name even tho nothing is ever written
 fn = tempfile.mktemp()

 # file creation args
 kwargs['mode'] = 'w'
 kwargs['driver'] = 'core'
 kwargs['backing_store'] = False

 # open HDF5 file
 h5f = h5py.File(fn, **kwargs)

 return h5f


def h5d_diagnostics(d):
 """Print some diagnostics on an HDF5 dataset."""
 
 print(d)
 nbytes = reduce(operator.mul, d.shape) * d.dtype.itemsize
 cbytes = d._id.get_storage_size()
 if cbytes > 0:
 ratio = nbytes / cbytes
 else:
 ratio = np.inf
 r = ' compression: %s' % d.compression
 r += '; compression_opts: %s' % d.compression_opts
 r += '; shuffle: %s' % d.shuffle
 r += '\n nbytes: %s' % human_readable_size(nbytes)
 r += '; nbytes_stored: %s' % human_readable_size(cbytes)
 r += '; ratio: %.1f' % ratio
 r += '; chunks: %s' % str(d.chunks)
 print(r)
 

In [7]:
print("h5py version:", h5py.__version__)
h5f = h5fmem()
h5f

('h5py version:', '2.6.0')




In [8]:
# Create a dataset (compress)
%time ha = h5f.create_dataset('h1', data=a, chunks=chunks, compression='gzip', compression_opts=3, shuffle=True)
h5d_diagnostics(ha)

CPU times: user 6.48 s, sys: 72 ms, total: 6.55 s
Wall time: 6.56 s

 compression: gzip; compression_opts: 3; shuffle: True
 nbytes: 762.94 MB; nbytes_stored: 128.66 MB; ratio: 5.0; chunks: (100000,)


In [9]:
# Get a numpy array (decompress)
%time a2 = ha[:]

CPU times: user 1.09 s, sys: 200 ms, total: 1.29 s
Wall time: 1.29 s


Here we see that the zlib filter (named 'gzip' in HDF5) only use a single thread (similar wall and cpu times). 

## Using zarr

In [10]:
import zarr
print('zarr version', zarr.__version__)

('zarr version', '1.0.1.dev11+dirty')


In [11]:
# Create a dataset (compress)
%time za = zarr.array(a, chunks=chunks, compression="blosc", compression_opts=dict(cname='zlib', clevel=3))
za

CPU times: user 6.96 s, sys: 1.48 s, total: 8.44 s
Wall time: 2.5 s


zarr.core.Array((100000000,), int64, chunks=(100000,), order=C)
 compression: blosc; compression_opts: {u'cname': u'zlib', u'shuffle': 1, u'clevel': 3}
 nbytes: 762.9M; nbytes_stored: 129.2M; ratio: 5.9; initialized: 1000/1000
 store: __builtin__.dict

In [12]:
# Get a numpy array (decompress)
%time a2 = za[:]

CPU times: user 1.43 s, sys: 316 ms, total: 1.74 s
Wall time: 680 ms


We see how zarr can also use zlib in multithreaded mode too (via Blosc).

## Discussion

The zlib library integrated in Blosc can make use of its multithreading machinery, giving to better performance.

On the other hand, it is worth noting that Blosc splits chunks in internal blocks (in this case, 128 KB, which fits in L2 comfortably) in order to make better use of the caches, so making the compression/decompression faster too. You can notice that this does not affect compression ratios a lot.

Finally, bcolz seems to compress/decompress noticeably faster than zarr in this case. This is probably due to zarr only using 4 threads internally, whereas bcolz can use the full 8 (logical processors). That means that Intel hyper-threading implementation can be used for a good advanatge here (1.3x faster in compression and up to 1.7x faster in decompression).