{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial on ctable objects\n",
    "[Go to tutorials' index](tutorials.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='go to index'></a>\n",
    "Index:\n",
    "  1. <a href='#Creating a ctable'>Creating a ctable</a>\n",
    "  2. <a href='#Accessing and setting rows'>Accessing and setting rows</a>\n",
    "  3. <a href='#Adding and deleting columns'>Adding and deleting columns</a>\n",
    "  4. <a href='#Iterating over ctable data'>Iterating over ctable data</a>\n",
    "  5. <a href='#Iterating over the output of conditions along columns'>Iterating over the output of conditions along columns</a>\n",
    "  6. <a href='#Performing operations on ctable columns'>Performing operations on ctable columns</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bcolz package comes with a handy object that arranges data by\n",
    "column (and not by row, as in NumPy's structured arrays).  This allows\n",
    "for much better performance for walking tabular data by column and\n",
    "also for adding and deleting columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\n",
      "bcolz version:     1.1.1.dev13+dirty\n",
      "bcolz git info:    1.1.0-15-g6565371\n",
      "NumPy version:     1.11.0\n",
      "Blosc version:     1.9.2 ($Date:: 2016-06-08 #$)\n",
      "Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']\n",
      "Numexpr version:   2.6.1.dev0\n",
      "Dask version:      0.9.0\n",
      "Python version:    2.7.12 |Continuum Analytics, Inc.| (default, Jun 29 2016, 11:08:50) \n",
      "[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n",
      "Platform:          linux2-x86_64\n",
      "Byte-ordering:     little\n",
      "Detected cores:    4\n",
      "-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\n"
     ]
    }
   ],
   "source": [
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np\n",
    "import bcolz\n",
    "\n",
    "bcolz.print_versions()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Clear mydir, needed in case you run this tutorial multiple times\n",
    "!rm -rf mydir\n",
    "!mkdir mydir"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Creating a ctable'></a>\n",
    "## Creating a ctable\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can build ctable objects in many different ways, but perhaps the\n",
    "easiest one is using the `fromiter` constructor:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((1000000,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 11.44 MB; cbytes: 2.30 MB; ratio: 4.97\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "[(0, 0.0) (1, 1.0) (2, 4.0) ..., (999997, 999994000009.0)\n",
       " (999998, 999996000004.0) (999999, 999998000001.0)]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "N = int(1e6)\n",
    "ct = bcolz.fromiter(((i,i*i) for i in xrange(N)), dtype=\"i4,f8\", count=N)\n",
    "ct"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Just like a regular carray, a ctable can also be stored on disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((1000000,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 11.44 MB; cbytes: 2.30 MB; ratio: 4.97\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "  rootdir := 'mydir/ct_disk'\n",
       "[(0, 0.0) (1, 1.0) (2, 4.0) ..., (999997, 999994000009.0)\n",
       " (999998, 999996000004.0) (999999, 999998000001.0)]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct_disk = bcolz.fromiter(((i,i*i) for i in xrange(N)), dtype=\"i4,f8\", count=N, rootdir=\"mydir/ct_disk\")\n",
    "ct_disk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE:** If you wish to create an empty ctable and append data afterwards, this is possible using `bcolz.zeros` with a length of zero (although this is significantly slower).  If you prefer to do that, we encourage you to use the `with` statement, which takes care of flushing the data to disk once you are done appending:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((20000,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 234.38 KB; cbytes: 68.68 KB; ratio: 3.41\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "  rootdir := 'mydir/ct_disk2'\n",
       "[(0, 0.0) (1, 1.0) (2, 4.0) ..., (19997, 399880009.0) (19998, 399920004.0)\n",
       " (19999, 399960001.0)]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with bcolz.zeros(0, dtype=\"i4,f8\", rootdir=\"mydir/ct_disk2\") as ct_disk2:\n",
    "    for i in range(20000):\n",
    "        ct_disk2.append((i, i**2))\n",
    "ct_disk2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Accessing and setting rows'></a>\n",
    "## Accessing and setting rows\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ctable object supports the most common indexing operations in\n",
    "NumPy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1, 1.0)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.void"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(ct[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(1, 1.0), (2, 4.0), (3, 9.0), (4, 16.0), (5, 25.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[1:6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first thing to keep in mind is that, similarly to `carray`\n",
    "objects, the result of an indexing operation is a native NumPy object\n",
    "(in the case above a scalar and a structured array).\n",
    "\n",
    "Fancy indexing is also supported:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(1, 1.0), (6, 36.0), (13, 169.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[[1,6,13]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can even pass complex boolean expressions as an index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(1, 1.0), (2, 4.0), (3, 9.0)], \n",
       "      dtype=(numpy.record, [('f0', '<i4'), ('f1', '<f8')]))"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[\"(f0>0) & (f1<10)\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that conditions over columns are expressed as string expressions\n",
    "(in order to use either Numexpr or NumPy under the hood), and that the column names\n",
    "are understood correctly.\n",
    "\n",
    "Setting rows is also supported:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((1000000,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 11.44 MB; cbytes: 2.30 MB; ratio: 4.97\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "[(0, 0.0) (0, 0.0) (2, 4.0) ..., (999997, 999994000009.0)\n",
       " (999998, 999996000004.0) (999999, 999998000001.0)]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[1] = (0,0)\n",
    "ct"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(0, 0.0), (2, 4.0), (3, 9.0), (4, 16.0), (5, 25.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[1:6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And in combination with fancy indexing too:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(1, 1.0), (1, 1.0), (1, 1.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[[1,6,13]] = (1,1)\n",
    "ct[[1,6,13]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(2, 2.0), (2, 2.0), (2, 2.0), (2, 2.0), (4, 16.0), (5, 25.0),\n",
       "       (2, 2.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct[\"(f0>=0) & (f1<10)\"] = (2,2)\n",
    "ct[:7]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you may have noticed, fancy indexing in combination with conditions\n",
    "is a very powerful feature."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Adding and deleting columns'></a>\n",
    "## Adding and deleting columns\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Adding and deleting columns is easy and, due to the column-wise data\n",
    "arrangement, very efficient.  Let's add a new column on an existing\n",
    "ctable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ct = bcolz.fromiter(((i,i*i) for i in xrange(N)), dtype=\"i4,f8\", count=N)\n",
    "new_col = np.linspace(0, 1, N)\n",
    "ct.addcol(new_col)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, remove the already existing 'f1' column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((1000000,), [('f0', '<i4'), ('f2', '<f8')])\n",
       "  nbytes: 11.44 MB; cbytes: 2.29 MB; ratio: 4.99\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "[(0, 0.0) (1, 1.000001000001e-06) (2, 2.000002000002e-06) ...,\n",
       " (999997, 0.9999979999979999) (999998, 0.9999989999989999) (999999, 1.0)]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.delcol('f1')\n",
    "ct"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As said, adding and deleting columns is very cheap (just adding or deleting keys in a Python dict), so don't be afraid of using this feature as much as you like."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Iterating over ctable data'></a>\n",
    "## Iterating over ctable data\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can make use of the `iter()` method in order to easily iterate\n",
    "over the values of a ctable.  `iter()` has support for start, stop and\n",
    "step parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[row(f0=1, f2=1.000001000001e-06),\n",
       " row(f0=4, f2=4.000004000004e-06),\n",
       " row(f0=7, f2=7.000007000007e-06)]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[row for row in ct.iter(1,10,3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how the data is returned as `namedtuple` objects of type\n",
    "`row`.  This lets you access the fields by name, or simply unpack\n",
    "each row into its fields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(1, 1.000001000001e-06), (4, 4.000004000004e-06), (7, 7.000007000007e-06)]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[(f0,f2) for f0,f2 in ct.iter(1,10,3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also use the `[:]` accessor to get rid of the `row`\n",
    "namedtuple, and return just bare tuples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(1, 1.000001000001e-06), (4, 4.000004000004e-06), (7, 7.000007000007e-06)]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[row[:] for row in ct.iter(1,10,3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Also, you can select specific fields to be read via the `outcols`\n",
    "parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[row(f0=1), row(f0=4), row(f0=7)]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[row for row in ct.iter(1,10,3, outcols='f0')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(1, 1), (4, 4), (7, 7)]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[(nr,f0) for nr,f0 in ct.iter(1,10,3, outcols='nrow__, f0')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please note the use of the special 'nrow__' label for referring to\n",
    "the number of the current row."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Iterating over the output of conditions along columns'></a>\n",
    "## Iterating over the output of conditions along columns\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the most powerful capabilities of the ctable is the ability to\n",
    "iterate over the rows whose fields fulfill certain conditions (without\n",
    "the need to put the results in a NumPy container, as the indexing with\n",
    "conditions shown earlier does).  This can be very useful for performing\n",
    "operations on very large ctables without consuming lots of storage space.\n",
    "\n",
    "Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[row(f0=1, f1=1.0), row(f0=2, f1=4.0), row(f0=3, f1=9.0)]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct = bcolz.fromiter(((i,i*i) for i in xrange(N)), dtype=\"i4,f8\", count=N)\n",
    "[row for row in ct.where(\"(f0>0) & (f1<10)\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And by using the `outcols` parameter, you can specify the fields that\n",
    "you want to be returned:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[row(f1=1.0), row(f1=4.0), row(f1=9.0)]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[row for row in ct.where(\"(f0>0) & (f1<10)\", outcols=\"f1\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can even request the row number of each row fulfilling the condition:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(1.0, 1), (4.0, 2), (9.0, 3)]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[(f1,nr) for f1,nr in ct.where(\"(f0>0) & (f1<10)\", outcols=\"f1, nrow__\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also iterate so that you get blocks of results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[array([(1, 1.0), (2, 4.0), (3, 9.0), ..., (32766, 1073610756.0),\n",
       "        (32767, 1073676289.0), (32768, 1073741824.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(32769, 1073807361.0), (32770, 1073872900.0), (32771, 1073938441.0),\n",
       "        ..., (65534, 4294705156.0), (65535, 4294836225.0),\n",
       "        (65536, 4294967296.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(65537, 4295098369.0), (65538, 4295229444.0), (65539, 4295360521.0),\n",
       "        ..., (70708, 4999621264.0), (70709, 4999762681.0),\n",
       "        (70710, 4999904100.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')])]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[br for br in ct.whereblocks(\"(f0>0) & (f1<5e9)\")]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, three blocks of a maximum length of 32768 have been returned.  You can also specify your own block length via the `blen` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[array([(1, 1.0), (2, 4.0), (3, 9.0), ..., (14998, 224940004.0),\n",
       "        (14999, 224970001.0), (15000, 225000000.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(15001, 225030001.0), (15002, 225060004.0), (15003, 225090009.0),\n",
       "        ..., (29998, 899880004.0), (29999, 899940001.0),\n",
       "        (30000, 900000000.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(30001, 900060001.0), (30002, 900120004.0), (30003, 900180009.0),\n",
       "        ..., (44998, 2024820004.0), (44999, 2024910001.0),\n",
       "        (45000, 2025000000.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(45001, 2025090001.0), (45002, 2025180004.0), (45003, 2025270009.0),\n",
       "        ..., (59998, 3599760004.0), (59999, 3599880001.0),\n",
       "        (60000, 3600000000.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')]),\n",
       " array([(60001, 3600120001.0), (60002, 3600240004.0), (60003, 3600360009.0),\n",
       "        ..., (70708, 4999621264.0), (70709, 4999762681.0),\n",
       "        (70710, 4999904100.0)], \n",
       "       dtype=[('f0', '<i4'), ('f1', '<f8')])]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[br for br in ct.whereblocks(\"(f0>0) & (f1<5e9)\", blen=15000)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='Performing operations on ctable columns'></a>\n",
    "## Performing operations on ctable columns\n",
    "<a href='#go to index'>Go to index</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ctable object also has an `eval()` method, which is \n",
    "handy for carrying out operations among columns.\n",
    "\n",
    "The best way to illustrate the point is with an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(-0.7076921035197548)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.eval(\"cos((3+f0)/sqrt(2*f1))\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, one can see an exception in the behaviour of ctable methods: the\n",
    "resulting output is a compressed bcolz object, and not a NumPy array.  \n",
    "It was designed this way because the output of `eval()` has \n",
    "the same length as the ctable, and thus it can be pretty large, \n",
    "so compression may help to reduce its storage needs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, if you are already dealing with large ctables, and you expect the output to be large too, it is always possible to store the result in a container that lives on disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(-0.7076921035197548)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.eval(\"cos((3+f0)/sqrt(2*f1))\", rootdir=\"mydir/ct_disk3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, if what you want is a NumPy array as output, you can always specify that via the `out_flavor` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(-0.7076921035197548)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.eval(\"cos((3+f0)/sqrt(2*f1))\", out_flavor=\"numpy\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fetching data based on conditions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, there is a powerful way to get data that you are interested in while using conditions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((70710,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 828.63 KB; cbytes: 184.79 KB; ratio: 4.48\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "[(1, 1.0) (2, 4.0) (3, 9.0) ..., (70708, 4999621264.0)\n",
       " (70709, 4999762681.0) (70710, 4999904100.0)]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.fetchwhere(\"(f0>0) & (f1<5e9)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also skip the first rows fulfilling the condition and limit the total number of rows returned:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ctable((2000,), [('f0', '<i4'), ('f1', '<f8')])\n",
       "  nbytes: 23.44 KB; cbytes: 32.00 KB; ratio: 0.73\n",
       "  cparams := cparams(clevel=5, shuffle=1, cname='lz4', quantize=0)\n",
       "[(10001, 100020001.0) (10002, 100040004.0) (10003, 100060009.0) ...,\n",
       " (11998, 143952004.0) (11999, 143976001.0) (12000, 144000000.0)]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.fetchwhere(\"(f0>0) & (f1<5e9)\", skip=10000, limit=2000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or get a NumPy array instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(10001, 100020001.0), (10002, 100040004.0), (10003, 100060009.0),\n",
       "       ..., (11998, 143952004.0), (11999, 143976001.0),\n",
       "       (12000, 144000000.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.fetchwhere(\"(f0>0) & (f1<5e9)\", skip=10000, limit=2000, out_flavor=\"numpy\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although perhaps using default contexts is a more elegant way to do the same thing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(10001, 100020001.0), (10002, 100040004.0), (10003, 100060009.0),\n",
       "       ..., (11998, 143952004.0), (11999, 143976001.0),\n",
       "       (12000, 144000000.0)], \n",
       "      dtype=[('f0', '<i4'), ('f1', '<f8')])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with bcolz.defaults_ctx(out_flavor=\"numpy\"):\n",
    "    out = ct.fetchwhere(\"(f0>0) & (f1<5e9)\", skip=10000, limit=2000)\n",
    "out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## That's all folks!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's all for this tutorial section.  Now you should have a look at the [reference section](http://bcolz.blosc.org/reference.html) to get a grasp of all the functionality that bcolz offers.  In general, ctable objects inherit most of the properties of carrays, so make sure that you master all the weaponry in carrays before getting too deep into ctables."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}