Tag Archives: hbase

Using Jython as a CLI frontend to HBase

HBase, the well known non-relational distributed database, comes with a console program to perform various operations on a HBase cluster. I’ve personally found this tool to be a bit limited and I’ve toyed around the idea of writing my own. Since HBase only comes with a Java driver for direct access and the various RPC interfaces such as Thrift don’t offer the full set of functions over HBase, I decided to go for Jython and to directly use the Java API. This article will show a mock-up of such a tool.

The idea is to provide a simple Python API over the HBase one and couple it with a Python interpreter. This means, it offers the possibility to perform any Python (well Jython) operations whilst operating on HBase itself with an easier API than the Java one.

Note also that the tool uses the WSPBus already described in an earlier article to control the process itself. You will therefore need CherryPy’s latest revision.

# -*- coding: utf-8 -*-
import sys
import os
import code
import readline
import rlcompleter
 
from org.apache.hadoop.hbase import HBaseConfiguration, \
     HTableDescriptor, HColumnDescriptor
from org.apache.hadoop.hbase.client import HBaseAdmin, \
     HTable, Put, Get, Scan
 
import logging
from logging import handlers
 
from cherrypy.process import wspbus
from cherrypy.process import plugins
 
class StaveBus(wspbus.Bus):
    def __init__(self):
        wspbus.Bus.__init__(self)
        self.open_logger()
        self.subscribe("log", self._log)
 
        sig = plugins.SignalHandler(self)
        if sys.platform[:4] == 'java':
            del sig.handlers['SIGUSR1']
            sig.handlers['SIGUSR2'] = self.graceful
            self.log("SIGUSR1 cannot be set on the JVM platform. Using SIGUSR2 instead.")
 
            # See http://bugs.jython.org/issue1313
            sig.handlers['SIGINT'] = self._jython_handle_SIGINT
        sig.subscribe()
 
    def exit(self):
        wspbus.Bus.exit(self)
        self.close_logger()
 
    def open_logger(self, name=""):
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        h = logging.StreamHandler(sys.stdout)
        h.setLevel(logging.INFO)
        h.setFormatter(logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(h)
 
        self.logger = logger
 
    def close_logger(self):
        for handler in self.logger.handlers:
            handler.flush()
            handler.close()
 
    def _log(self, msg="", level=logging.INFO):
        self.logger.log(level, msg)
 
    def _jython_handle_SIGINT(self, signum=None, frame=None):
        # See http://bugs.jython.org/issue1313
        self.log('Keyboard Interrupt: shutting down bus')
        self.exit()
 
class HbaseConsolePlugin(plugins.SimplePlugin):
    def __init__(self, bus):
        plugins.SimplePlugin.__init__(self, bus)
        self.console = HbaseConsole()
 
    def start(self):
        self.console.setup()
        self.console.run()
 
class HbaseConsole(object):
    def __init__(self):
        # we provide this instance to the underlying interpreter
        # as the interface to operate on HBase
        self.namespace = {'c': HbaseCommand()}
 
    def setup(self):
        readline.set_completer(rlcompleter.Completer(self.namespace).complete)
        readline.parse_and_bind("tab:complete")
        import user
 
    def run(self):
        code.interact(local=self.namespace)
 
class HbaseCommand(object):
    def __init__(self, conf=None, admin=None):
        self.conf = conf
        if not conf:
            self.conf = HBaseConfiguration()
        self.admin = admin
        if not admin:
            self.admin = HBaseAdmin(self.conf)
 
    def table(self, name):
        return HTableCommand(name, self.conf, self.admin)
 
    def list_tables(self):
        return self.admin.listTables().tolist()
 
class HTableCommand(object):
    def __init__(self, name, conf, admin):
        self.conf = conf
        self.admin = admin
        self.name = name
        self._table = None
 
    def row(self, name):
        if not self._table:
            self._table = HTable(self.conf, self.name)
        return HRowCommand(self._table, name)
 
    def create(self, families=None):
        desc = HTableDescriptor(self.name)
        if families:
            for family in families:
                desc.addFamily(HColumnDescriptor(family))
        self.admin.createTable(desc)
        self._table = HTable(self.conf, self.name)
        return self._table
 
    def scan(self, start_row=None, end_row=None, filter=None):
        if not self._table:
            self._table = HTable(self.conf, self.name)
 
        sc = None
        if start_row and filter:
            sc = Scan(start_row, filter)
        elif start_row and end_row:
            sc = Scan(start_row, end_row)
        elif start_row:
            sc = Scan(start_row)
        else:
            sc = Scan()
        s = self._table.getScanner(sc)
        while True:
            r = s.next()
            if r is None:
                raise StopIteration()
 
            yield r
 
    def delete(self):
        self.disable()
        self.admin.deleteTable(self.name)
 
    def disable(self):
        self.admin.disableTable(self.name)
 
    def enable(self):
        self.admin.enableTable(self.name)
 
    def exists(self):
        return self.admin.tableExists(self.name)
 
    def list_families(self):
        desc = HTableDescriptor(self.name)
        return desc.getColumnFamilies()
 
class HRowCommand(object):
    def __init__(self, table, rowname):
        self.table = table
        self.rowname = rowname
 
    def put(self, family, column, value):
        p = Put(self.rowname)
        p.add(family, column, value)
        self.table.put(p)
 
    def get(self, family, column):
        r = self.table.get(Get(self.rowname))
        v = r.getValue(family, column)
        if v is not None:
            return v.tostring()
 
 
if __name__ == '__main__':
    bus = StaveBus()
    HbaseConsolePlugin(bus).subscribe()
    bus.start()
    bus.block()

To test the tool, you can simply grab the latest copy of HBase and run:

hbase-0.20.4$ ./bin/start-hbase.sh

Then you need to configure your classpath so that it includes all the HBase dependencies. To determine them:

$ ps auwx|grep java|grep org.apache.hadoop.hbase.master.HMaster|perl -pi -e "s/.*classpath //"

Copy the full list of jars and export CLASSPATH with it. (This is from the HBase wiki on Jython and HBase).

Next you have to add an extra jar to the classpath so that Jython supports readline:

$ export CLASSPATH=$CLASSPATH:$HOME/jython2.5.1/extlibs/libreadline-java-0.8.jar

Make sure you’ll install libreadline-java as well.

Now, that your environment is setup, save the code above under a script named stave.py and run it as follow:

$ jython stave.py
Python 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54) 
[Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_20
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> c.table('myTable').create(families=['aFamily:'])
>>> c.table('myTable').list_families()
array(org.apache.hadoop.hbase.HColumnDescriptor)
>>> c.table('myTable').row('aRow').put('aFamily', 'aColumn', 'hello world!')
>>> c.table('myTable').row('aRow').get('aFamily', 'aColumn')
'hello world!'
>>> list(c.table('myTable').scan())
[keyvalues={aRow/aFamily:aColumn/1277645421824/Put/vlen=12}]

You can import any Python module available to your Jython environment as well of course.

I will probably extend this tool over time but in the meantime I hope you’ll find it a useful canvas to operate HBase.