Archive for the ‘headstock’ Category

Jun 27

Using Jython as a CLI frontend to HBase

Posted by Sylvain Hellegouarch in hbase, headstock, jython, python

HBase, the well known non-relational distributed database, comes with a console program to perform various operations on a HBase cluster. I’ve personally found this tool to be a bit limited and I’ve toyed around the idea of writing my own. Since HBase only comes with a Java driver for direct access and the various RPC interfaces such as Thrift don’t offer the full set of functions over HBase, I decided to go for Jython and to directly use the Java API. This article will show a mock-up of such a tool.

The idea is to provide a simple Python API over the HBase one and couple it with a Python interpreter. This means, it offers the possibility to perform any Python (well Jython) operations whilst operating on HBase itself with an easier API than the Java one.

Note also that the tool uses the WSPBus already described in an earlier article to control the process itself. You will therefore need CherryPy’s latest revision.

# -*- coding: utf-8 -*-
import sys
import os
import code
import readline
import rlcompleter
 
from org.apache.hadoop.hbase import HBaseConfiguration, \
     HTableDescriptor, HColumnDescriptor
from org.apache.hadoop.hbase.client import HBaseAdmin, \
     HTable, Put, Get, Scan
 
import logging
from logging import handlers
 
from cherrypy.process import wspbus
from cherrypy.process import plugins
 
class StaveBus(wspbus.Bus):
    def __init__(self):
        wspbus.Bus.__init__(self)
        self.open_logger()
        self.subscribe("log", self._log)
 
        sig = plugins.SignalHandler(self)
        if sys.platform[:4] == 'java':
            del sig.handlers['SIGUSR1']
            sig.handlers['SIGUSR2'] = self.graceful
            self.log("SIGUSR1 cannot be set on the JVM platform. Using SIGUSR2 instead.")
 
            # See http://bugs.jython.org/issue1313
            sig.handlers['SIGINT'] = self._jython_handle_SIGINT
        sig.subscribe()
 
    def exit(self):
        wspbus.Bus.exit(self)
        self.close_logger()
 
    def open_logger(self, name=""):
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        h = logging.StreamHandler(sys.stdout)
        h.setLevel(logging.INFO)
        h.setFormatter(logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(h)
 
        self.logger = logger
 
    def close_logger(self):
        for handler in self.logger.handlers:
            handler.flush()
            handler.close()
 
    def _log(self, msg="", level=logging.INFO):
        self.logger.log(level, msg)
 
    def _jython_handle_SIGINT(self, signum=None, frame=None):
        # See http://bugs.jython.org/issue1313
        self.log('Keyboard Interrupt: shutting down bus')
        self.exit()
 
class HbaseConsolePlugin(plugins.SimplePlugin):
    def __init__(self, bus):
        plugins.SimplePlugin.__init__(self, bus)
        self.console = HbaseConsole()
 
    def start(self):
        self.console.setup()
        self.console.run()
 
class HbaseConsole(object):
    def __init__(self):
        # we provide this instance to the underlying interpreter
        # as the interface to operate on HBase
        self.namespace = {'c': HbaseCommand()}
 
    def setup(self):
        readline.set_completer(rlcompleter.Completer(self.namespace).complete)
        readline.parse_and_bind("tab:complete")
        import user
 
    def run(self):
        code.interact(local=self.namespace)
 
class HbaseCommand(object):
    def __init__(self, conf=None, admin=None):
        self.conf = conf
        if not conf:
            self.conf = HBaseConfiguration()
        self.admin = admin
        if not admin:
            self.admin = HBaseAdmin(self.conf)
 
    def table(self, name):
        return HTableCommand(name, self.conf, self.admin)
 
    def list_tables(self):
        return self.admin.listTables().tolist()
 
class HTableCommand(object):
    def __init__(self, name, conf, admin):
        self.conf = conf
        self.admin = admin
        self.name = name
        self._table = None
 
    def row(self, name):
        if not self._table:
            self._table = HTable(self.conf, self.name)
        return HRowCommand(self._table, name)
 
    def create(self, families=None):
        desc = HTableDescriptor(self.name)
        if families:
            for family in families:
                desc.addFamily(HColumnDescriptor(family))
        self.admin.createTable(desc)
        self._table = HTable(self.conf, self.name)
        return self._table
 
    def scan(self, start_row=None, end_row=None, filter=None):
        if not self._table:
            self._table = HTable(self.conf, self.name)
 
        sc = None
        if start_row and filter:
            sc = Scan(start_row, filter)
        elif start_row and end_row:
            sc = Scan(start_row, end_row)
        elif start_row:
            sc = Scan(start_row)
        else:
            sc = Scan()
        s = self._table.getScanner(sc)
        while True:
            r = s.next()
            if r is None:
                raise StopIteration()
 
            yield r
 
    def delete(self):
        self.disable()
        self.admin.deleteTable(self.name)
 
    def disable(self):
        self.admin.disableTable(self.name)
 
    def enable(self):
        self.admin.enableTable(self.name)
 
    def exists(self):
        return self.admin.tableExists(self.name)
 
    def list_families(self):
        desc = HTableDescriptor(self.name)
        return desc.getColumnFamilies()
 
class HRowCommand(object):
    def __init__(self, table, rowname):
        self.table = table
        self.rowname = rowname
 
    def put(self, family, column, value):
        p = Put(self.rowname)
        p.add(family, column, value)
        self.table.put(p)
 
    def get(self, family, column):
        r = self.table.get(Get(self.rowname))
        v = r.getValue(family, column)
        if v is not None:
            return v.tostring()
 
 
if __name__ == '__main__':
    bus = StaveBus()
    HbaseConsolePlugin(bus).subscribe()
    bus.start()
    bus.block()

To test the tool, you can simply grab the latest copy of HBase and run:

hbase-0.20.4$ ./bin/start-hbase.sh

Then you need to configure your classpath so that it includes all the HBase dependencies. To determine them:

$ ps auwx|grep java|grep org.apache.hadoop.hbase.master.HMaster|perl -pi -e "s/.*classpath //"

Copy the full list of jars and export CLASSPATH with it. (This is from the HBase wiki on Jython and HBase).

Next you have to add an extra jar to the classpath so that Jython supports readline:

$ export CLASSPATH=$CLASSPATH:$HOME/jython2.5.1/extlibs/libreadline-java-0.8.jar

Make sure you’ll install libreadline-java as well.

Now, that your environment is setup, save the code above under a script named stave.py and run it as follow:

$ jython stave.py
Python 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54) 
[Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_20
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> c.table('myTable').create(families=['aFamily:'])
>>> c.table('myTable').list_families()
array(org.apache.hadoop.hbase.HColumnDescriptor)
>>> c.table('myTable').row('aRow').put('aFamily', 'aColumn', 'hello world!')
>>> c.table('myTable').row('aRow').get('aFamily', 'aColumn')
'hello world!'
>>> list(c.table('myTable').scan())
[keyvalues={aRow/aFamily:aColumn/1277645421824/Put/vlen=12}]

You can import any Python module available to your Jython environment as well of course.

I will probably extend this tool over time but in the meantime I hope you’ll find it a useful canvas to operate HBase.

I am glad to announce that IronPython 2 is now capable of running my XMPP
Python library: headstock.

.NET has already an excellent XMPP SDK called agsXMPP that is a native
.NET/C# framework. However I’m a Python developers at heart and I had
started quite a while ago writing my own XMPP library in Python using the
most excellent Kamaelia framework (designed for concurrency).

For a while IronPython had severe shortcomings that prevented it running
simple Kamaelia applications. Today I was able to run a simplechat demo
using a vanilla IP2 on Windows with only one single modification to the
logging module (thanks Seo). To be honest I didn’t expect it to go through
:)

The chat demo is simple enough but means more complex examples using XMPP
PubSub will work as well (they are all based on the same framework).

Now this isn’t production ready or anything. For instance the TLS support
is broken (hopefully something easy enough to fix) so you won’t be able to
connect to Google Talk for now.

Moreover I’m not sure the code is that fast considering how I had to
simulate an incremental XML parser atop System.Xml (this allows for a
XML stream to be parsed without requiring the full document or even
fragment to be read first).

This is a great news for me because it means I’ll be able to move ahead
with more work using IronPython 2.

Dec 01

Geolocalization and microblogging

Posted by Sylvain Hellegouarch in headstock, python, xmpp

I’ve been working recently with Adrian Hornsby who’s been interested in using the microblogging example I had setup to demonstrate headstock, amplee and some ideas about microblogging in general. Today Adrian asked me how to add some gelocalization information to a message flowing through the system. It took me just a couple of hours to implement it so that now you can push a message like: GEO text [lat,long] through your IM client. This will tell the demo to add a georss:point element to the generate atom entry which will eventually lead to a Google map to be displayed in the web page mapping the atom entry.

May 12

headstock: XMPP library based on Kamaelia

Posted by Sylvain Hellegouarch in headstock

I’m pleased to announce the release of the first version of headstock, my XMPP implementation using the Kamaelia library. My main motivation behind working on that project rather than using one of the existing libraries was that I wanted to find a challenging project to work with Kamaelia. XMPP seemed challenging enough.
This first release is flagged as beta. Not production ready at all but offers enough so that you can play with it and actually use it for integrating XMPP into your applications. Documentation and tests are badly missing and they will eventually come but for now you’ll have to go through the code. I know this is not attractive but I though it was worth to be released nonetheless. Enjoy.