Scala SBT

I am having a good time with sbt while studying Scala, especially compared with my horrible Maven experience a while back.

Here is a project that I created to do some basic stuff with the Joda-Time date library using sbt.

To learn more about sbt, of course:

1. RTFM

2. For the lazy people:

HBase – A few things about the HBase shell.

There is a lot of work you can do inside the HBase shell. You can list the tables, get and put records, etc.

**************** DESCRIBE ****************

hbase(main):003:0> describe 'mykickasstable'
Table mykickasstable is ENABLED
mykickasstable
COLUMN FAMILIES DESCRIPTION
{
NAME => 'OTHER',
DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0',
VERSIONS => '3',
COMPRESSION => 'NONE',
MIN_VERSIONS => '0',
TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE',
BLOCKSIZE => '65536',
IN_MEMORY => 'false',
BLOCKCACHE => 'false'
}

TTL is short for Time To Live; `FOREVER` means the data you put in will never expire. This is a great feature if your use case only ever needs a fixed window of data, such as "only store 1 year of data". In that case, you can set the TTL on the column family to one year (HBase takes the value in seconds), and HBase will automatically remove records once they are older than that.
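As a rough sketch of the one-year case: the column family TTL is given in seconds, so the retention window has to be converted first. The small Python helper below does that arithmetic and prints the matching `alter` statement for the shell. The helper names are hypothetical, and treating a year as 365 days is an assumption.

```python
# Sketch: compute a TTL in seconds and build the HBase shell command that applies it.
# Table and column family names are the ones shown in the describe output.

SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days):
    """Return the TTL in seconds for a retention window given in days."""
    return days * SECONDS_PER_DAY


def alter_ttl_command(table, family, days):
    """Build the HBase shell 'alter' statement that sets the TTL on one column family."""
    return "alter '%s', {NAME => '%s', TTL => %d}" % (table, family, ttl_seconds(days))


if __name__ == "__main__":
    # 365 days is an assumption; adjust to whatever "one year" means for your data.
    print(alter_ttl_command("mykickasstable", "OTHER", 365))
```

Paste the printed statement into the HBase shell, then run `describe` again to confirm that the TTL changed.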

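BLOCKSIZE is 65536 bytes, i.e. 64 KB, which is the default HFile block size in HBase. Do not confuse it with the HDFS block size, which is much larger (64 MB or 128 MB by default, depending on the Hadoop version).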

You can also use the status command to check the running condition of your HBase cluster. It will return something like this:

hbase(main):016:0> status 'simple'
8 live servers
server16.datafireball.com:60020 1434485581215
requestsPerSecond=0.0,
numberOfOnlineRegions=4,
usedHeapMB=346,
maxHeapMB=1583,
numberOfStores=7,
numberOfStorefiles=9,
storefileUncompressedSizeMB=12530,
storefileSizeMB=12535,
compressionRatio=1.0004,
memstoreSizeMB=0,
storefileIndexSizeMB=0,
readRequestsCount=14382,
writeRequestsCount=0,
rootIndexSizeKB=126,
totalStaticIndexSizeKB=12657,
totalStaticBloomSizeKB=13420,
totalCompactingKVs=0,
currentCompactedKVs=0,
compactionProgressPct=NaN,
coprocessors=[]

As you can see, there are plenty of parameters to help you understand the running condition of your cluster. Learning what they all mean is a long process, but it is super helpful for a big data system admin.
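If you want to work with these numbers instead of just reading them, a minimal sketch like the one below can turn the `key=value` pairs from one server's status output into a dictionary. The function names are hypothetical, and the sample is a subset of the output above. The parser splits on commas, so it would need more care if `coprocessors` listed several entries.

```python
# Sketch: parse the key=value metrics of one region server from `status 'simple'` output.

SAMPLE = """requestsPerSecond=0.0,
numberOfOnlineRegions=4,
usedHeapMB=346,
maxHeapMB=1583,
compactionProgressPct=NaN,
coprocessors=[]"""


def parse_status_metrics(text):
    """Parse comma- or newline-separated 'key=value' pairs into a dict of strings."""
    metrics = {}
    for part in text.replace("\n", ",").split(","):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            metrics[key.strip()] = value.strip()
    return metrics


def heap_usage_pct(metrics):
    """Return the used heap as a percentage of the maximum heap."""
    return 100.0 * float(metrics["usedHeapMB"]) / float(metrics["maxHeapMB"])


if __name__ == "__main__":
    m = parse_status_metrics(SAMPLE)
    print("online regions:", m["numberOfOnlineRegions"])
    print("heap used: %.1f%%" % heap_usage_pct(m))
```

For the server shown above, this reports that roughly a fifth of the heap is in use (346 MB of 1583 MB).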