HBase – A few things about the HBase shell.
There is a lot of work you can do inside the HBase shell: you can list tables, get and put records, and so on.
**************** DESCRIBE ****************
hbase(main):003:0> describe 'a59347_tco'
Table mykickasstable is ENABLED
mykickasstable
COLUMN FAMILIES DESCRIPTION
{
NAME => 'OTHER',
DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0',
VERSIONS => '3',
COMPRESSION => 'NONE',
MIN_VERSIONS => '0',
TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE',
BLOCKSIZE => '65536',
IN_MEMORY => 'false',
BLOCKCACHE => 'false'
}
…
TTL is short for Time To Live; `FOREVER` means the data you put in never expires. TTL is handy when you only ever want to keep a bounded window of data, such as "only store one year of data". In that case you can set the TTL to one year (HBase expresses TTL in seconds) and HBase will automatically drop records once they expire.
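To make the one-year case concrete, here is a minimal sketch that works out the TTL in seconds and builds the matching HBase shell `alter` command as a string. The helper names are made up for illustration, and the table and column family names are simply the ones shown in the describe output above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days):
    """Convert a retention period in days to seconds, the unit HBase uses for TTL."""
    return days * SECONDS_PER_DAY


def alter_ttl_command(table, family, days):
    """Build an HBase shell 'alter' command string (hypothetical helper)."""
    return "alter '%s', {NAME => '%s', TTL => %d}" % (table, family, ttl_seconds(days))


one_year_command = alter_ttl_command("mykickasstable", "OTHER", 365)
print(one_year_command)
```

The printed line can be pasted into the HBase shell; treating a year as 365 days is an assumption, so adjust it if you need leap years covered.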
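BLOCKSIZE is 65536 bytes, i.e. 64 KB, which is the default HFile block size in HBase. It should not be confused with the HDFS block size, which defaults to 64 MB or 128 MB depending on the Hadoop version.
You can also use the status command to check the running condition of your HBase cluster. It returns something like this:
hbase(main):016:0> status 'simple'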
8 live servers
server16.datafireball.com:60020 1434485581215
requestsPerSecond=0.0,
numberOfOnlineRegions=4,
usedHeapMB=346,
maxHeapMB=1583,
numberOfStores=7,
numberOfStorefiles=9,
storefileUncompressedSizeMB=12530,
storefileSizeMB=12535,
compressionRatio=1.0004,
memstoreSizeMB=0,
storefileIndexSizeMB=0,
readRequestsCount=14382,
writeRequestsCount=0,
rootIndexSizeKB=126,
totalStaticIndexSizeKB=12657,
totalStaticBloomSizeKB=13420,
totalCompactingKVs=0,
currentCompactedKVs=0,
compactionProgressPct=NaN,
coprocessors=[]
As you can see, there are plenty of metrics you can refer to in order to understand the running condition of your cluster. Learning what they all mean is a long process, but it is very helpful for a big data system admin.
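As a first step toward reading these numbers, the sketch below parses a few of the `key=value` pairs from the output above into a dictionary and derives the fraction of heap in use. The function name is hypothetical, and the sample string is just a subset of the metrics printed for server16.

```python
def parse_status_metrics(text):
    """Turn 'key=value, key=value' pairs into a dict, converting numbers where possible."""
    metrics = {}
    for part in text.split(","):
        part = part.strip()
        if "=" not in part:
            continue
        key, value = part.split("=", 1)
        try:
            metrics[key] = float(value)  # also accepts 'NaN'
        except ValueError:
            metrics[key] = value  # keep non-numeric values such as '[]' as text
    return metrics


sample = ("usedHeapMB=346, maxHeapMB=1583, numberOfOnlineRegions=4, "
          "compactionProgressPct=NaN, coprocessors=[]")
metrics = parse_status_metrics(sample)

# Fraction of the region server heap currently in use
heap_used_fraction = metrics["usedHeapMB"] / metrics["maxHeapMB"]
print(round(heap_used_fraction, 3))
```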
Practical Business Python
Here is a great blog that shows practical business uses of Python.
This post is about how to use Python to read from a Google Form, which I can imagine being a good resource for crowdsourcing.
Link here
SHA1 and Hashcat
SHA1: Secure Hash Algorithm 1
Salt: Randomly generated number, “the password of password”
hashcat: a free password recovery tool that comes with Kali Linux.
The last time I ran into SHA1 was with IPython. You can secure your IPython server by adding a password: generate a hash with the passwd() function and store it in your config file.
The generated hash is supposed to be in the format of `hash_algorithm:salt:passphrase_hash`. And we can see the salt is 12 characters long and the passphrase hash is 40 characters long.
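The three-part format can be checked with a few lines of Python. This is only a sketch: the salt and the hashed text below are hypothetical placeholders built to have the right shape, not a real IPython password hash.

```python
import hashlib


def split_stored_hash(stored):
    """Split 'hash_algorithm:salt:passphrase_hash' into its three fields."""
    algorithm, salt, passphrase_hash = stored.split(":")
    return algorithm, salt, passphrase_hash


# Hypothetical stored value: a 12-character salt and a SHA1 hex digest
example_salt = "0123456789ab"
example_hash = hashlib.sha1(b"placeholder").hexdigest()
stored = "sha1:%s:%s" % (example_salt, example_hash)

algorithm, salt, passphrase_hash = split_stored_hash(stored)
print(algorithm, len(salt), len(passphrase_hash))
```

A SHA1 digest is 160 bits, which is why its hex form is always 40 characters long.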
Then I started wondering: can I use hashcat to recover my password if I forget it? I first passed the hash to hashid, an application that gives a best guess at which hash type was used to produce the target. After I stripped off the salt, hashid recognized it as SHA1, which is exactly the hash type used to generate it.
Then the next step is how to use hashcat to recover the code.
hashcat works from a list of candidate passwords and a set of rules that you provide. It then leverages the computing power of the GPU to quickly recover the password, provided the combination of the initial list and rules covers the target. To learn more about hashcat, here is a decent tutorial to get you started.
For the POC, I just provide a list containing the password `datafireball` and use the straight attack mode.
Based on the hashcat documentation here, I thought salt:pass should map to mode 120, or at least to one of 110, 120, 130 and 140. However, none of them worked, and they all failed with the error: separator unmatched.
The interesting thing is that after I switched the order of the salt and the passphrase hash, hashcat worked using mode 110 (sha1($pass.$salt)).
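Mode 110 succeeding suggests that the stored hash is SHA1 over the password followed by the salt, and that hashcat wants the fields as `hash:salt`. The sketch below works under exactly that assumption; the salt value and helper names are hypothetical, and it does not claim to reproduce IPython's actual code.

```python
import hashlib


def sha1_pass_then_salt(password, salt):
    """Compute sha1($pass.$salt): hash the password bytes followed by the salt bytes."""
    return hashlib.sha1(password.encode("utf-8") + salt.encode("ascii")).hexdigest()


def to_hashcat_line(stored):
    """Reorder 'sha1:salt:hash' into the 'hash:salt' order that worked with mode 110."""
    algorithm, salt, digest = stored.split(":")
    return "%s:%s" % (digest, salt)


salt = "0123456789ab"  # hypothetical 12-character salt
stored = "sha1:%s:%s" % (salt, sha1_pass_then_salt("datafireball", salt))

hashcat_line = to_hashcat_line(stored)
digest, line_salt = hashcat_line.split(":")

# The candidate password matches when re-hashing it with the salt gives the digest
matches = digest == sha1_pass_then_salt("datafireball", line_salt)
print(hashcat_line)
print(matches)
```

If the salt were hashed before the password instead, the digest would differ, which is the distinction hashcat draws between modes 110 and 120.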
Anyway, it was fun getting to know hashcat and SHA1. I still need to figure out how the hash is generated from the salt and the password, and I am looking forward to learning more about Kali Linux.
PySpark – Anomaly Detection
Mohammad Fawad Alam from SAS wrote an IPython notebook that analyzes server logs using PySpark.
His code is fairly clean, and he also mentions a few things that I had never heard of before.
PyMOTW – Python Module of the Week
Have you ever realized that tons of built-in libraries come with Python? Have you realized that you only know a few of them? Have you ever wanted to learn about the rest, only to find the documentation unhelpful because you are a “hands-on” person who wants to see examples?
Doug has a website called pymotw.com/2 where you can find the code examples for rarely used libraries, like this one about shlex (simple lexical analysis).
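For a taste of what such an example looks like, here is a minimal shlex sketch (not taken from PyMOTW; the command string is made up). It splits a command line the way a shell would, keeping the quoted phrase together as one token.

```python
import shlex

# A made-up command line with a quoted argument containing a space
command = 'grep -i "hello world" notes.txt'

tokens = shlex.split(command)
print(tokens)
```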
If you feel better holding a book in your hand, here is a book that you can buy.
Tutorialspoint – simply easy learning
Today I came across a Python library that I wanted to play around with and learn how to use. However, I was on my GF’s Windows desktop, where no Python interpreter is installed at all, let alone an IDE.
That is how I came across Tutorialspoint. They provide an online terminal and an online IDE, called codingground.
In my point of view, if you are really new to a certain area or programming language, this might be the easiest way to start writing your hello world example and gain some confidence 🙂
Python – Learn Python by reading pyspark’s source code
Here is the RDD module in pyspark
import copy
import operator
import shlex
import warnings
import heapq
import bisect
from functools import reduce
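# Python 2 only: itertools.imap and ifilter do not exist in Python 3, where map and filter are already lazy
from itertools import imap as map, ifilter as filter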
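If some of these modules are new to you, a quick way to learn them is to try each on a small list. The sketch below is not from the pyspark source; it is a made-up illustration of `reduce`, `heapq` and `bisect` using only the standard library.

```python
import bisect
import heapq
import operator
from functools import reduce

numbers = [5, 1, 8, 3, 9, 2]

# reduce folds the list into one value, here a running sum
total = reduce(operator.add, numbers)

# heapq.nlargest picks the top N items without sorting the whole list
top_three = heapq.nlargest(3, numbers)

# bisect finds where a value would be inserted into an already sorted list
sorted_numbers = sorted(numbers)
position = bisect.bisect_left(sorted_numbers, 4)

print(total, top_three, position)
```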
Hash SHA – Secure Hash Algorithm




