Tensorflow – “Wide” tutorial

I have not visited Tensorflow for quite a while, and recently I had a use case where I needed to do some classification and wanted to give deep learning a whirl. I was surprised by the number of tutorials/examples that have been added in the past few months. Today, I am going to walk through the “Tensorflow Linear Model Tutorial” and carefully study the functions used in it.

The use case is well explained at the beginning of the tutorial, but the sample code, for example input_fn, is a bit daunting at first glance, at least for me. Here are the study notes I took on each of the functions used in input_fn.

tf.gfile(.Exists)

The tf.gfile module contains many utility functions beyond Exists, for example Copy, Remove, etc. This way, you can do everything in Tensorflow without having to dabble with the os.path library and others, which might be a good option for developers who prefer to minimize the number of libraries their code depends on.

I have created a Jupyter notebook where I demonstrate some of the gfile functionality. Hopefully the code and its output give you a more intuitive feel for how to use those functions.
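For readers who think in terms of os.path, here is a rough sketch of the standard-library calls that a few tf.gfile helpers stand in for, with the tf.gfile counterpart named in each comment. The function gfile_style_demo and the file names are hypothetical, and the mapping is only approximate: for example, tf.gfile.Copy refuses to overwrite an existing file unless overwrite=True, whereas shutil.copyfile silently overwrites.

```python
import os
import shutil
import tempfile


def gfile_style_demo(base_dir):
    """Run stdlib equivalents of a few tf.gfile helpers inside base_dir."""
    data_dir = os.path.join(base_dir, "data")
    os.makedirs(data_dir, exist_ok=True)        # ~ tf.gfile.MakeDirs(data_dir)

    src = os.path.join(data_dir, "adult.data")
    with open(src, "w") as f:                   # ~ tf.gfile.GFile(src, "w")
        f.write("39,State-gov,77516\n")          # made-up sample line

    dst = os.path.join(data_dir, "adult_copy.data")
    shutil.copyfile(src, dst)                   # ~ tf.gfile.Copy(src, dst)

    listing = sorted(os.listdir(data_dir))      # ~ tf.gfile.ListDirectory(data_dir)
    os.remove(dst)                              # ~ tf.gfile.Remove(dst)
    # ~ tf.gfile.Exists(path) for both files
    return listing, os.path.exists(src), os.path.exists(dst)


with tempfile.TemporaryDirectory() as tmp:
    print(gfile_style_demo(tmp))  # (['adult.data', 'adult_copy.data'], True, False)
```

The practical advantage of the tf.gfile versions is that the same calls also work on the non-local file systems Tensorflow supports, such as Google Cloud Storage (gs://) paths, where the os module cannot help.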

parse_csv

[Screenshot: the parse_csv function from the tutorial]

tf.decode_csv

The parse_csv function actually took me a while to understand. Its input is a variable named “value”, which is then passed as the input to tf.decode_csv. The one-line description of decode_csv is as follows:

Convert CSV records to tensors. Each column maps to one tensor.

The record_defaults argument is definitely something you need to understand, especially if you have been spoiled by calling pd.read_csv a lot.

record_defaults: A list of Tensor objects with types from: float32, int32, int64, string. One tensor per column of the input record, with either a scalar default value for that column or empty if the column is required.

For example, this is what record_defaults looks like in the example we are looking at:

_CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''], [0], [0], [0], [''], ['']]

You can tell there are 15 elements in this list: the first element is a list containing a single element, 0; the second is a list containing a single empty string; and so on. Each value tells decode_csv which default to use for that column whenever one is needed, for example for a missing value, so for the first column of the CSV file the default is 0. The type of each default also sets the type of the corresponding output tensor, so [0] gives an int32 column and [''] a string column. You can find the raw dataset here. By reading the adult.names file, you can tell that the first column of the data is the field “age”, a numeric field with values like 21, 50, etc., so using 0 as its default makes perfect sense. The second column is “workclass”, with values like “self-employed”, so an empty string is a legitimate default for string/categorical variables.
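To make the record_defaults rules concrete without a Tensorflow session, here is a minimal pure-Python imitation, assuming a plain comma-separated line with no spaces after the commas. decode_csv_like is a hypothetical helper, not Tensorflow's implementation: it fills empty fields with the column default, converts non-empty fields to the default's Python type, and rejects an empty field in a column whose default list is empty (a required column).

```python
import csv
import io

# Column defaults for the 15 census columns, as in the tutorial.
_CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''],
                        [0], [0], [0], [''], ['']]


def decode_csv_like(line, record_defaults):
    """Hypothetical pure-Python imitation of tf.decode_csv's default handling."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d fields, got %d"
                         % (len(record_defaults), len(fields)))
    columns = []
    for i, (field, default) in enumerate(zip(fields, record_defaults)):
        if field == "":
            if not default:  # empty default list => the column is required
                raise ValueError("column %d is required but empty" % i)
            columns.append(default[0])                # fill in the default
        elif default:
            columns.append(type(default[0])(field))  # e.g. int("77516")
        else:
            columns.append(field)  # required column: no type hint, keep the string
    return columns


# The age field is empty, so it falls back to the default 0.
row = decode_csv_like(",State-gov,77516,Bachelors,13,Never-married,Adm-clerical,"
                      "Not-in-family,White,Male,2174,0,40,United-States,<=50K",
                      _CSV_COLUMN_DEFAULTS)
print(row[:3])  # [0, 'State-gov', 77516]
```

In real Tensorflow the dtype of an empty (required) default is still known from the tensor itself, which this sketch cannot express, so it simply keeps such fields as strings.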

Here is another example from the Tensorflow source code that uses the decode_csv function. You can find the dataset used in the example here, and the defaults are now defined as:

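import collections

defaults = collections.OrderedDict([
    ("symboling", [0]),
    ("normalized-losses", [0.0]),
    ("make", [""]),
    # ... (remaining columns elided)
    ("price", [0.0])
])
# Map each column name to the Python type of its default value.
types = collections.OrderedDict((key, type(value[0]))
                                for key, value in defaults.items())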

#types = OrderedDict([
#    ('symboling', <class 'int'>), 
#    ('normalized-losses', <class 'float'>), 
#    ('make', <class 'str'>),
#    ...    
#    ('price', <class 'float'>)
#])

As you can see, the record_defaults argument is now list(defaults.values()), which has the same format as in the previous example: [[0], [0.0], [''], …, [0.0]]. The second example is very helpful because it shows how to use an ordered dictionary to manage the column names and column types together, so you can create it by hand once and reuse it again and again.
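Here is a small sketch of the reuse the OrderedDict enables, assuming a hypothetical four-column subset of the defaults above (the real dictionary has more columns): the keys give the column names, the values give record_defaults, and types converts raw string fields. convert_row is an illustrative helper, not part of the Tensorflow example.

```python
import collections

# Hypothetical four-column subset of the imports-85 defaults shown above.
defaults = collections.OrderedDict([
    ("symboling", [0]),
    ("normalized-losses", [0.0]),
    ("make", [""]),
    ("price", [0.0]),
])
types = collections.OrderedDict((key, type(value[0]))
                                for key, value in defaults.items())

column_names = list(defaults.keys())       # column order, defined once
record_defaults = list(defaults.values())  # shaped like decode_csv's record_defaults


def convert_row(fields):
    """Convert raw string fields with the per-column types, using defaults for blanks."""
    return collections.OrderedDict(
        (name, types[name](raw) if raw != "" else defaults[name][0])
        for name, raw in zip(column_names, fields))


print(record_defaults)  # [[0], [0.0], [''], [0.0]]
print(convert_row(["3", "", "alfa-romero", "13495"]))
```

Because names, defaults and types all come from one dictionary, adding or reordering a column only ever happens in one place.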

[Screenshot: a minimal decode_csv example run in an interactive session]

The example above demonstrates how to use decode_csv in a minimal fashion. Since I am using tf.InteractiveSession, I can simply call .eval() on any tensor to print its value for debugging.

Features Dictionary

I want to briefly discuss how the features variable is generated. It is a dictionary, and the interesting part is not only how it is constructed but also how the label (y) field is excluded from it.

zip is a useful function that many people underuse. The following few lines of code demonstrate how to use zip, how its behavior differs from Python 2.7, and how to construct a dict from it.

[Screenshot: zip and dict construction in Python 3 vs. Python 2.7]
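The same ideas in text form, as a minimal sketch assuming a three-column subset of the census columns and a label column named income_bracket (the full data has 15 columns): zip pairs names with values, dict builds the features dictionary, and dict.pop removes the label while returning it.

```python
# Hypothetical three-column subset of the census columns and one parsed row.
column_names = ["age", "workclass", "income_bracket"]
columns = [39, "State-gov", "<=50K"]

pairs = zip(column_names, columns)
print(type(pairs))  # Python 3: <class 'zip'> (a lazy iterator); Python 2.7 returns a list

features = dict(pairs)                   # build {name: value} from the pairs
labels = features.pop("income_bracket")  # removes the label key and returns its value

print(features)     # {'age': 39, 'workclass': 'State-gov'}
print(labels)       # <=50K
print(list(pairs))  # [] because dict() already consumed the Python 3 iterator
```

Since a Python 3 zip object can only be consumed once, wrap it in list(...) if you need to inspect the pairs and still build the dictionary afterwards.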

Putting it all together, here is an example that captures everything you need to know about parse_csv in a more complete context.

[Screenshot: parse_csv in a more complete context]

 

Starcraft II – s2client-api

I was reading about how DeepMind is collaborating with Blizzard to build reinforcement learning powered AI bots. As a Starcraft fan since 2000, I find it very exciting to read about the progress that has been made, along with some of the fancy YouTube videos out there.

There are mainly two topics around SC2 bots: one centered on building the bot itself, and the other on the environment, i.e., how to interact with the game and programmatically control the units within it. Writing a bot for Starcraft is not new; the game comes with its own map editor, where you can customize a map and write a bot. I clearly remember that my childhood friends and I spent almost a whole summer beating 7 bots on the map “Big Hunter” in Starcraft I, and then leveled up by hunting for different implementations of “advanced bots” and beating them (one of them gave the opponent double minerals and gas, which was crazy). Almost two decades have passed, and I still do not know how to write a bot. I have seen some crazy logic written in Galaxy, the StarCraft II editor's scripting language, and here is a great YouTube channel that shows off its power.

Besides the map editor, Blizzard has also published a library that lets researchers and developers interact with the game using a general-purpose programming language.

I came across two libraries that confused me a little: s2client-api and s2client-proto. Here are the repo descriptions of the two:

s2client-api: StarCraft II Client – C++ library supported on Windows, Linux and Mac designed for building scripted bots and research using the SC2API.

s2client-proto: StarCraft II Client – protocol definitions used to communicate with StarCraft II.

Both libraries appear to let you control the game programmatically, but s2client-proto is written mostly in Python while s2client-api is written in C++.

I have not looked into the Python solution yet, merely because of the project name s2client-api: it must be the official API, right? Also, I usually assume that when the same functionality exists in both C++ and Python, the latter is a mere wrapper around the C++ implementation, so I decided to give s2client-api a try first.

If you are not an active C++ developer like me, you will have to install a number of development tools; I had to install CMake and the latest Visual Studio 2017 Community, which was actually pretty easy to do. Beyond that, the setup tutorial is well written and easy to follow.

I was very excited to get the first tutorial up and running and to see the SCVs start mining minerals, etc. The documentation is great and the three tutorials work right out of the box.

In the end, I modified the code a little to implement two pieces of logic:

  1. wait till you have 50 marines and then attack the enemy base
  2. instead of one barracks, build more
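The two rules above are simple enough to sketch as a pure decision function. This is not s2client-api code (that library is C++, and none of its calls are shown here); GameState, choose_actions, the action strings, and the MAX_BARRACKS cap are hypothetical names, with the cap chosen arbitrarily since the number of extra barracks is not specified. The 150-mineral cost and the supply depot prerequisite are the standard Terran barracks requirements.

```python
from dataclasses import dataclass

MARINES_BEFORE_ATTACK = 50  # rule 1: wait for 50 marines
MAX_BARRACKS = 5            # rule 2: hypothetical cap on "more" barracks
BARRACKS_COST = 150         # minerals per Terran barracks


@dataclass
class GameState:
    """Hypothetical per-step snapshot of the values a bot would query."""
    minerals: int
    marines: int
    barracks: int
    has_supply_depot: bool


def choose_actions(state):
    """Return the list of high-level actions to issue this step."""
    actions = []
    # Rule 2: keep adding barracks (a supply depot is a prerequisite).
    if (state.has_supply_depot
            and state.barracks < MAX_BARRACKS
            and state.minerals >= BARRACKS_COST):
        actions.append("build_barracks")
    # Rule 1: only send the army once it has reached the threshold.
    if state.marines >= MARINES_BEFORE_ATTACK:
        actions.append("attack_enemy_base")
    return actions


print(choose_actions(GameState(minerals=400, marines=50, barracks=2,
                               has_supply_depot=True)))
# ['build_barracks', 'attack_enemy_base']
```

Keeping the decisions in a pure function like this makes the thresholds easy to tweak and test separately from the code that actually issues orders to units.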

And here is the code snippet.

[Code snippet screenshots: 50rush, 50rush_bax]

Last but not least, here is an exciting screenshot of how that simple logic, built on top of tutorial 3, killed my opponent.

[Screenshot slideshow of the match]

GG