Tensorflow – “Wide” tutorial

I have not visited Tensorflow for quite a while and recently had a use case where I need to do some classification and want to give deep learning a whirl. I am surprised to find the number of tutorials/examples that have been added in the past few month. Today, I am going to give the “Tensorflow Linear Model Tutorial” an overhaul and carefully study the functions that have been used in this tutorial.

The use case has been well explained at the beginning of the tutorial but the sample code, for example, the input_fn is a bit daunting for the people, at least me, at a first glance. Here is a list of study notes that I have taken regarding each of the functions that have been used in input_fn.


Within the module of tf.gfile, there are so many utility functions not only limited to Exists, for example, Copy, Remove, ..etc. In this way, you can do everything using tensorflow without having to dabble with the os.path library and others, which might be a good option for developers who prefer minimizing the amount of dependent libraries in his/her code.

I have created a Jupyter notebook where demonstrate some of the functionalities using the gfile. Hopefully the code and message can give you a more intuitive feeling of how to use those functions.


Screen Shot 2017-12-29 at 2.55.09 PM


The parse_csv function actually too me a while to understand. The input to this function has the variable name “value”. And then it got used to be the input to the function tensorflow.decode_csv. The one liner description for decode_csv is as followed:

Convert CSV records to tensors. Each column maps to one tensor.

And the record_defaults argument is definitely something you need to know if you got so spoiled by calling pd.read_csv a lot.

record_defaults: A list of Tensor objects with types from: float32int32int64string. One tensor per column of the input record, with either a scalar default value for that column or empty if the column is required.

For example, this is how a record_defaults could look like, in the example that we are looking at:

_CSV_COLUMN_DEFAULTS = [[0], [”], [0], [”], [0], [”], [”], [”], [”], [”], [0], [0], [0], [”], [”]]

You can tell we have 15 elements in this list, and the first element is a list which has only one element 0, the second element is also a list that has only one element which is an empty string, so on and so forth. The value provided here is basically saying, for the first column of the CSV file, if there need to use a default value, like missing values, use 0 as the default. You can find the raw dataset from here. By reading the adult.names file, you can tell the first column of the data is the field “age”, which is a numeric field which has values like 21, 50, etc. In this case, using 0 as a default value for numeric field makes perfect sense. The second column is “workclass” with values like “self-employed”, ..etc that an empty string is a legit default value for strings/categorical variables.

Here is another example from tensorflow source code using the decode_csv function. You can find the dataset used in the example by visiting here and the records_default is now defined as:

defaults = collections.OrderedDict([
    ("symboling", [0]),
    ("normalized-losses", [0.0]),
    ("make", [""]),
    ("price", [0.0])
types = collections.OrderedDict((key, type(value[0]))
                                for key, value in defaults.items())

#types = OrderedDict([
#    ('symboling', <class 'int'>), 
#    ('normalized-losses', <class 'float'>), 
#    ('make', <class 'str'>),
#    ...    
#    ('price', <class 'float'>)

As you can see, now the record_defaults argument is list(defaults.values()) which is basically the same as the previous example [[0], [0.0], [”], …, [0.0]]. The second example is very helpful because it shows you how to use an ordered list to manage the column types and column names so you can reuse it again and again after manually creating it once.

Screen Shot 2017-12-29 at 3.33.25 PM

The example above perfectly demonstrated how to use decode_csv in a minimal fashion. Since I am using tensorflow.interactive_session, I can simply call .eval() operation on any tensor object to print out for debugging purpose.

Features Dictionary

I want to briefly discuss how the features variable got generated. Clearly, it is a dictionary and the fascinating part is not only how it is constructed, but also how to exclude the labels(y) field.

The zip function is quite a useful function that many people underutilize. The following few lines of code not only demonstrated how to use zip, but also showed you how it is different from Python 2.7 and also how to construct a dict.

Screen Shot 2017-12-29 at 3.49.59 PM

After all of that, here is an example that might captured everything that you need to know about parse_csv but in a more complete context.

Screen Shot 2017-12-29 at 3.59.44 PM


Starcraft II – sc2client-api

I was reading about Deepmind is collaborating with Blizzard, trying to build some reinforcement learning empowered artificial general intelligence bot. As being a Starcraft fan since 2000, it is a very exciting feeling to read about the progress that has been made along with some of the fancy Youtube videos out there.

There are most two topics around sc2 bot, one centered around building the robot, and the second is the environment, i,e, how to interact with the game and programmatically control the units within the game. I guess writing a bot for Starcraft is not new, the game comes with its own map editor and you can customize the map and write a bot. I clearly remembered that my childhood friends and I spent almost a summer beating 7 bots on the map – Big Hunter in Starcraft I and then level up, pursuing the challenge of finding different implementations of “advanced bots” and beat them (one of the implementations is double mineral and gas for your opponent, that was crazy). almost two decades have passed, I still do not know how to write a bot yet. I have seen some crazy logic using Galaxy  and here is a great Youtube channel to show off the power.

Other than the map editor, Blizzard has also published a library for researchers and developers to interact with the game using generic programming language.

I came across two libraries that confused me a little bit, one is s2client-api and the other is s2client-proto. So here is the repo description for those two libraries.

s2client-api: StarCraft II Client – C++ library supported on Windows, Linux and Mac designed for building scripted bots and research using the SC2API.

s2client-proto: StarCraft II Client – protocol definitions used to communicate with StarCraft II.

Clearly, both those two libraries claim to be able to control the game via an API but s2client-proto was written mostly in Python while s2client-api in C++.

I have not yet looked into the Python solution yet merely because the name of the project s2client-api, it must be the official API, right? also, I usually have an assumption where if there is an equivalent of a functionality, one written in C++ and the other in Python, it is usually the later is a mere wrapper of the C++ solution, so I decided to first give s2client-api a try.

Other than you have to install lots of development tools if you are not an active C++ developer, like me, I have to install Cmake and latest Visual Studio 2017 community, which is actually pretty easy to do. The set up tutorial is pretty well written and easy to follow.

I was very excited to get the first tutorial up and running, seeing the SCVs started mining the minerals, .etc. The documentation is great and the three tutorials work right out of the box.

In the end, I have modified the code a little bit to implemented two logic:

  1. wait till you have 50 marines and then attack the enemy base
  2. instead of one barracks, build more

And here is the code snippet.


Last but not least, this is an exciting screenshot of how that simple logic on top of the tutorial3 killed my opponent.

This slideshow requires JavaScript.





GDAX – orderbook data API level1,2,3

GDAX is a cryptocurrency trading platform and it is a subsidy owned by Coinbase. They offer a great platform along with a suite of API services for easy access to the trading data – rear mirror historical view or real-time data.

I have used API before but GDAX suggests to use websocket instead of API for real-time data access.


“Only the best bid and ask”

Screen Shot 2017-11-28 at 7.29.28 PM

As you can see, on the trading platform, the best selling price (ask to sell) is 10077.92 USD/BTC and at the same time, the best buying price (bid to buy) is 10077.91 USD/BTC. There are 53 different orders bidding at the best bid price and the total bids size is 35.26 BTCs. Same for the asks.


“Top 50 bids and asks (aggregated)”

At level2, instead of the top 1 (best) bid and ask orders, the API will return the top 50 bids and the top 50 asks. The data is in the same format [price, ordersize, numberOfOrders]. Given this information, you will have enough information to plot the depth chart (investopedia explanation).

Screen Shot 2017-11-28 at 7.40.57 PM

However, the price range is so small (10112.14 – 10076.77) / 10076.77 ~ 0.3% only a small small price range, which is not sufficient enough to draw a meaning depth chart.

Screen Shot 2017-11-28 at 8.26.54 PM


“Full order book (non aggregated)”

This is a BIG request, however, the information is quite rich and it a complete snapshot of the orderbook. Hence, the response format is slightly different where they do not have order size anymore because record is at order level without any aggregation, instead, they provided the order ID.

As shown in a snippet of the response, they even listed the bids out there at the price of 0.01 USD/BTC. Those bids are probably the ones either added to the orderbook years ago or those were the disbelievers. On the other size of the spectrum, you can easily see people even ask to sell BTC at the price of 9,999,999,999 USD/BTC, good luck to them 🙂

Screen Shot 2017-11-28 at 8.30.33 PM

Screen Shot 2017-11-28 at 8.52.10 PM

This is a plot of the histogram of how orders are distributed on different prices. As you can see, there are a few peak values out there. The biggest peak is definitely on the lowest point, maybe under a dollar. Then they have a few big peaks around $800, $5000, $8000 and $10000.

This is interesting but not necessary the depth chart that you usually see.

Screen Shot 2017-11-28 at 8.53.34 PM

For example, this is the market depth from GDAX. And clearly, the X axis is still exchange rate of BTC and USD but the y axis is the number of Bitcoins at that price. I am not exactly sure of the resolution in this graph but let’s take a quick look and see if we can reproduce this graph in Matplotlib.

The code might not be the most efficient but it does but you want it to do.

Screen Shot 2017-11-28 at 10.42.59 PM


If you are on GDAX website, you can realize that your computer’s fan might be spinning a bit faster, behind the scene of numbers jumping up and down, you can the browser actually subscribed to the GDAX websocket to stream NRT trading information from their server to update the information on your website. And this actually the exact websocket that the developer manual suggest users to use to receive information feed.

Screen Shot 2017-11-28 at 10.44.57 PM

Of course, in this case, you will need to take the responsibility of parsing the information calculate the latest state and deal with everything yourself.

So far, you should have a rough understanding of what kind information are provided various levels of orderbook API from GDAX. We also briefly touched the GDAX websocket. In the next article, I am going to have a deep dive into the websocket data types and message format, hopefully, we can cover Mongodb in more detail regarding how to query the transaction information and gain analytics.

Python Write Formula to Excel

I recently had a challenge writing formulas to Excel file using Python. I managed to create the value of the cell as the right formula but it is not being evaluated or active when you open it up in Excel.

I was using the to_excel method of pandas dataframe. And then someone pointed me to this awesome library XlsxWriter. It supports many of the features within Excel including but not restricted to formula, chart, image and even customized format.

Here is a short snippet of code of showing how it worked just out of box.

Screen Shot 2017-10-12 at 10.28.41 PM

And the output looks straightforward and satisfying.

Screen Shot 2017-10-12 at 10.29.01 PM

Also, as you can see from the code, you should really try to get yourself out of the business of working with row index and column index directly. For example, whenever I think about you are going to use things like row=row+1 or i++, it reminds me of the languages like C++, Java which we should stay away from.

Here is another example of directly working with pandas dataframe using XlsxWriter as the engine.

Screen Shot 2017-10-12 at 10.54.33 PM

And then, this is how the output file looks like, we have a new column called col2 that are active links. Clearly, it has been evaluated and active, when you click on it, it will link you to/jump directly to the A1 cell of sheet1. Problem solved.

Screen Shot 2017-10-12 at 10.56.12 PM

Oracle SQLdeveloper Reset Expired Password

Today I was trying to access an Oracle database after being provided with the credentials, which I have been waiting for a long time, typical IT, isn’t it?

However, I am using a tool called “Oracle SQL Developer” which made this a super fun experience for me that made me want to document and share with people.

When I click to create a new connection and fill in all the needed information (BTW, the SID is actually the database name: Site IDentifier). It prompted me with an error message “the password has expired”.


I tried a few times making sure it is not my typo but every time it gave me the same error message. Also, I randomly entered a few strings and it gave me a different error message of “invalid username/password; logon denied“. This made me believe the password our DBA gave me was definitely right but it just expired. My first question was that my credential must just got created and “brand new”, how could it expire? Clearly, there is a difference between how milk went expired and how password expired.

After a few quick Google research, I realized that some DBAs will set the default password as expired out of box so it will force the users to reset the password, like many web subscription that the confirmation email will directly link you to reset password page. However, now knowing that I need to reset my password, the frustration part is how?

Looking at the connection wizard, there is no where/no button that I can click giving me the option of resetting the password. I navigated through all the buttons, drop downs and tabs and still couldn’t find any sign of “reset password”. “Test Connection” will keep showing this same error message and you are not able to “Connect Either”.

This reminds me of a scenario where it is your first day of employment and your manager told you that your badge is at the front desk. However, the security will not even let you into the building because you do not have your badge. To get the badge, you need to get into the building though to visit the front desk…

I notified my DBA of the awkward situation and he notified me that I need to reset my password in one sentence without any further explanation. I already feel like making a fool out of myself and I think I had better figure this “reset my password” thing all by myself.

In the end, believe it or not, I need to first ignore the error message, SAVE the connection as a valid connection. Then, you need to right click to bring up the menu for the connection and the “reset password” option will be there for you to use. From there, everything will be so straightforward where you enter your old and new password and then you are IN!

Screen Shot 2017-10-09 at 4.08.12 PM

This is purely a UI/UX problem where people should have considered but I just want to share this fun experience so it can save a bit time for those like me, who happen to be new to Oracle, who happen to be given an expired password, who happen to use SQLdeveloper and who happen to think you need to connect first before you change your password.


The regularization is a trick where you try to avoid “overcomplexing” your model, especially during the cases in which the weights are extraordinary big. Having certain weights at certain size might minimize the overall function, however, that unique sets of weights might lead to “overfitting” where the model does not really perform well when new data come in. In that case, people came up with several ways to control the overall size of the weights by appending a term to the existing cost function called regularization. It could be a pure sum of the absolute value of all the weights or it could be a sum of the square (norm) of all the weights. Here is usually a constant you assign to regularization in the cost function, the bigger the number is, the more you want to regulate the overall size of all the weights. vice versa, if the constant is really small, say 10^(-100), it is almost close to zero, which is equivalent of not having regularization. Regularization usually helps prevent overfitting, generalize the model and even increase the accuracy of your model. What will be a good regularization constant is what we are going to look into today.

Here is the source code of regularizing the logistic regression model:

logits = tf.matmul(tf_train_dataset, weights) + biases
loss_base = tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits)
regularizer = tf.nn.l2_loss(weights)
loss = tf.reduce_mean(loss_base + beta * regularizer)
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

As you can see, the code is pretty much the same as the one without regularization, except we add a component of “beta * tf.nn.l2_loss(weights)”. To better understand how the regularization piece contribute to the overall accurarcy, I packaged the training into one function for each reusability. And then, I change the value from extremely small to fairly big and recorded the test accuracy, the training accuracy and validation accuracy on the last batch as a reference, and plotted them in different colors for each visualization.


The red line is what we truly want to focus on, which is the accuracy of the model running against test data. As we increase the value from tiny (10^-5). There is a noticeable but not outstanding bump in the test accuracy, and reaches its highest test accuracy during the range of 0.001 to 0.01. After 0.1, the test accuracy decrease significantly as we increase beta. At certain stage, the accuracy is almost 10% after beta=1. We have seen before that our overall loss is not a big number < 10. And even the train and valid accuracy drop to low point when the beta is relatively big. In summary, we can make the statement that regularization can avoid overfitting, contribute positively to your accuracy after fine tuning and potentially ruin your model if you are not careful.

Now, let’s take a look at the how adding regularization performed on a neural network with one hidden layer.


First, we want to highlight that this graph is in a different scale (y axis from 78 to 88) from the one above (0 to 80). We can see that the test accuracy fluctuate quite a bit in a small range between 86% to 89% but we cannot necessarily see a strong correlation between beta and test accuracy. One explanation could be that our model is already good enough and hard to see any substantial change. Our neuralnet with one hidden layer of a thousand nodes using relu is already sophisticated, without regularization, it can already reach an accuracy of 87% easily.

After all, regularization is something we should all know what it does, and when and where to apply it.

Udacity Deep Learning – The Hidden Layer

The homework of fullyconnected session require the students to:

Turn the logistic regression example with SGD into a 1-hidden layer neural network with rectified linear units nn.relu() and 1024 hidden nodes. This model should improve your validation / test accuracy.

The last block of code was neural network where is simply a network connecting input directly to the output, of course, softmax in the middle. And the change we need to make here is to instead of mapping all the inputs (784) to (10) outputs, we first need to create a layer which maps 784 to 1024 and then another layer to map 1024 to 10. Nothing fancy, but we simply need to add relu after the first Wx+b before passing on to the next layer as the activation function.

First, let’s take a look at how the old no hidden layer SGD looks like:

# Variables.
weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_labels])
biases = tf.Variable(

# Training computation.
logits = tf.matmul(tf_train_dataset, weights) + biases
loss = tf.reduce_mean(

# Optimizer.
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Predictions for the training, validation, and test data.
train_prediction = tf.nn.softmax(logits)
valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

Now, let’s build on top of this and see if we can architect a new network based on the homework requirement. Frankly speaking, the answer I am going to provide here was mostly inspired by this notebook or even more, the answer is only a fraction of what Mr. Damien offered in his code.

First of all, our network was like softmax(w * x + b) before. Now we are going to have one extra layer. The math will look like softmax(w_2 * h+b_2) where h = relu(w_1 * x + b_1). Or put everything in one:

softmax(w_2 * relu(w_1 * x + b_1) + b_2)

It is a bit complex, but not too much. we first need to redefine the variable weights and bias in a way where all of the four variables w_1, w_2, b_1, b_2 are included.

w_1 now is of the dimension (784, 1028) and b_1 has the size of (1, 1028)
w_2 is of dimension (1028, 10) and b_2 is of size (1, 10). Keep the right dimension in mind, we need to create four variables where two of the weights need to be initialized by normal distribution where the biases can be initialized by zeros, just like the old days.

After that, here comes to developer’s personal preference. You can put all the variables into a dictionary say “myvariables” where it stores all the weights and biases. Or you can create two variables “weights”  and store the two weight variables within that weights dictionary. And the same for biases. Or create one variable all separately and pass them around whenever you need them. First of all, all the design preferences will work in the end if you implement them properly. Damien’s code was following the second design type, which looks clean and tide. Here I am going to package all the variables in one dictionary just to be different.

# Variables
myvars = {
 'w_h': tf.Variable(tf.random_normal([n_input, n_hidden])),
 'b_h': tf.Variable(tf.random_normal([n_hidden])),
 'w_o': tf.Variable(tf.random_normal([n_hidden, num_labels])),
 'b_o': tf.Variable(tf.random_normal([num_labels]))

Now we are done initializing our variables, the next step is to build the model, or how we are going to predict using all the variables that we just defined. In the network without hidden layers, it was easy and actually a one liner

logits = tf.matmul(tf_train_dataset, weights) + biases

In our case, it will look like this:

layer_hidden = tf.add(
    tf.matmul(x, myvars['w_h']), 
logits = tf.matmul(
) + myvars['b_o']

Theoretically you can put everything into one liner to make the statement that “it is still a one liner”, but I highly suggest we at least break them down into components that are more readable. In this case, breaking them down into layers is probably a good idea.

Actually, this block of code will be reused quite a few times other than defining the logits, you will use the model when building accuracy, testing against validation, test, ..etc. So let’s build them into a function:

def model(x, myvars):
    layer_hidden = tf.add(tf.matmul(x, myvars['w_h']), myvars['b_h'])
    logits = tf.matmul(tf.nn.relu(layer_hidden), myvars['w_o']) + myvars['b_o']
    return logits

The definition of loss function and even optimizer is totally independent of how the neural internally looks like internally:

pred = multilayer_perceptron(tf_train_dataset, myvars)
loss = tf.reduce_mean(
optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

After all, we need to define the accuracy against training, validation and testing. Since we have packaged our model into one function, our implementation for a network even with one hidden layer is even cleaned than the homework itself. This is how it looks like right now:

train_prediction = tf.nn.softmax(pred)
valid_prediction = tf.nn.softmax(model(valid_dataset, myvars))
test_prediction  = tf.nn.softmax(model(test_dataset,myvars))

Now we have our new model. And the way we structured our code now, you only need to modify model function and add more variables to myvars even want to add more layers. Everything else should stay exactly the same. When we run our code, the accuracy got improved quite a bit, by about 10% (from 80% to 90%). And the performance difference between GPU and CPU is really substantial. Attached are two screenshots to demonstrate the performance difference between GPU (4 secs) and CPU (37 seconds).

This slideshow requires JavaScript.

Udacity Deep Learning – Desktop into remote GPU server

After playing with tensorflow example codes quite a bit, I think I am ready, and actually cannot wait to unleash the power of GPU. There are so many benchmarks here and there where people brag about how GPU kicks CPU’s ass when it comes to massive computation tasks. I threw a few hundred bucks into my first GPU purely to play top notch video games – Starcraft II back when it first came out, thanks to Blizzard entertainment. Using GPU for machine learning is purely a afterthought, looking back.

This weekend, I brought a GPU workstation back home. However, after being put off by my wife telling me that “I do not want you to make mess in our living room, you can not put anything on the table….”. The sad part, the workstation does not have wireless internet connection, and what is even worse, we only have ethernet connection right next to the router in the living room. Take all constraints into consideration, the only option left here is to leave the desktop as a remote server and access it through other devices (other computers within the same network).

Thank goodness, it was quite easy to set up any Ubuntu machine to be accessed through ssh. Actually, there is no need to set up remote desktop, since we can use the terminal for anything graphic related like web surfing. The only thing we need to do is set up the desktop as a server, people can ssh in, and for the Python related developer, we also need to set up the desktop as a jupyter notebook server where we are going to conduct our experiment.

As you can see in the end, we merely need to plug a power and ethernet cable to the desktop. The moment you power it on, you should be able to access the desktop via SSH and you can even power it off remotely by issuing comment “sudo shutdown”. Now you can work in your bed and never even come off your bed to shut it down!

Also, to make it slightly easier for consistency access, I suggest you log into the router to reserve an IP so that the desktop will not be assigned to a different IP every time it is reboot or what, due to the fact that DHCP is the default many default devices.

After that, you can SSH into your computer, conda install or pip install to set up the tensorflow CUDA development environment. However, it might still be a PITA to develop without any IDE, and in this case, I will be using jupyter notebook. Jupyter notebook does not handle multi-tenancy that well, say you have multiple people accessing the server at the same time. Jupyterhub is supposed to be the tool to ease that pain and the installation and set up are fairly easy. Since I will the only user within the network, I literally did nothing but ran the command “jupyterhub” and now we have a jupyter notebook running that you can simply enter the address “” on any laptop within the network (wifi, ethernet,..). And here is how it looks like on my MacBook Pro accessing that GPU desktop:

This slideshow requires JavaScript.

For those curious mind, look at the slideshow above proving that I am running the Udacity fullyconnected notebook and it is actually running on top of a lovely GPU!

The source code is from here. Since the source code was coded to demonstrate the fundamentals of tensorflow which is not necessarily to run the code on GPU. There will not be any proof right out of box any step is actually using GPU (if available GPU and CPU coexist, tensorflow actually will prioritize GPU).

Here is a snippet of code where I want to highlight. tf.ConfigProto, device_count to be 0 will force the machine to use CPU which you can compare with when you have an time consuming task (the notebook was quite fast using either CPU or GPU).

config = tf.ConfigProto(
    device_count = {'GPU': 0}
graph = tf.Graph()
with graph.as_default():
with tf.Session(graph=graph, config=config) as session:
    _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)

Also, log_device_placement set to be true will enable detailed logging about operations.

Screen Shot 2017-06-10 at 4.22.52 PM

I think this is the end of this post. Nothing new, nothing ground breaking, but quite a fun to see everything got put together, right?


Following the previous posts, this one will be dedicated to go through the block of code which implemented the gradient descent.

First, I want to share my intuitive understanding of how gradient descent works using an analogy. Think about the goal of the optimizer is to find a set of parameters (w, b) in order to minimize the loss function across all the training data. An oversimplified analogy could be trying to find a position within your city that has the lowest altitude. In this case, the altitude is the loss function and the position is the (w,b) which you can control and tweak. The computation for the loss function could be very time consuming across all the training data (think about I want to locate the exact altitude for any given point to the precision of centermeter, or even nanometer). This is precise and accurate of course, but unnecessarily time consuming. After a whole day, you might only be able to measure the altitude of your home to that level of accuracy. The idea of stochastic gradient descent is to trade off the precision for much better speed gain. It is under the assumption that even using a subset of the data, it should still be representative of the general trend of where the loss function is leaning towards and should provide you with a measurement that is in the ballpark. Or in our analogy, I can easily tell you the rough altitude for any given point to the precision of inches or even feet, or meters, at no time. In that case, you do not need to struggle picking the hairs but can quickly realize, your whole neighborhood is not in the lowest point, and this enable you to quickly head to certain directions that lead you to the low land.


Screen Shot 2017-06-07 at 9.07.43 PM

Look at this paragraph of code, first of all, it defined a utility function named “accuracy” which calculate the prediction accuracy based comparing the labels with the predictions. The prediction variable is the output from the softmax function which is, in essence, the probability of certain class happening out of all the possible classes.

For example, we have three classes (‘a’, ‘b’, ‘c’), the predicted logits could be

3.0, 1.0, 0.2

. And the corresponding softmax output (predictions) will be (refer to this stackoverflow question)

[ 0.8360188   0.11314284  0.05083836]

np.argmax function will return the index of the largest element.

Screen Shot 2017-06-07 at 9.36.11 PM

compare the maximum index of the prediction with the one for labels will give you the true / false at row level and you can easily calculate the overall accuracy then.

After that, they run 801 loops and each loop will calculate the loss function for all the data points and report out the accuracy every 100 records for easier readability.

As you can see, the implementation is so neat and clean which tensorflow packages all the variable updates behind the scene.


Screen Shot 2017-06-07 at 9.42.03 PM

Now let’s look at the source code of the stochastic gradient descent. it is very much like the previous code and the only difference is every step/loop, we only feed a batch of data to the optimizer to calculate the loss function.

Screen Shot 2017-06-07 at 9.51.32 PM

For example, the batch size here is 128 and the subset data set is 10,000. This should improve the loss function calculate by 80x faster. And instead of reusing the same data again and again, we will loop through all the training data in a batching way so in that case, we will gain the training performance gain and we used all the data points at least somewhere in our code. The overall accuracy was as good as plain gradient descent.

In the next post, we will try to solve the problem of improving this logistic regression and evolve into a neural network with hidden layers and other cool stuff.

“Turn the logistic regression example with SGD into a 1-hidden layer neural network with rectified linear units nn.relu() and 1024 hidden nodes. This model should improve your validation / test accuracy.”

Challenge Accepted!

UDACITY DEEP LEARNING – Tensorflow Optimizer

In the previous post, we briefly covered how the iPython notebook read the pickled inputs and transformed into a workable format. Today, we are going to cover the following block of how to create an optimizer which will be used during the iterations.


Any tensorflow job could be represented by a graph, which is a combination of operations (computation) and tensors (data). There is always a default graph object got established to store everything. You can also explicitly create a graph like “g = tf.Graph()” and then you can use g everywhere to refer to that specific one. If you want to save the headache of worrying about switching between graphs, you can use the “with” statement and all the operations within that block will be saved to the graph handler.

Also, “a picture is better than a thousand words”, there is a tool called tensorboard which is extremely easy to use to help you visualize and explore the tensorflow graph. In this case, I am simply adding extra two lines of code at the end of the graph with statement block.

Screen Shot 2017-06-06 at 9.17.23 PM

The writer will write the graph to a file and writer.flush will ensure all the operations which got asynchronously written to the log file is flushed out to the disk.

Then you can open up your terminal and run tensorboard command:

$ tensorboard --logdir=~/Desktop/tensorflow/tensorflow/examples/udacity/tf_log/
Starting TensorBoard 47 at
(Press CTRL+C to quit)

And this is how it looks like that block of Python code.

Screen Shot 2017-06-06 at 9.20.31 PM.png

The labels on the tensorboard does not necessarily reflect how variables are named in your code. For example, we first created four constants, the true tensor name only shows up when you print out the four constants.

tf_train_dataset:  Tensor("Const:0", shape=(10000, 784), dtype=float32)
tf_train_labels Tensor("Const_1:0", shape=(10000, 10), dtype=float32)
tf_valid_dataset Tensor("Const_2:0", shape=(10000, 784), dtype=float32)
tf_test_dataset Tensor("Const_3:0", shape=(10000, 784), dtype=float32)

Then, it is also pretty cumbersome to map each variable to the corresponding unit in the graph. Here is how the graph looks like after I highlighted some of the key variables.



The loss function or cost function for a deep learning network are usually adopting a cross entropy function. It is simply

C=−(1/n) [y*ln(a)+(1y)*ln(1a)]

y is the expected outcome or label. In a one hot coding fashion, y is either 0 or 1 which simplified the C = -1/n * ln(a), and here “a” is the predicted probability, which is the normalized outcome done by the softmax function. 

Screen Shot 2017-06-06 at 9.35.38 PM
In this paragraph of code, it first created a variable weight of size (28*28~784, 10) and the bias variable of size (10). Weight was initialized by a normal distribution but throw away all the variables outside two standard deviation range while biases are initialized to be zero to get started. Just to recapture, our tf_train_dataset variable is now a N by 784 size matrix where N is the number of records/images the user specify.

logits = x * w + b = (N,784) * (784, 10) + (10) = (N, 10) + (10) = (N, 10)

Now logits is a matrix of N rows and 10 columns. Each row contains 10 numbers which store a unnormalized format of the “probability” of which hand written digits that record might be. The highest number definitely indicates the column/label it falls under is mostly likely to be the recognized digits, however, since all the numbers are simply the outcome after W*x+b which is necessary to be bounded to be between (0,1) and further more, they do not add up to 1. Those normalization all happens behind the scene in the nn.softmax_cross_entropy_with_logits function. As an end user, we only need to make sure we memorize what logits truly stands for and how you calculate logits. Tensorflow.nn will take care of the rest.


optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
In this example, the author used the GradienDescentOptimizer. 0.5 is the learning rate.

Screen Shot 2017-06-06 at 10.05.18 PM


# Predictions for the training, validation, and test data.
# These are not part of training, but merely here so that we can report
# accuracy figures as we train.
 train_prediction = tf.nn.softmax(logits)
 valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
 test_prediction  = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)

Last but not least, they called the softmax function on training/valid/testing dataset in order to calculate the predictions in order to measure the accuracy.

In the next post, we will see how tensorflow frame work will iterate through batches of training dataset and improve the accuracy as we learn.