Python functools lru_cache

lru_cache stands for least recently used cache. The value of any sort of cache is to save time by avoiding repeated computation: you store a computed value in a temporary place (the cache) and look it up later rather than recomputing everything. functools is a module in Python’s standard library, and it provides a decorator, lru_cache, which is designed to help Python developers do exactly that.

So I have a dummy problem here. Instead of the Fibonacci problem, it is even more exhaustive: each new item in the array needs to be the sum of all the previous items plus 1. The amount of work grows exponentially with n.

Screen Shot 2019-07-28 at 10.09.53 PM

Clearly, computing n = 24 already takes more than 6 seconds. However, after decorating the function with lru_cache, it takes as little as 4 milliseconds. You can also inspect the cache info; the sheer number of hits is the secret of why the function has been sped up so much.
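The original code only survives as a screenshot, so here is a minimal sketch of what the comparison might look like. The function names and the choice of n = 18 are my own assumptions, and the timings will differ from machine to machine.

```python
import time
from functools import lru_cache


def slow_total(n):
    """Item n is 1 plus the sum of all previous items (no caching)."""
    return 1 + sum(slow_total(i) for i in range(n))


@lru_cache(maxsize=None)
def fast_total(n):
    """Same recurrence, but each distinct n is computed only once."""
    return 1 + sum(fast_total(i) for i in range(n))


start = time.perf_counter()
slow_result = slow_total(18)          # roughly 2**18 recursive calls
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast_result = fast_total(18)          # one real computation per n in 0..18
fast_seconds = time.perf_counter() - start

print(slow_result == fast_result)     # both versions agree on the value
print(slow_seconds, fast_seconds)     # timings vary by machine
print(fast_total.cache_info())        # hits and misses show how much work was reused
```

Every call that lands on an already computed n is a hit, and a hit replaces an entire subtree of recursive calls, which is where the speedup comes from.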

Screen Shot 2019-07-28 at 9.35.45 PM

The performance gain is outstanding. Given the definition of dynamic programming, this is almost a necessity: developers can focus their effort on decomposing the problem into subproblems rather than worrying about manually maintaining a hash table somewhere for lookup.

Screen Shot 2019-07-28 at 8.37.29 AM

Attached is an example of how, by using the lru_cache decorator, you can come up with a solution that outperforms 100% of the Python submissions on LeetCode in both execution time and space.

If you are interested in looking under the hood, it isn’t very complex: functools.py contains a pure-Python implementation of lru_cache (CPython normally swaps in an equivalent C version at import time). However, developers are only supposed to interact with the cache through cache_clear or cache_info, because it is believed that messing with the cache directly in a threaded environment would cause unnecessary trouble. I tried to access some of its private, internal attributes but failed to reach the cache itself, because the cache lives within the namespace of the wrapper and is not accessible from outside the function. It might be an interesting challenge to figure out how to get that working.
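As a small sketch of the public surface that is available, the snippet below uses cache_info, cache_clear and the __wrapped__ attribute of a decorated function. The square function is just a hypothetical stand-in.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def square(x):
    return x * x


square(3)
square(3)                        # second call is served from the cache
info = square.cache_info()       # named tuple: hits, misses, maxsize, currsize
print(info)

square.cache_clear()             # empties the cache and resets the counters
cleared = square.cache_info()
print(cleared)

# __wrapped__ is the original, undecorated function; calling it bypasses the cache.
uncached_result = square.__wrapped__(4)
print(uncached_result)
```

The cached values themselves are deliberately not exposed; these three names are the documented way in.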

James Powell 2017 Pydata talk – Python Expert

Mr. James Powell gave this great talk at PyData Seattle 2017 about some of the advanced features and concepts in Python (he uses Python 3, but most features also apply to Python 2).

Here is a list of the highlights that Mr. Powell covered, which I want to keep here for later reference:

  • Data model – “dunder” (double underscore) methods
  • Library/user – assert, metaclass, subclass
  • Decorators – @ is a handy way of wrapping a function in a wrapper function
  • Generators – sequential, intermittent and memory-efficient: yield, __iter__, __next__
  • Context managers – __enter__, __exit__
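To make the list a bit more concrete, here is a compact sketch that touches the data model, decorators, generators and context managers. All class and function names are made up for illustration and are not taken from the talk.

```python
from contextlib import contextmanager


class Vector:
    """Data model: dunder methods hook a class into built-in syntax."""

    def __init__(self, *items):
        self.items = items

    def __len__(self):                     # called by len(v)
        return len(self.items)

    def __add__(self, other):              # called by v + w
        return Vector(*(a + b for a, b in zip(self.items, other.items)))


calls = []


def logged(func):
    """Decorator: writing @logged above a def is shorthand for func = logged(func)."""
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def add(a, b):
    return a + b


def countdown(n):
    """Generator: values are produced lazily, one per next() call."""
    while n > 0:
        yield n
        n -= 1


@contextmanager
def tracked(events):
    """Context manager: setup before the with-block, teardown after it."""
    events.append("enter")
    try:
        yield events
    finally:
        events.append("exit")


total = Vector(1, 2, 3) + Vector(0, 1, 1)
print(len(total), total.items)
print(add(2, 3), calls)
print(list(countdown(3)))

events = []
with tracked(events):
    events.append("body")
print(events)
```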

In the end, I came across the glossary page in Python’s documentation, which doesn’t hurt to use as a checklist or challenge.

CDN and Github – jsDelivr

Content Delivery Network (CDN)

In HTML, many tags need to reference external resources: script tags reference JavaScript files, and link tags reference CSS stylesheets. You can host all the necessary dependencies as static files in the same environment as the site and include them by relative path, or you can use a complete URL pointing to a file hosted anywhere on the internet (usually on a CDN, a Content Delivery Network).

There are several benefits to using a CDN:

  1. Effectively offload the serving of those files to CDN servers (load balancing, performance optimization, etc.)
  2. The libraries and content are more abundant and complete in a central place like a CDN, so developers don’t have to shop around on the internet, download each dependency and organize the commonly used ones on their own site.

There are also cases in which you don’t even have full control over the site you are working on. For example, you could be developing a subsection of an important website where you only have limited permission to edit certain sections, so uploading dependencies is not an option. Also, if you are writing a Chrome extension, you could be injecting scripts into the target sites to manipulate the page, and it is not realistic for you to upload your dependencies to something like github.com/mydependency.js.

Of course, a CDN goes way beyond serving small scripts and can serve any kind of content.

JSDelivr

There are several sites like cdnjs.com which host plenty of JavaScript modules and libraries. I came across a site called jsDelivr which looks like cdnjs.com but has a few cool features; for example, you can refer to files in any GitHub repo.

Screen Shot 2019-07-04 at 12.05.22 PM

Of course, you can refer to any file on GitHub directly by using the link to the raw file hosted on GitHub. However, GitHub is just not meant to serve as a CDN, and this solution is sometimes not as straightforward, depending on the file type.

Screen Shot 2019-07-04 at 12.27.50 PM

By using jsDelivr, you can simply rewrite the GitHub path with a jsDelivr URL prefix and you are good to go. I have managed to replace all my references to certain GitHub material with jsDelivr links and it works great.
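As a rough sketch of that rewrite, the helper below turns a raw GitHub URL into the jsDelivr form https://cdn.jsdelivr.net/gh/user/repo@ref/path. The function name and the example repository are hypothetical, and the helper assumes the branch or tag name contains no slash.

```python
from urllib.parse import urlparse


def github_raw_to_jsdelivr(raw_url):
    """Rewrite https://raw.githubusercontent.com/<user>/<repo>/<ref>/<path>
    as https://cdn.jsdelivr.net/gh/<user>/<repo>@<ref>/<path>."""
    parsed = urlparse(raw_url)
    if parsed.netloc != "raw.githubusercontent.com":
        raise ValueError("expected a raw.githubusercontent.com URL")
    # Assumes <ref> (branch or tag) has no "/" in it.
    user, repo, ref, path = parsed.path.lstrip("/").split("/", 3)
    return f"https://cdn.jsdelivr.net/gh/{user}/{repo}@{ref}/{path}"


cdn_url = github_raw_to_jsdelivr(
    "https://raw.githubusercontent.com/example-user/example-repo/master/dist/app.js"
)
print(cdn_url)
```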

 

Laoshu50500

I know this post might be a little unorthodox but I just cannot wait to share this amazing Youtube channel laoshu50500 with the folks who might read my blog.

As a non-native English speaker, I have come across plenty of practitioners who claim to be bilingual, trilingual or multilingual. Most of them mastered their foreign languages either by growing up in a diverse environment or by having the privilege of attending some sort of school and receiving formal training.

The YouTuber Moses totally redefined my impression of language study by posting videos about how he practices foreign languages through self-teaching and constant communication. He brings so much happiness to the people around him, strangers he has just met, by recognizing their identity, respecting their culture and, most importantly, working hard (maybe not that hard, as he must be smart 🙂 ) to literally speak their language to show respect. It is not the fact that one guy can speak so many languages that impressed me the most. It is his humble attitude and his deep desire to practice, to learn and to communicate with another individual on such an equal basis that makes me wonder: if everyone in the world spent just a little time working hard to think and speak from a totally different identity, how much better would this world become?

code HTML and CSS using VS Code

I am testing some front-end code and saw several YouTube videos using VS Code as the IDE. As a Python developer, it can be overwhelming at first glance to see SO many lines of code. However, it is like magic to see how fluently front-end developers leverage tools like VS Code and its extensions to pretty much auto-generate the code they want with only a few keystrokes. This is a post to show some of the shortcuts that I came across today.

I do have to admit that VS Code’s default dark theme makes it look simple and tidy. However, as you spend more time in it, you also realize that it has most of the features that you require from a heavy-duty IDE like Eclipse or PyCharm while being as extensible as Sublime.

Screen Shot 2019-06-30 at 10.37.38 PM

Like any IDE, VS Code comes with several shortcuts. Here is a printable cheatsheet which you can refer to on a constant basis, including quick comment, open, close and many others.

The most useful one for me is Cmd+K, Cmd+S, which opens the keyboard shortcuts list within VS Code. (Maybe there are so many key bindings that we need two keystrokes to get to what we want; many of the shortcuts within VS Code start with Cmd+K.)

Many of the tricks were picked up straight from the MS VS Code website, which covers basic features like auto-complete and auto-closing (HTML has lots of <whatever> and </whatever> pairs, which are easy to miss).

Can you imagine that you only need 15 characters to generate 107 worth of HTML block? This is thanks not only to IntelliSense within VS Code but, most importantly, to the Emmet abbreviations that front-end developers like so much.

Screen Shot 2019-06-30 at 10.24.27 PM

In this case, each character is the short abbreviation for certain syntax:

  • dot (.) sets the class; with no tag name given it defaults to a div tag
  • greater-than sign (>) moves one level down the DOM tree (a child element)
  • hash sign (#) sets the tag id
  • dollar sign ($) is replaced by automatic numbering
  • asterisk (*) multiplies the code block

You can refer to the Emmet website for more information.

“Sharpening the axe will not interfere with the cutting of firewood.” Finding a good editor before you start spending lots of time coding is probably time well spent.

 

 

Wikidata – Histropedia

This is a great video from Ewan McAndrew’s YouTube channel with Navino explaining how Wikidata works and, most importantly, how to visualize a timeline from a SPARQL query in Histropedia.

To learn more about Wikidata itself, which is a great data source for folks who want to tinker with natural language and knowledge bases, check out the main page of Wikidata.

Screen Shot 2019-06-29 at 9.43.03 AM

Geforce Now – Game running in the Cloud

This is a video that I took of StarCraft II on ultra settings running in the cloud thanks to GeForce NOW.

First, here are some “lowlights” of my gaming machine:

  • CPU: Processor AMD FX(tm)-6120 Six-Core Processor, 3500 Mhz, 3 Core(s), 6 Logical Processor(s)
  • GPU: GTX 1050ti (upgraded)
  • Memory: 16 GB (upgraded)

Now, let’s get to my experience of how Geforce Now surprised me.

I came across an activation code in my email inbox: Nvidia had actually granted me access to the GeForce NOW free beta. I decided to give it a try and the experience turned out to be fantastic. In essence, it offloads all the heavy computing from your gaming machine and runs the game in an Nvidia-hosted virtual environment instead. Of course, you need a reasonable and stable network connection to get the full value out of it.

My office is on the second floor and the router is on the first. The wireless internet connection is mediocre, so this test isn’t really the best representation of the full capability of GeForce NOW. I tested StarCraft II, Diablo III and battleground, and all three of them performed really well.

The lag is reduced to that of the internet connection. For StarCraft II players like me who don’t have 300 APM, that lag is trivial and doesn’t really impact the gaming experience, but I am assuming that if you are playing a competitive shooting game, those few ms might matter. Anything else should be perfectly fine. I even bought battleground on the fly, because my computer was never capable of running it and now I can play it in the cloud. I spent quite a few minutes just staring at the sky rendered by those crazy machines in the cloud.

I see this literally as a game changer: by pooling all the gaming computing power into one centralized place, this should theoretically lower the total cost compared with each household spending thousands of dollars on getting the best gear on its own. However, I don’t think a company is running a charity; it is there to maximize its shareholders’ financial benefit. As an end consumer, I know that the internet is getting faster and better (like 5G). If Nvidia asks me whether I should buy a gaming PC or use their service, I might be willing to pay a subscription for GeForce NOW if the monthly fee is close to or lower than the monthly depreciation of the hardware.

Say a gaming machine costs $2,000 and you expect to get full use out of it and replace it in three years: 2000/3/12 ≈ $55/month. Of course, you don’t buy computers only to play games, but many gamers do upgrade their gear only because of gaming performance. Also, take into consideration that you can unsubscribe if you are taking a long vacation or are busy working, so it pays back.
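The back-of-the-envelope number above can be checked in a couple of lines. The variable names are mine, and the figures are the ones from the paragraph.

```python
machine_price = 2000          # dollars, from the example above
lifetime_years = 3            # expected time before replacing the machine

monthly_depreciation = machine_price / (lifetime_years * 12)
print(round(monthly_depreciation, 2))   # 55.56
```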

Anyway, good job to Nvidia as usual and this made me wonder if our next generation will be asking the question “hey, daddy, what is that big black box? shouldn’t everything run on a TV directly?” 🙂

geforcenow

Download Geforce Now beta test

geforcenow_internet

Run a test. My internet is on the low end and far from the router but still working.

geforce_now_login

Looks like from this step, it is already running on a Windows virtual machine. I am assuming they are collecting all the information like IP address, hardware spec in order to align the cloud resource to be best compatible with the consumer terminal.

 

Works perfect for me.

Cross Correlation – Python Basics

This is a blog post to familiarize ourselves with the functions that we are going to use to calculate the cross correlation of stock prices. In this case, we are going to create some dummy time series data in which one series is the leading indicator of the other, then pull the necessary strings to detect that lead, plot it and understand how it works in the Python realm.

1. time series

Time series data is the best representation of signals like temperature history, pricing history, inventory history, balance history and pretty much any kind of history used in day-to-day life. We can either use a pandas DataFrame or, as in this case, use the Series class with the datetime field as the index.

correlation_s_a

In this case, we generated a series of 8 elements starting at 2018/01/01. Next we are going to generate another series, s_b, which is a leading indicator two days ahead of s_a.

Before we hard-code another series which is, say, one day ahead of the first series, like [0,0,1,2,3,2,1,0], let’s check whether there is any method of pd.Series that we can use. There are a whole lot of functions that can be applied to time series data, and the closest ones that might serve our purpose look like shift, tshift and slice_shift.
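The screenshot is not reproduced here, so this is a minimal sketch of how such a series could be built. The values [0,0,0,1,2,3,2,1] are an assumption inferred from the walkthrough later in this post.

```python
import pandas as pd

# Eight daily timestamps starting at 2018-01-01 serve as the index.
dates = pd.date_range("2018-01-01", periods=8, freq="D")

# Values are assumed; they match the numbers used in the cross correlation walkthrough.
s_a = pd.Series([0, 0, 0, 1, 2, 3, 2, 1], index=dates, name="s_a")
print(s_a)
```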

pandas_time_series_shift

The shift method indeed looks very powerful: it can not only keep the datetime window fixed and shift the values within it, filling the gaps with NA, but also, if required, shift the window itself by a specified frequency. The last print statement shows a perfect way to generate another series that leads s_a by two days.

After generating the leading indicator, we can put the two series side by side so that the lead is obvious. pd.concat is a really powerful function that I will dedicate another whole article to, but for now it serves the purpose of doing a full outer join of those two time series by date.
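The two behaviours of shift can be sketched as follows, assuming the same s_a values as before. Only shift is used here, since tshift and slice_shift are no longer available in recent pandas releases.

```python
import pandas as pd

dates = pd.date_range("2018-01-01", periods=8, freq="D")
s_a = pd.Series([0, 0, 0, 1, 2, 3, 2, 1], index=dates)   # assumed values

# Without freq: the index stays put, the values move down one row, the gap becomes NaN.
shifted_values = s_a.shift(1)

# With freq: the values stay put, the index itself moves two days earlier.
s_b = s_a.shift(-2, freq="D")

print(shifted_values)
print(s_b)
```

The second form is the one that produces a leading indicator: the same values, stamped two days earlier.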
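A short sketch of that side-by-side view, again with assumed values. The column labels passed through keys are my own choice.

```python
import pandas as pd

dates = pd.date_range("2018-01-01", periods=8, freq="D")
s_a = pd.Series([0, 0, 0, 1, 2, 3, 2, 1], index=dates)   # assumed values
s_b = s_a.shift(-2, freq="D")                             # same values, two days earlier

# axis=1 places the series in columns; the default join is an outer join on the index.
both = pd.concat([s_a, s_b], axis=1, keys=["s_a", "s_b"]).sort_index()
print(both)
```

Dates that exist in only one of the series show up as NaN in the other column.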

pandas_time_series_leading_two_days

As the cherry on top of the cake, this is the visualization of the two signals, with one two days ahead of the other.

plot_two_time_series_2_days_ahead

2. cross correlation

cross_correlation

Cross correlation calculates the dot product of two series for every possible shift. For example, let’s fix s_a and assume that you slide s_b from the left to the right. At the beginning, s_b is far away and there is no intersection at all.

  1. First intersection: as we move s_b to the right, the first overlap is the far-right element of s_b against the far-left element of s_a. In this case that is [1] from s_b and [0] from s_a, and the dot product is 0. Hence the first 0 in the corr variable.
  2. Second intersection: the two far-right elements of s_b, [2,1], overlap the two far-left elements of s_a, [0,0], which still gives 0.
  3. It is not until four elements overlap, [0,0,0,1] from s_a and [2,3,2,1] from s_b, that the dot product becomes 1.
  4. And so on, until the far-left element of s_b overlaps the far-right element of s_a.
  5. After that, s_b keeps moving to the right and the two series never overlap again.
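The steps above can be sketched in code. The values are assumed from the walkthrough, and sliding_dot is a hypothetical helper written only to mirror the description; np.correlate in "full" mode does the same thing.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 2, 3, 2, 1])   # s_a values (assumed)
b = np.array([0, 0, 0, 1, 2, 3, 2, 1])   # s_b values: same shape, stamped two days earlier

corr = np.correlate(a, b, mode="full")


def sliding_dot(a, b):
    """Dot product of the overlapping parts for every position of b sliding over a."""
    n, m = len(a), len(b)
    out = []
    for step in range(1, n + m):
        offset = step - m                 # where b[0] sits relative to a[0]
        total = 0
        for j in range(m):
            i = offset + j                # b[j] sits on top of a[i]
            if 0 <= i < n:
                total += a[i] * b[j]
        out.append(int(total))
    return out


manual = sliding_dot(a, b)
print(corr)
print(manual)
```

The first entry pairs the far-right element of b with the far-left element of a, and the fourth entry is the first non-zero one, just as described in the list.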

As you can see, in our dummy example the dot product is maximized when the two lists are perfectly aligned with each other vertically. However, here we are only aligning the values, so let’s take a look at the index. In this case, we can pick any element in either list. The first 0 from s_a represents 2018-01-01 and the first 0 from s_b represents 2017-12-30. Now we know that s_b is 2 days ahead of s_a purely by analyzing the cross correlation, and that is exactly how we constructed s_b in the first place, isn’t it?

In this case, we are simply calculating a sliding dot product, which is not necessarily a traditional correlation like the Pearson correlation. For example, how could a correlation be greater than 1, right? There is a good Stack Overflow question that sort of addresses this problem.

We can see that the cross correlation is maximized at the 8th position, and the length of both s_a and s_b is 8, so there is no doubt that the two series need to be perfectly aligned. Let’s take a look at another example in which the two series have different patterns and lengths.
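One way to turn the peak position into a lead expressed in days is sketched below. It assumes daily data and the same values as before, and the variable names are my own.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2018-01-01", periods=8, freq="D")
s_a = pd.Series([0, 0, 0, 1, 2, 3, 2, 1], index=dates)   # assumed values
s_b = s_a.shift(-2, freq="D")

corr = np.correlate(s_a.values, s_b.values, mode="full")
best = int(np.argmax(corr))

# In "full" mode, position len(s_b) - 1 means the value arrays line up element by element.
value_offset = best - (len(s_b) - 1)

# At the peak, s_b's first timestamp lines up with the s_a timestamp value_offset days
# after s_a's first one; the difference between the two is the lead.
lead = s_a.index[0] + pd.Timedelta(days=value_offset) - s_b.index[0]
print(best, value_offset, lead)
```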
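To see the difference, here is a sketch that compares the raw sliding dot product with a normalized version. The normalize helper is my own: it centers each series on its mean and scales it to unit length, which keeps every value between -1 and 1 and makes the aligned position equal to the Pearson coefficient.

```python
import numpy as np

a = np.array([0, 0, 0, 1, 2, 3, 2, 1], dtype=float)   # assumed values
b = a.copy()

raw = np.correlate(a, b, mode="full")


def normalize(x):
    """Subtract the mean, then divide by the Euclidean norm."""
    centered = x - x.mean()
    return centered / np.linalg.norm(centered)


scaled = np.correlate(normalize(a), normalize(b), mode="full")
pearson = np.corrcoef(a, b)[0, 1]

print(raw.max())                      # raw peak, well above 1
print(scaled.max())                   # normalized peak, 1 up to rounding
print(scaled[len(b) - 1], pearson)    # aligned position matches Pearson
```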

cross_correlation_different_length

The cross correlation is maximized when s_b is shifted to the right by 7 in this case, which is exactly when the maximum of s_b aligns with the maximum of s_a.
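The data behind the screenshot is not available, so the sketch below uses made-up arrays of different lengths purely to show the mechanics. In this hypothetical data the peak happens to line up the two maxima, which is not guaranteed in general.

```python
import numpy as np

long_signal = np.array([0, 0, 0, 0, 0, 1, 2, 5, 2, 1, 0, 0])   # hypothetical, length 12
short_signal = np.array([1, 4, 1])                              # hypothetical, length 3

corr = np.correlate(long_signal, short_signal, mode="full")
best = int(np.argmax(corr))

# Convert the position in the "full" output into how far short_signal was moved right.
offset = best - (len(short_signal) - 1)

# Distance between the two maxima, for comparison.
peak_gap = int(np.argmax(long_signal)) - int(np.argmax(short_signal))
print(len(corr), best, offset, peak_gap)
```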

cross_correlation_different_length_max

3. summary

Cross correlation is useful when you want to find the position (lag or lead) at which two time series match best, and the two series don’t necessarily have to share the same length.

(Note: don’t confuse it with the Pearson correlation; cross correlation doesn’t necessarily have to be between -1 and 1.)