random_state is a NumPy random number generator (an instance of numpy.random.RandomState, not the class itself), which can generate random numbers following many types of distributions. In this case, the author used random_state.rand(d0), with d0 set to n_total_samples, to generate a one-dimensional array with n_total_samples elements, each drawn uniformly from [0, 1).
Meanwhile, they also pre-populate sample_mask, which has the same shape as rand, with 0s.
Then, inside the for loop, they decide whether to flip each sample's mask from 0 to 1 using a pretty interesting condition:
rand[i] * (n_total_samples - i) < (n_total_in_bag - n_bagged)
or, if we rearrange it:
rand[i] < (n_total_in_bag - n_bagged) / (n_total_samples - i)
n_total_in_bag - n_bagged => how many samples still need to be bagged
n_total_samples - i => how many samples are still left to be considered
At the beginning, the numerator equals n_total_in_bag and the denominator equals n_total_samples, so the threshold is simply the fraction of samples we want to bag. The denominator decreases by 1 on every iteration, while the numerator decreases by 1 only when a sample gets masked. So whenever a sample is skipped, the threshold goes up and the next sample is more likely to be picked; whenever a sample is picked, the threshold goes down a little (as long as it was below 1). When the condition is met, we flag the sample as masked and add 1 to n_bagged. Once n_bagged equals n_total_in_bag, the threshold becomes literally 0 and the condition will never be met again, because the random number generator only spits out numbers in [0, 1). At the other extreme, if the number still needed equals the number of samples left, the threshold is exactly 1 and every remaining sample will definitely be masked. Because of that, the number still needed can never exceed the number left (assuming n_total_in_bag <= n_total_samples), and the mask always ends up with exactly n_total_in_bag ones.
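To make the loop concrete, here is a minimal pure-Python sketch of the logic described above. It is my own reconstruction for illustration, not the library's actual code: the function name random_sample_mask and the seed are assumptions, and only the variable names come from the snippet being discussed.

```python
import numpy as np


def random_sample_mask(n_total_samples, n_total_in_bag, random_state):
    """Illustrative re-implementation of the masking loop (hypothetical helper)."""
    # One uniform draw in [0, 1) per sample.
    rand = random_state.rand(n_total_samples)
    # Mask starts out all False (all 0s).
    sample_mask = np.zeros(n_total_samples, dtype=bool)
    n_bagged = 0
    for i in range(n_total_samples):
        # Pick sample i with probability (still needed) / (still left).
        if rand[i] * (n_total_samples - i) < (n_total_in_bag - n_bagged):
            sample_mask[i] = True
            n_bagged += 1
    return sample_mask


rng = np.random.RandomState(0)  # the seed is an arbitrary choice
mask = random_sample_mask(10, 4, rng)
print(mask.sum())  # 4
```

Whatever the seed, the mask should contain exactly n_total_in_bag True values, which is the nice property of this condition compared with a plain rand < fraction test.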
Here is a screenshot to demonstrate how np.array index masking works. As you can see, arrays support boolean masks as well as integer indexing and slicing. However, there is one operator, the “~” sign, which Python users might not come across very often. It is the invert operator.
yeah, this is exactly what I mean by inversion!
Joke aside, “~” is the same as operator.__invert__. Regarding bitwise operations, you can find the definitions in the Python wiki documentation on bitwise operations. For integers, “~x” is the same as -x-1, so ~3 => -4 and ~-4 => 3. On a boolean NumPy array such as sample_mask, “~” flips every element, turning True into False and vice versa.
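A quick sketch to check this in an interpreter. The array values are made up for illustration; only the name sample_mask comes from the code being discussed.

```python
import operator

import numpy as np

# On plain integers, ~x is the bitwise complement, i.e. -x - 1.
print(~3, ~-4)             # -4 3
print(operator.invert(3))  # -4

# On a boolean NumPy array, ~ flips each element instead.
sample_mask = np.array([True, False, True, False])
inverted = ~sample_mask
print(inverted.tolist())   # [False, True, False, True]
```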
You can also find more information about indexing in the ndarray indexing documentation.
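For readers who would rather run the indexing examples than squint at a screenshot, here is a small sketch. The array X and the names in_bag and out_of_bag are hypothetical, chosen only to show how a boolean mask and its inverse split an array into two parts.

```python
import numpy as np

X = np.arange(10, 60, 10)  # [10 20 30 40 50]
sample_mask = np.array([True, False, True, False, True])

# Boolean mask: keep the positions where the mask is True.
in_bag = X[sample_mask]
# Inverted mask: keep the positions where the mask is False.
out_of_bag = X[~sample_mask]
# Integer (fancy) indexing and ordinary slicing also work.
by_index = X[[0, 2, 4]]
first_two = X[:2]

print(in_bag.tolist())      # [10, 30, 50]
print(out_of_bag.tolist())  # [20, 40]
print(by_index.tolist())    # [10, 30, 50]
print(first_two.tolist())   # [10, 20]
```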