Select random K elements in the size n array (return first k elements)
Get a k-permutation of the array of size n
Sampling
Unlimited Data Flow
How to sample for an unlimited data flow and we are required to return 1 random number among all numbers read so far, such that the probability of returning any element read so far is 1/n?
Give you APIs to read the streaming data from the beginning to the end: HasNext(), GetNext().先猜测一个median值,比如1000. 第一轮:过一遍,得到min、max、size、>1000、<1000;第二轮:知道了小于1000和大于1000的个数,用(max+1000)/2下一个median;
Randomization
用random7生成random5:
如果是6或者7,就直接删了; P(0 can be returned) = 1/7*5/7=1/5
Given 1 million urls, find 95th percentile of length
Idea: use a bucket to count how many urls whose length falls into the bucket.
bucket=10 1000urls
bucket=11 1050urls
...
bucket=4100
bucket[i]=#of urls with length i
def rand(x)
count=0
cur=x
while cur>0: #等于在做log 找到需要几位
cur=cur/2
count+=1
while True:
sum=0
for i in range(count):
sum=sum*2+rand2()
if sum<x: #为什么写成while true 因为我们只在实验成功时才会返回,否则会继续做。
return sum
bucket=[0 for i in range(4101)] #[0...4100]
for url in urls:
buckets[len(urls)]+=1
num = 0.95 * 100000
total_so_far=0
for i in xrange(4101): #i is the url's length
total_so_far += bucket[i]
if total_so_far>=num:
print i
break