Heap
快速从一堆数据中找到极值
用途:维护一个变化的数据集的最优值。
堆都是完全二叉树 逻辑上,堆是一个complete binary tree,即除了最后一层外都是满的,最后一层尽可能往左排。物理上,可以把Heap存在array里,因为它具有complete binary tree的性质。
Left Child Node Index = Parent Node Index *2 +1
Right Child Node Index = Parent Node Index *2 +2
Parent Index = (NodeIndex-1)/2
2. 堆序性
Min-Heap: 每个node的val都比自己孩子节点小(或等于)
Max-Heap:每个node的val都比自己孩子节点大(或等于)
Heap上也有两个基本操作,和stack一样,push和pop
Push/Insert
如果push到minheap的值是一个大的数,可以直接append;
如果push到minheap的值是一个比root还要小的数,sift-up operation;
第一优先级是去满足complete binary tree的性质,先往后面加。
class Heap (object):
def __init__(self):
self.array = []
def sift_up(array, index):
parent_idx = (index-1)/2
if parent_idx < 0 or array[index] > array[parent_idx]:
return
array[index], array[parent_idx] = array[parent_idx], array[index]
sift_up(array, parent_idx)
def push(self, val):
self.array.append(val)
self.sift_up(self.array, len(self.array)-1)
arr = [0,1,5,6,8,-1]
sift_up (arr, len(arr)-1)
print(arr)Time Complexity: O (log n)
Update
如果update value>original, 和下面的min换,如果update value<original,和上面换
Time Complexity: O (log n)
Pop
Pop后为了维护complete binary tree的性质,把最后一个最大的往上提,它和堆顶的换了,然后慢慢Sift-down,接下来update; 不能直接update None,不然不能保证最后一层的None在最后还是中间
Time Complexity: O (log n)
Initialization/Heapify
从Array初始化一个Heap 其实是一个Sift down
Time Complexity: O(n) 等比数列求和、求极限推导
Smallest K Elements
soln1: sort
O(nlogn)
soln2: quick select/partition
Average Case: O(n+n/2+n/4+n/8+...+1) = O(n)
Worst Case: O(n+ (n-1) + (n-2) + ... ) = O(n^2)
O(kn)
Soln3: 小根堆 Space O(n)
Step 1: Heapify all elements O(n)
Step 2: Call pop() k times to get the k smallest elements. O(klogn)
Time Complexity Total: O(n+klogn)
Soln4: 更适合online 大根堆 Space O(k)
任何时候maxheap里的都是当前的smallest k stream的数据进来,任何时候fixed window with size of k里的元素都是当前的k个最小,之后再有新的stream,都会update when appropriate 流数据找current min之类的也是类似的算法
Step 1: Heapify the first k elements to form a max-heap of size k O(k)
Step 2: Iterate over the remaining n-k elements one by one.
Compare with the largest element of the previous smallest k candidates.
case 1: new element>=top: ignore
case 2: new element<top: update (top ->new element) O((n-k)logk)
Total O(k+(n-k)logk)
O(n+klogn)
O(k+(n-k)logk)
k<<n
O(c*n)
O(nlogk)
k~~n
O(nlogn)
O(nlogn)
k~~n指的在一个数量级
Top K Frequent Elements Leetcode 347
Solution 1: 最直接的想法,max heap,全部进heap,取前k个。用-frequency的排序+min heap代替max heap的实现。
Solution 2:用k-size min heap,不断push,把最小的弹出去。注意输出时需要reverse一下。
在这种方法下不需要每次heapify
Merge K sorted array
Step 1: Create a min heap, put the first element of each array into the heap
Step 2: Each time pop an element from the heap, and then push the next element into the heap.
Time: O(2K+nlogk)
k读入、k来heapify、nlogk来push和popk次
Merge in Stream
面试题
Linkedin:Can you say something you know about stack and heap?
数据结构的stack:LIFO,
操作系统的stack: 函数调用stack存reference,dfs
数据结构的heap:priority queue
操作系统的heap:存object,申请内存来用
Last updated