Heap

快速从一堆数据中找到极值

用途:维护一个变化的数据集的最优值。

  1. 堆都是完全二叉树 逻辑上,堆是一个complete binary tree,即除了最后一层外都是满的,最后一层尽可能往左排。物理上,可以把Heap存在array里,因为它具有complete binary tree的性质。

Left Child Node Index = Parent Node Index *2 +1

Right Child Node Index = Parent Node Index *2 +2

Parent Index = (NodeIndex-1)/2

2. 堆序性

Min-Heap: 每个node的val都比自己孩子节点小(或等于)

Max-Heap:每个node的val都比自己孩子节点大(或等于)

Heap上也有两个基本操作,和stack一样,push和pop

Push/Insert

如果push到minheap的值是一个大的数,可以直接append;

如果push到minheap的值是一个比root还要小的数,sift-up operation;

第一优先级是去满足complete binary tree的性质,先往后面加。

class Heap (object):
    def __init__(self):
        self.array = []

    def sift_up(array, index):
        parent_idx = (index-1)/2
        if parent_idx < 0 or array[index] > array[parent_idx]:
            return
        array[index], array[parent_idx] = array[parent_idx], array[index]
    
        sift_up(array, parent_idx)

    def push(self, val):
        self.array.append(val)
        self.sift_up(self.array, len(self.array)-1)    

arr = [0,1,5,6,8,-1]
sift_up (arr, len(arr)-1)
print(arr)

Time Complexity: O (log n)

Update

如果update value>original, 和下面的min换,如果update value<original,和上面换

Time Complexity: O (log n)

Pop

Pop后为了维护complete binary tree的性质,把最后一个最大的往上提,它和堆顶的换了,然后慢慢Sift-down,接下来update; 不能直接update None,不然不能保证最后一层的None在最后还是中间

Time Complexity: O (log n)

Initialization/Heapify

从Array初始化一个Heap 其实是一个Sift down

Time Complexity: O(n) 等比数列求和、求极限推导

Smallest K Elements

soln1: sort

O(nlogn)

soln2: quick select/partition

Average Case: O(n+n/2+n/4+n/8+...+1) = O(n)

Worst Case: O(n+ (n-1) + (n-2) + ... ) = O(n^2)

O(kn)

Soln3: 小根堆 Space O(n)

Step 1: Heapify all elements O(n)

Step 2: Call pop() k times to get the k smallest elements. O(klogn)

Time Complexity Total: O(n+klogn)

Soln4: 更适合online 大根堆 Space O(k)

任何时候maxheap里的都是当前的smallest k stream的数据进来,任何时候fixed window with size of k里的元素都是当前的k个最小,之后再有新的stream,都会update when appropriate 流数据找current min之类的也是类似的算法

Step 1: Heapify the first k elements to form a max-heap of size k O(k)

Step 2: Iterate over the remaining n-k elements one by one.

Compare with the largest element of the previous smallest k candidates.

case 1: new element>=top: ignore

case 2: new element<top: update (top ->new element) O((n-k)logk)

Total O(k+(n-k)logk)

O(n+klogn)

O(k+(n-k)logk)

k<<n

O(c*n)

O(nlogk)

k~~n

O(nlogn)

O(nlogn)

k~~n指的在一个数量级

Top K Frequent Elements Leetcode 347

Solution 1: 最直接的想法,max heap,全部进heap,取前k个。用-frequency的排序+min heap代替max heap的实现。

Solution 2:用k-size min heap,不断push,把最小的弹出去。注意输出时需要reverse一下。

在这种方法下不需要每次heapify

Merge K sorted array

Step 1: Create a min heap, put the first element of each array into the heap

Step 2: Each time pop an element from the heap, and then push the next element into the heap.

Time: O(2K+nlogk)

k读入、k来heapify、nlogk来push和popk次

Merge in Stream

面试题

Linkedin:Can you say something you know about stack and heap?

数据结构的stack:LIFO,

操作系统的stack: 函数调用stack存reference,dfs

数据结构的heap:priority queue

操作系统的heap:存object,申请内存来用

Last updated