Complete Binary Tree, Segment Tree, Trie Tree

字典树,segment tree

Complete Binary Tree

最后一层以上的都满,最后一层没有泡泡 complete binary tree一定是balanced binary tree 于是就可以用list或者array来存储complete binary tree,它具有以下性质:

left_child_node_index = parent_node_index*2+1 right_child_node_index = parent_node_index*2+2 parent_node_index = (child_node_index-1)/2

常用表示
class GeneralTreeNode:
    def __init__(self, children, value):
        self.children = children #list
        self.value=value

Heap也可以用数组表示,因为本质上heap是一个complete binary tree

Segment Tree

Question

Given an array, (1) update() at given index (2) query range sum, get sum of number from index start to end.

这是一个设计数据结构的问题,涉及两种操作,分别有不同的时间复杂度。所以会涉及不同的use-case,比如有些情况更改的比较少,但是求和的次数比较多,这个时候下面的时间/空间复杂度就不行了。

O(1) . update(index, value): array[index]=value O(n). query_sum(start, end): return sum of array[start:end]

这个时候就有了第二种做法,update O(n), prefix_sum O(1); 并且还有额外的space cost O(n). 在sum的时候预先求好prefix_sum,预先存好了array[0:index]的和。比如对于[1,2,3,5],它的prefix_sum是[1,3,6,11]。prefix_sum[i]=prefix_sum[i-1]+array[i], prefix_sum[0]=array[0]. 此时如果想知道某个区间的和,只需要query_sum(start, end): prefix_sum[end]-prefix_sum[start-1].

这两种方式都相对极端,要么极端偏update,要么极端偏sum。于是就有了第三种解法,平均了这两种,让两者的时间和空间复杂度都是O(logn).

SegmentTree内部存list的原始index信息,根节点是本身的array,往下的treenode是这个array的均分,直到leafnode每个children都是一个元素。这也是为什么叫segmenttree:每个leafnode都是单个数字的和,treenode都是array的某一个部分。

在这样一个结构下,如果(1)想要更新一个数字,那么就要更新:root、它的某一个孩子、受影响的那个node、... 直到最后一个叶子。每一层又一个node会被影响,有height这么多,所以更新的时间复杂度是O(logn). (2)求sum,先从root开始看start和end的区间,看左右分别需要分到多少区间。对于某一边可能只走了一步或者几步,对于另一边可能走了整个树的高度,所以时间复杂度也是O(n).

class SegmentTreeNode:
        def __init__(self, start, end):
                self.start = start
                self.end = end
                self.sum = 0
                self.left = None
                self.right = None

class SegmentTree:
        def __init__(self, array):
                self.array = array
                self.root = self._build_segment_tree(array, 0, len(array)-1)
        
        def _build_segment_tree(self, array, start, end):
                # sanity check
                if start > end:
                        return None
                cur = SegmentTreeNode(start, end)
               
                 #base case single number
                if start == end:
                        cur.sum=array[start]
                        return cur
                
                mid = start + (end - start)/2
                cur.left = self._build_segment_tree(array, start, mid)
                cur.right = self._build_segment_tree(array, mid+1, end)
               
                if cur.left is not None:
                        cur.sum += cur.left.sum
                if cur.right is not None:
                        cur.sum += cur.right.sum
                return cur
        
        def update(self, index, value):
                diff = value - self.array[index]
                self.array[index] = value
                cur = self.root
                while cur != None:
                        cur.sum += diff
                        mid = cur.start + (cur.end-cur.start)/2
                        if index <= mid:
                                cur = cur.left
                        else:
                                cur = cur.right
        
        def sum(self, start, end):
                return self._get_sum_from_tree(self.root, start, end)
                
        def _get_sum_from_tree(self, cur, start, end):
                if cur is None or cur.start>end or cur.end<start:
                        return 0
                
                if cur.start >= start and cur.end <= end:
                        return cur.sum
                        
                return self._get_sum_from_tree(cur.left, start, end) + self._get_sum_from_tree(cur.right, start, end)
                

# testcase
s_tree = SegmentTree([1,2,3,4,5])
print s_tree.sum(1,2)
s_tree.update(1,3)
print s_tree.sum(1,2)

Trie Tree 字典树

实际应用:搜索引擎,前缀搜索

如果有一系列word:ab, abd, abch, cg, ef. 按照list的方式存储,space是O(infinity),因为对于一个字典,它可以无限大,不停有新单词加入。可以优化的地方是很多单词拥有相同的前缀,那就可以节约前缀的空间。

root分成26个分岔,代表了26个字母,也就是分别以这26个字母为首。而每一个子叉都继续分成了26个叉。走到一个单词的结尾就为它加一个标记,比如加一个True。

class TrieTreeNode:
    def __init__(self):
        self.children = [None]*26
        self.is_end = False
        self.count = 0
    
class TrieTree:
    def __init__(self):
        self.root = TrieTreeNode()
    
    def search(self, word):
        cur = self.root
        for i in range(len(word)):
            index = ord(word[i])=ord('a')
            if cur.children[index] is None:
                return None
            else:
                cur = cur.children[index]
        return cur.is_end
    
    def add(self, word):
        cur = self.root
        for i in range(len(word)):
            index = ord(word[i])-ord('a')
            if cur.children[index] is None:
                cur.children[index] = TrieTreeNode()
            cur.count += 1
            cur = cur.children[index]
        cur.is_end = True
    
    def delete(self, word):
        self._delete_helper(self.root, word, 0)
        
    def _delete_helper(self, cur, word, index):
        if index==len(word):
            cur.is_end = False
            return
        i = ord(word[index])-ord('a')
        self._delete_helper(cur.children[i], word, index+1)
        
        if cur.children[i].count == 0:
            cur.children[i] = None
        cur.count -= 1

Last updated