Advanced Sorting
Merge Sort 归并排序
假如有一个sort好的array的前后半边,就可以用谁小移谁的方式sort完整个array
子问题:如何merge两个sorted array
def merge(array1, array2):
i=j=0
results=[]
while i<len(array1) and j<len(array2):
if array1[i]<array2[j]:
results.append(array1[i])
i+=1
else:
results.append(array2[j])
j+=1
while i<len(array1):
results.append(array1[i])
i+=1
while j<len(array2):
results.append(array2[j])
j+=1
return results
#Time: O(n)
#对于这个function来说 Space: O(1)
# slicing
def merge_sort(array):
if len(array)==0 or len(array)==1:
return array
mid = len(array)/2
left = merge_sort(array[:middle]) #O(n) space due to slicing
right = merge_sort(array[middle:]) #O(n) space due to slicing
return merge(left, right) #调用一次merge是O(n)Time,在这里调用了logn次
时间复杂度
Upper Half to split: Time O(n)=1+2+4+...n/2 Lower Half to merge: Time O(nlogn) 每一层merge的时间复杂度是O(n) 一共有h=logn层 Total Time Complexity O(nlogn)
空间复杂度
如果用indexing,传index, 那么只需要分析call stack上的额外空间。 在等待line7结果时,只有最左边的直上直下,这个时候的call stack上是一些mid,总共logn层,所以最左边的那一列的空间复杂都是O(logn) 但是这还不是最占空间的时刻; 当每一层的调用都在line7结束了,在等待line8的right result时,也就是上图中红色的部分已经return好了,在left等待着right有结果来merge,这个时候红色部分+粉色部分的空间复杂是:红色O(n)=1+2+4+...n/2,粉色O(logn), 总共O(n). 直到最后,被merge之前的最后一瞬,都有O(n) space。
如果是每一步都new一个新的array,那这其实是overhead,虽然同样是O(n) space,但是如果用helper array,可以减少每一次创建和GC的overhead.
Followup: Linked List
时间
Upper Half to split: Time O(nlogn)=n+n+n+...n Lower Half to merge: Time O(nlogn) 每一层merge的时间复杂度是O(n) 一共有h=logn层 Total Time Complexity O(nlogn)
空间
每层 O(1) 直上直下粉色路径的空间复杂都之和 空间改指针 所以O(logn)
Convert a String A1B2C3D4 to ABCD1234
只需要调整merge的那一步,让字母排在数字前面
Convert a String ABCD1234 to A1B2C3D4,要求in place
切一刀,交换一下位置;再切一刀,交换一下位置
找m, lm, rm
Quick Sort 快排
选好一个pivot之后就可以把pivot左边的都放比pivot小的,pivot的右边都放比pivot大的;再分别对左右进行重复操作,就可以让整体sort好
一个挡板(赶紧忘了这个!)
store index是一个隔板,左边都是比pivot小的,右边都是比pivot大的。所以一开始store index是从0开始移动
(..., store_index): <pivot
[store_index, i): >pivot
[i,...) unknown
def partition (lst, start, end, pivot_index):
lst[pivot_index], lst[end] = lst[end], lst[pivot_index]
store_index = start
pivot = lst[end]
for i in range (start, end):
if lst[i]<pivot:
lst[i], lst[store_index] = lst[store_index], lst[i]
store_index+=1
lst[store_index],lst[end]=lst[end], lst[store_index]
return store_index
from random import randrange
def quick_sort(lst, start, end):
if start>=end:
return
pivot_index = randrange(start, end+1)
new_pivot_index = partition(lst, start, end, pivot_index)
quick_sort(lst, start, new_pivot_index-1)
quick_sort(lst, new_pivot_index+1, end)
Time: Best Case and On average, both O(nlogn)
为什么Average是O(nlogn)
下面这个推导的最后结论
最坏的情况不取决于input本身是否有序,而是取决于每次的pivot都选的特别差,比如每次都选到了最大的或者最小的,导致每一次都是n-1比它小,这就成了一个直上直下的一叉树,每一层都需要做n次交换,高度是n, worst case O( )
Space: O(logn) worst case O(n) 这次是只取决于高度了,因为每层的call stack上只分配了常数级别的空间(pivot_index)
class Solution(object):
def quickSort(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
if not array or len(array)<=1:
return array
return self.helper(array, 0, len(array)-1)
def helper(self, array, start, end):
from random import randrange #保证稳定的,防止被hacker攻击
if start>=end:
return
pivot_index=randrange(start, end+1)
new_index=self.partition(array, pivot_index, start, end)
self.helper(array, start, new_index-1)
self.helper(array, new_index+1, end)
return array
def partition(self, array, index, start, end):
store_index=start
array[end],array[index]=array[index],array[end]
pivot=array[end]
for i in range (start, end):
if array[i]<pivot:
array[i], array[store_index] = array[store_index],array[i]
store_index+=1
array[store_index], array[end] = array[end], array[store_index]
return store_index
class Solution(object):
def quickSort(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
if not array or len(array)<=1:
return array
left, right = 0, len(array)-1
self.helper(array, left, right)
return array
def helper(self, array, left, right):
if left>=right:
return
pivotIdx = self.partition(array, left, right)
self.helper(array, left, pivotIdx-1)
self.helper(array, pivotIdx+1, right)
def partition(self, array, left, right):
from random import randrange
pivotIdx=randrange(left, right+1)
pivot=array[pivotIdx]
array[pivotIdx], array[right] = array[right], array[pivotIdx]
leftbound, rightbound = left, right-1
while leftbound<=rightbound:
if array[leftbound]<pivot:
leftbound+=1
elif array[rightbound]>=pivot:
rightbound-=1
else:
array[leftbound],array[rightbound]=array[rightbound],array[leftbound]
leftbound+=1
rightbound-=1
array[leftbound], array[right] = array[right], array[leftbound]
return leftbound
2个挡板,3个区域,把非0放在左边,0放在右边
class Solution(object):
def moveZero(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
if len(array)<=1:
return array
end = len(array)-1
leftBound, rightBound = 0, end
while leftBound<=rightBound:
if array[leftBound]!=0:
leftBound+=1
elif array[rightBound]==0:
rightBound-=1
else:
array[rightBound], array[leftBound] = array[leftBound],array[rightBound]
leftBound+=1
rightBound-=1
return array
Rainbow Sort I
3个挡板,4个区域,三种颜色-1,0,1的顺序
-1 -1 -1 -1 0 0 0 0 0 -1 0 1 1 1 1 i j k
[0, i) -1 [i, j) 0 [j, k] unknown (k, n-1] 1 array[j]==-1: swap with array[i], i++, j++ j可以加,因为可以保证i换过来的一定是0 array[j]==0: j++ array[j]==1: swap with array[k], k-- j不加,因为k的物理意义是unknown 后面的都还不知道呢 不能加 当j和k错开的时候停止,不是重叠,因为我们需要unknown里完全没有数字,所以只有在它俩交错才能保证里面没有东西。
Time: O(n)
class Solution(object):
def rainbowSort(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
if not array or len(array)<=1:
return array
ones, zeros, negs = len(array)-1, 0, 0
while zeros<=ones:
if array[zeros]==-1:
array[zeros],array[negs]=array[negs],array[zeros]
zeros+=1
negs+=1
elif array[zeros]==0:
zeros+=1
else:
array[zeros],array[ones]=array[ones],array[zeros]
ones-=1
return array
Find the k-th largest element in an array
Soln1: Brute Force: Sort the array in descending order, and return the element at index (k-1). O(nlogn)
Soln2: Heap
Soln3: QuickSort Average O(n) Worst Case
每次扔一半 左边是比pivot大的,右边是比pivot小的
import random
def find_kth_largest(arr, k):
left, right = 0, len(arr)-1
while left<=right:
pivot_idx=random.randint(left, right)
new_pivot_idx=partition(left, right, pivot_idx, arr)
if new_pivot_idx == k-1:
return arr[new_pivot_idx]
elif new_pivot_idx > k-1:
right = new_pivot_idx -1
else:
left = new_pivot_idx + 1
def partition(left, right, pivot_idx, arr):
pivot = arr[pivot_idx]
new_pivot_idx = left
arr[pivot_idx], arr[right] = arr[right], arr[pivot_idx]
for i in range(left, right):
if arr[i]>pivot:
arr[i], arr[new_pivot_idx]=arr[new_pivot_idx], arr[i]
new_pivot_idx += 1
arr[right], arr[new_pivot_idx] = arr[new_pivot_idx], arr[right]
return new_pivot_idx
arr = [3,2,5,1,4,0]
print(find_kth_largest(arr,1))
Rainbow Sort 系列
I 3种颜色 -1,0,1
下面这个赶紧忘掉 写的什么玩意
class Solution(object):
def rainbowSort(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
if not array or len(array)<=1:
return array
left,index,right=0,0,len(array)-1
while index<=right:
if array[index]==-1:
array[index], array[left]= array[left], array[index]
index+=1
left+=1
elif array[index]==1:
array[index], array[right] = array[right],array[index]
right-=1
else:
index+=1
return array
II 4种颜色 0,1,2,3
只有在大量重复数字的sort才有意义
class Solution(object):
def rainbowSortII(self, array):
"""
input: int[] array
return: int[]
"""
# write your solution here
return self.quickSort(array, 0, len(array) - 1)
def quickSort(self, A, start, end):
if start >= end:
return A
left, right = start, end
# key point 1: pivot is the value, not the index
pivot = A[(start + end) // 2];
# key point 2: every time you compare left & right, it should be
# left <= right not left < right
while left <= right:
while left <= right and A[left] < = pivot:
left += 1
while left <= right and A[right] > pivot:
right -= 1
if left <= right:
A[left], A[right] = A[right], A[left]
left += 1
right -= 1
self.quickSort(A, start, right)
self.quickSort(A, left, end)
return A
III K种颜色,用quick sort的思想
传入两个区间,一个是颜色区间 color_from, color_to。另外一个是待排序的数组区间 index_from, index_to. 找到颜色区间的中点,将数组范围内进行 partition,<= color 的去左边,>color 的去右边。 然后继续递归。 时间复杂度 O(nlogk)n是数的个数, k 是颜色数目。这是基于比较的算法的最优时间复杂度。
不基于比较的话,可以用计数排序(Counting Sort)
class Solution(object):
def rainbowSortIII(self, array, k):
"""
input: int[] array, int k
return: int[]
"""
# write your solution here
if not array or len(array)<=1:
return array
return self.sort(array, 0, len(array)-1,1,k)
def sort(self, array, idx_l, idx_r, color_l, color_r):
if idx_l==idx_r or color_l==color_r:
return
pivot=(color_l+color_r)//2
left, right = idx_l, idx_r
while left<=right:
while left<=right and array[left]<=pivot:
left+=1
while left<=right and array[right]>pivot:
right-=1
if left<=right:
array[left],array[right]=array[right],array[left]
left+=1
right-=1
self.sort(array,idx_l,right,color_l,pivot)
self.sort(array,left,idx_r,pivot+1,color_r)
return array
面试题目:
为什么基于比较的排序 时间复杂度下界是Omega(nlogn)
Given n numbers, there are n! possible permutations. For any value, x!=y, x<y is half of the permutation. In each comparison, we are left with n!/2 permutation. In the end, we operate log(n!) to achieve sorting.
Olog(n!)=O(nlogn) (Stirling Formula)
如果有1Mb的数据,用哪个sorting algorithm? What about 10Mb,100Mb,1Gb,1Tb,1Pb?
1Mb: Memory sorting 随便算法
1Tb: Disk sorting, External Sorting 外排,单机,多机,最后merge,写回磁盘
Last updated