String
Python 里的string是immutable的
尽量避免做slicing
Char Removal
删除student中的u和n
常见错误:
如果有多个元素连续出现,index不注意维护就会删不干净。
调用API的时候要注意时间复杂度 deleteCharAt, subString, string1+string2都
两个挡板,三个区域,同向而行。对于需要保留output相对顺序的题目,选择同向而行。
左挡板左侧,不包含左侧,是要留的结果 左右之间,是不在乎的区域 右挡板右侧,不包含右侧,是正在探索的需求
Case 1: a[j] should be kept: a[i]=a[j], i++, j++ Case 2: a[j] should not be kept: j++
class Solution(object):
def remove(self, input, t):
"""
input: string input, string t
return: string
"""
# write your solution here
slow, fast = 0, 0
toremove = set(t)
array = list(input)
for fast in range(len(array)):
if array[fast] not in toremove:
array[slow]=array[fast]
slow+=1
return ''.join(array[:slow])
j 快指针,右边是未访问的元素,左边是已经访问的元素;
i 慢指针,左边是要返回的字符;
快慢指针之间[i, j-1]是不返回的元素。
initialization:i=0, j=0
str[j]是否该被保留
a = list(str)
Case1: 保留str[j], a[i]=str[j], i+=1. j+=1;
Case2: 不保留str[j], j+=1
Termination Condition: j>=len(str)
def remove(str):
lst = list(str)
i,j = 0,0
while j<len(lst):
if lst[j] not in ['u', 'n']
lst[i] = lst[j]
i+=1
j+=1
return ''.join(lst[:i])
print(remove("student"))
Time O(n)
Space O(n)
def remove(str):
lst = []
j = 0
while j<len(str):
if str[j] not in ['u', 'n']
lst.append(str[j])
j+=1
return ''.join(lst)
print(remove("student"))
Time O(n)
Space O(n)
相比之下第一种更好,因为第二种里的append不一定是一个O(1)的操作。在Python中可变长度的容器,增加元素时要开辟新的空间(一般是double size),只是amortized O(1)。
删除句中的多余空格
原始输入空格比较多,把开头多余的空格、结尾的空格去掉,中间的只留1个空格。
Case 1: if a[f]==' ': - case 1.1 s==0, don't copy - case 1.2 s!=0, if a[s-1]!=’ ‘,copy if a[s-1]==' '. don't copy
Case 2: if a[f]!=' ': copy
Post-processing: s>0 and a[s-1] ==' ', then s-=1
class Solution(object):
def removeSpaces(self, input):
"""
input: string input
return: string
"""
# write your solution here
if not input:
return input
array = list(input)
slow = 0
for fast in range (len(array)):
if array[fast]==' ' and (fast==0 or array[fast-1]==' '):
continue
array[slow]=array[fast]
slow+=1
if (slow>0 and array[slow-1]==' '):
slow-=1
return ''.join(array[:slow])
initialization:i=0, j=0
Case 1: a[j]==' ' and i==0, ignore
Case 2: a[j]!=' ' , keep
Case 3: a[j]==' ' , and i!=0, and a[i-1]!=' ', keep
Case 4: a[j]==' ' , and i!=0, and a[i-1]==' ', ignore
Termination: j=n
Post-processing: i>0 and a[i-1] ==' ', then i-=1
def remove_spaces(str):
if not str:
return str
lst = list(str)
i,j = 0,0
while j < len(lst):
## Case 2 and Case 3
if lst[j]!=' ' or (i!=0 and lst[i-1]!=' '):
lst[i]=lst[j]
i+=1
j+=1
if i>0 and lst[i-1] ==' ':
i-=1
return ''.join(lst[:i])
Time O(n)
Space O(n)
def remove_spaces(str):
if not str:
return str
lst = []
for fast in xrange(len(lst)):
if str[fast]==' ' and (fast==0 or lst[fast-1]==' '):
continue
lst.append(str[fast])
if len(lst)>0 and lst[-1] ==' ':
lst.pop()
return ''.join(lst)
Char De-duplication
第一个字符,不论是否重复,都一定会被保留,所以initialization时,可以都从1开始。
Case 1: a[j]==a[i-1], ignore (j+=1)
Case 2: a[j]!=a[i-1], keep (a[i]=a[j], i+=1, j+=1)
def remove_duplicate(str):
if not str or len(str)<2:
return str
lst = list(str)
slow, fast = 1, 1
while fast<len(lst):
if lst[fast]!=lst[slow-1]:
lst[slow]=lst[fast]
slow+=1
fast+=1
return ''.join(lst[:slow])
def remove_duplicate(str):
if not str or len(str)<2:
return str
fast = 0
lst = []
while fast<len(lst):
if lst[fast]==0 or lst[-1]!=str[fast]:
lst.append(str[fast])
fast+=1
return ''.join(lst[:slow])
class Solution(object):
def deDup(self, input):
"""
input: string input
return: string
"""
# write your solution here
if not input:
return input
array = list(input)
slow = 0
for fast in range (len(array)):
if fast==0 or array[fast]!=array[slow-1]:
array[slow]=array[fast]
slow+=1
return ''.join(array[:slow])
变形题1 如果可以允许两个一样的,那需要修改的只有2行:line5的初始值2,2; 以及line7的slow-2
变形题2 消消乐 维护一个stack,和栈顶一样就pop,最后返回所有在stack里的元素
fast = 0
stack = []
Case 1: a[fast]!=stack.top(), stack.push(a[fast]), fast+=1
Case 2: a[fast]==stack.top(), keep moving fast until a[fast]!=stack.top(), stack.pop()
要非常清楚fast本身的物理含义,在case2里,如果我们最终是要保留一个重复元素,那么就不需要pop()
class Solution(object):
def deDup(self, input):
"""
input: string input
return: string
"""
# write your solution here
if not input or len(input)<1:
return input
lst = []
fast = 0
while fast < len(input):
if len(lst)>0 and input[fast]==lst[-1]: #易错点 lst[-1]但是忘了>0的前提
while fast<len(input) and lst[-1]==input[fast]: #易错点 fast++的过程中可能比input本身还大
fast+=1
fast+=1
lst.pop()
else:
lst.append(input[fast])
fast+=1
return ''.join(lst)
同理,也可以再定义一个slow pointer,借助双指针实现inplace的操作。
f fast: linear scan pointer (fast and after are not processed)
s slow: all letters to the left of s (not including s) are processed letters that should be kept.
Initialization: slow=fast=1
Case 1: a[f]!=a[s-1], then a[s]=a[f], s++, f++;
Case 2: a[f]==a[s-1], then repeatedly f++ until a[f]!=a[s-1], s-=1
Return a[0:s] (不包含s)
如果输入是一个list,那么空间复杂度就是O(1); 如果是个string,因为它immutable,先转换成string,空间复杂度是O(n)
Reverse a string
iterative 双指针,头尾,相向,互换
def reverse_helper(lst, left, right):
while left<right:
lst[left], lst[right] = lst[right], lst[left]
left += 1
right -= 1
recursive 虚线框中间
reverse句子
相当于reverse两次,call reverse这个helper,第一次 reverse整个list,第二次reverse单词,j放在j-1,然后再call helper
def reverse_helper(lst, left, right):
while left<right:
lst[left], lst[right] = lst[right], lst[left]
left += 1
right -= 1
def reverse_string(string):
lst = list(string)
reverse_helper(lst, 0, len(lst)-1)
i = 0
left = i
right = i
while i < len(lst):
if i==len(lst)-1 or lst[i+1]==' ';
right = i
reverse_helper(lst, left, right)
left = i + 2
i += 1
return ''.join(lst)
reverse right-hand side by two positions
abcdef --> efabcd
step 1: reverse the whole word,
step 2: fe | dcba 分别反转
Substring
检查一个字符串是否是另一个字符串的子串,返回字符串开始的位置
Brute-force O(m*n)
class Solution:
def strstr(self, haystack, needle):
"""
:type haystack: str
:type needle: str
:rtype int
"""
if not needle:
return 0
for h in xrange(len(haystack)-len(needle)+1): #[l,r)
for n in xrange(len(needle)):
if haystack[h+n]!=needle[n]:
break
else: # when the for look terminates naturally (without early break)
return h
return -1
这里注意for和else搭配的用法,这是为了避免引入一个flag变量记录状态。如果正常退出,也就是没有执行break,else会执行;如果执行break,else不执行。这样在内部的for正常退出时h的位置就是match的字符串在原串的起始位置。
虽然乍一看这是一个O(m*n)的复杂度,但是在早年word之类的Ctrl+F查找时都是这套brute force。这是因为对于英文文本来说,只要前面几个字符匹配了,大概率这就是一个词,实际的时间复杂度其实更接近于O(m+n)
Rabin-Karp: Avg Case O(m-n), worst case:O(m*n) 把字符串的比较转化成26进制数的比较
比如,16进制表示FBA=15*16^2+11*16^1+10*16^0
如果用26进制数来表示abcd,那么a*26^3+2*26^2+3*26^1+4*26^0;
取模: 加速计算、保证时间复杂度是常数,避免大数比较
Class RabinKarp:
def __init__(self):
self.base = 26
self.mod = 997
def strstr(self, haystack, needle):
if len(needdle)>len(haystack):
return -1
hay_hash, nhash = 0,0
power = 1
for i in range(len(needle)):
power = power*self.base%self.mod if i!=0 else 1
hay_hash = (hay_hash * self.base + ord(haystack[i])%self.mod
nhash = (nhash*self.base+ord(needle[i]))%self.mod
for i in range(len(needle), len(haystack)):
if hay_hash ==nhash and needle == haystack[i-len(needle):i]:
return i-len(needle)
hay_hash -= (power*ord(haystack[i-len(needle)]))%self.mod
if hay_hash <0:
hay_hash+=self.mod
hay_hash=(hay_hash*self.base+ord(haystack[i]))%self.mod
if hay_hash==nhash and needle ==haystack[len(haystack)-len(needle)]:
return len(haystack) - len(needle)
return -1
rk=RabinKarp()
print(rk.strstr('bacbabababacaab', 'abaca'))
class Solution(object):
def strStr(self, haystack, needle):
"""
:type haystack: str
:type needle: str
:rtype: int
"""
if not needle:
return 0
if len(haystack)<len(needle):
return -1
Q, W = 10**9+0, 256
HW = 1
for i in xrange(1, len(needle)):
HW = (HW*W)%Q
needle_hash, haystack_hash = 0,0
for i in xrange(len(needle)):
needle_hash = (W*needle_hash+ord(needle[i]))%Q
haystack_hash = (W*haystack_hash+ord(haystack[i]))%Q
if needle_hash == haystack_hash:
return 0
for i in xrange(len(needle), len(haystack)):
haystack_hash = (haystack_hash -HW*ord(haystack[i-len(needle)])%Q+Q)%Q
haystack_hash = (haystack_hash*W+ord(haystack[i]))%Q
if needle_hash==haystack_hash
return i -len(needle)+1
return -1
Char Replacement
Example: student --> stuXXt, studendendent -->stuXXXXXXt
slow: all letters to the left of slow (not including slow) are the final results fast: current index; s1: 'den', s2: 'XX'
student s f
case 1: from s1.length > = s2. length 替换后变短
case 1: if f!='d', copy, f++, s++ case 2: if f points to 'den', copy s2, s+=s2.length fast+=s1.lenth
case 2: s1.length < s2. length 替换后变长
这个时候替换成XXXX
step 1: 过一遍string,数一下s1字符串出现了多少次(比如2次); step 2: enlarge the string 2*(s2.length-s1.length);
s t u d e n t d e n t _ _ s f
step 3: 把slow放在最后,f放在真正的string最后,slow的右边,不含slow都是最后结果,该保留的;fast是现在的index。 出来的条件是f和s都离开了整个string。
Shuffling
ABCD1234-->A1B2C3D4
ABCD1234--> AB12CD34-->A1B2C3D4
A1B2C3D4-->ABCD1234
merge sort, 左右两边都sort好了,所以谁小移谁
A1B2|C3D4 / \ AB12 CD34 ABCD1234
12 CD 先换成 DC21,再内部交换一次CD12
ABCDEFG1234567-->A1B2C3D4E5F6G7
要让chunk1和chunk3同size:ABCD1234EFG567 要为每一个步骤求mid
size = right-left-1 mid = left+size/2 lm = left+size/4 rm = left+size*3/4
ABCDE1234567-->A1B2C3D4E567
当后面的67没有,接着做
ABCDEFG12345-->A1B2C3D4E5FG
ABCDE12345FG 54321GF -->12345FG
Permutation with dups
if no dups, swap-swap
if with dups
如果还是swap-swap的思路,那只要是换过来一样的,在hashset看到了,就不换了。这个判断需要加在进入for循环之前;另外里面加一个if判断 如果在里面,continue;把该加的加了
Time: O(n!*n) Space: O(n^2)
Encoding/Decoding
aaaabcc-->a4b1c2 run-length encoding, requires in-place
slow: 左侧不包含都是结果 fast: 当前
看f后面的已经不是f指向的时候了,总结之前的。但是注意这又有可能会overwrite到后面的一个index,如果string越变越长、越变越短...
Step 1: from left to right: we only deal with the cases where the letter repeats >=2 times; the case that appeared only once are simply copied. In the meantime, we need to count how many times a letter only repeats once. eg. count=1 Step 2: enlarge the string by 'count', and from right to left: do similar steps.
Sliding window
longest substring that each char occurs at most k times
[L(slow), R(fast)]: each char occurs at most k times within this window HashMap: <char, count>
BDEFGADE
| |
<B,1>, <D, 1>, <E, 1>, <F, 1>, <G, 1>, <A, 1>, <D, 1>
global max =6
when to move fast index: at most k times within the table (no value exceeds k times)
when to move slow index: when there is a value > k, keep slow++ until there is no value > k
when to update the final result: when we do fast ++, as long as (fast-slow+1)>global_max
find all anagrams of a substring s2 in a long string s1
string s2 = aabc s1 = zzzzcdebcaabcyywwww
Fixed-Size sliding window 一个指针都行 HashMap <char, count>
Step 1: initialize the hashmap, put s2 in hashmap Step 2: run sliding window on s1, 出现一个里面的count减一个,目标是都是0
Time = O(m*n)
how to optimize to O(m+n)
用type_of_char_to_match来替代 每当有一个字母后面的数字是零,意味着type_of_char_to_match可以减一了,不用每次都对着hashmap查,而是对着数字查了。
flip at most k times to convert from 0 to 1, find the longest subarray that contains all 1
同上面的at most k times, 只需要0 at most k times
corner case
a to i (convert string to integer)
Assumption: the string is a valid integral number
null, empty string return 0
leading space ' 123' return 123
invalid characters '123 1' return 123, '123a1' return 123 '+ 123' return 0
sign (+,-) '+123' return 123 '-123' return -123 '+-123' return 0
overflow an integer return Integer.MAX_VALUE
overflow a long return Integer.MAX_VALUE
Backus-Naur form (BNF)
SPC::=' ' NUM::='0'|'1'|'2'...|'9' INTEGER::=(SPC*)['+' | '-'](NUM+)(SPC*)
::= 规则 () 组合 [] 可有可无 0/1 | 任选其一 + 至少1个 * 至少0个
sum>integer.maxvalue+1 或者-sum<-integer.minvalue
Validate if a string is numeric


科学技术法
‘0~9’, ‘+、-’,‘E’, 'e', ' ', ‘.’
+-: 要么是一开始的符号,要么是e之后紧接的符号 不然就false,最多2个 . : 只能出现一次,而且是e的前面 e: 只能出现最多1次,e的前后必须有数字

REGEX
string pattern
Last updated