Notes by Louisa
Notes by Louisa
Notes by Louisa
  • Introduction
  • Chapter1 Python Cheatsheet
    • Reference, Deep Copy and Shallow Copy
    • Iterators
    • List Comprehensions
    • Numpy
    • Pandas
    • Data Visualization
    • DateTime
    • Python Good to knows
  • Chapter2 Java Cheatsheet
    • Fundamentals to Java
    • Interface, Abstract Class, Access Modifier, Exceptions
    • Linked List and Java List
    • Java Queue, Stack and Deque
    • Binary Tree
    • Heap in Java
    • Map/Set/Hash
    • OOD
  • Chapter3 Algorithm
    • Fundamental Knowledge
    • Binary Search
    • Basic Sorting
    • Advanced Sorting
    • Linked List
    • Recursion 1
    • HashTable
    • Queue
    • Sliding Window
    • Stack
    • Binary Tree
    • Binary Search Tree
    • Heap
    • String
    • Graph Search DFS1 (Back Tracking)
    • Recursion II and Memoization
    • Dynamic Programming
    • Complete Binary Tree, Segment Tree, Trie Tree
    • Graph Search BFS
    • Graph Search BFS 2
    • Graph Search DFS2
    • Problems from 'JianZhi Offer'
    • Problems Categorized
    • Bit Operations
  • Chapter4 Model
    • Linear Regression
    • Logistic Regression
    • Regularization and Feature Selection
    • Model Evaluation
    • Nonlinear Models
    • PCA
    • Unsupervised Learning
    • Gradient Descent and Gradient Boosting
    • XG Boost and Light GBD
    • Deep Learning
    • Tensorflow/Keras
    • RNN
  • Chapter5 Statistics and A/B Testing
    • Inference about independence
    • Probability, Sampling and Randomization with Python
    • A/B Testing
    • Stats Interview Review
    • Statistics Glossary
  • Chapter6 SQL
    • Student Scores Query
    • Order Query
    • Movie Rating Query
    • Social-Network Query
    • LeetCode SQL题目总结
    • Spark SQL
  • Chapter7 Big Data and Spark
    • Introduction to Pyspark
    • Data Cleaning with Apache Spark
    • Feature Engineering with Pyspark
    • Building Recommendation Engines with Pyspark
    • Building Data Engineering Pipelines in Python
    • Hadoop MapReduce
    • Big Data Related Paper
  • Chapter8 Code Walk-Throughs
    • Python
    • R
    • Shell
  • Chapter9 Special Topics
    • Anomaly Detection
    • E Commerce
    • Supply Chain
    • Social Network Analysis
    • NLP intro
    • Time Series
    • Challenge Prophet with LSTM models
  • Project: The Winning Recipes to an Oscar Award
  • Project: A Crime Analysis of the Last Decade NYC
  • Project: Predict User Type Based on Citibike Data
  • GeoSpark/GeoSparkVis for Geospatial Big Data
  • Single Scattering Albedo
  • Sea Ice Albedo Retrievals
  • Lidar Project
Powered by GitBook
On this page
  • Set
  • Map
  • class HashMap<K,V>
  • Hashtable
  • class HashSet<K>
  • Collision Control
  • Separate Chaining
  • Open Addressing
  • Implementation
  • hashCode()
  • equals
  • Rehashing, load factor
  • 完整实现
  • Enum
  1. Chapter2 Java Cheatsheet

Map/Set/Hash

Set

Set is an interface, 可以实现它的class有:

  • Hashset: 内部没有顺序,操作O(1)

  • Treeset: 内部有顺序,增删查改O(logn); key可以按照natural order或者comparator的顺序排好 内部是self-balancing BST (red-black tree)

  • LinkedHashSet: 内部维护了插入的顺序,可以从head一直next到tail;其实是个linkedlist, <key,previous,next>, key满足set的条件,没有重复;增删查改O(1)

Map

is also an interface, <key, value>, 可以实现它的class:

  • HashMap

  • TreeMap

  • LinkedHashMap- 有next指针,记录了map的先后顺序,这样就知道了LRU(prefer把新加的放在头上)

class HashMap<K,V>

K: the type of keys maintained by this map; V: the type of mapped values 因为是generics 所以不支持primitive的类型

V put(K key, V value) put可以用来放,可以用来update,所以返回的type是原来的value或者null V get(Object key) 查key,返回对应的value;如果不存在返回null V remove(Object key) 返回删除的record 如果不存在,也不error,返回null boolean containsKey(Object key) 是否存在 boolean containsValue(Object value) -- O(n) 如果想O(1),可以做一个反向的hashmap,value变key,key变value,然后如果key现在有重复了,直接list of objects void clear() int size() boolean isEmpty()

上面是因为比较的时候用的是.equals(),这里的参数是object

Hashtable

被抛弃了,因为thread safe

class HashSet<K>

内部有一个private的HashMap,把所有V的部分都placeholder了。

Collision Control

Separate Chaining

读完整个list,发现没有,那就把插入的元素放在头部,因为刚插入的元素被读的可能性比较大

Open Addressing

如果hash后去找,直到空位为止;如果中间有删除怎么办?删除了之后空了的位置mark一下,以后插入的时候可以在这个被mark的地方插入。可以lazy compute,每过一点时间清空了重算hash,把该腾的腾出来。

Implementation

hashCode()

  1. hashCode()的实现对hashmap的性能很重要,因为需要让hashCode足够均匀

  2. Java默认的hashCode()是基于地址计算的 所以就算是我们认为一样的key,也会被放进不同的bucket里。所以即使equals相等了,hashCode却不等,也无法通过get()找到正确的<key,value>。

equals

  1. == 对于primitive来说比的是值是否相等,primitive没有equals();对于object来说==比的是地址,equals()怎么比取决于class的实现,如果class里没有实现,用的是object默认的方法,object默认方法就是==。

  2. a.equals(b)和b.equals(a)不是绝对相等 比如linkedlist和arraylist如果都存了一样的东西,如果l1.equals(l2)会return True 因为不同类型也可以相等,因为list下有一个abstract list, 它是linkedlist和arraylist的父类。在使用时尽量用什么类型put就用什么类型查找,因为它会override本身object类型的equals。

以上两个都得override

在Java里有一个equals和hashCode的contract,当equals override了之后,需要保证两个equals的hashCode也一样。所以hashCode()也要改。

需要注意的是,HashMap can only have one null key, it is always mapped to bucket 0 (if key==null return 0)

Rehashing, load factor

load factor控制着rehashing,如果 number of <k,v>/number of buckets过大,比如大于0.75,就会rehashing。所有的元素都会参与rehashing。

完整实现

Java中的%是取remainder而不是modulus,区别是-1%3如果是modules,答案是2;如果是remainder,答案是-1。所以在key.hashCode()的时候要做个控制。&0X7FFFFFFF 把hashCode的最高位变0,后面每一位不变,这就相当于取了正。

因为0x7FFFFFFF是 0111 1111 1111 ... 1111,这是32-bit signed integer max value

public class Node {
    final String key;
    int value;
    Node next;
    Node(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

public class HashMap implements Map {
    

0x100000000是32-bit integer min value

Enum

就像是一个双向的map

PreviousHeap in JavaNextOOD

Last updated 5 years ago