从最开始只是会使用它的 API 进行数据的存取,到决定要搞定疑惑、搞懂它的底层原理!本篇文章,将从 0 浅入,从什么是哈希表讲起,然后再说 Java 是怎样实现哈希表的。整个梳理过程,将通过源码这个第一手的资料进行梳理分析,吸收知识、解决疑问,一步一步进行梳理,如果你是对 HashMap 懵懵懂懂的同学,那么欢迎跟着我的节奏一起来梳理!
哈希冲突完整定义:哈希函数可能把两个或两个以上的不同关键字映射到同一个地址,这种情况称为冲突。发生冲突的不同关键字称为同义词。你想一下,如果两个不同的 Key,比方说 Key1 和 Key2,通过哈希函数得到同一个地址,那么你不解决这个冲突,直接进行存储,那么就是这样的:
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.哈希表是通过实现 Map 接口实现的,因为 Map 接口提供了所有的映射操作,并且允许空值和空键(HashMap这个类可以简单的看作是 HashTable 类,与之不同是,HashMap 是非同步且允许存储空数据的)。HashMap 这个类不保证映射的顺序,特别是不保证这个顺序会随着时间的改变而保持不变。
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.当哈希函数恰当地将元素映射到哈希桶(the buckets)中,那么就具有常量时间里的基本操作(get 和 put)性能。遍历 HashMap 这个集合的时候,遍历的时间与 HashMap 实例的容量(哈希桶的数量)加上它的大小(Key和Value的映射数量,即键值对的数量)成正比,因此,不要设置初始的容量太高,或者负载因子太低,这是非常重要的。
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.HashMap 的实例,有两个参数会影响它的性能:初始容量和负载因子。
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.通常来说,默认的负载因子是 0.75,它是一个很好的平衡,平衡了时间以及空间,这个值过高,虽然会降低空间的开销,但是会增加更多的时间来查找。当设置 HashMap 的初始容量时,应该。考虑好预期的 Entry 数量以及负载因子,以此减少 rehash 的次数。如果哈希表的初始容量(table 数组的长度)大于 Entry 数量除以负载因子,则不会发生 rehash 操作。
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.如果需要使用 HashMap 实例来存储许多的映射,那么就需要创建一个够大容量的 HashMap 实例,这样效率是最大的,比起你搞一个小容量的 HashMap,然后让它自己 rehash、table 数组翻倍。注意,使用许多有着相同 hashCode() 的 Key 的话,那肯定是会降低哈希表的性能的,因为冲突多了,为了降低这种影响,当这些 Key 是可比较的情况下,那么哈希表旧可以使用比较的方式去解决。
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:上面的意思就是说,需要注意 HashMap 是不同步的,也就是非线程安全的,在多线程并发的情况下对 HashMap 进行操作(添加、删除、映射),就会发生线程安全问题,导致数据不一致,如果想在多线程的情况下使用 HashMap,那么可以使用 Collections.synchronizedMap 方法将普通的 HashMap 包装成同步的 HashMap。不过不推荐这样使用,我们推荐使用 ConcurrentHashMap。复制代码
- Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.迭代器快速失败(fail-fast):如果 Map 在迭代器进行迭代的时候,Map 中的数据被修改了,那么迭代器就会抛出一个异常:ConcurrentModificationException。因为怕有不确定的行为风险,主要来说就是怕数据不一致,怕线程不安全。
欢迎光临 ToB企服应用市场:ToB评测及商务社交产业平台 (https://dis.qidao123.com/) | Powered by Discuz! X3.4 |