### 概述

Jump Consistent Hash 是一致性哈希的一种实现，仅用 5 行代码便可实现

#include <iostream>
#include <cstdint>
#include <boost/range/irange.hpp>
#include <boost/format.hpp>

int32_t JumpConsistentHash(uint64_t key, int32_t num_buckets)
{
int64_t b = -1, j = 0;
while (j < num_buckets) {
b = j;1
key = key * 2862933555777941757ULL + 1;
j = (b + 1) * (double(1LL << 31) / double((key >> 33) + 1));
}
return b;
}

int main(void)
{
int32_t total = 4;
for (auto i: boost::irange(0, 32)) {
int32_t before = JumpConsistentHash(i, total);
int32_t after = JumpConsistentHash(i, total+1);
std::cout<<boost::format("%d  %d") % before % after;
if(before != after) {
std::cout<<" changed";
}
std::cout<<std::endl;
}
return 0;
}


### 使用限制

Both of these algorithms(指 Karger 算法和 rendezvous 算法) allow buckets to have arbitrary ids, and handle not only new buckets being added, but also arbitrary buckets being removed. This ability to add or remove buckets in any order can be valuable for cache servers where the servers are purely a performance improvement. But for data storage applications, where each bucket represents a different shard of the data, it is not acceptable for shards to simply disappear, because that shard is only place where the corresponding data is stored. Typically this is handled by either making the shards redundant (with several replicas), or being able to quickly recover a replacement, or accepting lower availability for some data. Server death thus does not cause reallocation of data; only capacity changes do. This means that shards can be assigned numerical ids in increasing order as they are added, so that the active bucket ids always fill the range [0, num_buckets).

Only two of the four properties of consistent hashing described in the Karger et al. paper are important for data storage applications. These are balance, which essentially states that objects are evenly distributed among buckets, and monotonicity(单调性), which says that when the number of buckets is increased, objects move only from old buckets to new buckets, thus doing no unnecessary rearrangement. Their other two properties, spread and load, both measure the behavior of the hash function under the assumption that each client may see a different arbitrary subset of the buckets. Under our data storage model this cannot happen, because all clients see the same set of buckets [0, num_buckets). This restriction enables jump consistent hash.

### 算法分析

int ch(int key, int num_buckets) {
random.seed(key) ;
int b = 0; // This will track ch(key, j +1) .
for (int j = 1; j < num_buckets; j ++) {
if (random.next() < 1.0/(j+1) ) b = j ;
}
return b;
}


int ch(int key, int num_buckets) {
random. seed(key) ;
int b = -1; //  bucket number before the previous jump
int j = 0; // bucket number before the current jump
while(j<num_buckets){
b=j;
double r=random.next(); //  0<r<1.0
j = floor( (b+1) /r);
}
return b;
}


P(j ≥ i) = P( ch(k, i) = ch(k, b+1) )


P(j ≥ i) = P( ch(k, i) = ch(k, b+1) ) = (b+1) / i


### Reference

A Fast, Minimal Memory, Consistent Hash Algorithm