Before An article In, we have deeply understood the basic principle of Bloom filter and learned that it has many applications in cache system. The Bitmap provided by Redis can just be used as the basis of the bit group required by the bloom filter. This paper first briefly introduces the Bitmap, and then gives the implementation of the bloom filter based on it.
Bitmap is not a separate data type in Redis, but is implemented by bit related operations defined on the string type (Redis is internally called Simple Dynamic String, SDS). At this time, SDS is regarded as a bit group. The following is an operation example of using getbit and setbit instructions in Redis cli.
# Binary representation of string "meow": 01101101 01100101 01101111 01110111 es1:19000> set bitmap_cat "meow" OK # The lowest subscript is 0. Get bit of bit 3 (0) es1:19000> getbit bitmap_cat 3 (integer) 0 # Get bit (1) of bit 23 es1:19000> getbit bitmap_cat 23 (integer) 1 # Set bit 7 to 0 es1:19000> setbit bitmap_cat 7 0 (integer) 1 # Set bit 14 to 1 es1:19000> setbit bitmap_cat 14 1 (integer) 0 # The modified string becomes "lgow" es1:19000> get bitmap_cat "lgow"
Redis's Bitmap is automatically expanded, that is, when get/set is high, it will actively fill in 0. In addition, the bitcount instruction is used to calculate the number of 1 in a specific byte range, and the bitop instruction is used to perform bit operations (and, or, xor and not are supported). For the corresponding usage, you can query redis official documents, etc.
Next, we implement RedisBloomFilter based on Redis (Codis). According to the previous article on Bloom filter, to initialize a bloom filter, two parameters are required: the estimated number of elements and the maximum acceptable error (i.e. false positive rate). In addition, we also need to pass in the connection pool instance JedisResourcePool of Jodis to facilitate operation on Redis. The members and construction methods of RedisBloomFilter class are as follows.
public class RedisBloomFilter { private static final Logger LOGGER = Logger.getLogger(RedisBloomFilter.class); private static final String BF_KEY_PREFIX = "bf:"; private int numApproxElements; private double fpp; private int numHashFunctions; private int bitmapLength; private JedisResourcePool jedisResourcePool; /** * Construct bloom filter. Note: in the same business scenario, the three parameters must be the same * * @param numApproxElements Estimated number of elements * @param fpp Maximum acceptable error (false positive rate) * @param jedisResourcePool Codis Dedicated Jedis connection pool */ public RedisBloomFilter(int numApproxElements, double fpp, JedisResourcePool jedisResourcePool) { this.numApproxElements = numApproxElements; this.fpp = fpp; this.jedisResourcePool = jedisResourcePool; bitmapLength = (int) (-numApproxElements * Math.log(fpp) / (Math.log(2) * Math.log(2))); numHashFunctions = Math.max(1, (int) Math.round((double) bitmapLength / numApproxElements * Math.log(2))); } /** * Gets the number of automatically calculated optimal hash functions */ public int getNumHashFunctions() { return numHashFunctions; } /** * Get the automatically calculated optimal Bitmap length */ public int getBitmapLength() { return bitmapLength; } // The following are the methods }
In order to distinguish the Key corresponding to the bloom filter, prefix the original Key with "bf:". The Bitmap length bitmapLength and the number of hash functions numHashFunctions are calculated using the method in the Guava implementation.
Then, we need to calculate which bits of the Bitmap an element corresponds to after being hashed by k hash functions. Here, we still learn from Guava's BloomFilterStrategies implementation and use MurmurHash and double hash for hashing. For simplicity, assume that all elements are fixed as string types without generics.
/** * Which bits of the Bitmap are mapped after calculating the hash of an element value * * @param element Element value * @return bit Array of Subscripts */ private long[] getBitIndices(String element) { long[] indices = new long[numHashFunctions]; byte[] bytes = Hashing.murmur3_128() .hashObject(element, Funnels.stringFunnel(Charset.forName("UTF-8"))) .asBytes(); long hash1 = Longs.fromBytes( bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0] ); long hash2 = Longs.fromBytes( bytes[15], bytes[14], bytes[13], bytes[12], bytes[11], bytes[10], bytes[9], bytes[8] ); long combinedHash = hash1; for (int i = 0; i < numHashFunctions; i++) { indices[i] = (combinedHash & Long.MAX_VALUE) % bitmapLength; combinedHash += hash2; } return indices; }
Then we can use Jedis's setbit() and getbit() methods to implement the logic of inserting elements into the bloom filter and querying whether elements exist. An element will correspond to multiple bits. In order to improve efficiency, the pipeline comes in handy. In addition, the setbit instruction does not reset the expiration timestamp of the corresponding Key.
/** * Insert element * * @param key The original Redis key will be prefixed with 'bf:' automatically * @param element Element value, string type * @param expireSec Expiration time (seconds) */ public void insert(String key, String element, int expireSec) { if (key == null || element == null) { throw new RuntimeException("No key value can be empty"); } String actualKey = BF_KEY_PREFIX.concat(key); try (Jedis jedis = jedisResourcePool.getResource()) { try (Pipeline pipeline = jedis.pipelined()) { for (long index : getBitIndices(element)) { pipeline.setbit(actualKey, index, true); } pipeline.syncAndReturnAll(); } catch (IOException ex) { LOGGER.error("pipeline.close()happen IOException", ex); } jedis.expire(actualKey, expireSec); } } /** * Check whether the element exists (possibly) in the collection * * @param key The original Redis key will be prefixed with 'bf:' automatically * @param element Element value, string type */ public boolean mayExist(String key, String element) { if (key == null || element == null) { throw new RuntimeException("No key value can be empty"); } String actualKey = BF_KEY_PREFIX.concat(key); boolean result = false; try (Jedis jedis = jedisResourcePool.getResource()) { try (Pipeline pipeline = jedis.pipelined()) { for (long index : getBitIndices(element)) { pipeline.getbit(actualKey, index); } result = !pipeline.syncAndReturnAll().contains(false); } catch (IOException ex) { LOGGER.error("pipeline.close()happen IOException", ex); } } return result; }
Finish, write a simple unit test. Suppose that the ID of the read post is now stored by the bloom filter, and the Key contains the user ID and time. The estimated maximum reading volume per user per day is 3000, with a maximum error of 3%.
public class RedisBloomFilterTest { private static final int NUM_APPROX_ELEMENTS = 3000; private static final double FPP = 0.03; private static final int DAY_SEC = 60 * 60 * 24; private static JedisResourcePool jedisResourcePool; private static RedisBloomFilter redisBloomFilter; @BeforeClass public static void beforeClass() throws Exception { jedisResourcePool = RoundRobinJedisPool.create() .curatorClient("10.10.99.130:2181,10.10.99.132:2181,10.10.99.133:2181,10.10.99.124:2181,10.10.99.125:2181,", 10000) .zkProxyDir("/jodis/bd-redis") .build(); redisBloomFilter = new RedisBloomFilter(NUM_APPROX_ELEMENTS, FPP, jedisResourcePool); System.out.println("numHashFunctions: " + redisBloomFilter.getNumHashFunctions()); System.out.println("bitmapLength: " + redisBloomFilter.getBitmapLength()); } @AfterClass public static void afterClass() throws Exception { jedisResourcePool.close(); } @Test public void testInsert() throws Exception { redisBloomFilter.insert("topic_read:8839540:20190609", "76930242", DAY_SEC); redisBloomFilter.insert("topic_read:8839540:20190609", "76930243", DAY_SEC); redisBloomFilter.insert("topic_read:8839540:20190609", "76930244", DAY_SEC); redisBloomFilter.insert("topic_read:8839540:20190609", "76930245", DAY_SEC); redisBloomFilter.insert("topic_read:8839540:20190609", "76930246", DAY_SEC); } @Test public void testMayExist() throws Exception { System.out.println(redisBloomFilter.mayExist("topic_read:8839540:20190609", "76930242")); System.out.println(redisBloomFilter.mayExist("topic_read:8839540:20190609", "76930244")); System.out.println(redisBloomFilter.mayExist("topic_read:8839540:20190609", "76930246")); System.out.println(redisBloomFilter.mayExist("topic_read:8839540:20190609", "76930248")); } }
Observe the output.
numHashFunctions: 5 bitmapLength: 21895 true true true false
Author: LittleMagic
Link: https://www.jianshu.com/p/c2defe549b40
Source: Jianshu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization, and for non-commercial reprint, please indicate the source.