Implementation of guava cache expiration scheme

expiration mechanism

As long as it is a cache, there must be an expiration mechanism. guava cache expiration is divided into the following three types:

  • expireAfterAccess: if the data is not accessed (read or write) within the specified time, it is expired data. When there is no data or the expired data is read, only one thread is allowed to update the new data. Other threads are blocked and wait for the thread to update and get the latest data.

Constructor:

public CacheBuilder<K, V> expireAfterAccess(long duration, TimeUnit unit) {
  ...
  this.expireAfterAccessNanos = unit.toNanos(duration);
  return this;
}

The constructor sets the value of the variable expireAfterAccessNanos.

  • expireAfterWrite: if the data has not been updated (written) within the specified time, it is expired data. When there is no data or the expired data is read, only one thread is allowed to update the new data. Other threads block and wait for the thread to update and get the latest data.

Constructor:

public CacheBuilder<K, V> expireAfterWrite(long duration, TimeUnit unit) {
  ...
  this.expireAfterWriteNanos = unit.toNanos(duration);
  return this;
}

The constructor sets the value of the variable expireAfterWriteNanos.

  • refreshAfterWrite: if the data is not updated (written) within the specified time, it is expired data. When a thread is updating (writing) new data, other threads return old data.

Constructor:

public CacheBuilder<K, V> refreshAfterWrite(long duration, TimeUnit unit) {
  ...
  this.refreshNanos = unit.toNanos(duration);
  return this;
}

The constructor sets the value of the variable refreshNanos.

problem

  • expireAfterAccess and expireAfterWrite:
    When the data reaches the expiration time, only one thread can perform data refresh. Other requests are blocked and wait for the refresh operation to complete, which will cause performance loss.
  • refreshAfterWrite:
    When the data reaches the expiration time, only one thread can load the new value, and other threads take the old value and return (you can also set asynchronous access to the new value, and all threads return the old value). This effectively reduces waiting and lock contention, so refreshAfterWrite performs better than expireAfterWrite. However, there will still be a thread that needs to perform the refresh task, and guava cache supports asynchronous refresh. If asynchronous refresh is enabled, the thread will return the old value after submitting the asynchronous refresh task, with better performance.
    However, guava cache does not have the function of regular cleaning (active), but performs expiration check and cleaning (passive) when querying data. The following problems will arise: if the data is queried after a long time, the old value may come from a long time ago, which will cause problems. Scenarios with high timeliness requirements may cause very big errors.

When there is no data to be accessed in the cache, no matter which mode is set, all threads will be blocked, and only one thread will load data through lock control

principle

First, understand the principle of guava cache expiration.

1. Overall approach

get method:

class LocalCache<K, V> extends AbstractMap<K, V> implements ConcurrentMap<K, V> {

    V get(K key, int hash, CacheLoader<? super K, V> loader) throws ExecutionException {
      checkNotNull(key);
      checkNotNull(loader);
      try {
        if (count != 0) { // read-volatile
          // don't call getLiveEntry, which would ignore loading values
          ReferenceEntry<K, V> e = getEntry(key, hash);
          if (e != null) {
            long now = map.ticker.read();
            V value = getLiveValue(e, now);
            if (value != null) {
              recordRead(e, now);
              statsCounter.recordHits(1);
              return scheduleRefresh(e, key, hash, value, now, loader);
            }
            ValueReference<K, V> valueReference = e.getValueReference();
            if (valueReference.isLoading()) {
              return waitForLoadingValue(e, key, valueReference);
            }
          }
        }

        // at this point e is either null or expired;
        return lockedGetOrLoad(key, hash, loader);
      } catch (ExecutionException ee) {
        Throwable cause = ee.getCause();
        if (cause instanceof Error) {
          throw new ExecutionError((Error) cause);
        } else if (cause instanceof RuntimeException) {
          throw new UncheckedExecutionException(cause);
        }
        throw ee;
      } finally {
        postReadCleanup();
      }
    }
    
}

It can be seen that guava cache inherits ConcurrentHashMap. In order to meet the concurrency scenario, the core data structure is based on ConcurrentHashMap.

2. Simplified method

Here, the method is simplified into several key steps related to this topic:

if (count != 0) {    // Is there data in the current cache
  ReferenceEntry<K, V> e = getEntry(key, hash);    // Fetch data node
  if (e != null) {                 
    V value = getLiveValue(e, now);    // Judge whether it is expired, filter the expired data, and only judge the time set in expireAfterAccess or expireAfterWrite mode
    if (value != null) {
      return scheduleRefresh(e, key, hash, value, now, loader);    // Whether the data needs to be refreshed is generated only in refreshAfterWrite mode
    }
    ValueReference<K, V> valueReference = e.getValueReference();
    if (valueReference.isLoading()) {   // If another thread is loading / refreshing data
      return waitForLoadingValue(e, key, valueReference);    // Wait for other threads to finish loading / refreshing data
    }        
  }
}
return lockedGetOrLoad(key, hash, loader);    // Load / refresh data

Count is an attribute of cache, which is modified by volatile int count. It stores the current number of caches.

  • If count == 0 (no cache) or the Hash node cannot be retrieved according to the key, lock and load the cache (lockedGetOrLoad).
  • If the Hash node is obtained, judge whether it is expired (getLiveValue) and filter out the expired data.

3. getLiveValue

V getLiveValue(ReferenceEntry<K, V> entry, long now) {
  if (entry.getKey() == null) {
    tryDrainReferenceQueues();
    return null;
  }
  V value = entry.getValueReference().get();
  if (value == null) {
    tryDrainReferenceQueues();
    return null;
  }

  if (map.isExpired(entry, now)) {
    tryExpireEntries(now);
    return null;
  }
  return value;
}

Judge whether the current node has expired by isExpired:

boolean isExpired(ReferenceEntry<K, V> entry, long now) {
  checkNotNull(entry);
  if (expiresAfterAccess() && (now - entry.getAccessTime() >= expireAfterAccessNanos)) {
    return true;
  }
  if (expiresAfterWrite() && (now - entry.getWriteTime() >= expireAfterWriteNanos)) {
    return true;
  }
  return false;
}

isExpired only judges the time of expireAfterAccessNanos and expireAfterWriteNanos. Combined with the constructors of expireAfterAccess, expireAfterWrite and refreshAfterWrite methods, it can be seen that this method does not control the time set for refreshAfterWrite, that is, if the time set for expireAfterAccess and expireAfterWrite passes, the data is expired, Otherwise, it doesn't expire.
If it is found that the data has expired, it will check whether there are other expired data (lazy deletion):

void tryExpireEntries(long now) {
  if (tryLock()) {
    try {
      expireEntries(now);
    } finally {
      unlock();
      // don't call postWriteCleanup as we're in a read
    }
  }
}
  
void expireEntries(long now) {
  drainRecencyQueue();
  ReferenceEntry<K, V> e;
  while ((e = writeQueue.peek()) != null && map.isExpired(e, now)) {
    if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
      throw new AssertionError();
    }
  }
  while ((e = accessQueue.peek()) != null && map.isExpired(e, now)) {
    if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
      throw new AssertionError();
    }
  }
}  

void drainRecencyQueue() {
  ReferenceEntry<K, V> e;
  while ((e = recencyQueue.poll()) != null) {
    if (accessQueue.contains(e)) {
      accessQueue.add(e);
    }
  }
}   

Take the latest range & written data and check whether it is expired one by one.

4. scheduleRefresh

V value = getLiveValue(e, now);
if (value != null) {
  return scheduleRefresh(e, key, hash, value, now, loader);
}

After getLiveValue, if the result is not null, it means that there is no expiration in expireAfterAccess and expireAfterWrite modes (or the time of these two modes is not set), but it does not mean that the data will not be refreshed, because getLiveValue does not judge the expiration time of refreshAfterWrite, but in the scheduleRefresh method.

V scheduleRefresh(
    ReferenceEntry<K, V> entry,
    K key,
    int hash,
    V oldValue,
    long now,
    CacheLoader<? super K, V> loader) {
  if (map.refreshes()
      && (now - entry.getWriteTime() > map.refreshNanos)
      && !entry.getValueReference().isLoading()) {
    V newValue = refresh(key, hash, loader, true);
    if (newValue != null) {
      return newValue;
    }
  }
  return oldValue;
}

When the following conditions are met, the data will be refreshed (the refresh thread returns a new value in synchronous refresh mode, and the old value may be returned in asynchronous refresh mode). Otherwise, the old value will be returned directly:

  1. refreshAfterWrite time refreshNanos is set.
  2. The current data has expired.
  3. No other thread is refreshing data (! entry.getValueReference().isLoading()).

5. waitForLoadingValue

If refreshAfterWrite is not set and the data has expired:

  1. If another thread is refreshing, it will block and wait (blocking through future.get()).
  2. If no other thread is refreshing, lock and refresh the data.

    ValueReference<K, V> valueReference = e.getValueReference();
    if (valueReference.isLoading()) {
      return waitForLoadingValue(e, key, valueReference);
    }  
    
    V waitForLoadingValue(ReferenceEntry<K, V> e, K key, ValueReference<K, V> valueReference)
     throws ExecutionException {
      if (!valueReference.isLoading()) {
     throw new AssertionError();
      }
    
      checkState(!Thread.holdsLock(e), "Recursive load of: %s", key);
      // don't consider expiration as we're concurrent with loading
      try {
     V value = valueReference.waitForValue();
     if (value == null) {
       throw new InvalidCacheLoadException("CacheLoader returned null for key " + key + ".");
     }
     // re-read ticker now that loading has completed
     long now = map.ticker.read();
     recordRead(e, now);
     return value;
      } finally {
     statsCounter.recordMisses(1);
      }
    }
    
    public V waitForValue() throws ExecutionException {
      return getUninterruptibly(futureValue);
    }
    
    public static <V> V getUninterruptibly(Future<V> future) throws ExecutionException {
      boolean interrupted = false;
      try {
     while (true) {
       try {
         return future.get();
       } catch (InterruptedException e) {
         interrupted = true;
       }
     }
      } finally {
     if (interrupted) {
       Thread.currentThread().interrupt();
     }
      }
    }

6. Load data

When loading data, the final call is to call either the lockedGetOrLoad method or the refresh method in scheduleRefresh. The final call is the load/reload method of CacheLoader.
When there is no data to access in the cache, no matter which mode is set, it will enter the lockedGetOrLoad method:

  1. Have the right to load data through lock contention.
  2. Grab the lock data, set the node status to loading, and load the data.
  3. For the data that fails to grab the lock, enter the waitForLoadingValue method as in the previous step, and block until the data loading is completed.
lock();
try {
  LoadingValueReference<K, V> loadingValueReference =
                new LoadingValueReference<K, V>(valueReference);
  e.setValueReference(loadingValueReference);

  if (createNewEntry) {
    loadingValueReference = new LoadingValueReference<K, V>();

    if (e == null) {
      e = newEntry(key, hash, first);
      e.setValueReference(loadingValueReference);
      table.set(index, e);
    } else {
      e.setValueReference(loadingValueReference);
    }
  }
} finally {
  unlock();
  postWriteCleanup();
}

if (createNewEntry) {
  try {
    // Synchronizes on the entry to allow failing fast when a recursive load is
    // detected. This may be circumvented when an entry is copied, but will fail fast most
    // of the time.
    synchronized (e) {
      return loadSync(key, hash, loadingValueReference, loader);
    }
  } finally {
    statsCounter.recordMisses(1);
  }
} else {
  // The entry already exists. Wait for loading.
  return waitForLoadingValue(e, key, valueReference);
}

Solution

From the above analysis, we can know that guava cache will make two independent judgments when judging whether the cache expires:

  1. Judge expireAfterAccess and expireAfterWrite.
  2. Judge refreshAfterWrite.

Returning to the question "although refreshAfterWrite improves performance, other threads may access long overdue data except the thread that performs refresh in synchronous load mode". We can solve this problem through the combination of expireAfterWrite and refreshAfterWrite: while setting the expiration time of refreshAfterWrite, we can set the expiration time of expireAfterWrite/expireAfterAccess. The time of expireAfterWrite/expireAfterAccess is greater than that of refreshAfterWrite.
For example, the time of refreshAfterWrite is 5 minutes, while the time of expireAfterWrite is 30 minutes. When accessing expired data:

  1. If the expiration time is less than 30 minutes, it will enter the scheduleRefresh method, and other threads except the refresh thread will directly return the old value.
  2. If the cached data has not been accessed for a long time and the expiration time exceeds 30 minutes, the data will be filtered out in the getLiveValue method. Other threads block and wait except the refresh thread.

Keywords: Java Cache guava

Added by alecks on Tue, 07 Dec 2021 01:36:58 +0200