When clients choose to use Google Safe Browsing v5 in real-time mode, they maintain in their local database: (i) a Global Cache of likely-benign sites, formatted as SHA256 hashes of host-suffix/path-prefix URL expressions, and (ii) a set of threat lists, formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. The high-level idea is that whenever the client wishes to check a particular URL, it first performs a local check against the Global Cache. If the URL is found in the Global Cache (that is, the site is likely benign), the client falls back to a local threat-list check. Otherwise, the client continues with the real-time hash check as detailed below.
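To make the two local structures concrete, here is a minimal sketch of how they might be represented in memory. The set-based storage, the helper names, the UTF-8 encoding of expressions, and the 4-byte prefix length for the threat list are all assumptions made for illustration; the text above does not prescribe them.

```python
import hashlib

def full_hash(expression: str) -> bytes:
    """SHA256 hash of a host-suffix/path-prefix expression (UTF-8 assumed)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()

def hash_prefix(expression: str, length: int = 4) -> bytes:
    """Leading bytes of the SHA256 hash; the 4-byte default is an assumption."""
    return full_hash(expression)[:length]

# Hypothetical in-memory stand-ins for the local database.
# The Global Cache stores FULL hashes of likely-benign expressions.
global_cache = {full_hash("benign.example/")}
# A threat list stores hash PREFIXES of potentially unsafe expressions.
threat_list = {hash_prefix("bad.example/malware/")}

def in_global_cache(expression: str) -> bool:
    return full_hash(expression) in global_cache

def matches_threat_list(expression: str) -> bool:
    return hash_prefix(expression) in threat_list
```

Storing full hashes for the Global Cache means a hit there is an exact match, whereas a prefix hit in a threat list only indicates a possible match.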
Besides the local database, the client will maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.
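As an illustration, the local cache could be a plain in-memory map from hash prefix to the full hashes received for it, plus an expiration time. The class and method names below are hypothetical, and keying entries by the 4-byte prefix is an assumption consistent with the lookup step of the procedure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

@dataclass
class CacheEntry:
    full_hashes: Set[bytes]   # full SHA256 hashes sharing this prefix
    expiration: float         # Unix timestamp after which the entry is stale

class LocalCache:
    """Non-persistent cache; safe to drop entirely under memory pressure."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, CacheEntry] = {}

    def put(self, full_hash: bytes, expiration: float) -> None:
        # Entries are keyed by the first 4 bytes of the full hash (assumption).
        entry = self._entries.setdefault(full_hash[:4], CacheEntry(set(), expiration))
        entry.full_hashes.add(full_hash)
        entry.expiration = expiration

    def get(self, prefix: bytes) -> Optional[CacheEntry]:
        return self._entries.get(prefix)

    def remove(self, prefix: bytes) -> None:
        self._entries.pop(prefix, None)

    def clear(self) -> None:
        # Clearing only costs extra server lookups; it never affects correctness.
        self._entries.clear()
```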
A detailed specification of the procedure is available below.
The Real-Time URL Check Procedure
This procedure takes a single URL u and returns SAFE, UNSAFE or UNSURE. If it returns SAFE, the URL is deemed safe by Google Safe Browsing. If it returns UNSAFE, the URL is deemed potentially unsafe by Google Safe Browsing and appropriate action should be taken, such as showing a warning to the end user, moving a received message to the spam folder, or requiring extra confirmation by the user before proceeding. If it returns UNSURE, the following local-check procedure should be used afterwards.
- Let expressions be a list of suffix/prefix expressions generated by the URL u.
- Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
- For each hash of expressionHashes:
  - If hash can be found in the global cache, return UNSURE.
- Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
- For each expressionHashPrefix of expressionHashPrefixes:
  - Look up expressionHashPrefix in the local cache.
  - If the cached entry is found:
    - Determine whether the current time is greater than its expiration time.
    - If it is greater:
      - Remove the found cached entry from the local cache.
      - Continue with the loop.
    - If it is not greater:
      - Remove this particular expressionHashPrefix from expressionHashPrefixes.
      - Check whether the corresponding full hash within expressionHashes is found in the cached entry.
      - If found, return UNSAFE.
      - If not found, continue with the loop.
  - If the cached entry is not found, continue with the loop.
- Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc.), return UNSURE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc.), as well as the cache expiration time expiration.
- For each fullHash of response:
  - Insert fullHash into the local cache, together with expiration.
- For each fullHash of response:
  - Let isFound be the result of finding fullHash in expressionHashes.
  - If isFound is False, continue with the loop.
  - If isFound is True, return UNSAFE.
- Return SAFE.
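The steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not a reference implementation: expression generation is left to the caller, `search_hashes` is a hypothetical callable standing in for SearchHashes / hashes.search that returns (full hash, expiration) pairs and raises on any error, the local cache is a plain dict keyed by 4-byte prefix, and the server call is skipped when no prefixes remain to be sent.

```python
import hashlib
import time
from enum import Enum
from typing import Callable, Dict, Iterable, List, Set, Tuple

class Verdict(Enum):
    SAFE = "SAFE"
    UNSAFE = "UNSAFE"
    UNSURE = "UNSURE"

# prefix -> (full hashes seen for that prefix, expiration as a Unix timestamp)
Cache = Dict[bytes, Tuple[Set[bytes], float]]
# Hypothetical stand-in for the server call; raises on network/HTTP errors.
SearchFn = Callable[[List[bytes]], Iterable[Tuple[bytes, float]]]

def real_time_check(expressions: List[str], global_cache: Set[bytes],
                    local_cache: Cache, search_hashes: SearchFn,
                    now: Callable[[], float] = time.time) -> Verdict:
    expression_hashes = [hashlib.sha256(e.encode("utf-8")).digest()
                         for e in expressions]
    # A Global Cache hit means "likely benign": defer to the local check.
    if any(h in global_cache for h in expression_hashes):
        return Verdict.UNSURE

    prefixes_to_send: List[bytes] = []
    for h in expression_hashes:
        prefix = h[:4]
        entry = local_cache.get(prefix)
        if entry is not None:
            full_hashes, expiration = entry
            if now() > expiration:
                del local_cache[prefix]      # expired: ask the server again
            else:
                if h in full_hashes:
                    return Verdict.UNSAFE
                continue                     # fresh entry: do not send prefix
        prefixes_to_send.append(prefix)

    if not prefixes_to_send:                 # assumption: nothing left to ask
        return Verdict.SAFE
    try:
        response = list(search_hashes(prefixes_to_send))
    except Exception:
        return Verdict.UNSURE

    for full_hash, expiration in response:
        hashes, _ = local_cache.get(full_hash[:4], (set(), expiration))
        hashes.add(full_hash)
        local_cache[full_hash[:4]] = (hashes, expiration)

    if any(full_hash in expression_hashes for full_hash, _ in response):
        return Verdict.UNSAFE
    return Verdict.SAFE
```

Passing `now` as a parameter is only a convenience for testing expiry behaviour; a caller would normally rely on the default clock.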
While this protocol specifies when the client sends expressionHashPrefixes to the server, this protocol purposefully does not specify exactly how to send them. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
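As a sketch of two of the options described above, the helpers below split a prefix list into requests of at most 30 prefixes and optionally mix in randomly generated 4-byte prefixes. The function names and the shuffling step are assumptions for illustration; only the limit of 30 prefixes per request comes from the text.

```python
import secrets
from typing import List

MAX_PREFIXES_PER_REQUEST = 30  # per-request limit stated in the text

def batch_prefixes(prefixes: List[bytes],
                   batch_size: int = MAX_PREFIXES_PER_REQUEST) -> List[List[bytes]]:
    """Split prefixes into consecutive requests of at most batch_size each."""
    if not 1 <= batch_size <= MAX_PREFIXES_PER_REQUEST:
        raise ValueError("batch_size must be between 1 and 30")
    return [prefixes[i:i + batch_size]
            for i in range(0, len(prefixes), batch_size)]

def pad_with_decoys(prefixes: List[bytes], total: int) -> List[bytes]:
    """Return the real prefixes mixed with random 4-byte decoys, `total` in all."""
    if total > MAX_PREFIXES_PER_REQUEST or len(prefixes) > total:
        raise ValueError("request would be invalid or exceed 30 prefixes")
    decoys = [secrets.token_bytes(4) for _ in range(total - len(prefixes))]
    mixed = list(prefixes) + decoys
    secrets.SystemRandom().shuffle(mixed)  # hide which entries are real
    return mixed
```

Calling `batch_prefixes(prefixes, 1)` yields one request per prefix, which corresponds to the "separate requests" option; those requests could then be issued in parallel.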