Safe Browsing API

Safe Browsing Lookup API Developer's Guide

The Google Safe Browsing Lookup API is an experimental API that allows applications to check URLs against Google's constantly updated lists of suspected phishing and malware pages.

In addition to providing some background on the capabilities of the Safe Browsing API, this document provides examples of interacting with the API by sending HTTP messages to perform lookups. It also describes the HTTP response codes and offers suggestions for debugging error codes.

Contents

  1. Overview
  2. Getting Started
  3. A Quick Example
    1. Using GET Method
    2. Using POST Method
  4. Protocol Specification
    1. HTTP GET Request
      1. Request URL
      2. Response Code
      3. Response Body
    2. HTTP POST Request
      1. Request URL
      2. Request Body
      3. Response Code
      4. Response Body
  5. Acceptable Usage in Clients
    1. Usage Restrictions
    2. End-User Visible Warnings
  6. Reporting Incorrect Data
    1. Report Phishing URLs
    2. Report Phishing Errors
    3. Report Malware URLs
    4. Report Malware Errors
  7. References
  8. Appendix
    1. R-BNF

Overview

The Safe Browsing Lookup API provides a simple interface for applications that only need to query the state of URLs, do not mind sending those URLs to Google, and can accept the latency of a network round trip. Using the Lookup API, clients query one or more URLs through an HTTP GET or POST request and receive the state of each URL directly from the server.

Getting Started

First you need to request an API key, which will authenticate you as an API user. In order to obtain an API key, you must have a Google account. You may create a Google account or log in with your existing Google account and sign up for the API at http://www.google.com/safebrowsing/key_signup.html

Please note that if you violate the requirements detailed in the Acceptable Usage in Clients section, your key may be disabled for a period of time.

Next you need to choose either the GET or POST method to perform your lookup. The GET method is simple, but you can query only one URL per request and you need to encode the URL to be looked up yourself. The POST method allows the client to specify up to 500 URLs in the POST body per request and does not require the API user to encode the URLs in the request body.

A Quick Example

We will use a quick example to show how the lookup API works.

Using GET Method

Client's request URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=demo-app&apikey=12345&appver=1.5.2&pver=3.0&url=http%3A%2F%2Fianfette.org%2F

Server's response code:

200

Server's response body:

malware

Using POST Method

Client's request URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=firefox&apikey=12345&appver=1.5.2&pver=3.0

Client's request body:

2
http://www.google.com/
http://ianfette.org/

Server's response code:

200

Server's response body:

ok
malware

In this example, the server responds with the state of each queried URL on its own line in the response body, in the same order as in the request.

Protocol Specification

HTTP GET Request

The API user can perform the lookup through a simple HTTP GET request. With this method, the client can look up only one URL per request.

Request URL

The API user performs a GET request by sending the following URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=CLIENT&apikey=APIKEY&appver=APPVER&pver=PVER&url=URL

Required CGI parameters:

  • The client parameter indicates the type of client. It can be any name of the client’s choice, but we suggest choosing a name that represents the true identity of the client, e.g. “firefox” or “facebook”.
  • The appver parameter indicates the version of the client, e.g. "1.5.2".
  • The apikey parameter indicates your API key.
  • The pver parameter indicates the protocol version that the client supports. Currently this should be "3.0". The format is "major.minor". If we update the protocol, minor revisions will always remain compatible; major revisions, however, will be incompatible, and the server MAY NOT be able to cope with an older protocol.
  • The url parameter indicates the URL the client wants to look up. It must be a valid URL (non-ASCII characters must be in UTF-8) and must be properly encoded to avoid ambiguity. For example, an unencoded '&' in the URL could be interpreted as a separator between CGI parameters. API users must percent-encode the set of "reserved characters" defined in RFC 3986.
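
As a rough illustration of the encoding requirement above, the following Python sketch assembles a GET request URL. The function name build_get_url and the constant LOOKUP_ENDPOINT are hypothetical names chosen for this example; only the endpoint and the parameter names come from this document. It relies on the standard library's urllib.parse.quote with safe="" so that every reserved character in the queried URL is percent-encoded.

```python
from urllib.parse import quote

# Endpoint as given in this document; the constant name is illustrative.
LOOKUP_ENDPOINT = "https://sb-ssl.google.com/safebrowsing/api/lookup"


def build_get_url(client, apikey, appver, url, pver="3.0"):
    """Assemble a Lookup API GET request URL (illustrative helper)."""
    # safe="" percent-encodes all reserved characters, including
    # "/", ":", "?", "=" and "&". Non-ASCII characters are encoded
    # as UTF-8 bytes, which is quote()'s default behaviour.
    encoded = quote(url, safe="")
    return (
        f"{LOOKUP_ENDPOINT}?client={client}&apikey={apikey}"
        f"&appver={appver}&pver={pver}&url={encoded}"
    )


if __name__ == "__main__":
    print(build_get_url("demo-app", "12345", "1.5.2", "http://ianfette.org/"))
```

With the values from the quick example, this should reproduce the request URL shown there.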

Formal R-BNF description:

CLIENT = (LOALPHA | "-")+
APIKEY = (ALPHA | DIGIT)+
APPVER = DIGIT ["." DIGIT]
PVER = 3 "." DIGIT
URL = valid URL string following the RFC 1738

Response Code

The server generates the following HTTP response codes for the GET request:

  • 200: The queried URL is on the phishing list, the malware list, or both; see the response body for the specific type.
  • 204: The queried URL is legitimate; no response body is returned.
  • 400: Bad Request. The HTTP request was not correctly formed.
  • 401: Not Authorized. The apikey is not authorized.
  • 503: Service Unavailable. The server cannot handle the request. Besides normal server failures, this can also indicate that the client has been “throttled” for sending too many requests.

Possible reasons for the Bad Request (HTTP code 400):

  • Not all the required CGI parameters are specified.
  • Some of the CGI parameters are empty.
  • The queried URL is not a valid URL or is not properly encoded.

Response Body

For a GET request, the server includes the type of the URL in the response body when the queried URL matches the phishing list, the malware list, or both (i.e. response code = 200):

GET_RESP_BODY = "phishing" | "malware" | "phishing,malware"

where "phishing" means the queried URL matched our phishing lists, "malware" means it matched our malware lists, and "phishing,malware" means it matched both.
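
The response codes and body format for GET requests can be combined into a small interpreter. The Python sketch below is one possible way to do this; the function name interpret_get_response and the choice of exception types are assumptions made for illustration, not part of the API.

```python
def interpret_get_response(status, body=""):
    """Map a GET response to the set of lists the URL matched.

    Returns an empty set when the URL matched neither list.
    The exception types are an illustrative choice.
    """
    if status == 200:
        # Body is "phishing", "malware" or "phishing,malware".
        return set(body.strip().split(","))
    if status == 204:
        # No match and no response body.
        return set()
    if status == 400:
        raise ValueError("400 Bad Request: check parameters and URL encoding")
    if status == 401:
        raise PermissionError("401 Not Authorized: the apikey is not authorized")
    if status == 503:
        raise RuntimeError("503 Service Unavailable: server failure or throttling")
    raise RuntimeError(f"unexpected response code {status}")


if __name__ == "__main__":
    print(interpret_get_response(200, "phishing,malware"))
    print(interpret_get_response(204))
```

Returning a set keeps the caller's check simple: an empty result means no warning is needed, and membership tests such as "malware" in result select the warning to show.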

HTTP POST Request

In addition to the GET request, the client can also look up a set of URLs (up to 500) through an HTTP POST request.

Request URL

The client performs an HTTP POST request by sending the following URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=CLIENT&apikey=APIKEY&appver=APPVER&pver=PVER

All the CGI parameters here have the same meaning and requirements as those in the GET request, except that the POST request does not contain the “url” field, since the queried URLs are specified in the request body.

Request Body

The client specifies the queried URLs in the POST request body using the following format:

POST_REQ_BODY = NUM LF URL (LF URL)*
NUM = (DIGIT)+
URL = url string following the RFC 1738

The request body contains several lines separated by LF. The first line is a number indicating how many URLs are included in the body. The following lines are the URLs to be looked up, one URL per line, and the client must specify at least one URL. The URLs must be valid in the same sense as in the GET request, except that URL encoding is not required in the POST request. The number in the first line must equal the actual number of URLs in the subsequent lines (empty lines in the request body should not be counted towards the total). Otherwise, the server returns code 400 (Bad Request) and does not perform any lookup.
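
The body format above can be generated mechanically. The following Python sketch is a minimal example; the function name build_post_body is hypothetical, and dropping empty entries before counting is an assumption made so that the count on the first line always agrees with the URLs that follow.

```python
MAX_URLS_PER_POST = 500  # per-request limit stated in this document


def build_post_body(urls):
    """Build a POST request body: the URL count, then one URL per line."""
    # Drop empty entries so the count matches the URLs actually sent.
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    if len(cleaned) > MAX_URLS_PER_POST:
        raise ValueError("a single POST request may contain at most 500 URLs")
    # Lines are separated by LF; URLs are not percent-encoded here.
    return "\n".join([str(len(cleaned)), *cleaned])


if __name__ == "__main__":
    print(build_post_body(["http://www.google.com/", "http://ianfette.org/"]))
```

For the two URLs of the quick example, this should yield the same three-line body shown there.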

Response Code

The server generates the following HTTP response codes for the POST request:

  • 200: AT LEAST ONE of the queried URLs matched the phishing or malware lists; the actual results are returned in the response body.
  • 204: NONE of the queried URLs matched the phishing or malware lists; no response body is returned.
  • 400: Bad Request. The HTTP request was not correctly formed.
  • 401: Not Authorized. The apikey is not authorized.
  • 503: Service Unavailable. The server cannot handle the request. Besides normal server failures, this can also indicate that the client has been “throttled” for sending too many requests.

Possible reasons for the Bad Request (HTTP code 400):

  • Not all the required CGI parameters are specified.
  • Some of the CGI parameters are empty.
  • The number of URLs is not specified in the first line of the request body.
  • The number of URLs specified in the first line does not match the actual number of URLs specified in the subsequent lines.
  • At least one of the queried URLs is not a valid URL or is not properly encoded.

Response Body

For a POST request, the server returns the types of the queried URLs in the response body when at least one of them matches our suspected phishing or malware lists (i.e. response code = 200):

POST_RESP_BODY = VERDICT (LF VERDICT)*
VERDICT = "phishing" | "malware" | "phishing,malware" | "ok"

Each type has the same meaning as in the GET response body, except that some of the URLs may be legitimate (recall that the server returns empty content only when none of the queried URLs matches the phishing or malware lists). In that case, we return "ok" for the non-matching URLs. The results are separated by LF. There is a one-to-one mapping between the results in the response body and the queried URLs in the request body. For example, if there are 10 URLs in the request body, the server returns exactly 10 results in the original order. That is, the first line is the result for the first queried URL, the second line is the result for the second queried URL, and so on.
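
Because results come back in request order, a client can pair them with the queried URLs by position. The Python sketch below shows one way to do this; the function name pair_post_results is hypothetical, and treating a 204 response as "ok" for every URL is an illustrative convenience based on the response codes described above.

```python
def pair_post_results(urls, status, body=""):
    """Pair each queried URL with its verdict, preserving request order."""
    if status == 204:
        # None of the URLs matched; the server sends no body.
        return [(url, "ok") for url in urls]
    if status != 200:
        raise RuntimeError(f"lookup failed with response code {status}")
    verdicts = body.splitlines()
    if len(verdicts) != len(urls):
        raise ValueError("number of verdicts does not match number of URLs")
    # Line i of the response corresponds to URL i of the request.
    return list(zip(urls, verdicts))


if __name__ == "__main__":
    queried = ["http://www.google.com/", "http://ianfette.org/"]
    for url, verdict in pair_post_results(queried, 200, "ok\nmalware"):
        print(url, verdict)
```

A list of pairs is used rather than a dictionary so that a URL queried twice keeps both of its results.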

Acceptable Usage in Clients

Usage Restrictions

We limit the number of URLs queried in a single POST request to 500, which we believe is sufficient for most API users. We also limit the number of requests that can be made with a single API key in a 24-hour period. If you expect to make more than 10,000 requests per day, you must contact us to have your API key provisioned for additional users. At present there is no cost for this; we want to make sure that we have contact information for any large users that may affect the service and its availability. For further questions about large deployments, contact us by sending email to antiphish-malware-cap-req@google.com.
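
A client with more than 500 URLs to check has to split them across several POST requests. The Python sketch below shows a simple way to do that; the function name batch_urls is hypothetical, and the sketch does not track the per-day request limit, which the caller would need to handle separately.

```python
def batch_urls(urls, size=500):
    """Yield successive slices of at most `size` URLs, in order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Hypothetical URLs used only to show the batch sizes.
    many = [f"http://example.com/page{i}" for i in range(1201)]
    print([len(batch) for batch in batch_urls(many)])
```

Each batch then becomes the body of one POST request, so 1,201 URLs would cost three requests against the daily limit.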

End-User Visible Warnings

If you use the Google Safe Browsing API to warn users about risks from particular webpages, we require that you follow certain guidelines. These guidelines help protect both you and Google from misunderstandings by making clear that the page is not known with 100% certainty to be a phishing site or a distributor of malware, and that the warnings merely identify possible risk.

  • In your end-user visible warning, you may not lead users to believe that the page in question is, without a doubt, a phishing page or a page that distributes malware. When you refer to the page being identified or the potential risks it may pose to users, you must qualify the warning using terms such as: suspected, potentially, possible, likely, may be.
  • Your warning must enable the user to learn more by reviewing information at http://www.antiphishing.org/ (for phishing warnings) or http://www.stopbadware.org/ (for malware warnings).
  • When you show warnings for pages identified as risky by the Safe Browsing API, you must give attribution to Google by including the line "Advisory provided by Google," with a link to http://code.google.com/apis/safebrowsing/safebrowsing_faq.html#whyAdvisory. If your product also shows warnings based on other sources, you may not include the Google attribution in warnings derived from non-Google data.

Suggested warning language

We encourage you to just copy this warning language in your product, or modify it slightly to fit your product.

Warning- Suspected phishing page. This page may be a forgery or imitation of another website, designed to trick users into sharing personal or financial information. Entering any personal information on this page may result in identity theft or other abuse. You can find out more about phishing from www.antiphishing.org.

Warning- Visiting this web site may harm your computer. This page appears to contain malicious code that could be downloaded to your computer without your consent. You can learn more about harmful web content including viruses and other malicious code and how to protect your computer at StopBadware.org.

Notice to Users About Phishing and Malware Protection

Our Terms of Service require that if you indicate to users that your service provides malware or phishing protection, you must also let them know that the protection is not perfect. This notice must be visible to them before they enable the protection, and it must let them know that there is a chance of both false positives (safe sites flagged as risky) and false negatives (risky sites not flagged). We suggest using the following language:

Google works to provide the most accurate and up-to-date phishing and malware information. However, it cannot guarantee that its information is comprehensive and error-free: some risky sites may not be identified, and some safe sites may be identified in error.

Reporting Incorrect Data

If you would like to help us improve our data, you can submit reports to us. We also encourage you to allow your users to send reports directly to us by including these URLs in your product. The hl parameter in the URL is a language code; values such as "en" and "de" are supported, and the parameter can be omitted.

Report phishing URLs that are not currently on our list:

http://www.google.com/safebrowsing/report_phish/?continue=http%3A%2F%2Fwww.google.com%2Ftools%2Ffirefox%2Ftoolbar%2FFT2%2Fintl%2F%3Clang%3E%2Fsubmit_success.html&hl=en

Report URLs that are currently on our phishing list in error:

http://www.google.com/safebrowsing/report_error/?continue=http%3A%2F%2Fwww.google.com%2Ftools%2Ffirefox%2Ftoolbar%2FFT2%2Fintl%2Fen%2Fsubmit_success.html&hl=en

Report malware URLs that are not currently on our malware list:

http://www.google.com/safebrowsing/report_badware/

Report URLs that are currently on our malware list in error:

http://www.stopbadware.org/home/reviewinfo

References

  • RFC 1738: Uniform Resource Locators (URL).
  • RFC 2119: Key words for use in RFCs to Indicate Requirement Levels.
  • RFC 2616: Hypertext Transfer Protocol, HTTP/1.1.
  • RFC 3629: UTF-8, a transformation format of ISO 10646.
  • RFC 3986: Uniform Resource Identifier (URI): Generic Syntax.

Appendix

R-BNF

This document uses an R-BNF notation, which is a mix between Extended BNF and PCRE-style regular expressions:

  • Rules are in the form: name = definition. Rule names are referenced as-is in the definition. Angle brackets may be used to make rule names easier to discern.
  • Literals are surrounded by quotation marks: "literal".
  • Sequences: (rule1 rule2) or simply rule1 rule2.
  • Alternative groups: (rule1 | rule2).
  • Optional groups: [rule[]].
  • Repetition: rule* means 0 or more of this rule or this group.
  • Repetition: rule+ means 1 or more of this rule or this group.

The following basic rules that describe the US-ASCII character set are also used as defined in RFC 2616:

  • LOALPHA = <any US-ASCII lowercase letter "a".."z">
  • ALPHA = UPALPHA | LOALPHA
  • DIGIT = <any US-ASCII digit "0".."9">
  • LF = <US-ASCII LF, line-feed (10)>
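
To make the notation concrete, the Python sketch below translates a few of the rules used in this document into regular expressions. The translations and the helper name matches_rule are illustrative assumptions, not an official validator, and the server may apply stricter or looser checks than these patterns.

```python
import re

# Hand-written regex equivalents of selected R-BNF rules from this document.
RULES = {
    "CLIENT": re.compile(r"[a-z-]+"),        # (LOALPHA | "-")+
    "APIKEY": re.compile(r"[A-Za-z0-9]+"),   # (ALPHA | DIGIT)+
    "PVER": re.compile(r"3\.[0-9]"),         # 3 "." DIGIT
    "NUM": re.compile(r"[0-9]+"),            # (DIGIT)+
}


def matches_rule(rule_name, text):
    """Return True if the whole of `text` matches the named rule."""
    return RULES[rule_name].fullmatch(text) is not None


if __name__ == "__main__":
    print(matches_rule("CLIENT", "demo-app"))
    print(matches_rule("PVER", "3.0"))
```

Using fullmatch rather than match means trailing characters are rejected, which mirrors a grammar rule that must cover the entire value.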
