Safe Browsing Lookup API Developer's Guide

The Google Safe Browsing Lookup API is an experimental API that allows applications to check URLs against Google's constantly-updated lists of suspected phishing, malware, and unwanted software pages.

This document describes the capabilities of the Safe Browsing API, provides code samples for HTTP requests, shows example responses, and explains error codes.

Contents

  1. Overview
  2. Key Differences from Previous Versions
    1. Differences Between Version 3.0 and 3.1
  3. Getting Started
  4. Quick Example
    1. GET Method
    2. POST Method
  5. Protocol Specification
    1. HTTP GET Request
      1. Request URL
      2. Response Codes
      3. Response Body
    2. HTTP POST Request
      1. Request URL
      2. Request Body
      3. Response Codes
      4. Response Body
  6. Acceptable Usage in Clients
    1. Usage Restrictions
    2. End-User Visible Warnings
  7. Report Incorrect Data
    1. Report Phishing URLs
    2. Report Phishing Errors
    3. Report Malware URLs
    4. Report Malware Errors
  8. References
  9. Appendix
    1. R-BNF

Overview

The Safe Browsing Lookup API provides a simple interface for applications that just want to query the state of URLs, do not mind sending the URLs to Google, and are willing to accept the latency implied by a network roundtrip. Using the Lookup API, clients query the URLs through HTTP GET or POST requests and receive the state of the URLs directly from the server.

Key Differences From Previous Versions

Differences Between Version 3.0 and 3.1

Changes since version 3.0:

  • The API key format has changed. API keys are now managed in the Google Developers Console, as described in Getting Started. Note that the CGI parameter is now called key.
  • HTTPS is required for version 3.1.

Getting Started

In order to interact with the Safe Browsing lookup server, you need an API key to authenticate as an API user. You will pass this key as a CGI parameter in your HTTP requests to the lookup server:

https://safebrowsing.google.com/safebrowsing/...&key=SIzaVyOm19mrXxv-z80s-nC-G2XYH1-3hAtNlGh&...

If you do not have a key, create the key in the Google Developers Console:

  1. Create a project in the Google Developers Console, if you don't already have one.
  2. In your project, click on APIs & Auth > APIs.
  3. Scroll down to the Safe Browsing API and turn it ON.
  4. Click on APIs & Auth > Credentials.
  5. Click on Create new key and create a browser or server key depending on your application.
If you need more help, check out the Google Developers Console help.

Quick Example

You can use the GET or POST method to perform your lookup. The GET method is simple, but you can query only one URL per request, and you need to encode that URL yourself. The POST method allows you to specify up to 500 URLs in the request body, and they need not be encoded.

GET Method

Client's request URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=demo-app&key=12345&appver=1.5.2&pver=3.1&url=http%3A%2F%2Fianfette.org%2F

Server's response code:

200

Server's response body:

malware

POST Method

Client's request URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=firefox&key=12345&appver=1.5.2&pver=3.1

Client's request body:

2
http://www.google.com/
http://examplebadurl.org/

Server's response code:

200

Server's response body:

ok
malware

The server response lists the state of the queried URLs in the same order as in the request.

Protocol Specification

HTTP GET Request

Use a simple HTTP GET request to look up one URL.

Request URL

Send the following URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=CLIENT&key=APIKEY&appver=APPVER&pver=PVER&url=URL

Required CGI parameters:

  • The client parameter indicates the type of client. You can choose any name. However, we suggest you choose a name that represents the true identity of the client, such as “yourawesomecompanyname”.
  • The appver parameter indicates the version of the client, such as "1.5.2".
  • The key parameter specifies your API key.
  • The pver parameter indicates the protocol version supported by the client. This should be "3.1". The format is "major.minor". If we update the protocol, we will make sure that minor revisions are always compatible; however, major revisions will be incompatible, and the server may not be able to cope with an older protocol.
  • The url parameter indicates the URL to look up. This must be a valid and properly encoded URL, with non-ASCII characters in UTF-8. All reserved characters must use percent encoding, as defined in RFC 3986; for example, an unencoded '&' in the URL could be interpreted as the CGI parameter separator.

Formal R-BNF description:

CLIENT = (LOALPHA | "-")+
APIKEY = UNRESERVED+
APPVER = DIGIT ["." DIGIT]
PVER = "3" "." DIGIT
URL = valid URL string following the RFC 1738
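
As a minimal sketch of the rules above, the following Python builds a GET request URL with the standard library. The function name build_get_lookup_url and the constant LOOKUP_ENDPOINT are illustrative names, not part of the API; the endpoint and parameter names are taken from the request URL shown above.

```python
from urllib.parse import quote, urlencode

# Endpoint as given in this guide.
LOOKUP_ENDPOINT = "https://sb-ssl.google.com/safebrowsing/api/lookup"


def build_get_lookup_url(client, api_key, appver, url, pver="3.1"):
    """Return the GET request URL for looking up a single URL.

    Hypothetical helper: the name and signature are assumptions
    made for this example.
    """
    params = [
        ("client", client),
        ("key", api_key),
        ("appver", appver),
        ("pver", pver),
        ("url", url),
    ]
    # urlencode percent-encodes every reserved character (":", "/", "&", ...)
    # and encodes non-ASCII characters as UTF-8. quote_via=quote makes a
    # space become "%20" rather than "+".
    return LOOKUP_ENDPOINT + "?" + urlencode(params, quote_via=quote)


print(build_get_lookup_url("demo-app", "12345", "1.5.2", "http://ianfette.org/"))
```

With these arguments, the printed URL is the same as the request URL in the GET Method quick example.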

Response Codes

The server generates the following HTTP response codes for the GET request:

  • 200: The queried URL matches the phishing, malware, or unwanted software lists, or more than one of them; see the response body for the specific type.
  • 204: The queried URL did not match any of the lists, and no response body is returned.
  • 400: Bad Request—The HTTP request was not correctly formed.
  • 401: Not Authorized—The API key is not authorized.
  • 503: Service Unavailable—The server cannot handle the request. Besides the normal server failures, this can also indicate that the client has been “throttled” for sending too many requests.

Possible reasons for the Bad Request (HTTP code 400):

  • Not all required CGI parameters are specified.
  • Some of the CGI parameters are empty.
  • The queried URL is not a valid URL or not properly encoded.

Response Body

For a GET request, the server will include the URL type in the response body when the queried URL matches the phishing, malware, or unwanted software lists (response code is 200):

GET_RESP_BODY = "phishing" | "malware" | "unwanted" | "phishing,malware" | "phishing,unwanted" | "malware,unwanted" | "phishing,malware,unwanted"

Where “phishing” means the queried URL is matched in our phishing lists, “malware” means the queried URL is matched in our malware lists, "unwanted" means the queried URL is matched in our unwanted software lists, and multiple returned URL types means there are matches in the corresponding lists.
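
One possible way to turn a GET response into something a client can act on is sketched below. It assumes the caller already has the HTTP status code and the body as text; the function name interpret_get_response and the choice of exception types are assumptions made for this example, not something the API prescribes.

```python
def interpret_get_response(status_code, body=""):
    """Return the set of list names the queried URL matched.

    An empty set means the URL matched none of the lists (HTTP 204).
    Hypothetical helper: name and exception types are illustrative.
    """
    if status_code == 204:
        return set()
    if status_code == 200:
        # The body is a comma-separated list such as "phishing,malware".
        return {part.strip() for part in body.strip().split(",")}
    if status_code == 400:
        raise ValueError("bad request: check the CGI parameters and URL encoding")
    if status_code == 401:
        raise PermissionError("the API key is not authorized")
    if status_code == 503:
        raise RuntimeError("service unavailable, or the client is being throttled")
    raise RuntimeError(f"unexpected HTTP status code {status_code}")


matched = interpret_get_response(200, "phishing,malware")
print(sorted(matched))
```

Returning a set keeps the check for a particular list simple, for example `"malware" in matched`.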

HTTP POST Request

Use a POST request to check up to 500 URLs.

Request URL

Send the following URL:

https://sb-ssl.google.com/safebrowsing/api/lookup?client=CLIENT&key=APIKEY&appver=APPVER&pver=PVER

Required CGI parameters:

  • The client parameter indicates the type of client. You can choose any name. However, we suggest you choose a name that represents the true identity of the client, such as “yourcompanyname”.
  • The appver parameter indicates the version of the client, such as "1.5.2".
  • The key parameter specifies your API key.
  • The pver parameter indicates the protocol version supported by the client. This should be "3.1". The format is "major.minor". If we update the protocol, we will make sure that minor revisions are always compatible; however, major revisions will be incompatible, and the server may not be able to cope with an older protocol.

Request Body

Specify the queried URLs in the POST request body using the following format:

POST_REQ_BODY = NUM LF URL (LF URL)*
NUM = (DIGIT)+
URL = URL string following the RFC 1738

The request body contains several lines separated by LF.

  • The first line is a number that indicates how many URLs are included in the body. This number must match with the number of URLs listed.
  • The lines below are the URLs to be looked up. There must be one URL per line, and at least one URL overall (empty lines do not count). The URLs must be valid, but need not be encoded.
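
The body format above can be assembled as in the following sketch. The helper name build_post_body is hypothetical; the 500-URL limit and the count-then-URLs layout come from this guide.

```python
def build_post_body(urls):
    """Build a POST request body: the URL count, then one URL per line.

    Hypothetical helper written for this example.
    """
    # Empty lines do not count as URLs, so drop them before counting.
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("at least one URL is required")
    if len(cleaned) > 500:
        raise ValueError("a single POST request may contain at most 500 URLs")
    if any("\n" in u or "\r" in u for u in cleaned):
        raise ValueError("a URL must not contain a line break")
    # The first line must match the number of URLs that follow; lines are
    # separated by LF.
    return "\n".join([str(len(cleaned))] + cleaned)


body = build_post_body(["http://www.google.com/", "http://examplebadurl.org/"])
print(body)
```

For the two URLs shown, the result is the request body from the POST Method quick example.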

Response Codes

The server generates the following HTTP response codes for the POST request:

  • 200: AT LEAST ONE of the queried URLs is matched in either the phishing, malware, or unwanted software lists. The actual results are returned through the response body.
  • 204: NONE of the queried URLs matched the phishing, malware, or unwanted software lists, and no response body is returned.
  • 400: Bad Request—The HTTP request was not correctly formed.
  • 401: Not Authorized—The API key is not authorized.
  • 503: Service Unavailable—The server cannot handle the request. Besides the normal server failures, this could also indicate that the client has been “throttled” for sending too many requests.

Possible reasons for a Bad Request (HTTP code 400):

  • Not all the required CGI parameters are specified.
  • Some of the CGI parameters are empty.
  • Failed to specify the number of URLs in the first line of request body.
  • The number of URLs specified in the first line does not match the actual number of URLs supplied in the subsequent lines.
  • At least one of the queried URLs is not a valid URL.

Response Body

For a POST request, the server will return the URL type for each URL in the response body, if at least one of the queried URLs is found in the suspected phishing, malware, or unwanted software lists (response code is 200):

POST_RESP_BODY = VERDICT (LF VERDICT)*
VERDICT = "phishing" | "malware" | "unwanted" | "phishing,malware" | "phishing,unwanted" | "malware,unwanted" | "phishing,malware,unwanted" | "ok"

Where “phishing” means the queried URL is matched in our phishing lists, “malware” means the queried URL is matched in our malware lists, "unwanted" means the queried URL is matched in our unwanted software lists, and multiple URL types means there are matches in the corresponding lists. The "ok" value marks URLs that were not found in any of the lists.

The server will return the results, one per line, in the same order as in the request.
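
Because the verdicts come back in request order, a client can pair them with the URLs it sent. The sketch below assumes the status code and body text are already available; parse_post_response is an illustrative name, and tolerating spaces after commas is a defensive choice rather than something the grammar requires.

```python
def parse_post_response(urls, status_code, body=""):
    """Pair each queried URL with the set of lists it matched.

    Returns a list of (url, matched_lists) tuples in request order.
    Hypothetical helper written for this example.
    """
    if status_code == 204:
        # None of the queried URLs matched any list.
        return [(url, set()) for url in urls]
    if status_code != 200:
        raise RuntimeError(f"lookup failed with HTTP status {status_code}")
    verdicts = body.strip().split("\n")
    if len(verdicts) != len(urls):
        raise ValueError("number of verdicts does not match number of URLs")
    results = []
    for url, verdict in zip(urls, verdicts):
        verdict = verdict.strip()
        if verdict == "ok":
            matched = set()
        else:
            matched = {part.strip() for part in verdict.split(",")}
        results.append((url, matched))
    return results


queried = ["http://www.google.com/", "http://examplebadurl.org/"]
for url, matched in parse_post_response(queried, 200, "ok\nmalware"):
    print(url, sorted(matched))
```

A list of tuples is used instead of a dictionary so that a URL queried twice in one request keeps both of its results.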

Acceptable Usage in Clients

Usage Restrictions

  • You can query up to 500 URLs in a single POST request.
  • A single API key can make requests for up to 10,000 clients per 24-hour period.

We limit the number of different clients you can support with a single API key. If you expect that more than 10,000 distinct clients per day will send requests, you must contact us to have your API key provisioned for additional capacity. We want to make sure that we have contact information for large users that may potentially affect the service and its availability. At the present time there is no cost for this. For further questions about large deployments, contact antiphish-malware-cap-req@google.com.

Please note that if you violate the requirements detailed in this Acceptable Usage in Clients section, your key may be disabled for a period of time.
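
A client that needs to check more than 500 URLs has to split them across several POST requests. The following is a minimal sketch of such batching; batch_urls and MAX_URLS_PER_POST are names chosen for this example.

```python
MAX_URLS_PER_POST = 500  # per-request limit stated in the usage restrictions


def batch_urls(urls, batch_size=MAX_URLS_PER_POST):
    """Yield consecutive lists of at most batch_size URLs.

    Hypothetical helper written for this example.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


urls = [f"http://example.com/page{i}" for i in range(1201)]
print([len(batch) for batch in batch_urls(urls)])
```

Each batch can then be sent as its own POST request. Note that sending many requests in quick succession may itself lead to throttling (HTTP 503).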

End-User Visible Warnings

If you use the Google Safe Browsing API to warn users about risks from particular webpages, we require that you follow certain guidelines. These guidelines help protect both you and Google from misunderstandings by making clear that the page is not known with 100% certainty to be a phishing site or a distributor of malware or unwanted software, and that the warnings merely identify possible risk.

  • In your end-user visible warning, you may not lead users to believe that the page in question is, without a doubt, a phishing page or a page that distributes malware or unwanted software. When you refer to the page being identified or the potential risks it may pose to users, you must qualify the warning using terms such as: suspected, potentially, possible, likely, may be.
  • Your warning must enable the user to learn more by reviewing information at http://www.antiphishing.org/ (for phishing warnings), http://www.stopbadware.org/ (for malware warnings), or https://www.google.com/about/company/unwanted-software-policy.html (for unwanted software warnings).
  • When you show warnings for pages identified as risky by the Safe Browsing API, you must give attribution to Google by including the line "Advisory provided by Google," with a link to http://code.google.com/apis/safebrowsing/safebrowsing_faq.html#whyAdvisory. If your product also shows warnings based on other sources, you may not include the Google attribution in warnings derived from non-Google data.

Suggested warning language

We encourage you to copy this warning language into your product, or modify it slightly to fit your product.

Warning—Suspected phishing page. This page may be a forgery or imitation of another website, designed to trick users into sharing personal or financial information. Entering any personal information on this page may result in identity theft or other abuse. You can find out more about phishing from www.antiphishing.org.

Warning—Visiting this web site may harm your computer. This page appears to contain malicious code that could be downloaded to your computer without your consent. You can learn more about harmful web content including viruses and other malicious code and how to protect your computer at StopBadware.org.

Warning—The site ahead may contain harmful programs. Attackers might attempt to trick you into installing programs that harm your browsing experience (for example, by changing your homepage or showing extra ads on sites you visit). You can learn more about unwanted software at https://www.google.com/about/company/unwanted-software-policy.html.

Notice to Users About Phishing, Malware, and Unwanted Software Protection

Our Terms of Service require that if you indicate to users that your service provides malware, phishing, or unwanted software protection, you must also let them know that the protection is not perfect. This notice must be visible to them before they enable the protection, and it must let them know that there is a chance of both false positives (safe sites flagged as risky) and false negatives (risky sites not flagged). We suggest using the following language:

Google works to provide the most accurate and up-to-date phishing, malware, and unwanted software information. However, Google cannot guarantee that its information is comprehensive and error-free: some risky sites may not be identified, and some safe sites may be identified in error.

Reporting Incorrect Data

If you would like to help us improve our data, you can submit reports to us. We also encourage you to allow your users to send reports directly to us by including these URLs in your product. The hl parameter in the URL is a language code—values such as "en" and "de" are supported—or you can omit this parameter.

Report phishing URLs that are not currently on our list:

http://www.google.com/safebrowsing/report_phish/?continue=http%3A%2F%2Fwww.google.com%2Ftools%2Ffirefox%2Ftoolbar%2FFT2%2Fintl%2F%3Clang%3E%2Fsubmit_success.html&hl=en

Report URLs that are currently on our phishing list in error:

http://www.google.com/safebrowsing/report_error/?continue=http%3A%2F%2Fwww.google.com%2Ftools%2Ffirefox%2Ftoolbar%2FFT2%2Fintl%2Fen%2Fsubmit_success.html&hl=en

Report malware URLs that are not currently on our malware list:

http://www.google.com/safebrowsing/report_badware/

Report URLs that are currently on our malware list in error:

http://www.stopbadware.org/home/reviewinfo

References

Appendix

R-BNF

This document uses an R-BNF notation, which is a mix of Extended BNF and PCRE-style regular expressions:

  • Rules are in the form: name = definition. Rule names are referenced as-is in the definition. Angle brackets may be used to make rule names easier to distinguish.
  • Literals are surrounded by quotation marks: "literal".
  • Sequences: (rule1 rule2) or simply rule1 rule2.
  • Alternative groups: (rule1 | rule2).
  • Optional groups: [rule].
  • Repetition: rule* means 0 or more of this rule or this group.
  • Repetition: rule+ means 1 or more of this rule or this group.

The following basic rules that describe the US-ASCII character set are also used as defined in RFC 2616:

  • UPALPHA = <any US-ASCII uppercase letter "A".."Z">
  • LOALPHA = <any US-ASCII lowercase letter "a".."z">
  • ALPHA = UPALPHA | LOALPHA
  • DIGIT = <any US-ASCII digit "0".."9">
  • LF = <US-ASCII LF, line-feed (10)>
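
To illustrate how this notation maps onto ordinary regular expressions, the sketch below translates three of the rules used in this document (CLIENT, PVER, and NUM) into Python patterns. The constant names and the matches helper are assumptions made for this example; the translations are one reading of the rules, not an official validator.

```python
import re

# CLIENT = (LOALPHA | "-")+   -> one or more lowercase letters or hyphens
CLIENT_RE = re.compile(r"[a-z-]+")
# PVER = 3 "." DIGIT          -> the digit 3, a dot, then one digit
PVER_RE = re.compile(r"3\.[0-9]")
# NUM = (DIGIT)+              -> one or more US-ASCII digits
NUM_RE = re.compile(r"[0-9]+")


def matches(pattern, text):
    """Return True if the whole string conforms to the rule."""
    return pattern.fullmatch(text) is not None


print(matches(CLIENT_RE, "demo-app"))
print(matches(PVER_RE, "3.1"))
print(matches(NUM_RE, "2"))
```

The patterns use `[0-9]` rather than `\d` because `\d` also matches non-ASCII digits in Python, while DIGIT is restricted to US-ASCII.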