URL structure best practices for Google
Google supports URLs as defined by RFC 3986. Characters defined by the standard as reserved must be percent encoded. Unreserved ASCII characters may be left in the non-encoded form. Additionally, characters in the non-ASCII range should be UTF-8 encoded.
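As a rough illustration of the encoding rules above, the following Python sketch percent-encodes a single path segment as UTF-8 with the standard library's `urllib.parse.quote`. The host `example.com`, the helper name `encode_path_segment`, and the sample words are assumptions chosen for this example, not URLs from this guide.

```python
from urllib.parse import quote, unquote

def encode_path_segment(segment: str) -> str:
    """Percent-encode one path segment as UTF-8.

    safe="" means reserved characters such as "/" and "?" are encoded
    too, so the result is always a single path segment. Unreserved
    characters (letters, digits, "-", ".", "_", "~") are left as-is.
    """
    return quote(segment, safe="", encoding="utf-8")

# Hypothetical examples; example.com is a placeholder host.
for word in ["gemüse", "杂货", "summer sale"]:
    encoded = encode_path_segment(word)
    # Decoding the result gives back the original text.
    assert unquote(encoded) == word
    print("https://example.com/" + encoded)
```

Browsers usually display the decoded form to users, so the readable word is still what people see, while the encoded form is what is sent over the wire.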
When possible, use readable words rather than long ID numbers in your URLs.
Recommended: Simple, descriptive words in the URL:
Recommended: Localized words in the URL, if applicable.
Recommended: Use UTF-8 encoding as necessary. The following example uses UTF-8 encoding for Arabic characters in the URL:
The following example uses UTF-8 encoding for Chinese characters in the URL:
The following example uses UTF-8 encoding for the umlaut in the URL:
The following example uses UTF-8 encoding for emojis in the URL:
Not recommended: Using non-ASCII characters in the URL:
Not recommended: Unreadable, long ID numbers in the URL:
If your site is multi-regional, consider using a URL structure that makes it easy to geotarget your site. For more examples of how you can structure your URLs, refer to using locale-specific URLs.
Recommended: Country-specific domain:
Recommended: Country-specific subdirectory with gTLD:
Consider using hyphens to separate words in your URLs, as it helps users and search engines identify concepts in the URL more easily. We recommend that you use hyphens (-) instead of underscores (_) in your URLs.
Recommended: Hyphens (-):
Not recommended: Underscores (_):
Not recommended: Words in the URL joined together:
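One way to apply the hyphen recommendation when generating URLs from page titles is a small "slug" helper. The sketch below is a minimal version, assuming titles arrive as ordinary Unicode strings; the function name `slugify` and the sample titles are hypothetical.

```python
import re
import unicodedata

def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL slug."""
    # Normalize so that visually identical strings produce the same slug.
    text = unicodedata.normalize("NFKC", title).lower()
    # Replace every run of underscores, whitespace, or punctuation with
    # a single hyphen. \W is Unicode-aware, so localized words are kept.
    text = re.sub(r"[\W_]+", "-", text)
    # Drop hyphens left over at either end.
    return text.strip("-")

print(slugify("Summer Clothing"))   # summer-clothing
print(slugify("summer_clothing"))   # summer-clothing
```

A helper like this cannot split words that are already joined together (for example "summerclothing"), so the separation has to exist in the source title.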
Common issues related to URLs
Overly complex URLs, especially those containing multiple parameters, can cause problems for crawlers by creating unnecessarily high numbers of URLs that point to identical or similar content on your site. As a result, Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all the content on your site.
Unnecessarily high numbers of URLs can be caused by a number of issues. These include:
- Additive filtering of a set of items. Many sites provide different views of the same set of items or search results, often allowing the user to filter this set using defined criteria (for example: show me hotels on the beach). When filters can be combined in an additive manner (for example: hotels on the beach and with a fitness center), the number of URLs (views of data) on the site explodes. Creating a large number of slightly different lists of hotels is redundant, because Googlebot needs to see only a small number of lists from which it can reach the page for each hotel. For example:
- Hotel properties at "value rates":
- Hotel properties at "value rates" on the beach:
- Hotel properties at "value rates" on the beach and with a fitness center:
- Dynamic generation of documents. This can result in small changes because of counters, timestamps, or advertisements.
- Problematic parameters in the URL. Session IDs, for example, can create massive amounts of duplication and a greater number of URLs.
- Sorting parameters. Some large shopping sites provide multiple ways to sort the same items, resulting in a much greater number of URLs. For example:
- Irrelevant parameters in the URL, such as referral parameters. For example:
- Calendar issues. A dynamically generated calendar might generate links to
future and previous dates with no restrictions on start or end dates. For example:
- Broken relative links. Broken relative links can often cause infinite
spaces. Frequently, this problem arises because of repeated path elements. For example:
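
Returning to the first item in the list, additive filtering: the growth in URL count can be made concrete with a short calculation. The sketch below enumerates every combination of independent on/off filters for one listing page. The base URL, the filter names, and the `name=1` query format are all assumptions made for illustration.

```python
from itertools import combinations
from urllib.parse import urlencode

def filtered_urls(base_url: str, filters: list[str]) -> list[str]:
    """Return one URL per possible combination of on/off filters."""
    urls = []
    for size in range(len(filters) + 1):
        for chosen in combinations(filters, size):
            # Hypothetical query format: each active filter becomes name=1.
            query = urlencode([(name, "1") for name in chosen])
            urls.append(base_url + ("?" + query if query else ""))
    return urls

# Hypothetical filter names on a placeholder host.
filters = ["value", "beach", "gym", "pool", "parking"]
urls = filtered_urls("https://example.com/hotels", filters)
print(len(urls))  # 2 ** 5 = 32 URLs for the same underlying set of hotels
```

Each additional independent filter doubles the count, and every sort order or session parameter multiplies it again, which is why a handful of options can produce thousands of near-duplicate list pages.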
Resolve problems related to URLs
To avoid potential problems with URL structure, we recommend the following:
- Create a simple URL structure. Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans.
- Consider using a robots.txt file to block Googlebot's access to problematic URLs. Typically, consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars. Using wildcard patterns (* and $) in your robots.txt file can allow you to easily block large numbers of URLs.
- Wherever possible, avoid the use of session IDs in URLs. Consider using cookies instead.
- If upper and lower case text in a URL is treated the same by the web server, convert all text to the same case so it is easier for Google to determine that URLs reference the same page.
- Whenever possible, shorten URLs by trimming unnecessary parameters.
- If your site has an infinite calendar, add a nofollow attribute to links to dynamically created future calendar pages.
- Check your site for broken relative links.
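
Several of the recommendations above (avoiding session IDs, using a consistent case, and trimming unnecessary parameters) can be combined into a single normalization step applied when links are generated. The following Python sketch shows one possible approach using only the standard library. The parameter names in `IGNORED_PARAMS`, the function name `normalize_url`, and the sample URL are hypothetical and would need to match what your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that do not change page content.
IGNORED_PARAMS = {"sessionid", "sid", "ref", "utm_source", "utm_medium"}

def normalize_url(url: str, lowercase_path: bool = False) -> str:
    """Drop ignorable query parameters and use a consistent case."""
    parts = urlsplit(url)
    # Keep only the parameters that affect what the page shows.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Lowercase the path only if the server treats case the same.
    path = parts.path.lower() if lowercase_path else parts.path
    # Host names are case-insensitive. This assumes the URL carries no
    # user:password part, which would be case-sensitive.
    host = parts.netloc.lower()
    return urlunsplit((parts.scheme, host, path, urlencode(kept), parts.fragment))

print(normalize_url("https://Example.com/Shoes?color=red&sessionid=abc123&ref=home"))
# https://example.com/Shoes?color=red
```

The `lowercase_path` flag is off by default because lowercasing paths is only safe when the web server treats upper and lower case the same.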