Useful robots.txt rules

Here are some common, useful robots.txt rules:

Useful rules
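Disallow crawling of the entire site	Keep in mind that in some situations, URLs from the site may still be indexed even if they haven't been crawled. Note: This rule does not match the various AdsBot crawlers, which must be named explicitly.
    User-agent: *
    Disallow: /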
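Allow crawling of the entire site (using an empty `Disallow` rule)	This explicitly allows all crawlers to access the entire site. It is functionally equivalent to having no robots.txt file at all, or to using an `Allow: /` rule.
    User-agent: *
    Disallow: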
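Disallow crawling of specific directories and their contents	Append a forward slash to the directory name to disallow crawling of the whole directory. Note: Don't use robots.txt to block access to private content; use proper authentication instead. URLs disallowed by the robots.txt file might still be indexed without being crawled, and because anyone can view the robots.txt file, it may disclose the location of your private content.
    User-agent: *
    Disallow: /calendar/
    Disallow: /junk/
    Disallow: /books/fiction/contemporary/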
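To illustrate why the trailing slash matters, here is a minimal Python sketch that treats each `Disallow` value as a plain path prefix. The helper name `is_disallowed` and the sample paths are made up for illustration, and wildcard handling is deliberately left out.

```python
# Hypothetical helper: plain prefix matching only, no wildcard support.
DISALLOWED_PREFIXES = ["/calendar/", "/junk/", "/books/fiction/contemporary/"]


def is_disallowed(path: str) -> bool:
    """Return True if the URL path starts with any disallowed prefix."""
    return any(path.startswith(prefix) for prefix in DISALLOWED_PREFIXES)


print(is_disallowed("/calendar/2024/events.html"))        # True
print(is_disallowed("/calendar-archive/index.html"))      # False
print(is_disallowed("/books/fiction/classic/emma.html"))  # False
```

Because the prefix ends with `/`, a sibling path such as `/calendar-archive/` is not caught by the `/calendar/` rule.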
Disallow crawling of a single web page	For example, disallow the `useless_file.html` page located at `https://example.com/useless_file.html`, and the `other_useless_file.html` page in the `junk` directory.
    User-agent: *
    Disallow: /useless_file.html
    Disallow: /junk/other_useless_file.html
Disallow crawling of the whole site except a subdirectory	Crawlers may only access the `public` subdirectory.
    User-agent: *
    Disallow: /
    Allow: /public/
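This row works because, when both an `Allow` and a `Disallow` rule match a URL, Google applies the most specific rule (the one with the longest path), and the less restrictive rule wins a tie. The Python sketch below models that precedence under simplifying assumptions: the function name `is_allowed` is hypothetical, rules are plain prefixes, and wildcards are ignored.

```python
from typing import List, Tuple

# Each rule is (directive, path prefix), taken from the row above.
RULES: List[Tuple[str, str]] = [("disallow", "/"), ("allow", "/public/")]


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Apply the matching rule with the longest path; allow wins ties."""
    matches = [
        (len(prefix), directive == "allow")
        for directive, prefix in rules
        if path.startswith(prefix)
    ]
    if not matches:
        return True  # no rule matches, so crawling is allowed
    # Tuples compare by length first, then by is_allow (True > False).
    return max(matches)[1]


print(is_allowed("/public/index.html", RULES))   # True
print(is_allowed("/private/index.html", RULES))  # False
```

Note that a parser that simply takes the first matching rule in file order would give a different answer for `/public/index.html`, so the result depends on which precedence model a crawler follows.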
Allow access to a single crawler	Only `Googlebot-News` may crawl the whole site.
    User-agent: Googlebot-News
    Allow: /

    User-agent: *
    Disallow: /
Allow access to all but a single crawler	`Unnecessarybot` may not crawl the site; all other bots may.
    User-agent: Unnecessarybot
    Disallow: /

    User-agent: *
    Allow: /
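One way to sanity-check per-crawler groups like these is Python's standard-library `urllib.robotparser`. This is only a sketch: the bot name `SomeOtherBot` and the page URL are placeholders, and the standard-library parser does not replicate every detail of Google's matching (for example, it does not handle `*` and `$` wildcards in paths), so it is best suited to simple prefix rules such as these.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeOtherBot" is a made-up crawler name used only for this check.
blocked_bot = parser.can_fetch("Unnecessarybot", "https://example.com/page.html")
other_bot = parser.can_fetch("SomeOtherBot", "https://example.com/page.html")

print(blocked_bot)  # False
print(other_bot)    # True
```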
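Disallow crawling of the whole site, but allow `Storebot-Google`	This keeps your pages from appearing in Google Search results, but the `Storebot-Google` web crawler can still analyze them to show your products on Google Shopping.
    User-agent: *
    Disallow: /

    User-agent: Storebot-Google
    Allow: /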
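Disallow crawling of all images on your site by Google (including Google Images, Discover, and anywhere else Google shows images)	Google cannot index images and videos without crawling them.
    User-agent: Googlebot-Image
    Disallow: /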
Disallow crawling of a specific image by Google Images	For example, disallow the `dogs.jpg` image.
    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg
Disallow crawling of files of a specific type	For example, disallow crawling of all `.gif` files.
    User-agent: Googlebot
    Disallow: /*.gif$
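Use the `*` and `$` wildcards to match URLs that end with a specific string	For example, block all `.xls` files:
    User-agent: Googlebot
    Disallow: /*.xls$
The `$` wildcard marks the end of the URL. If the URL has extra characters after the pattern (such as URL parameters), it does not match. For example, `https://example.com/cats.xls?personality=loki` is not blocked by the `/*.xls$` rule.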
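The wildcard behavior can be approximated by translating a pattern into a regular expression: `*` becomes "any sequence of characters" and a trailing `$` anchors the end of the URL. The Python sketch below is an illustration under that assumption; `pattern_to_regex` and `matches` are hypothetical helper names, not part of any crawler's API.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path_and_query: str) -> bool:
    # re.match anchors at the start of the string, as a robots.txt rule does.
    return pattern_to_regex(pattern).match(path_and_query) is not None


print(matches("/*.xls$", "/cats.xls"))                   # True
print(matches("/*.xls$", "/cats.xls?personality=loki"))  # False
print(matches("/*.gif$", "/images/dogs.gif"))            # True
```

Dropping the `$` would make the pattern match the parameterized URL as well, since the rule would then only need to match a prefix of the URL.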
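Combine multiple user agents into a single group	Combining the rules for several crawlers into one group shortens the file and simplifies maintenance, because every rule in the group applies to each listed user agent. This is equivalent to listing each user agent in its own group with its own copy of the rules.
    User-agent: Googlebot
    User-agent: Storebot-Google
    Allow: /cats
    Disallow: /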
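The equivalence can be made concrete by expanding a combined group into one rule list per user agent. The following Python sketch uses a deliberately simplified grouping routine (the name `rules_by_agent` is hypothetical, and only `User-agent`, `Allow`, and `Disallow` lines are handled); it is not a full robots.txt parser.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Storebot-Google
Allow: /cats
Disallow: /
"""


def rules_by_agent(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each user agent to the (directive, path) rules of its group."""
    result: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    agents: List[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                result[agent].append((field, value))
    return dict(result)


groups = rules_by_agent(ROBOTS_TXT)
print(groups["Googlebot"] == groups["Storebot-Google"])  # True
```

Writing two separate groups, one per user agent with the same two rules, would produce the same mapping.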