Free tools
Robots.txt Examples
Robots.txt file content for thehindu.com.
Robot.txt file for: thehindu.com
User-agent: * Disallow: /cgi-bin/ Disallow: /cdn-cgi/* Disallow: /config/ Disallow: */analysis-logger/* Disallow: /todays-paper/*article*.ece* Disallow: /migration_catalog/ Disallow: */getapplink* Disallow: /summercamp/ Disallow: */wf.fragment/* Disallow: /poll/vote.do Disallow: /photo/10000/ Disallow: */photo/10000/ #Disallow: /newsletter/* Disallow: /news-service/* #Disallow: /ebooks/ Disallow: /search/ Disallow: /SEARCH/ Disallow: /Search/ Disallow: /22390678/ Disallow: /walkathon Disallow: /heelsonwheels Disallow: /archive/print/ #Disallow: /todays-paper/ #Disallow: /static/content/ #Disallow: /static/ Disallow: /todayspaper/ Disallow: /today-paper/ Disallow: /thseweekend Disallow: /ywsummercamp Disallow: /frnf2014 Disallow: /summermusic Disallow: /summercamp Disallow: /yw25 Disallow: /thischess Disallow: /ywquiz Disallow: /mumbaiedition Disallow: /marathon Disallow: /flndrmk-spl Disallow: /system/ Disallow: /sachin Disallow: /supermom2016 Disallow: /ywq2015 Disallow: /fooddirectory Disallow: /chennaicoastalcleanup Disallow: /onam-1 Disallow: /onam-2 Disallow: /hohkidscarnival Disallow: /HOW2014 Disallow: /how2014 Disallow: /cookingcontest Disallow: /youngworld Disallow: /impulse2014 Disallow: /mssaward2015 Disallow: /mssaward2016 Disallow: /wow2015 Disallow: /wow2017/chennai/ Disallow: /wow2017/bangalore/ Disallow: /tickets2016/ Disallow: /tickets2017/ Disallow: /tickets2018/ Disallow: /ticket2018/ Disallow: /ywsummercamp/ Disallow: /tags/TH category/ Disallow: /tags/TopicRoot_TH/ Disallow: /thehindu/ Disallow: /200* Disallow: /201* Disallow: */http:* Disallow: */https:* Disallow: */mailto:* Disallow: *.ecehttp* Disallow: *.ece1http* Disallow: *.ece2http* Disallow: /brandhub/sponsored-content/ Disallow: /coupons/ # Blocking ptid and gl parameter URLs that are causing crawl budget wastage Disallow: */?_ptid=* Disallow: */?_gl=* # Block picture sitemap because urls are noindex therein Disallow: /sitemap/archive/picture/ # Block all space or %20 suffixed with http or https protocols Disallow: * http:* Disallow: * https:* Disallow: *;http:* Disallow: *;https:* Disallow: *%20http:* Disallow: *%20https:* Disallow: */couponRedirect Disallow: *?redirect= Disallow: *?store= Disallow: /tag/*/*/ # Blocked until duplicate profile bylines fixed Disallow: /profile/ # profile articles allowed Allow: /profile/*article*.ece # profile exceptions allowed for pages with page=1 for rediects Allow: /profile/contributor/? Allow: /profile/photographers/? Allow: /profile/author/? Allow: /profile/contributor/$ Allow: /profile/photographers/$ Allow: /profile/author/$ Allow: /profile/$ Disallow: /sitemap/archive* Disallow: */fragment/* Disallow: /sitemap/ Allow: /sitemap/$ Allow: /sitemap/update/all.xml$ Allow: /sitemap/update/section.xml$ Allow: /sitemap/googlenews/all/all.xml$ #Disallow Wayback machine which is a problem crawler User-Agent: ia_archiver Disallow: / #Blocking AI Crawlers User-agent: GPTBot Disallow: / User-agent: ChatGPT-User Disallow: / User-agent: OAI-SearchBot Disallow: / User-agent: PerplexityBot Disallow: / User-agent: Amazonbot Disallow: / User-agent: Bytespider Disallow: / User-agent: ClaudeBot Disallow: / User-agent: Claude-Web Disallow: / User-agent: Meta-ExternalAgent Disallow: / User-agent: FacebookBot Disallow: / User-agent: CCBot Disallow: / User-agent: Anthropic-ai Disallow: / User-agent: YouBot Disallow: / Sitemap: https://www.thehindu.com/sitemap/googlenews/all/all.xml Sitemap: https://www.thehindu.com/sitemap/update/all.xml Sitemap: https://www.thehindu.com/sitemap/update/section.xml Sitemap: https://www.thehindu.com/feeder/default.rss Sitemap: https://www.thehindu.com/?service=videositemap