<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://www.cookipedia.co.uk/wiki/index.php?action=history&amp;feed=atom&amp;title=Limiting_Chat_GPTBot_Crawl_Rate</id>
	<title>Limiting Chat GPTBot Crawl Rate - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.cookipedia.co.uk/wiki/index.php?action=history&amp;feed=atom&amp;title=Limiting_Chat_GPTBot_Crawl_Rate"/>
	<link rel="alternate" type="text/html" href="https://www.cookipedia.co.uk/wiki/index.php?title=Limiting_Chat_GPTBot_Crawl_Rate&amp;action=history"/>
	<updated>2026-04-21T18:15:48Z</updated>
	<subtitle>Revision history for this page on [[Cookipedia]]</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://www.cookipedia.co.uk/wiki/index.php?title=Limiting_Chat_GPTBot_Crawl_Rate&amp;diff=272392&amp;oldid=prev</id>
		<title>Chef at 10:01, 6 December 2025</title>
		<link rel="alternate" type="text/html" href="https://www.cookipedia.co.uk/wiki/index.php?title=Limiting_Chat_GPTBot_Crawl_Rate&amp;diff=272392&amp;oldid=prev"/>
		<updated>2025-12-06T10:01:09Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;!-- seo --&amp;gt;&lt;br /&gt;
{{#seo:&lt;br /&gt;
|title=Limiting_Chat_GPTBot_Crawl_Rate&lt;br /&gt;
|titlemode=replace&lt;br /&gt;
|keywords=#tools #chatGPT #Robotstxt #fail2ban #iptables #firewalld #serverload #apache #webcrawler #pest&lt;br /&gt;
|hashtagrev=12032020&lt;br /&gt;
|description=Prevent AI Robots from hammering your webserver by repeated and uncontrolled crawling&lt;br /&gt;
}}&lt;br /&gt;
&amp;lt;!-- /seo --&amp;gt;&lt;br /&gt;
[[Image:{{PAGENAME}}.jpg|300px|thumb|right|Bye-bye, AI Crawler]]&lt;br /&gt;
=== Limiting Chat GPTBot Crawl Rate===&lt;br /&gt;
Recently ChatGPT&amp;#039;s crawler robot &amp;#039;&amp;#039;GPTBot&amp;#039;&amp;#039; was crawling this site &amp;#039;&amp;#039;so&amp;#039;&amp;#039; heavily that it raised the processor load high enough to render the site inoperable for normal users. &lt;br /&gt;
&lt;br /&gt;
It was hitting the server from multiple IP addresses more than 30 times per second. &lt;br /&gt;
&lt;br /&gt;
For the benefit of others, here are various ways to prevent this crawler from slowing down your Linux server.&lt;br /&gt;
&lt;br /&gt;
Full User-agent string: &amp;lt;pre&amp;gt;Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To keep your server usable, you can control how frequently GPTBot (and other bots) request pages, either through your robots.txt file or with rate-limiting at the server level.&lt;br /&gt;
&lt;br /&gt;
===Step 1: Adjust Crawl Rate with robots.txt===&lt;br /&gt;
You can limit the crawl rate for GPTBot by adding the following to your robots.txt file:&lt;br /&gt;
&lt;br /&gt;
==== December 2025====&lt;br /&gt;
The following crawl delay seems to be working.  Note that &amp;#039;&amp;#039;Crawl-delay&amp;#039;&amp;#039; is a non-standard directive, so not every crawler honours it.  The extra rules after the crawl delay are specific to MediaWiki servers: they allow the site to be crawled but block processor-expensive queries which can create infinite loops or extremely large URL trees.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Crawl-delay: This sets a delay (in seconds) between each request from the bot. Adjust the value (e.g., 10 seconds) to a rate that suits your server load.&lt;br /&gt;
User-agent: GPTBot&lt;br /&gt;
Crawl-delay: 10&lt;br /&gt;
Disallow: /*action=&lt;br /&gt;
Disallow: /*oldid=&lt;br /&gt;
Disallow: /*curid=&lt;br /&gt;
Disallow: /*diff=&lt;br /&gt;
Disallow: /*printable=&lt;br /&gt;
Disallow: /*redlink=&lt;br /&gt;
Disallow: /*mobileaction=&lt;br /&gt;
Disallow: /index.php?title=Special:&lt;br /&gt;
&lt;br /&gt;
User-agent: ChatGPT-User&lt;br /&gt;
Crawl-delay: 10&lt;br /&gt;
Disallow: /*action=&lt;br /&gt;
Disallow: /*oldid=&lt;br /&gt;
Disallow: /*curid=&lt;br /&gt;
Disallow: /*diff=&lt;br /&gt;
Disallow: /*printable=&lt;br /&gt;
Disallow: /*redlink=&lt;br /&gt;
Disallow: /*mobileaction=&lt;br /&gt;
Disallow: /index.php?title=Special:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
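To check whether the crawl delay is being respected, you can count GPTBot requests per minute in your access log.  This is just a sketch assuming a combined-format log at the usual Apache path; adjust the path to match your server:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Count GPTBot requests per minute (combined log format)&lt;br /&gt;
grep &amp;quot;GPTBot&amp;quot; /var/log/apache2/access.log | cut -d&amp;#039;[&amp;#039; -f2 | cut -d&amp;#039;:&amp;#039; -f1-3 | sort | uniq -c&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;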
===Step 2: Use Server-Side Rate Limiting===&lt;br /&gt;
You can also set rate limits using server-side tools like mod_qos (for Apache) or ngx_http_limit_req_module (for NGINX). These modules help manage how many requests are allowed per second per IP address.&lt;br /&gt;
====NGINX Configuration (if you are using NGINX):====&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
http {&lt;br /&gt;
    limit_req_zone $binary_remote_addr zone=bot_zone:10m rate=1r/s;&lt;br /&gt;
&lt;br /&gt;
    server {&lt;br /&gt;
        location / {&lt;br /&gt;
            limit_req zone=bot_zone burst=5 nodelay;&lt;br /&gt;
        }&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This limits each client IP address to 1 request per second, with a burst capacity of 5.  Note that as written it applies to &amp;#039;&amp;#039;all&amp;#039;&amp;#039; visitors, not just bots.&lt;br /&gt;
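If you would rather slow down only the AI crawlers and leave normal visitors untouched, one approach (a sketch using NGINX&amp;#039;s standard map module; the variable name $bot_limit_key is just an example) is to key the limit zone on the User-agent, because requests whose key is empty are not rate-limited:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
map $http_user_agent $bot_limit_key {&lt;br /&gt;
    default          &amp;quot;&amp;quot;;&lt;br /&gt;
    ~*GPTBot         $binary_remote_addr;&lt;br /&gt;
    ~*ChatGPT-User   $binary_remote_addr;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
limit_req_zone $bot_limit_key zone=bot_zone:10m rate=1r/s;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;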
&lt;br /&gt;
====Apache Configuration (if you are using Apache):====&lt;br /&gt;
You can use mod_qos to limit requests per second to a location (here the whole site, &amp;quot;/&amp;quot;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
QS_LocRequestPerSecLimit / 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This limits requests to that location to 1 per second.&lt;br /&gt;
&lt;br /&gt;
===Step 3: Use Fail2Ban for Rate-Limiting Bots (Advanced)===&lt;br /&gt;
If you are using Fail2Ban with iptables or firewalld, you can also set up a Fail2Ban jail to detect excessive bot traffic and temporarily ban the offending IP addresses:&lt;br /&gt;
&lt;br /&gt;
Create a custom jail for GPTBot in /etc/fail2ban/jail.local:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[gptbot]&lt;br /&gt;
enabled  = true&lt;br /&gt;
port     = http,https&lt;br /&gt;
filter   = gptbot&lt;br /&gt;
logpath  = /var/log/apache2/access.log&lt;br /&gt;
# or logpath = /var/log/nginx/access.log if you use NGINX&lt;br /&gt;
maxretry = 10&lt;br /&gt;
findtime = 60&lt;br /&gt;
bantime  = 600&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Create a filter in /etc/fail2ban/filter.d/gptbot.conf:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Definition]&lt;br /&gt;
failregex = ^&amp;lt;HOST&amp;gt; - - .*&amp;quot;GET .* HTTP.*&amp;quot; .*GPTBot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will ban IPs that send more than 10 requests in 60 seconds for 10 minutes.&lt;br /&gt;
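Before relying on the jail, you can check the filter against your real log with fail2ban-regex, then reload Fail2Ban and watch the jail (log path as above; adjust to suit your server):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Does the filter match actual GPTBot log lines?&lt;br /&gt;
fail2ban-regex /var/log/apache2/access.log /etc/fail2ban/filter.d/gptbot.conf&lt;br /&gt;
&lt;br /&gt;
# Activate the new jail and check its status&lt;br /&gt;
fail2ban-client reload&lt;br /&gt;
fail2ban-client status gptbot&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;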
&lt;br /&gt;
By combining robots.txt settings and server-side rate limiting, you can control bot activity and prevent server overload.&lt;br /&gt;
== Completely Block Chat GPT Bots from your server==&lt;br /&gt;
As an absolute last resort you can use the following iptables rules to block &amp;lt;b&amp;gt;all&amp;lt;/b&amp;gt; ChatGPT IP addresses from your server. The list is in CIDR format, and the rules can easily be modified for use with firewalld or other firewalls.  &amp;lt;i&amp;gt;IP list valid as of December 2025&amp;lt;/i&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/sbin/iptables -I INPUT -s 132.196.86.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 172.182.202.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 172.182.204.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 172.182.207.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 172.182.214.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 172.182.215.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 20.125.66.80/28 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 20.171.206.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 20.171.207.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 4.227.36.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 52.230.152.0/24 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.175.128/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.227.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.227.128/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.228.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.230.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.241.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.241.128/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.242.0/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.243.128/25 -j REJECT # Chat GPT&lt;br /&gt;
/sbin/iptables -I INPUT -s 74.7.244.0/25 -j REJECT # Chat GPT&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
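If you run firewalld instead of raw iptables, the same block can be expressed as a rich rule (sketched here for a single range; repeat for each CIDR in the list above):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Reject one ChatGPT range permanently, then apply the change&lt;br /&gt;
firewall-cmd --permanent --add-rich-rule=&amp;#039;rule family=&amp;quot;ipv4&amp;quot; source address=&amp;quot;20.171.206.0/24&amp;quot; reject&amp;#039;&lt;br /&gt;
firewall-cmd --reload&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;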
===How to check what bots see when they crawl your site [Linux]===&lt;br /&gt;
If you want to see what a specific bot or crawler sees when it spiders your website, use the command &amp;lt;pre&amp;gt; curl -A &amp;quot;BOTNAME&amp;quot; -I http://website_to_test&amp;lt;/pre&amp;gt;&lt;br /&gt;
See what Google sees when it spiders cookipedia.co.uk&lt;br /&gt;
&amp;lt;pre&amp;gt;curl -A &amp;quot;Googlebot&amp;quot; -I https://www.cookipedia.co.uk/&amp;lt;/pre&amp;gt;&lt;br /&gt;
See what a certain backlinks pest sees when it spiders cookipedia.co.uk&lt;br /&gt;
&amp;lt;pre&amp;gt;curl -A &amp;quot;SERankingBacklinksBot&amp;quot; -I https://www.cookipedia.co.uk/&amp;lt;/pre&amp;gt;&lt;br /&gt;
See what GPTBot sees when it spiders cookipedia.co.uk&lt;br /&gt;
&amp;lt;pre&amp;gt;curl -A &amp;quot;GPTBot&amp;quot; -I https://www.cookipedia.co.uk/&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{CategoryLine}}&lt;br /&gt;
[[Category:Tools]]&lt;br /&gt;
&amp;lt;!-- footer hashtags --&amp;gt;&amp;lt;code &amp;#039;hashtagrev:12032020&amp;#039;&amp;gt;#tools #chatGPT #Robotstxt #fail2ban #iptables #firewalld #serverload #apache #webcrawler #pest&amp;lt;/code&amp;gt;&amp;lt;!-- /footer_hashtags --&amp;gt;&lt;/div&gt;</summary>
		<author><name>Chef</name></author>
	</entry>
</feed>