How To Block Bots By User-agent
Why you should block some crawling bots
The crawlers of well-known search engines usually put little load on a site and do not affect its speed. Most other crawling bots, however, are not helpful and can harm site performance.
Bots such as DotBot or Semrush are examples. We have seen these bots send so many requests to a site that the effect resembled a small DDoS attack. The site and the server were heavily overloaded, and the site became inaccessible to other visitors.
We strongly recommend blocking overly active bots if your site has more than 100 pages, especially if your account has already exceeded the provided load limits.
Two ways to block harmful bots
1. Using the CleanTalk Anti-Spam plugin with the Anti-Flood and Anti-Crawler options enabled.
This method is preferred because the plugin detects bots by their behavior. Any bot with high activity will automatically receive a 403 response for some time, regardless of its user-agent or other signs. Search engine crawlers such as Google, Bing, MSN, and Yandex are excluded and will not be blocked.
More information about the options: https://cleantalk.org/help/anti-flood-and-anti-crawler
Installation guide: https://cleantalk.org/help/install-wordpress
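To illustrate what behavior-based detection means, here is a minimal Python sketch of a sliding-window request counter. It is not CleanTalk's actual algorithm: the class name `FloodGuard`, its parameters, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class FloodGuard:
    """Hypothetical sliding-window limiter; NOT CleanTalk's real implementation."""

    def __init__(self, max_requests=20, window=60.0, block_for=30.0):
        self.max_requests = max_requests  # requests allowed per window (assumed value)
        self.window = window              # window length in seconds (assumed value)
        self.block_for = block_for        # block duration in seconds (assumed value)
        self.hits = defaultdict(deque)    # ip -> timestamps of recent requests
        self.blocked_until = {}           # ip -> time when the block expires

    def status(self, ip, now):
        """Return the HTTP status a request from `ip` at time `now` would get."""
        if self.blocked_until.get(ip, 0.0) > now:
            return 403
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) > self.max_requests:
            # Too many requests: block the client for a while, whatever its user-agent.
            self.blocked_until[ip] = now + self.block_for
            recent.clear()
            return 403
        return 200


guard = FloodGuard(max_requests=3, window=10, block_for=30)
print([guard.status("203.0.113.5", t) for t in (0, 1, 2, 3, 20, 34)])
```

The decision depends only on the request rate per client, which is why this kind of check also catches bots that disguise their user-agent.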
2. Using .htaccess for Apache servers or the nginx.conf file for Nginx.
We do not recommend these methods. Note that an overly long list of rules in .htaccess will slow down the web server!
How to block popular crawling bots using the .htaccess file for Apache and nginx.conf for Nginx
1. How to block Baidu bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block baidu bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} baidu [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf (the if directive is allowed only in the server and location contexts):
#block baidu bot nginx
if ($http_user_agent ~* (baidu|baidubot) ) {
return 403;
}
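Both rules above perform a case-insensitive, unanchored match against the User-Agent header. The following minimal Python sketch emulates that matching, assuming Python's `re` module behaves like the servers' regex engines for such simple patterns. The helper name `is_blocked` is hypothetical and the User-Agent strings are sample values.

```python
import re


def is_blocked(user_agent, pattern="baidu"):
    # Emulates [NC] in Apache and ~* in nginx: the pattern may appear
    # anywhere in the header and letter case is ignored.
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


baidu_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"

print(is_blocked(baidu_ua))    # True
print(is_blocked(browser_ua))  # False
```

Because the match is a substring match, the pattern `baidu` alone already covers `Baiduspider` and `baidubot`.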
2. How to block AhrefsBot
Using .htaccess:
Add this code to the end of .htaccess file:
# block AhrefsBot bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block AhrefsBot bot nginx
if ($http_user_agent ~* (AhrefsBot) ) {
return 403;
}
3. How to block MJ12bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block MJ12bot bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block MJ12bot bot nginx
if ($http_user_agent ~* (MJ12bot) ) {
return 403;
}
4. How to block Detectify bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block detectify bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} Detectify [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block detectify bot nginx
if ($http_user_agent ~* (Detectify) ) {
return 403;
}
5. How to block DuckDuckGo bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block DuckDuckGo bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} DuckDuckGo [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block DuckDuckGo bot nginx
if ($http_user_agent ~* (DuckDuckGo) ) {
return 403;
}
6. How to block Semrush bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Semrush bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} semrush [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Semrush bot nginx
if ($http_user_agent ~* (semrush) ) {
return 403;
}
7. How to block Seznam bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Seznam bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} seznam [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Seznam bot nginx
if ($http_user_agent ~* (seznam) ) {
return 403;
}
8. How to block Zgrab bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Zgrab bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} zgrab [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Zgrab bot nginx
if ($http_user_agent ~* (zgrab) ) {
return 403;
}
9. How to block Petalbot bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Petalbot bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} petalbot [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Petalbot bot nginx
if ($http_user_agent ~* (petalbot) ) {
return 403;
}
10. How to block Jorgee bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Jorgee bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} jorgee [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Jorgee bot nginx
if ($http_user_agent ~* (Jorgee) ) {
return 403;
}
11. How to block Yandex bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Yandex bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} yandex [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Yandex bot nginx
if ($http_user_agent ~* (yandex) ) {
return 403;
}
12. How to block Dotbot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Dotbot bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} dotbot [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Dotbot bot nginx
if ($http_user_agent ~* (dotbot) ) {
return 403;
}
13. How to block Sogou bot
Using .htaccess:
Add this code to the end of .htaccess file:
# block Sogou bot htaccess
<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} sogou [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block Sogou bot nginx
if ($http_user_agent ~* (sogou) ) {
return 403;
}
14. How to block multiple bots at the same time
Using .htaccess:
Add this code to the end of .htaccess file:
# block bot htaccess
<IfModule mod_rewrite.c>
# consecutive RewriteCond lines are ANDed by default, so the OR flag is required
RewriteCond %{HTTP_USER_AGENT} baidu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F,L]
</IfModule>
Using nginx.conf:
Add this code to the server {} section of nginx.conf:
#block bot nginx
if ($http_user_agent ~* (baidu|baidubot|AhrefsBot|MJ12bot) ) {
return 403;
}
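When the list of bots grows, writing the combined rules by hand becomes error-prone. The following minimal Python sketch generates both snippets from a list of names. The function name `build_rules` is hypothetical. The sketch puts all names into a single quoted alternation (one RewriteCond for Apache, one if for nginx), so that names containing spaces or punctuation stay inside one argument. Review the generated text before adding it to a live configuration.

```python
import re


def build_rules(names):
    """Return (apache_snippet, nginx_snippet) blocking the given user-agent names."""
    # Escape each name so characters such as "." are matched literally.
    pattern = "|".join(re.escape(name) for name in names)
    apache = "\n".join([
        "<IfModule mod_rewrite.c>",
        f'RewriteCond %{{HTTP_USER_AGENT}} "({pattern})" [NC]',
        "RewriteRule .* - [F,L]",
        "</IfModule>",
    ])
    nginx = "\n".join([
        f'if ($http_user_agent ~* "({pattern})") {{',
        "    return 403;",
        "}",
    ])
    return apache, nginx


apache_rules, nginx_rules = build_rules(["baidu", "AhrefsBot", "MJ12bot"])
print(apache_rules)
print(nginx_rules)
```

A single alternation keeps the rule set short, which matters because long rule lists in .htaccess slow the server down.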
You can block any user-agent you need. The known crawlers are listed below.
yandex
baidu
petalbot
semrush
Cliqzbot
SurdotlyBot
zgrab
Jorgee
dotbot
seznam
duckduckgo
sogou
exabot
AhrefsBot
InterfaxScanBot
SputnikBot
SolomonoBot
MJ12bot
Detectify
Riddler
omgili
socialmediascanner
Jooblebot
SeznamBot
Scrapy
CCBot
linkfluence
veoozbot
Leikibot
Seopult
Faraday
hybrid
Go-http-client
SMUrlExpander
SNAPSHOT
getintent
ltx71
Nuzzel
SMTBot
Laserlikebot
facebookexternalhit
mfibot
OptimizationCrawler
crazy
Dispatch
ubermetrics
HTMLParser
musobot
petalbot
filterdb
InfoSeek
omgilibot
DomainSigma
SafeSearch
CommentReader
meanpathbot
statdom
proximic
spredbot
StatOnlineRuBot
openstat
DeuSu
semantic
postano
masscan
Embedly
NewShareCounts
linkdexbot
GrapeshotCrawler
Digincore
NetSeer
help.jp
PaperLiBot
getprismatic
360Spider
Ahrefs
ApacheBench
Aport
Applebot
archive
BaiduBot
Baiduspider
Birubot
BLEXBot
bsalsa
Butterfly
Buzzbot
BuzzSumo
CamontSpider
curl
dataminr
discobot
DomainTools
DotBot
Exabot
Ezooms
FairShare
FeedFetcher
FlaxCrawler
FlightDeckReportsBot
FlipboardProxy
FyberSpider
Gigabot
gold crawler
HTTrack
ia_archiver
InternetSeer
Jakarta
Java
JS-Kit
km.ru
kmSearchBot
Kraken
larbin
libwww
Lightspeedsystems
Linguee
LinkBot
LinkExchanger
LinkpadBot
LivelapBot
LoadImpactPageAnalyzer
lwp-trivial
majestic
Mediatoolkitbot
MegaIndex
MetaURI
MJ12bot
MLBot
NerdByNature
NING
NjuiceBot
Nutch
OpenHoseBot
Panopta
pflab
PHP/
pirst
PostRank
ptd-crawler
Purebot
PycURL
Python
QuerySeekerSpider
rogerbot
Ruby
SearchBot
SemrushBot
SISTRIX
SiteBot
Slurp
Sogou
solomono
Soup
spbot
suggybot
Superfeedr
SurveyBot
SWeb
trendictionbot
TSearcher
ttCrawler
TurnitinBot
TweetmemeBot
UnwindFetchor
urllib
uTorrent
Voyager
WBSearchBot
Wget
WordPress
woriobot
Yeti
YottosBot
Zeus
zitebot
ZmEu
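The list above contains repeated and overlapping entries. Because the rules match case-insensitively anywhere in the User-Agent header, `dotbot` and `DotBot` are equivalent, and `Ahrefs` already covers `AhrefsBot`. The following minimal Python sketch shortens a list before it is turned into rules. The function name `minimize_names` is hypothetical, and the result is lowercased, which is harmless for case-insensitive matching.

```python
def minimize_names(names):
    """Drop duplicates and names already covered by a shorter name."""
    # Lowercase and deduplicate, then visit shorter names first so that
    # a longer name can be checked against the names already kept.
    unique = sorted({name.lower() for name in names}, key=lambda s: (len(s), s))
    kept = []
    for name in unique:
        if not any(short in name for short in kept):
            kept.append(name)
    return kept


sample = ["baidu", "BaiduBot", "Baiduspider", "Ahrefs", "AhrefsBot", "dotbot", "DotBot"]
print(minimize_names(sample))  # ['baidu', 'ahrefs', 'dotbot']
```

Very short or generic entries such as `Java` or `hybrid` match many user-agents, so it is worth checking what a shortened list would block before using it.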