

## Detecting and optimising your site for bot traffic

Bots come in all shapes and sizes. Many are legitimate, such as search engine bots (or spiders), which crawl your website so that its pages can be included on search engines like Google. There are plenty of illegitimate bots too – one common issue is referral spam, which advertises other sites by making them show up in your web analytics while consuming your resources and usage limits.

Legitimate bots send a User-Agent header which indicates that they're a bot. Most of these include "bot" in the name, such as Googlebot, but there's a long list of rules and exceptions. One frustrating example is that Cubot is often detected as a bot, but it's not a bot at all – it's a mobile phone manufacturer. For those reasons, it's best to use a library to detect whether a request comes from a bot. Most major programming languages have a library available – for Node.js there's isbot, and PHP has an equivalent. Usage of isbot is simple:

```js
const isbot = require('isbot')
isbot('Googlebot/2.1 (+http://www.google.com/bot.html)') // true
```
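Since the examples later in this article use Express, here is a small sketch (an assumption on my part, not code from the original index.js) showing how isbot could be applied to incoming requests as middleware; the `req.isBot` flag and the route are illustrative.

```js
const express = require('express')
const isbot = require('isbot')

const app = express()

// Illustrative middleware: flag requests whose User-Agent matches a known bot
app.use((req, res, next) => {
  req.isBot = isbot(req.headers['user-agent'] || '')
  next()
})

app.get('/', (req, res) => {
  // Later sections decide what to actually do with bot traffic
  res.send(req.isBot ? 'Hello, bot' : 'Hello, human')
})

app.listen(3000)
```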

Referral spam bots, and other illegitimate bots, will try to disguise themselves as normal visitors to your website. For that reason, they won't set a recognisable User-Agent, and it's impossible to reliably detect all of them. Luckily, there's an open-source list of known referral spammers. We can compare the Referer header against this list and decide how to handle the request…

```js
const fs = require('fs')

// referral-spammers.txt downloaded from the open-source referral spammer list
const spammerList = fs.readFileSync('./referral-spammers.txt')
  .toString()
  .split('\n')     // each spammer is on a new line
  .filter(Boolean) // filter out empty lines

function isSpammer (referer) {
  return referer && spammerList.some(spammer => referer.includes(spammer))
}
```
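The index.js example later in the article imports this check from its own file, so here is one way the helper might be packaged as a module; this is a sketch under that assumption, and the filename and path handling are mine rather than the original's.

```js
// is-spammer.js – minimal standalone module for the referral spammer check
const fs = require('fs')
const path = require('path')

// Load the list once at startup; each line is one known referral spammer domain
const spammerList = fs.readFileSync(path.join(__dirname, 'referral-spammers.txt'))
  .toString()
  .split('\n')
  .filter(Boolean)

module.exports = function isSpammer (referer) {
  // Referral spam: the Referer header contains a domain from the list
  return Boolean(referer) && spammerList.some(spammer => referer.includes(spammer))
}
```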

Bots which aren't referral spammers will be very difficult to detect. It's best to rely on threat data to ensure you're able to block requests from IPs which have been flagged as malicious. ipdata's threat API can help to do exactly this. Additionally, ipdata's ASN API can detect IP addresses which are associated with hosting providers, such as AWS. Hosting IPs are often used for bots and hacking attempts, but they could also be legitimate proxies.

Here's a small script, which we'll use later, to detect hosting providers and threats:

```js
const axios = require('axios')

// getIpData fetches ASN and threat data for an IP address from ipdata
// (a sketch of one possible implementation follows below)

const microsoftIpData = await getIpData('13.107.6.152')
microsoftIpData.asn.type         // "hosting" - this IP is a Microsoft hosting IP
microsoftIpData.threat.is_threat // false - this IP is not associated with threats
```
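The body of getIpData isn't shown in the original excerpt, so the following is a minimal sketch of how it might call ipdata's API with axios; the environment variable for the API key and the lack of caching or error handling are assumptions.

```js
// get-ip-data.js – hypothetical implementation of the helper used above
const axios = require('axios')

// Assumption: the ipdata API key is supplied via an environment variable
const IPDATA_API_KEY = process.env.IPDATA_API_KEY

async function getIpData (ip) {
  // ipdata returns ASN and threat information alongside geolocation data
  const response = await axios.get(`https://api.ipdata.co/${ip}`, {
    params: { 'api-key': IPDATA_API_KEY }
  })
  return response.data
}

module.exports = getIpData
```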

## Excluding bots from analytics

Now we've detected a majority of our bot traffic, we need to decide what to do with it!

Most people want to exclude all bot traffic from their web analytics tools. Many tools have built-in methods for excluding bots, but if you wish to control this yourself, you can exclude the tracking code from pages served to bots. This can help prevent bots from consuming your usage limits, and it ensures consistency across different tools.

In Node.js with Express and EJS, we can do this with just a few lines of code.

**index.js** (abridged)

```js
const isbot = require('isbot')
const isSpammer = require('./is-spammer')   // isSpammer function defined above
const getIpData = require('./get-ip-data')  // getIpData function defined above

// ...

const ipdata = await getIpData()

// ...

res.status(403).send('You are not allowed to access this site')
```
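The full index.js isn't reproduced above, so here is a hedged sketch of how the pieces might fit together in an Express app rendering an EJS template; the template name, the `showAnalytics` flag, and the way the visitor IP is read are all assumptions rather than the article's original code.

```js
// index.js – a minimal sketch, not the original implementation
const express = require('express')
const isbot = require('isbot')
const isSpammer = require('./is-spammer')   // isSpammer helper from above
const getIpData = require('./get-ip-data')  // getIpData helper from above

const app = express()
app.set('view engine', 'ejs')

app.get('/', async (req, res) => {
  const userAgent = req.headers['user-agent'] || ''
  const referer = req.headers.referer

  // Look up ASN and threat data for the visitor's IP (assumes req.ip is usable here)
  const ipdata = await getIpData(req.ip)

  // Block IPs that ipdata has flagged as threats
  if (ipdata.threat.is_threat) {
    return res.status(403).send('You are not allowed to access this site')
  }

  // Exclude analytics for bots, referral spammers, and hosting IPs
  const isBotTraffic = isbot(userAgent) || isSpammer(referer) || ipdata.asn.type === 'hosting'

  // The template only includes the analytics snippet when showAnalytics is true
  res.render('index', { showAnalytics: !isBotTraffic })
})

app.listen(3000)
```

In the matching index.ejs template, the analytics snippet would then sit inside an `<% if (showAnalytics) { %> … <% } %>` block, so bots and spammers simply never receive it.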
Now, we can test our implementation using curl.

Browser with no referrer:

```
curl -H "user-agent: A-Browser" <your site URL>
```

Bot user-agent:

```
curl -H "user-agent: bot" <your site URL>
```

Referral spammer:

```
curl -H "user-agent: A-Browser" -H "referer: <a domain from referral-spammers.txt>" <your site URL>
```

As you can see, when we request the website from a normal browser, the analytics scripts are served, but bots and referral spammers are excluded.

This can easily be adapted to block bots or referral spammers from accessing your website entirely, by treating them like threats and returning a 403 as index.js does for flagged IPs, but this should usually be unnecessary for most sites.

## Serving a Search Engine Optimised version of your site

Serving a different version of your website to crawlers can bring a major improvement in SEO. Modern websites, especially single-page applications, can be hard for crawlers to read and index, so we can serve them a simplified version. The approach is very similar to excluding bots from analytics.

Again, we can test our implementation with curl.

Browser:

```
curl -H "user-agent: A-Browser" <your site URL>
```

To ensure high performance, which can also improve your search rankings, you may want to use an Edge Computing solution such as Cloudflare Workers.
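As a rough illustration of that edge approach, here is a sketch of a Cloudflare Worker that routes crawler traffic to a simplified copy of the site; the /simple path, the User-Agent regex (a stand-in for a maintained list like isbot's), and the origin setup are all assumptions rather than anything from the original article.

```js
// Hypothetical Cloudflare Worker: serve crawlers a simplified, pre-rendered version
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request))
})

// Deliberately small check; a fuller solution would reuse a maintained bot list
const BOT_PATTERN = /bot|crawler|spider|crawling/i

async function handleRequest (request) {
  const userAgent = request.headers.get('user-agent') || ''

  if (BOT_PATTERN.test(userAgent)) {
    // Assumption: a simplified copy of every page exists under /simple on the origin
    const url = new URL(request.url)
    url.pathname = '/simple' + url.pathname
    return fetch(new Request(url.toString(), request))
  }

  // Normal visitors get the regular single-page application
  return fetch(request)
}
```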
