SEMrush keeps taking down my site

My site is being absolutely bombarded by page requests originating from semrush.com. It’s a bot that is indexing and downloading links to my entire site, which would be fine for a small website, but I have content going back to 1996, which is rather a lot, and the bot is hitting the server with so many requests that Apache2 and/or MySQL Server 5.7 is crashing.

It’s utterly frustrating, and they have taken the site down 3 times in the past couple of hours. Their bot should be rate-limited so that it doesn’t cause issues like this! The site runs on an AWS T3 micro instance and can cope with 100+ simultaneous visitors, which should be more than enough, but unfortunately it’s not capable of dealing with the SEMrush bot.

In the coming hours I will be working on ways to permanently blacklist their services, and I suggest that other website owners do the same so that the same misfortune does not befall their site too.
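One possible first step, sketched here rather than taken from what I eventually deployed, is to ask the crawler to stay away via robots.txt. The helper name block_bot_in_robots and the robots.txt path in the usage line are assumptions for illustration; SemrushBot is the user-agent token the crawler identifies itself with.

```shell
# block_bot_in_robots FILE BOT
# Append a "disallow everything" rule for BOT to the robots.txt at FILE.
# (Hypothetical helper name; FILE is wherever your site's robots.txt lives.)
block_bot_in_robots() {
  file=$1
  bot=$2
  # Skip the append if a rule for this user agent is already present,
  # so running the helper twice does not duplicate the entry.
  if [ -f "$file" ] && grep -qi "^User-agent: *$bot\$" "$file"; then
    return 0
  fi
  printf '\nUser-agent: %s\nDisallow: /\n' "$bot" >> "$file"
}
```

For example, block_bot_in_robots /var/www/html/robots.txt SemrushBot, assuming that is the document root. Bear in mind that robots.txt is only a request: a crawler that ignores it would have to be blocked at the web server or firewall instead.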

Terminal – calculate number of lines of code in a directory

We had an interesting question: can we calculate how many lines of code we have written for an entire project? It turns out this isn’t the easiest thing to calculate for a web project, but we gave it a go. This is the best we have come up with so far. It counts the lines in every PHP, CSS, JS, HTML and HTM file under the current directory.

find . -type f \( -name '*.php' -o -name '*.css' -o -name '*.js' -o -name '*.html' -o -name '*.htm' \) -print0 | xargs -0 cat | wc -l

The answer for our particular project was 1500784 lines of code!

If you only want to count PHP pages, it’s rather easier:

find . -type f -name '*.php' -print0 | xargs -0 cat | wc -l
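If a breakdown per file type is more useful than one grand total, the same idea can be wrapped in a small function. This is only a sketch: count_loc is a made-up name, and it uses find's -exec ... + form instead of xargs so that an extension with no matching files simply reports 0.

```shell
# count_loc DIR EXT...
# Print one "extension<TAB>line count" row for each extension under DIR.
# (count_loc is a hypothetical helper, not a standard tool.)
count_loc() {
  dir=$1
  shift
  for ext in "$@"; do
    # Concatenate every matching file into one stream so wc prints a single number.
    n=$(find "$dir" -type f -name "*.$ext" -exec cat {} + | wc -l)
    # Some wc implementations pad the number with spaces; $((n)) normalises it.
    printf '%s\t%d\n' "$ext" "$((n))"
  done
}
```

Usage would be something like count_loc . php css js html htm. Note that wc -l counts newline characters, so a file whose last line has no trailing newline is counted one line short, and any third-party libraries stored under the directory are counted too.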