SEMrush keeps taking down my site

My site is being absolutely bombarded by page requests originating from semrush.com. Its a bot that is indexing and downloading links to my entire site, which would be fine if it were a small website, except that I have content going back to 1996, which is rather a lot, and they are hitting the server with so many requests that Apache2 and/or MySQL Server 5.7 is crashing.

It’s utterly frustrating, and they have taken the site down 3 times in the past couple of hours. Their bot should be rate-limited so that it doesn’t cause issues like this! The site can cope with 100+ simultaneous visitors as it is running on an AWS T3 micro instance, which should be more than capable, but unfortunately, its not capable of dealing with the SEMrush bot.

In the coming hours I will be working on ways to permanently blacklist their services, and I suggest that other website owners do the same so that the same misfortune does not befall their site too.

MySQL Collation Error

Occasionally I come across databases written by other developers who have a complete mix of character sets, and where the database character set does not match the character set of fields within a table. This causes all sorts of nasty errors when trying to use CONCAT or CONCAT_WS. Errors like:

Illegal mix of collations for operation ‘concat_ws’)

The most common cause is that the database type does not match the field or table types.

1. Identify the character set that the fields within the table use with the following sql script:

SELECT
COLUMN_NAME,
CHARACTER_SET_NAME,
COLLATION_NAME
FROM
information_schema.`COLUMNS`
WHERE
table_name = “yourtablenamehere”;

 

2. Identify the character set that the database uses:

SELECT
default_character_set_name
FROM
information_schema.SCHEMATA
WHERE
schema_name = “solution9ssu”;

 

These should be the same, however a very common difference is that some developers create the database using utf-8 and the fields imported are in latin1_swedish_ci. This causes the collation error.

To solve: alter the character encoding of each table. NB. This can cause all manner of issues, so take a full backup first!

ALTER TABLE yourtablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

MySQL ibtmp1 file grows exponentially

Recently we have had several servers run out of hard drive space causing inaccessibility and crashing. On further investigation the hard drive was full so many operations were complaining. The hard drive was being consumed by a very large file within the /var/lib/mysql directory called ibtmp1.

ibtmp1 is a temporary working file for MySQL. When you turn the MySQL server off and on again the file is deleted and the server runs again without issue until the file becomes too large again.

After some quick research we found that adding or editing a few settings in the MySQL configuration allows you to limit the size of this file and prevent this issue from reoccurring.

Open the mysqld.cnf file. The location of our mysqld.cnf was in /etc/mysql/mysql.conf.d/mysqld.conf. According to the file you can also copy this configuration to /etc/mysql/my.cnf. Our my.cnf just had an include to the mysqld.cnf file.

The file should have some sections like:

[mysqld_safe]
[mysqld]

There should be a section for InnoDB settings. If not, it is fine to add these configuration settings at the bottom of the file.

Here is the configuration variable:

innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:1G

This value can be tweaked based on your server setup and needs but these are the settings we used for our server to set the file to start at 12 MB and limit it to 1 GB.

While setting this we also set a few other variables for optimisation:

innodb_buffer_pool_size=2048M
innodb_log_file_size=25M
innodb_log_buffer_size=80M