The issue

Last week, I experienced my first LLM crawler attack on my Redmine forge.

When I checked the most recent Nginx logs, I noticed that many lines looked like this:

47.79.5.126 - - [13/May/2025:00:08:07 +0000] "GET /projects/esmeralda/repository/master/revisions/446c37308fd875c5c932852b62740fe49f2dbba1/entry/app/views/articles/show.haml HTTP/1.1" 200 4757 "https://redmine.jpages.dev/projects/esmeralda/repository/master/revisions/446c37308fd875c5c932852b62740fe49f2dbba1/entry/app/views/articles/show.haml" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43" 4.76 0.875

For these requests, the User-Agent was "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43" and the IP addresses began with 47.79 or 47.82. Each page was requested many times, often for old Mercurial revisions of my projects (which is really stupid, as these pages are not supposed to change). At this point, it was clear that these requests came from a crawler written by someone who simply does not care about basic web politeness rules.

The solution

I could easily isolate the suspicious IP addresses with the following command:

$ grep -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43" /var/log/nginx/redmine-ssl-proxy-access \
    | awk '{print $1}' | sort -u > bot_ips

This command gave me a list of 646 addresses, which I added to the text file /etc/nginx/blocked_ips.conf.
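The article does not say how the addresses were written into that file. Since Nginx can only include valid directives, and its access module answers denied clients with HTTP 403, one plausible way is to turn each address into a deny rule. The following is a minimal sketch under that assumption; the helper name make_deny_rules is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): turn a list of
# IP addresses, one per line, into nginx "deny" directives.
make_deny_rules() {
    # Drop blank lines, then wrap every remaining line as "deny <address>;".
    sed -e '/^[[:space:]]*$/d' -e 's/^\(.*\)$/deny \1;/' "$1"
}

# Example usage (paths taken from the article, run as root):
#   make_deny_rules bot_ips > /etc/nginx/blocked_ips.conf
```

Each line of the generated file would then look like "deny 47.79.5.126;".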

Finally, I modified the Nginx configuration file (/etc/nginx/nginx.conf), adding the line include /etc/nginx/blocked_ips.conf; to every server block.

The crawler still tried to flood my server with requests, but it only got HTTP 403 errors.
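One way to check this from the access log is to count responses per status code for the offending User-Agent. This is only a sketch: the helper name status_counts_for_ua is hypothetical, and it assumes the log layout of the sample line above, where the request, the referer and the User-Agent are the first, second and third double-quoted fields.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): count responses
# per HTTP status code for a single User-Agent string.
status_counts_for_ua() {
    ua=$1
    logfile=$2
    # Splitting on double quotes: $2 is the request, $3 holds " status bytes ",
    # $4 is the referer and $6 is the User-Agent.
    awk -F'"' -v ua="$ua" '
        $6 == ua {
            split($3, parts, " ")   # parts[1] is the status code
            count[parts[1]]++
        }
        END { for (status in count) print status, count[status] }
    ' "$logfile" | sort
}

# Example usage (log path taken from the article):
#   status_counts_for_ua "<the User-Agent string above>" /var/log/nginx/redmine-ssl-proxy-access
```

If the block works as intended, the 403 count for that User-Agent should grow after the reload while the 200 count stays flat.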