Category:SpiderAssassin
From Guifi.net - English Wiki
SpiderAssassin is a project to create a server application that collects usage statistics in order to detect anomalies in how web pages are accessed or interacted with. Such anomalies reveal illegitimate use of the server, such as comment-spam spiders and spiders that probe for security holes. In the future it should be able to determine what is normal and what is not for a particular website, and to pass to a firewall whatever falls outside the normal range.
The database of the server would be shared with all the other servers.
The implementation consists of four parts:
- A core in Perl
- A front-end in PHP
- Use of fail2ban to write in iptables
- Use of .htaccess in the main path of each website
The licence is Affero GPL v.3.
The project is at an early stage of development. If you want to cooperate, send an email to al(at)blogmail.cc or to kenneth(at)gnun.net
Introduction
In general, intrusion detection systems (IDS) look for known patterns in URLs or Apache logs and deny the requests that match them (for example, URLs containing '=http://', 'SELECT ' or '../../', or disabled methods such as TRACE, TRACK, PUT, MKCOL ...).
Unfortunately, those systems require constant manual updating, because new patterns to search for appear every day.
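The signature matching described above can be sketched roughly as follows. This is only an illustration of the general technique, not the code of any particular IDS: the signature list reuses the examples from the text, and the names `URL_SIGNATURES`, `DISABLED_METHODS` and `is_denied` are made up for this sketch.

```python
import re

# Illustrative signatures taken from the examples in the text. A real IDS
# ships a much longer list that has to be updated by hand.
# Assumption: the URL has already been percent-decoded.
URL_SIGNATURES = [
    re.compile(r"=http://", re.IGNORECASE),  # remote file inclusion attempt
    re.compile(r"SELECT\s", re.IGNORECASE),  # SQL injection attempt
    re.compile(r"\.\./\.\./"),               # directory traversal attempt
]
DISABLED_METHODS = {"TRACE", "TRACK", "PUT", "MKCOL"}


def is_denied(method: str, url: str) -> bool:
    """Return True if the request matches a disabled method or a signature."""
    if method.upper() in DISABLED_METHODS:
        return True
    return any(signature.search(url) for signature in URL_SIGNATURES)
```

A spam POST to an ordinary forum URL matches none of these signatures, which is the gap the rest of this page is about.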
The IDS mentioned above can stop between 85% and 90% of attacks (according to their own figures), but they are powerless against spam, which consumes a significant share of a web server's resources and of its administrators' time. Even when the sites use a captcha, the continuous access to those URLs still causes load (CPU, RAM, database and disk) that is not reduced.
The problem is that IDS are not designed to take the time variable into account, so they miss patterns that only show up over time. Nonetheless, such patterns can be detected easily. That is the task of our project: to save those patterns in the database and to study them.
Patterns to be studied by SpiderAssassin
- All requests are POST: usual for spiders that send spam to forums and the like. This makes it possible to identify further patterns (the same user agent, the same number of bytes, etc.). Blocking it reduces the server's load considerably.
- Requests come from a particular IP address continuously for 24 hours. This is also easy to detect, and blocking it reduces the load significantly.
- The ratio of GET to POST requests is unusually round (for IPs with more than 100 requests, it is odd to see GET/POST ratios that are almost exactly round, such as 2:1, 1:1, 1:2 ...).
- Spiders are unlikely to download any page resources: neither CSS nor images, etc. This could easily be confused with regular search engines, although that problem is easily solved with a tiny database. Moreover, spiders do unusual things; for example, they claim to be Firefox or Explorer. It is easy to check that they are robots, because Firefox would not skip the images: even when they are in the cache, it would still send a revalidation request (HEAD or conditional GET).
- Regular browsers almost never send POST data over the HTTP/1.0 protocol.
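The patterns listed above could be checked per IP address along the following lines. This is a minimal sketch, not the project's Perl core: the `Request` record, the flag names, the list of static file extensions, and the `tolerance` and `max_ratio` thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical list of extensions that count as "page resources".
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")
EPOCH = datetime(1970, 1, 1)


@dataclass
class Request:
    timestamp: datetime  # naive UTC time of the request
    method: str          # "GET", "POST", ...
    path: str            # request path without the query string
    protocol: str        # "HTTP/1.0" or "HTTP/1.1"


def longest_hourly_run(requests):
    """Length of the longest run of consecutive clock hours with traffic."""
    hours = sorted({int((r.timestamp - EPOCH).total_seconds() // 3600)
                    for r in requests})
    best = run = 0
    previous = None
    for hour in hours:
        run = run + 1 if previous is not None and hour == previous + 1 else 1
        best = max(best, run)
        previous = hour
    return best


def has_round_ratio(gets, posts, tolerance=0.02, max_ratio=3):
    """True if GET:POST is close to 1:1, 2:1, 1:2, ... up to max_ratio."""
    if gets == 0 or posts == 0:
        return False
    ratio = max(gets, posts) / min(gets, posts)
    nearest = round(ratio)
    return nearest <= max_ratio and abs(ratio - nearest) <= tolerance


def suspicious_patterns(requests):
    """Return the set of pattern names matched by one IP's requests."""
    flags = set()
    if not requests:
        return flags
    gets = sum(r.method == "GET" for r in requests)
    posts = sum(r.method == "POST" for r in requests)
    if posts == len(requests):
        flags.add("only-post")
    if longest_hourly_run(requests) >= 24:
        flags.add("round-the-clock")
    if len(requests) > 100 and has_round_ratio(gets, posts):
        flags.add("round-get-post-ratio")
    if not any(r.path.lower().endswith(STATIC_EXTENSIONS) for r in requests):
        flags.add("no-static-files")
    if any(r.method == "POST" and r.protocol == "HTTP/1.0" for r in requests):
        flags.add("post-over-http-1.0")
    return flags
```

Each flag on its own is only a hint. In particular, "no-static-files" also matches legitimate search engines, so a real implementation would first consult the small whitelist database mentioned in the list.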
Work
Send logs to the database
To avoid repeating work that has already been done, the idea is to use fail2ban as the "backend" and to send the analyzed information about attackers to syslog. fail2ban then collects the data from syslog and manages it (whitelist of IPs, automatic blocking and unblocking after the configured ban time, etc.).
Progress on this part: there are scripts, run manually, that do all of this and work quite well. The Apache module mod_log_sql saves the logs in the database. It is available in Debian 5.
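As a rough illustration of the syslog hand-off, the sketch below writes one line per detected attacker using Python's standard `syslog` module (Unix only). The message layout "abuse from <ip> reason=<reason>", the `spiderassassin` ident and the function names are assumptions made for this example; they are not a format defined by the project or by fail2ban.

```python
import ipaddress
import syslog


def format_ban_line(ip: str, reason: str) -> str:
    """Build the log message that a fail2ban filter would match on."""
    # Reject anything that is not a valid IPv4/IPv6 address, so that
    # arbitrary text cannot be injected into the log line.
    ipaddress.ip_address(ip)  # raises ValueError on invalid input
    # Assumed layout for this sketch only.
    return f"abuse from {ip} reason={reason}"


def report_attacker(ip: str, reason: str) -> None:
    """Write one line for a detected attacker to the local syslog."""
    syslog.openlog(ident="spiderassassin", facility=syslog.LOG_AUTH)
    syslog.syslog(syslog.LOG_WARNING, format_ban_line(ip, reason))
    syslog.closelog()
```

On the fail2ban side, a filter would need a `failregex` built around the same layout, with fail2ban's `<HOST>` tag in the position of the IP address (for example `abuse from <HOST> reason=`).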
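Once the logs are in a database, per-IP statistics become simple queries. The sketch below uses an in-memory SQLite database as a stand-in for the real one so that it runs anywhere. The table name `access_log` and the columns shown are assumed to follow the usual mod_log_sql naming; verify them against the schema actually created on your server.

```python
import sqlite3

# Assumed subset of the mod_log_sql access table (names are not verified
# against any particular installation).
SCHEMA = """
CREATE TABLE access_log (
    remote_host      TEXT,
    request_method   TEXT,
    request_uri      TEXT,
    request_protocol TEXT,
    time_stamp       INTEGER
)
"""

# Made-up sample data using documentation IP ranges.
SAMPLE_ROWS = [
    ("198.51.100.9", "POST", "/forum/post.php", "HTTP/1.0", 1262304000),
    ("198.51.100.9", "POST", "/forum/post.php", "HTTP/1.0", 1262304900),
    ("198.51.100.9", "POST", "/forum/post.php", "HTTP/1.0", 1262305800),
    ("192.0.2.10", "GET", "/index.html", "HTTP/1.1", 1262304000),
    ("192.0.2.10", "GET", "/style.css", "HTTP/1.1", 1262304001),
    ("192.0.2.10", "POST", "/comment", "HTTP/1.1", 1262304300),
]


def demo_connection():
    """Create the stand-in database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
                     SAMPLE_ROWS)
    return conn


def method_counts(conn):
    """Return (remote_host, gets, posts) for every client, sorted by host."""
    query = """
        SELECT remote_host,
               SUM(request_method = 'GET')  AS gets,
               SUM(request_method = 'POST') AS posts
        FROM access_log
        GROUP BY remote_host
        ORDER BY remote_host
    """
    return conn.execute(query).fetchall()
```

In the sample data, the client that only sends POST requests stands out immediately in the result of `method_counts`.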
Read the database through HTTP
The PHP "console" would be a kind of "super web statistics system". It would show the protected sites either individually or cross-referenced, allow or deny an IP, know the installation path, etc. This is very handy for recognizing an unknown attack. With a few more hours of work, it could help to create the regular expressions needed to build a .htaccess file that protects a specific website.
In this way the system gives feedback to the admins, who can use the console to manage it. The system could work without a console, or at least without one on the same computer. If the system is made open enough, it could also be used to identify attack and abuse patterns in other protocols (FTP, IRC ...).
We are starting work from the Skeith mod_log_sql Analyzer.
Computer protection
There are two techniques:
- fail2ban: blocks IPs at the iptables level. This technique is used to block an attack that has already occurred.
- .htaccess in the main path of each website, with rules maintained automatically by our system, which protect the attacked URLs by means of regular expression patterns. Its advantage is that it stops future attackers while still allowing legitimate use, and it reduces the load.
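The automatically maintained .htaccess rules could be generated along these lines. This is a sketch under assumptions: it emits standard Apache mod_rewrite directives that answer 403 Forbidden when the query string matches one of the patterns, and the function name `build_htaccess` and the example patterns are hypothetical. The project may well generate different rules.

```python
def build_htaccess(patterns):
    """Render mod_rewrite rules that forbid matching query strings."""
    for pattern in patterns:
        # A RewriteCond pattern is a single whitespace-delimited argument,
        # and a leading =, <, >, ! or - has a special meaning there, so
        # this sketch simply refuses such patterns.
        if not pattern or any(ch.isspace() for ch in pattern):
            raise ValueError(f"unsupported pattern: {pattern!r}")
        if pattern[0] in "=<>!-":
            raise ValueError(f"unsupported pattern: {pattern!r}")

    lines = ["# Generated block; do not edit by hand", "RewriteEngine On"]
    for index, pattern in enumerate(patterns):
        # Every condition except the last is chained with OR.
        flags = "[NC]" if index == len(patterns) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{QUERY_STRING}} {pattern} {flags}")
    if patterns:
        # "-" leaves the URL unchanged; [F] returns 403 Forbidden.
        lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines) + "\n"
```

For example, `build_htaccess([r"\.\./\.\./", "union.+select"])` produces two `RewriteCond` lines joined with OR, followed by a single `RewriteRule` that denies the request.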