Overview
Recently, I developed a WordPress plugin called WP Shieldon. The plugin is built upon Shieldon, which is a PHP library that I created. If you are using WordPress for your website, you're in luck! You can download and start using WP Shieldon immediately through the official WordPress plugin repository.
Requirements
- PHP version > 7.1.0
- WordPress version > 4.0
Download
Source | Download |
---|---|
WordPress | https://wordpress.org/plugins/wp-shieldon |
GitHub | https://github.com/terrylinooo/wp-shieldon/releases |
PHP Composer | composer create-project terrylinooo/wp-shieldon wp-shieldon |
Introduction
Similar solutions
The primary purpose of WP Shieldon is anti-scraping. Similar server-side solutions include the mod_evasive module for Apache and the ngx_http_limit_req module for Nginx.
Both of these modules are used to limit the number of requests coming from a single IP address. However, many visitors may share the same IP address if they are on the same Local Area Network (LAN), as is often the case in environments like schools or coffee shops. Consequently, these users may be blocked once the limit is reached.
The most critical issue here is not necessarily blocking genuine users, but rather blocking search engines like Google, Bing, and Yahoo. Blocking search engines adversely affects your website's position in the Search Engine Results Pages (SERP), and the trade-off is the loss of traffic that search engines would typically direct to your site. This is why I prefer not to use these server-side modules.
Shieldon protection
Shieldon analyzes a user's behavior at the application layer, employing multiple filtering methods to minimize the chances of mistakenly blocking a user. In the event that a user is temporarily blocked, they can regain access by solving a CAPTCHA.
Benefits of using WP Shieldon
WP Shieldon loads before all other plugins. When it blocks malicious access, it displays a CAPTCHA page and stops the loading of subsequent plugins and themes. This saves the bandwidth and CPU resources that would otherwise go to serving malicious traffic, running other plugins' functions, and performing WordPress actions such as loading posts, lists, or searches.
For cloud services that charge by bandwidth usage, such as Google Compute Engine or Amazon EC2, traffic is a direct cost. In addition, the CPU won't spike when someone maliciously refreshes pages that are heavy on SQL queries.
How It Works
You can find settings for the following verification methods in the WP Shieldon configuration section, allowing you to freely choose which verification methods you want to enable.
Filters
You must enable each filter individually for its check to take place. Each filter can be configured with a tolerance threshold. The default value is 5, which is usually sufficient to rule out most false positives.
Cookie
WP Shieldon generates and inserts a set of JavaScript code into the front-end pages, which creates a cookie. This method filters out crawlers that are unable to execute JavaScript. However, it does not filter out browser-based crawlers, such as those using Selenium.
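As a rough illustration of how a cookie filter and its tolerance threshold could fit together, here is a minimal Python sketch. It is not the plugin's code: the cookie name `js_check`, the class name, and the choice to reset the counter once the cookie appears are all assumptions made for the example.

```python
from collections import defaultdict

JS_COOKIE_NAME = "js_check"  # hypothetical cookie name, not taken from WP Shieldon
TOLERANCE = 5                # the default threshold mentioned in the Filters section


class CookieFilter:
    """Counts requests that arrive without the JavaScript-created cookie."""

    def __init__(self, tolerance: int = TOLERANCE) -> None:
        self.tolerance = tolerance
        self.misses = defaultdict(int)  # ip -> consecutive requests without the cookie

    def is_suspicious(self, ip: str, cookies: dict) -> bool:
        """Return True once an IP has missed the cookie more than `tolerance` times."""
        if JS_COOKIE_NAME in cookies:
            # Assumption: a client that ran the script gets its counter cleared.
            self.misses[ip] = 0
            return False
        self.misses[ip] += 1
        return self.misses[ip] > self.tolerance
```

A client that never executes JavaScript keeps missing the cookie and eventually crosses the threshold, while a normal browser sets the cookie on its first page view and is never counted again.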
You don't need to worry much about legitimate users who have disabled cookies. Without cookies they cannot log in to your site to shop, and the ads on your site cannot track their preferences, so they generate little revenue anyway.
Referrer
This method involves checking whether the header information displays a record of the previous page's URL. Generally, only the first-time visits where the URL is directly entered into the browser's address bar will have a null value, while the rest will have records. This method helps in filtering out some crawlers.
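The referrer check can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is made up, header names are assumed to be already normalized to the `Referer` spelling, and the caller is assumed to know whether this is the visitor's first request.

```python
def missing_referrer(headers: dict, is_first_visit: bool) -> bool:
    """Return True when a request should count against the referrer filter."""
    referer = headers.get("Referer", "").strip()
    if referer:
        return False  # the previous page's URL is present
    # An empty Referer is normal only on the first, directly typed visit.
    return not is_first_visit
```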
Components
Trusted Bots
This component allows access for trusted bots, such as the crawlers of popular search engines like Google, Bing, and Yahoo. It is a must-enable component; otherwise, the crawler bots of these search engines will undergo browsing frequency checks, and they will be blocked if their page retrieval frequency reaches the limit.
If you enable strict mode, the component additionally cross-checks the RDNS hostname against the IP for consistency. Inconsistencies are common; for example, the IP of Baidu's bot has an RDNS record, but looking up that hostname returns no IP record. This is only an example; Baidu is not on this plugin's whitelist.
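The strict-mode cross-check described here is commonly called forward-confirmed reverse DNS. The Python sketch below shows the idea with the standard `socket` module. The suffix list is illustrative rather than the plugin's actual whitelist, and the function names and the injectable `reverse` and `forward` parameters are assumptions added so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Illustrative hostname suffixes only; not WP Shieldon's real whitelist.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def reverse_lookup(ip: str) -> str:
    """IP -> hostname (the RDNS record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_trusted_bot(
    ip: str,
    strict: bool = True,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    try:
        hostname = reverse(ip).lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    if not strict:
        return True
    try:
        # Strict mode: the hostname must resolve back to the same IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

With strict mode off, a matching RDNS hostname is enough. With strict mode on, a bot whose hostname does not resolve back to its IP (the Baidu case above) is rejected.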
Header
By enabling this component and selecting strict mode, the plugin will check if the visitor's request contains the Header information typically found in browser requests; if not, it blocks the request.
User-agent
If a request has no User-Agent header, it most likely comes from a naively written crawler, so it is blocked outright.
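The Header and User-agent components can be pictured together as one small check. In this Python sketch, the set of headers treated as "typically found in browser requests" is my assumption, as is the function name; the plugin may look at a different set.

```python
# Assumed set of headers that ordinary browsers send; not the plugin's actual list.
BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")


def should_block(headers: dict, strict: bool = False) -> bool:
    """Return True when the request headers do not look like a browser's."""
    present = {name.lower() for name, value in headers.items() if value.strip()}
    if "user-agent" not in present:
        return True  # no User-Agent at all: block outright
    if strict:
        # Strict mode: every browser-like header must be present.
        return any(name not in present for name in BROWSER_HEADERS)
    return False
```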
Reverse DNS
Reverse DNS (RDNS), also called reverse IP lookup, resolves an IP address back to a hostname. Normally, each telecom subscriber's IP has one corresponding RDNS record, for example a Chunghwa Telecom user with the IP 61.216.101.55.
Most search engine crawlers' IPs can also be resolved to RDNS. In cases where it can't be resolved, it's mostly because they are not regular users but rather servers on the internet.
Basic Check
Frequency check
Generally speaking, a user who views around 10 pages on a website in a day is considered highly engaged. The Shieldon package has four frequency settings: the number of page views allowed per second, per minute, per hour, and per day.
Once any of these numbers is reached, the user is temporarily blocked and directed to a verification page. Once the user successfully completes the verification, their view data is reset.
If you only want to perform checks per minute, you can set the other numbers to an unattainable number like 9999, so only the per-minute frequency has a chance to trigger the limit.
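To make the four limits concrete, here is a minimal Python sketch of a frequency check. The quota values are hypothetical placeholders, not the package's defaults, and the class and method names are made up for the example. It keeps a list of timestamps per IP and counts how many fall inside each time window.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional

# Hypothetical quotas; the real default values are not given in this article.
LIMITS = {"s": 2, "m": 10, "h": 30, "d": 60}
WINDOWS = {"s": 1, "m": 60, "h": 3600, "d": 86400}  # window lengths in seconds


class FrequencyCheck:
    def __init__(self, limits: Optional[Dict[str, int]] = None) -> None:
        self.limits = dict(limits or LIMITS)
        self.views: Dict[str, Deque[float]] = {}  # ip -> timestamps of page views

    def hit(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one page view; return True if the IP should see the verification page."""
        now = time.time() if now is None else now
        views = self.views.setdefault(ip, deque())
        views.append(now)
        while views and now - views[0] >= WINDOWS["d"]:
            views.popleft()  # forget views older than one day
        for unit, limit in self.limits.items():
            count = sum(1 for t in views if now - t < WINDOWS[unit])
            if count >= limit:
                return True
        return False

    def reset(self, ip: str) -> None:
        """Clear the view data after a successful verification."""
        self.views.pop(ip, None)
```

Passing `{"s": 9999, "m": 10, "h": 9999, "d": 9999}` reproduces the per-minute-only setup described above, since the other three limits can never be reached.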
Excluded URLs
If you have specific pages that you don't want to be restricted by WP Shieldon, you can also exclude them in the settings.
- URLs that start with the specified string will be excluded.
By default, WP Shieldon protects the login page, registration page, and XML RPC to prevent brute force attacks. If you don't want to protect one of these, you can configure it here.
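The "starts with" rule amounts to a simple prefix match. A small Python sketch, with made-up example prefixes and a made-up function name:

```python
from typing import Iterable

# Hypothetical example prefixes; configure your own in the plugin settings.
EXCLUDED_PREFIXES = ["/feed/", "/wp-json/"]


def is_excluded(path: str, prefixes: Iterable[str] = EXCLUDED_PREFIXES) -> bool:
    """Return True when the request path starts with any excluded prefix."""
    return any(path.startswith(prefix) for prefix in prefixes)
```

Because the match is by prefix, excluding `/feed/` also excludes everything beneath it, such as `/feed/atom/`.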
IP Manager
The IP Manager keeps separate rules for four scopes: the entire site, the login page, the registration page, and XML-RPC.
Use Case - Entire Site Not Public
Add your own or your company's fixed IP to the site-wide rules and enable Deny All; then only that IP can browse the site. This suits, for example, a company's engineering department that hosts internal technical documentation which should be reachable only from the company's IP or VPN IP.
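The allow-list plus Deny All combination can be sketched with Python's standard `ipaddress` module. The class name and the idea of accepting CIDR ranges are assumptions for illustration; the plugin's own rule format may differ.

```python
import ipaddress
from typing import Iterable


class IpRules:
    """Allow-listed addresses always pass; everyone else passes unless Deny All is on."""

    def __init__(self, allowed: Iterable[str], deny_all: bool = False) -> None:
        # strict=False lets a single address such as "203.0.113.10" act as a /32 network.
        self.allowed = [ipaddress.ip_network(item, strict=False) for item in allowed]
        self.deny_all = deny_all

    def permits(self, ip: str) -> bool:
        address = ipaddress.ip_address(ip)
        if any(address in network for network in self.allowed):
            return True
        return not self.deny_all
```

For the use case above, something like `IpRules(["203.0.113.10"], deny_all=True)` admits only the office IP.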
Login protection
Suppose you don't have a fixed IP but still want to use IP Manager to protect your login page. In that case, you can set a passcode.
In this example, I set it to test.
The URL below bypasses IP blocking and allows login. Only you should know this URL, so be sure to remember it and do not share it.
https://terryl.in/zh/wp-login.php?test
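One way to picture the passcode check is as a test for whether the passcode appears as a key in the URL's query string. The Python sketch below is an assumption about the general idea, not the plugin's implementation, and `has_passcode` is a made-up name.

```python
from urllib.parse import urlsplit


def has_passcode(url: str, passcode: str) -> bool:
    """Return True when the query string contains the passcode as a bare key."""
    query = urlsplit(url).query
    # "?test" and "?test=1&x=2" both yield the key "test".
    keys = [part.split("=", 1)[0] for part in query.split("&") if part]
    return passcode in keys
```

Matching whole keys means a URL ending in `?testing` does not pass when the passcode is `test`.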
Online Session Control
The term "session" here has the same meaning as in Google Analytics. Put simply, this feature limits the number of visitors who can browse your website at the same time.
You can set how many visitors can browse online, for example, 100, and how long each user can browse, for example, 5 minutes. You can see the real-time report in the IP Session Table in the backend.
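The two settings can be modeled as a fixed number of slots that each expire after the browsing period. This Python sketch is a simplified assumption: it turns extra visitors away instead of queueing them, and the class and method names are made up.

```python
import time
from typing import Dict, Optional


class SessionControl:
    def __init__(self, max_sessions: int = 100, minutes: int = 5) -> None:
        self.max_sessions = max_sessions
        self.period = minutes * 60
        self.started: Dict[str, float] = {}  # session id -> start of its browsing period

    def admit(self, session_id: str, now: Optional[float] = None) -> bool:
        """Return True if this visitor may browse right now."""
        now = time.time() if now is None else now
        # Free the slots of sessions whose browsing period has ended.
        self.started = {
            sid: start for sid, start in self.started.items() if now - start < self.period
        }
        if session_id in self.started:
            return True
        if len(self.started) < self.max_sessions:
            self.started[session_id] = now
            return True
        return False  # site is full; this visitor has to wait
```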
Reports
Online session table
This report allows you to observe the online situation in real-time, provided that you have enabled the Online Session Control feature.
Rule table
Those identified as anomalies by filters and basic checks will be temporarily blocked with the status "CAPTCHA", meaning that the IP is on the verification page and the visitor or bot with that IP must solve the verification to continue browsing. If the verification is solved, that IP will be removed from the rule table.
By default, requests blocked by components are hard blocked with no chance to be unblocked. They never enter the rule table because the component check runs before the filter check. However, the search engines allowed by Trusted Bots are listed here.
In addition, the rule table has the function to temporarily block individual IPs. Why is it temporary? Because if you have the data reset cycle enabled, this table will be reset daily. If you want to permanently block an IP, use the IP Manager.
IP log table
Records visitor data for this data cycle.
Dashboard
The log records on the dashboard are permanent and are not affected by the data cycle. All data for each visitor processed by WP Shieldon after the plugin is enabled will be recorded.
PS: Disabling the plugin also clears all data. Log files are not very important, so once the plugin is disabled or removed, they are simply deleted.