How to Block Language Spam in Google Analytics and WordPress
By Brian Jackson, Updated: January 2, 2017
Nobody likes spam, and it can be very frustrating for WordPress site owners, as dealing with it typically involves taking time to set up filters and research the best way to block it. Many of you are by now used to dealing with referrer spam, as it is something that has plagued us for years. In the past couple of months, however, a new variant has appeared in the form of what everyone is now calling language spam. Most of you probably started noticing it right around the time of the 2016 US elections. Follow our tutorial below on the best ways to block language spam and prevent it from skewing your traffic and analytics statistics. It is very important to fix this as soon as it starts appearing.
What is Language Spam?
While referrer spam is mainly about targeting search engines, language spam is typically used by spammers to push a certain agenda or to promote their own sites or products. They do this by manipulating the language used by real sites like motherboard.vice.com, thenextweb.com, lifehacker.com, reddit.com, etc. Language spam also typically registers pageviews only on the homepage of your WordPress site.
What do they have to gain? Peter Velchev from Dowser explains it well:
The idea behind this is that once you see the URL of the new visitor, you might be tempted to trace it back to its source. This would in turn generate real visits to the hacker’s website, thus pushing it up the rating ladder…
Language spam can be seen in Google Analytics on your dashboard or under the "Audience > Geo > Language" section. Here are a couple of examples of recent language spam attacks you might have seen popping up in your reports:
- Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
- Congratulations to Trump and all americans
- Vitaly rules google ☆*:｡゜ﾟ･*ヽ(^ᴗ^)ﾉ*･゜ﾟ｡:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(ﾟДﾟ)ﾉʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO
- o-o-8-o-o.com search shell is much better than google!
- Google officially recommends o-o-8-o-o.com search shell!
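Part of what makes these entries easy to spot is their shape. A genuine language value in Google Analytics is a short locale code such as en-us or zh-cn, while the spam values are whole sentences. The Python sketch below is a hypothetical heuristic that flags any value not shaped like a locale code. The `looks_like_language_code` name and its regex are assumptions for illustration, not a Google Analytics feature.

```python
import re

# Hypothetical heuristic (an assumption, not a Google Analytics feature):
# a locale code is letters, optionally followed by hyphen-separated subtags,
# e.g. "en-us" or "zh-cn". Spam values are full sentences and fail this shape.
LOCALE_CODE = re.compile(r"[a-z]{1,8}(-[a-z0-9]{1,8})*", re.IGNORECASE)


def looks_like_language_code(value: str) -> bool:
    """Return True if the entire value has the shape of a locale code."""
    return LOCALE_CODE.fullmatch(value.strip()) is not None


samples = [
    "en-us",
    "de",
    "zh-cn",
    "Congratulations to Trump and all americans",
    "o-o-8-o-o.com search shell is much better than google!",
    "Google officially recommends o-o-8-o-o.com search shell!",
]

for value in samples:
    label = "ok  " if looks_like_language_code(value) else "spam"
    print(label, value)
```

Because the check requires the whole value to match, any space or dot is enough to mark a sentence-style value as suspicious.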
Google is apparently working on fixing this issue, but new variants keep emerging: once one stops, another one seems to begin.
On a brand new WordPress site, 929 of the 1,377 sessions between November 1st and December 17th (about 67%) were from language spam! Talk about skewing your data.
The language spam problem was brought up on Search Engine Roundtable on November 9th, and Google Trends shows that search interest in "google analytics spam" skyrocketed starting in November 2016.
Why Should You Block Language Spam?
The first reason to block language spam is, obviously, so that it doesn't completely skew your analytics data, as seen above. If you ever want to use the language data of your visitors, say in a multilingual WordPress setup, then you want that data to be accurate.
Another important reason, which a lot of people don't realize, is that Google Analytics filters don't apply retroactively: they only apply to data gathered from the day the filter is created. That is why it is important to tackle the spam problem right away, since historical data cannot be fixed with filters. The downside is that if you implement a filter incorrectly, you could lose valuable data forever. Advanced segments, however, can help you with your historical data, and we go into more detail on them below.
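The timing issue can be illustrated with a toy model. This is a minimal sketch of the behavior described above, assuming each hit is stored when it is processed and that a filter created on a given day only affects hits processed from that day on. It is not how Google Analytics is implemented, and the `Hit` class, the `is_spam` check and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    day: date
    language: str


def is_spam(language: str) -> bool:
    # Hypothetical stand-in for a language filter pattern:
    # real locale codes such as "en-us" never contain spaces.
    return " " in language


def stored_hits(hits: list[Hit], filter_created: date) -> list[Hit]:
    """Model a view whose exclude filter was added on `filter_created`.

    Hits processed before that day were stored as they arrived; hits from
    that day on are checked against the filter and dropped if they match.
    """
    return [
        h for h in hits
        if h.day < filter_created or not is_spam(h.language)
    ]


hits = [
    Hit(date(2016, 11, 10), "en-us"),
    Hit(date(2016, 11, 10), "Congratulations to Trump and all americans"),
    Hit(date(2016, 12, 20), "en-gb"),
    Hit(date(2016, 12, 20), "Congratulations to Trump and all americans"),
]

kept = stored_hits(hits, filter_created=date(2016, 12, 18))
for h in kept:
    print(h.day, h.language)
```

The November spam hit survives because it was stored before the filter existed; only the December one is excluded.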
How to Block Language Spam in Google Analytics
There are a few options when it comes to tackling language spam in Google Analytics. We don't necessarily recommend using a WordPress plugin, as it is usually better to handle this closer to the source of the problem. Plugins also have a hard time eliminating ghost referrals. Google Analytics is actually quite powerful when it comes to manipulating, filtering, and segmenting data, and by not using a plugin you ensure that your filters and segments stay in place no matter what happens to your site's installation.
- Option 1: Block Language Spam With a Filter
- Option 2: Block Language Spam With an Advanced Segment
- Option 3: Block Language Spam With 3rd Party Lists
Option 1: Block Language Spam With a Filter
The first, and probably one of the easiest, ways to block language spam in Google Analytics is to use a filter. Filters allow you to modify and limit data. For example, you can exclude certain subdirectories, whitelist traffic from specific IP addresses or IP ranges, etc. We recommend setting up a new view whenever you create filters, because if anything goes wrong you should always have access to your original, untouched data. You then apply all your custom filters to the new view.
Step 1 (Optional)
The first step is to copy your current view so that you filter the data only on a separate view. This step is optional, but it is there for your safety. If you already have a separate view, you can skip to Step 2. Otherwise, click into the Admin section in Google Analytics and into your current "View Settings." Then click on "Copy view." The reason to copy the view is that this carries over any other filters and goals you already have in place for your WordPress site.
Name your new view. In our example we chose “filtered domain.com.” Then click on “Copy view.”
Step 2
Click into your new filtered view (or original view) and click into "Filters." Then click on "+ Add Filter."
Give your filter a name (ex: Filter Language Spam). Then choose custom from the Filter types. You will want to select the “Language Settings” filter and input the following into the Filter Pattern field:
You can then click on the “verify” button to see an example of what the filter found in the last 7 days. Then click “Save” to apply the filter.
And that is it! From now on, only valid, real languages will pass through to your Google Analytics reports.
Option 2: Block Language Spam With an Advanced Segment
The second option for fighting language spam in Google Analytics is to use an advanced segment. Segments work with your historical data and are generally regarded as a safer way to alter your data, as they don't permanently change anything: you can deactivate them at any time to return to the previous state. However, if you are using a separate view with a filter as shown above, that approach is just as safe.
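To see how a language filter pattern behaves, the sketch below tests a candidate pattern offline against a few sample values, similar in spirit to the "verify" button. The pattern `\s|[.,!]|.{15,}` is a hypothetical example, not necessarily the one this tutorial uses. It relies on the fact that real locale codes are short and contain no spaces or punctuation other than hyphens. Google Analytics regular expressions are unanchored (a match anywhere in the value counts), so the sketch uses `re.search` rather than a full match.

```python
import re

# Hypothetical exclude pattern for the "Language Settings" field (an
# assumption for illustration). It matches values containing whitespace,
# a dot, comma or exclamation mark, or 15+ characters in a row.
FILTER_PATTERN = re.compile(r"\s|[.,!]|.{15,}")


def excluded_by_filter(language: str) -> bool:
    # Unanchored matching: a match anywhere in the value excludes it.
    return FILTER_PATTERN.search(language) is not None


for value in [
    "en-us",
    "fr",
    "zh-cn",
    "Congratulations to Trump and all americans",
    "o-o-8-o-o.com search shell is much better than google!",
]:
    action = "excluded" if excluded_by_filter(value) else "kept"
    print(f"{action:9} {value}")
```

Keeping the pattern short and based on what real locale codes never contain means it does not need updating every time a new spam sentence appears.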
To create a segment, click into the Admin section in Google Analytics and into "Segments." Then click on "+ New Segment."
Give your segment a name (ex: Segment Language Spam) and under the Language field, change the dropdown to "does not match regex" and enter the following:
Then click on “Save.”
And that's it. You can then select the language segment on your Analytics dashboard and remove "All Users." Remember, segments are applied to your data on the fly when a report is viewed, so they never change the data that is stored. Tip: You can create a custom dashboard or shortcut with your new segment already applied for quick viewing later.
Option 3: Block Language Spam With 3rd Party Lists
One of the most annoying parts about spam is how time-consuming it is for us as website owners. We constantly have to update our segments and filters to keep our data as accurate as possible. However, there are resources and 3rd party tools out there to help speed up the process if you don't have the time. Below are a few options you might want to check out:
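Because a segment is evaluated when a report is run, it also covers sessions recorded before you created it, and it leaves the stored data alone. The Python sketch below models that idea with hypothetical session records and a hypothetical pattern, keeping only sessions whose language "does not match regex". It is an illustration under those assumptions, not Google Analytics' actual implementation.

```python
import re
from datetime import date

# Hypothetical stored sessions, including some recorded before any
# filter or segment existed.
sessions = [
    {"day": date(2016, 11, 2), "language": "en-us"},
    {"day": date(2016, 11, 9), "language": "Congratulations to Trump and all americans"},
    {"day": date(2016, 12, 1), "language": "de"},
    {"day": date(2016, 12, 5), "language": "Google officially recommends o-o-8-o-o.com search shell!"},
]

# Hypothetical pattern (an assumption): whitespace, punctuation, or 15+ chars.
SPAM_PATTERN = re.compile(r"\s|[.,!]|.{15,}")


def apply_segment(rows):
    """Return a new list of sessions whose language does NOT match the pattern.

    The input list is not modified, mirroring how a segment only changes
    what a report shows, not what is stored.
    """
    return [r for r in rows if not SPAM_PATTERN.search(r["language"])]


segment = apply_segment(sessions)
print(len(sessions), "stored sessions,", len(segment), "in the segment")
```

Switching the segment off is simply a matter of reporting on `sessions` again, since nothing was deleted.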
- Analytics-Toolkit: This company provides what they call an Auto Spam filter which is constantly updated for you.
- Analytics Edge has free pre-built segments which you can utilize with a single click. These are also consistently updated.
And if you are interested in learning more about how to remove spam from Google Analytics, the following in-depth tutorials are great:
- Ultimate Guide to Getting Rid of All the Spam in Google Analytics
- Definitive Guide to Removing All Google Analytics Spam
As you can see, it is pretty easy to filter out and exclude this new language spam tactic. We recommend looking through the analytics on your WordPress sites to ensure your data isn't being skewed. What are your thoughts on the language spam situation? We find it downright annoying and hope that in the future Google can do more to combat this useless data that business owners now have to deal with.