Nobody likes spam, and it is especially frustrating for site owners, who end up spending time researching the best way to block it and setting up filters.
Many of us are used to dealing with referrer spam by now, as it has plagued site owners for years. In the past couple of months, however, a new variation has appeared in the form of what everyone is now calling language spam.
Most of you probably started noticing it right around the time of the 2016 US elections. Follow our tutorial below on the best ways to block language spam and prevent it from skewing your traffic and analytics statistics. It is very important to deal with language spam as soon as it starts appearing in your reports.
What is Language Spam?
While referrer spam is mainly about targeting search engines, language spam is typically used by a spammer to push a certain agenda or to promote their own sites or products. They do this by placing their message in the language value that Google Analytics records for a visit, while making the visit appear to come from real sites like motherboard.vice.com, thenextweb.com, lifehacker.com, reddit.com, etc. Language spam also typically only registers pageviews on the homepage of your site.
What do they have to gain? Peter Velchev from Dowser explains it well:
The idea behind this is that once you see the URL of the new visitor, you might be tempted to trace it back to its source. This would in turn generate real visits to the hacker’s website, thus pushing it up the rating ladder…
Language spam can be seen in Google Analytics on your dashboard or under the “Audience > Geo > Language” section. Here are a few examples of recent language spam that you might have seen popping up in your reports:
- Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
- Congratulations to Trump and all Americans
- Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO
- o-o-8-o-o.com search shell is much better than google!
- Google officially recommends o-o-8-o-o.com search shell!
Google is apparently working on fixing this issue, but new variations keep emerging. Once one stops, another seems to begin.
On one brand new site, for example, 929 of the 1,377 sessions recorded between November 1st and December 17th were from language spam, which is roughly two-thirds of all traffic. Talk about skewing your data.
The language spam problem was brought up on Search Engine Roundtable on November 9th. And if we take a look at Google Trends we can see that starting in November 2016 the activity around “google analytics spam” skyrocketed.
Why Should You Block Language Spam?
The first reason to block language spam is obviously to avoid skewing your analytics data, as seen above. If you ever want to use your visitors’ language data, say in a multilingual WordPress setup, then you need that data to be accurate.
Another important reason, which a lot of people don’t realize, is that Google Analytics filters don’t apply retroactively. Filters only apply to data gathered from the day they are created, so historical data cannot be fixed with them. That is why it is important to tackle the spam problem right away. The downside of filters is that if you implement one incorrectly, you could lose valuable data forever. Advanced segments, on the other hand, can help you with your historical data, and we cover them in more detail below.
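To make the difference concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not of how Google Analytics works internally. The hits, the dates, and the `looks_like_spam` stand-in check are all invented for illustration.

```python
from datetime import date

# Hypothetical hits: (day received, language value). Dates and values are invented.
hits = [
    (date(2016, 11, 5), "en-us"),
    (date(2016, 11, 6), "Vote for Trump!"),
    (date(2016, 12, 1), "de"),
    (date(2016, 12, 2), "Vote for Trump!"),
]

def looks_like_spam(language: str) -> bool:
    # Stand-in check for this toy example only; not a real spam rule.
    return " " in language

def stored_with_filter(hits, filter_created):
    """A filter drops spam only from hits received on or after its creation day."""
    return [
        (day, lang)
        for day, lang in hits
        if not (day >= filter_created and looks_like_spam(lang))
    ]

def report_with_segment(stored):
    """A segment hides spam at report time, across every stored hit."""
    return [(day, lang) for day, lang in stored if not looks_like_spam(lang)]

# Assume the filter was created on November 20th.
stored = stored_with_filter(hits, filter_created=date(2016, 11, 20))
print(len(stored))                       # the November 6th spam hit is still stored
print(len(report_with_segment(stored)))  # the segment hides it from the report
```

In this model the filter permanently removes the December spam hit but cannot touch the November one, while the segment hides both without deleting anything.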
How to Block Language Spam in Google Analytics
There are a couple of options when it comes to tackling language spam in Google Analytics. We don’t necessarily recommend using a plugin, as it is usually better to deal with the problem closer to its source. Plugins also have a hard time eliminating ghost referrals. Google Analytics is actually quite powerful when it comes to manipulating, filtering, and segmenting data. And by not using a plugin, you can ensure that the filters and segments stay in place no matter what happens to your site’s installation.
- Option 1: Block Language Spam With a Filter
- Option 2: Block Language Spam With an Advanced Segment
- Option 3: Block Language Spam With 3rd Party Lists
Option 1: Block Language Spam with a Filter
The first and probably one of the easiest ways to block language spam in Google Analytics is to use a filter. Filters allow you to modify and limit data. For example, you can exclude certain subdirectories, whitelist traffic from specific IPs or IP ranges, etc. We recommend setting up a new view whenever you create filters, so that if anything goes wrong you still have access to your original, untouched data. You then apply all your custom filters to the new view.
Step 1 (Optional)
The first step is to copy your current view so that you filter the data only in a separate view. This step is optional, but it keeps your original data safe. If you already have a separate view, you can skip to Step 2. Otherwise, click into the Admin section in Google Analytics and into your current “View Settings.” Then click on “Copy view.” The reason to copy the view, rather than create a new one, is that the copy carries over any other filters and goals that you already have in place on your site.
Name your new view. In our example, we chose “filtered domain.com.” Then click on “Copy view.”
Step 2
Click into your new filtered view (or original view) and click into “Filters.” Then click on “+ Add Filter.”
Step 3
Give your filter a name (ex: Filter Language Spam). Then choose “Custom” as the filter type and select “Exclude.” Pick “Language Settings” as the filter field and input the following into the Filter Pattern field:
.{15,}|\s[^\s]*\s|\.|,|\!|\/
You can then click on the “verify” button to see an example of what the filter found in the last 7 days. Then click “Save” to apply the filter.
And that is it! You now will only see valid/real languages pass through in your Google Analytics.
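To get a feel for what that pattern catches, here is a minimal sketch that runs it against a few values in Python. It is only an approximation: Google Analytics uses its own regex engine, and we are assuming the pattern may match anywhere in the language value, which Python's `re.search` mimics. The helper name `is_language_spam` and the sample language codes are ours, not part of Google Analytics. Reading the pattern from left to right, it flags any value that is 15 or more characters long, that contains two whitespace characters with no other whitespace between them, or that contains a period, comma, exclamation mark, or slash.

```python
import re

# Pattern copied verbatim from the filter step above.
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")

def is_language_spam(value: str) -> bool:
    """Return True if the pattern matches anywhere in the language value."""
    return LANGUAGE_SPAM.search(value) is not None

# Ordinary language codes (an illustrative sample, not an exhaustive list).
real_values = ["en-us", "de", "pt-br", "zh-cn"]

# Spam strings quoted from the examples earlier in this article.
spam_values = [
    "Congratulations to Trump and all Americans",
    "o-o-8-o-o.com search shell is much better than google!",
    "Google officially recommends o-o-8-o-o.com search shell!",
]

for value in real_values + spam_values:
    label = "spam" if is_language_spam(value) else "ok"
    print(f"{label:4}  {value}")
```

Real language codes such as en-us are short and contain none of those characters, which is why they pass through while the spam strings are caught.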
Option 2: Block Language Spam with an Advanced Segment
The second option you have for fighting language spam in Google Analytics is to use an advanced segment. Segments work with your historical data and are generally regarded as a safer option because they only change what your reports show, not the data itself. You can deactivate them at any time to return to the previous state. However, if you are using a separate view with a filter as we showed above, that is just as safe.
Step 1
To create a segment click into the Admin section in Google Analytics and into “Segments.” Then click on “+ New Segment.”
Step 2
Give your segment a name (ex: Segment Language Spam) and under the Language field, change the drop-down to “does not match regex” and enter in the following:
.{15,}|\s[^\s]*\s|\.|,|\!|\/
Then click on “Save.”
And that’s it. You can then select the language segment on your Analytics dashboard and remove “All Users.” Remember, segments are applied to your reports on the fly, so the underlying data stays untouched. Tip: You can create a custom dashboard/shortcut with your new segment already applied for quick viewing later.
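If you want to estimate how much of a report the segment would hide, the same “does not match regex” logic can be sketched in Python against exported rows. This is an approximation under assumptions: the rows and session counts below are invented, `split_sessions` is our own helper name, and Python's `re` module stands in for the Google Analytics regex engine.

```python
import re

# Same pattern as the segment above.
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")

# Hypothetical (language, sessions) rows; the numbers are invented for illustration.
rows = [
    ("en-us", 310),
    ("de", 42),
    ("Congratulations to Trump and all Americans", 500),
    ("o-o-8-o-o.com search shell is much better than google!", 148),
]

def split_sessions(rows):
    """Sum sessions for rows the segment would keep and rows it would hide."""
    kept = sum(s for lang, s in rows if not LANGUAGE_SPAM.search(lang))
    hidden = sum(s for lang, s in rows if LANGUAGE_SPAM.search(lang))
    return kept, hidden

kept, hidden = split_sessions(rows)
total = kept + hidden
print(f"kept {kept} of {total} sessions ({hidden / total:.1%} flagged as spam)")
```

Note that `rows` itself is never modified, which mirrors how a segment leaves your data in place and only changes what is displayed.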
Option 3: Block Language Spam With 3rd Party Lists
One of the most annoying parts about spam is that it is time-consuming for us as website owners. We have to constantly be updating our segments and filters to ensure our data is as accurate as possible. However, there are resources and 3rd party tools out there to help speed up the process if you don’t have time. Below are a few options you might want to check out:
- Analytics-Toolkit: This company provides what they call an Auto Spam filter, which is constantly updated for you.
- Analytics Edge: Offers free pre-built segments which you can utilize with a single click. These are also consistently updated.
Summary
As you can see, it is pretty easy to filter out and exclude this new language spam tactic. We recommend looking through the analytics on your sites and ensuring your data isn’t being skewed. What are your thoughts on the language spam situation? We find it downright annoying and hope that in the future Google can help combat more of this useless data that business owners now have to deal with.
Great guide! It will help for all your doubts for sure.
Glad it was helpful!
Hello Brian, Great article. It’s really tough to counter spam in google analytics as lots of new spam sources appear daily. You have mentioned good steps to block the language spam. It’s good if you can please suggest some referral and social spam filters too. Is there any solution which can counter all of referral and social spam by just one filter?
Glad the article was helpful. Unfortunately when it comes to referral traffic spam there really isn’t a one fits all filter. At the bottom of the article above, there are some solutions which are more automated. This one for example: https://www.analytics-toolkit.com/auto-spam-filters/ It isn’t free… but if you don’t want to spend time messing with filters it might be worth it.
We will be updating our referral spam article: https://kinsta.com/blog/referrer-spam/ over the next couple weeks with some easier solutions, so make sure to bookmark it.
Hello Brian, Great Article! Is there any way to avoid spam signups?
Thanks Punit! What do you mean by spam signups? Do you mean spam traffic? There really isn’t any way to avoid it, other than using filters and advanced segments. I have seen sites with the exact same setups get spam traffic and then others that don’t… so it appears to be luck of the draw :) in some cases.
For the past few days I had been noticing a large number of spam signups, more than a hundred per day. I have now added a checkbox to accept the TOS, and that has solved the problem.
Great, glad you got your problem resolved.
I’m working on filtering language spam for a client’s website as we speak (unfortunately, it seems like the example you gave in the beginning – 60%+ of the traffic to their WordPress site is spam, ugh). Anyway, this article is the most straight-forward, clear, concise one I’ve found so far in my search! THANK YOU for creating such a user-friendly tutorial to help combat this annoying issue so many of us are dealing with. Much appreciated!
Ya that language spam has definitely become a problem! Glad our article was helpful.