Google Analytics - 'Include Filter' Limitations & Workaround Using Custom Fields
The Google Analytics Include filter is a powerful tool that allows you to limit the data in your Google Analytics View based on subdirectory, hostname, source IP address or any other required logic. However, there is one caveat when using multiple Include filters: when you use two or more Include type filters in a View, each filter works independently and only lets through traffic that satisfies its own condition. For example, if you filter traffic with two Include filters on the /sales/ and /checkout/ subdirectories of the website, the first filter will only include hits to pages containing /sales/ in their URL (excluding /checkout/ pages), and the second one only hits to pages containing /checkout/ (excluding /sales/ pages). So unless there are pages whose addresses contain both the sales and checkout patterns, nothing will be included in the View. The solution here is to use a single filter matching both patterns (regex: (/sales/)|(/checkout/)).
You don't normally see more than one Include type filter in the View for exactly this reason.
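To make the AND-versus-OR behaviour concrete, here is a small Python sketch that models an Include filter as "keep the hits whose field matches the pattern". It is a simplified model rather than how Google Analytics is implemented, and the field name request_uri and the sample hits are hypothetical. Case-insensitive matching is assumed, since that is the default for Google Analytics filters.

```python
import re

def include_filter(hits, field, pattern):
    """Simplified Include filter: keep only hits whose `field` matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [hit for hit in hits if regex.search(hit[field])]

# Hypothetical sample hits.
hits = [
    {"request_uri": "/sales/spring-offer"},
    {"request_uri": "/checkout/step-1"},
    {"request_uri": "/about-us/"},
]

# Two Include filters in a row: a hit must satisfy BOTH patterns (AND).
two_filters = include_filter(include_filter(hits, "request_uri", "/sales/"),
                             "request_uri", "/checkout/")

# One Include filter with an alternation: a hit may satisfy EITHER pattern (OR).
one_filter = include_filter(hits, "request_uri", "(/sales/)|(/checkout/)")

print(len(two_filters))                            # 0
print([hit["request_uri"] for hit in one_filter])  # ['/sales/spring-offer', '/checkout/step-1']
```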
As I wrote in Google Analytics Limits and Quotas, each filter can only accommodate 256 characters in its pattern field. This is enough for most websites, but often not for an enterprise. At Internetrix we run into this limitation every couple of months when implementing Google Analytics structures for our enterprise clients. Furthermore, you can't use different dimensions in the same filter.
An example would be a requirement to include hits from one hostname, plus hits from another hostname where the Page Title contains a specific phrase (e.g. "On Sale").
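To see how quickly the 256-character limit is reached, the following Python sketch joins a list of literal paths into a single OR pattern and, when the result is too long, greedily packs the paths into several patterns that each fit. The /storeN/ paths are hypothetical, and the 256 constant is simply the figure quoted above. Each resulting pattern would need its own filter, which is exactly the situation the workaround below addresses.

```python
import re

MAX_PATTERN_LENGTH = 256  # per-filter limit quoted in this article

def build_alternation(paths):
    """Join literal paths into one OR pattern, e.g. (/sales/)|(/checkout/)."""
    return "|".join(f"({re.escape(path)})" for path in paths)

def split_into_patterns(paths, limit=MAX_PATTERN_LENGTH):
    """Greedily pack paths into as few patterns as possible, each within `limit`.

    A single path that is longer than `limit` on its own still ends up alone
    in an oversized pattern, so check the results.
    """
    patterns, current = [], []
    for path in paths:
        if current and len(build_alternation(current + [path])) > limit:
            patterns.append(build_alternation(current))
            current = [path]
        else:
            current.append(path)
    if current:
        patterns.append(build_alternation(current))
    return patterns

sections = [f"/store{i}/" for i in range(30)]  # hypothetical site sections
print(len(build_alternation(sections)) > MAX_PATTERN_LENGTH)  # True
patterns = split_into_patterns(sections)
print([len(p) for p in patterns])  # [253, 95]
```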
Integrating multiple Include filters with OR logic
A little workaround I have found for these limitations is to match the conditions one by one using Advanced type filters, and then use only one Include filter at the end (Google Analytics executes filters in the order in which they are listed in the View). In other words, rather than filtering traffic directly, each Advanced filter "remembers" in a separate field that the current hit should be included. The final Include type filter then checks whether that field is set to any value and, if it is, lets the hit into the View.
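The "remember, then include" idea can be sketched in Python as a series of marking steps followed by one include step. This is a simplified model under stated assumptions: each marking step stands in for an Advanced filter that copies its first capture group into custom_field_1 when its field matches, and leaves the hit alone otherwise. The hostnames, request_uri field and sample hits are hypothetical.

```python
import re

def mark(hits, field, pattern):
    """Simplified Advanced filter: when `field` matches, store capture group 1
    in custom_field_1; otherwise leave the hit untouched."""
    regex = re.compile(pattern, re.IGNORECASE)
    for hit in hits:
        match = regex.search(hit.get(field, ""))
        if match:
            hit["custom_field_1"] = match.group(1)
    return hits

def include_if_marked(hits):
    """The single Include filter at the end: keep hits where custom_field_1 is set."""
    return [hit for hit in hits if hit.get("custom_field_1")]

hits = [
    {"hostname": "www.example1.com.au", "request_uri": "/home/"},
    {"hostname": "shop.example3.com.au", "request_uri": "/sales/shoes"},
    {"hostname": "shop.example3.com.au", "request_uri": "/about-us/"},
]

# Each rule gets its own filter, so each has its own pattern budget and dimension.
mark(hits, "hostname", r"(example1\.com\.au)")
mark(hits, "request_uri", r"(/sales/)")
kept = include_if_marked(hits)
print([hit["request_uri"] for hit in kept])  # ['/home/', '/sales/shoes']
```

The rules are combined with OR because any one marking step is enough to set the field, which is what a chain of Include filters cannot do.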
Let's say we are tasked with collecting all hits to the website www.example1.com.au, plus hits to the hostname example2.com.au where the Custom Dimension with index 31 is set to Yes. We are going to create four filters, which need to be applied to the View in the following order:
- Filter #1: Hostname example1.com.au to Custom Field 1 - Advanced
This filter will check whether the hostname matches example1.com.au and, if it does, set Custom Field 1 to its value.
In this Advanced type filter, we try to match (using RegEx syntax) the hostname against example1.com.au. If the regex engine captures anything matching the pattern (example1.com.au), it saves it to the temporary variable $A1. In the Output To -> Constructor field we then write the captured value (or nothing) to Custom Field 1.
There are two fields in Google Analytics which exist only during filter processing phase: Custom Field 1 and Custom Field 2. You can't report on them, nor should you try. Their only purpose is to save and pass information across different filters.
Our last filter will only include hits where Custom Field 1 is set to any value. This means that, no matter what happens in the filters that execute afterwards, hits with the hostname example1.com.au, www.example1.com.au or test.example1.com.au will be included in the View.
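Filter #1 can be modelled in Python as follows. This is a sketch, not Google Analytics code: the hit is a plain dictionary, a failed match is assumed to leave the hit unchanged (as with "Field A Required" ticked), and I have escaped the dots in the pattern, since an unescaped "." matches any character.

```python
import re

def filter_1(hit):
    # Field A = Hostname, Extract A = (example1\.com\.au)
    match = re.search(r"(example1\.com\.au)", hit["hostname"], re.IGNORECASE)
    if match:
        # Output To = Custom Field 1, Constructor = $A1 (the first capture group)
        hit = {**hit, "custom_field_1": match.group(1)}
    # No match: the filter does nothing and Custom Field 1 stays unset
    return hit

for host in ["example1.com.au", "www.example1.com.au", "test.example1.com.au", "example2.com.au"]:
    print(host, "->", filter_1({"hostname": host}).get("custom_field_1"))
```

All three example1.com.au variants end up with Custom Field 1 set to example1.com.au, while example2.com.au is left untouched.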
- Filter #2: Concatenate Hostname and Custom Dimension 31 to Custom Field 2 - Advanced
As per the requirements, we should only include hits to a specific hostname (example2.com.au) where the Custom Dimension with index 31 is set to 'Yes'. To achieve this, I concatenate the hostname and the Custom Dimension value and write the result to Custom Field 2.
What happens here? First, the Google Analytics filter engine runs a regex on the Hostname dimension with the pattern (.*), which captures everything. In other words, the filter captures the hostname and writes it to the temporary variable $A1. Next, the filter runs the regular expression (.*) against the value of the Custom Dimension with index 31 for the current hit, capturing the whole string into the temporary variable $B1. Finally, the Constructor concatenates $A1 and $B1 with a hyphen between them ($A1-$B1).
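Here is a Python sketch of Filter #2 under the same simplified model. The key name cd31 for Custom Dimension 31 is hypothetical, and a missing dimension is treated as an empty string, which is a simplification of how Google Analytics handles unset fields.

```python
import re

def filter_2(hit):
    # Field A = Hostname, Extract A = (.*): capture the whole hostname as $A1
    a1 = re.search(r"(.*)", hit["hostname"]).group(1)
    # Field B = Custom Dimension 31, Extract B = (.*): capture the whole value as $B1
    b1 = re.search(r"(.*)", hit.get("cd31", "")).group(1)
    # Output To = Custom Field 2, Constructor = $A1-$B1
    return {**hit, "custom_field_2": f"{a1}-{b1}"}

print(filter_2({"hostname": "example2.com.au", "cd31": "Yes"})["custom_field_2"])  # example2.com.au-Yes
print(filter_2({"hostname": "example2.com.au", "cd31": "No"})["custom_field_2"])   # example2.com.au-No
```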
- Filter #3: Check if Hostname and CD31 conform to Business Logic to Custom Field 1 - Advanced
At this stage, Custom Field 2 holds a combination of the hostname and the Custom Dimension #31 value for the current hit. All that's left is to check whether it matches our requirement, which is: example2.com.au-Yes
This is similar to Filter #1 (Hostname example1.com.au to Custom Field 1 - Advanced): it will only write to Custom Field 1 if the Custom Field 2 value contains the string example2.com.au-Yes. If the hostname is different, or the Custom Dimension value is No (or anything else, really), nothing is written to Custom Field 1.
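Filter #3 reads Custom Field 2 and writes to Custom Field 1 only on a match. The Python sketch below assumes, as before, that a failed match leaves the hit unchanged, so a value already written by Filter #1 survives. The escaped dots are my addition.

```python
import re

def filter_3(hit):
    # Field A = Custom Field 2, Extract A = (example2\.com\.au-Yes)
    match = re.search(r"(example2\.com\.au-Yes)", hit.get("custom_field_2", ""), re.IGNORECASE)
    if match:
        # Output To = Custom Field 1, Constructor = $A1
        hit = {**hit, "custom_field_1": match.group(1)}
    return hit

print(filter_3({"custom_field_2": "example2.com.au-Yes"}).get("custom_field_1"))  # example2.com.au-Yes
print(filter_3({"custom_field_2": "example2.com.au-No"}).get("custom_field_1"))   # None
```

Because the pattern is unanchored ("contains"), it would also match a value such as www.example2.com.au-Yes. If only the exact hostname should qualify, anchoring the pattern with ^ and $ is one option.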
- Filter #4: Include if Custom Field 1 is set - Include
This Include type filter will ONLY include hits in the View where Custom Field 1 has been set by the previously executed filters.
This is fairly simple: the .+ RegEx pattern matches any value with one or more characters. Basically, if there is a value, the hit is included.
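In Python terms, Filter #4 is a list comprehension that keeps a hit only if .+ finds at least one character in Custom Field 1. Treating an unset field as an empty string is an assumption of this sketch, and the sample hits are hypothetical.

```python
import re

def filter_4(hits):
    # Include filter: Custom Field 1 must match .+ (one or more characters)
    return [hit for hit in hits if re.search(r".+", hit.get("custom_field_1", ""))]

hits = [
    {"hostname": "www.example1.com.au", "custom_field_1": "example1.com.au"},
    {"hostname": "example2.com.au", "custom_field_1": ""},
    {"hostname": "example2.com.au"},
]
print([hit["hostname"] for hit in filter_4(hits)])  # ['www.example1.com.au']
```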
As you can see, the filter order is critical: if the filters are executed in the wrong order, nothing will work.
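The following self-contained Python sketch chains simplified versions of all four filters and runs them in two orders: the intended one, and one with Filters #2 and #3 swapped. It is a model for illustration only; the cd31 key, the example3.com.au hostname and the sample hits are hypothetical.

```python
import re

def set_if_match(hit, source, pattern, target):
    """Simplified Advanced filter: copy capture group 1 from `source` into `target` on a match."""
    match = re.search(pattern, hit.get(source, ""), re.IGNORECASE)
    return {**hit, target: match.group(1)} if match else hit

def filter_1(hits):  # Hostname example1.com.au -> Custom Field 1
    return [set_if_match(h, "hostname", r"(example1\.com\.au)", "custom_field_1") for h in hits]

def filter_2(hits):  # Hostname + CD31 -> Custom Field 2
    return [{**h, "custom_field_2": f"{h['hostname']}-{h.get('cd31', '')}"} for h in hits]

def filter_3(hits):  # Custom Field 2 matches business rule -> Custom Field 1
    return [set_if_match(h, "custom_field_2", r"(example2\.com\.au-Yes)", "custom_field_1") for h in hits]

def filter_4(hits):  # Include if Custom Field 1 is set
    return [h for h in hits if re.search(r".+", h.get("custom_field_1", ""))]

def run(view_filters, hits):
    """Apply the filters strictly in list order, as a View does."""
    for view_filter in view_filters:
        hits = view_filter(hits)
    return hits

hits = [
    {"hostname": "www.example1.com.au", "cd31": "No"},
    {"hostname": "example2.com.au", "cd31": "Yes"},
    {"hostname": "example2.com.au", "cd31": "No"},
    {"hostname": "example3.com.au", "cd31": "Yes"},
]

correct = run([filter_1, filter_2, filter_3, filter_4], hits)
swapped = run([filter_1, filter_3, filter_2, filter_4], hits)
print([h["hostname"] for h in correct])  # ['www.example1.com.au', 'example2.com.au']
print([h["hostname"] for h in swapped])  # ['www.example1.com.au']
```

With Filters #2 and #3 swapped, Filter #3 runs before Custom Field 2 has been written, so the example2.com.au-Yes hit is silently lost while the example1 hits still come through, which makes this kind of mistake easy to miss.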
What could possibly go wrong?
While things should be up and running now, we have used a somewhat non-standard approach. I am a big fan of simple, self-documenting setups. This one obviously isn't, so I only recommend it as a "last resort".
Simply changing the order in which the filters execute can easily produce incorrect results. More importantly, anyone who looks at the View setup will find it very hard to understand.
To use this setup effectively, my hot tips are:
- Always use Test Views. Use multiple Test Views to test different filters at the same time.
- Consider using numbers in the filter names to hint that they must be in this specific order.
- Keep the filter setup AS SIMPLE AS POSSIBLE.
- If you believe it is important to document the setup (it's not self-explanatory), consider creating a "do nothing" filter whose name tells people to read the setup documentation first. In my experience, that works very well.
- When you are happy with the way the filters work and are applying them to the View(s) used for analysis, don't forget to add Annotations.
Hopefully by now you have a much better understanding of how to apply my little Include filter workaround. Internetrix has a diverse team of digital analytics experts with years of experience using the Google Marketing Platform, especially Google Analytics 360. If you have any questions or are looking for further assistance with Google Analytics setup and implementation, get in touch with our expert team today!