Google Analytics PII - safe-guarding emails

Can you imagine what would happen if Google allowed Personally Identifiable Information (PII) to be sent to Google Analytics? The demand from marketing people to personalise the online experience is huge (believe me, I know!), and if it were allowed, analytics teams would collect it. Effectively, this would turn Google Analytics into an enormous database of your personal data. It would be possible to connect the data you have supplied to one website with data you have trusted another website to store! What if a third party gained unauthorised access to it? Your transaction history, delivery addresses, everything! It's scary.

Online privacy is one of the major concerns nowadays, and businesses are walking a fine line between personalised targeting and exposing your private data. Luckily for us, we still can't save any PII in Google Analytics. Furthermore, should any PII be found in your Google Analytics data, you might lose your historical data, as I have written previously: Why you might lose your Google Analytics data.

Given a large enough project and multiple people working on it over multiple years, you might still send PII to Google Analytics unintentionally, and that is a pressing problem. Before the data gets permanently (well, almost) stored in Google Analytics, filters get a chance to process it, which makes them our last line of defence: we can set up filters to remove PII. Unfortunately, there is no universal way to automatically recognise a customer's name or phone number, but we can recognise email addresses, and frankly that's the most common piece of PII to be accidentally sent to Google Analytics.

Even if your current tracking implementation is not collecting emails, you can't say for sure it won't (by mistake) in years to come. Below are examples of filter configurations to prevent email addresses from appearing in your Google Analytics data.

Remove emails with Google Analytics filters

The idea is quite simple: every dimension (value) which may contain an email address needs to be safeguarded. Luckily for us, Google Analytics filters of type Advanced are quite flexible. All we need to do is use RegEx to replace the email address with the constant string [EMAIL]. There is a large number of dimensions in Google Analytics, and it is perhaps a little obsessive to safeguard them all (although nothing stops you from doing so!), so practically speaking, in each specific case I would choose from these:

  • Event Category
  • Event Action
  • Event Label
  • Request URI
  • Referrer
  • Page Title
  • Search Term
  • Any Custom Dimensions you are afraid an email address might accidentally be sent to

Event Category, Action and Label are the most likely to contain an email address, for example as part of an error message that is being tracked, or as part of the user's input.

Steps to create a Google Analytics filter to safeguard emails in Event Category dimension
  1. Navigate to your Google Analytics Admin section, choose Test View (create one, if you haven't done so yet) and click on Filters
  2. Click New and choose Advanced filter type and name it Exclude EMAIL from Event Category
  3. Change Field A to Event Category and enter (.*?)([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+)(.*) in the text field on the right
  4. Leave Field B empty
  5. Change the Output To field to Event Category and enter $A1[EMAIL]$A3 in the text field on the right
  6. Configure check boxes as in the screenshot below
  7. Click Save :)


That's pretty much it. Well done! Now you have to repeat the same for each Google Analytics dimension you would like to safeguard. Or at least for the first four.

Explanation: the filter takes the value of the Event Category dimension and runs a Regular Expression that captures three values: everything before the email address, the email address itself, and everything after the email address. If an email address is found, the filter joins the text before the email address with the string [EMAIL], followed by the text that came after the email address in the Event Category.
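
To see what the filter does to a value, the behaviour described above can be approximated outside Google Analytics. The Python sketch below is only an illustration under a few assumptions: the function name mask_email is made up for this example, matching is case-insensitive (the pattern lists only upper-case letters), the domain part ends with [A-Z]+ so that the whole top-level domain falls into the second group, and a value with no email address is left untouched. Python's re module is not the engine Google Analytics uses, so treat this as a way to reason about the pattern rather than a test of the filter itself.

```python
import re

# Assumption: a Python approximation of the Field A expression.
# Group 1 = text before the email, group 2 = the email, group 3 = text after it.
EMAIL_PATTERN = re.compile(
    r"(.*?)([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+)(.*)",
    re.IGNORECASE | re.DOTALL,
)


def mask_email(value: str) -> str:
    """Mimic the filter output $A1[EMAIL]$A3 for a single dimension value.

    mask_email is a hypothetical helper, not part of any Google Analytics API.
    """
    match = EMAIL_PATTERN.fullmatch(value)
    if match is None:
        # No email address found: the value stays as it was.
        return value
    before, _email, after = match.groups()
    return f"{before}[EMAIL]{after}"


if __name__ == "__main__":
    # Example of a tracked error message ending up in Event Category.
    print(mask_email("Login failed for john.doe@example.com, retry"))
```

With this sketch, mask_email("Login failed for john.doe@example.com, retry") gives "Login failed for [EMAIL], retry". Only the first address in a value is masked, because the third group captures everything after it unchanged.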

The email safeguarding filters must be added to all Views in your Google Analytics property(ies) to prevent data loss. And of course, it is always better not to send Personally Identifiable Information in the first place.
