RegEx guide for Google Analytics and Tag Manager


Regular Expressions (RegEx) may seem complicated at first, but once you get to know them, you will manage your Google Analytics and Google Tag Manager like never before.

There are a lot of good articles published about Regular Expressions (I will mention some later on), and I decided to contribute too. [Updated in 2017]

I will start with theory, so if you already know the basics, you may want to jump to the examples.

Part 1. RegEx basics

A Regular Expression (RegEx) is a string that describes a specific text pattern. So instead of having multiple “contains” conditions, you can match different data with just one Regular Expression. Think of it as a shape-fitting game for kids – you have to define the hole so that only the desired shape fits.

RegEx is all about defining a pattern that only the selected items can fit

RegEx ABC

To build a pattern you need to learn the RegEx characters. Here are the characters most frequently used in Google Analytics and Google Tag Manager (GTM).

RegEx  Meaning                                        Example
|      or                                             a|b – matches a or b
.      any single character                           a.c – matches abc, acc, adc, …
?      zero or one of the previous character          goo?gle – matches gogle and google, but not gooogle
*      zero or more of the previous character         goo*gle – matches gogle, google, gooogle
+      one or more of the previous character          goo+gle – matches google, gooogle, but not gogle
^      start of the string                            ^apple – matches apple juice, but not pineapple
$      end of the string                              apple$ – matches pineapple, but not apple juice
[]     list of items to match                         [a-z] – matches any lowercase letter from a to z
                                                      b2[cb] – matches b2c, b2b
()     group elements                                 Jan(uary)? – matches Jan, January
                                                      January? – matches Januar, January
{}     define character count {x}, {x,y}              [0-9]{2} – matches any two-digit string from 00 to 99
                                                      [0-9]{1,3} – matches any string of one to three digits, from 0 to 999
\      treat RegEx characters like normal characters  \? – matches a question mark, not zero or one character
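
If you prefer to check the examples from the table yourself, here is a minimal sketch in Python. It uses Python's re module, which is not the engine Google Analytics or GTM run, but for these basic characters it should behave the same. The names TABLE_EXAMPLES and check_examples are my own, and the "should not match" strings are partly made up for illustration.

```python
import re

# (pattern, strings that should match, strings that should not match)
TABLE_EXAMPLES = [
    (r"a|b", ["a", "b"], ["c"]),
    (r"a.c", ["abc", "acc", "adc"], ["ac"]),
    (r"goo?gle", ["gogle", "google"], ["gooogle"]),
    (r"goo*gle", ["gogle", "google", "gooogle"], ["ggle"]),
    (r"goo+gle", ["google", "gooogle"], ["gogle"]),
    (r"^apple", ["apple juice"], ["pineapple"]),
    (r"apple$", ["pineapple"], ["apple juice"]),
    (r"b2[cb]", ["b2c", "b2b"], ["b2a"]),
    (r"Jan(uary)?", ["Jan", "January"], ["Feb"]),
    (r"[0-9]{2}", ["07", "42"], ["7"]),
    (r"\?", ["what?"], ["what"]),
]


def check_examples(examples):
    """Return the (pattern, text) pairs that did not behave as expected."""
    failures = []
    for pattern, should_match, should_not_match in examples:
        for text in should_match:
            # re.search looks for the pattern anywhere in the text (partial match).
            if re.search(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text))
    return failures


print(check_examples(TABLE_EXAMPLES))  # [] means every example behaved as described
```

Note that re.search does a partial match: the pattern can be found anywhere in the string unless you anchor it with ^ or $.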

How do you know you got the RegEx right?

  1. Test. In Google Analytics you can preview custom Segments and verify Goal setup, as well as test advanced table filters with the RegEx option enabled. In GTM just go to preview mode and see if Tags are triggering correctly.
  2. Use RegEx debuggers – tools that allow you to enter your pattern and test it against strings you make up to see whether there is a match. If you google them, you will find a dozen different tools. One of my favorites is https://www.debuggex.com/, as it has a pretty cool pattern visualization.

Part 2. Regular expressions in practice

Let’s see how and where RegEx can be used.

  1. Google Analytics Goal Setup
  2. Google Analytics Custom Segments & Table Filters
  3. Google Tag Manager Triggers
  4. View Filters

1) Google Analytics Goals

Regular expression knowledge is a lifesaver for Goal configuration, where RegEx is the most flexible matching option.

Let’s say the confirmation page is www.example.com/en/ferry-tickets/booking.php?id=1234. The “Equals to” and “Begins with” options are unsuitable because the language (/en/) and the id parameter can change, so RegEx is the only option left. You could input just booking, but that also matches www.example.com/en/booking-rules. To match exactly booking.php, include the extension and put a backslash before the dot, that is booking\.php, so that it matches a literal dot and not any character (remember, a dot is a RegEx character).
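
To see the difference between the two patterns, here is a small sketch in Python (not GA itself, so treat it as an approximation of how the Goal would match). The second booking.php URL and the helper name pages_matching are assumptions of mine for illustration.

```python
import re

# URLs from the text above, plus one made-up booking page in another language.
PAGES = [
    "www.example.com/en/ferry-tickets/booking.php?id=1234",
    "www.example.com/lv/ferry-tickets/booking.php?id=5678",
    "www.example.com/en/booking-rules",
]


def pages_matching(pattern, pages):
    """Return the pages where the pattern is found anywhere in the string."""
    return [page for page in pages if re.search(pattern, page)]


too_broad = pages_matching(r"booking", PAGES)   # also catches /en/booking-rules
exact = pages_matching(r"booking\.php", PAGES)  # only the two booking.php pages

print(len(too_broad), len(exact))  # 3 2
```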

A more complicated example – let’s say there are a few Thank you pages, each for a specific service, and you want to configure separate Goals for them.

/en/service1/?sid=1234&success=1 – Goal 1
/en/service2/?sid=2345&success=1 – Goal 2

With “success” as the pattern, both pages match. So the pattern for Goal 1 should be more specific, for example (.*)/service1/(.*)success.

Here I’ve defined a pattern that looks for URLs with “/service1/” at the beginning or in the middle, followed by “success” somewhere after “/service1/”, possibly with other characters in between. Of course, this is not the only solution; much depends on the URL structure you have.
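
A quick way to convince yourself that the Goal 1 pattern separates the two Thank you pages is a sketch like this one in Python. The third page (service1 without success) is a hypothetical URL I added, and is_goal_1 is just an illustrative name.

```python
import re

GOAL_1_PATTERN = r"(.*)/service1/(.*)success"

PAGES = [
    "/en/service1/?sid=1234&success=1",  # Thank you page for service 1
    "/en/service2/?sid=2345&success=1",  # Thank you page for service 2
    "/en/service1/?sid=1234",            # hypothetical: service 1 page without success
]


def is_goal_1(page):
    """True if the page would count towards Goal 1 under the pattern above."""
    return re.search(GOAL_1_PATTERN, page) is not None


for page in PAGES:
    print(page, is_goal_1(page))
```

Only the first page should come back as True; the plain pattern success would match the first two.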

2) Google Analytics Custom Segments & Table Filters

For me, the most frequent use of regular expressions in Table Filters or Segments is to filter for just a few specific values.

2.1) Filter traffic Sources and Campaigns

For example, say you want to create a segment with only direct / (none) and google / organic traffic. Yes, it’s possible to use two “contains” conditions, but RegEx is a more elegant solution – just use the pipe (|).

When using the pipe, don’t forget about grouping. For example, google / cpc|organic will match traffic sources that contain google / cpc or anything that contains organic, so bing / organic will also be included.

Regex matches google / cpc, google / organic, bing / organic, etc.

If you want to filter only google / cpc and google / organic, use grouping. The correct regex would be google / (cpc|organic).

RegEx matches only google / cpc and google / organic when used on traffic reports
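
The effect of grouping is easy to reproduce outside of GA. The sketch below is Python, with a made-up list of source / medium values and a helper name (filter_sources) of my own choosing.

```python
import re

# Made-up source / medium values for illustration.
SOURCES = ["google / cpc", "google / organic", "bing / organic", "bing / cpc"]


def filter_sources(pattern, sources):
    """Keep the source / medium values in which the pattern is found."""
    return [source for source in sources if re.search(pattern, source)]


# Without grouping the pipe splits the whole pattern: "google / cpc" OR "organic".
ungrouped = filter_sources(r"google / cpc|organic", SOURCES)

# With grouping the pipe only applies inside the brackets: google / cpc OR google / organic.
grouped = filter_sources(r"google / (cpc|organic)", SOURCES)

print(ungrouped)  # ['google / cpc', 'google / organic', 'bing / organic']
print(grouped)    # ['google / cpc', 'google / organic']
```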

Or maybe you want to make a segment with traffic from specific campaigns. For example, you could have this campaign structure and need to select only the Search non-branded campaigns.

Search_Campaign_brand_EN
Search_Campaign1_nonbrand_EN
Search_Campaign2_nonbrand_EN
GDN_Campaign1_nonbrand_EN

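Search.*nonbrand will match all the campaign names that contain “Search” followed by “nonbrand”, with any other names and parameters in between (.* matches any character in any quantity), before or after them. Note that the pattern must use nonbrand, as in the names above; nonbranded would match none of them.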
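
Here is a small Python sketch of that campaign filter. The sample names end in nonbrand (not nonbranded), so the sketch uses Search.*nonbrand; the function name select_campaigns is just an illustrative choice.

```python
import re

CAMPAIGNS = [
    "Search_Campaign_brand_EN",
    "Search_Campaign1_nonbrand_EN",
    "Search_Campaign2_nonbrand_EN",
    "GDN_Campaign1_nonbrand_EN",
]


def select_campaigns(pattern, campaigns):
    """Return the campaign names in which the pattern is found."""
    return [name for name in campaigns if re.search(pattern, name)]


search_nonbrand = select_campaigns(r"Search.*nonbrand", CAMPAIGNS)
print(search_nonbrand)  # ['Search_Campaign1_nonbrand_EN', 'Search_Campaign2_nonbrand_EN']

# A pattern that asks for more than the names contain matches nothing.
print(select_campaigns(r"Search.*nonbranded", CAMPAIGNS))  # []
```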

The same can be done to filter a report table. Just enter the regex pattern in the search field.

2.2) Filter specific content

A few more examples with page filters in the Content report. Let’s presume you want to get all the blog posts that have one of the following link structures.

Structure 1: /lang/blog-post-name/post-id/ (example: /en/how-to-become-analytics-ninja/12/ )

RegEx would be /en/(.*)/[0-9]+/ – contains /en/ followed by any characters, and then one or more digits (the post id) as the 3rd URL directory. Again, you can input the pattern directly in the search field or open the advanced filter.

Structure 2: /yyyy/mm/blog-post-name/ (example: /2014/12/how-to-become-analytics-ninja/)

RegEx would be /[0-9]{4}/[0-9]{2}/ – contains a four-digit string representing the year and a two-digit string representing the month.
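
Both blog post patterns can be tried out on a handful of page paths. This is only a sketch in Python; /en/about/ is a made-up non-blog page, and the names STRUCTURE_1, STRUCTURE_2 and blog_posts are mine.

```python
import re

STRUCTURE_1 = r"/en/(.*)/[0-9]+/"      # /lang/blog-post-name/post-id/
STRUCTURE_2 = r"/[0-9]{4}/[0-9]{2}/"   # /yyyy/mm/blog-post-name/

PAGES = [
    "/en/how-to-become-analytics-ninja/12/",
    "/2014/12/how-to-become-analytics-ninja/",
    "/en/about/",  # hypothetical page that is not a blog post
]


def blog_posts(pattern, pages):
    """Return the page paths that contain the given structure."""
    return [page for page in pages if re.search(pattern, page)]


print(blog_posts(STRUCTURE_1, PAGES))  # ['/en/how-to-become-analytics-ninja/12/']
print(blog_posts(STRUCTURE_2, PAGES))  # ['/2014/12/how-to-become-analytics-ninja/']
```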

Similarly, you can filter only selected landing pages, website sections, etc.

3) Google Tag Manager Triggers

With Triggers you control when to launch different Tags. Often, to separate different URLs, links or other elements, conditions like “contains” are not enough and you need to define a specific pattern with Regular Expressions.

3.1) All pages or (.*)

If you are using Google Tag Manager, you have probably seen this. The dot means any character; the asterisk means zero or more of the previous character (here, any character). So the pattern matches any text string, that is, all pages. (The All Pages Trigger is available by default, so there is no need to create it manually.)

3.2) Home Page

The pattern for a homepage would usually be ^/$. In human language it means “starts and immediately ends with /”.

As the URL variable contains the domain name and possible query parameters (that is, what goes after ?), it is better to use Page Path. (Make sure you have it enabled in your GTM Account, under Variables > Enabled Built-In Variables.)

Enabled Variables will appear in GTM preview console

If the site is multilingual, the homepage can also be /en, /en/, /lv or /lv/. The pattern then would be ^/(lv|en)?/?$.

GTM Trigger for multilingual homepage.

Looks complicated, right? :) To describe this pattern in plain English – the Page Path must:

  1. ^/ – start with a slash;
  2. (lv|en)? – have either lv or en, or neither;
  3. /?$ – end with a / or nothing.

The question mark is what says “zero or one of the previous element”, and the () brackets are used to group elements. See the nice visualization of this RegEx below.

Multilingual homepage RegEx visualization with debuggex.com
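
If you want to test the multilingual homepage pattern without opening GTM preview mode, a sketch like the following works. It is Python, not GTM, and the Page Path values /en/blog/ and /english are made-up examples of pages that should not fire the Trigger.

```python
import re

HOMEPAGE_PATTERN = r"^/(lv|en)?/?$"


def is_homepage(page_path):
    """True if the Page Path is one of the homepage variants."""
    return re.search(HOMEPAGE_PATTERN, page_path) is not None


homepages = ["/", "/en", "/en/", "/lv", "/lv/"]
other_pages = ["/en/blog/", "/english"]  # hypothetical non-homepage paths

print([is_homepage(path) for path in homepages])    # all True
print([is_homepage(path) for path in other_pages])  # all False
```

The anchors do the heavy lifting here: without ^ and $ the pattern would match every path, because every path contains a slash.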

3.3) Link patterns

To track all links to PDF documents you may use a “contains pdf” condition, and for tracking all outgoing links use “does not contain mydomain.com”. But sometimes more complicated patterns are required.

For example, I want to separately track my social profile link clicks. The links are:

http://www.twitter.com/apasters
https://plus.google.com/+AleksandrsPasters
http://lv.linkedin.com/in/aleksandrspasters

Also, I know that social sharing buttons may have similar strings in them (plus.google.com in this case), like https://plus.google.com/share?url=http://www.apasters.com/blog/7-things-to-consider-for-google-analytics-friendly-website-development/

So the pattern needs to be specific – after some tests I’ve come up with this one: .*(twitter|plus.google|linkedin).com/([A-Za-z/+]+)asters$

RegEx pattern visualization for my social profile links

To explain the pattern:

  1. Can start with anything (you can define it more specifically if you wish).
  2. Mandatory Group 1 can contain one of the three values – twitter, plus.google or linkedin.
  3. Must contain .com/ after the website name. (For simplicity I left the dots unescaped, which means each of them will match any single character.)
  4. Mandatory Group 2 can contain letters (both upper- and lowercase), slashes or plus signs.
  5. Must end with the last letters of my surname. I could have used the full surname, but as G+ has it capitalized, I would then need to use the RegEx (ignore case) option.
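
To check that the pattern catches the three profile links but not the sharing button, you could run something like this Python sketch. The pattern and URLs are the ones from this section; the function name is_profile_link is my own.

```python
import re

PROFILE_PATTERN = r".*(twitter|plus.google|linkedin).com/([A-Za-z/+]+)asters$"

PROFILE_LINKS = [
    "http://www.twitter.com/apasters",
    "https://plus.google.com/+AleksandrsPasters",
    "http://lv.linkedin.com/in/aleksandrspasters",
]

SHARE_LINK = (
    "https://plus.google.com/share?url=http://www.apasters.com/blog/"
    "7-things-to-consider-for-google-analytics-friendly-website-development/"
)


def is_profile_link(url):
    """True if the clicked URL looks like one of the social profile links."""
    return re.search(PROFILE_PATTERN, url) is not None


print([is_profile_link(url) for url in PROFILE_LINKS])  # [True, True, True]
print(is_profile_link(SHARE_LINK))                      # False
```

The sharing link is rejected because it ends with a slash, not with asters, so the $ anchor is what keeps the two kinds of links apart.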

3.4) Custom Event Triggers

If you want to have one Trigger for multiple Custom Events, you can use regex matching and enter an Event name pattern. Here the pattern is case-sensitive, so be careful with the values.

Matching several Enhanced Ecommerce events with one Trigger
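
As a rough illustration of such an Event name pattern and of the case sensitivity, here is a Python sketch. The event names (addToCart, removeFromCart, checkout, purchase) are assumptions; use whatever names your own dataLayer pushes. The ignore_case switch only imitates the idea of a case-insensitive match and is not a GTM setting.

```python
import re

# Hypothetical event names; replace them with the ones from your dataLayer.
EVENT_PATTERN = r"^(addToCart|removeFromCart|checkout)$"


def trigger_fires(event_name, ignore_case=False):
    """True if the event name matches the pattern (case-sensitive by default)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(EVENT_PATTERN, event_name, flags) is not None


print(trigger_fires("addToCart"))                    # True
print(trigger_fires("purchase"))                     # False, not in the list
print(trigger_fires("addtocart"))                    # False, the case differs
print(trigger_fires("addtocart", ignore_case=True))  # True
```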

4) View Filters

View Filters may not be the most frequently visited place in Google Analytics, but they are surely an important one, as they define data collection.

Let’s say the website has the following URL pattern, /en/section-name/page-name, but one of the sections, for some reason, has a different one: /section-name/en/page-name. With a regular expression you can match this “wrong” URI and rewrite it.

Custom View Filter

Using round brackets, you create RegEx groups for the 3 Request URI parts: the first must contain at least one letter or dash; the second must be en; the third contains zero or more letters, numbers or dashes.
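
The three groups described above could look roughly like the pattern in this Python sketch. This is my reconstruction from the description, not necessarily the exact pattern used in the Filter, and it assumes lowercase URIs. The rewrite is done with re.sub, where \1, \2 and \3 refer to the groups; the example URIs are made up.

```python
import re

# Assumed pattern: group 1 = section (letters or dashes), group 2 = en,
# group 3 = page name (letters, numbers or dashes, possibly empty).
WRONG_URI = re.compile(r"^/([a-z-]+)/(en)/([a-z0-9-]*)$")


def rewrite_uri(uri):
    """Move the language directory to the front; leave other URIs untouched."""
    return WRONG_URI.sub(r"/\2/\1/\3", uri)


print(rewrite_uri("/tickets/en/ferry-schedule"))  # /en/tickets/ferry-schedule
print(rewrite_uri("/en/tickets/ferry-schedule"))  # unchanged
```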

A few more examples of Filter usage with RegEx: adding a hostname to Content report links, or excluding traffic based on IP ranges.

Important: use a test View first to make sure everything works as intended, as changes made to collected data with Filters cannot be undone.

Instead of any closing wisdom, I’d rather share some more useful resources to check.

Theory:

Theory with Examples:

Examples:

Tools:

Fun:

Note: This article was first published on Jan 24th 2015 and updated in 2017.

[cover photo by markus spiske]


7 thoughts on “RegEx guide for Google Analytics and Tag Manager”

  1. Simply awesome! This gives a great breakdown of some basic patterns and serves as a launch point. Now I’m really starting to understand.

    The table explains what each symbol does in plain English, and the examples are actually usable in my work.

  2. This is so helpful. Nice to get an explanation that’s well written and so straightforward that it’s easy to understand. Most articles on regex and GTM are not!
    Thanks Aleksandrs

  3. Hi aleksandrs,

    Thanks for the article. I have a question: we have a multilingual website, for example: http://www.example.com/nl/.

    I want to track the NL site separately in GA, using GTM, but I’m not getting the code installed correctly; it still measures all website traffic. Is it because of the GA code, or is the GTM tracking installed incorrectly? Below are the settings:

    PagePath > Matches RegEx > ^/(lv|nl)?/?$
    Page Hostname > Contains > http://www.example.com

    Please help me out!

    Regards,

    Reinier

    1. Hi,

      Would need more information to propose something specific, but in your case maybe it will be enough just to have “Page Path starts with /nl/”. Or if you have multiple language versions and want to track them within different Properties, you can:
      1) Create a JS variable with the language version – get lang from the URL (see the code here)
      2) Use a Lookup Table on the language variable just created to return the language-specific GA property
