Regular expressions are widely used to solve complex data-analysis problems. If you use Google Analytics, you know how much data it can give you, and you need something to help you deal with it. Regular expressions, or RegEx, are a good match here. In this article, I’ll explain how to use them to create pro reports and filters.
Some theory
Before creating and using regular expressions, you should speak their language. So here are the most important “words” that will help you bridge that gap.
. – matches any single character. Therefore, two dots match two characters, etc.
hou.e – matches house, hou7e, hou)e
ho..e – matches house, ho5re, ho%be
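* – matches zero or more repetitions of the previous character.
magent* – matches magen, magent, magentt. Since Google Analytics looks for the pattern anywhere in the field, values such as magento and magenta also match, because they contain magent.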
| – the pipe (or “tube”) means OR: the expression matches if the part on either side of it matches
hou.e|magent* – matches house, magento
^ – requires the match to be at the beginning of the field
^color – matches color21, colorkjir, color-23-67, etc. but does NOT match anything like 9color, ycolor, icolor, etc.
$ – requires the match to be at the end of the field
htm$ - matches any-url.htm but NOT any-url.html
() – groups alternative items. It is usually combined with the pipe |.
dear (Mr|Mrs|Ms|Miss) Smith – matches dear Mr Smith, dear Mrs Smith, dear Ms Smith, dear Miss Smith.
Parentheses are also often combined with the dot . and the asterisk * to create (.*), which matches any sequence of characters, including an empty one:
/map/country/(.*) – matches /map/country/usa, /map/country/usa15, /map/country/belarus/, /map/country/uk/
\ – escapes a special RegEx character so that it is treated as a literal symbol
my-url-(promo|campaign|ref)\.html – matches my-url-promo.html, my-url-campaign.html, my-url-ref.html
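To get a feel for these building blocks outside Google Analytics, the patterns above can be tried in a short Python sketch. It assumes that a filter looks for the pattern anywhere in the field, which Python's re.search approximates; Google Analytics runs its own regex engine, so treat this as an illustration rather than an exact replica. The CASES list and the check helper are names made up for this example.

```python
import re

# (pattern, values expected to match, values expected NOT to match)
CASES = [
    (r"hou.e", ["house", "hou7e", "hou)e"], ["houe"]),
    (r"ho..e", ["house", "ho5re", "ho%be"], ["hoe"]),
    (r"magent*", ["magen", "magent", "magenta"], ["mage"]),
    (r"^color", ["color21", "color-23-67"], ["9color", "ycolor"]),
    (r"htm$", ["any-url.htm"], ["any-url.html"]),
    (r"dear (Mr|Mrs|Ms|Miss) Smith", ["dear Mrs Smith"], ["dear Dr Smith"]),
    (r"my-url-(promo|campaign|ref)\.html", ["my-url-ref.html"], ["my-url-refXhtml"]),
]


def check(pattern, text):
    """Return True if the pattern is found anywhere in text (partial match)."""
    return re.search(pattern, text) is not None


for pattern, should_match, should_not_match in CASES:
    ok = (all(check(pattern, s) for s in should_match)
          and not any(check(pattern, s) for s in should_not_match))
    print(pattern, "behaves as described" if ok else "needs a closer look")
```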
Now that the theory is over, let’s play with Google Analytics filters.
RegEx for Google Analytics filters
The main reason to use regular expressions in Google Analytics is to filter the data down to what you need to explore. For example, you have 5000+ URLs and need stats for just 250 of them. If you try to get the stats for these URLs one by one, you’ll soon get bored and spend lots (I really mean LOTS here) of time. Instead, you can use a regular expression to get the stats in a few clicks.
When it comes to Google Analytics filters, you can create them in the existing reports or in custom reports that you build yourself.
The logic behind building a regular expression is to make a list of the needed URLs and find common parts in them. Here are some examples.
URLs in one category
Example 1:
You need all URLs in a single category:
site.com/magento-extensions/color-swatch
site.com/magento-extensions/search
site.com/magento-extensions/rma
site.com/magento-extensions/ajax-cart
site.com/magento-extensions/navigation
But not these:
site.com/promo/magento-extensions/ajax-cart
site.com/promo/magento-extensions/navigation
In these examples you see that /magento-extensions/ is the common part, so you should use it to create your RegEx:
^/magento-extensions/(.*)
^ – anchors the match to the start of the page path, i.e. right after the host (“site.com” in this example), so URLs with anything before /magento-extensions/, such as /promo/, are excluded.
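A minimal sketch of how this expression separates the two groups of URLs. It assumes the report field holds the page path without the host (so "/magento-extensions/rma" rather than "site.com/magento-extensions/rma") and uses Python's re.search as a stand-in for the Google Analytics filter; the variable names are illustrative only.

```python
import re

# Page paths as they would appear without the host part (an assumption).
pages = [
    "/magento-extensions/color-swatch",
    "/magento-extensions/search",
    "/magento-extensions/rma",
    "/magento-extensions/ajax-cart",
    "/magento-extensions/navigation",
    "/promo/magento-extensions/ajax-cart",
    "/promo/magento-extensions/navigation",
]

category = re.compile(r"^/magento-extensions/(.*)")

# Keep only the paths where the category comes first in the path.
included = [p for p in pages if category.search(p)]
excluded = [p for p in pages if not category.search(p)]

print("included:", included)
print("excluded:", excluded)
```

Dropping the ^ from the pattern would let the two /promo/ paths through as well, because the category name would then be allowed to appear anywhere in the path.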
Example 2:
You need particular URLs within one category:
site.com/courses/acca/part-time.html
site.com/courses/acca/full-time.html
site.com/courses/acca/online.html
But not these:
site.com/courses/acca/part-time-promo.html
site.com/courses/acca/full-time-promo.html
site.com/courses/acca/online-promo.html
You can use the following regular expression:
/acca/(part-time|full-time|online)\.html
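The promo pages are left out because the escaped dot \.html must follow the alternative directly, and in the promo URLs a "-promo" suffix sits in between. The sketch below illustrates this in Python, again assuming host-free page paths and partial matching; course_mode is a hypothetical helper name.

```python
import re

pattern = re.compile(r"/acca/(part-time|full-time|online)\.html")


def course_mode(path):
    """Return the alternative that matched, or None if the path is filtered out."""
    match = pattern.search(path)
    return match.group(1) if match else None


for path in [
    "/courses/acca/part-time.html",
    "/courses/acca/full-time.html",
    "/courses/acca/online.html",
    "/courses/acca/part-time-promo.html",
    "/courses/acca/online-promo.html",
]:
    print(path, "->", course_mode(path))
```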
URLs from different categories
Example 1:
Here are some other examples. If you need to include these URLs:
site.com/gifts-for-her/mugs
site.com/gifts-for-him/mugs
site.com/gifts-for-her/caps
site.com/gifts-for-him/caps
Here is what you should use:
/gifts-for-her/(mugs|caps)|/gifts-for-him/(mugs|caps)
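When two alternatives differ in only one word, the shared parts can be pulled out of the alternation. The sketch below compares a spelled-out expression for the four URLs with a shorter grouped variant, which is my own suggestion rather than something from the examples above. The two extra paths (/gifts-for-kids/mugs and /gifts-for-him/socks) are hypothetical and only there to show what gets rejected.

```python
import re

long_form = r"/gifts-for-her/(mugs|caps)|/gifts-for-him/(mugs|caps)"
short_form = r"/gifts-for-(her|him)/(mugs|caps)"

pages = [
    "/gifts-for-her/mugs",
    "/gifts-for-him/mugs",
    "/gifts-for-her/caps",
    "/gifts-for-him/caps",
    "/gifts-for-kids/mugs",   # hypothetical page that should be rejected
    "/gifts-for-him/socks",   # hypothetical page that should be rejected
]


def select(pattern, paths):
    """Return the paths in which the pattern is found."""
    return [p for p in paths if re.search(pattern, p)]


print(select(long_form, pages) == select(short_form, pages))
```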
Example 2:
To get info on these URLs:
site.com/promotions/promo-banners.htm
site.com/navigation/internal-links.htm
site.com/customers/segmentation.htm
site.com/sales/checkout.htm
you can use this:
/promotions/promo-banners\.htm|/navigation/internal-links\.htm|/customers/segmentation\.htm|/sales/checkout\.htm
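Long alternations like this one are tedious to type by hand, so they can also be generated from the URL list. Here is a rough sketch: escape_path and build_alternation are hypothetical helpers that put a backslash in front of regex special characters and join the results with the pipe. It assumes the paths contain no blanks.

```python
import re

# Characters with a special meaning in a regular expression.
SPECIAL = re.compile(r"([.^$*+?()\[\]{}|\\])")


def escape_path(path):
    """Put a backslash in front of every special character in the path."""
    return SPECIAL.sub(r"\\\1", path)


def build_alternation(paths):
    """Join the escaped paths with the pipe so that any one of them matches."""
    return "|".join(escape_path(p) for p in paths)


paths = [
    "/promotions/promo-banners.htm",
    "/navigation/internal-links.htm",
    "/customers/segmentation.htm",
    "/sales/checkout.htm",
]

combined = build_alternation(paths)
print(combined)
```

For these four paths the generated expression is the same as the hand-written one above.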
Excluding parameters from Google Analytics reports
You will find numerous URLs with query parameters, which contain characters like ? and =. They can be generated automatically by your store navigation or appear in ways you can’t control. To exclude such URLs from your reports, use this regular expression in an exclude filter:
\?|=
Note that both inclusion and exclusion filters can be used in one report which is quite handy.
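As a quick illustration of an exclusion filter, the sketch below drops every path in which \?|= is found. The parameterized paths are invented for the example, and re.search again stands in for the Google Analytics filter.

```python
import re

params = re.compile(r"\?|=")

pages = [
    "/magento-extensions/search",
    "/magento-extensions/search?q=rma",        # hypothetical parameterized URL
    "/magento-extensions/navigation",
    "/magento-extensions/navigation?dir=asc",  # hypothetical parameterized URL
]

# An exclude filter keeps only the rows where the pattern is NOT found.
clean = [p for p in pages if not params.search(p)]
print(clean)
```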
IP addresses exclusion
When a team works on a site, each member opens its pages many times a day. This results in inaccurate data in Google Analytics, because you need to see the actions of real users and customers, not team members. You can exclude internal traffic with an IP exclusion filter at the view level.
Make a list of all the internal IPs and create a regular expression from them. For example:
125.10.156.19
125.10.158.19
345.21.67.890
Here is what you can use:
125\.10\.(156|158)\.19|345\.21\.67\.890
Ranges can also be used here, but I prefer a simpler, more readable structure. It’s also easy to add or delete any IP address from the list if you need to.
You can add your IP filter in Admin > Filters > New Filter > Custom.
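One thing worth checking before saving an IP filter: a pattern without ^ and $ is found inside longer addresses too, so 125.10.156.190 would also be caught by the 125.10.156.19 part. The sketch below compares the expression from this section with an anchored variant wrapped in ^( ... )$. The anchored form is my suggestion, not part of the original list, and Python's re.search is used as an approximation of the filter's matching.

```python
import re

unanchored = r"125\.10\.(156|158)\.19|345\.21\.67\.890"
# Assumption: wrapping the whole alternation in ^( ... )$ forces an exact match.
anchored = r"^(125\.10\.(156|158)\.19|345\.21\.67\.890)$"


def is_internal(pattern, ip):
    """Return True if the filter pattern is found in the IP address."""
    return re.search(pattern, ip) is not None


for ip in ["125.10.156.19", "125.10.158.19", "125.10.157.19", "125.10.156.190"]:
    print(ip, is_internal(unanchored, ip), is_internal(anchored, ip))
```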
Important things to remember
- You should know that view-level filters like IP exclusion (those that change the way your data is recorded) cannot be undone: if you’ve mistakenly excluded too many IPs and lost Analytics data for the exclusion period, you won’t get it back by removing the exclusion filter. That’s why you should always have a test GA view where you can try out different filters first.
- Search filters in reports are safe: you can use them in any Google Analytics view. You can also create and save custom reports with them.
- Your regular expressions should not contain stray blanks: a blank is part of the pattern, so an accidental space can stop the expression from matching what you expect.
- If you need to create a very large regular expression, build it part by part. Test each part to make sure it works and then combine the parts using the pipe |. This way you won’t need to review a huge regular expression if something goes wrong with it.
- Testing is always the answer: it helps you arrive at a RegEx that works for your particular site.