In response to my last post, Regular Expressions – Don’t Use Google Analytics Without Them, a few regex neophytes asked if I would expound more on how you actually use regular expressions in Google Analytics. There’s nothing worse than learning a new, cool skill but then not knowing what to do with it on a practical level. So, being a pragmatist at heart, the focus of this post will be common reporting needs that necessitate the use of regex. It’s a jungle out there, and regex can boggle the mind. And there’s only so much you can cover in one post. But I think this post will give you a good representative sample of how you can use regex to squeeze a lot more actionable data from Google Analytics.
Creating Advanced Segments
As I mentioned in my last post, Google Analytics gives you the option to sidestep regular expressions by using the or statement. For example, if you want to see just your traffic from Facebook and Twitter, you could set it up this way:
Caveat: This segment will only represent a sample of traffic you actually receive from Twitter and Facebook. You can read more about this critical step to capture all of your social media traffic in Google Analytics.
Ex 1: Create Social Media Segment
However, if you want to create a segment that collects all social media sites, that would be a lot of or statements. It would be far preferable, in this case, to use regex to create your segment. To create the social media segment we use here at Blueglass, the regex looks like this:
Remember: | means or and \ escapes out the character that follows it, telling Google Analytics to treat it like a regular character (e.g., a dot in a URL) and not a regex character.
Or you can take the shortcut and load this segment into your Google Analytics profile.
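If you want to sanity-check a pattern like this before saving the segment, a quick script helps. Here is a minimal sketch in Python. The list of sources is hypothetical and is not the segment we actually use, and I'm assuming partial matching, which `re.search` approximates.

```python
import re

# Hypothetical list of social sources, for illustration only.
# | separates alternatives; \. matches a literal dot instead of "any character".
SOCIAL_SOURCES = re.compile(r"facebook\.com|twitter\.com|^t\.co$|linkedin\.com")

def is_social(source: str) -> bool:
    """Return True if any alternative matches somewhere in the source."""
    return SOCIAL_SOURCES.search(source) is not None

sources = ["m.facebook.com", "twitter.com", "news.google.com", "t.co"]
print([s for s in sources if is_social(s)])

# Why the backslash matters: an unescaped dot matches any character.
print(bool(re.search(r"facebook.com", "facebookxcom")))   # True
print(bool(re.search(r"facebook\.com", "facebookxcom")))  # False
```

Note that the short `t.co` alternative is anchored with ^ and $ in this sketch. Left unanchored, it would also match inside a longer hostname such as pinterest.com.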
Ex 2: Monitor Traffic from Link Partners
Usually website owners engage in link building to have an army of links pointing to their site for the SEO benefits. But let’s say you have a classification of links you’ve pursued or acquired (yeah, we’ll just leave it with that), and you want to know what kind of traffic those links are garnering. You can use this same technique to create a link building bucket of sorts. It would look like this:
You could even get fancy and set up alerts to let you know if traffic from this segment spikes or dips.
Ex 3: Track Searches from Image Search
Let’s say you have a lot of images on your site, and you’ve done your due diligence in optimizing them for organic search. There’s no report baked into Google Analytics that allows you to see how your traffic from image search performs.
Usually, I would gravitate toward matching the source, as we did in the example above. However, in this case I found it easier to write regex to match the referral path instead. As much as I’d love for you to think I’m genius level, it’s really simpler than it sounds. I’ll explain …
First, you should understand that traffic from image search is interpreted by Google Analytics as a referral, not organic, so when you create the segment you will need to match the medium to referral, not organic. And you will find those visits in your Referring Sites report, not under Search Engines. You can change that by modifying your tracking code, but that exceeds the scope of this post.
To find the referral path for the top three search engines (which is just the part of the referring site’s URL after the domain), I first went to Google and performed an image search. I noticed that the referral path started with /imgres. (I know, weird, right? Those crazy Googlers.) I copied that to a text file for safekeeping and repeated my search in Yahoo Images. Its referral path started with /search/images. I repeated the process with Bing and noted that its referral path started with /images/search. Then I just separated all of those with pipe characters and voilà, a segment is born.
Try it for yourself. Or, as usual, you can forgo the learning opportunity altogether and just copy ours. (Slacker.)
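The three referral paths above can be tested outside Google Analytics before you build the segment. This is a rough sketch in Python. The caret anchors are my own assumption to express "starts with", and the sample paths are made up.

```python
import re

# The three path prefixes noted above, separated by pipes.
# Each ^ applies only to its own alternative.
IMAGE_SEARCH_PATH = re.compile(r"^/imgres|^/search/images|^/images/search")

def is_image_search(medium: str, referral_path: str) -> bool:
    """Image search arrives as a referral, so check the medium as well as the path."""
    return medium == "referral" and IMAGE_SEARCH_PATH.search(referral_path) is not None

# Hypothetical sample visits: (medium, referral path)
print(is_image_search("referral", "/imgres?q=example"))          # True
print(is_image_search("referral", "/images/search?q=example"))   # True
print(is_image_search("organic", "/imgres?q=example"))           # False
print(is_image_search("referral", "/blog/images/search-tips"))   # False
```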
To run a report with your segment, just choose the segment from the Advanced Segments drop-down in the upper-right corner of every report. Then choose the Referring Sites report (under Traffic Sources). You should see something like this, if you’re getting traffic from image search.
You can drill down to see the referral path, which for our first line item looked like this:
Setting Up Goal Tracking
This post will not go step by step through how to set up goals in Google Analytics. If you want to learn how to set up goals, check out this Google Conversion University video. That said, you will sometimes need to use regex when defining a goal or funnel page. Here are a couple of the more common examples I’ve run across.
Ex 1: Capturing a Dynamic Conversion Page
Conversion pages can get wonky on ecommerce sites. Sometimes developers just make them overly sophisticated, and other times conversion pages are appended with query parameters that include things like order IDs and the like. But with particularly large ecommerce sites, sometimes the conversion URL changes, depending on things like what area of the site a shopper checks out from.
For example, if your conversion pages look like /products/[store department]/thankyou.aspx, your goal URL would be represented this way:
This just means the conversion URL starts with /products/, can contain anything in the sandwiched directory, and ends with thankyou.aspx. In most cases you won’t need the ^ and $, but they’re good safety guards. And if your cart appends query parameters, do not include the $ at the end. Instead, add .* to the end to capture those.
Of course, test your regex to make sure you don’t get undesirable results. If you do, tweak away until you get it perfect.
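One way to do that testing is to run the pattern against a handful of URLs you expect to match and a few you don't. Here is a minimal sketch in Python, built from the description above. The department names and the order parameter are hypothetical.

```python
import re

# Strict version: starts with /products/, anything in the middle, ends with thankyou.aspx
STRICT = re.compile(r"^/products/.*/thankyou\.aspx$")

# Query-parameter version: drop the $ and allow anything after thankyou.aspx
WITH_QUERY = re.compile(r"^/products/.*/thankyou\.aspx.*")

test_urls = [
    "/products/shoes/thankyou.aspx",
    "/products/garden/thankyou.aspx?order=123",
    "/cart/products/shoes/thankyou.aspx",
]

for url in test_urls:
    print(url, bool(STRICT.search(url)), bool(WITH_QUERY.search(url)))
```

In this sketch the first URL matches both patterns, the second matches only the query-parameter version, and the third matches neither because of the caret.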
Ex 2: Combining Pages in a Goal Funnel
Because of the linear nature of goal funnels, sometimes you might have to account for more than one page in a goal funnel step. If you don’t, your funnel abandonment rates will be inflated. I like to check the Funnel Visualization report after enough data has been collected to make sure there aren’t any goal pages I’ve missed, thus causing unnecessary abandonment reporting. If you need to pair up your pages, just use the regex MVP, the pipe character, to separate them.
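To see why a missed page inflates abandonment, here is a small sketch in Python. The page names and visits are hypothetical. It assumes a funnel step that can be satisfied by either of two shipping pages.

```python
import re

# Hypothetical step pages: both act as the "shipping details" step.
SINGLE_PAGE = re.compile(r"^/checkout/shipping\.aspx")
COMBINED = re.compile(r"^/checkout/shipping\.aspx|^/checkout/guest-shipping\.aspx")

# Hypothetical visits, each a list of pages viewed in order.
visits = [
    ["/cart.aspx", "/checkout/shipping.aspx", "/checkout/thankyou.aspx"],
    ["/cart.aspx", "/checkout/guest-shipping.aspx", "/checkout/thankyou.aspx"],
    ["/cart.aspx"],
]

def reached_step(pattern, visits):
    """Count visits that viewed at least one page matching the step pattern."""
    return sum(any(pattern.search(page) for page in pages) for pages in visits)

print(reached_step(SINGLE_PAGE, visits))  # 1
print(reached_step(COMBINED, visits))     # 2
```

With only one page in the step, the guest checkout visit looks like an abandonment even though it converted.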
Create Exclude Filters for an IP Range
Larger companies may own a range of IP addresses. This internal traffic needs to be filtered out of your reports with an exclude filter. So let’s say your company owns 188.8.131.52 to 184.108.40.206. You would add a filter (Profile Settings > Filters Applied to Profiles > Add Filter) that looks like this:
Remember that [0-8] means pick one digit in that range. Also, in the case of IP addresses, I always include the caret and dollar sign because it’s easy to pull in IPs you don’t intend to.
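Here is a rough sketch in Python of why the anchors matter. The range is hypothetical (192.0.2.10 through 192.0.2.18, from the block reserved for documentation) and stands in for whatever range your company actually owns.

```python
import re

# Hypothetical range: 192.0.2.10 through 192.0.2.18.
# [0-8] picks exactly one digit; the dots are escaped so they match literally.
ANCHORED = re.compile(r"^192\.0\.2\.1[0-8]$")
UNANCHORED = re.compile(r"192\.0\.2\.1[0-8]")

for ip in ["192.0.2.10", "192.0.2.18", "192.0.2.19", "192.0.2.180"]:
    print(ip, bool(ANCHORED.search(ip)), bool(UNANCHORED.search(ip)))
```

The last address is the troublemaker. Without the dollar sign, 192.0.2.180 matches because it begins with 192.0.2.18, so the filter would exclude traffic it shouldn't.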
Filter Reports on the Fly
This is one of my fave uses of regex. Oftentimes, when you’re either in forensic or discovery mode, you want to tease out your data to identify the winners and losers. Regex empowers you to do this with Google Analytics report filters.
For example, let’s say you have a couple of fair-haired keywords you’ve been optimizing for, and you want to see how they’re performing. You could pull up your organic keyword report (Traffic Sources > Keywords > Filter by non-paid), then enter your keywords into the filter at the bottom of the report, separated by pipes.
Okay, now we’re going to shift gears and get a little more advanced. I’ve had a few cases where client reports were nearly impossible to interpret because of yucky, evil dynamic URLs. If you look at a landing page report and none of the pages make sense, your reports are going to have limited usefulness at the content level. Of course, the best solution is to rewrite those nasty URLs to make them search engine friendly. It’s good for SEO, good for visitors, good for those analyzing your data … everyone wins.
But if, for whatever reason (real or imagined), that’s not feasible, you don’t have to endure unintelligible content reports. You can rewrite the URLs in Google Analytics with a Search and Replace filter. For example, let’s say your contact page looks like this:
This is actually based on a true story. When I’d review this client’s content reports, it was very difficult to find any useful information because of all the gobbledygook, especially since every page on the website looked like this. So I rewrote that page to /contact-us. I chose not to include the file extension in the rewritten URL (in this case, .asp), so that I’d know which pages in the report were rewritten URLs. But you can certainly include it if you want.
To rewrite a URL, navigate to the filter area as before. I’m not going to explain all the steps, but you can use the screenshot below as a template for your rewrite. In our example, I could tell the user parameter kicked out different values from the Top Content report (by using a line item filter and seeing all the different versions of the page). So here’s the regex I used to capture all those pages:
Let’s break it down:
^: URI starts with /index …
\: Don’t interpret the dot and question mark as regex characters.
With the user value, because all of the values were no more than two digits, I could have made the regex more specific. Regex wildcards can go all Cookie Monster and eat up a lot of resources. If I wanted to be more specific (less lazy), I could have used:
This just means the parameter value will be at least one digit but no more than two.
Here’s what the filter looked like:
Note: If you rewrite URLs that are included in goal steps or conversions, you need to update them in your goal funnels. For example, since this client included visits to the contact page as a micro conversion for his site, I had to update the goal conversion page to /contact-us. Also, before creating filters, make sure you have a profile with just your raw data. To learn how to create profiles, check out this Google Conversion University video.
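The rewrite itself can be rehearsed in a few lines of Python before you commit it to a filter. This is only a sketch. The URL shape (`/index.asp?page=contact&user=...`) is a hypothetical stand-in for the client's real URL, and `re.sub` only approximates what a Search and Replace filter does.

```python
import re

# Hypothetical URL shape; only the /index start, the .asp extension,
# and the user parameter come from the example above.
WILDCARD = re.compile(r"^/index\.asp\?page=contact&user=.*")
BOUNDED = re.compile(r"^/index\.asp\?page=contact&user=[0-9]{1,2}$")

def rewrite(uri: str, pattern=BOUNDED) -> str:
    """Replace a matching dynamic URI with the readable /contact-us."""
    return pattern.sub("/contact-us", uri)

print(rewrite("/index.asp?page=contact&user=7"))              # /contact-us
print(rewrite("/index.asp?page=contact&user=42"))             # /contact-us
print(rewrite("/index.asp?page=contact&user=123"))            # unchanged: three digits
print(rewrite("/index.asp?page=contact&user=123", WILDCARD))  # /contact-us
```

The bounded version leaves anything unexpected alone, which makes surprises easier to spot in the report. The wildcard version rewrites whatever follows the parameter.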
Tip of the Iceberg
I use a panoply of regex for my research — almost daily. The reality is the more comfortable you are with regex, the more flexibility you’ll have in data spelunking. I’ve narrowed visitor reports to a group of cities or group of mobile devices (e.g., iphone, ipod, ipad), landing page reports to a couple of directories (e.g., combine the /articles/ and /news/ directories), AdWords reports by destination URLs or ad groups, etc.
Like I said before, I’m not a programmer; I’m an analyst. If I can learn this, [just about] anyone can. Once you get out there and start using regular expressions, you’ll get addicted to the efficiency they afford. Now, when I’m talking to a client about a reporting issue, I just think in terms of how I can best customize their setup to give me the ability to extract the real signals from the noise.
If you have any cool segments that you’d like to share with the class, please include them in the comments. If you don’t know how to do that, I wrote a post on how to share segments and reports in Google Analytics on my personal blog. (It’s one of Google’s best-kept secrets.) We’d also love for you to connect with us on Facebook or Twitter.
Also, if there’s anything you want to learn more about with Google Analytics let me know in the comments below or on Twitter.