Analytics Query Language (GAQL)
The ShufflePoint Analytics Query Language (GAQL) is used to create Excel® and PowerPoint® documents with refreshable Google™ Analytics data. It is also the basis of ShufflePoint's Google Analytics with Google Gadgets integration.
GAQL is in the genre of the OLAP query language MDX. GAQL, like MDX, allows a query to specify both the data to return and the “geometry” of the result by assigning dimensions and metrics to axes. This is implemented in GAQL using the same “ON ROWS” and “ON COLUMS” as in MDX.
Related Quick Start Guides
- GAQL Query Samples
- GA with Excel® using IQY Files - Mac and Windows
- GA with Excel using Web Queries - Windows Only
- GA with iGoogle
- GA with Google Gadgets
- GA with PowerPoint
- GA with PowerPoint - Custom Templates
Basic Syntax
The basic syntax of GAQL is:
SELECT
<metrics or dimensions> ON COLUMNS <top/by>
<metrics or dimensions> ON ROWS <top/by>
FROM
<profile list>
WHERE
<optional metrics list>
<optional timeframe>
<optional filters>
This query will return the tabular result shown below:
SELECT
METRICS ga:visits, ga:avgPageviews, ga:bounceRate ON COLUMNS
DIMENSIONS ga:continent ON ROWS BY ga:visits
FROM
default
WHERE
TIMEFRAME yearToDate
| Continent | Visits | Pages/Visit | Bounce Rate |
|---|---|---|---|
| Americas | 143 | 2 | 60.14% |
| Europe | 93 | 2 | 67.74% |
| Asia | 27 | 2 | 44.44% |
| Oceania | 11 | 2 | 63.64% |
| Africa | 1 | 2 | .00% |
Keywords
The following are reserved keywords in GAQL:
SELECT, COLUMNS, ON, ROWS, FROM, FORMAT, WHERE, DIMENSIONS, METRICS, TIMEFRAME, FILTER, BY, ASC, DESC
GAQL keywords are not case sensitive
Axis Expressions
Expressions with “ON ROWS” and “ON ROWS” are axis expressions. They are used to specify which dimensions or metrics should become the column and row headings of the result.
Things to note about axis expressions
- A given axis may contain a list of metrics or a single dimension
- Names within a list are separated by commas
- The metric and dimension names must have a “ga:” prefix (in the case of Google names) or an “sp:” prefix (in the case of ShufflePoint names)
- A given axis can have either metrics or a dimension
- Currently, only one dimension is allowed on a given axis. This limitation will be relaxed at some point.
The dimensions and metrics available witin GAQL are based on the ones supported by Google Analytics API. These are documented on this Google reference page .
Timeframe Expressions
A GAQL query must contain a timeframe. The timeframe can be specified as a named timeframe or by specifying a start and end date.
A timeframe specified with start and end dates takes the following form.
SELECT ...
WHERE
TIMEFRAME 2009-03-10:2009-04-10
Like profiles, “DEFAULT” can be specified to say “use the
default I have specified for this report or session”.
The named timeframes which GAQL recognizes are listed in the table below.
| Timeframe name | Meaning |
|---|---|
| TODAY | data for the current day |
| YESTERDAY | Yesterday |
| YESTERDAY2 | Two days ago |
| LASTWEEK | Last week |
| LASTWEEK2 | Two weeks ago |
| LAST10DAYS | Last 10 days |
| LAST30DAYS | Last 30 days |
| LAST60DAYS | Last 60 days |
| THISMONTH | This month |
| LASTMONTH | Last month |
| LASTMONTH2 | Two months ago |
| YEARTODATE | Year to date |
| LASTYEAR | Last 12 months |
From Expressions
The keyword “FROM” is followed by either the keyword “DEFAULT” or by a list of profiles. “DEFAULT” says “use the default I have specified for this report or session”. Defaults are a powerful feature in ShufflePoint that allow queryies and containing reports to easily be reused.
Profiles may be specified in several formats. The shortest format is to provide the profile ID, as in:
SELECT ...
FROM 43594
This means that the query will get data for the profile with the id “43594”. A profile name may also be specified by name. Named profiles must be wrapped by double quote or square brackets:
SELECT ...
FROM [www.shufflepoint.com]
Since profile names in Google Analytics are NOT guaranteed to be unique, the GAQL engine will use the first profile in your account which matches the provided name. If you want to be certain that you are using the correct profile, but also wish to see the name in the query, then use the format:
SELECT ...
FROM [43594].[www.shufflepoint.com]
Dimensions on each axis (pivots)
You can place a dimension the columns axis and a different dimension on the rows axis. In database and Excex speak, this is referred to as a pivot. Pivoting will also soon be available in the Google Analytics portal (but we had it first!). When you have dimensions on both axes, you need to specify a metric in the WHERE clause.
An good example would be to ask for unique visitors by continent for the last ten days:
SELECT
DIMENSIONS ga:continent ON COLUMNS
DIMENSIONS ga:date ON ROWS
FROM
default
WHERE
METRICS ga:uniquePageviews
TIMEFRAME last10days
Note that currently GAQL only supports a single metric in the WHERE clause.
Two row axis dimensions
GAQL queries in Gadget data feeds and in Excel Web Queries (interactive and IQY) support having more than one row axis dimension. In our next release cycle we will be adding support for this to the query tool.
SELECT
METRICS ga:visits ON COLUMNS
DIMENSIONS ga:pagePath, ga:pageTitle ON ROWS
FROM
[www.shufflepoint.com]
WHERE
TIMEFRAME default
Sorting and TOP
When the ROWS axis contains a dimension, you can sort the rows in the result table by a metric by using a sort expression on the row axis. The full syntax for such an axis is:
<metrics list> ON COLUMNS
<dimension> ON ROWS TOP <n> BY <metric> [ASC|DESC]
This query will return the tabular result shown below:
SELECT
METRICS ga:visits, ga:avgPageviews, ga:bounceRate ON COLUMNS
DIMENSIONS ga:country ON ROWS TOP 10 BY ga:visits DESC
FROM
default
WHERE
TIMEFRAME yearToDate
| Country/Territory | Visits | Pages/Visit | Bounce Rate |
|---|---|---|---|
| United States | 2,275 | 4 | 39.87% |
| United Kingdom | 297 | 3 | 50.84% |
| Netherlands | 191 | 3 | 42.93% |
| France | 180 | 3 | 50.56% |
| India | 170 | 2 | 55.29% |
| Spain | 156 | 2 | 53.85% |
| Canada | 149 | 3 | 57.72% |
| Japan | 144 | 2 | 59.72% |
| Germany | 142 | 3 | 41.55% |
| Brazil | 141 | 3 | 51.77% |
Node that GAQL does not yet support sorting on calculated metrics.
Filter Expressions
The Google Analytics API supports filter expressions which act to limit the data which is returned - either by metric value thresholds or by dimension member values. GAQL passes specified filter expressions directly to Google, so they are not documented here. See this Google reference page for details on the supported filters.
An example filter expressed in a GAQL query is
SELECT ...
WHERE
FILTER ga:pageviews > 10 AND ga:browser != "Safari"
Note that the right side of the filter expression must be placed within double quotes or square brackets.
SELECT ...
WHERE
FILTER ga:country == "United States"
Format Function
The format function uses a format string and one or more metric parameters.
SELECT ...
FROM
default
WHERE
METRICS FORMAT("{0:d} visits so far this year",ga:visits)
TIMEFRAME yearToDate
The format function can also take a dimension name as a third argument. If present, the second placeholder in the format expression will get the value of the count of the returned dimension members. So the following GAQL query would return a single cell with the string “5124 visits from 96 countries so far this year.”
SELECT
FROM
default
WHERE
METRICS FORMAT("{0:d} visits from {1:d} countries so far this year",ga:visits,ga:country )
TIMEFRAME yearToDate
Note that the format() function currently only supports the generation of a single text string result in a WHERE clause.
String literals in metric lists
Lists of metrics in an axis expression may contain string literals. An example of such a list is shown in the following query:
SELECT
METRICS ga:visits, "visits to the site, with an average of", ga:avgPageviews, "pageviews" ON COLUMNS
FROM
default
WHERE
TIMEFRAME lastWeek
String literals are very useful in the context of populating PowerPoint text placeholders because each cell in the GAQL result table will fill the formatted runs you define in ShufflePoint Studio. So the above query might be styled as:
56 visits to the site, with an average of 3 pageviews
Formatting Metrics
The default display formatting of a metric is specified by the Google API or by ShufflePoint if it is a calculated metric. For example, the display format for ga:bounceRate is 0.00%. One of the features of GAQL is the ability to specify a format override within the query. An example column expression without an override is:
METRICS ga:visits, ga:bounceRate ON COLUMNS
To format the bounce rate with no decimals, one would change it to:
METRICS ga:visits, ga:bounceRate:{0%} ON COLUMNS
Special percent formatting
If a count metric is formatted as a percent, then ShufflePoint will interpret this as a request to display the percent of total contributed by the metric for each dimension member.
SELECT
METRICS ga:visits, ga:visits:{%} on columns
DIMENSIONS ga:visitorType ON ROWS
FROM
default
WHERE
TIMEFRAME default
| Visitor Type | Visits | % Visits |
|---|---|---|
| New Visitor | 1453 | .867981 |
| Returning Visitor | 221 | .132019 |
Dimension Count
If a dimension is placed in a METRICS expression, then the dimension is converted to a metric of "count of dimension members". For example, the following query returns a value of 44 for languages. This is a count of the number of different languages of the visitors for the designated timeframe.
SELECT
METRICS ga:visits, ga:language ON COLUMNS
FROM
default
WHERE
TIMEFRAME default
| Profile | Visits | Language |
|---|---|---|
| 70442 | 1674 | 44 |
Shifting time periods
When a time period dimension such as ga:month or ga:week is sent to the Google API, the result set is always ordered by calendar and beginning at first day of the year. So if in June, 2009 you query:
SELECT
METRICS ga:visitors ON COLUMNS
DIMENSIONS ga:month ON ROWS
FROM
default
WHERE
TIMEFRAME lastYear
You will find that your months are ordered like so:
| Month | Visits |
|---|---|
| Jan 2009 | 1656 |
| Feb 2009 | 1871 |
| March 2009 | 2083 |
| April 2009 | 1721 |
| May 2009 | 1505 |
| June 2008 | 1593 |
| July 2008 | 1398 |
| Aug 2008 | 1428 |
| Sept 2008 | 1410 |
| Oct 2008 | 1550 |
| Nov 2008 | 1708 |
| Dec 2008 | 1501 |
You may prefer to display the rows starting with June 2008 and ending with May 2009. This can be accomplished by adding a shift argument to the month dimension:
DIMENSIONS ga:month:5 ON ROWS
This shifts the rows to achieve the desired sequence of months.
Transform clause
The new TRANSFORMS clause in GAQL is used to apply rollup operations on returned results. Currently the only transform operation supported is REPLACE. This operation does regular expression driven rollups of queries with the pagePath dimension. The following example is from our top pages blog article:
TRANSFORMS
REPLACE [(/|/Default\.aspx)$] WITH [/Default.aspx]
SELECT
METRICS ga:visits ON COLUMNS
DIMENSIONS ga:pagePath, ga:pageTitle ON ROWS TOP 10 NOOTHER BY ga:visits
FROM
[www.shufflepoint.com]
WHERE
TIMEFRAME default
The TRANSFORM says "find pages that end in "/" or in "/default.aspx" and aggregate them to the "/default.aspx" version. Without the transform clause present, the first few rows would be:
| Page path | Page title | Visits |
|---|---|---|
| /GoogleAnalytics.aspx | Features | 367 |
| / | ShufflePoint, Inc. | 275 |
| /Default.aspx | ShufflePoint, Inc. | 117 |
| /help/QuickStartExcelWebQuery.aspx | Quick Start - Excel Web Queries | 115 |
With the transform present, the 2nd ("/") and 13th ("/default.aspx") entries are merged with the 3rd entry ("/Default.aspx"). The idea is that these three entries are really the same page and so should only have a single row present in the result set, and ordering should be based upon the rolled up metric values from those three entries.
| Page path | Page title | Visits |
|---|---|---|
| /GoogleAnalytics.aspx | Features | 367 |
| /Default.aspx | ShufflePoint, Inc. | 408 |
| /help/QuickStartExcelWebQuery.aspx | Quick Start - Excel Web Queries | 115 |
NOOTHER qualifier
By default, GAQL will add an "other" row to a result set with a TOP clause. So the query:
SELECT
METRICS ga:avgPageDuration ON COLUMNS
DIMENSIONS ga:continent ON ROWS top 4 by ga:avgPageDuration
FROM
default
WHERE
TIMEFRAME default
will be run as
SELECT
METRICS ga:avgPageDuration ON COLUMNS
DIMENSIONS ga:continent ON ROWS top 4 BottomOther by ga:avgPageDuration
FROM
default
WHERE
TIMEFRAME default
because "BottomOther" is the implicit default value for the OTHER qualifier. The result set will have an "other" row added, and it will be the 4th row:
| Continent | Avg. Time on Page |
|---|---|
| Americas | 00:01:23 |
| Europe | 00:01:00 |
| Asia | 00:01:35 |
| (other) | 00:01:13 |
If you do not wish to have the "other" row added, add the "NoOther" qualifier to the query after the TOP expression:
SELECT
METRICS ga:avgPageDuration ON COLUMNS
DIMENSIONS ga:continent ON ROWS top 4 NoOther by ga:avgPageDuration
FROM
default
WHERE
TIMEFRAME default
| Continent | Avg. Time on Page |
|---|---|
| Americas | 00:01:23 |
| Europe | 00:01:00 |
| Asia | 00:01:35 |
| Oceania | 00:01:01 |
Header qualifiers
There may be times that you need a result set without row or column headings. This could be useful, for example, if you were merging several queries into one big table in Excel and you wish to avoind duplicate column headings. The "NoRowHeaders" and "NoColumnHeaders" qualifiers serve this purpose. For example, the query below will result in the table below.
SELECT NoRowHeaders
metrics ga:uniquePageviews, ga:bounceRate on columns
dimensions ga:country on rows top 5 by ga:pageviews
FROM
default
WHERE
TIMEFRAME default
| Unique Pageviews | Bounce Rate |
|---|---|
| 873 | .744015 |
| 208 | .747664 |
| 162 | .753247 |
| 151 | .758621 |
| 142 | .604167 |