This document briefly describes the most common forms of analytics tools used in both traditional digital publishing and academic publishing, with an emphasis on what they are designed to measure and what goals they are designed to serve. It also includes a feature comparison of three analytics products that the PubPub team evaluated in early 2020 ahead of its transition from Keen to Heap for both product and web analytics.
Digital Publishing Analytics Landscape Overview
Web Analytics
Most standard web analytics tools provide a relatively consistent set of metrics that track user behavior on a website. These tools tend to be geared toward media and e-commerce sites and used by marketers, so they emphasize how users arrive at a site and which pathways they take on the way to converting on a defined “goal,” often purchasing a product. They also tend to integrate with ad-buying platforms to link ad campaigns to on-site goal conversions.
Examples: Google Analytics, Adobe Omniture, Chartbeat
Product Analytics
Product analytics tools collect data similar to that of web analytics tools, but are geared toward people building software that is itself a product (e.g., PubPub). They therefore focus on letting product managers probe how users engage with specific features of a web product, to determine which features are most valuable to users. These tools tend to measure more fine-grained (but more privacy-invasive) interactions than web analytics tools do, such as complex interactions with specific site elements and user behavior patterns over time.
Examples: Heap, Mixpanel, Keen
Social Analytics
Because web analytics are collected from on-site activity only, but much traffic today comes from conversation happening on social platforms, a number of companies offer platforms to track activity on social media. Social media companies make money by selling advertisements against the data they collect. So, they tend to limit access to data about content posted on their platforms. Social analytics tools fill in those gaps.
These fall into two main categories.
Campaign Analytics
Most social media sites give you fairly extensive data on the posts you make from your own accounts. Campaign analytics tools often include schedulers and automatically collect and organize metrics across your accounts. Depending on the network and the analytics it provides, these metrics typically include:
Post reach (paid and unpaid)
Post views
Locations the post was seen (newsfeed vs. sidebar vs. recommendation)
Engagements with the post (likes, comments, shares, favorites, etc.)
Clicks on the post, if it’s a link
Video views, if it’s a video
Video view duration, if it’s a video
Frequency
Examples: Hootsuite, Sprout Social
Social Monitoring
These tools specialize in monitoring all social media posts and giving clients the ability to “listen” for posts that contain URLs or phrases related to the client’s content. This is the only way to know, for example, if someone posted about your article on Twitter without tagging you. The metrics these tools can provide are severely limited by most social media networks, but they can typically tell you:
Number of times a link was shared
Number of engagements (likes, shares, comments, etc.)
Sentiment of posts, decided by an NLP algorithm
These tools often provide news monitoring as well by tracking Google News, LexisNexis, etc.
Examples: Brandwatch, CrowdTangle, Awario
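The “listening” step described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any of the listed vendors work: the `Post` record, its fields, and the `example.org` URL are hypothetical, and real tools would pull posts from each network's API and add sentiment analysis on top.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for a public social media post."""
    text: str
    likes: int = 0
    shares: int = 0
    comments: int = 0


def mentions(post, urls, phrases):
    """Return True if the post contains any tracked URL or phrase (case-insensitive)."""
    text = post.text.lower()
    return any(u.lower() in text for u in urls) or any(p.lower() in text for p in phrases)


def summarize(posts, urls, phrases):
    """Count matching posts and total their engagements."""
    matched = [p for p in posts if mentions(p, urls, phrases)]
    return {
        "share_count": len(matched),
        "engagements": sum(p.likes + p.shares + p.comments for p in matched),
    }


posts = [
    Post("Great read: https://example.org/pub/abc", likes=3, shares=1),
    Post("Unrelated post", likes=10),
    Post("Loved the PubPub analytics overview", comments=2),
]
summary = summarize(
    posts,
    urls=["https://example.org/pub/abc"],
    phrases=["pubpub analytics overview"],
)
```

Note that the third post is matched by phrase alone, which is the untagged-mention case that only monitoring tools can surface.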
Alternative Scholarly Metrics Landscape Overview
Altmetrics platforms attempt to devise normalized metrics for scholarly work based on a combination of social, news, and citation monitoring.
Dimensions
Dimensions queries its database for a given DOI and provides the following metrics:
Total citations
Recent citations (citations in the last two years)
Field Citation Ratio (for articles over two years old, the relative citation performance of an article, when compared to similarly-aged articles in its subject area, where 1.0 is average)
Relative Citation Ratio (for articles over two years old, the relative citation performance of an article, when compared to other articles in its area of research, normalized to 1.0 against all NIH-funded articles in Dimensions)
Citing research categories (which categories most frequently cite the article)
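To make the “1.0 is average” normalization concrete, here is a minimal sketch of a field-normalized citation ratio. It assumes the ratio is simply an article's citation count divided by the mean citation count of comparable articles (same subject area, similar age). The function name and the sample numbers are hypothetical, and Dimensions' actual calculation may differ in its details.

```python
from statistics import mean


def field_citation_ratio(article_citations, peer_citations):
    """Citations relative to the average of comparable articles (1.0 = average).

    peer_citations: citation counts for similarly aged articles in the same
    subject area (an assumed definition of the comparison set).
    """
    if not peer_citations:
        raise ValueError("need at least one comparable article")
    baseline = mean(peer_citations)
    if baseline == 0:
        raise ValueError("comparison set has no citations; ratio is undefined")
    return article_citations / baseline


# An article with 30 citations in a field where comparable articles average 20.
ratio = field_citation_ratio(30, [10, 20, 30])
```

A value above 1.0 indicates the article is cited more than the average of its comparison set, and a value below 1.0 indicates it is cited less.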
Altmetric
Altmetric monitors a number of sources for mentions of a given article (by DOI, URL, and title) and calculates a weighted “attention” score intended to show how much interest the article has received. The score takes into account the volume of mentions, the types of sources, and the quality of the mentions’ authors. Altmetric uses the following sources:
Public policy documents
Mainstream media via a manually curated list of RSS feeds
Blogs via a manually curated list of RSS feeds
Citations, via Dimensions
Online reference managers, via Mendeley, including sharing demographic data of people who have cited your work from Mendeley
Post-publication peer review from PubPeer and Publons
English-language Wikipedia citations
Open Syllabus Project data
Patents, via IFI Claims
Research Highlights via F1000Prime
Social media
Facebook (mentions on manually curated list of public pages)
Twitter
LinkedIn, Google+, Sina Weibo, Pinterest (historical data)
Other platform monitoring
YouTube
Reddit
Stack Overflow
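A weighted score of this kind can be sketched as a sum over mentions, where each mention contributes its source-type weight scaled by an author-quality factor. Everything specific in the sketch below is an assumption made for illustration: the weights, the source labels, and the 0-to-1 author-quality factor are hypothetical and are not Altmetric's actual values or formula.

```python
# Hypothetical per-source weights, chosen only for illustration.
SOURCE_WEIGHTS = {
    "news": 8.0,
    "blog": 5.0,
    "policy": 3.0,
    "wikipedia": 3.0,
    "twitter": 1.0,
}


def attention_score(mentions):
    """Sum weighted mentions.

    mentions: iterable of (source_type, author_quality) pairs, where
    author_quality is an assumed factor between 0 and 1. Unknown source
    types contribute nothing.
    """
    total = 0.0
    for source_type, author_quality in mentions:
        total += SOURCE_WEIGHTS.get(source_type, 0.0) * author_quality
    return total


score = attention_score([
    ("news", 1.0),     # one news story
    ("twitter", 0.5),  # one tweet from a lower-quality account
    ("twitter", 1.0),  # one tweet from a higher-quality account
])
```

Volume enters through the number of mentions, source type through the weights, and author quality through the per-mention factor, which matches the three inputs described above.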
Library Analytics
There’s a final class of analytics that libraries employ to understand the usage of their digital collections and to make purchasing decisions based on that usage. This mostly falls outside our remit, as it involves reporting requests for content organized by the content’s access policy, but we may want to implement COUNTER feeds if we build deeper integrations into library systems.
The standard for this type of data is the COUNTER Code of Practice.
KFG’s Initial Rough Ideas
Research to find out what audience metrics actually predict certain forms of impact for the scholarly community, and which ones we can safely not collect.
Combine ethically collected on-site data about user behavior (time on page, scroll depth, etc.) with monitoring from Crossref, social media, news sites, etc.
Survey a sample of users using interactive widgets and compare with behavioral metrics to see if certain behavioral metrics predict key impacts like understanding, mind change, etc.
Allow authors/admins to manually add reports of qualitative impact (e.g., “a classroom invited me to speak”) to their articles’ impact sections.
Use PubPub’s article history system to more quantitatively track comments and reviews, and display metrics like “review coverage” to readers, or benchmark article impacts/retractions/corrections against the type and quality of reviews they received.
Open-source analytics tools
PubPub Vendor Feature Comparison
The following is a feature comparison of Google Analytics (a commonly requested vendor), Keen (the vendor PubPub used before the switch), and Heap (the vendor PubPub just switched to). Because we have access to the underlying data for Keen and Heap, a number of metrics are labeled “To define”: we have the data needed to display them but have not yet defined them.
| Feature | Google Analytics | Keen | Heap |
|---|---|---|---|
| Type | Audience/Ecommerce | Audience/Product | Product |
| Customizable Dashboards | Yes | Yes, with PubPub eng | Yes, with PubPub eng |
| Raw data access | No | Yes | Yes |
| User-definable metrics | No | Yes | Yes |
| Users | Unique users who have initiated a session within selected time; identifier can be set by Google or by admin | Unique users who have initiated a session within selected time; identifier set by Keen and PubPub user ID when logged in | Unique users who have initiated a session within selected time; identifier set by Heap and PubPub user ID when logged in |
| Sessions | Period of engagement by user until 30 minutes of inactivity | To define | A session is a period of activity from a single user in your app or website. It can include many pageviews or events. On web, a session ends after 30 minutes of pageview inactivity from the user. On mobile, a session ends after 5 minutes of inactivity, regardless of whether the app is in the background or foreground. |
| Bounce Rate | % of single-page sessions where no page interaction occurred | To define | To define |
| Session Duration | Period of time from first event recorded to last event recorded within a session | To define | No |
| Time on Page | No | Average length of time spent on pages during the selected time | No |
| Pageviews | Total number of views of pages during period, including repeated views | Total number of views of pages during period, including repeated views | Total number of views of pages during period, including repeated views |
| New Users | Users visiting in this time period who have not been seen before | To define | To define |
| Language | Set from browser setting | Set from browser setting | Set from browser setting |
| Location | Set from IP address | Set from IP address | Set from IP address |
| System | Set from browser user agent string | Set from browser user agent string | Set from browser user agent string |
| Device | Set from browser user agent string | Set from browser user agent string | Set from browser user agent string |
| Interests | Google Ads network | No | No |
| Frequency | # of sessions per user after first session | To define | To define |
| Recency | # of days since last session during the time frame | To define | To define |
| Page Depth | Number of sessions for which a user visited at least X pages | To define | To define |
| Referrer | Set from HTTP request header | Set from HTTP request header | Set from HTTP request header |
| Search Query | Set from HTTP request header | Set from HTTP request header | Set from HTTP request header |
| Campaign Tags | Set from URL segments | Set from URL segments | Set from URL segments |
| Custom Events | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc |
| Custom Variables | Yes | Not really | Yes |
| Conversion Pathing | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc |
| Site Search | | No | Yes |
| Realtime | Yes | No | Not really |
| Demographics | From Google Ad tracking | No | No |
| Interests | From Google Ad tracking | No | No |
| Benchmarks | From Google Ad tracking | To define | To define |
| Google Search Console | Links Google Search Console to analytics. Must opt in to ad-driven features. | No | No |
| E-Commerce Funnels | Defined by admin; ex ante | Defined by admin; ex ante | Defined by admin; post hoc + integrations |
| Scroll Height | | Collecting; to define | Can be collected |
| Includes PubPub Data | No | Yes | Yes |
| Audience Segmentation | Defined by admin; post hoc | Not really | Defined by admin; post hoc |
| Period over Period | Yes | Not really | Yes |
| Email Reports | Must log in | No | Yes |
| Report sharing | Limited | No | Yes |
| 3rd-Party Integrations | No | No | Yes |
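Since several of the “To define” metrics can be derived from raw pageview data, the sketch below shows one way the session and bounce-rate definitions in the table could be computed for a single user. It is an illustration under stated assumptions, not Keen's or Heap's implementation: the function names are hypothetical, a gap of exactly 30 minutes is treated as ending the session, and a bounce is simplified to a session with a single pageview (it ignores non-pageview interactions).

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # web inactivity window from the table


def sessionize(timestamps, timeout=SESSION_TIMEOUT):
    """Group one user's pageview timestamps into sessions.

    A new session starts whenever the gap since the previous pageview
    is at least `timeout` (the boundary handling is an assumption).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def bounce_rate(sessions):
    """Share of sessions that contain exactly one pageview."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if len(s) == 1)
    return bounces / len(sessions)


start = datetime(2020, 1, 1, 9, 0)
pageviews = [
    start,
    start + timedelta(minutes=10),  # same session (10-minute gap)
    start + timedelta(minutes=50),  # new session (40-minute gap)
    start + timedelta(hours=3),     # new session (130-minute gap)
]
sessions = sessionize(pageviews)
rate = bounce_rate(sessions)
```

In this example the four pageviews fall into three sessions, two of which are single-page sessions. Session duration, frequency, and recency could be derived from the same session lists once their definitions are agreed.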