Drupal Analytics

jtsnow

Overview: This project will improve the tracking and analysis of website traffic in Drupal.

Description: There is a need for better traffic analysis in Drupal. A possible implementation of this project may involve writing a script separate from Drupal that accesses server logs directly. Other features may include an API to allow contrib modules to log events, or a way to send statistics to third-party tools such as Google Analytics.

Some ideas:

  • Provide both client and server side tracking APIs.
  • Access server logs to get statistics for files and requests not served by Drupal (JavaScript, images, other files).
  • Be aware of Drupal's page caching mechanisms and work around those.
  • Track actions: use actions as 'goals'?
  • Track JavaScript events.
  • Extensibility: Allow analysis plug-ins to be made.
  • Performance is a big factor! Traffic tracking shouldn't put too much load on the system, whether in database size or in performance when actually loading the page.

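The "access server logs" idea can be made concrete with a small log reader. The sketch below assumes the web server writes the common Apache/Nginx "combined" log format, and treats a hypothetical list of file extensions as "not served by Drupal"; the function names are illustrative only.

```python
import re
from collections import Counter

# Assumed "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical choice of extensions that Drupal itself never sees.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".pdf", ".zip")


def parse_combined_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def count_static_hits(lines):
    """Count successful (2xx or 304) requests per static file path."""
    counts = Counter()
    for line in lines:
        rec = parse_combined_line(line)
        if rec is None:
            continue  # skip malformed or differently formatted lines
        path = rec["path"].split("?", 1)[0]  # ignore query strings like ?v=1
        ok = rec["status"].startswith("2") or rec["status"] == "304"
        if ok and path.lower().endswith(STATIC_EXTENSIONS):
            counts[path] += 1
    return counts
```

A script like this could run outside Drupal (e.g. from cron) and write its totals into the same tables as the page-level tracker, so file downloads and JavaScript requests show up alongside page views.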
Please share your thoughts and ideas.

Mentors:

Difficulty: Hard

I'd like to propose an idea for a project that improves the tracking and analysis of website traffic in Drupal. Here are some thoughts for the project:


Interested in the discussion

dldege - Mon, 2009-03-23 18:54

We battle this a lot so I'm open to the idea and discussion.

I see pros and cons to moving this type of analysis into your own site and database vs. sending all this data to, say, Google or Yahoo, who can store it (which can grow to a lot of rows) forever and provide really polished dashboards and reports. However, I think there are also limitations in Google Analytics.

Another idea for tracking would be email click-through for CRM-type campaigns, etc.

We have considered looking into integrating Drupal with Mondrian. You might look at that a little and see what you think.

Dan DeGeest
Lead Software Developer
iMed Studios
http://www.imedstudios.com/labs


Good idea! I'm not entirely

kleinmp - Mon, 2009-03-23 21:40

Good idea!

I'm not entirely clear about what you mean by using actions as 'goals', though I am intrigued.


Actions as Goals

jtsnow's picture
jtsnow - Mon, 2009-03-23 22:12

Tools such as Google Analytics allow you to set up goals for your visitors. For example, User Registration, Mailing List Signup, or making a purchase are common goals. Google Analytics will show you things like the path users took to complete the goal or at what point they stopped before completing the goal. This is useful for analyzing, for example, what parts of the checkout process on an e-commerce site need to be improved.

The ability to mark an action as a goal would be a simple way to set goals for visitors.
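To show what "the path users took to complete the goal" means in practice, here is a minimal funnel sketch. It assumes each session's page path is already known in order, and that a goal is an ordered list of steps (for example, the last step could be the page or action that completes a purchase). The step and session names are hypothetical.

```python
from typing import Dict, List, Sequence


def funnel_counts(sessions: Dict[str, List[str]], steps: Sequence[str]) -> List[int]:
    """For each funnel step, count the sessions that reached it in order.

    A session reaches step k if steps[0..k] all appear in its path in that
    order; unrelated pages may appear in between.
    """
    reached = [0] * len(steps)
    for path in sessions.values():
        k = 0  # index of the next step this session still needs
        for page in path:
            if k < len(steps) and page == steps[k]:
                k += 1
        for i in range(k):
            reached[i] += 1
    return reached


steps = ["/cart", "/checkout", "/checkout/complete"]
sessions = {
    "a": ["/", "/cart", "/checkout", "/checkout/complete"],
    "b": ["/cart", "/node/5", "/checkout"],  # stopped before completing
    "c": ["/", "/node/5"],                   # never entered the funnel
}
print(funnel_counts(sessions, steps))
```

Comparing adjacent counts shows where visitors drop out; here one of the two sessions that reached checkout never completed it.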


I've had the same thought

rwohleb@drupal.org - Tue, 2009-03-24 15:10

Google Analytics, for example, is great. However, it's often a lot more than most people need, and it can be confusing for the average Joe. It would be nice if there were a purely Drupal option that went beyond any current metrics.

The new system could still use a post-request AJAXy tracker, like Google Analytics, to keep page requests responsive. It could hit a new script that falls outside of the Drupal menu callbacks and instead handles its own bootstrap. The bootstrap would be minimal and just give us access to the base system (e.g., the DB). If we keep each tracked action down to a simple DB insert, it would limit overhead and table locking issues.
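As a rough illustration of a tracker that does "just a simple DB insert", here is a minimal WSGI app in Python with SQLite standing in for the site database. The query parameters (`sid`, `path`, `event`) and the `hit_queue` table are hypothetical; a real Drupal version would be a standalone PHP script with a minimal bootstrap, but the shape is the same: read the request, insert one row, return immediately.

```python
import sqlite3
import time
from urllib.parse import parse_qs


def make_tracker_app(conn: sqlite3.Connection):
    """Return a minimal WSGI app that logs one hit per request with a single INSERT."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT, path TEXT, event TEXT, ts REAL)"
    )

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))

        def first(key, default=""):
            return params.get(key, [default])[0]

        # The only per-request work: one INSERT, no session or goal logic.
        conn.execute(
            "INSERT INTO hit_queue (session_id, path, event, ts) VALUES (?, ?, ?, ?)",
            (first("sid"), first("path"), first("event", "pageview"), time.time()),
        )
        conn.commit()
        start_response("204 No Content", [])
        return []

    return app
```

For local testing it could be served with `wsgiref.simple_server.make_server("", 8000, make_tracker_app(conn)).serve_forever()`. Keeping all interpretation out of this path is what keeps the tracking request cheap.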

Now that the request is logged, there's still the issue of processing. We could treat the incoming request table as a queue and do periodic post-processing via cron to 'clear it out'. The post-processing would contain all of the intelligence that considers sessions, goals, metrics, etc. The trick here would be processing the data efficiently, so that working through the queue table wouldn't lock the table too much on active sites and would keep up fast enough to stop the queue from growing forever.
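The cron step could drain the queue in small batches so that no single transaction holds locks for long. This sketch assumes a hypothetical `hit_queue` table with an auto-incrementing `id` and a `path` column, and rolls hits up into a hypothetical `page_counts` table; sessions and goals would be further aggregations in the same loop.

```python
import sqlite3
from collections import Counter


def process_queue(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Drain hit_queue into page_counts, one bounded batch at a time.

    Each batch is read, aggregated in memory, and applied in one short
    transaction. Returns the number of queue rows processed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_counts "
        "(path TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id, path FROM hit_queue ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return processed
        batch_counts = Counter(path for _, path in rows)
        with conn:  # commits on success, rolls back on error
            for path, n in batch_counts.items():
                conn.execute("INSERT OR IGNORE INTO page_counts VALUES (?, 0)", (path,))
                conn.execute(
                    "UPDATE page_counts SET hits = hits + ? WHERE path = ?", (n, path)
                )
            # Assumes ids only grow, so everything <= the last id was just read.
            conn.execute("DELETE FROM hit_queue WHERE id <= ?", (rows[-1][0],))
        processed += len(rows)
```

The batch size is the tuning knob: smaller batches mean shorter locks, larger batches mean fewer round trips, and the queue only stays bounded if cron drains it faster than the tracker fills it.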

On more active sites, where there is more control over the DB, there would be options to increase performance. For example, with MySQL one could use the InnoDB engine (row-level locking), or, on MyISAM tables, INSERT DELAYED, DELETE QUICK, and possibly DELETE LOW_PRIORITY.


One issue with the client

dldege - Tue, 2009-03-24 15:45

One issue with client-side tracking is that it makes it impossible (or at least hard; I'm sure there is a solution) to track things like RSS feeds, AJAX-only requests, file downloads, and so forth. I think the system should provide both client- and server-side tracking APIs.

I agree with you about the analysis; this is where Google wins right now. It can be hard to find what you want in GA, and it's definitely more than many sites need, but a lot of number crunching is done on the data. The 24-hour data lag is also a big drag, especially during development. Certain metrics always come up (time on page, time on site, etc.) that require computation on the raw logs. Perhaps a big consideration for such a project could be module extensibility, so that new analysis options can be plugged in rather than the project developing all of that analysis itself.
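Here is one way the "pluggable analysis" idea and the time-on-page computation could fit together. The registry, decorator, and hit tuple layout are all hypothetical. Time on page is measured as the gap to the same session's next hit, so this sketch simply skips each session's last hit, for which there is no following hit to measure against.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (session_id, path, unix timestamp)

# Hypothetical plug-in registry: analysis name -> function over raw hits.
ANALYSES: Dict[str, Callable[[Iterable[Hit]], dict]] = {}


def analysis(name: str):
    """Decorator that registers an analysis plug-in under `name`."""
    def register(fn):
        ANALYSES[name] = fn
        return fn
    return register


def _by_session(hits):
    """Group hits by session, each group sorted by timestamp."""
    sessions = defaultdict(list)
    for sid, path, ts in hits:
        sessions[sid].append((ts, path))
    for entries in sessions.values():
        entries.sort()
    return sessions


@analysis("time_on_page")
def time_on_page(hits):
    """Average seconds spent on each path, measured to the next hit."""
    totals, counts = defaultdict(float), defaultdict(int)
    for entries in _by_session(hits).values():
        for (ts, path), (next_ts, _) in zip(entries, entries[1:]):
            totals[path] += next_ts - ts
            counts[path] += 1
    return {p: totals[p] / counts[p] for p in totals}


@analysis("time_on_site")
def time_on_site(hits):
    """Seconds between each session's first and last hit."""
    return {sid: e[-1][0] - e[0][0] for sid, e in _by_session(hits).items()}
```

A contrib module would add a new metric just by registering another function, and the reporting side could run every entry in `ANALYSES` over the same raw hits.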



Excellent comments. I'm glad

jtsnow - Tue, 2009-03-24 22:50

Excellent comments. I'm glad to see that people have already put some thought into this. I definitely like the idea of having both client and server side tracking APIs.

Having analysis plug-ins is also an interesting idea.


Awesome idea! :)

opensanta - Sun, 2009-03-29 19:36

I currently use three separate modules for this, including Firestats and Piwik, but I think Statistics Pro would be the most helpful if your goal is to keep the code in Drupal. There are a number of projects already out there for this type of stuff, but I think building on Drupal's contrib repository would yield the coolest result (Views, Views charts, actions, AJAXy stuff). I definitely agree with the 'Hard' difficulty rating, so best of luck there.

Your perspective on what information users are after is impeccable. The scope of this project seems large, yet doable. I'll be very interested to see how this proposal comes together, and I'll keep an eye out for possible mentors.


Tying it all together

therzog - Thu, 2009-05-07 18:31

The challenge we've been struggling with is tying tracking and analytics from separate sources together into a "5,000 foot view" that makes sense to non-experts, but still allows deeper dives for people who are really into tools like Google Analytics. On our site, some users ask big-picture questions like "what's the trend in site visits?" or "what are the most popular site sections?", while others are more in the weeds: "how many page views did my project/story page get?" or "who is linking to my page?"

I agree that Google Analytics is great, but too much for most people, and it can be cumbersome to use. So part of what we're doing is developing a module that scrapes our analytics account, taking advantage of the fact that nearly every report can be exported to CSV. Our Drupal site keeps a cookie so that Google thinks it's logged in, and uses cURL to send requests like this:

https://www.google.com/analytics/reporting/export?id=YOURACCOUNTID&pdr=2...

(or something like that). It then parses the result and displays it as a table or Google chart.
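The parsing step might look roughly like this. The exact layout of the exported CSV is not shown in the thread, so this sketch assumes a header row with `Page` and `Pageviews` columns, treats any lines starting with `#` as a preamble to skip, and rolls page views up into top-level site sections to answer the "most popular site sections" question.

```python
import csv
from collections import Counter


def pageviews_by_section(csv_text: str) -> Counter:
    """Sum an assumed 'Pageviews' column by top-level section ('/news/a' -> '/news')."""
    # Assumption: preamble lines start with '#'; drop them and blank lines.
    body = [ln for ln in csv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    totals = Counter()
    for row in csv.DictReader(body):
        section = "/" + row["Page"].strip("/").split("/", 1)[0]
        # Numbers may carry thousands separators, e.g. "1,234".
        totals[section] += int(row["Pageviews"].replace(",", ""))
    return totals


report = '# Content report\nPage,Pageviews\n/news/a,10\n/news/b,"1,234"\n/projects/x,5\n/,7\n'
print(pageviews_by_section(report).most_common(2))
```

Because the numbers come straight from the export, the simplified Drupal blocks stay consistent with the full reports that the "deep dive" users see in Google Analytics.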

So essentially it's a front-end to Google Analytics that can be displayed in blocks or pages within Drupal. This lets us generate simplified or aggregated analytics data that is still consistent with the "deep dive" reports that we generate, and we can integrate it with information from other tools like Digg.

Here is a sample: