Select average duration
I have a table
Statistics with info about user’s interaction with objects on my site. The structure of the table is like this:
id | object_id | user_id | interaction_time | interaction_type ----+-----------+---------+---------------------+------------------ 1 | 1 | 1 | 2015-07-08 12:00:00 | opened 2 | 1 | 2 | 2015-07-08 12:10:00 | opened 3 | 1 | 1 | 2015-07-08 12:15:00 | closed 4 | 2 | 3 | 2015-07-08 12:16:00 | opened 5 | 1 | 2 | 2015-07-08 12:17:00 | closed
2015-07-08 12:00:00 and closed it at
2015-07-08 12:10:00 and closed it at
What I want to get is average interaction duration for each object, i.e. for
object_id=1 it should be
(15 minutes (user_id=1) + 7 minutes (user_id=2))/2 = 11 minutes.
Can I do this without creating additional tables?
Note that there maybe glitches when
closed appears without preceding
opened or two
opened, etc. In such cases we should only count consecutive
5 Solutions collect form web for “Select average duration”
One way to do this is to use
cross apply to find the closing rows that follow each opened:
select s.object_id, avg_duration = avg(datediff(minute, s.interaction_time, o.interaction_time)) from [Statistics] s cross apply ( select top 1 * from [Statistics] where s.object_id = object_id and s.user_id = user_id and s.interaction_time < interaction_time and interaction_type = 'closed' order by interaction_time ) o where s.interaction_type = 'opened' group by s.object_id
Note that the average calculation loses fraction as it’s performed on integers, so if you want more precision you might want to use
avg(datediff(minute, s.interaction_time, o.interaction_time) * 1.0)
to force floating point calculation and maybe round the result if needed.
With an index for
(object_id, user_id, interaction_time) I believe this should perform ok (and possibly better than the other working solutions).
Sample SQL Fiddle
This is one approach. However, if you have multiple starts/ends per user and object, then this gets a bit more complicated as you have to inner join to a subset of statistics based on the min interaction time for that user and object that is grater than the one being evaluated in the Opened listing.
So here’s a version less complicated to get average… but if you have multiple open/close per user/object then we have to modify the self join to be a subset.
SELECT object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.Interaction_Time)) FROM statistics O INNER JOIN Statistics C on O.Object_Id = C.Object_Id and O.user_ID = C.user_Id and O.Interaction_type = 'opened' and C.InteractioN_type = 'closed' GROUP BY OBJECT_ID
This utilizes a correlated subquery to identify the next closed record/time for the user object. As we are using an inner join here any object with an open but no close is ignored. Any object with a close but no open is also ignored.
any objects with TWO opens followed by one close… has BOTH opens evaluated against the same close to determine duration used in calculating the average. If this is undesired behavior, we could modify the coorlated subquery to only look at each valid pairing. I just have to think about that some more.
SELECT object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.Interaction_Time)) FROM statistics O INNER JOIN (SELECT object_Id, user_Id, min(interaction_Time) interaction_Time, FROM statistics where interaction_Type = 'closed' and interaction_Time > O.Interaction_Time group by object_Id, user_ID) C on O.Object_Id = C.Object_Id and O.user_ID = C.user_Id and O.Interaction_type = 'opened' GROUP BY OBJECT_ID
You can do It in following:
SELECT [object_id], AVG(DATEDIFF(MINUTE, minTime, maxTime)) FROM ( SELECT [object_id], min(interaction_Time) minTime, max(Interaction_Time) maxTime, [user_id] FROM #Test GROUP BY [object_id], [user_id] )x GROUP BY [object_id]
You can do that with the aid of a CTE to assist in an “ordered” self join. That helps guaranteeing no loose ends and only picking consecutive rows.
WITH cteRN(object_id, user_id, itime, itype, RN) AS ( SELECT object_id, user_id, interaction_time, interaction_type, ROW_NUMBER() OVER(PARTITION BY object_id, user_id ORDER BY interaction_time) FROM Interactions ) SELECT cls.object_id, AVG(DATEDIFF(minute, opn.itime, cls.itime)) average_time FROM cteRN cls INNER JOIN cteRN opn ON cls.object_id = opn.object_id AND cls.user_id = opn.user_id AND cls.RN = opn.RN + 1 WHERE cls.itype = 'closed' AND opn.itype = 'opened' GROUP BY cls.object_id
Here’s a working fiddle
In order to handle consecutive opened/closed rows, we can join the Statistics table to itself to make opened/closed pairs. It’s necessary to check that the closed row follows the opened row, and that no other rows for the same object and user exist between the elements of the pair.
Once we have a list of valid pairs, getting an object’s average interaction time from the duration of its pairs is just grouping and aggregating.
SELECT o.object_id, AVG(DATEDIFF(MINUTE, o.interaction_time, c.interaction_time)) FROM [Statistics] o JOIN [Statistics] c ON o.object_id = c.object_id AND o.user_id = c.user_id AND o.interaction_type = 'opened' AND c.interaction_type = 'closed' AND o.interaction_time < c.interaction_time AND NOT EXISTS ( SELECT 1 FROM [Statistics] m WHERE o.object_id = m.object_id AND o.user_id = m.user_id AND m.id > o.id AND m.id < c.id ) GROUP BY o.object_id