Select average duration

I have a table Statistics with info about user’s interaction with objects on my site. The structure of the table is like this:

 id | object_id | user_id | interaction_time    | interaction_type 
----+-----------+---------+---------------------+------------------
 1  |     1     |    1    | 2015-07-08 12:00:00 |       opened
 2  |     1     |    2    | 2015-07-08 12:10:00 |       opened
 3  |     1     |    1    | 2015-07-08 12:15:00 |       closed
 4  |     2     |    3    | 2015-07-08 12:16:00 |       opened
 5  |     1     |    2    | 2015-07-08 12:17:00 |       closed

So the user_id=1 opened object_id=1 at 2015-07-08 12:00:00 and closed it at 2015-07-08 12:15:00, user_id=2 opened object_id=1 at 2015-07-08 12:10:00 and closed it at 2015-07-08 12:17:00.

  • SQL Server : convert Julian Date to YYYY-MM-DD
  • Can't login to SQL Server using SQL Instance Name
  • Can't create stored procedure with table output parameter
  • The multi-part identifier could not be bound
  • Differences between MySQL and SQL Server
  • Using Smo.Backup to backup SQL Server database to string
  • What I want to get is average interaction duration for each object, i.e. for object_id=1 it should be (15 minutes (user_id=1) + 7 minutes (user_id=2))/2 = 11 minutes.

    Can I do this without creating additional tables?

    Note that there maybe glitches when closed appears without preceding opened or two opened, etc. In such cases we should only count consecutive opened and closed.

    5 Solutions collect form web for “Select average duration”

    One way to do this is to use cross apply to find the closing rows that follow each opened:

    select 
        s.object_id, 
        avg_duration = avg(datediff(minute, s.interaction_time, o.interaction_time))
    from [Statistics] s 
    cross apply (
        select top 1 * from [Statistics] 
        where s.object_id = object_id 
          and s.user_id = user_id
         and s.interaction_time < interaction_time
         and interaction_type = 'closed'
         order by interaction_time
    ) o
    where s.interaction_type = 'opened'
    group by s.object_id
    

    Note that the average calculation loses fraction as it’s performed on integers, so if you want more precision you might want to use

    avg(datediff(minute, s.interaction_time, o.interaction_time) * 1.0)
    

    to force floating point calculation and maybe round the result if needed.

    With an index for (object_id, user_id, interaction_time) I believe this should perform ok (and possibly better than the other working solutions).

    Sample SQL Fiddle

    This is one approach. However, if you have multiple starts/ends per user and object, then this gets a bit more complicated as you have to inner join to a subset of statistics based on the min interaction time for that user and object that is grater than the one being evaluated in the Opened listing.

    So here’s a version less complicated to get average… but if you have multiple open/close per user/object then we have to modify the self join to be a subset.

    SELECT object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.Interaction_Time))
    FROM statistics O
    INNER JOIN Statistics C
      on O.Object_Id = C.Object_Id
     and O.user_ID = C.user_Id
     and O.Interaction_type = 'opened'
     and C.InteractioN_type = 'closed'
    GROUP BY OBJECT_ID
    

    This utilizes a correlated subquery to identify the next closed record/time for the user object. As we are using an inner join here any object with an open but no close is ignored. Any object with a close but no open is also ignored.
    any objects with TWO opens followed by one close… has BOTH opens evaluated against the same close to determine duration used in calculating the average. If this is undesired behavior, we could modify the coorlated subquery to only look at each valid pairing. I just have to think about that some more.

    SELECT object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.Interaction_Time))
    FROM statistics O
    INNER JOIN 
    (SELECT object_Id, user_Id, min(interaction_Time) interaction_Time,  
     FROM statistics where interaction_Type = 'closed'
     and interaction_Time > O.Interaction_Time
     group by object_Id, user_ID) C
      on O.Object_Id = C.Object_Id
     and O.user_ID = C.user_Id
     and O.Interaction_type = 'opened'
    GROUP BY OBJECT_ID
    

    You can do It in following:

    SELECT [object_id], AVG(DATEDIFF(MINUTE, minTime, maxTime))
    FROM (
        SELECT [object_id], min(interaction_Time) minTime, max(Interaction_Time) maxTime, [user_id]
        FROM #Test
        GROUP BY [object_id], [user_id]
    )x
    GROUP BY [object_id]
    

    You can do that with the aid of a CTE to assist in an “ordered” self join. That helps guaranteeing no loose ends and only picking consecutive rows.

    WITH cteRN(object_id, user_id, itime, itype, RN) AS (
      SELECT object_id, user_id, interaction_time, interaction_type,
        ROW_NUMBER() OVER(PARTITION BY object_id, user_id ORDER BY interaction_time)
      FROM Interactions
    )
    SELECT cls.object_id, AVG(DATEDIFF(minute, opn.itime, cls.itime)) average_time
    FROM cteRN cls INNER JOIN cteRN opn
      ON cls.object_id = opn.object_id AND cls.user_id = opn.user_id AND cls.RN = opn.RN + 1
    WHERE cls.itype = 'closed' AND opn.itype = 'opened'
    GROUP BY cls.object_id
    

    Here’s a working fiddle

    In order to handle consecutive opened/closed rows, we can join the Statistics table to itself to make opened/closed pairs. It’s necessary to check that the closed row follows the opened row, and that no other rows for the same object and user exist between the elements of the pair.

    Once we have a list of valid pairs, getting an object’s average interaction time from the duration of its pairs is just grouping and aggregating.

    SELECT o.object_id, AVG(DATEDIFF(MINUTE, o.interaction_time, c.interaction_time))
    FROM [Statistics] o
    JOIN [Statistics] c
      ON o.object_id = c.object_id AND o.user_id = c.user_id
     AND o.interaction_type = 'opened' AND c.interaction_type = 'closed'
     AND o.interaction_time < c.interaction_time
     AND NOT EXISTS (
        SELECT 1 FROM [Statistics] m
        WHERE o.object_id = m.object_id AND o.user_id = m.user_id
          AND m.id > o.id AND m.id < c.id
     )
     GROUP BY o.object_id
    
    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.