Complicated SQL query for a running total column

I’m trying to work out a pretty complex query in SQL Server 2008. I’d like some input from SQL experts here.

Imagine I had a Payments table with these fields:

  • how to find the version and edition of sql server instance via registry
  • Maintain users data integrity across multiple databases for ASP.NET
  • Using Temp table in A VIEW
  • Can't insert complex XML file into SQL Server table
  • Is there a way to directly compres/zips the result from a SQL query?
  • how to use sqltransaction in c#
  • PaymentID int,
    CustomerID int,
    PaymentDate datetime,
    Amount decimal

    So essentially, it is a table of payments made by a customer on specific dates. An important thing to note is that in some cases, a payment amount can be a negative value. So, over time, the total amount paid by any given customer, can go up or down.

    What we’re trying to figure out is the SQL to calculate the high point of the total amount paid per customer.

    So, if Fred made 3 payments: first for $5, second for $5, third for -$3. The report will show that Fred’s peak total paid amount was $10 (on his second payment), and his final paid amount was $7.

    We need to run this report for a hundred thousand customers (who’ve potentially made a hundred to a thousand payments each), so it’s got to be fast.

    Is there a good way to structure this query without storing the running totals in the db? We’d like to avoid storing precalculated values if at all possible.

    5 Solutions collect form web for “Complicated SQL query for a running total column”

    Your question seems to be this:

    SELECT CustomerID, SUM(Ammount) FROM table WHERE Amount > 0 GROUP BY CustomerID
    SELECT CustomerID, SUM(Ammount) FROM table GROUP BY CustomerID
    

    However, I think you mean that you want a table that appears like this

    Customer  Payment  HighPoint  RunningTotal
    123       5        5          5
    123       5        10         10
    123       -3       10         7
    

    In which case I would create a view with the two selects above so that the view is something like.

    SELECT CusotmerID, 
      PaymentDate, 
      Ammount, 
      (SELECT SUM(Ammount) 
        FROM table as ALIAS 
        WHERE ALIAS.Amount > 0 
          AND ALIAS.PaymentDate <= PaymentDate 
          AND ALIAS.CustomerID = CustomerID), 
      (SELECT SUM(Ammount) 
        FROM table as ALIAS 
        WHERE ALIAS.CustomerID = CustomerID 
        AND ALIAS.PaymentDate <= PaymentDate)
    FROM table
    

    Also, you may consider a non-unique index on the Amount column of the table to speed up the view.

    The operation is linear in the number of payments for each customer. So, you are going to have to go over each payment, keeping a running total and a high water mark and at the end of all the payments, you will have your answer. Whether you do that in a CLR stored procedure (immediately jumped to mind for me) or use a cursor or temp table or whatever, it’s probably not going to be fast.

    If you have to run this report over and over again, you should seriously consider keeping a high water mark field and update it (or not) whenever a payment comes in. That way, your report will be trivial — but this is what data marts are for.

    As an alternative to subqueries, you can use a running total query. Here’s how I set one up for this case. First create some test data:

    create table #payments (
        paymentid int identity,
        customerid int,
        paymentdate datetime,
        amount decimal
    )
    
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-01',1.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-02',2.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-03',-1.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-04',2.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-05',-3.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-01',10.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-02',-5.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-03',7.00)
    

    Now you can execute the running total query, which calculates the balance for each customer after each payment:

    select cur.customerid, cur.paymentdate, sum(prev.amount)
    from #payments cur
    inner join #payments prev
        on cur.customerid = prev.customerid
        and cur.paymentdate >= prev.paymentdate
    group by cur.customerid, cur.paymentdate
    

    This generates data:

    Customer  Paymentdate        Balance after payment
    1         2009.01.01         1
    1         2009.01.02         3
    1         2009.01.03         2
    1         2009.01.04         4
    1         2009.01.05         1
    2         2009.01.01         10
    2         2009.01.02         5
    2         2009.01.03         12
    

    To look at the maximum, you can do a group by on the running total query:

    select customerid, max(balance)
    from (
        select cur.customerid, cur.paymentdate, balance = sum(prev.amount)
        from #payments cur
        inner join #payments prev
            on cur.customerid = prev.customerid
            and cur.paymentdate >= prev.paymentdate
        group by cur.customerid, cur.paymentdate
    ) runningtotal
    group by customerid
    

    Which gives:

    Customer   Max balance
    1          4
    2          12
    

    Hope this is useful.

    list = list of amounts ordered by date
    foreach in list as amount
      running += amount
      if running >= high
        high = running
    

    To keep it fast, you will require a running total incremented with amount on a trigger, and a high value for each customer (can also be updated by a trigger to make the re-query even simpler).

    I don’t think you can do this type of thing without code (stored procedures are code)

    like Andomar’s answer. You can do the running total for each payment. Then find the max peak payment…

    with
    rt as (
      select
        Payments.*,
        isnull(sum(p.Amount), 0) + Payments.Amount as running
      from
        Payments
        left outer join Payments p on Payments.CustomerID = p.CustomerID
          and p.PaymentDate <= Payments.PaymentDate
          and p.PaymentID < Payments.PaymentID
    ),
    highest as
    (
      select
        CustomerID, PaymentID, running as peak_paid
      from
        rt
      where
        PaymentID = (select top 1 rt2.PaymentID 
            from rt rt2 
            where rt2.CustomerID = rt.CustomerID
            order by rt2.running desc, rt2.PaymentDate, rt2.PaymentID)
    )
    
    select
      *,
      (select sum(amount) from Payments where Payments.CustomerID = highest.CustomerID) as total_paid  
    from
      highest;
    

    however, since you have around 1 million payments, this could be quite slow. Like others are saying, you would want to store the CustomerID, PaymentID and peak_paid in a separate table. This table could be updated on each Payment insert or as a sqljob.

    Updated query to use join instead of subqueries. Since the PaymentDate does not have a time, I filter out multiple payments on the same day by the PaymentId.

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.