Complicated SQL query for a running total column
I’m trying to work out a pretty complex query in SQL Server 2008. I’d like some input from SQL experts here.
Imagine I had a Payments table with these fields:
So essentially, it is a table of payments made by a customer on specific dates. An important thing to note is that in some cases, a payment amount can be a negative value. So, over time, the total amount paid by any given customer, can go up or down.
What we’re trying to figure out is the SQL to calculate the high point of the total amount paid per customer.
So, if Fred made 3 payments: first for $5, second for $5, third for -$3. The report will show that Fred’s peak total paid amount was $10 (on his second payment), and his final paid amount was $7.
We need to run this report for a hundred thousand customers (who’ve potentially made a hundred to a thousand payments each), so it’s got to be fast.
Is there a good way to structure this query without storing the running totals in the db? We’d like to avoid storing precalculated values if at all possible.
5 Solutions collect form web for “Complicated SQL query for a running total column”
Your question seems to be this:
SELECT CustomerID, SUM(Ammount) FROM table WHERE Amount > 0 GROUP BY CustomerID SELECT CustomerID, SUM(Ammount) FROM table GROUP BY CustomerID
However, I think you mean that you want a table that appears like this
Customer Payment HighPoint RunningTotal 123 5 5 5 123 5 10 10 123 -3 10 7
In which case I would create a view with the two selects above so that the view is something like.
SELECT CusotmerID, PaymentDate, Ammount, (SELECT SUM(Ammount) FROM table as ALIAS WHERE ALIAS.Amount > 0 AND ALIAS.PaymentDate <= PaymentDate AND ALIAS.CustomerID = CustomerID), (SELECT SUM(Ammount) FROM table as ALIAS WHERE ALIAS.CustomerID = CustomerID AND ALIAS.PaymentDate <= PaymentDate) FROM table
Also, you may consider a non-unique index on the Amount column of the table to speed up the view.
The operation is linear in the number of payments for each customer. So, you are going to have to go over each payment, keeping a running total and a high water mark and at the end of all the payments, you will have your answer. Whether you do that in a CLR stored procedure (immediately jumped to mind for me) or use a cursor or temp table or whatever, it’s probably not going to be fast.
If you have to run this report over and over again, you should seriously consider keeping a high water mark field and update it (or not) whenever a payment comes in. That way, your report will be trivial — but this is what data marts are for.
As an alternative to subqueries, you can use a running total query. Here’s how I set one up for this case. First create some test data:
create table #payments ( paymentid int identity, customerid int, paymentdate datetime, amount decimal ) insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-01',1.00) insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-02',2.00) insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-03',-1.00) insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-04',2.00) insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-05',-3.00) insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-01',10.00) insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-02',-5.00) insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-03',7.00)
Now you can execute the running total query, which calculates the balance for each customer after each payment:
select cur.customerid, cur.paymentdate, sum(prev.amount) from #payments cur inner join #payments prev on cur.customerid = prev.customerid and cur.paymentdate >= prev.paymentdate group by cur.customerid, cur.paymentdate
This generates data:
Customer Paymentdate Balance after payment 1 2009.01.01 1 1 2009.01.02 3 1 2009.01.03 2 1 2009.01.04 4 1 2009.01.05 1 2 2009.01.01 10 2 2009.01.02 5 2 2009.01.03 12
To look at the maximum, you can do a group by on the running total query:
select customerid, max(balance) from ( select cur.customerid, cur.paymentdate, balance = sum(prev.amount) from #payments cur inner join #payments prev on cur.customerid = prev.customerid and cur.paymentdate >= prev.paymentdate group by cur.customerid, cur.paymentdate ) runningtotal group by customerid
Customer Max balance 1 4 2 12
Hope this is useful.
list = list of amounts ordered by date foreach in list as amount running += amount if running >= high high = running
To keep it fast, you will require a running total incremented with amount on a trigger, and a high value for each customer (can also be updated by a trigger to make the re-query even simpler).
I don’t think you can do this type of thing without code (stored procedures are code)
like Andomar’s answer. You can do the running total for each payment. Then find the max peak payment…
with rt as ( select Payments.*, isnull(sum(p.Amount), 0) + Payments.Amount as running from Payments left outer join Payments p on Payments.CustomerID = p.CustomerID and p.PaymentDate <= Payments.PaymentDate and p.PaymentID < Payments.PaymentID ), highest as ( select CustomerID, PaymentID, running as peak_paid from rt where PaymentID = (select top 1 rt2.PaymentID from rt rt2 where rt2.CustomerID = rt.CustomerID order by rt2.running desc, rt2.PaymentDate, rt2.PaymentID) ) select *, (select sum(amount) from Payments where Payments.CustomerID = highest.CustomerID) as total_paid from highest;
however, since you have around 1 million payments, this could be quite slow. Like others are saying, you would want to store the CustomerID, PaymentID and peak_paid in a separate table. This table could be updated on each Payment insert or as a sqljob.
Updated query to use join instead of subqueries. Since the PaymentDate does not have a time, I filter out multiple payments on the same day by the PaymentId.