Update all but one of duplicate records in table in SQL Server

I have a SQL Server table that has duplicate entries in one of the columns (object_id) e.g.:

+----+-----------+------------+
| id | object_id | status_val |
+----+-----------+------------+
|  1 |         1 |          0 | 
|  2 |         1 |          0 | 
|  3 |         1 |          0 | 
|  4 |         2 |          0 | 
|  5 |         3 |          0 | 
|  6 |         3 |          0 | 
+----+-----------+------------+

I need to update all of their statuses except for one when there is duplication in the object_id column. So in the table above object_id 1 and 3 are duplicated. So I would want to change their status_val to 2, except one of the entries. The result would look like:

  • Self-Join with missing rows on both sides
  • Store procedure for a select statement which should take input
  • How to detect that trigger was created with ExecuteNonQuery
  • String sql is not in correct format insertion fails using vb.net?How
  • Accessing to SQL Server Filetable from IIS APP POOL
  • Error when trying to connect a web service with a database
  • | id | object_id | status_val |
    +----+-----------+------------+
    |  1 |         1 |          0 | 
    |  2 |         1 |          2 | 
    |  3 |         1 |          2 | 
    |  4 |         2 |          0 | 
    |  5 |         3 |          0 | 
    |  6 |         3 |          2 | 
    +----+-----------+------------+
    

    It doesn’t matter which one of the duplicated rows has it’s status updated.

    Any help would be appreciated.

    3 Solutions collect form web for “Update all but one of duplicate records in table in SQL Server”

    You may solve this problem without a join, which means it should have better performance. The idea is to group the data by your object_id, counting the row number of each object_id. This is what “partition by” does. Then you can update where the row_num is > 1. This will update all duplicated object_id except the first one!

    update t set t.status_val = 'some_status' 
    from (
        select *, row_number() over(partition by object_id order by (select null)) row_num  
        from foo
    ) t 
    where row_num > 1 
    

    On a test table of 82944 records, the performance was such (your mileage may vary!):
    Table ‘test’. Scan count 5, logical reads 82283, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    CPU time = 141 ms, elapsed time = 150 ms.

    We can certainly also solve this problem by using an inner join, however, in general this should lead to more logical reads and higher CPU:

    Table ‘test’. Scan count 10, logical reads 83622, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table ‘Worktable’. Scan count 4, logical reads 167426, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    CPU time = 342 ms, elapsed time = 233 ms.

    To loop over the results and update in smaller batches:

    declare @rowcount int = 1;
    declare @batch_size int = 1000;
    
    while @rowcount > 0 
    begin
        update top(@batch_size) t set t.status_val = 'already updated'
        from (
            select *, row_number() over(partition by object_id order by (select null)) row_num  
            from foo
            where status_val <> 'already updated' 
        ) t 
        where row_num > 1 
        set @rowcount = @@rowcount;
    end
    

    This will help keep locking down if other concurrent sessions are trying to access this table.

    UPDATE Table
    SET Table.status_val = '2'
    FROM Table
    INNER JOIN
    (SELECT id, row_number()OVER(PARTITION BY object_id ORDER BY id) as seq FROM Table) other_table
    ON Table.id = other_table.id AND seq <> 1
    

    According to your question, it seems that for each value of object_id, you want to keep the status_val = 0 for the object_id with lowest id and = 2 for the others. If that is indeed the case, and IF an object_id repeats only a maximum of 3 times, then I have a very simple solution for you. Use the modulo or remainder operator to get what you want. Here is the answer which I will explain later:

    update [MyTable]
    set status_val = 2
    where (id%3) != 1
    

    When you divide any value of id by 3, the remainder can be only 0,1 or 2. So, for each object_id where id%3 is not 1, we change the status_val to 2.

    Before executing the above code, see the output of this query –

    select id, (id%3) as flg, object_id, status_val 
    from MyTable
    
    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.