Extracting codes from strings in SQL

I have a column which contains codes and descriptions and need to split them.

Multiple codes may or may not reside in the field.

  • SELECT all N between @min and @max
  • sql server function that indicates if previous query returns results
  • Find top n closest places
  • What does exec sp_updatestats do?
  • Duplicating a TABLE using Microsoft SQL Server Mangement
  • Asp.net c# SQL Server Update query not working for all columns
  • Eg;

    "CSxxxxx description text here"
    
    "WSxxxxx description text here"
    
    "CSxxxxx, WSxxxxx, CSxxxxx description text here"
    

    If there are multiple codes, then the description should be repeated, so the output for row 3 above would be 3 lines inserted into a secondary table.

    The codes, for the moment, would always be in the format of:

    FixedLetter-FixedLetter-Number-Number-Number-Number-Number 
    

    OR

    FixedLetter-FixedLetter-Number-Number-Number-Number
    

    I can write the regex to identify the first couple, but i’m not familiar with cutting this from the string and inserting to another table?

    Thanks,
    Dave

  • Parametrized Query Using JDBC
  • exec sp_executesql @sql and exec (@sql) SQL Server
  • TSQL PIVOT MULTIPLE COLUMNS
  • SQL Server 2012 : extract Regex groups
  • C# windows Application attachmentFilename
  • SQL Server Table Fails to Update
  • 2 Solutions collect form web for “Extracting codes from strings in SQL”

    Another option is with a CROSS APPLY in concert with a little XML as the parser (assuming not 2016)

    Example

    Declare @YourTable Table (ID int,[SomeCol] varchar(100))
    Insert Into @YourTable Values 
     (1,'CS12345 description text here for id 1')
    ,(2,'WS67890 description text here for id 2')
    ,(3,'CS23456, WS34567, CS45678 description text here for id 3')
    ,(4,'No Codes Just a Desc')
    
     ;with cte as (
         Select A.ID
              ,B.*
         From @YourTable A
         Cross Apply (
                        Select RetSeq = Row_Number() over (Order By (Select null))
                              ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
                        From  (Select x = Cast('<x>' + replace(A.SomeCol,' ','</x><x>')+'</x>' as xml).query('.')) as A 
                        Cross Apply x.nodes('x') AS B(i)
                     ) B
     )
    Select ID
          ,Code  = replace(RetVal,',','')
          ,Descr = Stuff((Select ' ' +RetVal From cte Where A.ID=ID and RetVal Not Like '%[0-9][0-9][0-9][0-9]%' Order By RetSeq  For XML Path ('')),1,1,'')
     From  cte A
     Where RetVal Like '%[0-9][0-9][0-9][0-9]%'
    Union All
    Select ID
          ,Code = ''
          ,SomeCol
     From @YourTable
     Where SomeCol not like '%[0-9][0-9][0-9][0-9]%'
    

    Returns

    ID  Code      Descr
    1   CS12345   description text here for id 1
    2   WS67890   description text here for id 2
    3   CS23456   description text here for id 3
    3   WS34567   description text here for id 3
    3   CS45678   description text here for id 3
    4             No Codes Just a Desc
    

    The only way I can think of doing this “easily” is by splitting the string. SQL Server 2016 has a string split function. You can find one on the web by Googling “SQL Server split”.

    Then it would be something like this:

    select t.*, c.code
    from t outer apply
         (select s.code
          from dbo.split(textcol, ' ')) s(code)
          where replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]' or
                replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9]'
         ) c(code);
    

    Note that this should work for the examples in your question. It might not work on all your data. Text is never clean.

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.