Extracting codes from strings in SQL
I have a column which contains codes and descriptions and need to split them.
Multiple codes may or may not reside in the field.
"CSxxxxx description text here" "WSxxxxx description text here" "CSxxxxx, WSxxxxx, CSxxxxx description text here"
If there are multiple codes, then the description should be repeated, so the output for row 3 above would be 3 lines inserted into a secondary table.
The codes, for the moment, would always be in the format of:
I can write the regex to identify the first couple, but i’m not familiar with cutting this from the string and inserting to another table?
2 Solutions collect form web for “Extracting codes from strings in SQL”
Another option is with a CROSS APPLY in concert with a little XML as the parser (assuming not 2016)
Declare @YourTable Table (ID int,[SomeCol] varchar(100)) Insert Into @YourTable Values (1,'CS12345 description text here for id 1') ,(2,'WS67890 description text here for id 2') ,(3,'CS23456, WS34567, CS45678 description text here for id 3') ,(4,'No Codes Just a Desc') ;with cte as ( Select A.ID ,B.* From @YourTable A Cross Apply ( Select RetSeq = Row_Number() over (Order By (Select null)) ,RetVal = LTrim(RTrim(B.i.value('(./text())', 'varchar(max)'))) From (Select x = Cast('<x>' + replace(A.SomeCol,' ','</x><x>')+'</x>' as xml).query('.')) as A Cross Apply x.nodes('x') AS B(i) ) B ) Select ID ,Code = replace(RetVal,',','') ,Descr = Stuff((Select ' ' +RetVal From cte Where A.ID=ID and RetVal Not Like '%[0-9][0-9][0-9][0-9]%' Order By RetSeq For XML Path ('')),1,1,'') From cte A Where RetVal Like '%[0-9][0-9][0-9][0-9]%' Union All Select ID ,Code = '' ,SomeCol From @YourTable Where SomeCol not like '%[0-9][0-9][0-9][0-9]%'
ID Code Descr 1 CS12345 description text here for id 1 2 WS67890 description text here for id 2 3 CS23456 description text here for id 3 3 WS34567 description text here for id 3 3 CS45678 description text here for id 3 4 No Codes Just a Desc
The only way I can think of doing this “easily” is by splitting the string. SQL Server 2016 has a string split function. You can find one on the web by Googling “SQL Server split”.
Then it would be something like this:
select t.*, c.code from t outer apply (select s.code from dbo.split(textcol, ' ')) s(code) where replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]' or replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9]' ) c(code);
Note that this should work for the examples in your question. It might not work on all your data. Text is never clean.