Most efficient way to split string into rows
I am using the following function to split a string into rows. It is much faster than the previous function that I was using, however I need to somehow churn through this data quicker (its an ETL job):
ALTER FUNCTION [dbo].[ArrayToTable] ( @InputString VARCHAR(MAX) = '' , @Delimitter VARCHAR(1) = ',' ) RETURNS @RESULT TABLE([Position] INT IDENTITY, [Value] VARCHAR(MAX)) AS BEGIN DECLARE @XML XML SELECT @XML = CONVERT(XML, SQL_TEXT) FROM ( SELECT '<root><item>' + REPLACE(@InputString, @Delimitter, '</item><item>') + '</item></root>' AS SQL_TEXT ) dt INSERT INTO @RESULT([Value]) SELECT t.col.query('.').value('.', 'VARCHAR(1000)') AS [Value] FROM @XML.nodes('root/item') t(col) RETURN END
Can anyone think of a better/quicker way to turn a delimited string into rows? I am using a
cross apply on my query to join to these results.
Can anyone think of a more efficient method to turn delimited strings into rows?
3 Solutions collect form web for “Most efficient way to split string into rows”
Here’s the most performant function I have:
CREATE FUNCTION [Resource].[udf_SplitByXml] (@Data NVARCHAR(MAX), @Delimiter NVARCHAR(5)) RETURNS @Table TABLE ( Data NVARCHAR(MAX) , SequentialOrder INT IDENTITY(1, 1)) AS BEGIN DECLARE @TextXml XML; SELECT @TextXml = CAST('<d>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Data, '&', '&'), '<', '<'), '>', '>'), '"', '"'), '''', '''), @Delimiter, '</d><d>') + '</d>' AS XML); INSERT INTO @Table (Data) SELECT Data = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(RTRIM(LTRIM(T.split.value('.', 'nvarchar(max)'))), '&', '&'), '<', '<'), '>', '>'), '"', '"'), ''', '''') FROM @TextXml.nodes('/d') T(Split) RETURN END
Here are sample calls you can use to test the results:
SELECT * FROM Resource.udf_SplitByXml('yes, no, maybe, so', ','); SELECT * FROM Resource.udf_SplitByXml('who|what|where|when|why|how|Uh, I don''t know!', '|'); SELECT * FROM Resource.udf_SplitByXml('Government, Education, Non-profit|Energy & Power|Yes|No', '|'); SELECT * FROM Resource.udf_SplitByXml('Energy & Power|Some<Thing>Wicked''This"Way Comes', '|');
Another option is to try out the CLR solution based on Adam Machanic’s code that was the winner of a performance test in this blog.
Here’s another function I just wrote that, to my surprise, is marginally faster than the XML method above, although just barely.
Comparing both functions to process 1,000 rows of delimited strings yields no performance difference. Processing 50K rows of delimited strings, the XML method takes 129 seconds to generate 435,217 extracted rows, compared to 122 seconds to generate extracted 435,217 rows for my simple string manipulation method.
So it is not really substantially faster, although I guess it would make a difference if you’re processing hundreds of thousands of rows. The main benefits of my function are that it is easy to read and understand, it is not dependent on XML functionality that may change in future versions of SQL Server, and it should be easily portable to any language. I was really hoping to find something really fast, but I guess this is the best we can do for now.
CREATE FUNCTION dbo.ufn_util_Split ( @RawText varchar(max), @SplitCharacter varchar(2) ) RETURNS @t_Results TABLE ( RowIndex int IDENTITY(1,1), RowValue varchar(max) ) AS BEGIN DECLARE @vc_RowValue varchar(max) = '' DECLARE @vc_Remainder varchar(max) = CASE WHEN RIGHT(@RawText,1) = @SplitCharacter THEN @RawText ELSE @RawText + @SplitCharacter END --the string must end in the split character in order for this to work DECLARE @int_SplitPosition int = 0 DECLARE @int_LenSplitChar int = 0 SELECT @int_LenSplitChar = LEN(@SplitCharacter) --determine the first segment to start with SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder) WHILE (@int_SplitPosition > 0) BEGIN SELECT @vc_RowValue = LEFT(@vc_Remainder,@int_SplitPosition-1) INSERT INTO @t_Results (RowValue) VALUES (@vc_RowValue) --now strip off the segment we just extracted and determine where the next segment ends, and continue SET @vc_Remainder = SUBSTRING(@vc_Remainder,@int_SplitPosition+@int_LenSplitChar,LEN(@vc_Remainder)) SELECT @int_SplitPosition = CHARINDEX(@SplitCharacter,@vc_Remainder) CONTINUE END RETURN END
Hi, Try this –
create procedure sp_getAllItems @input varchar(100) as BEGIN create table #tmpFruits (name varchar(10)) Declare @Qry Varchar(500) Set @Qry = '' Select @Qry = @Qry + ' Insert into #tmpFruits ' Select @Qry = @Qry + Replace( 'Select ''' + Replace(Replace(Replace(Replace(@input,CHAR(9),''),' ',''),CHAR(10),''),CHAR(13),'') , ',',''' Union Select ''') + '''' Exec (@Qry) select * from Fruitstest where name in (select name from #tmpFruits) drop table #tmpFruits END exec sp_getAllItems @input = 'cherry,banana'