How do I remove extended ASCII characters from a string in T-SQL?

I need to filter out (remove) extended ASCII characters from a SELECT statement in T-SQL.

I’m using a stored procedure to do so.

  • What is the purpose of system table master..spt_values and what are the meanings of its values?
  • Good book to learn Server Management Objects (SMO) for C# and SQL Server 2008?
  • MS SQL equivalent to MySQL user defined variables in queries
  • What's the better way to structure this data within a database?
  • What is the difference between Scope_Identity(), Identity(), @@Identity, and Ident_Current()?
  • Connecting SQL Server 2008 to Java: Login failed for user error
  • Expected input:

    ËËËËeeeeËËËË
    

    Expected output:

    eeee
    

    All that I’ve found is for MySQL.

    I’m using :

    Microsoft SQL Server Management Studio  11.0.2100.60
    Microsoft .NET Framework    4.0.30319.17929
    

    5 Solutions collect form web for “How do I remove extended ASCII characters from a string in T-SQL?”

    OK, give this a try. It seems the same issue they have. Anyway you need to modify it based on your requirements.

    CREATE FUNCTION RemoveNonASCII 
    (
        @nstring nvarchar(255)
    )
    RETURNS varchar(255)
    AS
    BEGIN
    
        DECLARE @Result varchar(255)
        SET @Result = ''
    
        DECLARE @nchar nvarchar(1)
        DECLARE @position int
    
        SET @position = 1
        WHILE @position <= LEN(@nstring)
        BEGIN
            SET @nchar = SUBSTRING(@nstring, @position, 1)
            --Unicode & ASCII are the same from 1 to 255.
            --Only Unicode goes beyond 255
            --0 to 31 are non-printable characters
            IF UNICODE(@nchar) between 32 and 255
                SET @Result = @Result + @nchar
            SET @position = @position + 1
        END
    
        RETURN @Result
    
    END
    GO
    

    Check it out at SqlServerCentral

    The accepted answer is using a loop which should be avoided…

    My solution is completely inlineable, it’s easy to create an UDF (or maybe even better: an inline TVF) from this.

    The idea: Create a set of running numbers (here it’s limited with the count of objects in sys.objects, but there are tons of example how to create a numbers tally on the fly). In the second CTE the strings are splitted to single characters. The final select comes back with the cleaned string.

    DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
    INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');
    
    WITH RunningNumbers AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
        FROM sys.objects
    )
    ,SingleChars AS
    (
        SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
        FROM @tbl AS tbl
        CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn 
    )
    SELECT ID,EvilString
          ,(
            SELECT '' + Chr 
            FROM SingleChars AS sc
            WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
            ORDER BY sc.Nmbr
            FOR XML PATH('')
          ) AS GoodString
    FROM @tbl As tbl
    

    The result

    1   ËËËËeeeeËËËË    eeee
    2   ËaËËbËeeeeËËËcË abeeeec
    

    Here is another answer from me where this approach is used to replace all special characters with secure characters to get plain latin

    Thank you for sharing your code.

    I needed something like it, not just for cleaning up e-mail addresses but for general purpose, filtering user web site input before it reaches SAP ERP via integration modules.

    Running in Brazil, it must obey language accents…

    Here goes the resulting code.

    Maybe it can help someone someday, like it did to me.

         IF EXISTS
         (
             SELECT * 
               FROM sys.objects 
              WHERE object_id = OBJECT_ID(N'[dbo].[fnRemoveNonASCII]') 
                AND type IN (N'FN')
         )
             DROP FUNCTION dbo.fnRemoveNonASCII
         GO
    
         CREATE FUNCTION [dbo].[fnRemoveNonASCII] 
         (
             @nstring nvarchar(MAX) 
         )
         RETURNS nvarchar(MAX)
         AS
         BEGIN
    
             DECLARE @nchar nvarchar(1)      -- individual char in string
             DECLARE @nUnicode nvarchar(3)   -- ASCII for individual char in string
             DECLARE @position int           -- subscript to control loop in the string
             DECLARE @Result nvarchar(MAX)   -- return valus
    
             SET @Result = ''
             SET @position = 1
    
             WHILE @position <= LEN(@nstring)
    
                BEGIN
    
                    --Unicode & ASCII are the same from 1 to 255.
                    --Only Unicode goes beyond 255
                    --0 to 31 are non-printable characters
    
                    SET @nchar = SUBSTRING(@nstring, @position, 1)
    
                    SET @nUnicode = UNICODE(@nChar) 
    
                    IF @nUnicode = 10 
                       OR @nUnicode = 13
                       OR @nUnicode BETWEEN 32  AND 126
                       OR @nUnicode = 160
                       OR @nUnicode BETWEEN 192 AND 207
                       OR @nUnicode BETWEEN 210 AND 213
                       OR @nUnicode BETWEEN 217 AND 219
                       OR @nUnicode BETWEEN 224 AND 227
                       OR @nUnicode BETWEEN 231 AND 234
                       OR @nUnicode = 236
                       OR @nUnicode = 237
                       OR @nUnicode BETWEEN 242 AND 245
                       OR @nUnicode = 247
                       OR @nUnicode = 249
                       OR @nUnicode = 250        
    
                       SET @Result = @Result + @nchar
    
                    ELSE IF @nUnicode = 9 -- TAB
                       SET @Result =  @Result + '  '
    
                    ELSE
                       SET @Result =  @Result + ' '
    
                    SET @position = @position + 1
             END
    
             RETURN @Result
         END
    
         /*
         ---------------------------------------------------------------------------------------------------------------
    
         -- Tabela dos caracteres Unicode/ASCII exportáveis
    
         &#9;   |   &#x9;      |    %9     =    TAB
         &#10;     |    &#xa;      |    %a     =    0A   Line Feed
         &#13;     |    &#xd;      |    %d     =    0D   Carriage Return
         &#32;      |   &#x20;  |   %20 =   <space>  
         &#33;     |    &#x21;  |   %21 =   !            
         &#34;     |    &#x22;  |   %22 =   "            
         &#35;     |    &#x23;  |   %23 =   #            
         &#36;     |    &#x24;  |   %24 =   $            
         &#37;     |    &#x25;  |   %25 =   %            
         &#38;     |    &#x26;  |   %26 =   &            
         &#39;     |    &#x27;  |   %27 =   '            
         &#40;     |    &#x28;  |   %28 =   (            
         &#41;     |    &#x29;  |   %29 =   )            
         &#42;     |    &#x2a;  |   %2a =   *            
         &#43;     |    &#x2b;  |   %2b =   +            
         &#44;     |    &#x2c;  |   %2c =   ,            
         &#45;     |    &#x2d;  |   %2d =   -            
         &#46;     |    &#x2e;  |   %2e =   .            
         &#47;     |    &#x2f;  |   %2f =   /            
         &#48;     |    &#x30;  |   %30 =   0            
         &#49;     |    &#x31;  |   %31 =   1            
         &#50;     |    &#x32;  |   %32 =   2            
         &#51;     |    &#x33;  |   %33 =   3            
         &#52;     |    &#x34;  |   %34 =   4            
         &#53;     |    &#x35;  |   %35 =   5            
         &#54;     |    &#x36;  |   %36 =   6            
         &#55;     |    &#x37;  |   %37 =   7            
         &#56;     |    &#x38;  |   %38 =   8            
         &#57;     |    &#x39;  |   %39 =   9            
         &#58;     |    &#x3a;  |   %3a =   :            
         &#59;     |    &#x3b;  |   %3b =   ;            
         &#60;     |    &#x3c;  |   %3c =   <            
         &#61;     |    &#x3d;  |   %3d =   =            
         &#62;     |    &#x3e;  |   %3e =   >            
         &#63;     |    &#x3f;  |   %3f =   ?            
         &#64;     |    &#x40;  |   %40 =   @   
         &#65;     |    &#x41;  |   %41 =   A
         &#66;     |    &#x42;  |   %42 =   B
         &#67;     |    &#x43;  |   %43 =   C
         &#68;     |    &#x44;  |   %44 =   D
         &#69;     |    &#x45;  |   %45 =   E
         &#70;     |    &#x46;  |   %46 =   F
         &#71;     |    &#x47;  |   %47 =   G
         &#72;     |    &#x48;  |   %48 =   H
         &#73;     |    &#x49;  |   %49 =   I
         &#74;     |    &#x4a;  |   %4a =   J
         &#75;     |    &#x4b;  |   %4b =   K
         &#76;     |    &#x4c;  |   %4c =   L
         &#77;     |    &#x4d;  |   %4d =   M
         &#78;     |    &#x4e;  |   %4e =   N
         &#79;     |    &#x4f;  |   %4f =   O 
         &#80;     |    &#x50;  |   %50 =   P
         &#81;     |    &#x51;  |   %51 =   Q
         &#82;     |    &#x52;  |   %52 =   R
         &#83;     |    &#x53;  |   %53 =   S
         &#84;     |    &#x54;  |   %54 =   T
         &#85;     |    &#x55;  |   %55 =   U
         &#86;     |    &#x56;  |   %56 =   V
         &#87;     |    &#x57;  |   %57 =   W
         &#88;     |    &#x58;  |   %58 =   X
         &#89;     |    &#x59;  |   %59 =   Y
         &#90;     |    &#x5a;  |   %5a =   Z
         &#91;     |    &#x5b;  |   %5b =   [
         &#92;     |    &#x5c;  |   %5c =   \
         &#93;     |    &#x5d;  |   %5d =   ]
         &#94;     |    &#x5e;  |   %5e =   ^
         &#95;     |    &#x5f;  |   %5f =   _
         &#96;     |    &#x60;  |   %60 =   `
         &#97;     |    &#x61;  |   %61 =   a
         &#98;     |    &#x62;  |   %62 =   b
         &#99;     |    &#x63;  |   %63 =   c
         &#100; |   &#x64;  |   %64 =   d
         &#101; |   &#x65;  |   %65 =   e
         &#102; |   &#x66;  |   %66 =   f
         &#103; |   &#x67;  |   %67 =   g
         &#104; |   &#x68;  |   %68 =   h
         &#105; |   &#x69;  |   %69 =   i
         &#106; |   &#x6a;  |   %6a =   j
         &#107; |   &#x6b;  |   %6b =   k
         &#108; |   &#x6c;  |   %6c =   l
         &#109; |   &#x6d;  |   %6d =   m
         &#110; |   &#x6e;  |   %6e =   n
         &#111; |   &#x6f;  |   %6f =   o
         &#112; |   &#x70;  |   %70 =   p
         &#113; |   &#x71;  |   %71 =   q
         &#114; |   &#x72;  |   %72 =   r
         &#115; |   &#x73;  |   %73 =   s
         &#116; |   &#x74;  |   %74 =   t
         &#117; |   &#x75;  |   %75 =   u
         &#118; |   &#x76;  |   %76 =   v
         &#119; |   &#x77;  |   %77 =   w
         &#120; |   &#x78;  |   %78 =   x
         &#121; |   &#x79;  |   %79 =   y
         &#122; |   &#x7a;  |   %7a =   z
         &#123; |   &#x7b;  |   %7b =   {
         &#124; |   &#x7c;  |   %7c =   |
         &#125; |   &#x7d;  |   %7d =   }
         &#126; |   &#x7e;  |   %7e =   ~
         &#160; |   &#xa0;  |   %a0 =   <nbsp>      
         &#192; |   &#xc0;  |   %c0 =   À
         &#193; |   &#xc1;  |   %c1 =   Á
         &#194; |   &#xc2;  |   %c2 =   Â
         &#195; |   &#xc3;  |   %c3 =   Ã
         &#196; |   &#xc4;  |   %c4 =   Ä
         &#197; |   &#xc5;  |   %c5 =   Å
         &#198; |   &#xc6;  |   %c6 =   Æ
         &#199; |   &#xc7;  |   %c7 =   Ç
         &#200; |   &#xc8;  |   %c8 =   È
         &#201; |   &#xc9;  |   %c9 =   É
         &#202; |   &#xca;  |   %ca =   Ê
         &#203; |   &#xcb;  |   %cb =   Ë
         &#204; |   &#xcc;  |   %cc =   Ì
         &#205; |   &#xcd;  |   %cd =   Í
         &#206; |   &#xce;  |   %ce =   Î
         &#207; |   &#xcf;  |   %cf =   Ï
         &#210; |   &#xd2;  |   %d2 =   Ò
         &#211; |   &#xd3;  |   %d3 =   Ó
         &#212; |   &#xd4;  |   %d4 =   Ô
         &#213; |   &#xd5;  |   %d5 =   Õ
         &#217; |   &#xd9;  |   %d9 =   Ù
         &#218; |   &#xda;  |   %da =   Ú
         &#219; |   &#xdb;  |   %db =   Û
         &#224; |   &#xe0;  |   %e0 =   à
         &#225; |   &#xe1;  |   %e1 =   á
         &#226; |   &#xe2;  |   %e2 =   â
         &#227; |   &#xe3;  |   %e3 =   ã
         &#231; |   &#xe7;  |   %e7 =   ç
         &#232; |   &#xe8;  |   %e8 =   è
         &#233; |   &#xe9;  |   %e9 =   é
         &#234; |   &#xea;  |   %ea =   ê
         &#236; |   &#xec;  |   %ec =   ì
         &#237; |   &#xed;  |   %ed =   í
         &#242; |   &#xf2;  |   %f2 =   ò
         &#243; |   &#xf3;  |   %f3 =   ó
         &#244; |   &#xf4;  |   %f4 =   ô
         &#245; |   &#xf5;  |   %f5 =   õ
         &#247; |   &#xf7;  |   %f7 =   ÷
         &#249; |   &#xf9;  |   %f9 =   ù
         &#250; |   &#xfa;  |   %fa =   ú
         */
    
         GO     
    

    Just correcting above code (it was cutting DOTs)

    CREATE FUNCTION [dbo].[RemoveNonASCII] 
    (
        @nstring nvarchar(255)
    )
    RETURNS nvarchar(255)
    AS
    BEGIN
    
        DECLARE @Result nvarchar(255)
        SET @Result = ''
    
        DECLARE @nchar nvarchar(1)
        DECLARE @position int
    
        SET @position = 1
        WHILE @position <= LEN(@nstring)
        BEGIN
            SET @nchar = SUBSTRING(@nstring, @position, 1)
            --Unicode & ASCII are the same from 1 to 255.
            --Only Unicode goes beyond 255
            --0 to 31 are non-printable characters
            IF (UNICODE(@nchar) between 192 and 198) or (UNICODE(@nchar) between 225 and 230) -- letter A or a with accents
                SET @nchar = 'a'
            IF (UNICODE(@nchar) between 200 and 203) or (UNICODE(@nchar) between 232 and 235) -- letter E or e with accents
                SET @nchar = 'e'
            IF (UNICODE(@nchar) between 204 and 207) or (UNICODE(@nchar) between 236 and 239) -- letter I or i with accents
                SET @nchar = 'i'
            IF (UNICODE(@nchar) between 210 and 214) or (UNICODE(@nchar) between 242 and 246) or (UNICODE(@nchar)=240) -- letter O or o with accents
                SET @nchar = 'o'
            IF (UNICODE(@nchar) between 217 and 220) or (UNICODE(@nchar) between 249 and 252)  -- letter U or u with accents
                SET @nchar = 'u'
            IF (UNICODE(@nchar)=199)  or (UNICODE(@nchar)=231)  -- letter Ç or ç 
                SET @nchar = 'c'
            IF (UNICODE(@nchar)=209)  or (UNICODE(@nchar)=241)  -- letter Ñ or ñ 
                SET @nchar = 'n'
            IF (UNICODE(@nchar) between 45 and 46) or (UNICODE(@nchar) between 48 and 57) or (UNICODE(@nchar)  between 64 and 90) or (UNICODE(@nchar) = 95) or (UNICODE(@nchar)  between 97 and 122)
                SET @Result = @Result + @nchar
            SET @position = @position + 1
        END
        set @Result = lower(@Result) -- e-mails in lower case
        RETURN @Result
    
    END
    

    The solution provided by amit kohan works, but was to slow for me. With large string data, the iteration trough each character is not optimal. I’ve made the following function, with isn’t very compact but fast.

    CREATE FUNCTIONdbo.F_StripLowAscii
    (
       @Name nvarchar(max)
    )
    RETURN nvarchar(max) as
    BEGIN
    DECLARE @Result nvarchar(max) 
    
        If @Name IS NULL 
          RETURN @Name 
    
        DECLARE @BlankRange VARCHAR(15) 
        DECLARE @FoundAt INTEGER 
    
        -- ASCII CHAR #0 needs a special treatment
        SET @BlankRange = '%[' + CHAR(0) + ']%' 
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        WHILE @FoundAt > 0 
        BEGIN 
            SET @name = left(@name, @FoundAt-1 ) + SUBSTRING(@name, @FoundAt+1, LEN(@Name))
            SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        END 
    
    
        SET @BlankRange = '%[' + CHAR(1)+'-'+CHAR(8) + ']%' 
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        WHILE @FoundAt > 0 
        BEGIN 
            SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
            SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        END 
    
        SET @BlankRange = '%[' + CHAR(11)+'-'+CHAR(12) + ']%' 
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        WHILE @FoundAt > 0 
        BEGIN 
            SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
            SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        END 
    
        SET @BlankRange = '%[' + CHAR(14)+'-'+CHAR(31) + ']%' 
        SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        WHILE @FoundAt > 0 
        BEGIN 
            SET @name = Replace(@Name, SUBSTRING(@Name, @FoundAt,1),'')
            SET @FoundAt = PATINDEX(@BlankRange ,@Name COLLATE SQL_Latin1_General_CP850_Bin) 
        END 
    
        RETURN @Name 
    
    END
    GO
    
    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.