how to overcome the limitation of IN cause in sql query

I have written an sql query like :

select field1, field2 from table_name;

The problem is this query will return 1 million records/ or more than 100k records.
I have a directory in which I have input files (around 20,000 to 50,000 records) that contain field1 . This is the main data I am concerned with.
Using perl script, I am extracting from the directory.
But , if I write a query like :

  • Creating a SQL Server table from a C# datatable
  • SQL NOT IN not working
  • SQL Sum MTD & YTD
  • INSERT IF NOT EXISTS but return the identity either way
  • Table dependencies SQL Server
  • Display a ConnectionString dialog
  • select field1 , field2 from table_name 
    where field1 in (need to write a query to take field1 from directory);
    

    If I use IN cause then it has limitation of processing 1000 entries, then how should I overcome the limitation of IN cause?

  • T-SQL - create partition function and scheme - SQL Server 2008
  • How can I use SUM for bit columns?
  • How to create Temp table with SELECT * INTO tempTable FROM CTE Query
  • How to read in a text file from a stored procedure
  • Hierarchical query in SQL Server 2005
  • In Sql Server 2014 ORDER BY clause with OFFSET FETCH NEXT returns weird results
  • 4 Solutions collect form web for “how to overcome the limitation of IN cause in sql query”

    In any DBMS, I would insert them into a temporary table and perform a JOIN to workaround the IN clause limitation on the size of the list.

    E.g.

    CREATE TABLE #idList
    (
        ID INT
    )
    INSERT INTO #idList VALUES(1)
    INSERT INTO #idList VALUES(2)
    INSERT INTO #idList VALUES(3)
    
    SELECT * 
    FROM 
        MyTable m 
        JOIN #idList AS t 
        ON m.id = t.id
    

    In SQL Server 2005, in one of our previous projects, we used to convert this list of values that are a result of querying another data store (lucene index) into XML and pass it as XML variable in the SQL query and convert it into a table using the nodes() function on XML data types and perform a JOIN with that.

    DECLARE @IdList XML
    SELECT @idList = '
    <Requests>
        <Request id="1" />
        <Request id="2" />
        <Request id="3" />
    </Requests>'
    
    SELECT * 
    FROM 
        MyTable m 
        JOIN (
                SELECT id.value('(@id)[1]', 'INT') as 'id' 
                FROM @idList.nodes('/Requests/Request') as T(id)
             ) AS t 
        ON m.id = t.id
    

    Vikdor is right, you shouldn’t be querying this with an IN() clause, it’s faster and more memory efficient to use a table to JOIN.

    Expanding on his answer I would recommend the following approach:

    1. Get a list of all input files via Perl
    2. Think of some clever way to compute a hash value for your list that is unique and based on all input files (I’d recommend the filenames or similar)
    3. This hash will serve as the name of the table that stores the input filenames (think of it as a quasi temporary table that gets discarded once the hash changes)
    4. JOIN that table to return the correct records

    For step 2. you could either use a cronjob or compute whenever the query is actually needed (which would delay the response, though). To get this right you need to consider how likely it is that files are added/removed.

    For step 3. you would need some logic that drops the previously generated tables once the current hash value differs from last execution, then recreate the table named after the current hash.

    For the quasi temporary table names I’d recommend something along the lines of

    input_files_XXX (.i.e. prefix_<hashvalue>)
    

    which makes it easier to know what stale tables to drop.

    You could split your 50’000 ids in 50 lists of 1000 ids, do a query for each such list, and collect the result sets in perl.

    Oracle wise, the best solution with using a temporary table – which without indexing won’t give you much performance is to use a nested tabled type.

    CREATE TYPE my_ntt is table of directory_rec;

    Then create a function f1 that returns a variable of my_ntt type and use in the query.

    select field1 , field2 from table_name where field1 in table (cast (f1 as my_ntt));

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.