how to overcome the limitation of IN cause in sql query

I have written an sql query like :

select field1, field2 from table_name;

The problem is this query will return 1 million records/ or more than 100k records.
I have a directory in which I have input files (around 20,000 to 50,000 records) that contain field1 . This is the main data I am concerned with.
Using perl script, I am extracting from the directory.
But , if I write a query like :

  • SQL Server: Convert Between UTC and Local Time Precisely
  • How to return all the free periods of time from an appointment set
  • Table Per Concrete Type (TPC) Inheritance in Entity Framework 6 (EF6)
  • SQL Server 2008 - HashBytes computed column
  • How to Bulk insert with a dynamic value for a column
  • Is there any difference between DateTime in c# and DateTime in SQL server?
  • select field1 , field2 from table_name 
    where field1 in (need to write a query to take field1 from directory);
    

    If I use IN cause then it has limitation of processing 1000 entries, then how should I overcome the limitation of IN cause?

  • How do i connect to MSSQL database from php
  • Separation of table based on a trey component in sql server
  • Very slow SQL join, with limited amount of data
  • SQL Server Rounding Issue Looking for Explanation
  • How to add columns in select list SQL which are rows of other table : SQL Server 2008
  • EXEC sp_who2 LastBatch showing future date
  • 4 Solutions collect form web for “how to overcome the limitation of IN cause in sql query”

    In any DBMS, I would insert them into a temporary table and perform a JOIN to workaround the IN clause limitation on the size of the list.

    E.g.

    CREATE TABLE #idList
    (
        ID INT
    )
    INSERT INTO #idList VALUES(1)
    INSERT INTO #idList VALUES(2)
    INSERT INTO #idList VALUES(3)
    
    SELECT * 
    FROM 
        MyTable m 
        JOIN #idList AS t 
        ON m.id = t.id
    

    In SQL Server 2005, in one of our previous projects, we used to convert this list of values that are a result of querying another data store (lucene index) into XML and pass it as XML variable in the SQL query and convert it into a table using the nodes() function on XML data types and perform a JOIN with that.

    DECLARE @IdList XML
    SELECT @idList = '
    <Requests>
        <Request id="1" />
        <Request id="2" />
        <Request id="3" />
    </Requests>'
    
    SELECT * 
    FROM 
        MyTable m 
        JOIN (
                SELECT id.value('(@id)[1]', 'INT') as 'id' 
                FROM @idList.nodes('/Requests/Request') as T(id)
             ) AS t 
        ON m.id = t.id
    

    Vikdor is right, you shouldn’t be querying this with an IN() clause, it’s faster and more memory efficient to use a table to JOIN.

    Expanding on his answer I would recommend the following approach:

    1. Get a list of all input files via Perl
    2. Think of some clever way to compute a hash value for your list that is unique and based on all input files (I’d recommend the filenames or similar)
    3. This hash will serve as the name of the table that stores the input filenames (think of it as a quasi temporary table that gets discarded once the hash changes)
    4. JOIN that table to return the correct records

    For step 2. you could either use a cronjob or compute whenever the query is actually needed (which would delay the response, though). To get this right you need to consider how likely it is that files are added/removed.

    For step 3. you would need some logic that drops the previously generated tables once the current hash value differs from last execution, then recreate the table named after the current hash.

    For the quasi temporary table names I’d recommend something along the lines of

    input_files_XXX (.i.e. prefix_<hashvalue>)
    

    which makes it easier to know what stale tables to drop.

    You could split your 50’000 ids in 50 lists of 1000 ids, do a query for each such list, and collect the result sets in perl.

    Oracle wise, the best solution with using a temporary table – which without indexing won’t give you much performance is to use a nested tabled type.

    CREATE TYPE my_ntt is table of directory_rec;

    Then create a function f1 that returns a variable of my_ntt type and use in the query.

    select field1 , field2 from table_name where field1 in table (cast (f1 as my_ntt));

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.