How to overcome the limitation of the IN clause in an SQL query
I have written an SQL query like:
select field1, field2 from table_name;
The problem is that this query will return a million records, or at least more than 100k records.
I have a directory of input files (around 20,000 to 50,000 records) that contain field1. This is the main data I am concerned with.
Using a Perl script, I am extracting those values from the directory.
But if I write a query like:
select field1, field2 from table_name where field1 in (need to write a query to take field1 from directory);
the IN clause has a limit of 1000 entries, so how should I overcome that limitation of the IN clause?
4 Solutions
In any DBMS, I would insert the values into a temporary table and perform a JOIN to work around the IN clause's limit on the size of the list.
CREATE TABLE #idList (ID INT)

INSERT INTO #idList VALUES (1)
INSERT INTO #idList VALUES (2)
INSERT INTO #idList VALUES (3)

SELECT *
FROM MyTable m
JOIN #idList AS t ON m.id = t.id
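The same pattern in portable form, as a Python/sqlite3 sketch (the table and column names `MyTable`/`id` are illustrative, and temporary-table syntax varies by DBMS):

```python
import sqlite3

def select_by_id_list(conn, ids):
    """Load the id list into a temporary table, then JOIN against it,
    which avoids any limit on the length of an IN list."""
    conn.execute("CREATE TEMP TABLE id_list (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO id_list VALUES (?)", [(i,) for i in ids])
    try:
        return conn.execute(
            "SELECT m.* FROM MyTable m JOIN id_list t ON m.id = t.id"
        ).fetchall()
    finally:
        # Drop the temp table so the helper can be called again on the
        # same connection with a fresh list.
        conn.execute("DROP TABLE id_list")
```

Loading the list via `executemany` also keeps the SQL statement itself short, regardless of how many ids there are.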
In SQL Server 2005, in one of our previous projects, we converted a list of values obtained from another data store (a Lucene index) into XML, passed it as an XML variable to the SQL query, turned it into a table using the nodes() function on the XML data type, and performed a JOIN with that.
DECLARE @idList XML
SELECT @idList = '
<Requests>
  <Request id="1" />
  <Request id="2" />
  <Request id="3" />
</Requests>'

SELECT *
FROM MyTable m
JOIN (
    SELECT id.value('(@id)', 'INT') AS 'id'
    FROM @idList.nodes('/Requests/Request') AS T(id)
) AS t ON m.id = t.id
Vikdor is right: you shouldn't be querying this with an IN() clause; it's faster and more memory-efficient to use a table to JOIN against.
Expanding on his answer, I would recommend the following approach:
1. Get a list of all input files via Perl
2. Think of some clever way to compute a hash value for your list that is unique and based on all input files (I'd recommend the filenames or similar)
3. Use this hash as the name of the table that stores the input values (think of it as a quasi-temporary table that gets discarded once the hash changes)
4. JOIN that table to return the correct records
For step 2 you could either use a cron job or compute the hash whenever the query is actually needed (which would delay the response, though). To get this right, you need to consider how likely it is that files are added or removed.
For step 3 you would need some logic that drops the previously generated tables once the current hash value differs from the last execution, then recreates the table named after the current hash.
For the quasi-temporary table names I'd recommend something along the lines of input_files_XXX (i.e. prefix_<hashvalue>), which makes it easier to know which stale tables to drop.
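The hash step above can be sketched in Python (a minimal illustration: the `input_files_` prefix, SHA-1 over the sorted filenames, and the 16-character truncation are all assumptions, not part of the original answer):

```python
import hashlib
import os

def input_table_name(directory: str, prefix: str = "input_files_") -> str:
    """Derive a quasi-temporary table name from the set of input files.

    Sorting makes the name independent of directory-listing order; the
    hash changes whenever a file is added, removed, or renamed, so stale
    tables can be recognized and dropped by their shared prefix.
    """
    filenames = sorted(os.listdir(directory))
    digest = hashlib.sha1("\n".join(filenames).encode("utf-8")).hexdigest()
    return prefix + digest[:16]  # 16 hex chars keep the table name short
```

Hashing only the filenames means the table is reused when file contents change but the file set does not; hash file sizes or mtimes as well if that matters.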
You could split your 50,000 ids into 50 lists of 1,000 ids each, run a query for each list, and collect the result sets in Perl.
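The question uses Perl, but the chunking idea is language-agnostic; here is a sketch in Python with sqlite3 as a stand-in database (the `table_name`/`field1`/`field2` names come from the question, everything else is illustrative):

```python
import sqlite3

def fetch_in_chunks(conn, ids, chunk_size=1000):
    """Run one IN query per chunk of at most chunk_size ids and merge
    the result sets, so no single IN list exceeds the 1000-entry limit."""
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = ("SELECT field1, field2 FROM table_name "
               f"WHERE field1 IN ({placeholders})")
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows
```

Bind parameters (the `?` placeholders) keep this safe and let the database reuse the statement plan for equally sized chunks.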
Oracle-wise, a better solution than a temporary table (which, without indexing, won't give you much performance) is to use a nested table type.
CREATE TYPE my_ntt IS TABLE OF directory_rec;
Then create a function f1 that returns a variable of the my_ntt type and use it in the query:
select field1, field2 from table_name where field1 in (select * from table(cast(f1 as my_ntt)));