How can I delete the columns in DataFlow Task in SSIS?

I use SQL Server 2016 and I have a very busy DataFlow task. In my DataFlow task, I use Multicast component for some reason. After creating a new Flow in my DataFlow, I need to delete some of the columns in the new flow because they are useless.

enter image description here

  • What are the advantages of a query using a derived table(s) over a query not using them?
  • How can I force a subquery to perform as well as a #temp table?
  • Microsoft SQL Server error with Laravel: Why this solves my mistake?
  • What are prepared statements? How are they different from dynamic sql?
  • Insert into with multiple subqueries as values
  • Convert varchar to float IF ISNUMERIC
  • Just for more information, I need to do that because I have more than 200 columns in my flow and I need less than 10 of those columns.

    How can I delete the columns in DataFlow Task in SSIS?

  • C#/SQL : backup and restore by copy and replace database files?
  • How do I generate CRUD stored procedures from a table in SQL Server Management Studio
  • Missing Enumerator in Foreach Loop Editor
  • PHP to SQL Server without ODBC or MSSQL support
  • RAND not different for every row in T-SQL UPDATE
  • MSSQL Select with “vertical”-where
  • 3 Solutions collect form web for “How can I delete the columns in DataFlow Task in SSIS?”

    You can add an extra component of some sort. However, this will never reduce complexity or improve performance. Just thinking about it, logically, you are adding an additional interface that needs to be maintained. Performance-wise, anything that will eliminate columns means copying one set of rows from one buffer to a whole other buffer. This is called an asynchronous transformation, and it is better described here and here. You can imagine that copying rows is less efficient than updating them in place.

    Here are some recommendations for reducing complexity, which will, in turn, improve performance:

    • Reduce the columns at the source. If you are selecting columns that
      are not subsequently used in any way, then remove them from the query
      or uncheck them from the source component. Removing columns in this way removes them from the buffer, which will occupy less memory.
    • Reduce the number of components in the dataflow. Very long dataflows are easy to create, a pain to test and even harder to maintain. Dataflows are expecting a unit of work, i.e. a data stream from here to there with a few things in the middle. This is where dataflows shine, in fact, they protect themselves from complexity with memory limitations and a max number of threads. It is better to divide the work into separate dataflows or stored procs. You could stage the data into a table and read it twice, rather than use a multicast, for example.
    • Use the database. SSIS is as much an orchestration tool as it is a data-moving tool. I have often found that using simple dataflows to stage the data, followed by calls to stored procedures to process the data, always out-performs an all-in-one dataflow.
    • Increase the number of times you write the data. This is completely counter intuitive, but if you process data in smaller sets of operations, it is faster running and easier to test. Given a clean slate, I will often design an ETL to write data from the source to a staging table, perform a cleansing step from the stage table to another, optionally, add a conforming step to combine data from different sources to yet another table and, finally, a last step to load a target table. Note that each source is pushed to its own target table and later combined, leveraging the database. The first and last steps are set up to run fast and avoid locking or blocking at either end.
    • Bulk Load. The prior step really does well, when you insure that bulk loading is happening. This can be a tricky thing, but generally you can get there by using “fast load” in the OLEDB destination and by never using the oledb command. Removing indexes and re-adding them is faster than loading in place (with few exceptions).

    These guidelines will get you headed in the general direction, but do post more questions for tuning specific performance problems.

    I believe that you can pass just one data flow path to a UNION ALL task to remove columns from that single data flow.

    Take the single data flow path that you would like to remove columns from and pass it to a Union All task. Then open up the Union All task right click on the column(s) you would like to remove from that path and select delete.

    Usually I think the source of the data should be altered to not send the unwanted columns out, but your case is special. With one path out of the multicast needing all of the columns from the source, while one path does not.

    First of all, i don’t think that what you are asking will give a better performance because the data is loaded from source then multiplied when using Multicast Then The component that will reduce the column number

    You can do this multiple way:

    1. If you can create another DataFlow Task with a Reduced columns source (ex: OLEDB command with specific columns) it is better

    2. You can add Script component with an Asynchronous Output (like shown in the image below) and add the specifid columns to the output, map them using a Vb.net or C# script, something like this:

      Output0Buffer.AddRow()
      Output0Budder.OutColumn = Row.inColumn
      

    enter image description here

    1. Add a UNION ALL component and select the columns you need

    Side Note: It is good to test each scenario performance and choose the better

    MS SQL Server is a Microsoft SQL Database product, include sql server standard, sql server management studio, sql server express and so on.