Thursday, 6 November 2014

SSRS Page Breaks


In some reports, you may want to place a page break at the end of a specified number of rows instead of, or in addition to, on groups or report items. To do this, create a group that contains the groups or detail records you want, add a page break to the group, and then add a group expression to group by a specified number of rows.

The following expression, when placed in the group expression, assigns a number to each set of 25 rows. When a page break is defined for the group, this expression results in a page break every 25 rows.

=Ceiling(RowNumber(Nothing)/25)

To allow the user to set a value for the number of rows per page, create a parameter named RowsPerPage and base the group expression on the parameter, as shown in the following expression:

=Ceiling(RowNumber(Nothing)/Parameters!RowsPerPage.Value)


SQL: Heap Table


Hi All,

Last week, I attended a session on Index Architecture and asked about heap tables versus indexed tables.

What is a Heap Table?

In simple terms, a heap table can be defined as a table created without a clustered index (it may still have non-clustered indexes); its rows are not stored in any index-defined order.

Then what are the uses of a heap table?

A heap table helps improve the performance of INSERT statements when loading data into a table in a data warehouse, a data mart, or any other database.

Why does it improve performance?

Because while data is being loaded into a heap table, it does not need to be inserted in any particular order:
  •     Data is not stored in any particular order, so specific data cannot be retrieved quickly unless there are also non-clustered indexes.
  •     Since there is no clustered index, no additional time is needed to maintain the index.
  •     Since there is no clustered index, no additional space is needed to store the clustered index tree.

Then what about an Indexed Table (Clustered Table)?

  •     Data is stored in order based on the clustered index key.
  •     Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns.
  •     Data pages are linked for faster sequential access.
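
A minimal sketch contrasting the two (table and column names are just for illustration):

-- Heap: no clustered index, so rows are stored wherever there is space
CREATE TABLE dbo.StagingSales_Heap
   (  SaleID  int            NOT NULL,
      Amount  decimal(10,2)  NOT NULL
   )

-- Clustered table: rows are ordered by the clustered index key (SaleID)
CREATE TABLE dbo.Sales_Clustered
   (  SaleID  int            NOT NULL,
      Amount  decimal(10,2)  NOT NULL,
      CONSTRAINT PK_Sales_Clustered PRIMARY KEY CLUSTERED (SaleID)
   )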

SQL - CTE vs TEMP tables


Temp Tables in SQL Server:

  1. Are real materialized tables that exist in tempdb
  2. Can be indexed
  3. Can have constraints
  4. Persist for the life of the current connection
  5. Can be referenced by other queries or sub procedures
  6. Have dedicated statistics generated by the SQL engine
  7. Are created on disk
  8. Use temp tables for longer queries or larger data sets.
  9. Table names look like #TableName
CTE - Common Table Expressions:
  1. A CTE is an expression that can be thought of as a temporary result set which is defined within the execution of a single SQL statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query.
  2. Using a CTE improves readability and makes maintenance of complex queries easier
  3. The query can be divided into separate, simple, logical building blocks which can then be used to build more complex CTEs until the final result set is generated
  4. CTEs can be defined in functions, stored procedures, triggers and views
  5. After a CTE is defined, it can be used like a table or a view, and you can SELECT, INSERT, UPDATE or DELETE through it
  6. No additional statistics are stored by the SQL engine
  7. No indexes
  8. Cannot have constraints
  9. Are essentially disposable views
  10. Persist only for the single statement in which they are defined
  11. Can be recursive
  12. CTEs and table variables can be used for small data sets
  13. Can be substituted for a view
  14. Can be referenced multiple times in the same query (see the sketch after this list)
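
A minimal sketch contrasting the two (dbo.Orders and its columns are hypothetical):

-- Temp table: materialized in tempdb, can be indexed, persists for the session
CREATE TABLE #RecentOrders
   (  OrderID    int PRIMARY KEY,
      OrderDate  date
   )

INSERT INTO #RecentOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '20140101'

SELECT * FROM #RecentOrders   -- still available to later statements
DROP TABLE #RecentOrders

-- CTE: inline, lasts only for the single statement it precedes
;WITH RecentOrders AS
(
   SELECT OrderID, OrderDate
   FROM dbo.Orders
   WHERE OrderDate >= '20140101'
)
SELECT * FROM RecentOrders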

SSIS - Tuning SSIS Data Flow


Dear All,

I have found a couple of points that help improve performance at the Data Flow task level. I would like to describe them below:

Data Flow Performance:

I consider tuning at each of the ETL (Extract, Transform and Load) stages.

Extraction Level:-

Network Tuning: 
1. Change the network packet size in the connection manager (a sample connection string follows this list)
  •       Higher values typically yield faster throughput
  •       By default it is 0 (the server default is used); the maximum value is 32767



2. Try experimenting with shared memory vs TCP/IP
3. Enable jumbo frames on the network; to do this, please consult your network specialists
4. If you have OLE DB Command statements, create another connection manager with a lower packet size (do not use the 32767 maximum for it).
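
A hedged sketch of a connection string with a larger packet size (server, database and provider are placeholders; Packet Size is the OLE DB keyword):

Provider=SQLNCLI10.1;Data Source=MyServer;Initial Catalog=MyDB;Integrated Security=SSPI;Packet Size=32767;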

SQL Queries:
1. Consider using the NOLOCK hint on tables in your source query. 
  •     Removes locking overhead
  •     Improves the speed of large table scans
  •     Has risky side effects (dirty reads)
  •     Understand hints before using them in the source query
2. SELECT only the columns you need
3. Do required conversions in the source query instead of using a Data Conversion T/F later (see the sketch after this list).
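
A minimal sketch of such a source query (dbo.FactSales and its columns are hypothetical):

-- Select only the needed columns, convert at the source,
-- and (with caution) use NOLOCK to avoid locking overhead on a large scan
SELECT CustomerID,
       CAST(OrderDate AS date)       AS OrderDate,
       CAST(Amount AS decimal(10,2)) AS Amount
FROM   dbo.FactSales WITH (NOLOCK)
WHERE  OrderDate >= '20140101'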

LookUp:-
1. Change the SELECT statement to use only the columns you need, so that it optimizes memory usage
2. Consider adding the NOLOCK hint.
3. In SSIS 2008, use the shared lookup cache: create a Cache Transform and a cache connection manager
4. Use a SQL query for the reference table
5. Use a WHERE condition in the SQL query of the Lookup T/F.
6. When your input row count is large, use a partial cache or full cache lookup to improve performance.
7. A no-cache lookup is a row-by-row operation
8. A partial cache builds the cache while the lookup T/F is executing, and also comes with a high transaction impact. A partial cache approach is a viable solution if you have a large number of input rows.
9. However, the best option is to use a full cache lookup and filter the reference table (see the sketch after this list).
10. Apply an 80/20 rule and load one lookup T/F with the 20 percent most common matching records, which will allow 80 percent of the matches to hit the full cache.
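
A minimal sketch of a filtered reference query for a full-cache lookup (dbo.DimProduct and the IsCurrent flag are hypothetical):

-- Only the join key and the returned column, filtered so the cache
-- holds just the rows the input is expected to match
SELECT ProductKey, ProductName
FROM   dbo.DimProduct
WHERE  IsCurrent = 1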

Transform Level:-

Different transformations:

1. Row based (synchronous)
  •    Logically works row by row
  •    Examples: Data Conversion, Derived Column
  •    Buffers are reused
2. Partial blocking (asynchronous)
  •     Works with groups of rows
  •     Examples: Merge, Merge Join, Union All, Lookup
  •     Data is copied to new buffers
3. Blocking (asynchronous)
  •     Needs all input rows before producing any output rows
  •     Examples: Aggregate, Sort, Pivot and Unpivot
  •     Data is copied to new buffers
Data Types:
1. Make data types as narrow as possible so you allocate less memory for your transformations
2. Do not perform excessive casting of data types
  •   It will degrade performance
  •   Cast source types at the database using CAST/CONVERT functions wherever possible
Push work up or down wherever possible:
1. Data flow transformations can often be optimized
  • Sort: push sorting into the source query when possible, instead of using the Sort transformation for cross-database joins.
  • Use the IsSorted and SortKeyPosition properties to mark OLE DB source data as already sorted instead of using the Sort transformation.
  • Use a MERGE SQL statement instead of SCD and Lookup T/Fs for handling slowly changing dimension data.
  • Use GROUP BY in the SQL source query instead of the Aggregate T/F (see the sketch after this list).
  • Use an INSERT INTO ... SELECT statement instead of a data flow task when source and destination are on a single instance
  • Evaluate a DELTA (incremental) load vs a full RELOAD of the data
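
For example, aggregating at the source (dbo.TeamMembers is hypothetical):

-- Aggregate in the source query instead of an Aggregate transformation
SELECT   TeamID, COUNT(*) AS MemberCount
FROM     dbo.TeamMembers
GROUP BY TeamID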
Loading Level:-

1. Use the SQL Server destination 
  •  only when the package and SQL Server run on the same machine
  •  its error handling is weaker than the OLE DB destination's
2. Set the maximum insert commit size to 0 for the fastest load (the whole batch commits as a single transaction)
3. Drop some indexes based on the load growth percentage
4. Load data into partitioned tables
5. Use TRUNCATE TABLE instead of a DELETE statement

Other options to consider:

1. BLOBTempStoragePath: where Binary Large Object (BLOB) data is spooled
2. BufferTempStoragePath: change BufferTempStoragePath and BLOBTempStoragePath to drive locations that have been optimized, and ensure they are not on the C:\ system drive.
3. DefaultBufferMaxRows - 10,000 means no single buffer can hold more than 10,000 rows
4. DefaultBufferSize - 10,485,760 specifies the number of bytes that a single buffer cannot exceed.
If a single row is 1,200 bytes, then max rows (10,000) times row width (1,200) gives
10,000 * 1,200 = 12,000,000 bytes, which is greater than the default buffer size, so the engine lowers the rows per buffer.
5. EngineThreads - 20
6. RunInOptimizedMode - True - ignores unused source columns, destination columns and any T/Fs.
I hope the points above help you when you are trying to improve performance at the Data Flow task level.

Create a Comma Delimited List from a Column in a Table

Hi,

Last week I was working on concatenating column values from a table into a single string separated by semicolons. I found a good script for this; it is pretty fast compared with the other examples I tried. Let's see how it looks:
Scenario:


ID  TeamID  FirstName
1   1       Maruthi
2   1       Siva
3   1       Prasad
4   2       Parviz
5   2       Javadian
6   2       P
7   3       Amit
8   3       Kumar

Output:

TeamID  MemberList
1       Maruthi; Prasad; Siva
2       Javadian; P; Parviz
3       Amit; Kumar

Implementation:

-- Create a table with Sample Data 
CREATE TABLE CommaSeparated
   (  ID    int IDENTITY,
      TeamID      int, 
      FirstName   varchar(50)
   )

-- Load Sample Data
INSERT INTO CommaSeparated VALUES ( 1, 'Maruthi' )
INSERT INTO CommaSeparated VALUES ( 1, 'Siva' )
INSERT INTO CommaSeparated VALUES ( 1, 'Prasad' )
INSERT INTO CommaSeparated VALUES ( 2, 'Parviz' )
INSERT INTO CommaSeparated VALUES ( 2, 'Javadian' )
INSERT INTO CommaSeparated VALUES ( 2, 'P' )
INSERT INTO CommaSeparated VALUES ( 3, 'Amit' )
INSERT INTO CommaSeparated VALUES ( 3, 'Kumar' )

Select * from CommaSeparated

--Retrieve data as semicolon-separated lists
SELECT
   t1.TeamID,
   MemberList = SUBSTRING((SELECT '; ' + FirstName
                           FROM CommaSeparated t2
                           WHERE t1.TeamID = t2.TeamID
                           ORDER BY FirstName
                           FOR XML PATH( '' )
                          ), 3, 1000 )
FROM CommaSeparated t1
GROUP BY t1.TeamID

-----------------------------------------------------------------------------------
Another way to implement the same logic:
-- But the query below will be very slow if you have a lot of data in the table.


WITH FinalCommaSeparated AS
(
   SELECT *, ROW_NUMBER() OVER (PARTITION BY TeamID ORDER BY FirstName) rownum
   FROM CommaSeparated
)
SELECT DISTINCT TeamID,
   (SELECT FirstName
           + CASE WHEN s1.rownum = (SELECT MAX(rownum) FROM FinalCommaSeparated WHERE TeamID = s1.TeamID)
                  THEN '' ELSE '; ' END
    FROM FinalCommaSeparated s1
    WHERE s1.TeamID = s2.TeamID
    ORDER BY s1.rownum
    FOR XML PATH(''), TYPE).value('(.)[1]', 'nvarchar(max)') MemberList
FROM FinalCommaSeparated s2
------------------------------------------------------------------------------------
Added one more way to do the same by using STUFF function in SQL:


SELECT
   t1.TeamID,
   MemberList = STUFF((SELECT '; ' + FirstName
                       FROM CommaSeparated t2
                       WHERE t1.TeamID = t2.TeamID
                       ORDER BY FirstName
                       FOR XML PATH( '' )
                      ), 1, 2, '' )
FROM CommaSeparated t1
GROUP BY t1.TeamID

Foreach Loop Container, Control Flow, SSIS


Foreach Loop Container:
The Foreach Loop container is used to loop dynamically over files in SSIS. For example, we can retrieve 10 Excel sheets, 10 text files, or files of any type from an existing folder. We just need to pass the folder path and specify the file extensions to match, such as *.xls or *.txt.
Briefly, about the Foreach Loop container…
The Foreach Loop container defines a repeating control flow in a package. The loop implementation is similar to Foreach looping structure in programming languages. In a package, looping is enabled by using a Foreach enumerator. The Foreach Loop container repeats the control flow for each member of a specified enumerator.
SQL Server 2005 Integration Services (SSIS) provides the following enumerator types:
  • Foreach File enumerator to enumerate files in a folder. The enumerator can traverse subfolders. For example, you can read all the files that have the *.log file name extension in the Windows folder and its subfolders.
  • Foreach from Variable enumerator to enumerate the enumerable object that a specified variable contains. The enumerable object can be an array, an ADO.NET DataTable, an Integration Services enumerator, and so on. For example, you can enumerate the values of an array that contains the names of servers.
  • Foreach Item enumerator to enumerate items that are collections. For example, you can enumerate the names of executables and working directories that an Execute Process task uses.
  • Foreach Nodelist enumerator to enumerate the result set of an XML Path Language (XPath) expression. For example, this expression enumerates and gets a list of all the authors in the classical period: /authors/author [@period='classical'].
The Foreach loop uses the Foreach File enumerator, and the File System task is configured to copy a file. If the folder that the enumerator specifies contains four files, the loop repeats four times and copies four files.
You can use a combination of variables and property expressions to update the property of the package object with the enumerator collection value. First you map the collection value to a user-defined variable, and then you implement a property expression on the property that uses the variable. For example, the collection value of the Foreach File enumerator is mapped to a variable called MyFile and the variable is then used in the property expression for the Subject property of a Send Mail task. When the package runs, the Subject property is updated with the name of a file each time that the loop repeats. For more information, see Using Property Expressions in Packages.
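A hedged sketch of such a property expression on the Send Mail task's Subject property (the User::MyFile variable name is just the example from the paragraph above):

"Current file: " + @[User::MyFile]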
Variables that are mapped to the enumerator collection value can also be used in expressions and scripts.
A Foreach Loop container can include multiple tasks and containers, but it can use only one type of enumerator. If the Foreach Loop container includes multiple tasks, you can map the enumerator collection value to multiple properties of each task.