Friday, July 10, 2009

Four Key Data Pipeline Best Practice Considerations

Here are four key considerations when developing feeds, to get the most value out of BPM Connect and to adhere to best practices.
  • No Stored Procedures - These are evil. While seductive on the surface during the initial stages of feed development, they undermine portability and become a maintenance and management nightmare in the long run.
  • Minimize use of the REMOTESTAGE option in the Raw Data tab - While a step above stored procedures in terms of portability, remote staging is similarly prone to maintenance and management problems, and it allows the developer to bypass the Data Pipeline best practices. It is sometimes an unavoidable necessity in complex scenarios; however, in 95% of cases (anything other than VERY complex models) it should not be used.
  • Use "Select * from" in the Raw Data tab - This should be used as much as possible. Allow the data map to define the column headers. This simplifies and minimizes code DRAMATICALLY, and makes maintenance quick, easy and stress free. It also allows the feeds to act more dynamically in complex, changing environments. Using GROUP BY statements in the EXTRACT query is not recommended: it puts unnecessary workload on the source database, increases code complexity, and creates redundant optimization (optimization occurs as a natural part of the data pipeline).
  • Minimize use of Linked Servers - Linked servers are NOT efficient, and add HUGE amounts of performance drag. As with remote staging, linked servers should be used only in exceptional scenarios (e.g. XP command shell operations like FTP direct transfers, or 64-bit driver accommodations). Direct use of an ODBC driver is always the desired approach.
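
To make the "Select * from" point concrete, here is a rough sketch in Python of the general pattern: a plain extract with no GROUP BY, a data map that decides which columns survive and what they are called, and aggregation done downstream of the source. This is not BPM Connect code. The table name, column names, DATA_MAP and all three helper functions are made up for illustration, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data map: source column name -> target column header.
DATA_MAP = {"srv": "server", "cpu_pct": "cpu_utilization"}


def extract(conn, table):
    """Raw extract: a plain SELECT * with no GROUP BY, so the source does no aggregation.

    The table name is interpolated directly, so it must come from trusted configuration.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def apply_data_map(rows, data_map):
    """Keep only the mapped columns and rename them; unmapped source columns are dropped."""
    return [{target: row[source] for source, target in data_map.items()} for row in rows]


def average_by(rows, key, value):
    """Aggregate downstream of the extract instead of inside the source query."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        totals[row[key]][0] += row[value]
        totals[row[key]][1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


# Stand-in source database with one column ("note") that the data map ignores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_metrics (srv TEXT, cpu_pct REAL, note TEXT)")
conn.executemany(
    "INSERT INTO raw_metrics VALUES (?, ?, ?)",
    [("a", 10.0, "x"), ("a", 30.0, "y"), ("b", 50.0, "z")],
)

mapped = apply_data_map(extract(conn, "raw_metrics"), DATA_MAP)
averages = average_by(mapped, "server", "cpu_utilization")
```

Because the query never names columns, a new column appearing in the source table changes nothing here until someone adds it to the data map, which is the maintenance benefit described above.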
