Informatica Performance Tuning (Beginner) (8)
Published: 2021-06-06
optimizer loses track of the index statistics. Again - utilize staging tables if possible. When using staging tables, you can build views in the database that join the data together, or you can use Informatica's Joiner object to join it - either one will help dramatically increase speed.
11. Separate complex maps - try to break the maps out into logical threaded sections of processing. Re-arrange the architecture if necessary to allow for parallel processing. There may be more, smaller components doing individual tasks; however, the throughput will be proportionate to the degree of parallelism that is applied. A discussion on HOW to perform this task is posted on the methodologies page; please see that discussion for further details.
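As a rough illustration of the idea in item 11 (not the method from the methodologies page, which is not reproduced here), the Python sketch below splits one large job into independent slices and runs one small component per slice. The names `load_partition`, `split`, and `run_parallel` are hypothetical, and the transformation is a placeholder. Threads in CPython mainly help when each component spends its time waiting on I/O such as database reads and writes.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(rows):
    """Hypothetical stand-in for one small mapping: transform one slice of rows."""
    return [{"id": r["id"], "amount_cents": round(r["amount"] * 100)} for r in rows]


def split(rows, parts):
    """Split rows into `parts` roughly equal slices that share no rows."""
    return [rows[i::parts] for i in range(parts)]


def run_parallel(rows, degree):
    """Run one load_partition call per slice, up to `degree` at a time."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        results = pool.map(load_partition, split(rows, degree))
        # Flatten the per-slice results back into a single list.
        return [row for part in results for row in part]
```

The slices must be independent of each other for this to work; rows that need to be processed together (for example, all rows of one aggregate group) belong in the same slice.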
12. BALANCE. Balance between Informatica and the power of SQL and the database. Try to utilize the DBMS for what it was built for: reading/writing/sorting/grouping/filtering data en masse. Use Informatica for the more complex logic, outside joins, data integration, multiple source feeds, etc. The balancing act is difficult without DBA knowledge. In order to achieve a balance, you must be able to recognize which operations are best done in the database and which are best done in Informatica. This does not detract from the use of the ETL tool, rather it enhances it - it's a MUST if you are performance tuning for high-volume throughput.
13. TUNE the DATABASE. Don't be afraid to estimate: small, medium, large, and extra large source data set sizes (in terms of number of rows and average number of bytes per row), expected throughput for each, turnaround time for the load, and whether it is a trickle feed. Give this information to your DBAs and ask them to tune the database for the "worst case". Help them assess which tables are expected to be high read/high write, which operations will sort (order by), etc. Moving disks and assigning the right table to the right disk space could make all the difference. Utilize a PERL script to generate "fake" data for small, medium, large, and extra large data sets. Run each of these through your mappings - in this manner, the DBA can watch or monitor throughput as a real load size occurs.
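The fake-data step in item 13 could be sketched as follows. The text suggests a Perl script; this sketch uses Python instead. The row counts in `SIZES`, the three-column layout, and the function names are all assumptions made for illustration, so replace them with your own estimates and your real source layout.

```python
import csv
import random
import string

# Hypothetical row counts per size class; substitute your own estimates.
SIZES = {"small": 1_000, "medium": 100_000, "large": 1_000_000, "xlarge": 10_000_000}


def fake_rows(n_rows, seed=0):
    """Yield n_rows of made-up [id, name, amount] records, repeatable per seed."""
    rng = random.Random(seed)
    for i in range(1, n_rows + 1):
        name = "".join(rng.choices(string.ascii_uppercase, k=12))
        amount = round(rng.uniform(0, 10_000), 2)
        yield [i, name, amount]


def write_fake_file(path, n_rows, seed=0):
    """Write a CSV file with a header line followed by n_rows fake records."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "amount"])
        writer.writerows(fake_rows(n_rows, seed))
    return path
```

For example, `write_fake_file("fake_small.csv", SIZES["small"])` produces the small data set. Because rows are generated one at a time, the larger sizes do not need to fit in memory.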
14. Be sure there is enough SWAP and TEMP space on your PMSERVER machine. Not having enough disk space could potentially slow down your entire server during processing (in an exponential fashion). Sometimes this means watching the disk space while your session runs; otherwise you may not get a good picture of the space available during operation. This matters particularly if your maps contain aggregates or lookups that flow to the disk cache directory, or if you have a JOINER object with heterogeneous sources.
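One way to watch disk space while a session runs, as item 14 suggests, is a small polling script. This is a minimal sketch: the function names, the sampling interval, and the paths you pass in (for example your cache and temp directories) are assumptions, and it reports only filesystem free space, not swap usage.

```python
import shutil
import time


def free_gib(path):
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def watch_free_space(paths, interval_s=30, samples=10, sleep=time.sleep):
    """Sample free space for each path; return the lowest value seen per path."""
    lowest = {p: float("inf") for p in paths}
    for n in range(samples):
        for p in paths:
            free = free_gib(p)
            lowest[p] = min(lowest[p], free)
            print(f"{time.strftime('%H:%M:%S')} {p}: {free:.1f} GiB free")
        if n < samples - 1:
            sleep(interval_s)  # wait before taking the next sample
    return lowest
```

Start it just before the session and compare the returned minimums with the free space at rest; a large gap points to cache or temp files spilling to disk during the run.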
15. Place some good server load monitoring tools on your PMServer in development - watch it closely to understand how the resources are being utilized, and where the hot spots are. Try to follow the tools' recommendations - it may mean upgrading the hardware to achieve throughput. Look into EMC's disk storage array - while expensive, it appears to be extremely fast. I've heard (but not verified) that it has improved performance in some cases by up to 50%.
16. SESSION SETTINGS. In the session, there is only so much tuning you can do. Balancing the throughput is important - by turning on "Collect Performance Statistics" you can get a good feel for what needs to be set in the session - or what needs to be changed in the database. Read the performance section carefully in the Informatica manuals. Basically what you should try to achieve is: OPTIMAL READ, OPTIMAL THROUGHPUT, OPTIMAL WRITE. Over-tuning one of these three pieces can result in ultimately slowing down your session. For example: your write throughput is governed by your read and transformation speed; likewise, your read throughput is governed by your transformation and write speed. The best method to tune a problematic map is to break it into components for testing: 1) Read Throughput, tune for the