How to integrate Informatica Data Quality (IDQ) with Informatica MDM

Overview

Data cleansing and standardization is an important aspect of any Master Data Management (MDM) project. Informatica MDM Multi-Domain Edition (MDE) provides reasonable number of cleanse functions out-of-the-box. However, there are requirements when the OOTB cleanse functions are not enough and there is a need for comprehensive functions to achieve data cleansing and standardization, for e.g. address validation, sequence generation. Informatica Data Quality (IDQ) provides an extensive array of cleansing and standardization options. IDQ can easily be used along with Informatica MDM.

This blog post describes the various options to integrate Informatica MDM and IDQ, explains the advantages and disadvantages of each approach to aid in deciding the optimal approach based on the requirements.

Informatica MDM-IDQ Integration Options

There are three options through which IDQ can be integrated with Informatica MDM.

  1. Informatica Platform staging
  2. IDQ Cleanse Library
  3. Informatica MDM as target

Option 1: Informatica Platform Staging

Starting with Informatica MDM’s Multi-Domain Edition (MDE) version 10.x, Informatica has introduced a new feature called “Informatica Platform Staging” within MDM to integrate with IDQ (Developer Tool). This feature enables to directly stage/cleanse data using IDQ mappings to MDM’s Stage tables bypassing Landing tables.

Figure 1: Informatica Platform Staging Process

Advantages

  • Stage tables are immediately available to use in the Developer tool after synchronization eliminating the need to manually create physical data objects.
  • Changes to the synchronized structures are reflected into the Developer tool automatically.
  • Enables loading data into Informatica MDM’s staging tables bypassing the landing tables.

Disadvantages

  • Creating connection for each Base Object folder in Developer tool can be cumbersome to maintain.
  • Hub Stage options like Delta detection, hard delete detection, and audit trails are not available.
  • System generated columns need to be populated manually.
  • Rejected records are not captured in _REJ table of the corresponding stage table but get captured in .bad file.
  • Invalid lookup values are not rejected while data loads to stage unlike in Hub Stage Process. The record with invalid value gets rejected and captured by the Hub Load process.

Option 2: IDQ Cleanse Library

IDQ allows creating functions as operation mappings and deploy them as web service which can then be imported in Informatica MDM Hub implementation as a new type of cleanse library defined as IDQ cleanse library. This functionality allows usage of the imported IDQ cleanse functions, just like any other out-of-the-box cleanse function. Informatica MDM Hub acts as a Web service client application that consumes IDQ’s web services.

Figure 2: Hub Stage Process design using IDQ imported function

Advantages

  • Easily build transformations in IDQ’s Informatica Developer tool rather than creating complex java functions.
  • Unlike Informatica Platform staging, Hub Stage process options such as delta detection, hard delete detection, audit trail are available for use.

Disadvantages

  • Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.
  • IDQ function must contain all transformation logic to leverage batching of records. If any transformation logic is additionally defined in the MDM map, then calls to the IDQ web service will be single record leading to performance issue.
  • Web service invocations are synchronous only, which can be a concern for large data volume.

Option 3: Informatica MDM as target

3.1 Loading data landing tables

Informatica MDM can be used as target for loading the data to landing tables in Informatica MDM.

Figure 3: Using IDQ as ETL tool for loading data into landing table

Advantages

  • Single connection created in Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.
  • No need to standardize data in the Hub Stage Process.
  • Unlike Informatica Platform staging, Hub Stage process options – delta detection, hard delete detection, audit trail are available to use.

Disadvantages

  • Physical data objects need to be manually created for each landing table and manually updated for any changes to the table.
  • Need to create mappings at two levels (i) source to landing and (ii) landing to staging (direct mapping).

3.2 Loading data staging tables (bypassing landing tables)

Informatica MDM can be used as target for loading the directly to staging tables in Informatica MDM, bypassing landing tables.

Figure 4: Using IDQ as ETL tool to load data directly into staging tables

Advantages

  • Single connection created in Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.
  • Can be used for lower version of Informatica MDM where Informatica Platform staging option is not available.

Disadvantages

  • Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.
  • Hub Stage Delta detection, hard delete detection, and audit trails options are not available.
  • System generated columns need to be populated manually.
  • Rejected records are not captured in _REJ table of the corresponding stage table but get captured in .bad file.
  • Invalid lookup values are not rejected while data loads to stage unlike in Hub Stage Process. The record with invalid value gets rejected and captured by the Hub Load process.

Conclusion

While there are multiple options to integrate IDQ with Informatica MDM, based on client’s requirements, the ideal approach is analyzed and recommended.

About the Author

Author: Meenatchi D.

Meenatchi D. is a tech savvy Consultant with widespread experience in Data Integration (DI) and Business Intelligence (BI) space and worked on leading tools from IBM, Informatica and Microsoft. She also has sound knowledge of Master Data Management (MDM) concepts and is trained in IBM and Informatica MDM.

Co-Author: Sachin Dedhia

Sachin Dedhia is an Architect at Mastech InfoTrellis and has 13 plus years of extensive experience in design and development of Data Integration (DI), Business Intelligence (BI) and Master Data Management (MDM) solutions. He leads the Informatica MDM practice providing his expertise on the projects, designing internal trainings, assets and accelerators.

Data Quality, IDQ, Informatica Data Quality, Informatica IDQ, Informatica MDM, Landing Table, MDE, MDM, Multi-Domain Edition, Staging Table

Leave a Reply

Your email address will not be published. Required fields are marked *