Informatica MDM MDE Batch Process in a nutshell


Data is siloed across a wide variety of platforms in an enterprise environment, and it needs to be processed, cleansed, and mastered to ensure it is the same across source systems for effective reporting and analysis. To cater to this need, Informatica provides a Master Data Management (MDM) product called Multi-Domain Edition (MDE). To master data in this tool, it must first be loaded into the Informatica MDM Hub. In Informatica MDM, data can be loaded in two different modes: (i) Batch and (ii) Real-time.

In this blog we will delve into Batch Processing for loading data into the Informatica MDM Hub Store, and the various features and options that are available. Before diving into the batch process, let’s understand how data is organized in the Informatica MDM Hub Store.

About MDM Hub Store

Informatica MDM Hub follows a three-layered approach, typically seen in an Extract, Transform and Load (ETL) process, to load the data. The layers are:

  1. Landing – this forms a pre-staging layer; tables in this layer receive batch loads from multiple source systems.
  2. Staging – tables in this layer hold cleansed and standardized data before it is loaded to the final target.
  3. Target – tables in this layer, referred to as Base Objects (BO), contain the master data or golden records.

These tables and other related objects, along with the master data, content metadata, rules for processing the source data, and rules for defining the master data, are stored in a database called the Operational Reference Store (ORS).

Figure 1: Informatica Hub Store

Batch Process

Informatica MDM Hub provides a sequence of steps and options to load the data, maintain data lineage and other metadata for effective processing, and finally consolidate the data to build the Best Version of Truth (BVT). Let’s look at the batch processes at a high level and understand the features they offer.

Land Process

The Land Process is external to Informatica MDM Hub and is executed using external batch processes or external applications that directly populate the landing tables in the Hub Store.

The Land Process is optional and can be skipped, along with other Land Process options, as discussed in our blog, How to integrate Informatica Data Quality (IDQ) with Informatica MDM.

Figure 2: Land Process

Stage Process

The Stage Process transfers source data from a landing table to the source-specific staging table associated with a base object. The movement of data from the landing table to the staging table is defined using Mappings offered by Informatica MDM Hub.

Figure 3: Stage Process


Mappings define the movement of data from the source column in the landing table to the target column in the staging table. Data from the landing table is cleansed and standardized before it is loaded to the staging table. The cleansing and standardization can be done within Informatica MDM Hub or outside it, using external tools like Informatica Data Quality (IDQ) or Informatica PowerCenter.

Figure 4: Landing Table to Staging Table Mapping
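To make the idea of a mapping concrete, here is a minimal Python sketch — purely conceptual, not actual MDM Hub code — of how a mapping pairs each staging column with a landing column plus a cleanse function. All column and function names here are hypothetical.

```python
def cleanse_name(value):
    """Standardize a name: collapse whitespace and normalize case."""
    return " ".join(value.split()).title()

def cleanse_phone(value):
    """Keep digits only, a common phone-standardization step."""
    return "".join(ch for ch in value if ch.isdigit())

# Mapping: staging column -> (landing column, cleanse function)
MAPPING = {
    "FULL_NAME": ("CUST_NAME", cleanse_name),
    "PHONE_NUM": ("CUST_PHONE", cleanse_phone),
}

def stage_record(landing_row):
    """Apply the mapping to one landing-table row."""
    return {
        stg_col: fn(landing_row[src_col])
        for stg_col, (src_col, fn) in MAPPING.items()
    }
```

In the real Hub, the equivalent of the cleanse functions would be cleanse functions configured in the mapping, or transformations performed upstream in IDQ or PowerCenter.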

Reject Records

During the Stage Process, records that have problems are rejected and transferred, along with the reason for rejection, to the Reject Table associated with the Staging Table. If there are multiple reasons for rejection, only the first reason is persisted.
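The "first reason only" behavior can be illustrated with a short Python sketch (hypothetical validation logic, not Hub internals); `PKEY_SRC_OBJECT` and `LAST_UPDATE_DATE` are standard staging-table columns, but the checks themselves are just examples.

```python
def validate(record):
    """Return a list of rejection reasons for a staging record."""
    reasons = []
    if not record.get("PKEY_SRC_OBJECT"):
        reasons.append("Missing primary key")
    if not record.get("LAST_UPDATE_DATE"):
        reasons.append("Missing last-update date")
    return reasons

def stage_with_rejects(records):
    """Split records into accepted rows and reject-table rows."""
    accepted, rejects = [], []
    for rec in records:
        reasons = validate(rec)
        if reasons:
            # Even if several checks failed, only the FIRST reason
            # is persisted with the rejected record.
            rejects.append({**rec, "REJECT_REASON": reasons[0]})
        else:
            accepted.append(rec)
    return accepted, rejects
```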

Advantage of this feature

This feature helps investigate the rejected records after running the Stage Process. As it is built into Informatica MDM, it saves considerable development and implementation time.

Audit Trail

The Stage Process can maintain a copy of the source data in a table (called the RAW table) associated with each staging table. This happens only if audit trail is enabled for the staging table, for a configurable number of stage job runs or a retention period.
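Conceptually, the RAW table behaves like a rolling archive of raw landing rows per stage run. The following Python sketch is a hypothetical illustration of that retention behavior (the retention count here is invented):

```python
from collections import deque

RETAIN_RUNS = 2  # hypothetical retention: keep only the last two stage runs

class RawTable:
    """Keeps an untouched copy of landing data for recent stage runs."""
    def __init__(self):
        # deque with maxlen silently drops the oldest run when full
        self.runs = deque(maxlen=RETAIN_RUNS)

    def record_run(self, run_id, rows):
        self.runs.append((run_id, list(rows)))

    def run_ids(self):
        return [run_id for run_id, _ in self.runs]
```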

Advantage of this feature

This feature is useful for auditing purposes and for tracing data issues and missing data back to the source. Informatica MDM Hub provides it with an easy configuration, saving considerable development and implementation time.

Delta Detection

The Stage Process can detect the records in the landing table that are new or updated if delta detection is enabled for a staging table. The comparison for delta detection can be configured on all columns, on a specific date column, or on a specific set of columns, as required.
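The column-based comparison can be sketched as follows — a hypothetical Python illustration, where the key and comparison columns are invented names, not Hub configuration:

```python
COMPARE_COLUMNS = ["NAME", "PHONE"]  # the configured comparison columns

def detect_delta(previous, incoming, key="PKEY"):
    """Return only the incoming rows that are new or changed."""
    prior = {row[key]: row for row in previous}
    delta = []
    for row in incoming:
        old = prior.get(row[key])
        if old is None:
            delta.append(row)  # new record, not seen in the prior run
        elif any(row[c] != old[c] for c in COMPARE_COLUMNS):
            delta.append(row)  # changed in at least one compared column
    return delta
```

Unchanged rows fall through both branches and are excluded, which is exactly how delta detection trims a full extract down to changed data.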

Advantage of this feature

This feature is useful when the source system sends a full dataset: it limits the volume to changed data only and thus improves performance in the subsequent processes.

Load Process

The Load Process moves data from the staging table to the corresponding base object. In addition, the Load Process performs lookups, computes the trust scores used for merging data, and runs the validation rules. Each base object has tables associated with it to capture data lineage, track the history of changes, and hold other details.

Figure 5: Load Process
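A toy illustration of trust-based survivorship during load: when the same attribute arrives from several sources, the value from the most trusted source wins. The source names and trust scores below are hypothetical, and real MDM trust is richer (decay over time, validation downgrades), so treat this only as the core idea.

```python
# Hypothetical trust score per source system (higher = more trusted)
TRUST = {"CRM": 80, "ERP": 60, "LEGACY": 40}

def survive(cells):
    """Pick the winning value from (source, value) candidates."""
    source, value = max(cells, key=lambda sv: TRUST[sv[0]])
    return value
```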

Reject Records

During the Load Process, records are rejected and transferred, along with the reason for rejection, to the Reject Table associated with the Base Object’s Staging Table for that source system.

Advantage of this feature

This feature uses a central reject table that maintains reject records for both the Stage and Load Processes. Informatica knowledge base article KB 90407 provides a good insight into the reject handling process.

Tokenize Process

The Tokenize Process generates match tokens and stores them in a match key (strip) table associated with the base object. The match tokens are subsequently used by the Match Process to identify suspects for matching.

Figure 6: Tokenize Process
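The point of tokenization is to avoid comparing every record against every other record: similar values should produce the same key, so candidates can be found by key lookup. The real tokens are generated by Informatica's SSA-NAME3 engine and are far more sophisticated; the Python sketch below uses a deliberately toy token (uppercase, vowels stripped after the first letter) just to show the strip-table idea.

```python
def match_token(name):
    """A toy match token: uppercase letters, vowels dropped after the first."""
    s = "".join(ch for ch in name.upper() if ch.isalpha())
    return s[:1] + "".join(ch for ch in s[1:] if ch not in "AEIOU")

def build_strip_table(records):
    """Map token -> record ids, mimicking a match key (strip) table."""
    strip = {}
    for rec in records:
        strip.setdefault(match_token(rec["NAME"]), []).append(rec["ROWID"])
    return strip
```

Records sharing a token (e.g. "Smith" and "Smeeth") land in the same bucket and become suspects for the Match Process to score.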

Match & Consolidate Process

Matching is the process of identifying whether two records are similar or duplicates of each other, by either exact (deterministic) or fuzzy (probabilistic) matching.

Consolidation is the process of consolidating data from matched records into a single master record once the match pairs, or suspects, have been identified in the Match Process. Records are flagged for auto merge if the suspects are sufficiently similar, or sent for manual merge if the suspects are only likely to be duplicates.

Figure 7: Match & Consolidate Process
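The auto-merge versus manual-merge decision boils down to comparing a match score against two thresholds. Here is a minimal Python sketch of that decision; the similarity measure (Python's `difflib`) and both threshold values are illustrative stand-ins, not Informatica's actual match engine or defaults.

```python
from difflib import SequenceMatcher

AUTO_MERGE = 0.90    # hypothetical: above this, suspects auto merge
MANUAL_MERGE = 0.70  # hypothetical: above this, suspects go to manual review

def match_decision(a, b):
    """Score a candidate pair and route it to a merge queue."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= MANUAL_MERGE:
        return "manual_merge"
    return "no_match"
```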


The following figure provides a detailed perspective on the overall flow of data through the Informatica MDM Hub using batch processes, including individual processes, source systems, base objects, and support tables.

Figure 8: Informatica MDM MDE Batch Process (Image source: Informatica)

In Informatica MDM, Batch mode is often used for the initial data load (the first time business data is loaded), as it can be the most efficient way to populate a large number of records. Batch mode provides the necessary options and logic for loading the data and for maintaining data lineage and other metadata used to process it.

About the Author

Author: Sathish Kumar

Sathish Kumar is a Senior Consultant at Mastech InfoTrellis with over six years of experience in the development and implementation of ETL and MDM solutions. He has good knowledge of various Informatica products, including PowerCenter, IDQ, Informatica Cloud, and Informatica MDM MDE.

Co-Author: Sachin Dedhia

Sachin Dedhia is an Architect at Mastech InfoTrellis with over 13 years of extensive experience in the design and development of Data Integration (DI), Business Intelligence (BI), and Master Data Management (MDM) solutions. He leads the Informatica MDM practice, providing his expertise on projects and designing internal trainings, assets, and accelerators.

