Posted by sathishbaskaran on Tuesday, May 12, 2015 @ 9:43 AM

MDM BatchProcessor is a multi-threaded J2SE client application used in most MDM implementations to load large volumes of enterprise data into MDM during initial and delta loads. Oftentimes, processing large volumes of data causes performance issues during the batch processing stage, bringing down the TPS (Transactions per Second).

Poor performance of the batch processor often disrupts the data load process and impacts the go-live plans. Unfortunately, there is no panacea available for this common problem. Let us help you by highlighting some of the potential root causes that influence the BatchProcessor performance. We will be suggesting remedies for each of these bottlenecks in the later part of this blog.

Infrastructure Concerns

Any complex, business-critical enterprise application needs careful planning, well ahead of time, to achieve optimal performance, and MDM is no exception. During the development phase it is perfectly fine to host MDM, the DB Server and BatchProcessor all on one physical server. But the world doesn’t stop at development. The sheer volume of data MDM will handle in production demands a carefully thought-out infrastructure plan. Besides, when these applications run in shared environments, profiling, benchmarking and debugging become a tedious affair.

CPU Consumption

BatchProcessor can consume a lot of precious CPU cycles in the most trivial of operations when it is not configured properly. Keeping an eye out for persistently high CPU consumption and sporadic surges is vital to ensure CPU is used optimally by BatchProcessor.


Deadlocks

Deadlock is one of the frequent issues encountered during batch processing in multi-threaded mode. Increasing the submitter thread count beyond the recommended value might lead to deadlocks.

Stale Threads

As discussed earlier, a poorly configured BatchProcessor might open up Pandora’s box. Stale threads can be a side-effect of thread count misconfiguration in BatchProcessor. Increasing the submitter, reader and writer thread counts beyond the recommended numbers may cause some of the threads to wait indefinitely, wasting precious system resources.

100% CPU Utilization

“Cancel Thread” is one of the BatchProcessor daemon threads, designed to gracefully shut down BatchProcessor when the user requests it. Being a daemon thread, it is alive throughout the natural lifecycle of the BatchProcessor. The catch is that it can hog nearly 90% of CPU cycles for a trivial operation, bringing down overall performance.

Let us have a quick look at the UserCancel thread in the BatchProcessor client. The thread waits indefinitely for user interruption, checking for it every 2 seconds while holding on to the CPU the whole time.

Thread thread = new Thread(r, "Cancel");

// Simplified reconstruction of the loop: poll stdin for 'q'/'Q',
// sleeping 2 seconds between checks
while (!controller.isShuttingDown()) {
    try {
        int i = System.in.read(); // -1 when no input is available
        if (i == -1) {
            try {
                Thread.sleep(2000L); // check again in 2 seconds
            } catch (InterruptedException e) {
            }
        } else {
            char ch = (char) i;
            if ((ch == 'q') || (ch == 'Q')) {
                controller.shutDown(); // user asked for a graceful stop
            }
        }
    } catch (IOException iox) {
    }
}


BatchProcessor Performance Optimization Tips

We have so far discussed potential bottlenecks in running BatchProcessor at optimal levels. Best laid plans often go awry; what is worse is not having a plan at all. A well thought-out plan needs to be in place before going ahead with a data load. Now, let us discuss some useful tips that could help improve performance during the data load process.

Infrastructure topology

For better performance, run the MDM application, DB Server and BatchProcessor client on different physical servers. This helps leverage the system resources better.

Follow the best thread count principle

If there are N physical CPUs available to the IBM InfoSphere MDM Server that caters to BatchProcessor, then the number of submitter threads in BatchProcessor should be configured between 2N and 3N.

For example, assume the MDM server has 8 CPUs; then start profiling the BatchProcessor by varying its submitter thread count between 16 and 24. Do the number crunching, keep an eye on resource consumption (CPU, memory and disk I/O) and settle on a thread count that yields optimal TPS in MDM.
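As a rough sketch of the 2N–3N guideline above, the starting range can be derived from the CPU count. The class and method names here are illustrative, and the bounds are only a profiling starting point, not a guarantee of best TPS:

```java
// Sketch: derive a starting range for the submitter thread count from
// the 2N-3N rule discussed above. Names are illustrative only.
public class SubmitterThreadPlanner {

    // Returns {lower, upper} = {2N, 3N} for N physical CPUs on the MDM server.
    public static int[] recommendedSubmitterRange(int mdmServerCpus) {
        return new int[] { 2 * mdmServerCpus, 3 * mdmServerCpus };
    }

    public static void main(String[] args) {
        int cpus = 8; // the 8-CPU example from the text
        int[] range = recommendedSubmitterRange(cpus);
        System.out.println("Profile Submitter.number between "
                + range[0] + " and " + range[1]);
    }
}
```

Profile within this range and pick the count that yields the best TPS for your data and hardware.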


You can modify the Submitter.number property in the BatchProcessor configuration to change the submitter thread count.

For example:

Submitter.number = 4

Running Multiple BatchProcessor application instances

If the MDM server is beefed up with enough resources to handle a huge number of parallel transactions, we should consider parallelizing the load process by dividing the data into multiple chunks. This involves running two or more BatchProcessor client instances in parallel, either on the same or different physical servers depending on the resources available. Each BatchProcessor instance here must work with a separate batch input and output; however, they can share the same server-side application instance or operate against a dedicated instance (each BatchProcessor instance pointing to a different application server in the MDM cluster). This exercise will increase the TPS and lower the time spent in the data load.
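A minimal sketch of the chunking step: split one batch input into N chunks, one per BatchProcessor instance. The round-robin scheme and instance count are illustrative assumptions; each instance must then be configured with its own input and output files:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: divide one batch input into N chunks, one per BatchProcessor
// instance running in parallel. The instance count (3) is illustrative.
public class BatchInputSplitter {

    public static List<List<String>> splitRoundRobin(List<String> records, int instances) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            chunks.get(i % instances).add(records.get(i)); // spread records evenly
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> records = List.of("rec1", "rec2", "rec3", "rec4", "rec5");
        for (List<String> chunk : splitRoundRobin(records, 3)) {
            System.out.println("chunk of " + chunk.size() + " records");
        }
    }
}
```

In practice each chunk would be written to its own input file and handed to a separate BatchProcessor instance.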

Customizing the Batch Controller

Well, this one is a bit tricky. We are looking at modifying the OOTB behavior here. Let us go ahead and do it as it really helps.

  • Comment out the Cancel thread snippet shown earlier in the runBatch() method of BatchController.java


  • Recompile the BatchController class and repackage it in the jar
  • Replace the existing DWLBatchFramework.jar, present under <BatchProcessor Home>/lib, with the new one containing the modified BatchController class
  • Bounce the BatchProcessor instance and check the CPU consumption

Manage Heap memory

Memory consumption may not be a serious threat while dealing with BatchProcessor, but on servers that host multiple applications alongside BatchProcessor, the effective memory that can be allocated to it could be very low. If high memory consumption is observed during the data load process, allocating more memory to BatchProcessor helps ensure a smooth run. In the BatchProcessor invoking script (runbatch.bat in Windows environments and its shell-script counterpart in UNIX environments), there are a couple of properties that control the memory allocated to the BatchProcessor client.

set minMemory=256M

set maxMemory=512M

It is recommended to keep minMemory and maxMemory at 256M and 512M respectively. If the infrastructure is high-end, then minMemory and maxMemory can be increased accordingly. Again, remember to profile the data load process and settle on optimal numbers.

Reader and Writer Thread Count

IBM recommends keeping the Reader and Writer thread counts at 1. Since they are involved in lightweight tasks, this BatchProcessor configuration should suit most needs.

Shuffle the data in the Input File

By shuffling the data in the input file, the percentage of similar records (records with a high probability of getting collapsed/merged in MDM) being processed at the same time can be brought down, avoiding long waits and deadlocks.
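The shuffling step can be sketched as below. Using a fixed seed is my own assumption here; it makes reruns reproducible, and you can drop it for a different ordering on every run:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: shuffle batch input records so that similar, duplicate-prone
// records are spread out, reducing lock contention during processing.
public class InputShuffler {

    // The seed is optional; a fixed seed makes reruns reproducible.
    public static List<String> shuffle(List<String> records, long seed) {
        List<String> copy = new ArrayList<>(records); // leave the original intact
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

The shuffled list would then be written back out as the new batch input file.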

Scale on the Server side

Well, well, well. We have strived hard to make the BatchProcessor client perform at optimal levels. Still seeing poor performance and very low TPS? It is time to look into the MDM application itself. Though optimizing MDM is beyond the scope of this blog, here is a high-level action plan to work on.

You can:

  1. Increase the physical resources (more CPUs, more RAM) for the given server instance
  2. Host MDM in a clustered environment
  3. Allocate more application server instances to the existing cluster that hosts MDM
  4. Use a dedicated cluster with enough resources for MDM rather than sharing the cluster with other applications
  5. Log only critical, fatal errors in MDM
  6. Enable SAM and Performance logs in MDM and tweak the application based on the findings

Hope you find this blog useful. Try out these tips when you are working on a BatchProcessor data load process next time and share how useful you find them. I bet you’ll have something to say!

If you are looking for any specific recommendations on BatchProcessor, feel free to contact us. Always happy to assist!

Topics: InfoTrellis Master Data Management MasterDataManagement mdm mdm hub MDM Implementation
Posted by manasa1991 on Monday, May 11, 2015 @ 5:36 PM

Calvin: “You can’t just turn on creativity like a faucet. You have to be in the right mood.”
Hobbes: “What mood is that?”
Calvin: “Last-minute panic.”

Okay, apologies for an unscheduled delay on the follow up post. Let’s get back to discussing how we manage our MDM Projects.

In my previous post, we talked about the first two stages of the “InfoTrellis SMART MDM Methodology”, namely “Discovery and Assessment” and “Scope and Approach”. These stages cover activities around understanding business expectations, helping clients formulate their MDM strategy, and helping them identify the scope of an MDM implementation along with defining the right use cases and the optimal solution approach. I also mentioned that we generally follow a “non-iterative” approach to these stages, as this helps us build a solid foundation before we go on to the actual implementation.


Once the scope of an MDM project is defined and the client agrees to the solution approach, we enter the iterative phases of the project. We group them into two stages in our methodology:

  1. Analysis and Design
  2. Development and QA

Through these stages, we perform detailed requirements analysis, technical design, development and functional testing across several iterations.

Requirements Analysis:

At this stage of the project, high-level business requirements are already available and we must start analyzing and prioritizing which requirements go into which phase. For Iteration 1, we typically take up all foundational aspects of MDM such as data model changes, initial Maintain services, ETL initial load and related activities. An MDM product consultant will interpret the business requirements and work with the technical implementation leads to come up with:

  1. Custom Data Model with additions and extensions, as per project requirements
  2. Detailed data mapping document that captures source-to-MDM mapping for services as well as the Initial Load (one-time migration) – data mapping is tricky; there will be different channels through which data is brought into MDM. All these channels need to be identified and specific mappings for each of them have to be completed; doing this right will help us avoid surprises at a later stage
  3. Functional Requirements for each of the features – Services, Duplicate processing and so on

Apart from the requirements analysis, work on the “Requirements Traceability Matrix” should start at this stage. This is one document that captures system traceability of requirements to test cases and will come in handy throughout the implementation.


Technical Design:

Functional requirements are translated into detailed technical design for both MDM and ETL. Significant design decisions are listed out; the object model and business objects are designed, and detailed design sequence diagrams are created. Similar sets of design artifacts are created for the ETL components as well. The key items worked on during the design phase are:

  • Significant use cases – From a technical perspective, functional use cases are interpreted so the developer has a better grip on the use cases and how they connect to form the overall solution
  • Detailed design elements – Elaboration on each technical component so the development team just has to translate what is designed into MDM code or ETL components
  • Unit test cases – The technical lead plans unit test cases so 360-degree coverage is ensured during unit testing and most of the simple unit-level bugs are identified

Within the sphere of tools that we use, if unit test automation is possible we do that as well.


Development:

MDM and ETL development happen in most of our projects. Apart from IBM’s MDM suite, we also work on a spectrum of ETL tools such as IBM DataStage, Informatica Power Center, SAP PI, IBM CastIron, Talend ETL, and Microsoft SSIS. Some aspects that we emphasize across all our projects are:

  • Coding standards – MDM and ETL teams have respective coding standards which are periodically reviewed as per changes in different product releases, and technological changes in general. The developers are trained to follow these standards when they write code
  • Continuous Integration – Most of our clients have svn repositories and our development teams actively use these repositories so the code remains an integral unit. We also have local repositories that can be used when the client does not have a repository of their own and explicitly allow us to host their code in our network
  • Peer code review – Every module is reviewed by a peer who acts as another pair of eyes to bring in a different perspective
  • Lead code review – Apart from peer review, code is also reviewed by the tech lead to ensure development is consistent and error free
  • Unit Testing – Thorough unit testing is driven off the test cases written by development leads during the design phase. Wherever possible, we also automate unit test cases for repeatability and efficiency

With these checks and balances the developed code moves into testing phase.


Testing:

The QA lead comes up with a comprehensive test strategy covering functional, system, performance and user acceptance testing. The types of testing we participate in differ from project to project, based on client requirements. We typically take up functional testing within the iterative implementation phase; the rest are done once all functional components are developed and tested thoroughly.

Functional testing is driven off the functional requirements. Our QA lead reviews the design as well, to understand significant design decisions that help in creating optimal test scenarios. Once requirements and design documents are reviewed, detailed test scenarios and test cases are created and reviewed by the Business Analyst to ensure sufficient coverage. A mix of manual and automated testing is performed based on the scope allowed in the project. The functional testing process involves the following:

  • Test Definition – Scenarios / cases created, test environments identified, defect management and tracking methodology established, test data prepared or planned for
  • Test execution – Every build is subject to a build acceptance test, and upon being successful, the build is tested in detail for functionality
  • Regression runs – Once we enter defect-fixing mode, multiple runs of (mostly automated) regression tests are executed to ensure that the test acceptance criteria are met
  • Test Acceptance – Our commitment is to provide a thoroughly tested product at the end of each iteration. For every release, we ensure all severity 1 and severity 2 defects are fixed, and low severity defects if deferred are documented and accounted for in subsequent releases.


Deployment:

In the deployment stage, we group the following activities together:

  1. System, UAT, Performance testing – All aspects of testing that see the implementation as a single functional unit are performed
  2. MDM code deployment – MDM code will be deployed in production environment, and delta transactions (real time, or near real time) will be started
  3. One time migration or Initial Load to MDM – From various source systems, data will be extracted, transformed and loaded into MDM as a one-time exercise.

Deployment is very critical as it is the culmination of all work done until that point in the project. This is also the point at which the MDM system gets exposed to all other external systems in the client organization. If MDM is part of a larger revamp or a bigger program, there will be many other projects that need to go live or get deployed at the same time. To ensure deployment is successful, the following key points are to be considered:

  • Identify all interconnecting points and come up with a system integration plan that covers MDM and all integrating systems
  • If applicable, participate actively in program-level activities as well to ensure the entire program accounts for all the items built as part of the MDM project
  • The initial load happens across many days, mostly in 24-hour cycles. Come up with a clear plan, team, roles and responsibilities, and if possible perform a trial/mock run of the initial load

There is typically a post deployment support period and in this period we monitor the MDM hub to ensure master data is created as planned. If needed, optimizations and adjustments are made to ensure that the MDM hub performs as desired.

Once deployment is successfully completed, don’t forget to celebrate with the project team!!!

Topics: Master Data Management MasterDataManagement MDM Implementation


Posted by Kumaran Sasikanthan on Friday, Sep 27, 2013 @ 2:50 AM

InfoTrellis has begun its hiring at universities and colleges in Canada, US and India for 2013-14. We hire primarily for technical roles from universities and colleges offering high quality education in computer science.


At InfoTrellis, we work on some of the most challenging Master Data Management projects in the world, serving clients across multiple domains and industries. Ours is a David vs. Goliath story; we compete with companies that employ armies of professionals and massive resources – and in spite of their size advantages we win based on the strength of our reputation for superior client service and competency.

Members of our team of experts are often called upon to speak at industry events.

With our specialist skills, niche focus, practical lightweight processes, relaxed and empowering team structures and, most importantly, smart and highly motivated people, we have consistently won clients and delivered superior results. We are currently looking for entrepreneurially minded individuals to join us in taking our company to the next level.

Our team comes together to tackle the big challenges and to celebrate the big wins!


I wanted to write this blog article specifically to share some insights on how we hire, what we look for and what you can expect once you join us.

  1. We don’t necessarily look for what you have done, but what you can do. New skills can be attained quickly, but aptitude can’t be developed overnight. We look for strong aptitude for analytical and logical reasoning. All candidates go through a written test that assesses these skills, and puzzles and logical questions are common in our interviews.
  2. We work primarily on Java technologies, so obviously programming skills in Java are highly desirable. It isn’t a total dealbreaker, though: if you’ve got experience with any other programming languages like C or C++, you should be fine during the interview process. Expect a lot of programming questions both in the written test and personal interviews.
  3. Our focus is on Master Data Management and related technologies, so knowledge of databases and SQL is a very strong asset. We don’t look for any specific vendor database skills. Expect quite a few data modelling and design questions that will test your understanding of the various database concepts.
  4. We believe that titles are for facilitating external interactions and not for flaunting within the organization. You will find our people multi-cultural, humble yet ambitious. Expect questions that seek to determine your fit into our environment of enthusiastic and success-motivated team players.
  5. Every InfoTrellis employee is part of working towards our overall mission: to build a company that is recognized worldwide as the premier consulting company in the field of information management. We have a lot to be proud of, looking at what we’ve already accomplished, but there is a lot more to be achieved, and all this will require hard work and deep commitment to our unified vision. Expect questions that probe your drive to be part of a group that is highly motivated and works in a very fast paced environment.
  6. Our work has a direct impact on our client’s top line and bottom line. With that in mind, our consultants need to have the acumen and eagerness to understand our client’s business and how technology can solve their business problems. Expect questions that seek to assess your business acumen.

Once you join us, you can expect:

  1. Training on core skills that will allow you to ramp up quickly in our focus areas. Live POC’s and mock projects to help you prepare for the real thing will be part of the training program
  2. Regular interactions with senior members of the company, including the founding members and executive team
  3. Working as part of a global team, interacting with different geographies and cross functional teams
  4. Opportunity to work on high profile client engagements very early in your career
  5. If you are based in North America, opportunity to travel to different client locations, build up your mileage points and live the exciting life of a high end consultant.

InfoTrellis is the ideal company for sharp-minded young professionals who want to start building the foundations of a rewarding and challenging career – if you find yourself thinking you want more than just a job, this is the place for you.

I wish you all the best in your search for a career.

The author is the VP for Consulting at InfoTrellis and is directly responsible for hiring, training, and retaining all of our people across the US, Canada and India. Please feel free to provide comments and queries.

Topics: computer science employment hiring HR infoformation technology information management InfoTrellis IT jobs MDM Implementation


Posted by marianitorralba on Friday, Sep 6, 2013 @ 2:27 PM

Deterministic Matching versus Probabilistic Matching

Which is better, Deterministic Matching or Probabilistic Matching?

I am not promising to give you an answer.  But through this article, I would like to share some of my hands-on experiences that may offer insights to help you make an informed decision regarding your MDM implementation.

Before I got into the MDM space three years ago, I worked on systems development encompassing various industries that deal with Customer data.  It was a known fact that duplicate Customers existed in those systems.  But it was a problem that was too complicated to address and was not in the priority list as it wasn’t exactly revenue-generating.  Therefore, the reality of the situation was simply accepted and systems were built to handle and work around the issue of duplicate Customers.

Corporations, particularly the large ones, are now recognizing the importance of having a better knowledge of their Customer base.  In order to achieve their target market share, they need ways to retain and cross-sell to their existing Customers while at the same time, acquire new business through potential Customers.  To do this, it is essential for them to truly know their Customers as individual entities, to have a complete picture of each Customer’s buying patterns, and to understand what makes each Customer tick.   Hence, solving the problem of duplicate Customers has now become not just a means to achieve cost reduction, higher productivity, and improved efficiencies, but also higher revenues.

But how can you be absolutely sure that two customer records in fact represent one and the same individual?  Conversely, how can you say with absolute certainty that two customer records truly represent two different individuals?  The confidence level depends on a number of factors as well as on the methodology used for matching.  Let us look into the two methodologies that are most-widely used in the MDM space.

Deterministic Matching

Deterministic Matching mainly looks for an exact match between two pieces of data.  As such, one would think that it is straightforward and accurate.  This may very well be true if the quality of your data is at a 100% level and your data is cleansed and standardized in the same way 100% of the time.  We all know though that this is just wishful thinking.  The reality is, data is collected in the various source systems across the enterprise in many different ways.  The use of data cleansing and standardization tools that are available in the market may provide significant improvements, but experience has shown that there is still some level of customization required to even come close to the desired matching confidence level.

Deterministic Matching is ideal if your source systems are consistently collecting unique identifiers like Social Security Number, Driver’s License Number, or Passport Number.  But in a lot of industries and businesses, the collection of such information is not required, and even if you try to, most customers will refuse to give you such sensitive information.  Thus, in the majority of implementations, several data elements like Name, Address, Phone Number, Email Address, Date of Birth, and Gender are deterministically matched separately and the results are tallied to come up with an overall match score.

The implementation of Deterministic Matching requires sets of business rules to be carefully analyzed and programmed.  These rules dictate the matching and scoring logic.  As the number of data elements to match increases, the matching rules become more complex, and the number of permutations of matching data elements to consider substantially multiplies, potentially up to a point where it may become unmanageable and detrimental to the system’s performance.
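A minimal sketch of the per-element scoring described above: each data element either matches exactly (after case folding) or it doesn't, and matched elements add a fixed score toward an overall tally. The field names and weights here are illustrative assumptions, not from any product:

```java
import java.util.Map;

// Sketch of deterministic matching: each data element is compared for an
// exact (case-insensitive) match, and matches are tallied into an overall
// score. Field names and weights are illustrative only.
public class DeterministicMatcher {

    private static final Map<String, Integer> FIELD_WEIGHTS = Map.of(
            "name", 30, "dateOfBirth", 25, "phone", 20, "email", 25);

    public static int score(Map<String, String> a, Map<String, String> b) {
        int total = 0;
        for (Map.Entry<String, Integer> e : FIELD_WEIGHTS.entrySet()) {
            String va = a.get(e.getKey());
            String vb = b.get(e.getKey());
            if (va != null && va.equalsIgnoreCase(vb)) {
                total += e.getValue(); // exact match on this element
            }
        }
        return total;
    }
}
```

Even this toy version hints at the complexity problem: every new data element adds more rules and more permutations of partially matching elements to reason about.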

Probabilistic Matching

Probabilistic Matching uses a statistical approach in measuring the probability that two customer records represent the same individual.  It is designed to work using a wider set of data elements to be used for matching.  It uses weights to calculate the match scores, and it uses thresholds to determine a match, non-match, or possible match.  Sounds complicated?  There’s more.

I recently worked on a project using IBM InfoSphere MDM Standard Edition, formerly Initiate, which uses Probabilistic Matching.  Although other experts in the team actually worked on this part of the project, below are my high-level observations.  Note that other products available in the market using the Probabilistic Matching methodology may generally work around similar concepts.

  • It is fundamental to properly analyze the data elements, as well as the combinations of such data elements, that are needed for searching and matching.  This information goes into the process of designing an algorithm where the searching and matching rules are defined.
  • Access to the data up-front is crucial, or at least a good sample of the data that is representative of the entire population.
  • Probabilistic Matching takes into account the frequency of the occurrence of a particular data value against all the values in that data element for the entire population.  For example, the First Name ‘JOHN’ matching with another ‘JOHN’ is given a low score or weight because ‘JOHN’ is a very common name.  This concept is used to generate the weights.
  • Search buckets are derived based on the combinations of data elements in the algorithm.  These buckets contain the hashed values of the actual data.  The searching is performed on these hashed values for optimum performance.  Your search criteria are basically restricted to these buckets, and this is the reason why it is very important to define your search requirements early on, particularly the combinations of data elements forming the basis of your search criteria.
  • Thresholds (i.e. numeric values representing the overall match score between two records) are set to determine when two records should: (1) be automatically linked since there is absolute certainty that the two records are the same; (2) be manually reviewed as the two records may be the same but there is doubt; or (3) not be linked because there is absolute certainty that the two records are not the same.
  • It is essential to go through the exercise of manually reviewing the matching results.  In this exercise, sample pairs of real data that have gone through the matching process are presented to users for manual inspection.  These users are preferably a handful of Data Stewards who know the data extremely well.  The goal is for the users to categorize each pair as a match, non-match, or maybe.
  • The categorizations done by the users in the sample pairs analysis are then compared with the calculated match scores, determining whether or not the thresholds that have been set are in line with the users’ categorizations.
  • The entire process may then go through several iterations.  Per iteration, the algorithm, weights, and thresholds may require some level of adjustment.
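Two of the ideas from the bullets above can be sketched in a few lines: rarer values earn higher match weights (here via a log of inverse frequency, a common textbook simplification, not the actual InfoSphere MDM SE algorithm), and the summed score falls into link / manual-review / no-link threshold bands:

```java
import java.util.Map;

// Sketch of two probabilistic-matching ideas: frequency-based weights and
// threshold bands. The formula and thresholds are illustrative
// simplifications, not the product's real algorithm.
public class ProbabilisticSketch {

    // Weight for a matching value: rare values score higher than common ones,
    // so 'JOHN' matching 'JOHN' contributes little.
    public static double matchWeight(String value, Map<String, Integer> frequencies, int population) {
        double p = (double) frequencies.getOrDefault(value, 1) / population;
        return Math.log(1.0 / p);
    }

    // Classify an overall score against the two thresholds.
    public static String classify(double score, double autoLink, double review) {
        if (score >= autoLink) return "LINK";          // certain: same individual
        if (score >= review)   return "MANUAL_REVIEW"; // doubtful: data steward decides
        return "NO_LINK";                              // certain: different individuals
    }
}
```

In the real product, the weights are generated from the data itself and the thresholds are tuned through the iterative sample-pair review described above.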

As you can see, the work involved in Probabilistic Matching appears very complicated.  But think about the larger pool of statistically relevant match results that you may get, of which a good portion might be missed if you were to use the relatively simpler Deterministic Matching.

Factors Influencing the Confidence Level

Before you make a decision on which methodology to use, here are some data-specific factors for you to consider.  Neither the Deterministic nor the Probabilistic methodology is immune to these factors.

Knowledge of the Data and the Source Systems

First and foremost, you need to identify the Source Systems of your data.  For each Source System that you are considering, do the proper analysis, pose the questions.  Why are you bringing in data from this Source System?  What value will the data from this Source System bring into your overall MDM implementation?  Will the data from this Source System be useful to the enterprise?

For each Source System, you need to identify which data elements will be brought into your MDM hub.  Which data elements will be useful across the enterprise?  For each data element, you need to understand how it is captured (added, updated, deleted) and used in the Source System, the level of validation and cleansing done by the Source System when capturing it, and what use cases in the Source System affect it.  Does it have a consistent meaning and usage across the various Source Systems supplying the same information?

Doing proper analysis of the Source Systems and its data will go a long way in making the right decisions on which data elements to use or not to use for matching.

Data Quality

A very critical task that is often overlooked is Data Profiling.  I cannot emphasize enough how important it is to profile your data early on.  Data Profiling will reveal the quality of the data that you are getting from each Source System.  It is particularly vital to profile the data elements that you intend to use for matching.

The results of Data Profiling will be especially useful in identifying the anonymous and equivalence values to be considered when searching and matching.

Anonymous values are placeholders that carry no real identifying information; typical examples are a First Name of ‘UNKNOWN’ or a Phone Number of ‘9999999999’.

Here are some examples of Equivalence values:

  • First Name ‘WILLIAM’ has the following equivalencies (nicknames): WILLIAM, BILL, BILLY , WILL, WILLY, LIAM
  • First Name ‘ROBERT’ has the following equivalencies (nicknames): ROBERT, ROB, ROBBY, BOB, BOBBY
  • In Organization Name, ‘LIMITED’ has the following equivalencies: LIMITED, LTD, LTD.
  • In Organization Name, ‘CORPORATION’ has the following equivalencies: CORPORATION, CORP, CORP.
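One simple way to apply these equivalencies, sketched below, is to normalize every value to a canonical form before comparing. The class is illustrative and seeded only with the examples from the list above; a real implementation would load a much larger equivalence table:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: map equivalence values (nicknames, abbreviations) to one
// canonical form before comparison, using the examples listed above.
public class EquivalenceNormalizer {

    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        for (String nick : List.of("BILL", "BILLY", "WILL", "WILLY", "LIAM"))
            CANONICAL.put(nick, "WILLIAM");
        for (String nick : List.of("ROB", "ROBBY", "BOB", "BOBBY"))
            CANONICAL.put(nick, "ROBERT");
        CANONICAL.put("LTD", "LIMITED");
        CANONICAL.put("LTD.", "LIMITED");
        CANONICAL.put("CORP", "CORPORATION");
        CANONICAL.put("CORP.", "CORPORATION");
    }

    public static String canonical(String value) {
        String v = value.trim().toUpperCase();
        return CANONICAL.getOrDefault(v, v);
    }

    public static boolean equivalent(String a, String b) {
        return canonical(a).equals(canonical(b));
    }
}
```

With this in place, ‘Bill’ and ‘WILLIAM’ compare as equivalent even though they fail an exact match.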

If the Data Profiling results reveal poor data quality, you may need to consider applying data cleansing and/or standardization routines.  The last thing you want is polluting your MDM hub with bad data.  Clean and standardized data will significantly improve your match rate.  If you decide to use cleansing and standardization tools available in the market, make sure that you clearly understand their cleansing and standardization rules.  Experience has shown that some level of customization may be required.

Here are important points to keep in mind in regards to Address standardization and validation:

  • Some tools do not necessarily correct the Address to produce exactly the same standardized Address every time.  This is especially true when the tool is simply validating that the Address entry is mailable.  If it finds the Address entry as mailable, it considers it as successfully standardized without any correction/modification.
  • There is also the matter of smaller cities being amalgamated into one big city over time.  Say one Address has the old city name (e.g. Etobicoke), and another physically the same Address has the new city name (e.g. Toronto).  Both Addresses are valid and mailable addresses, and thus both are considered as successfully standardized without any correction/modification.

You have to consider how these will affect your match rate.

Take the time and effort to ensure that each data element you intend to use for matching has good quality data.  Your investment will pay off.

Data Completeness

Ideally, each data element you intend to use for matching should always have a value in it, i.e. it should be a mandatory data element in all the Source Systems.  However, this is not always the case.  This goes back to the rules imposed by each Source System in capturing the data.

If it is important for you to use a particular data element for matching even if it is not populated 100% of the time, you have to analyze how it will affect your searching and matching rules.  When that data element is not populated in both records being compared, would you consider that a match?  When that data element is populated in one record but not the other, would you consider that a non-match, and if so, would your confidence in that being a non-match be the same as when both are populated with different values?

Applying a separate set of matching rules to handle null values adds another dimension to the complexity of your matching.
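As an illustration of that added dimension, a null-aware comparison for a single data element might return a graded score rather than a binary match/non-match. The score values below are purely hypothetical placeholders; a real implementation would tune them per element:

```python
# Hypothetical sketch of null-aware comparison for one data element.

def compare_element(a, b):
    """Return an illustrative score contribution for one data element."""
    MATCH, MISMATCH, ONE_NULL, BOTH_NULL = 10, -5, -1, 0
    if a is None and b is None:
        return BOTH_NULL   # neither populated: no evidence either way
    if a is None or b is None:
        return ONE_NULL    # weaker negative evidence than a true mismatch
    return MATCH if a == b else MISMATCH

compare_element("SMITH", "SMITH")  # 10
compare_element("SMITH", None)     # -1
compare_element(None, None)        # 0
```

The point is that the two null cases are scored differently from each other and from a genuine mismatch, which is exactly the extra rule set the paragraph above describes.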

Timeliness of the Data

How old or how current is the data coming from your various Source Systems?  Bringing outdated and irrelevant data into the hub may unnecessarily degrade your match rate, not to mention the negative impact the additional volume may have on performance.  In most cases, old data is also incomplete, and collected with fewer validation rules imposed on it.  As a result, you may end up applying more cleansing, standardization, and validation rules to accommodate such data in your hub.  Is it really worth it?  Will the data, which might be as much as 10 years old in some cases, truly be of value across the enterprise?

Volume of the Data

Early on in the MDM implementation, you should have an idea of the volume of data that you will be bringing into the hub from the various Source Systems.  It will also be worthwhile to have some knowledge of the level of Customer duplication that currently exists in each Source System.

A fundamental decision that will have to be made is the style of your MDM implementation.  (I will reserve the discussion on the various implementation styles for another time.)  For example, you may require a Customer hub that will just persist the cross reference to the data but the data is still owned by and maintained in the Source Systems, or you may need a Customer hub that will actually maintain, be the owner and trusted source of the Customer’s golden record.

Your knowledge of the volume of data from the Source Systems, combined with the implementation style that you need, will give you an indication of the volume of data that will in fact reside in your Customer hub.  This will then help you make a more informed decision on which matching methodology will be able to handle that volume better.

Other Factors to Consider

In addition to the data-specific factors above, here are other factors to which you should give a great deal of thought.

Goal of the Customer Hub

What are your short-term and long-term goals for your Customer hub?  What will you use it for?  Will it be used for marketing and analytics only, or to support your transactional operations only, or both?  Will it require real-time or near-real-time interfaces with other systems in the enterprise?  Will the interfaces be one-way or two-way?

Just like any software development project, it is essential to have a clear vision of what you need to achieve with your Customer hub.  It is particularly important because the Customer hub will touch most, if not all, facets of your enterprise.  Proper requirements definition early on is key, as well as the high-level depiction of your vision, illustrating the Customer hub and its part in the overall enterprise architecture.   You have a much better chance of making the right implementation decisions, particularly as to which matching methodology to use, if you have done the vital analysis, groundwork, and planning ahead of time.

Tolerance for False Positives and False Negatives

False Positives are matching cases where two records are linked because they were found to match, when they in fact represent two different entities.  False Negatives are matching cases where two records are not linked because they were found to not match, when they in fact represent the same entity.
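If you maintain a manually verified sample of record pairs, counting both kinds of false matches is straightforward. This is an illustrative sketch, not tied to any particular MDM product:

```python
# Illustrative sketch: measuring False Positives and False Negatives against
# a manually verified sample of record pairs.

def count_false_matches(pairs):
    """pairs: iterable of (linked_by_engine: bool, same_entity_in_truth: bool)."""
    false_positives = sum(1 for linked, same in pairs if linked and not same)
    false_negatives = sum(1 for linked, same in pairs if not linked and same)
    return false_positives, false_negatives

sample = [(True, True), (True, False), (False, True), (False, False)]
print(count_false_matches(sample))  # (1, 1)
```

Tracking these two counts over time, per Source System, gives you a concrete handle on the tolerance questions discussed below.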

Based on the very nature of the two methodologies, Deterministic Matching tends to have more False Negatives than False Positives, while Probabilistic Matching tends to have more False Positives than False Negatives.  But these tendencies may change depending on the specific searching and matching rules that you impose in your implementation.

The question is: what is your tolerance for these false matches?  What are the implications to your business and your relationship with the Customer(s) when such false matches occur?  Do you have a corrective measure in place?

Your tolerance may depend on the kind of business that you are in.  For example, if your business deals with financial or medical data, you may have high tolerance for False Negatives and possibly zero tolerance for False Positives.

Your tolerance may also depend on what you are using the Customer hub data for.  For example, if you are using the Customer hub data for marketing and analytics alone, you may have a higher tolerance for False Positives than False Negatives.

Performance and Service Level Requirements

The performance and service level requirements, together with the volume of data, need careful consideration in choosing between the two methodologies.   The following, to name a few, may also impact performance and hence need to be factored in: complexity of the business rules, transactions that will retrieve and manipulate the data, the volume of these transactions, and the capacity and processing power of the machines and network in the system infrastructure.

In the Deterministic methodology, the number of data elements being used for matching and the complexity of the matching and scoring rules can seriously impact performance.

The Probabilistic methodology uses hashed values of the data to optimize searching and matching; however, there is also the extra overhead of deriving and persisting the hashed values when adding or updating data.  A poor bucketing strategy can degrade performance.
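As a rough illustration of why bucketing matters, here is a minimal sketch assuming a simple, hypothetical bucketing scheme (first four letters of the last name plus birth year); real probabilistic engines derive several hashed bucket values per record:

```python
# Minimal sketch of bucket-based candidate selection. The bucketing scheme
# here (first 4 letters of last name + birth year) is a hypothetical example.
import hashlib
from collections import defaultdict

def bucket_key(last_name: str, birth_year: str) -> str:
    raw = (last_name[:4].upper() + birth_year).encode()
    return hashlib.md5(raw).hexdigest()[:8]  # short hash of the bucket value

buckets = defaultdict(list)

def add_record(rec_id, last_name, birth_year):
    # Persist the hashed bucket value when the record is added/updated.
    buckets[bucket_key(last_name, birth_year)].append(rec_id)

def candidates(last_name, birth_year):
    # A search touches only the matching bucket, not the whole hub.
    return buckets[bucket_key(last_name, birth_year)]

add_record(1, "Robertson", "1970")
add_record(2, "Roberts", "1970")    # same first 4 letters and year: same bucket
add_record(3, "Smith", "1970")
print(candidates("Robertson", "1970"))  # [1, 2]
```

Too coarse a bucket definition pulls in huge candidate lists for every search; too fine a definition misses true matches. That trade-off is the bucketing strategy referred to above.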

On-going Match Tuning

Once your Customer hub is in production, your work is not done yet.  There’s still the on-going task of monitoring how your Customer hub’s match rate is working for you.  As data is added from new Source Systems, new locales, new lines of business, or even just as updates to existing data are made, you have to observe how the match rate is being affected.   In the Probabilistic methodology, tuning may include adjustments to the algorithm, weights, and thresholds.  For Deterministic methodology, tuning may include adjustments to the matching and scoring rules.

Regular tuning is key, more so with the Probabilistic than the Deterministic methodology.  This is due to the nature of the Probabilistic methodology, which takes into account the frequency with which a particular data value occurs against all the values in that data element for the entire population.  Even if there is no new Source System, locale, or line of business, the Probabilistic methodology requires tuning on a regular basis.

It is therefore prudent to also consider the time and effort required for the on-going match tuning when making a decision on which methodology to use.


So, which is better, Deterministic Matching or Probabilistic Matching?  The question should actually be: ‘Which is better for you, for your specific needs?’  Your specific needs may even call for a combination of the two methodologies instead of going purely with one.

The bottom line is, allocate enough time, effort, and knowledgeable resources in figuring out your needs.  Consider the factors that I have discussed here, which by no means is an exhaustive list.   There could be a lot more factors to take into account.  Only then will you have a better chance of making the right decision for your particular MDM implementation.

Topics: CDI Data Deterministic matching Integration Master Data Management Match Matching mdm MDM Implementation Probabilistic Probabilistic matching


Posted by deeparadhakrishnan on Wednesday, Apr 24, 2013 @ 1:37 PM

Master Data Management (MDM) is no longer a “fast follower” initiative but is now a generally accepted part of any information management program.  Many enterprises have well established MDM programs and many more are at the beginning stages of implementation.  In order to be successful with MDM, you need continuous insights into the master data itself and how it is being used; otherwise, it is impossible to truly manage the master data.  An MDM dashboard is an effective tool for obtaining these insights.

What is an MDM Dashboard?

It is difficult to improve a process without having a means to measure it.  In addition, it is difficult to gauge continuous improvement without being able to track performance on a regular basis.  Dashboards are a common tool used to communicate key measures and their trends to stakeholders on a regular basis.

An MDM dashboard provides key measures about the master data such as:

  •  Metrics and trends of how the master data is changing.
  •  Metrics and trends of quality issues, issue resolution and issue backlogs.
  •  Insights into the composition of the master data.
  •  Metrics and trends of how the master data is being used by consumers across the enterprise and their experience such as meeting or failing service level agreements.

Additionally, the MDM dashboard must highlight significant changes and provide insights into key improvement areas and risk areas, as these are what need to be actioned.  For example, a sudden increase in high-severity quality issues coming from a particular source system would warrant immediate attention.

The stakeholders for the MDM dashboard will be broad given master data is an enterprise asset.  Stakeholders will consist of a mix of business and IT resources from a variety of areas.


Use of Dashboard

A representative from each consumer of the master data (e.g., call-center applications, e-commerce applications, data warehouses and so on)
  • They will want insights on how many transactions they executed against the MDM hub, their failure rates and SLA attainment, and how these compare to past periods.
  • They will want to understand trends in quality for the data they are using, or plan to use, because data quality directly impacts business outcomes.
  • They will want to understand the composition (or profile) of the data they are using.
A representative from each provider of the master data (i.e., source systems that feed the MDM hub)
  • They will want trends in quality issues for the specific data they have provided to the MDM Hub so they get insights into their own data quality and can prioritize addressing quality issues at source.
  • They will want to reconcile change metrics in their system with change metrics in the MDM hub.
Executives responsible for managing the MDM program
  • They will want insights on MDM hub operations and performance to evaluate whether the system is meeting defined SLAs for the many consumers across the enterprise.
  • They will want to understand trends in data quality and data usage to not only optimize the MDM hub, but also to justify the MDM program.
Data Governance Council members responsible for setting and measuring policies including data quality initiatives
  • They will want insights into all aspects of master data and its use, including quality trends, change trends, consumer activity, etc.  However, what is most important is highlighting any significant changes from period to period so that the council can take action where required to identify and prevent potential issues before they escalate.

The frequency for producing and delivering an MDM dashboard that targets these stakeholders varies from client to client but a common time frame is monthly.  However, this does not negate the need for frequent, detailed reports delivered to other stakeholders. Daily and weekly reports, for example, are essential to the team members that are responsible for implementing the MDM program.

What are the contents of the ideal MDM Dashboard?

The business cares most about significant changes in metrics and it is those that must be highlighted.  The goal of any dashboard should not be to look at everything available but rather to look at the information that is most important and most revealing – to gain insights into what is happening within the business unit with the end goal of making better decisions and identifying and anticipating issues before they can have a negative impact on the business.  An MDM dashboard can help to identify how effective the MDM and governance programs are in meeting the needs of the organization.

Breaking down the metrics

Every metric is nice to have, but not every metric is key at the strategic level.  For example, metrics which show that MDM helped increase the accuracy of customer data by 10% aren’t likely to impress management, but metrics which show that customer retention or cross-selling rates increased as a result of MDM will.

To make the link between goals and strategy, organizations should focus on specific metrics instead of trying to measure everything that can possibly be measured.  Therefore, organizations should look at the top five to ten items that are important to measure on a regular basis.

Standard key metrics to be captured in the dashboard include:

Master Data Composition: A static view of the master data.  This is important because data is brought together from multiple sources, and it gives you insight into what your “combined view” looks like.

  • Number of master data records (e.g., number of Customers, Accounts, …)
  • Number of key elements and ratios (e.g., number of Addresses and average number of Addresses per Customer, number of Customers with no Addresses, number of Customers with many Addresses, and so on).
Master Data Change: Provides an understanding of how the master data has changed over time.

  • Number of de-duplicated and split master data records.
  • Number of new and updated records for the month, with a comparison to change trends from prior periods and significant variances highlighted.
  • Change for key elements of interest.  For example, New / Updated Email Addresses if there is a campaign to obtain as many email addresses as possible for direct marketing purposes.  Again, with comparisons to prior periods.
Master Data Quality: Provides master data quality trends.  Quality concerns differ from one client to another, but common concerns for customer master data include anonymous values in names, invalid addresses, customers that don’t fit an expected profile (such as too many addresses), and default birth dates (such as 1900-01-01).

  • Number of quality issues discovered in the reporting period (by severity and type of issue).
  • Number of quality issues resolved in the reporting period.
  • State of the quality issue backlog to be addressed.
  • Sources contributing to the quality issues.
  • Trends compared to previous periods.
Master Data Usage: Provides an understanding of how the master data is being consumed.  Managed master data only has value when it is consumed, so it is important to understand who is using it and how it’s being used.

  • Top consumers of the data including SLA attainment and error rates.
  • Trends using past reporting periods with significant variances highlighted.  If a consumer’s activity spikes for one month it may indicate an issue on their side or new requirements on using the data.

MDM Hub Performance: Details that can be used for capacity planning and performance tuning.

  • Number of transactions broken down by transaction type.
  • Success versus failure rates.
  • Processing Rate (transactions per second).
  • Min, Max, Average response times and message sizes.
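As an illustration, the performance measures above could be aggregated from a transaction log along these lines; the log fields here are assumptions for the example, not any real MDM product’s schema:

```python
# Hypothetical sketch of aggregating a transaction log into dashboard measures.

def summarize(transactions, period_seconds):
    """transactions: list of dicts with 'type', 'ok', 'response_ms' keys."""
    times = [t["response_ms"] for t in transactions]
    total = len(transactions)
    failures = sum(1 for t in transactions if not t["ok"])
    return {
        "tps": total / period_seconds,                 # processing rate
        "success_rate": (total - failures) / total,    # success vs. failure
        "min_ms": min(times),
        "max_ms": max(times),
        "avg_ms": sum(times) / total,
    }

log = [
    {"type": "getParty", "ok": True,  "response_ms": 40},
    {"type": "addParty", "ok": True,  "response_ms": 120},
    {"type": "getParty", "ok": False, "response_ms": 900},
    {"type": "getParty", "ok": True,  "response_ms": 60},
]
print(summarize(log, period_seconds=2))
# {'tps': 2.0, 'success_rate': 0.75, 'min_ms': 40, 'max_ms': 900, 'avg_ms': 280.0}
```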

Semantically speaking, organizations can define their metrics in more business-oriented terms that are meaningful to stakeholders.  For example, strategic metrics related to operational effectiveness (e.g. cost measures), customer intimacy (e.g. customer retention rates), and so on.  The bottom line is that key metrics drive the success of the organization.

Example Uses

The following are examples of how an MDM dashboard can be used to support and optimize business initiatives.

Example 1: Reduced mailing costs in marketing campaigns

The marketing team of a retail company uses the customer and address data from its MDM hub for its direct mail campaigns. Investigations revealed there is approximately $4 in processing costs for each returned mail item plus an undetermined amount of lost revenue since the mailed item did not reach its destination and fulfill its purpose.

An MDM dashboard would provide fundamental metrics and trends on address data including:

  • Number of customers and addresses
  • Number of new and updated addresses in this period
  • Number of addresses in standard form

The dashboard would provide advanced metrics and trends including:

  • Number of addresses with quality issues broken down by severity
  • Number of addresses that are aging and have become unreliable due to data decay

The marketing team can use this information to understand trends and be more strategic in how they approach their campaigns from a cost perspective.

Example 2: Addressing quality initiatives at the source.

Many source systems don’t have quality controls and trending information on master data such as customers and products that resides in their databases.  Analyzing the master data within an MDM Hub provides a “one-stop-shop” for finding and tracking quality issues that trace back to particular source systems.  It is always best to address quality issues at source and an MDM dashboard would provide management the metrics they need to understand how quality of the data and the backlog of issues in their source systems are trending.  Likewise, it gives the MDM team insights into how source systems are contributing to the MDM effort.

Example 3: Capacity Planning

As MDM gains momentum in the enterprise, it takes on more and more consumers.  Examples of consumers are CRM systems, e-Commerce, Web Applications, source systems and data warehouses.  As with any mission critical system, it is important to ensure the MDM Hub is providing all of these consumers with high quality service. This includes (but is not limited to) providing maximum availability and the ability to fulfill transaction requests within defined service level agreements (SLAs).

It is critical then to understand transaction metrics for each consumer including:

  • Number of transactions executed
  • Types of transactions executed
  • Failure rates
  • SLA attainment rates

These metrics, along with high level trending information, can be used to plan for future capacity needs to ensure the technical resources are there to satisfy the demands placed on the MDM hub.
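A sketch of per-consumer SLA attainment from the same kind of transaction log; the 300 ms SLA threshold and the log field names are assumptions for illustration:

```python
# Illustrative sketch: per-consumer SLA attainment from a transaction log.

SLA_MS = 300  # hypothetical SLA threshold for the example

def sla_attainment(transactions):
    """Group transactions by consumer and compute the share within SLA."""
    by_consumer = {}
    for t in transactions:
        by_consumer.setdefault(t["consumer"], []).append(t["response_ms"])
    return {
        consumer: sum(1 for ms in times if ms <= SLA_MS) / len(times)
        for consumer, times in by_consumer.items()
    }

log = [
    {"consumer": "CRM", "response_ms": 120},
    {"consumer": "CRM", "response_ms": 450},
    {"consumer": "eCommerce", "response_ms": 90},
]
print(sla_attainment(log))  # {'CRM': 0.5, 'eCommerce': 1.0}
```

A month-over-month drop in a consumer’s attainment figure is exactly the kind of anomaly the data stewardship team would flag for investigation.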

It also gives your data stewardship team the means to identify anomalies and items in need of investigation – for example, if a consumer’s transaction workload drastically increases one month or suddenly begins to experience an unusual number of failures.


If you want to manage it then you must first measure it.

This is no less true just because your organization has implemented MDM – how can you expect your teams to manage your master data if they have no way to measure it?  An MDM dashboard is a tool that provides the measurements to various audiences so that you can optimize your MDM program and get more business value from it.

InfoTrellis has incorporated over 12 years of experience in both MDM product development and MDM implementation into a unique MDM dashboard solution, “MDM Veriscope”, that provides you with the metrics you need to manage your MDM program.

Please click here for a recent announcement (April 2013) on the release of MDM Veriscope 3.0.

Topics: data governance Data Quality master data analytics master data governance Master Data Management mdm MDM Dashboard MDM Implementation reporting


Posted by zahidna33m on Thursday, Mar 28, 2013 @ 10:52 AM

This is the second half of an article that I’ve divided up into two subsections: four questions you should be asking yourself and four that you should be asking a potential SI. For the full introduction and the four questions you should be asking yourself as you go through the process of SI selection, make sure you see part one.

Here are the four most important questions (in my humble opinion) to be asking about potential systems integrators as you plan out your MDM implementation, carrying over from the first four questions in the earlier article to bring our total up to eight.

5.       How closely have you scrutinized their MDM credentials?

I realize I’m stating the obvious, but it is such an important factor for a successful MDM program that failing to judge the SI on its MDM experience equates to nothing less than gambling with your organization’s invested funding. As you explore the MDM credentials of candidate SIs, there are a couple of things that should be addressed which might not be as self-evident as they seem.

Quite often MDM projects involve multiple SIs and it’s important to keep in mind that not every SI will have had the same level of involvement or contribution towards the MDM program. Be sure to seek details not only on the list of projects a SI has been involved in but also what their level of individual involvement was.

For example, when a SI refers to having worked with MDM before, it could also mean they have been involved in the front-end side of things, developing consuming applications and systems around MDM. This type of involvement, among other types of surface-level MDM involvement, is generally not enough for a team to gain the required insight and experience in designing and implementing a robust MDM solution.

Once you’ve established the SI’s experience with MDM, the next thing you need to question is whether the team they’ll be sending to work on your implementation individually reflects the same experience. When engaging with the SI, you naturally want to make sure you’re getting the A-team.

With an ever-growing MDM market, it’s understandable that not everyone on an implementation team can be an expert of many decades, but a successful MDM program does require that the people in key decision-making and managing roles be established MDM veterans.

Without experienced leadership in the team, you can expect many bumps and even the occasional dead end along the road of your implementation – and, unfortunately, these are the sorts of obstacles that tend to show up fairly late in the cycle when you’ve already spent a lot of time and money. Look for a bare minimum of true MDM experts to be assigned to your specific project – there’s no sense in paying a premium MDM SI for their extensive experience if the team they send you is made up entirely of newer hires.

6.       Will the accelerator(s) offered by the SI truly accelerate your MDM program?

If the SI you’re looking at has been involved in the MDM space for a significant amount of time, expect them to have MDM-specific accelerators that they can offer you, ideally designed to reduce the costs and improve the quality of project delivery. Almost all vendors will have some sort of line-up of accelerators to try to sell you on – the important thing is ensuring you understand what kind of accelerators will actually provide your MDM solution with true advantages.

Right off the bat you should be subjecting any proposed accelerator to questions that can validate its inherent usefulness:

  • How much time can you expect to save by using it?
  • How much cost can you expect to save by using it?
  • What will happen when the implementation is completed?
  • Is it useful only to the SI or also to you as a client?
  • Will you be charged extra to keep it over time?
  • What support is there for the accelerator?
  • Have their other clients used this accelerator? Did they benefit?
  • If it’s new and unproven, was it created to address a real business need?

Keep in mind that every MDM project is slightly different; an accelerator may have inherent value for one organization, but your MDM goals and challenges won’t be identical to theirs, and this can impact whether the price of a particular add-on is worth the expense.

A good accelerator can be of immense benefit to an implementation, but beware of sales pitches trying to push a shiny box on you that may, ultimately, prove to be empty.

7.       What sort of relationship do they have with the MDM product vendor?

MDM is a long-term investment. The consequences of each business and technical decision around your MDM solution will stay with your organization for years to come. With that in mind, it’s important to develop a solid understanding of the MDM product vendor’s future direction and stay in touch with them as that direction evolves.

Both you and your system integrator will need to ensure this alignment with the product vendor, as the SI will be making significant decisions on your behalf and their attention to and knowledge of the product vendor is extremely critical to both working with the current version and planning for what’s coming in the future.

It’s also worth noting that a SI’s willingness and ability to influence a product vendor is a key factor in resolving any product issues or enhancements.

One aspect of this relationship is having a thorough and attentive understanding of the Product Vendor’s license terms. More often than not, the license terms for a piece of MDM software are quite complex; they’re usually imposing to approach for even the strong-of-heart. Understanding the implications of architectural decisions to the license terms of your MDM software should be one of the responsibilities of your SI, and so a solid understanding of these terms is an extremely important trait in the team that will be implementing your MDM solution.

For example, a product vendor might sell the product with limitations on which aspects can be used, mandating that certain parts of the model, services, or data volume must be within specified limits. Your SI should be aware of these constraints while making decisions on how to configure and customize the MDM product, or you may find yourself with unexpected charges down the road to pay for using a product beyond the terms of your license.

8.       Is their approach one that produces a solution with true longevity?

Now we come to the question that I think is the most important one to ask and the one at the heart of every other question: will this MDM solution be built to last? Any number of factors can result in an MDM implementation that is fine on the surface level but expires after only a few years. The best SIs will create a solution for you that is imbued from the ground up with best-practices designed for long-term success. This is the sign of a SI that views you as a business partner and not as a one-time customer – if anything about their approach feels transactional in nature, you should be hearing those alarm bells going off in your head.

Successful and effective MDM programs require the level of discipline and rigour practised by product development teams. Most MDM programs mature over a number of releases and the organization builds its trust in the MDM solution based on the success of those releases. Consuming applications, which can be considered the clients of an MDM system, expect the reliability and repeatability of a fully-fledged “product” in every MDM release. This is only achieved by running the MDM program with the same standards found in a product development process.

Additionally, most business and architectural decisions made during an MDM implementation require the foresight and anticipation to ensure any services and data offered by the MDM solution will be of use not just to immediate consumers but also to consumers that may come on board further down the road.

Product development teams make such anticipatory decisions every day; many system integration teams, however, have little experience with or foundation in product development. This results in two significantly different methodologies driving superficially similar vendors: product-development-style teams put a lot of emphasis on documentation, automation, testing, configurability, and re-usability, whereas SI teams without the influence of product development ancestry will traditionally place less emphasis on these aspects of the software.

As a client, targeting vendors with a background in product development is one easy way to ensure that the solution you’re purchasing will be built with an inherent robustness designed to stand the test of time as well as scale with your organization.

This points to an overall consideration that you should be looking for in any potential MDM implementation partner: a willingness to work with you to understand your long-term strategic vision for your MDM. Your MDM implementation partner should be applying their experience to your specific needs, and not trying to redefine your needs to better fit their experience. When it comes to MDM, there is no such thing as a “one size fits all” solution, and the good system integrators will be able to quickly identify the commonalities and the differences between your organization’s unique goals and the goals of the projects they’ve been part of in the past. Their job is not to create The Perfect MDM Solution – it’s to create the MDM solution that is perfectly tailored to you.

This is even more important if you’re not starting from a place with clearly established MDM goals to begin with. While I strongly urge you not to even begin looking for your implementation partner until you have these goals clearly defined, I do recognize that the sad reality is that many clients will invest in putting an MDM solution in place without fully understanding the potential advantages it presents for their business and the potential challenges it will pose. Trustworthy advice from an expert can mean the difference between success and failure in a situation like this. Consultants who’ve worked on dozens of different MDM projects, both successful and unsuccessful, can provide an informed perspective that no amount of theoretical prediction can match.

Closing Remarks

This is my ultimate piece of advice for you in your search for the perfect SI to do your MDM implementation: do not shy away from investing in a small project to assess the true capabilities of your SI partner’s team before making a long-term commitment. The scope of work can range from establishing a high-level architecture, to developing a POC to explore specific features of an MDM tool, to an assessment of your current MDM program. Such a project doesn’t have to be many months long – a few weeks of working with a potential SI can be all you need to have complete confidence that they’re the ideal partner to help you with your MDM implementation. Not only will it help you to understand the SI’s strengths and weaknesses, it will also give you a good picture of where you stand as an organization with your MDM program.

Do your homework before you make decisions and you’re setting yourself up for a world of success with your MDM program.

Topics: Industry Integration mdm MDM Implementation SI Selection


Posted by zahidna33m on Friday, Mar 22, 2013 @ 10:32 AM

I’d like to start right off the bat by making it clear I’m writing from a somewhat biased perspective: as the CTO and one of the co-founders of a company operating within the MDM system integration sphere, my feelings on what makes for a strong partnership between an organization and their SI are obviously grounded in my personal and professional practices.

That said, I feel confident that my admittedly not-quite-impartial perspective is still one with a few nuggets of wisdom worth imparting. I’ve been working in the MDM space for over a decade now and I’ve seen my fair share of successful as well as botched MDM projects. It hurts everybody in the information management industry when an organization’s investment doesn’t provide them with the results they were assured would eventually manifest. The client’s time and money is wasted, the promise of the value of better data management loses credibility, and at the end of the day nobody wins.

My hope is simply this: that by providing a few insider tips on what questions you should be asking when deciding on which SI to select for your implementation, you’ll be better positioned to navigate the different choices available to you and to plan for the highest level of success right from the beginning.

I’ve divided these eight questions up into two subsections: four questions you should be asking yourself and four that you should be asking a potential SI. I’ve broken the two sections into two different articles and I’ve saved the most important question of all for last, so make sure you see both part one and part two.

1. MDM is a big investment – should you get a big SI vendor to implement it?

System Integrators come in all shapes and sizes. When it comes to implementing a Master Data Management program, bigger isn’t necessarily better.

Based on my years of experience working in the industry (oftentimes working with a client to help recover from a failed or failing MDM project), I’ve found that the bigger the SI, the lower the odds of a successful implementation. MDM programs don’t benefit from having armies of people involved in their implementation; the best implementations I’ve seen have been at the hands of an experienced MDM SWAT team with a smaller number of highly specialized members.

For the overwhelming majority of MDM programs, the ideal number of people assigned to the project never goes above twenty-five people, and for small- and medium-sized programs you’re looking at somewhere between ten and fifteen team members.

So be warned if you’re in discussion with a potential SI and they’re proposing team sizes much bigger than this – armies are inefficient by their very nature, and an army of people all working on one MDM implementation will generate more meaningless churn than actual productivity.

2. Have you been led to believe you’ll have to compromise on either expertise or flexibility?

MDM solutions are not silos. The entire vision behind MDM as a way to improve and enhance a business is to bring down those silos and unify an organization’s data across all of its systems.

In order to accomplish this, you’re going to need some serious integration – sometimes several different types of integrations just to make one MDM implementation work. The vendors you’re looking at should be capable of delivering an end-to-end solution, which means having the proper balance of skills to cover every integration involved. Bigger system integrators will tend to have better coverage across multiple tools, but a lack of core MDM tool-specific skills can add significantly to your overall project risk. On the other hand, some small niche SIs lack the range and flexibility of these bigger vendors, meaning that they don’t have the capabilities to staff the wide range of necessary roles. This can even result in a vendor dictating their client’s tool choice based on their own skill-set rather than on the needs of the client.

As a quick example of this, we saw one client that already had an ETL tool in place and a large in-house team supporting it, but (at the urging of the vendor) was considering using a competing tool for their MDM implementation that would have left the client’s established team completely out of the work stream, wasting a huge amount of valuable human resources.

A vendor limited in either their range of skills or their level of experience will produce less than optimal results; if you set out expecting to have to compromise, you will, to nobody’s surprise, end up compromising somewhere.

As a client, you have every right to expect a breadth and depth of expertise encompassing multiple integration tools.

In order to make full use of your MDM investment, you should be making sure that not only is your MDM correctly configured and implemented, but that the various integrations into and out of MDM are properly implemented as well. To accomplish that, you really need a team with expertise in both core MDM and the other technologies that form part of the MDM ecosystem.

3. Are you prepared to hire a generalist in order to work with a company that you have a pre-established relationship with?

The inclination to work with someone already familiar to you is a tough habit to break yourself of. We all do it in our day-to-day lives, and sometimes it’s a perfectly reasonable impulse to turn to whoever you’ve known the longest or worked with the closest for something you think they should be able to do. Unfortunately, for MDM this isn’t the case – just because a company has been working with you on other projects doesn’t automatically make them the vendor best suited to an MDM implementation.

Let me put it this way: while your family doctor who has known you for many years might be the right person to diagnose a heart condition, the surgical procedure itself is best performed by a specialist.

MDM is a large and impactful investment – a tier-1 system in most companies. Having acquired costly licenses for the software, it seems illogical to trust your implementation to anybody who doesn’t have established expertise with that software.

Push yourself outside of your comfort zone and look beyond your immediate contacts for a true specialist; you’ll be thankful for it later when your MDM program is running beautifully and your trust in your other IT vendors, who likely specialize in very different areas, remains intact.

4. Are you calculating costs by unit price or by overall cost?

As in any large purchase, cost will always be a factor in MDM SI selection. When you’re stacking up the potential costs of one SI against another, you should ensure you’re comparing the total cost of implementation and ownership and not the unit cost per resource. You’d be amazed how many clients fall into this trap.

Shopping for your MDM implementation partner is not like shopping for groceries. Going by the unit cost of paper towels or laundry detergent can be a great way to make sure you’re getting the best deal and the most value for your money. This works because you will always need more of the product, it’s not going to go bad if you buy in bulk, and the difference in quality between brands doesn’t have much of an impact. Approaching your MDM vendor selection with this philosophy is wasteful and dangerous – it’s easy to be fooled into thinking you’re getting the cheaper deal when in reality the vendors with the lowest resource unit costs are often the ones that need to employ a far bigger team and take more time to get the same job done.

Lower unit cost doesn’t mean lower project cost. Your expenditures on the implementation can balloon quickly when you add the multipliers of larger teams, longer project duration, and – worst of all – a combination of the two.

Other aspects of cost that can be hard to determine at the front-end include rework due to bad design decisions, unnecessary customizations to the MDM tool, and expensive support work due to incorrect or inconsistent implementations, just to name a few. The absolute worst case scenario is that all the millions of dollars invested in an MDM program end up in the garbage when the solution can’t be stabilized, the client team feels unable to trust the MDM hub, and the whole program gets shelved or discarded.

It’s also worth noting that the sooner you get your MDM system in production the sooner you can start increasing your revenue and decreasing your marketing costs. As you’re making cost comparisons, try to factor in the opportunity cost of having a system in place sooner rather than later.

So you’ve asked yourself these essential questions and you’re ready to move on to grilling your potential SI partners. Stay tuned for the other half of the eight questions you should be asking: the ones directed at the SIs themselves.

Topics: Integration Master Data Management mdm MDM Implementation SI Selection
