Posted by sathishbaskaran on Tuesday, May 12, 2015 @ 9:43 AM

MDM BatchProcessor is a multi-threaded J2SE client application used in most MDM implementations to load large volumes of enterprise data into MDM during initial and delta loads. Processing such large volumes often causes performance issues during the batch processing stage, bringing down the TPS (Transactions Per Second).

Poor BatchProcessor performance often disrupts the data load process and jeopardizes go-live plans. Unfortunately, there is no panacea for this common problem. Let us help by highlighting some of the potential root causes that influence BatchProcessor performance. We suggest remedies for each of these bottlenecks in the later part of this blog.

Infrastructure Concerns

Any complex, business-critical enterprise application needs careful planning, well ahead of time, to achieve optimal performance, and MDM is no exception. During the development phase it is perfectly fine to host MDM, the DB server and BatchProcessor all on one physical server. But the world doesn't stop at development. The sheer volume of data MDM will handle in production demands a carefully thought-out infrastructure plan. Besides, when these applications run in shared environments, profiling, benchmarking and debugging become a tedious affair.

CPU Consumption

BatchProcessor can consume a lot of precious CPU cycles in the most trivial of operations when it is not configured properly. Keeping an eye out for persistently high CPU consumption and sporadic surges is vital to ensure the CPU is used optimally by BatchProcessor.

Deadlock

Deadlocks are among the most frequent issues encountered during batch processing in multi-threaded mode. Increasing the submitter thread count beyond the recommended value might lead to deadlocks.

Stale Threads

As discussed earlier, a poorly configured BatchProcessor might open up Pandora's box. Stale threads can be a side effect of the thread count configuration in BatchProcessor. Increasing the submitter, reader and writer thread counts beyond the recommended numbers may cause some threads to wait indefinitely, wasting precious system resources.

100% CPU Utilization

“Cancel Thread” is one of the BatchProcessor daemon threads, designed to shut down BatchProcessor gracefully when the user requests it. Being a daemon thread, it stays alive for the natural lifecycle of the BatchProcessor. The catch is that it can hog nearly 90% of CPU cycles for a trivial operation, dragging performance down.

Let us have a quick look at the UserCancel thread in the BatchProcessor client. The thread waits indefinitely for user input, checking for it every 2 seconds while holding onto the CPU the entire time:

Thread thread = new Thread(r, "Cancel");
thread.setDaemon(true);
thread.start();

// Body of the Runnable r: poll standard input for a 'q'/'Q' keystroke.
while (!controller.isShuttingDown()) {
    try {
        int i = System.in.read();
        if (i == -1) {
            // No input available (EOF): sleep for 2 seconds, then poll again.
            try {
                Thread.sleep(2000L);
            } catch (InterruptedException e) {}
        } else {
            char ch = (char) i;
            if ((ch == 'q') || (ch == 'Q')) {
                controller.requestShutdown();
            }
        }
    } catch (IOException iox) {}
}
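Note that when BatchProcessor runs without a usable console (for example, with stdin redirected from /dev/null), System.in.read() returns -1 immediately and the loop degenerates into a permanent poll-and-sleep cycle. As an illustrative sketch only, not the shipped code, the run() body could instead let the daemon thread end on EOF rather than polling:

// Illustrative alternative: stop listening on EOF instead of polling forever.
int i;
try {
    while (!controller.isShuttingDown() && (i = System.in.read()) != -1) {
        char ch = (char) i;
        if ((ch == 'q') || (ch == 'Q')) {
            controller.requestShutdown();
        }
    }
    // read() returned -1: no console attached, so the daemon thread simply ends.
} catch (IOException iox) {}

The simplest remedy, though, is to disable the thread altogether, as described under "Customizing the Batch Controller" below.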

BatchProcessor Performance Optimization Tips

We have so far discussed potential bottlenecks that keep BatchProcessor from running at optimal levels. Best-laid plans often go awry; what is worse is having no plan at all. A well-thought-out plan needs to be in place before going ahead with a data load. Now, let us discuss some useful tips that can improve performance during the data load process.

Infrastructure topology

For better performance, run the MDM application, the DB server and the BatchProcessor client on different physical servers. This helps leverage system resources better.

Follow the best thread count principle

If there are N physical CPUs available to the IBM InfoSphere MDM Server instance that caters to BatchProcessor, then the number of submitter threads in BatchProcessor should be configured between 2N and 3N.

For example, if the MDM server has 8 CPUs, start profiling BatchProcessor by varying its submitter thread count between 16 and 24. Do the number crunching, keep an eye on resource consumption (CPU, memory and disk I/O) and settle on a thread count that yields optimal TPS in MDM.

 

You can modify the Submitter.number property in Batch.properties to change the Submitter thread count.

For example:

Submitter.number = 4

Running Multiple BatchProcessor application instances

If the MDM server is beefed up with enough resources to handle a huge number of parallel transactions, consider parallelizing the load process by dividing the data into multiple chunks. This involves running two or more BatchProcessor client instances in parallel, either on the same physical server or on different ones, depending on the resources available. Each BatchProcessor instance must work with a separate batch input and output; however, the instances can share the same server-side application instance or operate against dedicated instances (each BatchProcessor instance pointing to a different application server in the MDM cluster). This exercise increases TPS and lowers the time spent in the data load. A minimal sketch of chunking an input file is shown below.
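As an illustrative sketch only (the file names and chunk count here are hypothetical, not part of the product), a small standalone Java utility can split a batch input file round-robin into one chunk per planned BatchProcessor instance:

import java.io.*;
import java.util.*;

// Illustrative helper: split a batch input file into N chunk files,
// one per BatchProcessor instance. Assumes the input has no header record.
public class InputSplitter {
    public static void main(String[] args) throws IOException {
        int chunks = 4; // one chunk per planned BatchProcessor instance
        List<BufferedWriter> writers = new ArrayList<BufferedWriter>();
        for (int c = 0; c < chunks; c++) {
            writers.add(new BufferedWriter(new FileWriter("batchinput_" + c + ".txt")));
        }
        BufferedReader reader = new BufferedReader(new FileReader("batchinput.txt"));
        String line;
        int n = 0;
        while ((line = reader.readLine()) != null) {
            BufferedWriter w = writers.get(n++ % chunks); // round-robin distribution
            w.write(line);
            w.newLine();
        }
        reader.close();
        for (BufferedWriter w : writers) {
            w.close();
        }
    }
}

Each chunk then becomes the batch input of its own BatchProcessor instance.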

Customizing the Batch Controller

Well, this one is a bit tricky, as we are modifying the OOTB behavior. Let us go ahead and do it anyway, as it really helps.

  • Comment out the following snippet in the runBatch() method of BatchController.java:

  //UserCancel.start();

  • Recompile the modified BatchController class and package it back into the jar
  • Replace the existing DWLBatchFramework.jar, present under <BatchProcessor Home>/lib, with the new jar containing the modified BatchController class
  • Bounce the BatchProcessor instance and check the CPU consumption

Manage Heap memory

Memory consumption may not be a serious threat for BatchProcessor on a dedicated server, but on servers that host multiple applications alongside BatchProcessor, the effective memory that can be allocated to it may be very low. If high memory consumption is observed during the data load process, allocating more memory to BatchProcessor helps ensure a smooth run. In the BatchProcessor invocation script (runbatch.bat in Windows environments, runbatch.sh in UNIX environments), a couple of properties control the memory allocated to the BatchProcessor client.

set minMemory=256M

set maxMemory=512M

It is recommended to keep minMemory and maxMemory at 256M and 512M respectively; these values typically feed the JVM's -Xms and -Xmx heap options. If the hardware is high-end, minMemory and maxMemory can be increased accordingly. Again, remember to profile the data load process and settle on optimal numbers.

Reader and Writer Thread Count

IBM recommends keeping the Reader and Writer thread counts at 1. Since these threads perform lightweight tasks, this BatchProcessor configuration should suit most needs; a sample configuration follows.
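Mirroring the Submitter.number example above, the corresponding Batch.properties entries would look like the following (the property names here follow the same naming pattern; verify them against the Batch.properties shipped with your version):

Reader.number = 1
Writer.number = 1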

Shuffle the data in the Input File

By shuffling the data in the input file, the percentage of similar records (records with a high probability of getting collapsed/merged in MDM) being processed at the same time can be brought down, avoiding long waits and deadlocks. A minimal shuffle sketch follows.
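As an illustrative sketch (the file names are hypothetical, and the whole file is assumed to fit in memory; very large files would need a disk-based shuffle), the lines of an input file can be randomized with java.util.Collections.shuffle:

import java.io.*;
import java.util.*;

// Illustrative helper: shuffle the lines of a batch input file so that
// similar records are less likely to be processed at the same time.
// Assumes the input has no header record that must stay first.
public class InputShuffler {
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader("batchinput.txt"));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        reader.close();

        Collections.shuffle(lines); // randomize the record order

        BufferedWriter writer = new BufferedWriter(new FileWriter("batchinput_shuffled.txt"));
        for (String l : lines) {
            writer.write(l);
            writer.newLine();
        }
        writer.close();
    }
}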

Scale on the Server side

Well, well, well. We have strived hard to make the BatchProcessor client perform at optimal levels. Still seeing poor performance and very low TPS? It is time to look into the MDM application itself. Though optimizing MDM is beyond the scope of this blog, here is a high-level action plan to work from.

You can:

  1. Increase the physical resources (more CPUs, more RAM) for the given server instance
  2. Host MDM in a clustered environment
  3. Allocate more application server instances to the existing cluster that hosts MDM
  4. Use a dedicated cluster with enough resources for MDM rather than sharing the cluster with other applications
  5. Log only critical and fatal errors in MDM
  6. Enable SAM and performance logs in MDM and tweak the application based on the findings

Hope you find this blog useful. Try out these tips the next time you are working on a BatchProcessor data load and share how useful you find them. I bet you’ll have something to say!

If you are looking for any specific recommendations on BatchProcessor, feel free to contact sathish.baskaran@infotrellis.com. Always happy to assist you.

Topics: InfoTrellis Master Data Management MasterDataManagement mdm mdm hub MDM Implementation
Posted by Jan D. Svensson on Monday, Nov 10, 2014 @ 12:47 PM

I often become involved in an organization’s MDM program when they have reached out to InfoTrellis for help cleaning up after a failed project, or when they are initiating attempt number X at achieving what, to some, is a real struggle. There can be many reasons for a Master Data Management implementation failing, and none of them comes down to the litany of blame-game excuses offered in these scenarios. Most failures arise from common problems that people simply were not prepared for.

Let’s examine some of the top reasons MDM implementations fail. In the end they probably won’t surprise you, but if you haven’t experienced them yet, you will be better prepared to face them when they happen.

Underestimating the work

I am starting with this one because it leads to many of the others and is a complex topic. Estimating the work seems simple, but there are many aspects of an MDM project that aren’t obvious and that can severely impact timelines and your success.

“It’s just a project like any other”

Let me start by saying MDM is not a project, it’s a journey, or at the very least a program.

Most organizations thinking about implementing MDM are large to global companies. Even medium-sized companies that started small and grew over time have the same problems as their global-sized peers. While the scale of the chaos in a global company may seem much larger, it also has far more resources to throw at the problem than its smaller brethren.

If we stick to the MDM party domain as a point of reference (most organizations start here with MDM), the number of sources or points of contact with party information can be staggering. You may have systems that:

  • Manage the selling of products or services to customers
  • Manage vendors you deal with or contract to
  • Extract data to a data warehouse for customer analytics and vendor performance
  • Manage employees who may also be customers (HR systems)
  • Run self-service customer portals
  • Run marketing campaign management systems
  • Send customer notifications
  • Many others

A lot of large organizations will have all of these systems, each having multiple applications, and often multiple systems responsible for the same business function. So by now you are probably saying, yes, I know this, and…? Well, your MDM “project” will need to sit in the middle of all of this, and since many of these systems will be legacy mainframe-based systems, you will need to be transparent, as these systems won’t be allowed to change.

MDM can be on the scale of the transformation programmes your organization may be undertaking to replace aging legacy systems and move to modern, distributed, Service Oriented Architecture based solutions.

Big Bang Never Works

Now that we have seen the potential size of your MDM problem, let me just remind you that you can’t do it all at once. Sure you can plan your massive transformation programme and execute it – but if you have ever really done one of these, you know it’s a lot harder than it seems and that the outcome is usually not as satisfying as you expected it to be.  You end up cutting corners, blowing the budget, missing the timelines, and de-scoping the work just trying to deliver.

What is one of the typical reasons this happens on your MDM transformation project?

You Don’t Know What You Don’t Know

You have all these systems you are going to integrate with, and in many cases you will need to be transparent, in that those systems may not know they are interacting with your new MDM solution. You are going to need to know things like:

  • What data do they use?
    • How often?
    • How much?
    • When?
  • Do they update the data?
    • How often?
    • How?
    • What?
  • Do they need to know about changes made by others?
    • How often is the change notice required?
    • Do they need to know it’s changed, or what the change was?

This type of information seems pretty straightforward. I haven’t told you anything you probably didn’t know, but when you go to ask these questions, the answer you will most likely get is:

“I don’t know.”

Ok, so the documentation isn’t quite up to date (I am being kind), but you are just going to go out and find the answer. Which leads to the next problem.

Not Enough Resources

So this looks like an easy problem to solve: hire some more business analysts, get some more developers to look at the code, get some more project managers to keep them on track. Seems like a plan, and on the surface it looks like the obvious answer (ignoring how hard it is to find available, quality IT people these days), but these aren’t the resources that are the problem.

You don’t have enough SMEs.

The BAs, developers and others are all going to need time from your subject matter experts. The subject matter experts are already busy precisely because they are subject matter experts. There typically aren’t enough of them to go around, and if you have a lot of systems to deal with, you are facing a lot of IT and business SMEs.

What your SMEs bring to the table is intellectual property, and intellectual property is critical to the success of your implementation. You will need the knowledge your SMEs have of your various systems, but there is another kind of intellectual property you are going to need, and it can be tied to a very lengthy process.

Data Management through Governance

In order to master your information, you will need to amalgamate data from multiple sources, and both the meaning and the use of that information will need to be clearly defined. What appears to be the same information in one source may have a different meaning in another. Data governance is a key requirement for establishing the enterprise data definitions that are crucial for your master data. Even in mature environments this can be a challenging task that consumes significant time and resources.

Data governance may seem like a problematic and time-consuming exercise, but it is an effective tool against one of the other major hurdles you will face in trying to establish a common set of master data.

That’s My Data

Many organizations are organized into silos. The silos are designed to look after their own interests, funded to maintain their business goals, and competitive for resources and funding. While the end goal of any organization is the success of the organization as a whole, a silo measures its success in terms of itself.

An MDM implementation is by nature at odds with the silo-based organization, as master data is data that is of value to a cross-section of the business and thus spans silos. The danger in many organizations is that a particular silo has significantly more influence than the others, often lying with the revenue-generating lines of business. This imbalance of power can easily lead to undue influence on your master data implementation, making it just another project for division X instead of an enterprise resource to be shared by all.

Data governance is one of the key factors that help keep this situation in check. Your data governance board will comprise representatives of all stakeholders, giving equal representation to all. The cross-organizational nature of data governance is also the reason decisions can be difficult and lengthy: they require consensus across all the silos.

Aside from enterprise data definitions, another important aspect of master data management is the establishment of business rules.

Too Many Rules

The business and data governance will need to be involved to establish business rules for:

  • ETL processes for loading data into your MDM application
  • Updates to information from multiple sources
  • Matching rules
  • Survivorship rules

The establishment of rules is designed to address one of the big problems MDM is meant to solve: data quality. Organizations will want to manage both data quality on load and ongoing data quality. One of the big mistakes often made is to try to introduce too many rules right away.

Too many rules early on can have a significant impact on the initial data loads into your MDM solution. You are ready for production, most likely getting your first crack at live data, only to find that vast numbers of records are being rejected by your business rules. Your data loads have now failed, and you need to go back, rethink your rules, revise your ETL process and try again.

You finally get your data loaded, your consumers arrive to start using the data, and now your legacy transactions are failing. Why are they failing? Because the application isn’t validating input according to your business rules, or isn’t collecting enough information to satisfy them.

Of course there is one way you could reduce this risk, but it often isn’t done well enough and sometimes isn’t done at all.

What Profiling?

Data profiling is the one task that is critical to understanding what your data looks like and what you need to plan for. There are often many barriers to profiling, because your party master data will likely contain personally identifiable information (PII) and access will be restricted for security reasons. You have to overcome these barriers, because data profiling is the only way to foresee the gotchas that will put you far off track down the road.

Data profiling can be a significant task, as each source system needs to be profiled. As you learn more about your data, you will have more questions that need answers. All this profiling takes time, and it most likely needs the time of specific resources, as they are the only ones with access to the information you require. (There’s that resource problem again.)

Project Management is my Problem?

So far you haven’t heard any magical reasons why your MDM implementation should fail. In fact, many of the problems are tied to the typical reasons any IT project can fail:

  • Underestimating the work
  • Not enough resources
  • Trying to do too much at once (including scope creep)
  • Time required for discovery

An aspect of an MDM implementation that may be a little atypical is the need for data governance. Data governance not only gives you the enterprise view of the information you are trying to master, but can also be an effective way of dealing with competing agendas between silos.

Data governance is also one of the key success factors for the ongoing health of your implementation. Since MDM is a journey, not a project, longevity is a characteristic of a successful implementation. Once you have delivered your foundation, succeeding phases will build upon that base and provide more coverage of your master data. To ensure the ongoing success of your implementation, you will need the support of data governance to ensure that new systems and upgrades to existing systems use the master data and don’t just create islands of their own.

In the past we tried to achieve what master data management promises today, but with a lack of controls and governance, we ended up with the data sprawl we are trying to correct with MDM. Once the project is over, the role of master data management does not end.  It is important to recognize that you must establish the processes and rules to not only create the master data store, but also to maintain it and integrate it into your systems.  Master data management is not about the installation and configuration of a shiny new software product.  The product is an enabler making the job easier.  The establishment of rules, governance processes and enforcement are what will bring you success.

One final thing that every master data management implementation requires, and without which you are pretty much doomed to failure, is strong executive sponsorship. Your MDM implementation is going to take years. You will require consistent funding and support to be able to take the journey, and only an executive can bring that level of support. Organizations that are organized into silos often don’t play well together, and while data governance can help in this situation, the time may come when a little intervention is required to keep things moving in the proper direction on the expected timelines.

Your executive is a key resource in and out of the board room. In the board room you will need a champion who has the vision of what your MDM implementation will bring to the organization and who keeps the journey progressing over time. Out of the board room you will be faced with competing agendas, data hoarding, shifting priorities, and silos trying to work together. Executive influence here can be used to make sure everyone continues to work towards the common goal, and to provide the resources required to achieve those goals on a reasonable timeline.

Topics: master data governance Master Data Management mdm mdm hub


Posted by miklostomka on Thursday, Jan 31, 2013 @ 9:46 AM

Organizations spend millions of dollars to implement their MDM solution. They may have different approaches (batch vs. real time, integrated customer view vs. integrated supplier view, etc.), but in general they all expect to get a “one version of the truth” view by integrating different data sources and then providing that integrated view to a variety of users.

After the completion and successful testing of the MDM implementation project, companies sit back and enjoy the benefits of their MDM hub, and more often than not they don’t even think about looking under the hood. It never occurs to them to gain insight into what is happening inside that MDM hub by asking questions like:

  • How is the data quality changing?
  • What are the primary activities (in processing time) inside the MDM hub?
  • How are service levels changing?

However, organizations change, people change and requirements change, all of which affects what happens inside the MDM hub. Such changes can open up significant opportunities for an organization, but without some sort of investigation those opportunities typically go unrecognized.

Here are two examples, both diagnosed through the use of an MDM audit tool:

  • The company’s MDM hub had approximately 100,000 incorrect customer addresses. These addresses were used for regular mailings; when the address was correct, the mailings generated incremental revenues. Impact on the business from just one mailing:
    • $400K wasted on mailing cost (100,000 addresses at a conservative $4 per person for postage, printing of the mailer, etc.)
    • $100K of immediately lost revenue (past data shows that one in 50 customers spends about $50 immediately following the mailing: 2,000 customers x $50)
    • The longer-term revenue loss was not assessed, but was estimated at well over $400K
    • The opportunity: cost savings of $400K and a revenue increase of $500K or more

  • At a different company, analyzing the data processed by week showed that the number of new customers processed had been declining by 1-2% every week, starting about 6 weeks before the audit was conducted. A deeper review of the audit report suggested that:
    • The original service levels for customer file changes had been getting worse and worse over that same period
    • Since customer file changes (per the audit report) took over 85% of the total processing time, the slower processing left less time for new customer processing
    • This initial diagnosis was confirmed by the client: they had a slowly growing backlog of new customer files
    • Ultimately the audit was able to highlight which input data source had been causing the slowdown, allowing the company to resolve the problem at its source
    • Business impact: a major risk (a very significant slowdown in new customer set-up) was eliminated before it became a real problem

In both examples, the MDM data governance team was recognized for identifying a major drain on resources (incorrect addresses) or for avoiding a major risk (a new customer backlog).

We take our cars for regular maintenance; we go to regular medical and dental check-ups; when was the last time you had a thorough analysis of your MDM Hub?

Several leading MDM technology companies have developed very advanced tools with the ability to provide ongoing MDM monitoring capabilities. Has your organization implemented any of these tools?

Standard features offered by these solutions include:

System Reports:

  • Data Load Results
  • Transaction Metrics Summary and details by Transaction Category or Transaction
  • SLA Attainment by Transaction Category or Transaction
  • Performance Analysis

Business Reports:

  • Data Composition
  • Changed Data Trend Analysis
  • Data Quality, Quality Extract or Quality Trend Analysis
  • Inactive Data

Automatic Alerts:

  • Identify areas of immediate need that should be investigated

The tools can often also provide custom reports to answer important business questions like:

  • Across all product/service groups, which customers are from certain geographies? (In some industries this is required by regulatory bodies.)
  • Across all product/service groups, which customers have characteristics (for example, more than 2 addresses) that could indicate fraud?

Why hesitate to do your MDM deep dive? The immediate payoff is the ability to improve the business results of your organization. Your leadership team will thank you for it!

Companies typically do not analyze what is happening inside their MDM Hub. What is your company doing? InfoTrellis is conducting a survey to determine what percentage of companies do MDM Hub audits or deep dives – and the survey results will be shared with all participants.

Qualified respondents will be entered into a draw, the winner of which will receive an Apple iPad Mini for their personal use and a complimentary scan of their organization’s MDM Hub performance and data quality. (Note: the MDM Hub analysis is run on system log files, ensuring the confidentiality of your customer and product data.)

Click here to take the five-minute survey on your MDM Hub monitoring process.

The survey will be open until March 15, 2013. The draw will take place on March 20, 2013 and the survey results and draw winners will be announced by March 25, 2013.

To learn more about the solutions InfoTrellis has developed for MDM Hub analysis, feel free to contact our specialists directly via veriscope@infotrellis.com.

Topics: data governance Data Quality master data analytics master data governance master data reporting mdm mdm hub
