Posted by Purnima Borate on Tuesday, May 24, 2016 @ 2:40 PM

Data Governance is the process of understanding, managing and making critical data available, with the goal of maximizing its value and ensuring compliance.

InfoTrellis’ Data Governance Methodology follows a multi-phased, iterative approach with four stages – Initiate, Define, Deploy and Optimize. This article is the second part of the Data Governance Methodology series by InfoTrellis. The first part of this series – Initiate your Data Governance – listed the essential foundations of a successful Data Governance program.

The ‘Define’ stage primarily deals with defining effective Policies to address Data Governance issues. This article covers the important considerations of this stage.

Understand your Data Governance problem

A detailed investigation into the root cause of a problem is essential to identify and solve Data Governance issues. For instance, a revenue discrepancy in a financial report may look like a calculation error at first glance. Deeper analysis could reveal that different users interpreted the same business term, revenue, differently, and therefore applied different logic to arrive at the monthly figure.

Once we know the root cause of a problem, it is important to categorize it. From our experience, categorizing a business problem by Data Domain, Business Process and Data Governance area acts as a high-level guide to understanding the nature and scope of a Data Governance problem. For example, the revenue discrepancy mentioned above falls under the Finance data domain, the Accounting business process and the Metadata Management Governance area. This helps to focus on the problem from the correct perspective.

Assemble the team to define Policies

Data Governance is a wide domain and requires a varied skillset. For instance, Metadata Management skills are different from Data Retention skills. Categorizing the business problem as mentioned above also helps in identifying the skillset required to resolve the issue. From our experience, a dynamic team composition based on the nature of the business problem works best. Typical members of this team are the Data Owners and the Architect of the pertinent IT/business system, along with Business Data Stewards and Technical Data Stewards who understand the business domain and the mapped Data Governance area.

Define the Policies, Standards and Processes

A Policy is generally a high-level statement that describes how you will tackle issues or plan actions for a Data Governance area. For the revenue discrepancy problem, you could frame a policy that states: all Business terms must be defined in a Metadata Repository accessible to all users of those terms, and the Metadata Repository must map technical metadata, business rules and data lineage.

A Policy can be broken down into one or more Standards. For the policy mentioned above, you could have the following Standards:
1. Business Glossary must be developed to maintain definition of all business terms.
2. Sensitive and Private data must be marked or categorized appropriately in Glossary.
3. Technical Metadata of data attributes in databases must be mapped to Business terms in Glossary.

A Standard can be broken down into one or more Processes. Typically, Processes are implemented by the IT implementation team using an IT tool or program. For Standard 1 above, you could have the following Processes:

1. For existing Business terms, import them from Excel files into the Glossary; for duplicate terms, resolve the conflict and retain one instance of each unique term.
2. For new Business terms, create the term and its definition in the Glossary.
3. Create Collections to group associated Business terms.
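As an illustration, Process 1 above could be sketched roughly as follows. The CSV column names and the conflict-handling approach (flag conflicting definitions for steward review, keep the first one seen) are assumptions for the example; a real glossary tool would have its own import workflow:

```python
import csv

def import_terms(csv_path):
    """Load business terms from a spreadsheet export (columns: term,
    definition) and retain one instance of each unique term."""
    glossary = {}
    conflicts = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            key = row["term"].strip().lower()  # normalize for duplicate detection
            if key in glossary and glossary[key] != row["definition"]:
                conflicts.append(row["term"])  # conflicting duplicate: flag for review
            else:
                glossary.setdefault(key, row["definition"])
    return glossary, conflicts
```

The conflict list is the important part: duplicates with differing definitions are exactly the cases a Business Data Steward needs to resolve rather than silently overwrite.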

Select the tool – Some Policies need an IT tool for implementation. The Enterprise Architect assigned to the Data Governance program can suggest the tool based on enterprise IT standards, tools already in use in the enterprise, the tool’s future usage, the tool’s Data Governance maturity and the team’s skillset. It is a best practice to have the Data Governance framework and tools selected and approved by the IT office at the enterprise level, making each a standard tool for addressing a specific domain of Data Governance solutions. This ensures uniform tools are used across the enterprise to address a common set of problems.

In conclusion, there are many variations in how teams are set up and Policies are defined. Keeping the above points in mind will help an enterprise formulate a team with the skillsets required to define effective Policies.

Stay tuned for Part 3 of this 4-part series on Data Governance from InfoTrellis. In the meantime, please send us a note with your queries and feedback.

Topics: data governance Master Data Management


Posted by infotrellislauren on Tuesday, Jul 16, 2013 @ 11:40 AM

Everybody, it seems, is getting onto the social media bandwagon. You can’t get far into any discussion about information management or marketing without it coming up, and it’s fascinating to see the emerging best practices and strategies behind social media products and consulting groups.

Here are five lessons from over a decade of working with Master Data Management, a much older piece of data-wrangling technology, that will serve any marketing or IT professional well as they navigate the social media technology landscape.


1. Huge Investments are a Tough Sell

I’m going to assume if you’re reading this that you see value in social media marketing, or else you see the potential for value. If you’re looking to leverage social media for your organization at a scale and level of sophistication higher than a summer intern firing off tweets now and then under the corporate handle, you’re going to have to actually spend money – and in an organization, that can be easier said than done.

Master Data Management teaches a very simple lesson on the subject of talking to your executives about a wonderful, intangible solution that will surely provide ROI if they can find it in themselves to approve the needed budget. The lesson is this: the bigger the price tag, the harder time you’ll have convincing a major decision maker it’s a necessary or worthwhile investment.

Often with MDM, the more it costs to implement, the more dramatic an impact it will have on the data within the business. With social media, that’s a little harder to prove. It doesn’t help that there are more “social media marketing solutions” out there than you can shake a stick (or a corporate credit card) at.

If your executive doesn’t have time for your technobabble pitch for a million dollar overhaul, try getting your foot in the door by starting small without a lot of commitments. For MDM, that’s a proof-of-concept, and there’s no reason that can’t be applied to social media marketing. Consider starting off with something that is subscription-based (my more IT-minded colleagues would refer to this as “software as a service” or “SaaS”) to give your management the confidence that if they aren’t seeing returns, they can just turn off the subscription and stop spending money on it.


A high level dashboard application is an ideal place to start.


This is your social media marketing proof-of-concept – if your initial test run gets you great results, that’s a good sign that your organization is part of an industry that stands to really benefit from a bigger, more expensive social media based project. Maybe even something that involves the term “big data”, but let’s not run before we walk.


2. Consolidated Records Mean More Accurate Information

This is the core premise of Master Data Management as an information management principle: you want there to be one copy of an important record that consolidates information from all its sources in the organization, containing only the most up to date and accurate data. It’s a simple but powerful idea, the philosophy of combining multiple copies of the same thing so that you only have one trustworthy copy, and then actively preventing new duplicates from cropping up.
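As a rough illustration of the consolidation idea, the sketch below merges several copies of a record by keeping, for each field, the most recently updated non-empty value. The record shape and the "latest non-empty wins" survivorship rule are assumptions for the example; real MDM survivorship rules are considerably richer:

```python
def golden_record(copies):
    """Build one consolidated record from several copies of the same
    entity, keeping the most recently updated non-empty value per field."""
    merged = {}
    # Process oldest copy first so newer non-empty values overwrite older ones.
    for copy in sorted(copies, key=lambda c: c["updated"]):
        for field, value in copy.items():
            if field != "updated" and value:
                merged[field] = value
    return merged
```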

The same thing applies to social media, especially when we’re talking about the users as actual human beings and not as individual accounts across multiple channels. Face it, we’re not interested in social media as an abstract concept – we’re there for the people using it.

(Which is why I love to cite this actual exchange between an older gentleman of a CEO and his marketing manager that goes something like: “I don’t get Twitter. I don’t use it, I don’t want to use it, I don’t personally know anybody that does use it, and I think it’s stupid.” “I agree. I honestly think it’s stupid too – but that doesn’t change the fact that 90% of our customer base uses it, and that’s why we need to pay attention to it.”)

So we’re there for the people – why on earth would we gather and visualize metrics and data on user accounts instead of people? Should we treat the Facebook, Pinterest, Twitter, LinkedIn and Tumblr accounts of one individual as having the weight of five individual voices?

What you really want to be looking for is a solution that matches and combines users across multiple channels. This isn’t quite the same process that it would be as part of MDM – this is new ground here that needs to be broken, and if you want to figure out that a Facebook user is the same person as a Twitter user, you need to be a little more creative than just checking to see if they have the same name.

Without access to more traditional data (like a phone number or an address), it takes a bit of new technology combined with new approaches to match social media accounts accurately. I won’t bother getting into the details here, but suffice to say it’s something that today’s technology has the ability to do and a couple of companies are actually offering it. It seems perfectly logical to me that if you’re going to seriously use social media, especially in any sort of decision making process, you need to have a consolidated view of each user instead of a mishmash of unattributed accounts, which would, without a doubt, skew your numbers one way or another.
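To make the idea concrete, here is a toy sketch of scoring two accounts on several weak signals rather than display name alone. The field names, weights and threshold are invented for illustration; production matching uses far more sophisticated techniques:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_person(acct_a, acct_b, threshold=0.75):
    """Combine several weak signals into one match score; no single
    signal (like display name) is trusted on its own."""
    score = 0.0
    score += 0.5 * similarity(acct_a["name"], acct_b["name"])
    score += 0.3 * similarity(acct_a.get("location", ""), acct_b.get("location", ""))
    score += 0.2 * similarity(acct_a.get("bio", ""), acct_b.get("bio", ""))
    return score >= threshold
```

The point of the weighting is that a shared name alone (0.5 at best) cannot clear the threshold; corroborating signals are required before two accounts are treated as one person.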

I’m going to briefly mention that if you want to take it a step above and beyond for even more insight into your customers, you can further consolidate that data by matching it to your internal records – Joe B in your client database is Joe B on Facebook and JoeTweet on Twitter, for example – but this is a much more ambitious project.


3. Data Quality is Not Just An IT Concern

Master Data Management is intended to bring greater value to an organization’s data by making it more accurate and trustworthy. Whether or not that actually happens very strongly depends on the quality of the data to begin with. As they say, “garbage in, garbage out,” and that’s even more true of social media marketing solutions. If you thought the quality of data in your organization was sorry to behold, I have a startling fact for you: the internet is full of garbage data. Absolutely overflowing with it. Not just things that are incorrect, but also things that are irrelevant.

If you’re going to get facts from social media, you’d better start taking data quality seriously – and make sure whatever solution you use is built by someone who takes it even more seriously. Let me give you an example.

Suppose you’re a retailer who sells Gucci products. You have a simple social media solution, a nice little application that gives you sentiment analysis and aggregate scores. You investigate how your different brands are doing and, to your shock, find that Gucci has a horrible sentiment rating. People are talking about the brand and boy are they unhappy.

You do some quick mental math and determine that it must be related to the promotion you just did around a new Gucci product. The customers must hate the product, or the promotion itself. You hurriedly show your CEO and she tells you to pull the ads.

What you didn’t know, and what your keyword based social media monitoring application didn’t know, is that there is a rap artist who goes by Gucci Mane whose fans tweet quite prolifically with reference to his name and an astonishing bouquet of language that the sentiment analysis algorithms determined to be highly negative.

Your customers are, in fact, pretty happy with Gucci and the most recent promotion, but the relevant data was drowned out and wildly skewed by a simple factor like a recording artist with a name in common. This wasn’t a question of “the data was wrong” – the data was accurate, it was just irrelevant, and the ability to distinguish between the two requires technology built on a foundation of data quality governance.
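One simple illustration of that distinction: filtering mentions for relevance before they ever reach sentiment scoring. The term lists below are invented for the example; real solutions use much richer disambiguation:

```python
def relevant_mentions(posts, brand, context_terms, exclude_terms):
    """Keep only posts that mention the brand in a relevant context,
    dropping posts that match known-irrelevant terms (e.g. a musician
    who shares the brand name)."""
    kept = []
    for post in posts:
        text = post.lower()
        if brand.lower() not in text:
            continue
        if any(term in text for term in exclude_terms):
            continue  # known noise: unrelated use of the name
        if any(term in text for term in context_terms):
            kept.append(post)
    return kept
```

Even this crude filter would have kept the "Gucci Mane" chatter out of the retailer's sentiment score; the harder, ongoing governance work is maintaining those term lists as the noise changes.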

If you’re going to use social media data, especially when you’re using it as a measure for the success of a marketing campaign and subsequently the allocation of marketing budget, make sure you’re paying attention to data quality. Don’t veer away in alarm or boredom from terms like data governance just because they aren’t as sexy as SEO or content marketing or 360 view of the customer – train yourself to actively seek the references to data quality as part of the decision making process around a social media strategy.


4. Don’t Let Someone Else Define Your Business Rules

One of the most time consuming aspects of preparing for a Master Data Management implementation is sitting down to define your business rules. There is no one definition of the customer and no one definition of a product. These are complex issues that depend heavily on the unique needs and goals of an organization, and don’t let anybody try to tell you otherwise.

To that end, social media marketing demands the same level of complexity. If you’re building a social media strategy, you absolutely need to be thinking about those business rules and definitions. How do you define a suspect? A prospect? A customer? What makes someone important and worth targeting to you? Is it more important to you to have fifty potential leads or five leads that are defined by very specific requirements for qualification?

Every organization will be different, and a good social media solution takes that into account. Be wary of a piece of software or a consulting company that has a set of pre-established business rules that aren’t easily customizable or – even worse – are completely set in stone. If an outside company tries to tell you what your company’s priorities are and applies that same strategy to every single one of their clients, thank them for their time and look elsewhere.

Also steer clear of a solution that oversimplifies things. If you’re looking to social media opinion leaders as high value targets, you want to know how they’re defining that person as an opinion leader. Are they using one metric, like Klout score or number of followers? Are they using five? Would they be willing to give more emphasis to one over the other if your company places more value on, say, number of retweets than on number of likes?

Good solutions come preconfigured at a logical setting that is based on best practices and past client success – but are also flexible and able to match themselves to your unique business definitions and strategy as much as possible.


5. Data Silos Are Lost Opportunities

Finally, I want to talk about data silos. I’m going to expand on this term for those of you reading this who are marketing people like me and not necessarily information management junkies (although I confess the people who are both combined in one are always a delight to talk to). A data silo generally refers to situations in which the different lines of business hoard their databases and don’t like to share their information throughout the entire organization. This can be a huge problem for Master Data Management adoption, because of course the point is to make it so that everyone is using the same data, but it’s also a problem for social media marketing.

Social media data, first of all, is not just marketing data. Your sales teams will undoubtedly have uses for it in terms of account handling, and your product development teams, if you have them, will be interested in learning more about what customers actively crave from the market, and heck, your customer service division almost certainly can make use of an application that instantaneously warns them when people are dissatisfied.

The fact is, if you want to prove that gathering this data is useful, don’t hoard it all to yourself. Share that data around and let people play with it. Creativity – and creative ways to use data – happens when people think about things in ways they don’t normally think about them. Traditionally social media has been relegated to marketing, but it doesn’t have to be.

An ideal social media solution, even one of those affordable subscription-based ones I’ve been talking about, presents the data in an accessible, easily shared format. The good ones come with a high level dashboard in business terms that even a CEO who thinks Twitter is stupid can log into and gain insight from, and also the ability to drill down and export raw data, so that the people who want to do complex and unique number crunching can do so without the restraints of the program itself.


Shown above: Social Cue™, the InfoTrellis social media solution


It’s important to have a good balance of goal-oriented strategy – never go into social media without a plan or a purpose – and openness to innovation. It’s even more important to be working with an application that accommodates both.


InfoTrellis is a premier consulting company in the MDM and Big Data space that is actively involved in the information management community and constantly striving to improve the value of CRM and Big Data to their customers. To learn more about Social Cue™, our social media SaaS offering, contact the InfoTrellis team directly to schedule a product demonstration.

Topics: allsight Big Data data governance Data Quality Marketing Master Data Management mdm Social Cue social media Social Media Marketing


Posted by deeparadhakrishnan on Wednesday, Apr 24, 2013 @ 1:37 PM

Master Data Management (MDM) is no longer a “fast follower” initiative but is now a generally accepted part of any information management program.  Many enterprises have well established MDM programs and many more are at the beginning stages of implementation.  In order to be successful with MDM you need continuous insights into that master data itself and how it is being used otherwise it is impossible to truly manage the master data.  An MDM dashboard is an effective tool for obtaining these insights.

What is an MDM Dashboard?

 It is difficult to improve a process without having a means to measure that process.  In addition, it is difficult to gauge continuous improvement without being able to track performance on a regular basis.  Dashboards are a common tool that is used to communicate key measures and their trends to stakeholders on a regular basis.

An MDM dashboard provides key measures about the master data such as:

  •  Metrics and trends of how the master data is changing.
  •  Metrics and trends of quality issues, issue resolution and issue backlogs.
  •  Insights into the composition of the master data.
  •  Metrics and trends of how the master data is being used by consumers across the enterprise and their experience such as meeting or failing service level agreements.

Additionally, the MDM dashboard must highlight significant changes and provide insights into key improvement areas and risk areas, as these are what need to be actioned – for example, a sudden increase in high-severity quality issues coming from a particular source system.
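As a toy illustration of highlighting significant changes, a dashboard back end might compare each metric against the prior period and flag large swings. The metric names and the 50% threshold below are assumptions for the example:

```python
def highlight_changes(current, previous, threshold=0.5):
    """Flag metrics whose period-over-period change exceeds a threshold,
    e.g. a sudden spike in high-severity issues from one source system."""
    flagged = {}
    for name, value in current.items():
        prior = previous.get(name)
        if not prior:
            continue  # no baseline (or zero) to compare against
        change = (value - prior) / prior
        if abs(change) >= threshold:
            flagged[name] = round(change, 2)
    return flagged
```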

The stakeholders for the MDM dashboard will be broad given master data is an enterprise asset.  Stakeholders will consist of a mix of business and IT resources from a variety of areas.


Use of Dashboard

A representative from each consumer of the master data (e.g., call-center applications, e-commerce applications, data warehouses and so on):
  • They will want insights on how many transactions they executed against the MDM hub, failure rates and SLA attainment, and how these compare to past periods.
  • They will want to understand trends in quality for the data they are using, or plan to use, because data quality directly impacts business outcomes.
  • They will want to understand the composition (or profile) of the data they are using.
A representative from each provider of the master data (i.e., source systems that feed the MDM hub):
  • They will want trends in quality issues for the specific data they have provided to the MDM hub, so they get insights into their own data quality and can prioritize addressing quality issues at source.
  • They will want to reconcile change metrics in their system with change metrics in the MDM hub.
Executives responsible for managing the MDM program:
  • They will want insights on MDM hub operations and performance to evaluate whether the system is meeting defined SLAs for the many consumers across the enterprise.
  • They will want to understand trends in data quality and data usage, not only to optimize the MDM hub but also to justify the MDM program.
Data Governance Council members responsible for setting and measuring policies, including data quality initiatives:
  • They will want insights into all aspects of master data and its use, including quality trends, change trends, consumer activity, etc. Most important, however, is highlighting any significant changes from period to period so that the council can take action where required to identify and prevent potential issues before they escalate.

The frequency for producing and delivering an MDM dashboard that targets these stakeholders varies from client to client but a common time frame is monthly.  However, this does not negate the need for frequent, detailed reports delivered to other stakeholders. Daily and weekly reports, for example, are essential to the team members that are responsible for implementing the MDM program.

What are the contents of the ideal MDM Dashboard?

The business cares most about significant changes in metrics and it is those that must be highlighted.  The goal of any dashboard should not be to look at everything available but rather to look at the information that is most important and most revealing – to gain insights into what is happening within the business unit with the end goal of making better decisions and identifying and anticipating issues before they can have a negative impact on the business.  An MDM dashboard can help to identify how effective the MDM and governance programs are in meeting the needs of the organization.

Breaking down the metrics

Every metric is nice to have, but not every metric is key at the strategic level. For example, metrics which show that MDM helped increase the accuracy of customer data by 10% aren’t likely to impress management, but metrics which show that customer retention or cross-selling rates increased as a result of MDM will.

To make the link between goals and strategy, organizations should focus on specific metrics instead of trying to measure everything that can possibly be measured.  Therefore, organizations should look at the top five to 10 items that are important to measure on a regular basis.

Standard key metrics to be captured in the dashboard include:

Master Data Composition – a static view of the master data. This is important because data is brought together from multiple sources, and it gives you insight into what your “combined view” looks like.

  • Number of master data records (e.g., number of Customers, Accounts, …).
  • Number of key elements and ratios (e.g., number of Addresses and average number of Addresses per Customer, number of Customers with no Addresses, number of Customers with many Addresses, and so on).

Master Data Change – an understanding of how the master data has changed over time.

  • Number of de-duplicated and split master data records.
  • Number of new and updated records for the month, with a comparison to change trends from prior periods and significant variances highlighted.
  • Change for key elements of interest. For example, new/updated email addresses if there is a campaign to obtain as many email addresses as possible for direct marketing purposes. Again, with comparisons to prior periods.

Master Data Quality – master data quality trends. Quality concerns differ from one client to another, but common concerns for customer master data include anonymous values in names, invalid addresses, customers that don’t fit an expected profile (such as too many addresses) and default birth dates (such as 1900-01-01).

  • Number of quality issues discovered in the reporting period (by severity and type of issue).
  • Number of quality issues resolved in the reporting period.
  • State of the quality issue backlog to be addressed.
  • Sources contributing to the quality issues.
  • Trends compared to previous periods.

Master Data Usage – an understanding of how the master data is being consumed. Managed master data only has value when it is consumed, so it is important to understand who is using it and how it’s being used.

  • Top consumers of the data, including SLA attainment and error rates.
  • Trends using past reporting periods, with significant variances highlighted. If a consumer’s activity spikes for one month, it may indicate an issue on their side or new requirements on using the data.

MDM Hub Performance – details that can be used for capacity planning and performance tuning.

  • Number of transactions broken down by transaction type.
  • Success versus failure rates.
  • Processing rate (transactions per second).
  • Min, max and average response times and message sizes.
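The composition measures above could be computed along these lines, assuming consolidated customer records carry a list of addresses (the field names and the "many addresses" cutoff are invented for the example):

```python
def composition_metrics(customers):
    """Summarize the address profile of a consolidated customer set."""
    counts = [len(c.get("addresses", [])) for c in customers]
    total = len(customers)
    return {
        "customers": total,
        "addresses": sum(counts),
        "avg_addresses_per_customer": round(sum(counts) / total, 2) if total else 0,
        "customers_with_no_address": sum(1 for n in counts if n == 0),
        "customers_with_many_addresses": sum(1 for n in counts if n > 3),
    }
```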

Semantically speaking, organizations can define their metrics in more business-oriented terms that are meaningful to stakeholders – for example, strategic metrics related to operational effectiveness (e.g., cost measures), customer intimacy (e.g., customer retention rates), and so on. The bottom line is that key metrics drive the success of the organization.

Example Uses

The following are examples of how an MDM dashboard can be used to support and optimize business initiatives.

Example 1: Reduced mailing costs in marketing campaigns

The marketing team of a retail company uses the customer and address data from its MDM hub for its direct mail campaigns. Investigations revealed there is approximately $4 in processing costs for each returned mail item plus an undetermined amount of lost revenue since the mailed item did not reach its destination and fulfill its purpose.

An MDM dashboard would provide fundamental metrics and trends on address data including:

  • Number of customers and addresses
  • Number of new and updated addresses in this period
  • Number of addresses in standard form

The dashboard would provide advanced metrics and trends including:

  • Number of addresses with quality issues broken down by severity
  • Number of addresses that are aging and have become unreliable due to data decay

The marketing team can use this information to understand trends and be more strategic in how they approach their campaigns from a cost perspective.
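A back-of-the-envelope sketch of the cost estimate, using the $4-per-returned-item figure from the example and assuming addresses are flagged with a severity and a staleness indicator (both field names are invented for illustration):

```python
def estimated_return_cost(addresses, cost_per_return=4.00):
    """Estimate avoidable processing cost by counting addresses that
    carry severe quality issues or have gone stale through data decay."""
    risky = [a for a in addresses if a.get("severity") == "high" or a.get("stale")]
    return len(risky) * cost_per_return
```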

Example 2: Addressing quality initiatives at the source

Many source systems don’t have quality controls and trending information on the master data, such as customers and products, that resides in their databases.  Analyzing the master data within an MDM Hub provides a “one-stop shop” for finding and tracking quality issues that trace back to particular source systems.  It is always best to address quality issues at source, and an MDM dashboard gives management the metrics they need to understand how the quality of the data and the backlog of issues in their source systems are trending.  Likewise, it gives the MDM team insights into how source systems are contributing to the MDM effort.

Example 3: Capacity Planning

As MDM gains momentum in the enterprise, it takes on more and more consumers.  Examples of consumers are CRM systems, e-Commerce, Web Applications, source systems and data warehouses.  As with any mission critical system, it is important to ensure the MDM Hub is providing all of these consumers with high quality service. This includes (but is not limited to) providing maximum availability and the ability to fulfill transaction requests within defined service level agreements (SLAs).

It is critical then to understand transaction metrics for each consumer including:

  • Number of transactions executed
  • Types of transactions executed
  • Failure rates
  • SLA attainment rates

These metrics, along with high level trending information, can be used to plan for future capacity needs to ensure the technical resources are there to satisfy the demands placed on the MDM hub.
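These roll-ups could be computed along the following lines, assuming a simple log of (consumer, status, response time) records; the SLA threshold is an arbitrary example value:

```python
def consumer_metrics(transactions, sla_ms=500):
    """Roll up per-consumer transaction counts, failure rates and SLA
    attainment from (consumer, status, response_ms) records."""
    out = {}
    for consumer, status, response_ms in transactions:
        m = out.setdefault(consumer, {"count": 0, "failures": 0, "within_sla": 0})
        m["count"] += 1
        if status != "ok":
            m["failures"] += 1
        elif response_ms <= sla_ms:
            m["within_sla"] += 1
    for m in out.values():
        m["failure_rate"] = round(m["failures"] / m["count"], 3)
        m["sla_attainment"] = round(m["within_sla"] / m["count"], 3)
    return out
```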

It also gives your data stewardship team the means to identify anomalies and items in need of investigation – for example, if a consumer’s transaction workload drastically increases one month or suddenly begins to experience an unusual number of failures.


If you want to manage it then you must first measure it.

This is no less true just because your organization has implemented MDM – how can you expect your teams to manage your master data if they have no way to measure it?  An MDM dashboard is a tool that provides the measurements to various audiences so that you can optimize your MDM program and get more business value from it.

InfoTrellis has incorporated over 12 years of experience in both MDM product development and MDM implementation into a unique MDM dashboard solution, MDM Veriscope, that provides you with the metrics you need to manage your MDM program.

Please click here for a recent announcement (April 2013) on the release of MDM Veriscope 3.0.

Topics: data governance Data Quality master data analytics master data governance Master Data Management mdm MDM Dashboard MDM Implementation reporting


Posted by hmrizin on Monday, Apr 1, 2013 @ 11:07 AM

Yes. Social Media is important for business. Thanks to the analysts, advocates, industry experts and the zillion articles & blog posts on the topic.

Now the question is where to start and how to go about consuming Social Media Data. What are the steps involved? Are there any best practices, frameworks or patterns around consuming Social Media Data? This blog tries to answer some of these questions; keep reading…

Within the general IT community we’ve learned many lessons over the years that lead us to the conclusion that data is an enterprise asset and we need proper controls and governance over it.  Only then can we trust the results of analytics, use it in real-time processes with confidence and gain operational efficiencies.

Social media data, or external data in general, is no exception to this.  We can and must apply techniques we commonly apply in master data management, data quality management and enterprise data warehousing disciplines so that we can draw the most value possible out of social media data.  On top of that there are additional techniques to apply given the unique nature of this type of data.

This article proposes the concept of a centralized and managed hub of social media data, or “social media master”.  It addresses the following topics:

  1. Justification
  2. Data Acquisition
  3. Enrichment
  4. Data Quality and “Data Quality Measures”
  5. Relevance (i.e., finding the signal in the noise)
  6. Consumption (i.e., use of the data)
  7. Integration with other data and “Social Media Resolution”
  8. Governance


This article does not go into much detail on why you should centralize and manage social media data, as I would assume information management practitioners would accept this as the right thing to do.  Instead the question is “when is the right time to do it?”  It is natural to take a project-based and simple approach to managing social media data when starting out with only one or two initiatives.  But you do not want to fall into the trap of having multiple initiatives on the go, each with its own silo of data that is inconsistent, incomplete and of unknown quality.  That is reminiscent of the proliferation of project- and department-based data warehouses in the 1990s that many organizations are still trying to address in their enterprise data warehousing strategies.

Data Acquisition

The first and foremost task is to collect the social media data of interest.  There are of course different ways this can be done such as:

  1. Subscribing to the social media site’s streaming API (i.e., data is pushed to you).
  2. Using the social media site’s API (i.e., you pull data from them).
  3. Purchasing data from a third party provider.

All of the popular social media sites like Twitter and Facebook have well documented APIs, schemas and terms and conditions.

The method you choose will depend on the criteria and volume of data you expect.  For example, if you simply want all Tweets that mention your company name (or some other keywords) then perhaps subscribing to the social media site’s streaming API may be sufficient.  If you expect a large volume of Tweets and you want to go back in time then you will likely have to get that data from a third party provider.


All popular social media sites have a well-defined schema that describes the content of the data.  And the content for many includes the same basic data such as a user id (or handle), a name, location information, timestamp data and of course the actual social content such as the 140 characters of Tweet text.

This is raw data and should not be considered ready for consumption as you first need to apply quality functions to the data, measure the quality and also create relevance measures so you can find “signal in the noise”.  There is also opportunity to enrich the data, which not only helps with quality and relevance measures but also provides additional data that can be very useful in analytics.  Let’s look at a few examples.

The first example is enriching a Twitter user’s profile with gender information.  By analyzing the user’s name and handle it is possible to derive gender along with a confidence level.  It is not possible in all cases but is possible in many.  Gender is, of course, a very important dimension in analyzing data for many organizations.
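A minimal sketch of this name-based gender enrichment follows. The name table, genders and confidence values are invented for illustration; a real implementation would use a substantial reference file and, as noted above, might also analyze the handle.

```python
# Hypothetical reference data: first name -> (gender, confidence).
NAME_GENDER = {
    "john": ("male", 0.99),
    "mary": ("female", 0.99),
    "pat": ("unknown", 0.50),  # ambiguous name: low confidence
}

def derive_gender(full_name):
    """Look up the first name; fall back to 'unknown' with zero confidence."""
    parts = full_name.strip().split()
    first = parts[0].lower() if parts else ""
    return NAME_GENDER.get(first, ("unknown", 0.0))

print(derive_gender("John Smith"))  # ('male', 0.99)
```

Returning a confidence alongside the derived value is the key design point: downstream consumers can decide for themselves how much derived gender to trust.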

A second example is by analyzing the text of the social content.  For example, are there any mentions of brand names, product names or competitors?  A simple yet effective way is to use reference files of keywords and simple string matching to pull out this information.  Another way is to use more advanced natural language processing (NLP) and machine learning techniques to do this, which is better suited for enriching the raw data with things such as sentiment and categories.
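The reference-file-and-string-matching approach can be sketched as below. The brand and competitor terms are made up for the example; real matching would also need tokenization and punctuation handling to avoid false hits.

```python
# Hypothetical reference data: term -> classification.
BRANDS = {"acme": "brand", "widgetco": "competitor"}

def tag_mentions(text):
    """Return sorted (term, kind) pairs found via simple substring matching."""
    lowered = text.lower()
    return sorted((term, kind) for term, kind in BRANDS.items() if term in lowered)

print(tag_mentions("Just switched from WidgetCo to Acme"))
# [('acme', 'brand'), ('widgetco', 'competitor')]
```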

A final example is with Foursquare check-ins (over Twitter), which broadcast a user’s location such as “I’m at Lowe’s Home Improvement (Mississauga, ON)”.  Different check-in services have different formats, but Foursquare check-ins are usually in the format of “I’m at <Store-Name> (<Place>, <State/Province>)”.  This is packed full of good information even though it is brief.  You can pull out not only city and state/province level information but also store level information that can be matched to a reference file of stores and used as a dimension in analytics.
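Parsing that check-in format is a small exercise in pattern matching. The sketch below assumes the simplified format described above (and plain straight apostrophes); real check-in text varies, so production code would need more tolerant patterns.

```python
import re

# "I'm at <Store-Name> (<Place>, <State/Province>)"
CHECKIN = re.compile(r"I'm at (?P<store>.+?) \((?P<place>[^,]+), (?P<region>[^)]+)\)")

def parse_checkin(text):
    """Extract store, place and state/province, or None if the format doesn't match."""
    m = CHECKIN.search(text)
    return m.groupdict() if m else None

print(parse_checkin("I'm at Lowe's Home Improvement (Mississauga, ON)"))
# {'store': "Lowe's Home Improvement", 'place': 'Mississauga', 'region': 'ON'}
```

The extracted store name can then be matched against a store reference file, exactly as the text suggests.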

You have the ability with enrichment techniques to augment the raw data with additional data that is very useful both in downstream use (analytics and real-time processes) but also in subsequent activities for mastering the data.

Data Quality and Data Quality Measures

Data quality is an activity that cannot be ignored in any data management/integration exercise and the same applies to social media data.

Different quality functions can be applied depending on what data is available to you.  One simple example is in analyzing free form location information that can come in formats such as “Toronto”, “Toronto, Ontario”, “Toronto, On”, “Toronto, ONT”, “T.O.”, “Toronto Ontario”.  This data can be put into a standard format so it is consistent and uniform across the data set.
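A simple standardization function for the location variants above might look like this. The alias table is illustrative; real implementations often combine reference data with fuzzy matching rather than exact lookup.

```python
# Hypothetical alias table mapping free-form variants to a standard form.
CITY_ALIASES = {
    "toronto": "Toronto, ON",
    "toronto, ontario": "Toronto, ON",
    "toronto, on": "Toronto, ON",
    "toronto, ont": "Toronto, ON",
    "t.o.": "Toronto, ON",
    "toronto ontario": "Toronto, ON",
}

def standardize_location(raw):
    """Normalize case/whitespace and map known variants to the standard form."""
    key = " ".join(raw.lower().split())
    return CITY_ALIASES.get(key, raw)  # pass through unchanged if unknown

print(standardize_location("Toronto, ONT"))  # Toronto, ON
```

Values that fall through unchanged are exactly the ones whose quality measure (next section) should reflect lower confidence.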

Given this is external data that is not under your control, it is very important not just to apply quality functions but also to measure the quality.  For example, what is your level of confidence that the city data refers to an actual city?  Your confidence in whether or not the user actually lives in that city is a different matter altogether.

When you create and manage a hub of social media data you can expect the multiple consumers will have different uses of the data.  They will therefore pick data that is appropriate for them and that they believe is “good enough” for their purpose.  This is why measuring the quality of the data is important, if not critical.


One important quality measure is “relevance”.  Ultimately, relevance is contextual, because data that is relevant to one consumer may not be relevant to another.  However, it is important to create a relevance, or qualification, score as a basis.

By definition, “Qualification = Quantifying the confidence you have in the quality of the data”.  In simple terms the question that needs to be answered is “how confident are you that this particular social media content, such as a Tweet, is relevant for you?”  As an example, a Tweet that you’ve acquired that has nothing to do with your company, competitors, products or brands may have a low (or zero) relevance and therefore be “filtered out” from being used in downstream processing.
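The qualification idea reduces to a scoring function over the signals you have. The weights and signals below are assumptions for the sketch, not a published formula; in practice each organization would tune its own.

```python
def qualification_score(post, keywords):
    """Return a 0.0-1.0 relevance confidence for a social post (illustrative weights)."""
    text = post.get("text", "").lower()
    points = 0
    if any(kw in text for kw in keywords):
        points += 7   # mentions a tracked brand/product/competitor
    if post.get("location_confidence", 0) > 0.5:
        points += 2   # geographic enrichment looks trustworthy
    if post.get("gender") not in (None, "unknown"):
        points += 1   # profile enrichment succeeded
    return points / 10

post = {"text": "Acme rocks", "location_confidence": 0.9, "gender": "female"}
print(qualification_score(post, {"acme"}))  # 1.0
```

A consumer can then filter on whatever threshold suits its tolerance, e.g. keep only posts scoring 0.7 or higher.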

This is finding the signal in the noise and ensuring consumers use data that matters, which provides better business outcomes.


Once the data has been acquired, enriched and measured for quality and relevance, the “managed” social media data can be consumed from a centralized hub.


Just like an MDM hub, the consumers can be analytical or operational (real-time) in nature.  And just like an MDM hub, best practices should be followed in terms of security, audit, setting SLAs and having the right infrastructure components in place.

One major difference between an MDM hub and a social media hub is the level of trust and confidence in the data.  This is not a topic that can be ignored with MDM hubs but we are in a different game since we are dealing with external data versus internal data.  That is why enriching, measuring quality and measuring relevance of the data is critical.  It provides the ability for consumers to work with the data that is appropriate for their needs and tolerance levels.

It is also important to have a well-defined schema in place.  Much of the actual social media content is unstructured; however, there is structured data around it, and the enrichments and quality measurements are structured.  Just because the schema is well-defined doesn’t mean it has to be normalized into a fully typed relational model or object model.  What is most important is that there are basic structures in place to aid in consuming the data.


Integrating with other data and Social Media Resolution

Social media data, just like reference data, master data and other types of data, is not an island.  It is when you combine qualified and relevant social media data with your internal data that you have huge business potential.

In some cases an organization may have Twitter handles or other social account identifiers in their MDM hub that they can use to join to a social media hub to see relevant activity.  But for most this is not the case and instead they would need to look at “social media resolution” as a technique to match and merge data.

Social media resolution comes in two forms:

  1. Resolving identities/accounts across social media services (e.g., a Twitter user to a Facebook user).
  2. Resolving social media identities/accounts to internal data (e.g., a Twitter user to a customer in an MDM hub or enterprise data warehouse).

This is a very different problem than matching “customers to customers” and different data points, techniques and technologies are required to make it happen.
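Without describing any particular product's approach, a toy sketch of the second form of resolution (social profile to internal customer) shows the shape of the problem: score agreement across whatever data points are available. The fields and weights are invented; real entity resolution uses far richer features and probabilistic matching.

```python
def _same(a, b):
    """Case-insensitive equality, counting empty/missing values as no match."""
    return bool(a) and bool(b) and a.lower() == b.lower()

def resolution_score(profile, customer):
    """Illustrative 0.0-1.0 match score between a social profile and a customer record."""
    points = 0
    if _same(profile.get("name"), customer.get("name")):
        points += 5   # name agreement: suggestive but weak on its own
    if _same(profile.get("city"), customer.get("city")):
        points += 3   # location agreement
    if _same(profile.get("handle"), customer.get("twitter_handle")):
        points += 2   # explicit identifier: the strongest signal when present
    return points / 10

profile = {"name": "Jane Doe", "city": "Toronto", "handle": "@jdoe"}
customer = {"name": "Jane Doe", "city": "Toronto", "twitter_handle": "@jdoe"}
print(resolution_score(profile, customer))  # 1.0
```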

It is not in the scope of this article to describe how to do it, however if you find yourself interested in hearing more details then you can contact me at to chat about it. We’ve been innovating in this area at InfoTrellis with the development of our AllSight big data platform and I’d be happy to talk to you at greater length on the subject.


All enterprise assets need to be properly governed and to govern something you need to first measure it.  Therefore it is important to capture and analyze key measures of the social media hub such as:

  1. New data acquired (how the data is changing)
  2. Quality of the data and how it is trending (we can do this since we measure the quality)
  3. Success in enriching the data
  4. Who is using the data and how are they using it

Below is an example of a Twitter Dashboard that is used to get insights into the key measures of a social media hub containing Twitter accounts and Tweets.



The past has taught us that we need to be proactive and properly manage data as an enterprise asset if we want to get the most out of it and have confidence in what we get.  Social media data is no different than this and hopefully this article has provided you some insight into what a social media hub can look like and what it must do.

If you agree or disagree or want to chat further on the topic then please leave a comment or contact me directly!

Topics: allsight, bigdata, data acquisition, data governance, Data Quality, data silos, enrichment, facebook, InfoTrellis, master data analytics, Master Data Management, mdm, qualification, relevance, social media, social media master, social media schema, tweets, Twitter


Posted by miklostomka on Thursday, Jan 31, 2013 @ 9:46 AM

Organizations spend millions of dollars to implement their MDM solution. They may have different approaches (batch vs. real time; integrated customer view vs. integrated supplier view etc.) – but in general they all expect to get a “one version of the truth” view by integrating different data sources and then providing that integrated view to a variety of different users.

After the completion and successful testing of the MDM implementation project, companies sit back and enjoy the benefits of their MDM hub – and more often than not don’t even think about looking under the hood. It never occurs to them that they could be trying to gain insights into what’s happening inside that MDM hub by asking questions like

–          How is the data quality changing?

–          What are the primary activities (in processing time) inside the MDM hub?

–          How are service levels changing?

However, organizations change, people change, requirements change – impacting what is happening inside the MDM Hub. Such changes can open up significant opportunities for an organization – but without doing any sort of investigation that opportunity is typically not recognized.

Here are two examples – diagnosed through the use of an MDM audit tool:

–          The company’s MDM Hub had approximately 100,000 incorrect customer addresses. These addresses were used for regular mailings; the mailings generated (in case of correct address) incremental revenues. Impact on the business related to just one mailing:

  • $400K wasted on the mailing cost ($4 is the conservative mailing cost per person – for postage, printing of the mailer etc.)
  • $100K of immediately lost revenues (as past data shows that one in 50 customers spends about $50 immediately following the mailing)
  • The longer term revenue lost was not assessed, but was estimated to be well over $400K
  • The opportunity: Cost saving of $400K and revenue increase of $500K or more

–          At a different company, analyzing data processed by week showed that the number of new customers processed had been declining by 1-2% every week, starting about 6 weeks before the audit was conducted. A deeper review of the audit report suggested that

  • The original service levels related to customer file changes had been getting worse and worse over that same time period
  • As customer file changes (as per the audit report) took over 85% of the total processing time, the slower processing led to less time available for new customer processing
  • This initial diagnostic was confirmed by the client – they had a slowly growing backlog of new customer files
  • Ultimately the audit was able to highlight which input data source had been causing the slowdown, allowing the company to resolve the problem at its source
  • Business impact: a major risk (very significant slowdown in new customer set up) was eliminated before it became a real problem

In both examples, the MDM Data Governance team was recognized for identifying a major drain of resources (incorrect addresses) or for avoiding a major risk (new customer backlog).

We take our cars for regular maintenance; we go to regular medical and dental check-ups; when was the last time you had a thorough analysis of your MDM Hub?

Several leading MDM technology companies have developed very advanced tools with the ability to provide ongoing MDM monitoring capabilities. Has your organization implemented any of these tools?

Standard features offered by these solutions include:

System Reports:

–          Data Load Results

–          Transaction Metrics Summary and details by Transaction Category or Transaction

–          SLA Attainment by Transaction Category or Transaction

–          Performance Analysis

Business Reports:

–          Data Composition

–          Changed Data Trend Analysis

–          Data Quality, Quality Extract or Quality Trend Analysis

–          Inactive Data

Automatic alerts:

–          Identify areas of immediate need that should be investigated

The tools can often also provide custom reports to answer important business questions like:

–          Across all product/service groups, which customers are from certain geographies? (In some industries this is required by regulatory bodies.)

–          Across all product/service groups, which customers have certain characteristics (for example, more than two addresses) that could indicate fraud?

Why hesitate to do your MDM deep dive? The immediate payoff is the ability to improve the business results of your organization. Your leadership team will thank you for it!

Companies typically do not analyze what is happening inside their MDM Hub. What is your company doing? InfoTrellis is conducting a survey to determine what percentage of companies do MDM Hub audits or deep dives – and the survey results will be shared with all participants.

Qualified respondents will be entered into a draw, the winner of which will receive an Apple iPad Mini for their personal use and a complimentary scan of their organization’s MDM Hub performance and data quality. (Note: the MDM Hub analysis is run on system log files, ensuring the confidentiality of your customer and product data.)

Click here to take the five minute survey on your MDM Hub monitoring process.

The survey will be open until March 15, 2013. The draw will take place on March 20, 2013 and the survey results and draw winners will be announced by March 25, 2013.

To learn more about the solutions InfoTrellis has developed for MDM Hub analysis, feel free to contact our specialists directly via

Topics: data governance, Data Quality, master data analytics, master data governance, master data reporting, mdm, mdm hub


Posted by lavanyaramkumar on Monday, Jan 7, 2013 @ 10:45 AM

In recent years reference data management (RDM) has slowly crept into the forefront of business decision-makers’ consciousness, making its way steadily upwards in priority within corporate goals and initiatives. Organizations are suddenly seeing the benefits of investing in RDM, their attention grabbed by potential paybacks like smoother interoperability among various functions of the organization and centralized ownership and accountability in creating trustworthy data.

Before we dive into talking about approaches for implementation, I want to look at the potential significance of RDM in an enterprise. When the market is inclined towards data integration, concepts like MDM and Business Intelligence tend to hog the spotlight. For these particular corporate initiatives, the primary focus is key business information like customers, products or suppliers. It is equally important, however, to appreciate the fact that reference data plays a major role in organizing and comprehending all these key pieces of business data.

Whenever there is a change in reference data, the definition of that business data changes as well. That’s why it’s so important to invest meaningful effort into the maintenance of reference data, especially in any globally distributed network or where enterprises have diverse systems each with their own localized data. Half-hearted maintenance of reference data degrades quality of business data and results in misleading reports in BI and CRM initiatives.

Organizations looking for an efficient, quick and low-risk approach are shifting focus towards reference data management solutions. RDM allows different versions of codes to be managed from a central point, simplifies the creation of mappings between different versions, and enables transcoding of values across data sets. Cross-enterprise reference data can then be reconciled for application integration, business process integration, statutory and non-statutory reporting, compliance and BI analysis.
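The transcoding idea, i.e. translating a code value from one system's representation to another via centrally managed mappings, can be sketched as follows. The system names and code values are invented for illustration.

```python
# Hypothetical centrally managed mappings: (source, target) -> {code: code}.
MAPPINGS = {
    ("crm", "erp"): {"CA-ON": "ONT", "CA-QC": "QUE"},
    ("erp", "crm"): {"ONT": "CA-ON", "QUE": "CA-QC"},
}

def transcode(value, source, target):
    """Translate a code value between systems, failing loudly on unmapped values."""
    mapping = MAPPINGS.get((source, target), {})
    if value not in mapping:
        raise KeyError(f"No mapping for {value!r} from {source} to {target}")
    return mapping[value]

print(transcode("CA-ON", "crm", "erp"))  # ONT
```

Failing loudly on unmapped values (rather than passing them through) is deliberate: unmapped codes are exactly what the central governance team needs to see and reconcile.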

RDM treats reference data as a special type of master data and applies proven master data management (MDM) techniques including data governance, security, and audit control. A good RDM solution enables efficient management of complex mappings between different reference data representations and coordinates the use of reference data standards within an organization. The user interface to the RDM hub provides a centralized authorization and approval process, publishes data changes to enterprise systems, and handles exceptional situations.


Key Considerations for Implementation

     1.     Data Identification

Identifying common definitions and classifications across the organization and then generalizing a golden set of definitions is the first step to RDM success. The same data may have historically been maintained by several groups, directly or indirectly wasting resources like effort, budget and time, and starting with clear definitions is the best way to eliminate that waste. Some examples of common reference data issues resolved by clearer definition are:

Transaction Codes: Manufacturing or sales units of an organization can have different “transaction codes”, requiring status to be pieced together from several systems rather than read from a single dashboard providing universal status for all departments.


HR Codes: A global organization having several sub-units or frequent mergers and acquisitions can struggle to unify its data on employees, resulting in a failure to leverage employee expertise across the organization and ultimately underutilizing current staff and spending money to look for the same skills externally.


Segment codes for sales & marketing: Maintaining a single version of zonal or segment code improves an organization’s ability to concentrate on markets with the greatest potential growth and ensure that focus is globally distributed instead of fixated on a single market segment.


Fixed Asset codes: Organizations often have multiple assets of the same type or category such as machinery, equipment, furniture, and real estate, and face difficulties in segmenting them universally to identify an accurate global financial status of the company.

     2.    Define Two-Fold Rule

Defining data rules and business rules is incredibly important. The first category focuses on validation rules, which can be as simple as data validations imposed by industry standards. The second category focuses on compliance with business processes and data governance objectives.

For instance,

Data Rules:

–          NAICS Code is a 2-6 digit number

–          State Code must have an associated Country Code

Business Rules:

–          Hierarchy management and constraints within Code Sets

–          Life cycle management for a Code Set from creation to distribution
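The data-rule examples above translate directly into simple validation checks. The NAICS length rule and the state/country dependency come straight from the examples; the field names and record shape are illustrative assumptions.

```python
def valid_naics(code):
    """A NAICS code is a 2-6 digit number."""
    return code.isdigit() and 2 <= len(code) <= 6

def valid_state_country(record):
    """A State Code is only valid when an associated Country Code is present."""
    return not record.get("state_code") or bool(record.get("country_code"))

print(valid_naics("5112"))                                              # True
print(valid_state_country({"state_code": "ON"}))                        # False
print(valid_state_country({"state_code": "ON", "country_code": "CA"}))  # True
```

Business rules such as hierarchy constraints and code-set life cycle management would sit a level above checks like these, governing how validated values may change over time.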

     3.    Know Your Integration Points

Centralized management of reference data within an enterprise will enable organizations to improve efficiency and provide strong data governance, security, audit & process controls and change management.

Unlike Master Data Management scenarios, where the MDM Hub can function as a replacement to legacy systems for single source information, the RDM Hub is positioned at the center of the enterprise architecture with anywhere between one and a dozen source and distribution points. Significant considerations must be laid out to ensure the seamless integration of data into downstream systems for real time services and consistent operations. Well defined integration process and mechanisms at this stage will provide long term returns for your organization.

     4.    Ease of Data Governance  

Strategic data governance is absolutely essential in an RDM implementation. Without any sort of RDM, consolidation of data for internal or regulatory reporting must be achieved through an inefficient, labor intensive manual process. The benefit of having an RDM solution is that it removes the burden of maintaining reference data from what is usually several individual IT teams, transferring ownership to one data governance team with more visibility and control over the business rules around reference data.

Having an RDM solution facilitates the establishment of a lean, efficient data governance team that manages multiple versions of code, builds complex mapping and hierarchy, authorizes changes, manages data for reporting purposes, publishes changes to downstream systems, and manages a variety of other valuable tasks.

Ultimately, a powerful RDM solution can save a corporation from wasting significant amounts of valuable resources, and a proper implementation is key to that solution.


I hope this article has given you some valuable insight into successful RDM implementation. For more details on our RDM solutions and client RDM successes, feel free to contact me at or visit

Topics: data governance, Data Quality, InfoTrellis, Integration, RDM, Reference Data


Posted by David Borean on Wednesday, Nov 7, 2012 @ 3:08 PM

Intertwined fates

There has been an interesting shift in the MDM space over the last few years.  It wasn’t long ago that the most common question used to be “What is MDM?” – these days that question is instead “What are the best practices in implementing and sustaining MDM?”

There are best practices that have become common knowledge, one example being the practice of approaching MDM as a “program” and not a “project”, employing phased implementations that provide incremental business value.

Other best practices have yet to enter the mainstream; among them the absolutely essential practice of establishing MDM not in isolation but as part of a broader Data Governance program – a practice whose impact on long-term success cannot be overstated.  This is an approach that takes time to see the effects and understand the value of, which goes a long way towards explaining why it so often gets overlooked, especially in light of the fact that MDM is still a relatively young idea for many companies.  You can get MDM off the ground without Data Governance, but over time you will certainly feel the effects of gravity much more without it.

We understand that successful Data Governance will lead to better and higher value business outcomes by managing data as a strategic asset.  It is also widely recognized that a critical success factor in effective Data Governance is having the right metrics and insights into the data.  Taking it one step further, if you concede that master data is the most strategic data for many organizations (most people would), having the right metrics and insights into that master data is a must.

MDM requires Data Governance to be successful beyond the first phases of implementation – and Data Governance requires metrics and insights into master data to be successful.  So what are these required metrics and insights and where do they come from?

Metrics and insight

The most important metrics and insights about your master data are as follows:

What is the composition of your master data?

When you bring data in from multiple sources and “mesh” it together you’ll want to understand what the resulting “360 view” of that data looks like, as it will provide interesting insights and opportunities.  For example, on average how many addresses does each customer have?  How many customers have no addresses?  More than five addresses?  How many customers have both US and Canadian addresses?
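Those composition questions reduce to simple aggregations over the hub's customer-address data. The in-memory records below are a stand-in for whatever store the MDM hub actually uses; the point is only the shape of the metrics.

```python
from collections import Counter

# Illustrative stand-in for the hub's customer -> addresses data.
customers = {
    "c1": ["123 Main St, Toronto, CA"],
    "c2": [],                              # a customer with no address
    "c3": ["1 First Ave, Buffalo, US",
           "2 Second Ave, Toronto, CA"],
}

# Distribution of address counts, plus the specific questions from the text.
address_counts = Counter(len(addrs) for addrs in customers.values())
customers_without_address = address_counts[0]
average_addresses = sum(len(a) for a in customers.values()) / len(customers)

print(customers_without_address, average_addresses)  # 1 1.0
```

The same pattern answers the other questions (more than five addresses, addresses in both countries) by changing the predicate being counted.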

How is your master data changing and who is impacting the change?

In any operational system you want to know how many new records have been added and how many existing records have been updated for different time dimensions (e.g., daily, weekly, monthly) and time periods.  In an MDM hub, you need to take this a step further and understand entity resolution metrics – such as how many master data records have been collapsed together and split apart.  Entity resolution is the key capability of an MDM hub responsible for matching and linking/merging records from multiple sources, and you therefore need on-going metrics on it in order to optimize it.

Furthermore, it is also important to understand what sources are causing the changes, given that master data records are composed of records from multiple sources.  Is the flow of information what you expect?

How are quality issues trending and where are they originating?

It is obviously important to know the current state of quality and how many issues are outstanding for resolution, aiding in your ability to address these issues in priority order.  It is, however, also important to see the bigger picture and be aware of how quality issues are trending over given time dimensions and time periods.  Ultimately you want to fix any data quality issues at their source, and in order to do this you will need to understand which of your sources are providing poor quality data to the MDM hub.

Take address data, for example.  You may detect that a number of address records in the MDM hub have a value of “UNKNOWN” for the city element. With proper Data Governance you are able to trace these values back to a particular source system, and from there address the issue at source.  The result is being able to see and track this particular quality issue trending downwards over time.
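Tracing that quality issue back to its source is, at its core, a grouped count over records that carry source lineage. The record shape here is an illustrative assumption; the essential ingredient is that each master data element retains its source system.

```python
from collections import Counter

# Illustrative hub records with source lineage retained.
records = [
    {"source": "crm", "city": "UNKNOWN"},
    {"source": "crm", "city": "Toronto"},
    {"source": "billing", "city": "UNKNOWN"},
    {"source": "crm", "city": "UNKNOWN"},
]

# Count the "UNKNOWN" city issue by contributing source system.
issues_by_source = Counter(r["source"] for r in records if r["city"] == "UNKNOWN")
print(issues_by_source.most_common(1))  # [('crm', 2)]
```

Snapshotting these counts per time period is what turns a one-off diagnosis into the downward-trending quality chart described above.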

Not only does this help in increasing the quality of the data but can also be used to justify the existence of the MDM program, especially if you can put a unit cost to a quality issue (possible for some quality issues like bad addresses).  It is extremely difficult to put a price on data – but comparatively easy to put a cost on bad data.

How is quality issue resolution trending?

Ultimately you want to see new quality issues trending downwards over time,  but oftentimes you still need to deal with resolving existing quality issues.  It is important to be able to see if the overall quality of the MDM hub is increasing or decreasing.  As above, having metrics and trends on the resolution of issues measured against a unit cost is a valuable and meaningful resource for data governance councils to have in hand to justify their efforts.

Sometimes your quality issues can be resolved through external events, such as a customer calling to update their address that may have a “returned mail” flag on it.  Other times quality issues are resolved by data stewards.  Quality issue resolution trends help to understand not just the outstanding data stewardship workload but also their productivity, which is useful in team planning.

Who is using the master data, how are they using it and are you meeting their SLAs?

It is common for consumers of MDM hubs to grow over time until eventually there are many consuming front-end channel systems and back-end systems.  I’ve seen MDM implementations grow from one or two consumers in initial phases to many consumers across the enterprise, invoking millions and tens of millions of transactions a day against the MDM hub.  Understanding what workload each consumer is putting on the MDM hub, error rates and SLA attainment is essential information for a data governance council to have.  To give the most obvious example, having access to this information allows for capacity planning to ensure the MDM hub will continue to handle future workloads.

The missing link – where do the metrics and insight come from?

The key metrics and insights listed above are required for successful MDM and Data Governance.  But where do you get them from?  They are not something provided by operational MDM hubs, as the hubs themselves are focused on operational real-time management and use of master data.  It is not their duty to capture facts and trending information to support the analysis of master data that produces the metrics and insights. That’s more of an “analytical process”, and it doesn’t fit well within an operational hub.  Instead, what we’re talking about is the job of “Master Data Analytics”.

I define Master Data Analytics as the discipline of gaining insights and uncovering issues with master data to support Data Governance, increase the effectiveness of the MDM program, and justify the investment in it.

This has been a missing capability in the overall MDM space for some time now.  Some clients have addressed it by custom-building it and, even worse, some clients have done nothing at all.  Seeing firsthand the need for a solution to this universal stumbling block, our team began work some time ago on providing that solution. There is now a best-of-breed Master Data Analytics product by InfoTrellis called ROM, incorporating our more than 12 years of experience implementing MDM, that delivers these required metrics and insights for success.

InfoTrellis ROM provides a set of analytics and reports that are configurable and extendible to support Data Governance – and you can think of it as a technology component in your overall MDM program that is an extension to your existing MDM hub.

One very big advantage of ROM is that it allows you to capture your master data policies (e.g., quality concerns) and test them against your source system data prior to implementing MDM, providing initial snapshots of quality issues to prioritize and manage during the implementation.

Rather than talk about the product in much detail here, if you’re interested in more information on ROM or in seeing a set of sample reports, just check out the product page at

Topics: data governance, Data Quality, InfoTrellis, master data analytics, master data governance, master data reporting, mdm


Posted by David Borean on Tuesday, Aug 21, 2012 @ 1:11 PM

How can you govern your master data without knowing your master data?

For many years I’ve been saying that the one thing all MDM clients have in common is that the quality of data in their source systems is not as good as they thought.  Over the past several years I’ve found that all MDM clients have a second thing in common: they are unaware of the quality of data in their MDM hub and they don’t know how the data is changing.  This is surprising since an MDM hub contains your most critical business data that is used in real-time processes and analytics across the organization.  How can you govern your data when you don’t know its trend in quality, how it is being used and how it is changing over time?  This is flying blind.

There are two contributing factors to this issue. The first is that MDM products don't provide capabilities to analyze and report on data. The second is that an MDM hub is not the appropriate place to do so.

MDM products provide capabilities for master data management and not master data analytics.

Popular MDM products such as IBM InfoSphere MDM, Informatica Siperian and others don't have any practical capabilities to analyze and report on master data. Yes, they all have a very strong focus on entity resolution so that you can de-duplicate data, but they all fall short on addressing and tracking other types of quality issues (a.k.a. policies), reporting on how the data is changing over time, reporting on who is using the data and how they are using it, and so on. These MDM products do come with some reporting capabilities, but those usually carry disclaimers that they may negatively impact the performance and operations of the MDM hub and should therefore be used with care, which is a strong indication that the hub is not the appropriate place for these activities. There are two reasons for this.

First, the underlying data models in MDM hubs are operational in nature and designed for managing master data. They aren't designed for analyzing and reporting on master data and the transactions against it. For example, they lack dimensional structures that can be used to slice and dice the data in different ways, including the critical time dimension, and they lack structures (such as aggregate tables) that directly support efficient reporting. It should be noted, however, that many of the products do contain features to broadcast information on activities within the hub, which can be collected, aggregated, analyzed and reported on, or streamed to dashboards to show near-real-time activity. So in that sense they enable master data analytics.
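The point about aggregate tables and a time dimension can be made concrete with a small sketch: hub activity events rolled up along a time dimension into a pre-computed aggregate that reports read instead of scanning operational tables. The event schema and field names are illustrative assumptions, not any product's model:

```python
# Sketch of the aggregate structure an analytical store adds on top of
# an operational hub: activity events rolled up along a time dimension
# so reports read a small pre-computed table rather than scanning
# operational data. Schema and field names are illustrative only.

from collections import Counter

def rollup_by_month(events):
    """Aggregate hub activity events into (month, action) counts,
    the kind of aggregate table an operational model lacks."""
    agg = Counter()
    for e in events:
        month = e["timestamp"][:7]  # 'YYYY-MM' slice of an ISO date
        agg[(month, e["action"])] += 1
    return dict(agg)

events = [
    {"timestamp": "2012-07-03", "action": "update"},
    {"timestamp": "2012-07-19", "action": "merge"},
    {"timestamp": "2012-08-02", "action": "update"},
]
print(rollup_by_month(events))
# {('2012-07', 'update'): 1, ('2012-07', 'merge'): 1, ('2012-08', 'update'): 1}
```

In a real deployment this rollup would run on a copy of the broadcast events, off the hub's critical path, which is exactly why it belongs in a separate analytical component.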

Second, MDM hubs are often used to support real-time, low-latency operations such as integration to call centers, web channels and other business processes. You don't normally mix real-time operations with analytical operations without putting one or the other at risk. In other words, MDM products are there to manage master data, not to analyze it; the distinction is analogous to operational systems versus analytical systems.

What is master data analytics?

I define “master data analytics” as the discipline of gaining insights and uncovering issues with master data to support data governance, increase the effectiveness of the MDM program and justify the investment in it. A master data analytics solution should provide the following capabilities:

  • Describe how the master data is changing across time dimensions and time periods, including new, updated, consolidated and de-consolidated data.
  • Perform deep analytics and discovery of quality issues, going beyond what is done within the MDM hub, with traceability back to the source systems so that issues can be addressed at the source.
  • Describe trends in both the discovery of quality issues and the resolution of those issues.
  • Provide the current state of outstanding data stewardship tasks and how they are trending across time dimensions.
  • Describe the composition of the data.
  • Describe transactional activity: who is consuming the master data, how they are consuming it, and whether their SLAs are being met.
  • Provide insights for capacity planning in future phases of the MDM program.
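One capability from the list above, trending both the discovery and the resolution of quality issues, can be sketched as a running backlog per period. The numbers and names below are illustrative assumptions, not real metrics:

```python
# Sketch of one capability above: trend both the discovery and the
# resolution of quality issues, so the backlog of open stewardship
# work is visible per period. Numbers and names are illustrative only.

def backlog_trend(discovered, resolved, periods):
    """Return the cumulative count of open issues at the end of each period."""
    open_issues, trend = 0, {}
    for p in periods:
        open_issues += discovered.get(p, 0) - resolved.get(p, 0)
        trend[p] = open_issues
    return trend

discovered = {"2012-06": 40, "2012-07": 25, "2012-08": 10}
resolved   = {"2012-06": 10, "2012-07": 30, "2012-08": 20}
print(backlog_trend(discovered, resolved, ["2012-06", "2012-07", "2012-08"]))
# {'2012-06': 30, '2012-07': 25, '2012-08': 15}
```

A falling backlog alongside a steady discovery rate is the kind of signal that demonstrates the MDM program is paying off, which is exactly the "justify the investment" goal named in the definition above.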

Emerging master data analytics products

It is very common for clients to build some level of analytics and reporting on their master data. They've had no choice, because it is a need in their MDM program that vendor MDM products don't sufficiently address. Products are, however, starting to emerge in this important space, and InfoTrellis has the most mature offering.

InfoTrellis was first to market with a master data analytics product in 2011 called "Reporting and Operational Monitoring for MDM", or ROM for short. The product is now at version 2.1 and has a strong roadmap. More details, including sample reports, can be found at

If you want to learn more about ROM, including how InfoTrellis uses it in services engagements to accelerate MDM implementations, don't hesitate to email me at

Topics: data governance, master data analytics, master data governance, master data reporting
