Data…the art of the possible
If you’ve ever struggled to find that elusive email or essential document you placed somewhere “safe”, then spare a thought for Paul Brabant. He frequently has to locate, siphon and evaluate millions of pieces of data in complex legal matters that often come with seismic consequences for business and industry.
Q. People refer to ‘big data’ but that isn’t what we’re talking about, is it? Your role is more that of a data detective, where it’s all about the robustness and quality of the data rather than the quantity?
That’s right. In the generic sense big data refers to the technologies that enabled the storage of massive volumes of data, and the means to exploit this data to generate a new category of insights.
You are correct that my role requires a focus on accuracy. In the business world you can apply the 80/20 rule to draw conclusions about user behaviour and preference, but I need to present data as evidence that supports a legal claim, and therefore need to ensure it will withstand a legal challenge.
Q. We live, of course, in a world of data saturation and information overload in all walks of our lives, not just professionally. Would it be fair to say there remains a human instinct to accept data rather than to challenge it?
In my view, yes. While this depends on the person, I suspect we each have biases that make us more likely to challenge some types of data than others. Given the quantity of data that is available, we may not think enough about what other data might be appropriate to consider. We are also subject to emotion and pre-conceived notions in how we interpret such data.
A good starting point may be to always consider the merits of challenging the data we are evaluating. In recent years we have all become more attuned to the problems of bias and manipulation, so in this context a challenge should include at least the following considerations:
a. Reliability of the source. Does the source indicate the data was properly collected for the indicated purpose?
b. Motive. Why is the person/organisation making this data available?
c. Accuracy. Is the data validated or endorsed by a trustworthy person/organisation?
d. Interpretation. Is causation improperly imputed from correlation?
e. Applicability. Even when all of these factors are sound, is the data relevant to my analysis?
Regarding the last two points, I will share an example of how data can be applied incorrectly in the most important situations. During the Second World War, the Allied air forces sought to improve bomber survivability by examining the damage on the aircraft that returned from bombing runs over Germany. The damage, which was concentrated on the fuselage and wings, was interpreted as an indication of where bombers were being hit. With a limit as to how much armour could be added, the initial conclusion was to limit reinforcements to these areas. It turned out, however, that the sample was misleading, because it included only the aircraft that made it home. If a bomber could sustain damage to its wings and still fly home, the correct conclusion was that the areas with less damage were in fact those that made the aircraft most vulnerable, since bombers hit there tended not to return. The decision was ultimately made to focus reinforcements on the engines.
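The reasoning in the bomber example can be sketched in a few lines of Python. The sortie figures below are invented purely for illustration (they are not historical data), and the function names are hypothetical. The sketch contrasts what the analysts could observe, damage on returning aircraft, with the quantity that actually matters, the share of aircraft lost after being hit in each area.

```python
from collections import Counter

# Hypothetical sortie records: (area hit, did the bomber return?).
# The numbers are invented purely to illustrate the reasoning.
sorties = (
    [("wings", True)] * 40 + [("wings", False)] * 5
    + [("fuselage", True)] * 35 + [("fuselage", False)] * 5
    + [("engines", True)] * 5 + [("engines", False)] * 30
)

def hits_on_returned(records):
    """What the analysts could see: damage on aircraft that came home."""
    return Counter(area for area, returned in records if returned)

def loss_rate_by_area(records):
    """What matters: the share of aircraft hit in each area that were lost."""
    hit = Counter(area for area, _ in records)
    lost = Counter(area for area, returned in records if not returned)
    return {area: lost[area] / hit[area] for area in hit}

observed = hits_on_returned(sorties)
loss_rates = loss_rate_by_area(sorties)

# The area with the most visible damage is not the most dangerous one.
most_damaged_among_survivors = max(observed, key=observed.get)
most_dangerous_area = max(loss_rates, key=loss_rates.get)

print("Most damage seen on returning aircraft:", most_damaged_among_survivors)
print("Highest loss rate when hit:", most_dangerous_area)
```

With these made-up figures the returning aircraft show most damage on the wings, while the engines carry by far the highest loss rate, which is the gap between correlation and applicability described above.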
Q. Among others, you worked on the case of a global airline suing the manufacturers of a device that caused the loss of one of their aircraft, and on an accounting fraud investigation at a global corporation. Where do you even begin with harnessing data on that level?
Most businesses hold a complex web of data, and the complexity typically increases with the size of the organisation. Changes in priorities, migrations from legacy systems, employee turnover and similar factors complicate the task of identifying all of the data that is relevant to the factual issues in a legal matter.
To address this, I use a classic investigative method that consists of cross-referencing many sources of information. This includes detailed discussions with various people who have knowledge of the data, and direct examination of the data sources themselves. You can sense that your inquiry is complete when you have resolved inconsistencies across these different data points, and the information begins to paint a consistent picture.
Q. When a business is faced with the challenge of bringing data together, I imagine one of your immediate priorities is not simply the sheer volume but how it’s kept, where, by whom, and on what systems?
That’s right. Referring to my earlier point about investigating to confirm whether you have identified all of the relevant data, we need to gather information that no single person in the organisation has or “knows”. The facts need to be patched together from fragments held by different people, teams, and systems. Investigating the “how”, “where” and “whom” therefore leads you to the “what”.
Q. Presumably there's also an issue when it comes to language and translation?
Of course. Ultimately it is a matter of practicality and cost. You can have people translate or review information based on your instructions, but that will carry a significant cost.
Fortunately, the accuracy of automated machine translation has increased significantly in recent years, and in the right applications this technology can be a solution with a moderate cost. The general rule is that the more common the language, the better the automated translation. Romance languages will generally produce good results.
It is important to recognise that in either case subtleties can be lost in translation, so where possible it is ideal for at least one native speaker to be included in the team responsible for interpreting the data.
“The transition to data-based analysis takes time, so my suggestion would be for people to first identify a type of decision point that benefits from being both high-value and relatively low-effort to implement. This will help create a quick win that can serve as the foundation for further growth and refinement.”
Q. And what about the overall management of data in large, complex businesses? In the end someone has to assume overall responsibility or we’re just into silo working?
There is a hierarchy of course, but the person at the top typically only has general and current information about the data situation. For any precision and historical context, the investigation must start with identifying the right people to speak with to understand what data were available over the relevant time frame. Some people will provide enough information to know what systems are relevant, and others will complement this information with additional details about a specific system.
Organisations that have anticipated the need to access data will have invested time in creating a map of their data sources that outlines the various systems, their function, the people responsible for their management, and the applicable retention policies. This preparation makes my job infinitely simpler, but the availability of such a road map is the exception to the rule.
Q. I imagine people look at data in different ways. So, for example, a BD professional in a law firm will view it differently from the finance director, or even their own marketing colleagues. How do we address that, especially when most of us are not trained data analysts?
I don’t believe that the way to think about how to gain insights from data will vary significantly based on one’s role. Whatever one’s function, the following questions should be helpful:
a. What are the available data sources?
b. How do I verify these are the best available sources?
c. How should I challenge the data received from these sources?
d. Have I sense-checked this data and examined insights that are counter-intuitive?
e. Have I implemented the means to use this data to develop strategy or drive actions over time?
Q. And where does AI come in?
Artificial Intelligence is developing in many areas, and it is a sound solution for many commercial activities. In most applications AI refers to predictive analytics based on machine learning. This is a process whereby you teach a computer to recognise a pattern by showing it positive and negative examples of the pattern, and refine the training through repetition.
For a basic example, you would teach a computer to recognise pictures of cats by showing it pictures with and without cats, and scoring its answers until it has seen so many examples that it can reliably tell the two apart. In my practice, I use this form of AI routinely to help my clients efficiently identify the documents that pertain to the issues in a legal matter.
Generally, this type of AI is suitable for large projects. The volume of data to be analysed needs to be quite large (preferably hundreds of thousands of data points and up) to justify the setup and implementation costs, and to elicit insights that would otherwise not be possible with traditional analysis. While the software licence costs are generally reasonable, it is the time necessary to set up the data set, train the model, assess its quality, and apply it to the problem at hand that is significant.
Human judgment and intervention remain a central part of the process, so the value of the technology is to scale analysis across a very large amount of data. Accordingly, the initial analysis is whether the cost of generating AI-driven insights would exceed their value. One solution is for organisations with similar interests or objectives to access pooled data, so that the necessary economies of scale are reached.
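To make the idea of learning from positive and negative examples concrete, here is a minimal sketch in Python. It is not the software used in practice: it assumes a simple word-counting technique (naive Bayes with add-one smoothing), and the example texts, labels and function names are all hypothetical. Real document review tools work on far larger volumes and with more sophisticated models.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each word appears in relevant and non-relevant examples."""
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, label_counts, vocabulary

def score(model, text, label):
    """Unnormalised log-score of a label for this text (naive Bayes, add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_words = sum(word_counts[label].values())
    log_score = math.log(label_counts[label] / sum(label_counts.values()))
    for word in text.lower().split():
        if word in vocabulary:  # words never seen in training are ignored
            count = word_counts[label][word] + 1
            log_score += math.log(count / (total_words + len(vocabulary)))
    return log_score

def is_relevant(model, text):
    """Pick whichever label scores higher for the text."""
    return score(model, text, True) > score(model, text, False)

# Invented training examples: True = relevant to the matter, False = not relevant.
examples = [
    ("invoice payment overdue supplier", True),
    ("revised invoice for supplier contract", True),
    ("payment terms in the supplier contract", True),
    ("team lunch on friday", False),
    ("office closed on friday for holiday", False),
    ("holiday party invitation for the team", False),
]
model = train(examples)

print(is_relevant(model, "overdue supplier invoice"))
print(is_relevant(model, "friday team lunch"))
```

In this toy setting six labelled examples are enough to separate the two kinds of message. In a real matter a reviewer would label many more documents and check the model's quality before relying on it.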
Q. So, how do we do better with data because it seems we’re in danger of slipping further into a world of dangerous assumptions and simplification?
The transition to data-based analysis requires an upfront effort, so my suggestion would be to start small, for example with a decision point for which you have access to clean data that can be readily harvested. This approach will help create a quick win that can serve as the foundation for further growth and refinement.
The progression may be slow at first as there can be many hurdles to clear, so it is best to work with an expert to future-proof the process, as data analysis is a long-term investment. One priority is to make the data easy for people to consume, and to provide flexibility to explore different scenarios. Beyond getting the data and analysis right, the mechanics of updating and disseminating the data should be as automated as possible, so that busy people find it easy to incorporate into their ongoing decision-making.
As a final consideration, I am not suggesting that decision-making become solely dependent on data feeds; if you are a seasoned professional, gut feel remains valuable. The key is to recognise that it diminishes in value as complexity increases.
Paul Taylor, Director of Business Development, AlixPartners
I love a spreadsheet, a dashboard, some key financials, some market data. But I’m by no means a real expert. I probably just like it more than your average business development and marketing professional - and have had a significant amount of experience leveraging it.
When I was asked to contribute some thoughts on the use of data within our profession, however, I thought there must be a better person than me to contextualise this challenge. So, I found an actual expert. Paul Brabant is a leading expert in the field of digital data discovery and analysis.
On exploring this topic with him, I realised there are many synergies between how he works with data, and how we could and should work. Take Paul’s point about sources.
In our world we potentially have access to CRM systems, pipeline management tools, pitch databases, website analytics, eMarketing platforms and financial systems. The list goes on. The key is identifying what you have, whether you can extract data from it, what form it is in, and how you might use it. All identical questions to those Paul asks himself.
You then need to consider what the data is telling you, and whether it’s accurate. Take a very simple example: email open rates as a measure of the success of an email campaign. It’s a measure for sure but, in isolation, dangerous.
If somebody uses their cursor to scroll down emails (as I do) it will generally register as an open. Therefore, how accurate is that number as a measure of success? Also, it doesn’t really demonstrate engagement with the content. Therefore, can you combine it with click rates (for example)?
Finally, think about how you’ll use that data and what behaviours you want to change. Data doesn’t give you the answers, it merely provides you with a tool to make better decisions. This quote from Tableau CEO Mark Nelson summarises this perfectly:
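As a rough illustration of combining open rates with click rates, the Python sketch below works out both figures plus a click-to-open rate for two campaigns. The campaign numbers are invented and the function name is hypothetical; real eMarketing platforms may define these measures slightly differently.

```python
def campaign_metrics(delivered, opens, clicks):
    """Return open rate, click rate and click-to-open rate as fractions."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return {
        "open_rate": opens / delivered,
        "click_rate": clicks / delivered,
        # Share of openers who went on to click; 0.0 if nobody opened.
        "click_to_open": clicks / opens if opens else 0.0,
    }

# Two invented campaigns with the same open rate but different engagement.
campaign_a = campaign_metrics(delivered=1000, opens=400, clicks=20)
campaign_b = campaign_metrics(delivered=1000, opens=400, clicks=120)

for name, metrics in (("A", campaign_a), ("B", campaign_b)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Both campaigns report the same open rate, yet the click-to-open rate shows that far more of campaign B's openers engaged with the content, which the open rate alone would hide.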
“Nobody can just drop all your data in and have the right answer come out. Human insight helps you make that jump from that raw data to conclusions.”