Rambling Thoughts...

Wednesday, 12 March 2014

From operations to strategy: how data should help the decision-making process

Business intelligence (BI), big data, dashboards, reports, pie charts... These are only some of the fancy buzzwords (ok maybe not the 'reports' one...) we hear every day. Software vendors show us nice graphs with gauges and needles which exist for the sole purpose of helping us make decisions. This is all well and good, but currently we use terms interchangeably, mix up concepts and think that if we simply plug a dashboard into a corporate database, we will find the answer to any question we have. And no, the answer is not 42 (at least not for any other reason than pure coincidence...).

These days, data is everywhere. It is cheap to store and is getting increasingly cheaper to analyze in order to find hidden patterns or facts that can help us make better decisions. For example, blood sugar level testing instruments do not just give a reading of one's current blood sugar. They map it onto a graph over time and can provide information regarding the trend of one's blood sugar level and make recommendations regarding food choices etc. Coupled with a cheap smartphone app, you could probably receive automated text messages based on your location telling you something like "Hi! This is bloody here, your blood sugar level reader. Are you sure you want to eat that crap again? You know how my graph will go in the red at the next reading. Why don't you go for something healthier? By the way your health insurance agrees with me and would like to jack up your premium if you decide to ignore my friendly advice". Sounds too imaginative? Perhaps. But the point is, from a technological standpoint, there is nothing preventing this from happening. In fact, we already have dongles for car insurance companies that measure one's driving style to give out discounts to good drivers (or rather, increase good drivers' premium less than the bad ones...).

But let's get back on topic here. Although at a different scale, the examples I gave above still apply. In an organization, we have access to a myriad of data sources, from internal systems (e.g., a CRM, an ERP etc.) to external systems (e.g., partners, social networking sites etc.). So data is cheap. But does that mean that we necessarily want more? Should we hoard data in the hope of (maybe) finding the answer to our questions?

There is no right answer to this question. However there are some questions that we can ask ourselves to help make up our minds on this.

What type of decision are we trying to make?
Are we concerned about our strategy or our operations? This is a crucial question because both the structure of the data and its value, which depends on its "expiration date", are very much affected by the answer. A business unit may need very current data to make a decision on the spot, every day, at the same time, or whenever a given event occurs. On the other hand, a manager may only need the data on a monthly or even quarterly basis. What this entails is that the level of detail and summarization of the data available to support these two types of decisions varies greatly. Now this does not mean that strategic decision-makers do not rely on the same data as operational decision-makers. But the granularity of this data will likely be very different. In essence, it goes back to the eternal question of tailoring the message to the audience we are trying to reach. In my experience, operations need the data as soon as possible because it influences their every move. On the other hand, management may say they need the data as soon as possible while it is only actually needed at regular intervals.

How much data do we need?
Just as with the question of how soon the data should be available, when you ask someone which pieces of data they want and give them a list, they will often pick them all. I have personally lived through this many times and it can be frustrating when you just know that the information is irrelevant but the user still wants it. It is important that everybody understands that there are costs associated with data. It may be screen real estate for presentation purposes (fields take space after all), or extra computing resources and time for calculations. It may also help to use mock scenarios to help users describe the actual pieces of data they need to make a decision. For example, if an extra piece of data is needed in 1% of cases but doubles the processing time, it may be unnecessary. Again, here the objective is to determine the exact value of a piece of data. This means knowing its tangible benefits and costs.

Another important point related to this question is whether we need to keep or store all the data we can. Again, here it is important to understand the costs and benefits associated with these strategies. There may also be legal requirements to follow. For example, one may be forced by law to keep data for 5 years. But beyond that, is the data still relevant for the business? Do we need to integrate data from 5 different social networking sites or do 2 of them give us enough confidence to make certain decisions? One interesting principle in the realm of big data is the idea that we cannot know everything. However, we can sample out of everything and determine a level of confidence within which we can make a decision based on this sample. I think it is a fair way to look at data. Maybe out of 10TB of raw data a random sample of 2TB is sufficient to gain 95% confidence. If using 9TB gives us 98%, does it justify the extra costs associated with storing and processing this data? Perhaps, perhaps not. But the question is worth asking.
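To make the sampling trade-off a bit more concrete, here is a minimal Python sketch. It assumes the question we care about is a simple proportion (say, the share of records showing some behaviour) estimated from a simple random sample, which is far tidier than most real big data questions. The function name, the record count and the worst-case p = 0.5 are illustrative assumptions, not figures from an actual system. Strictly speaking, the confidence level (95% here) is something we choose; what shrinks as the sample grows is the margin of error around the estimate.

```python
import math


def margin_of_error(sample_size, population_size, z=1.96, p=0.5):
    """Half-width of the confidence interval for a proportion.

    z=1.96 corresponds to a 95% confidence level and p=0.5 is the
    worst case (largest variance). A finite population correction is
    applied because the sample is a large share of the population.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    if population_size > 1:
        correction = math.sqrt(
            (population_size - sample_size) / (population_size - 1)
        )
    else:
        correction = 0.0
    return z * standard_error * correction


if __name__ == "__main__":
    population = 10_000_000  # hypothetical number of records
    for fraction in (0.01, 0.2, 0.9):
        n = int(population * fraction)
        error = margin_of_error(n, population)
        print(f"{fraction:>4.0%} sample: +/- {error:.5f}")
```

Under these assumptions the margin of error keeps shrinking as the sample grows, but with diminishing returns, which is exactly why the cost question is worth asking.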

How do we need to present this data?
Presentation of the relevant data is essential. But we often get caught up with the fancy visual gizmos which look nice but may actually be relatively poor conveyors of information or be effectively unusable on newer platforms such as mobile devices. Also, some people are better with numbers which they can then manipulate in a spreadsheet for example, while others are better with graphics. There is no universal answer to this question, but it needs to be carefully considered. Here, the goal is to avoid the two extremes where one goes "I want to drill down to the details" or "I want an overview, this is too much detailed information". There are countless books and techniques to help here, but the works of Edward R. Tufte, regardless of their age, are still highly regarded by many (and with reason!).

Are we going to even use this data?
This question is rarely asked, but it is essential. Do we want data to justify our choices or help us make them? If we make up our mind then pick the data that fits our decision, it is worthless. Numbers are easy to manipulate; just look at polls and politics. In fact, using hard evidence as a guide for decision-making is not a trivial exercise. One needs to resist the urge to think "I know how this plays out, no need for the data but I will include it because it backs up my argument, this way my a$$ is covered".

But can we trust this data?
The old adage, "garbage in, garbage out" comes to mind as I write this. I have the good fortune to be teaching Master's students a class on BI technologies. In this class, we start from a transactional database for an organization and slowly move toward building a data warehouse which we feed via an ETL (Extract-Transform-Load) process before trying out multidimensional analysis using OLAP cubes and data mining. One thing that stands out for students who are not used to all this is how complex this infrastructure is. Not only is it complex, but it requires a very good knowledge of the business, its relevant processes and associated data. Plus, it is layered so changes in one layer most often have a cascading effect on other layers. One important lesson that comes out of this is the fact that everything we do relies on one key assumption: that the data in the transactional system is reliable. Well of course it should be, people use it every day. Yes, but do they use it the way you believe they do, and do they enter the data where you think it belongs?

A simple example: say you have an assembly process with 3 steps, A, B and C. Users are instructed to "punch in" every time they perform a step so that you can measure the productivity of each step. Now, users think this is stupid and decide that it will take less overall time to simply do the 3 steps and then punch in 3 times. The result? A process that may be optimal but whose steps are not accurately represented in the application. The same goes for workarounds for cumbersome or restrictive systems, usage of fields for unintended purposes etc. In BI we try to go back to the reality of the process using data from systems. If there is a disconnect between the two, we cannot provide the data we want.
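One cheap sanity check for the punch-in scenario is to look for punches by the same user that are implausibly close together. The Python sketch below is only an illustration: the record layout (user, step, timestamp), the 30-second threshold and the sample data are all made up for the example, and a real system would need a threshold that reflects how long each step actually takes.

```python
from datetime import datetime, timedelta


def find_suspicious_punches(punches, min_gap=timedelta(seconds=30)):
    """Return (user, previous_step, step) for punches that follow the
    same user's previous punch by less than min_gap.

    punches is an iterable of (user, step, timestamp) tuples.
    """
    suspicious = []
    last_by_user = {}
    # Process each user's punches in chronological order.
    for user, step, stamp in sorted(punches, key=lambda p: (p[0], p[2])):
        previous = last_by_user.get(user)
        if previous is not None and stamp - previous[1] < min_gap:
            suspicious.append((user, previous[0], step))
        last_by_user[user] = (step, stamp)
    return suspicious


# Hypothetical data: alice punches in as she works, bob punches
# all three steps at the end.
sample_punches = [
    ("alice", "A", datetime(2014, 3, 12, 9, 0, 0)),
    ("alice", "B", datetime(2014, 3, 12, 9, 10, 0)),
    ("alice", "C", datetime(2014, 3, 12, 9, 25, 0)),
    ("bob", "A", datetime(2014, 3, 12, 9, 30, 0)),
    ("bob", "B", datetime(2014, 3, 12, 9, 30, 5)),
    ("bob", "C", datetime(2014, 3, 12, 9, 30, 9)),
]

if __name__ == "__main__":
    for user, first, second in find_suspicious_punches(sample_punches):
        print(f"{user}: steps {first} and {second} punched almost together")
```

A check like this does not fix the data, but it tells you how far the system's view has drifted from what happens on the floor.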

So it all boils down to this: can you trust your data? Does it reflect the reality of the business and not just the reality of the business as seen through the eyes of the system? This is something which is worth checking before embarking on a big initiative to try and use that data for decision-making purposes. This is even more important when using external sources as you may not be able to judge their quality. Plus the provider of this data may not be liable for its accuracy or reliability.

To conclude...
It may seem somewhat naive to put it like this but before we can make decisions with data, we need to make decisions about data. This is a first step which may lead to one's decision to not use data as much as originally thought, simply because one's processes cannot provide the level of data quality and reliability required to make decisions based on numbers alone.

Thursday, 19 December 2013

To SaaS or not to SaaS?

It seems that every now and then new buzzwords pop up, whether in newspapers and magazines (yes, they still exist), on the Internet or in everyday conversations with people. Today, slightly irritated by some of the commercials I have been seeing in the media, I wanted to talk about Software as a Service, or SaaS.

When we look at the history of IT in organizations, it seems that our track record is somewhat mixed. Failed implementations, delayed deliveries and budget overruns are but a few examples of the issues that C-level people think of when they hear about a “great new tech”. The very recent example of the major hiccups of the U.S. healthcare web platform springs to mind as I write this. All this happens despite the large sums of money thrown at these systems and the fact that they are developed by vendors who are supposedly highly competent. Heck, if I were a CIO, I would worry too when I hear about a “great new tech”…

It is with these images in mind that I want to talk about SaaS, what it can give, and what issues it may also trigger. The great promise of SaaS, like many pieces of software, is that it allows you to focus on the business rather than the technology. By moving some of the technological burden outside of your organization, you effectively free up some resources to do other, more interesting things (and by that I mean things that bring in $$$).

First, the idea behind SaaS is not really new. Basically, one pays to consume a service, except the service in this case is software. You may pay for it on a monthly basis, on a per transaction basis, or a combination of both. Either way, the promise of SaaS lies in the fact that the vendor takes care of the burden of maintaining the hardware and software infrastructure for you. In most cases, you will simply access the software via a browser interface.

In theory, this is a win-win, especially in models where pricing is proportional to usage. Use it more, pay more. Use it less, pay less. Doesn’t that sound great if you run a seasonal business? In the “old days”, the fancy equipment would sit largely idle most of the year while it would still require maintenance, upgrades, and incur other related costs (e.g., asset depreciation).
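A quick back-of-the-envelope comparison shows why the seasonal case looks so attractive. The Python sketch below uses entirely made-up numbers (a price per transaction, a fixed monthly cost for running things in-house and two usage profiles), so take it as an illustration of the arithmetic and nothing more.

```python
def yearly_cost_usage_based(monthly_usage, price_per_unit):
    """Total yearly cost when you pay only for what you use."""
    return sum(units * price_per_unit for units in monthly_usage)


def yearly_cost_fixed(monthly_fixed_cost, months=12):
    """Total yearly cost of capacity you pay for whether you use it or not."""
    return monthly_fixed_cost * months


# Hypothetical transaction volumes per month.
seasonal_usage = [100] * 9 + [2000, 5000, 8000]  # busy last quarter
steady_usage = [5000] * 12                       # busy all year

PRICE_PER_TRANSACTION = 0.5   # assumed SaaS price
MONTHLY_FIXED_COST = 1500     # assumed in-house cost (maintenance etc.)

if __name__ == "__main__":
    fixed = yearly_cost_fixed(MONTHLY_FIXED_COST)
    for label, usage in (("seasonal", seasonal_usage), ("steady", steady_usage)):
        saas = yearly_cost_usage_based(usage, PRICE_PER_TRANSACTION)
        print(f"{label}: usage-based {saas:.0f} vs fixed {fixed:.0f}")
```

With these assumed figures the seasonal business pays far less under usage-based pricing, while the steady, high-volume business would actually pay more.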

But perhaps there is more to it than that… And I would argue that going for SaaS or switching some software needs to SaaS simply for the sake of money is probably not a good idea. If we look at IT as something that can potentially generate value for a business, it can take on a strategic value of its own. Instead, we often look at IT as a cost center that we want to keep under control. This is arguably a recipe for disappointment, unless your strategy is to compete exclusively on costs. And even then, Wal-Mart may compete on costs but its investments in IT are pretty impressive.

So the argument here is to look beyond the money you can save and look into the value you can create with SaaS. Perhaps this means that costs will not be lower with SaaS. However you may gain a competitive advantage. Take the example of an ERP (enterprise resource planning, e.g., SAP). ERPs are great but they are heavy and often effectively create templates for “best practices” that end up being followed by organizations implementing them. Bottom line, the best practice becomes the baseline for everybody and therefore, not a source of competitive advantage. If on the other hand, you do your homework and find a way to answer your business needs using a variety of interconnected pieces of software (e.g., some using SaaS, others not), you may not only retain but also create a competitive advantage. In other words, rather than using a standardized pipeline, you can assemble the puzzle yourself and truly shape it to your business’ needs.

So, to SaaS or not to SaaS? The question is more: what are your business needs? How can you answer them? If it turns out that a SaaS vendor holds the answer to these questions, then SaaS away. If not, then don’t. We tend to be blinded by the trendy things but the essence of conducting business has not changed. In that respect, one must find the technology that answers one’s needs rather than trying to fit a clunky piece of software for the sake of it.

There is some research starting to emerge on these issues. We are moving away from some of the technical questions related to SaaS and looking into the business questions that SaaS can help answer. This is a good thing, because we will be able to move away from the trendy toward the essential. So to finish on this, here are some of the questions which I think are relevant when considering going for SaaS. As you will see, many of these questions can apply to any piece of software, whether or not it is based on the SaaS model.

General questions

Why do you need it? What is the business imperative behind the question regarding the use of SaaS?
What can you do with it? When evaluating a piece of software, what is the potential for value-creation beyond any short-term cost savings you may anticipate?
How will you use it? How would the SaaS piece of software “fit” in your IT architecture?

Do you have special integration needs with other applications (e.g., an ERP’s financial module)? If so, are they covered by the vendor?
How will the use of the SaaS piece of software free up IT resources (e.g., staff, equipment)?

How would you allocate these freed up resources?
Will your IT staff be in charge of managing the vendor? If so, are they trained to do so?

What are the implications? Would you have to tell your clients that data about them may potentially be hosted outside of your organization’s boundaries? Could it be an issue for them?
Is it even feasible? Are there any laws or regulations you must follow regarding the storage of data? If so, do vendors comply with them?

Vendor-specific questions

How would you survive in case of issues with the vendor?

In case of service disruptions (e.g., network outage)
In case the vendor goes bankrupt

What service level agreement (SLA) does the vendor provide?
How are the software maintenance and upgrade windows arranged by the vendor?
Do you need to be able to retrieve your own raw data from the vendor at any time? If so, how long would it take the vendor to do so, and in what format would you get this data?
How does the vendor handle the multiple clients and environments they have to host? Do they share computing resources, have their own dedicated resources? Your IT staff may help understand what the implications of these choices are!
Can the vendor provide a trial version of their software? If all you need to access it is a browser, a sandbox environment should be easy to provide.

So, unlike what some of these commercials claim, there is no miracle, no silver bullet. SaaS is just another option available to your business. It may be the best thing since sliced bread, but that is up to you to decide.

Friday, 2 August 2013

Meta-what?!

For my first blog post (you do have to eventually be cool too!), I have decided to mention a topic that is dear to my heart. It was back when I worked in software development (and particularly in database administration), and it still is, albeit in a different context.

So today we talk about metadata and why producers should care about it more.

Some Context...
I am currently a PhD candidate in Information Systems (IS). One of the things I often have to do during that time is to read and review literature on different topics. This literature comes from a variety of sources and is accessed via a variety of online journal portals (EBSCOHost, ABI, ACM, IEEE Xplore etc.). As I browse through journals, do searches etc., I gather a number of articles which I download in PDF format and which I try to remember to add to my EndNote (or Zotero, your choice) library. After a few days' research I can end up with a hundred PDFs or so saved on my computer which I will then have to read, annotate etc.

Couple of issues here:

These portals are built like 1990s online shopping websites. What I mean by that is that navigation is clumsy at best and downright frustrating at times (e.g., expired proxy sessions, slow downloads)
Often saving the document and its citation info are two separate tasks
These documents contain no useful metadata (see below for why that matters)
These portals have Application Programming Interfaces (APIs) available but these are not public and are reserved for institutions etc.
When I search for articles, I am not in the mindset of organizing everything neatly already. I am wading through tons of (virtual) papers and cannot be bothered to save everything, add it to EndNote etc.

So what is metadata?
Metadata is "data about data". In other words, it is about giving some information regarding the actual data users will, well, use. It seems trivial and rather unnecessary but from the perspective of an outsider, metadata is not only cool, it can be crucial. It can be simple and fixed, such as specific header information you can store in a PDF (e.g., title, author, copyright info). And it can be more complicated, such as extensible metadata on database schemas. And this is where I want to draw a parallel with my previous employment. Metadata is not documentation, but it provides important clues as to something the data can help you achieve. The important part is that it is packaged alongside data itself but resides in a "logical" space that is separate from it.

What do I do with it?
Well, currently I am sitting on a pile of about 150 PDF files, named using the following pattern: author1_author2_year.pdf. This is useful and in a sense somewhat akin to metadata. For instance, I have macros that I use to generate template Microsoft Word documents with a neat table with a hyperlink to each PDF to enter review info for each file. This way I concentrate on the task of reviewing the literature and not formatting documents to do so. This is because as a programmer (or ex-programmer, I'll let you decide on the technicalities), I am inherently lazy regarding repetitive tasks. I like to automate these as much as possible. Plus it makes for a healthy distraction from reading papers :-).
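To show how much mileage a naming convention can give, here is a small Python sketch that pulls the author names and the year back out of a file name following the author1_author2_year.pdf pattern. The function name and the sample file names are made up for illustration, and it assumes author names contain no underscores and that the year always has four digits.

```python
import re
from pathlib import Path

# One or more author names separated by underscores, then a 4-digit year.
FILENAME_PATTERN = re.compile(r"^(?P<authors>.+)_(?P<year>\d{4})$")


def parse_pdf_name(filename):
    """Return {'authors': [...], 'year': int} or None if the name
    does not follow the author1_author2_year.pdf convention."""
    stem = Path(filename).stem  # file name without the .pdf extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        return None
    return {
        "authors": match.group("authors").split("_"),
        "year": int(match.group("year")),
    }


if __name__ == "__main__":
    for name in ("smith_jones_2010.pdf", "doe_2012.pdf", "notes.pdf"):
        print(name, "->", parse_pdf_name(name))
```

It works, but only as long as I stick to my own convention, which is exactly the fragility that proper embedded metadata would avoid.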

My issue here is that PDFs can store useful metadata, but when it is not done, well, it is useless. If there were actual metadata and APIs were accessible (ok I could even do without that one), I could program something automatic like the following:

Read PDF metadata
Look up reference online using provider API
Download citation info
Add it to my EndNote library
Cross-check that all PDFs are in my EndNote library and ready to be reviewed in my Word documents
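The first four steps depend on metadata and APIs I do not have, but the last one, the cross-check, can be sketched with nothing more than file names. The Python snippet below assumes (and this is purely an assumption for the example) that the reference library can be exported as a list of keys that match the PDF file names without their extension. The function name and sample values are hypothetical.

```python
from pathlib import Path


def find_unfiled_pdfs(pdf_names, library_keys):
    """Return, sorted, the PDF file names that have no matching
    key in the reference library (comparison is case-insensitive)."""
    known = {key.lower() for key in library_keys}
    return sorted(
        name for name in pdf_names if Path(name).stem.lower() not in known
    )


if __name__ == "__main__":
    pdfs = ["smith_jones_2010.pdf", "doe_2012.pdf", "Lee_2009.pdf"]
    library = ["smith_jones_2010", "lee_2009"]  # hypothetical export
    print(find_unfiled_pdfs(pdfs, library))
```

In practice the list of PDFs could come from something like `Path("papers").glob("*.pdf")`, where the folder name is again just a placeholder.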

Now that would be neat. Unfortunately the lack of metadata and accessible APIs prevents me from doing so. So I have to painfully open each PDF and do that by hand. I'll find a way to automate part of it somewhat, but it will be clumsy.

So, what can we learn from this?
Well, metadata can be very useful. Back when I worked, I used it a lot on Microsoft SQL Server schemas (using their Extended Properties) to create a sort of mini-descriptor on all tables that could be useful when automating programming tasks (e.g., automatic cleanup of archiving tables etc.). And this is where I go back to something I read on other websites about "services", "APIs" and so on. If you are going to publish APIs for clients, students, or whatever outsiders you need to let interact with your own services, metadata is not just neat, I think it is pretty crucial. Documentation will only get you so far. Metadata can be used as a sort of "online documentation" that programmers can actually use when interacting with your data. Regular documentation cannot do that. And it's not just good for them, it can be good for you too. For example you may reduce resource contention as consumers can easily discard irrelevant information or not have to request extra, unnecessary data to do what they want using the metadata you give them.
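To give a flavour of the mini-descriptor idea without a database at hand, here is a rough Python sketch. It is a stand-in, not the SQL Server mechanism itself: a plain dictionary plays the role of the per-table properties, and the table names, property names ("role", "retention_days") and retention periods are all invented for the example.

```python
from datetime import date, timedelta

# Hypothetical descriptors: in a real database these would live
# alongside the tables rather than in application code.
TABLE_METADATA = {
    "orders": {"role": "live"},
    "orders_archive": {"role": "archive", "retention_days": 365},
    "audit_log_archive": {"role": "archive", "retention_days": 1825},
}


def cleanup_plan(metadata, today):
    """Map each archive table to the cutoff date before which rows
    may be purged, based only on what the metadata declares."""
    plan = {}
    for table, props in metadata.items():
        if props.get("role") == "archive" and "retention_days" in props:
            plan[table] = today - timedelta(days=props["retention_days"])
    return plan


if __name__ == "__main__":
    for table, cutoff in cleanup_plan(TABLE_METADATA, date.today()).items():
        print(f"{table}: purge rows older than {cutoff.isoformat()}")
```

The point is that the cleanup code never hard-codes a table name: add a new archive table with the right properties and it gets picked up automatically.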

Now if you'll excuse me, I have to get back to these PDFs...