Wikipedia defines big data as "a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software."
The definition above hits on two key points:
1. Extracting insight from vast amounts of information, and
2. Having the tools to handle the data in the first place.
Decoding the Data
Making sense of an overwhelming amount of information is challenging. We need to understand the information we do (and don't) have. Often this data isn't in a single source - it's spread out over multiple databases and systems - so just knowing what we have to work with can be an involved task.
At this point, we might establish that there are gaps in the data: either things we need in order to link datasets together (such as unique identifiers) or missing information that we require to perform a meaningful analysis. Of course, we then have to learn how we can obtain those missing data points.
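To make the idea of linking datasets and spotting gaps a little more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular tool does it: the sample records, the `customer_id` key, and the `link_and_find_gaps` helper are all hypothetical.

```python
# Hypothetical sample data: two sources that share a "customer_id" key.
sales = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "C002", "amount": 75.5},
    {"customer_id": "C004", "amount": 42.0},
]
customers = [
    {"customer_id": "C001", "region": "West"},
    {"customer_id": "C002", "region": None},  # known customer, region missing
    {"customer_id": "C003", "region": "East"},
]


def link_and_find_gaps(left, right, key, required):
    """Join left rows to right rows on `key` and report what is missing."""
    lookup = {row[key]: row for row in right}
    linked, unmatched, incomplete = [], [], []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            # No record with this identifier in the other dataset.
            unmatched.append(row[key])
            continue
        merged = {**match, **row}
        linked.append(merged)
        # Flag rows that linked but still lack a required field.
        if any(merged.get(field) in (None, "") for field in required):
            incomplete.append(row[key])
    return linked, unmatched, incomplete


linked, unmatched, incomplete = link_and_find_gaps(
    sales, customers, key="customer_id", required=["region"]
)
print("linked rows:", len(linked))
print("no match in customers:", unmatched)
print("missing required fields:", incomplete)
```

The two lists that come back are the "gaps": identifiers we cannot link at all, and linked records that still lack something the analysis needs.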
Next, we need to think about what we want to do with the information. Are we looking for patterns or trends? Are we marrying proprietary internal data with external public data? Are we after something more specific - like why product A remains on store shelves longer in one region than in another?
Big data is rarely neat and tidy. In our experience, big data starts as a mess, and our first task is to bring order to hundreds of gigabytes of information. Once we have wrangled the info, only then can we start analyzing and drawing insight from the data.
Tools of the Trade
One of the biggest challenges associated with big data is managing it. The processing power needed to handle extensive files is significant - as is the requirement for an efficient and well-thought-out workflow. Merging data of different formats is challenging enough, let alone manipulating files that contain millions of records. Big data management isn't something that can be done solely in Excel, or SPSS, or Access.
Ideally, you want to be able to view all of your data in one place, without the hassle of having to change all of your files into a single format. Additionally, you want to be able to view that data without bogging down your computational resources. That in itself is an impossible task unless you have the right tools of the trade.
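As a rough illustration of viewing differently formatted data in one place without converting the files or loading them whole, here is a small Python sketch using only the standard library. The in-memory `csv_source` and `jsonl_source` objects are hypothetical stand-ins for real files, and the field names are made up for the example.

```python
import csv
import io
import json
from itertools import chain

# Hypothetical stand-ins for files on disk; in practice these would be
# handles returned by open().
csv_source = io.StringIO("sku,region,units\nA,West,10\nB,East,4\n")
jsonl_source = io.StringIO('{"sku": "A", "region": "East", "units": 7}\n')


def read_csv(handle):
    """Yield one normalized dict per CSV row, reading lazily."""
    for row in csv.DictReader(handle):
        yield {"sku": row["sku"], "region": row["region"], "units": int(row["units"])}


def read_jsonl(handle):
    """Yield one normalized dict per JSON Lines record, reading lazily."""
    for line in handle:
        if line.strip():
            rec = json.loads(line)
            yield {"sku": rec["sku"], "region": rec["region"], "units": int(rec["units"])}


# chain() presents both sources as one stream; neither is loaded in full.
records = chain(read_csv(csv_source), read_jsonl(jsonl_source))

units_by_sku = {}
for rec in records:
    units_by_sku[rec["sku"]] = units_by_sku.get(rec["sku"], 0) + rec["units"]

print(units_by_sku)
```

Because both readers are generators, only one record is held in memory at a time, which is the property that matters once the files run to millions of rows.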
Our single most important tool for doing anything and everything related to big data is Alteryx. Why is this our big data tool of choice?
- Alteryx allows us to non-destructively preview big data files from numerous different formats at once without rendering our computers useless.
- We can link individual datasets together.
- Using complex logic, we can perform a wide range of functions, including finding and replacing data gaps, extracting specific data points from unformatted data, combining and manipulating wide and narrow datasets, and finding patterns and trends across multiple data sources.
- We can build visual workflows.
- We can integrate US geolocation data into the datasets.
The options in Alteryx are limitless. Stew is a big fan of CASS database matching as a way of obtaining useful geolocation information from horrendously lacking address records. Jen digs the fact she can merge years' worth of monthly research tracking projects with a few clicks of a button. I love manipulating RegEx code to miraculously pull otherwise-overlooked essential information from ill-formatted string fields.
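For readers who haven't used regular expressions, here is a minimal sketch of the general idea, written in Python rather than in Alteryx. The sample notes, the two patterns, and the `extract` helper are hypothetical; real fields would need patterns tuned to the actual mess in front of you.

```python
import re

# Hypothetical free-text fields of the kind that hide useful values.
notes = [
    "Order #48213 shipped, ZIP: 90210",
    "cust says zip=10001; order no 7731",
    "no identifiers here",
]

# "order", an optional "#" or "no"/"no.", then the digits we want.
ORDER_RE = re.compile(r"order\s*(?:#|no\.?)?\s*(\d+)", re.IGNORECASE)
# "zip", an optional "=" or ":", then exactly five digits.
ZIP_RE = re.compile(r"zip\s*[=:]?\s*(\d{5})\b", re.IGNORECASE)


def extract(text):
    """Return the order ID and ZIP code found in `text`, or None for each."""
    order = ORDER_RE.search(text)
    zip_code = ZIP_RE.search(text)
    return {
        "order_id": order.group(1) if order else None,
        "zip": zip_code.group(1) if zip_code else None,
    }


extracted = [extract(note) for note in notes]
for row in extracted:
    print(row)
```

Anchoring each pattern to a label ("order", "zip") rather than matching any run of digits keeps an order number from being mistaken for a ZIP code.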
There are indeed more big data tools to choose from, but Alteryx fits the needs of Evolve and our clients the best (over the next few months, we'll have some specific Alteryx-related content).
Big data is not easy to work with. That is, unless you have the tools and the know-how. We'd love to learn what you use to deal with your data, so let us know in the comments below. And if you have big data but are unsure of how to work with it, just give us a shout.
Posted by EvolveKev
Kevin is all about research. Qualitative, quantitative, UX, you name it. When he's not researching, he's to be found laying down beats in his studio and hanging out with his dogs (and girlfriend). Woof.