I’ve always had a problem with the term “big data” in media and entertainment
Here in California, people tend to throw around the term ‘big data’ quite a lot. It’s most assuredly the hot topic of the day, and companies are scrambling to make sure they’re on top of it, whether they’re a vendor of big data products or an organization with a big data problem.
I used to think that big data applied only to governments, extremely large biotech firms, market research firms, or any other company with an insurmountable amount of data about people and what they do.
But I’ve come to realize that ‘big data’, as a term, isn’t really about the data itself. It’s more about the problem, and what solutions people are putting into place to solve it.
If you’ve done any research on big data, you’ve probably heard of the three Vs. As time goes on, more Vs are added, but it really started with three: volume, velocity, and variety.
Media and entertainment companies know volume and velocity very well. The biggest files in the world are video files (think about a 120-minute film at 48 frames per second in 4K). Motion picture studios and broadcasters know how to store files. They’ve been doing it for 20 years. The amount and size of digital content being created may be increasing every year (velocity), but most of these companies are ready to scale their storage.
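To put a rough number on that film example, here is a back-of-the-envelope sketch. The frame size (DCI 4K, 4096 x 2160), the 16 bits per channel, the three color channels, and the lack of compression are all my assumptions for illustration; real camera and mastering formats vary a lot.

```python
# Rough estimate of the uncompressed size of a 120-minute, 48 fps, 4K film.
# Resolution, bit depth, and channel count are assumptions, not a standard.

RUNTIME_MINUTES = 120
FRAMES_PER_SECOND = 48
WIDTH, HEIGHT = 4096, 2160   # assumed DCI 4K frame
CHANNELS = 3                 # assumed RGB
BYTES_PER_CHANNEL = 2        # assumed 16 bits per channel


def frame_count(minutes: int, fps: int) -> int:
    """Total number of frames in the film."""
    return minutes * 60 * fps


def uncompressed_bytes(minutes: int, fps: int) -> int:
    """Total picture size in bytes, ignoring audio and container overhead."""
    bytes_per_frame = WIDTH * HEIGHT * CHANNELS * BYTES_PER_CHANNEL
    return frame_count(minutes, fps) * bytes_per_frame


if __name__ == "__main__":
    total = uncompressed_bytes(RUNTIME_MINUTES, FRAMES_PER_SECOND)
    print(f"{frame_count(RUNTIME_MINUTES, FRAMES_PER_SECOND):,} frames")
    print(f"about {total / 1e12:.1f} TB uncompressed")
```

Under those assumptions, a single finished picture comes out at roughly 18 TB before you count raw camera files, alternate takes, proxies, or audio.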
I don’t want to belittle the challenge that storing petabytes and petabytes of files can present to these companies, but when you compare it to the problem of variety, you start to see that it’s a walk in the park.
These organizations have to somehow manage scripts, call sheets, Word docs, PDFs, budgets, Excel docs, asset tracking databases, raw camera files, lighting tables, dailies, masters, proxies, raw audio files, audio mixes, music files, still images… the list goes on and on. All these types of data are structured in completely different ways. How do you store, browse, access, preserve and share all of this structured and unstructured data? Do you jam as much metadata as you can into the name of every file? Can you search by the provenance and context of your content? Can you quickly correlate budgets with call sheets and number of assets?
If you don’t have an answer to these questions then you have a big data problem.
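For a sense of what an answer might start to look like, here is a minimal sketch of a catalog that keeps metadata next to the files instead of inside their names. Everything in it is hypothetical: the table layout, the column names, the file paths, and the production title are made up for illustration, and a real asset management system would be far richer.

```python
import sqlite3

# Hypothetical sample records: (path, kind, production, shoot_day, source)
SAMPLE_ASSETS = [
    ("raw/A001_C003.ari", "raw_camera", "Hypothetical Film", 1, "A camera"),
    ("raw/B001_C001.ari", "raw_camera", "Hypothetical Film", 1, "B camera"),
    ("audio/day2_mix.wav", "audio_mix", "Hypothetical Film", 2, "sound dept"),
    ("docs/callsheet_day1.pdf", "call_sheet", "Hypothetical Film", 1,
     "production office"),
]


def build_catalog(rows):
    """Create an in-memory catalog with one row of metadata per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets ("
        "path TEXT PRIMARY KEY, kind TEXT, production TEXT, "
        "shoot_day INTEGER, source TEXT)"
    )
    conn.executemany("INSERT INTO assets VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_by_source(conn, source):
    """Search by provenance (where a file came from), not by file name."""
    cur = conn.execute(
        "SELECT path FROM assets WHERE source = ? ORDER BY path", (source,)
    )
    return [row[0] for row in cur.fetchall()]


def assets_per_shoot_day(conn, production):
    """Count media assets per shoot day, leaving out the call sheets."""
    cur = conn.execute(
        "SELECT shoot_day, COUNT(*) FROM assets "
        "WHERE production = ? AND kind != 'call_sheet' "
        "GROUP BY shoot_day ORDER BY shoot_day",
        (production,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    catalog = build_catalog(SAMPLE_ASSETS)
    print(find_by_source(catalog, "A camera"))
    print(assets_per_shoot_day(catalog, "Hypothetical Film"))
```

The point of the sketch is only that once provenance and context live in queryable fields, questions like "how many assets did day one produce?" stop depending on how cleverly someone named a file.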