Somebody told me recently that there is no such thing as a fish – biologically, that is. This is apparently because any given animal that we might be tempted to call a fish is likely to be more closely genetically related to something that is not a fish than to another fish. On the other hand, we all know that fish are broadly similar in many ways, and a lot of human life (including legislation and commerce) is configured to deal with fish.
I don’t think there is such a thing as big data either. Most instances of the digital output of the current global mega-trend of exponential data production are more similar to their immediate forebears than they are to other lumps of (large) data. On the other hand, big data is broadly (and, importantly, vertically) understood to be of great importance and a cornerstone for any organisation or country wanting to reap the benefits of the digital economy. Thus work items and budgets to manage and mine big data are Very Good Things, and no senior decision maker would be without them.
And yet, while the vast mass of big data being managed and mined is so-called consumer sentiment analysis (to see what we might be persuaded to buy) and mash-ups of publicly available data (which can be very useful, especially for travel arrangements), the world of Engineering Big Data needs work.
In the heroic early days of computer-based numerical simulation, when memory was low and processors slow, mathematicians would employ cunning, conformal mapping and the lost art of special functions to convert canonical simulation challenges into (often) remarkably accurate and short algorithms that would run on the available hardware in reasonable time. Now and then, unscrupulous types would embed unpublished know-how into such software so that competitors would be forced to admire the author’s achievement from the sidelines. Of course you wouldn’t catch that sort of thing happening now. It was not otherwise necessary (or seen as an effective use of the space available in printed journals) to publish the raw data behind the published work, and results were communicated via (frequently hand-drawn) graphs summarising the numerical results against existing literature and the thesis being advanced.
With the birth of the Internet, it has become technically possible to electronically link online journals and papers in PDF format with cited references, and this is becoming standard practice. It is also possible to link documents (including data points on graphs) with online repositories of raw data, in order to allow for detailed comparison and even the re-interpretation of data to either support or challenge the primary conclusions of the paper. This is almost never done.
The particular opportunity that is being lost, in my view, is the correlation of raw in-service and experimental data with numerical simulations. In the specific world of computational fluid dynamics, the latter are increasingly scale-resolving and unsteady models - required to obtain accurate time-averaged loadings for a wide range of engineering sectors. The fundamental tuning parameters that have been accepted as standard in such models are often applied way beyond the limits of the assumptions for which they are strictly valid. New mathematical machinery, including statistical correlation and Bayesian inference, provides powerful tools to use data as the basis for science.
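As a sketch of what that machinery can look like in practice, here is a minimal Bayesian calibration of a single tuning constant against measured data. Everything in it is hypothetical - the toy drag law, the "true" constant and the synthetic measurements stand in for a real model and a real published data set - but the mechanics (a flat prior, a Gaussian likelihood, a grid posterior) are the standard ones:

```python
# Hypothetical example: infer a model tuning constant from noisy "experimental"
# data via a simple grid-based Bayesian posterior, rather than accepting a
# textbook default value. Model, constant and data are all invented here.
import math
import random

random.seed(1)

def model(velocity, c):
    """Toy drag law: force proportional to c * v**2 (illustrative only)."""
    return c * velocity ** 2

# Synthetic "experimental" data generated with a true constant of 0.47.
true_c, noise_sd = 0.47, 0.5
velocities = [v * 0.5 for v in range(1, 21)]
data = [model(v, true_c) + random.gauss(0, noise_sd) for v in velocities]

# Grid of candidate values for c, flat prior: p(c | data) is proportional
# to the Gaussian likelihood of the residuals at each candidate value.
grid = [0.30 + 0.001 * i for i in range(400)]
log_like = []
for c in grid:
    ll = sum(-0.5 * ((d - model(v, c)) / noise_sd) ** 2
             for v, d in zip(velocities, data))
    log_like.append(ll)

# Normalise (subtracting the max for numerical stability) and report
# the posterior mean as the calibrated value of the constant.
m = max(log_like)
weights = [math.exp(ll - m) for ll in log_like]
total = sum(weights)
posterior_mean = sum(c * w for c, w in zip(grid, weights)) / total
print(f"posterior mean for c: {posterior_mean:.3f}")
```

The point of the exercise is that the calibrated constant comes with a full posterior distribution - so you can see how far the data actually constrains it, and where a standard tuning parameter is being applied beyond its evidence.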
So I say throw out the hand-drawn graphs and keep the data big. If you publish a paper and truly stand by your results, then add a link to an online repository – including all of the data points (even the outliers). The raw data might tell you (or someone else) something useful.
The cost of providing such infrastructure is falling fast - driven down, no doubt, by the folk interested in selling us consumer goods and working out how much our next insurance premium should be.