FAIR data—looking back, looking forward.  An opinion from the US

Looking back—a hardware enabler 

 

Over the last years of the 20th century and the first years of the 21st, computer disk storage experienced a revolution comparable to the chip and fiber revolutions. That is, disk capacities went way up and disk prices went way down. (In the early 1980s a gigabyte drive cost $1,000; by the mid 2000s a terabyte drive cost $100, a price-performance increase of roughly 10,000.) As a result, it became economical to store far more data than before. By 2010, the phrase big data had emerged to reflect this. OSTP organized an interagency senior steering group to address the problem that we could now store far more data than we could effectively process. About the same time, the phrase open data crept into use, which of course applies to small data as well as big.
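
As a quick sanity check on that factor (taking the quoted prices at face value), normalize both figures to cost per gigabyte:

\[
\frac{\$1000 \,/\, 1\ \mathrm{GB}\ \text{(early 1980s)}}{\$100 \,/\, 1000\ \mathrm{GB}\ \text{(mid 2000s)}}
= \frac{\$1000/\mathrm{GB}}{\$0.10/\mathrm{GB}}
= 10{,}000.
\]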

 

Since the 1990s, the NIH had been waging war on biomedical publishers to obtain open access to articles reporting results funded by the USG. By 2010, OSTP was carrying the open-article argument forward for all science funded by the USG. In 2013, the Holdren memo was issued, requiring public access to all such articles and the supporting data. The requirement to include data as well as articles blindsided the federal agencies that would have to implement it and caused an extended period of slow progress (also known as passive-regressive behavior). Initially, the phrase public access was informally translated to open access. Open access to articles was one thing, but open access to data was quite another. In 2014, our Lorentz conference addressed this problem, and the FAIR data concept was born to begin moving the community beyond the completely inadequate concept of open data.

 

Looking forward—a software enabler?

 

AI is an ancient computing term, coined in 1956. Machine Learning (ML) as a flavor of AI has even more ancient roots (1943), but it didn't become useful until the 21st century. Then finding cute cat pictures on the web became possible (Analytic ML). And in late 2022, Generative AI burst upon the public consciousness with the release of ChatGPT. Now one could generate cute cat pictures, apparently from thin air. Of course it wasn't from thin air but from reams of big data that had trained the cat-generating algorithm.

 

Is Generative AI in 2024 the same type of enabler (inflection point) for data, going forward, that big disks were, looking back to 2014? If we were meeting in 2028 or so, we could probably answer that question definitively, but in 2024 it is probably too soon to know for sure.

 

A two-way street?

 

We already know that FAIR data will be (very) good for (implementing) AI. We might ask: will AI be (equally) good for (implementing) FAIR data? It may be too soon to put all our R&D eggs in that basket, but I suggest we should not ignore it.
