"Free" Does Not Equal "Open"
When it comes to thinking ahead about what is happening to the digital economy and where this all might be headed, a passage early on in the new book "21 Lessons for the 21st Century" by Yuval Noah Harari packs a punch:
The merger of infotech and biotech might soon push billions of humans out of the job market and undermine both liberty and equality. Big Data algorithms might create digital dictatorships in which all power is concentrated in the hands of a tiny elite while most people suffer not from exploitation but from something far worse -- irrelevance.
While human irrelevance and the related efficiencies may be the destination of current big infotech business models, the path to irrelevance is paved with exploitation. This is the road upon which expertise, knowledge, and data are ingested.
Infotech organizations have created some market asymmetries (Amazon is a strong competitor in nearly any space they enter), but none so stark as those created by organizations using free information to create manipulation engines based on a surveillance-advertising model -- Facebook, Twitter, and Google.
Google and Facebook naturally drove the assumption that "open" and "free" are equivalent, through messaging and, in the case of Google, support of Creative Commons, alongside an argument that continues to this day that "free/open" is a natural outcome of the digital economy. Because of this, paywalls became anathema, a major "blame the victim" mentality was propagated ("just change your business model" or "find new revenue streams"), and long-standing information businesses started to crumble. Local newspapers everywhere are closed or struggling. Scholarly publishers are under new pressures to toe the "open" line, despite the professional communities served generally seeing no real problems to be solved by this approach.
Google has been involved in the scholarly communications market for a long time now, with Google Scholar. Recently, it extended its reach through a partnership with HighWire Press on the CASA access control solution, which barters data exchange with Google in trade for Google technologies.
The recent announcement that Google is creating a search engine for scientific datasets has been hailed as a technological advance, which it certainly is. However, it is also a continuation of Google's business model of making data free, while not making data open.
This requires some explanation, because certainly the data you get from Google is free, and therefore open. Or is it?
What Google shows you in a search result is a subset of the data they have gleaned from sources. It is not all the data they have, by any means. They don't show you the date and timestamps of when they scraped the information. They don't show you the algorithms' scoring outputs, or the factors used to generate the subset you are seeing. They don't share all the details gleaned from a CASA login via a participating journal. They don't show you the email login timestamps of the people affiliated with the organization they know are using Gmail. They don't show you the GPS coordinates of the employees at NOAA or NIH using Android phones. They don't show you the machine configurations of the servers they queried to get the data, or the cookies they put on your machine when you searched, and the data these will reveal about you in the moment and over the next few days.
In order to extract all this data from you, the only thing Google has to do is show you a small and superficial selection of their vast data troves. There is no way for you to see the rest. That's all theirs.
You have no access to what they know, or even how they know it. This isn't meant to instill paranoia, but it should give you pause if you are truly devoted to "open" data and "open" publishing. Why does Google get a free pass? You have given them unknown information about you, your connections, your desires, your secrets, and your preferences. You have given them permission to track you so they can fill their data coffers with abundances, while you receive a few measly search links at no cost in exchange. Now, we're giving them access to scientific datasets at no cost so they can share a smidgen of what they glean from them and their integration into the overall Google datastore.
Facebook is also interested in this space, via their funding of Meta and bioRxiv. But Google not only seems to be ahead of them, it also was not viewed as skeptically until recently. Now, the press smells blood in the water, with Google found to send reams of data from their Android phones even when users turn data-sharing off, and with the discovery that Google and Mastercard had a secret deal through which Google gained access to credit card spending details they could use to better target their online surveillance and manipulation ads.
The reason for Google's incredible power is that its users aren't the customer. Users are being farmed for data. Google lures us in with admittedly great products, engineered fantastically well, and designed with superlative usability. But the end result is still user exploitation.
Billions of dollars and thousands of engineers and designers can do wonderful things. But the business model is still fundamentally exploitative and coercive.
There are better ways to make the billions needed to hire thousands of engineers and designers. There are ways to do this that not only preserve but promote the economic dignity of people and society, while making the economy work better by making people relevant again.
But until the business model changes, we won't be able to move to these. "Open" is not just another business model. It is a business model that we've come to associate with "free," and it is a business model with particularly exploitative features.
An example in Jaron Lanier's new book comes from Google Translate, which appears to be a nearly magical technology, but which is in fact highly dependent on people who can themselves translate language. Thousands of terms, phrases, and descriptions are translated by these people every day via Google, without them necessarily knowing and certainly without them being paid. This happens because Google has built interfaces that encourage this kind of input, and then they harvest the data and push it into Google Translate, basically getting free labor from people without them knowing, using this to create a technology solution that acquires more data, and so forth. However, there is no apparent end to Google's dependence on this unpaid workforce, as language is a living thing, and despite the promotion of AI and machine learning, there's no other answer on the horizon.
Why not pay these translators? Why not dignify them with a clear role, a fair wage, and an acknowledged contribution? This would seem a path to getting more work done, not less, while being more fair, more dignified, and more sustainable.
This change would require users of Google Translate to pay a small annual fee to use the service. With advances in e-commerce, this is entirely feasible. I can buy a song with my fingerprint in 10 seconds. Why not pay $0.10 to use Google Translate for a day using a similar approach?
Promotion of "open" as "free" takes business model options off the table, leaving only those that expose populations to manipulation by a few powerful, technologically-advantaged information oligarchs.
A group of 11 European funders called Coalition S -- the group has an apparent penchant for naming themselves in a manner reminiscent of a bad spy movie's villainous global organization -- plans to impose open access on researchers by funding edict. This would undermine not only the economic but, by association, the academic dignity of researchers and authors by limiting their choice while crippling the quality signals in the market. Works become fee-for-service, and publishers seeking to grow or even maintain a level of excellence will be forced into the surveillance economy by default were this plan to become reality. It would also ironically make data-harvesting the main business model, and we know who is already good at that.
Coalition S not only smacks of the patronage shift I've written about, but their plan seeks to shift an entire prestige economy into another information oligarchy, where scientists are the product, their works are fuel for large technology companies, and there are no commercial opportunities beyond a sort of digital servitude.
Science is already open, as noted in an earlier essay, and as noted by Paula Stephan and Jaron Lanier. Its publication system is built to reveal secrets, and creates incentives against secrecy. However, if it is forced into the surveillance economy any further, secrecy may re-emerge as a natural reaction. In this way, "free" will certainly not equate with "open."
"Open" does not need to mean "free," just as "free" does not mean "open." We could pay for Google, and demand that we have more data than what they offer for free, or that they take less data when we use their services. Google has separated the two via a surveillance business model, giving you free without giving you open. Is this the future of scientific and academic publishing? A large technology oligarch scooping up data from papers, research datasets, bench work, phones, photographs, GPS devices, sensors, and so forth, and compiling this, while all we get are free papers? And while scientists reconsider sharing data and findings with secretive data overlords?
These are all open questions.