In Part II of this two-part blog, EIFL Copyright and Libraries Programme Manager Teresa Hackett examines how the COVID-19 pandemic has highlighted the importance of the right to research through two key issues: text and data mining, and digital preservation by cultural heritage institutions. She also considers how WIPO’s proven formula could address both issues. In Part I of the blog, we looked at the immediate challenges the COVID-19 pandemic posed for the copyright and licensing framework as education moved online.
The right to research: text and data mining
Scientists are in a global race against time to combat coronavirus. They are active on many fronts. All over the world, the genome of the virus is being tracked as it mutates, to help identify and contain community transmission. Research on antibodies is being scaled up to develop new tests, and immune responses are being investigated to design new treatments and vaccines in record time.
And scientists are doing it together. As never before, data and research results are being shared in real time through open searchable databases, such as the WHO COVID-19 database that gathers the latest international, multi-lingual scientific findings and knowledge. The health science preprint servers medRxiv and bioRxiv, which share academic research before peer review and publication in journals, have seen a surge in submissions relating to COVID-19, with over 7,400 papers currently available (papers are being uploaded at a rate of 50 a day).
Making sense of the emerging data
A powerful tool to make sense of all this emerging data is text and data mining (TDM). TDM is computer-based analysis of large amounts of data in order to gain knowledge. Sophisticated computational techniques are deployed to identify relevant research papers in the database to be ‘mined’, and to enable meaningful patterns and links to be made between otherwise unconnected documents, generating new insights and understanding.
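To make the idea of linking "otherwise unconnected documents" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the three-document corpus, the term names (`drug_a`, `protein_x`, `disease_y`) and the tiny stopword list are all invented, and real TDM projects use far more sophisticated language processing. The sketch records which terms appear together in a document, then looks for terms that never appear alongside a starting term but are connected to it through a shared intermediate term.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-corpus, invented for illustration only.
DOCUMENTS = {
    "doc1": "drug_a inhibits protein_x",
    "doc2": "protein_x drives disease_y",
    "doc3": "drug_b treats disease_z",
}

# Hypothetical stopword list: linking words we do not want to treat as concepts.
STOPWORDS = {"inhibits", "drives", "treats"}


def build_cooccurrence(documents):
    """Map each term to the set of terms it shares at least one document with."""
    links = defaultdict(set)
    for text in documents.values():
        terms = set(text.lower().split()) - STOPWORDS
        for a, b in combinations(sorted(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def indirect_links(links, start):
    """Return terms never seen with `start` but reachable via one shared term.

    The result maps each such term to the set of bridging terms.
    """
    direct = links.get(start, set())
    found = defaultdict(set)
    for bridge in direct:
        for target in links.get(bridge, set()):
            if target != start and target not in direct:
                found[target].add(bridge)
    return dict(found)


links = build_cooccurrence(DOCUMENTS)
result = indirect_links(links, "drug_a")
print(result)  # {'disease_y': {'protein_x'}}
```

No single document in this toy corpus mentions both `drug_a` and `disease_y`, yet the mining step surfaces a possible connection through `protein_x`. Note that the sketch has to copy and process every document in full, which is why the copyright questions discussed below arise.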
The benefits of data analysis for scientific discovery are not new. For example, in 2003, the thalidomide drug, taken off the market decades earlier, was found to have potential to treat chronic hepatitis and other diseases not previously associated with the drug. In 2007, scientists discovered a new link between genes and osteoporosis by using a TDM tool to analyze PubMed, a database of 30 million citations for biomedical literature.
In 2014, doctors at the Hospital del Mar Institute for Medical Research (IMIM) in Spain evaluated the usefulness of TDM in respiratory diseases, concluding that TDM can play a significant role in both research on these diseases and their clinical care. Fast forward to 2020, and doctors at IMIM are leading a research project on respiratory complications in COVID-19 patients.
In fact, TDM has already played an important role in the pandemic. In late 2019, BlueDot, a Canadian start-up company that uses data analytics to mine over 100,000 information sources in 65 languages each day, was among the first to identify the emerging risk from COVID-19 and to send out a warning to the world.
TDM and copyright
Databases of interest to TDM researchers are often managed by the library in an institution, for example, subscription-based e-resources (where usage is governed by the publisher licence), and institutional repositories that provide open access to the research outputs of the institution.
Since TDM projects usually involve copyright-protected works, copyright law comes into play. Some TDM activities, such as the mere reading of a database, will fall outside the scope of copyright protection, and copyright does not extend to facts and data. However, unless the database is open access or Creative Commons licensed, other activities are likely to be restricted and need authorization (through a publisher licence or a copyright exception).
For example, effective TDM entails copying entire works to create a database for the mining process (implicating the right of reproduction). The data then needs to be shared with other researchers for review and replicability (potentially implicating the right of communication to the public). And if the project involves international collaboration, the data will need to be sent to fellow researchers in other countries.
Disparities in national laws
Some publishers seek to monetize text and data mining by licensing TDM to database subscribers (such as libraries). But each publisher can license only its own works, while researchers typically search horizontally across their whole field of study, not vertically publisher by publisher. The real power of TDM lies in the ability to search simultaneously across multiple databases and disciplines - potentially across holdings in different countries. In addition, negotiating seamless, effective TDM licensing terms with thousands of publishers is a virtually impossible task.
To assess current copyright rules, American University’s Program on Information Justice and Intellectual Property (PIJIP) is undertaking a mapping exercise of research rights in copyright laws around the world. In July 2020, preliminary results of its work were released to a WIPO seminar.
The work shows that in most countries researchers can make and use a TDM database for a non-commercial project, depending on the interpretation of rights for research or private study. But in a few countries there are clear restrictions, for example, because exceptions are limited to “excerpts” or “quotations” of works. In those countries, copyright effectively closes off an important modern tool of value not only in combating coronavirus, but in addressing a whole range of critical issues facing humanity, from cancer treatments to climate change.
The PIJIP research also found national variations in permitted activities, such as differences in the subject matter and rights covered by TDM provisions. Some laws prohibit commercial uses, or impose contractual or technical restrictions. Only a minority of laws address the sharing of TDM databases between researchers, and no law reviewed explicitly authorizes making the database available across borders.
TDM is a vital tool in the fight against coronavirus
The disparities in the scope of permitted activities in national laws, and uncertainty about cross-border uses, risk leaving scientists in legal limbo at exactly the time when scientific collaboration has gone truly global.
The modern research technique of text and data mining is as much part of the toolbox in the fight against coronavirus as face coverings or contact tracing. Not only should TDM be universally allowed, it should be encouraged. Everywhere. We owe it to the scientists who are working around the clock to find a cure for this terrible disease that is wreaking havoc around the world.
The right to research: digital preservation
Traditional and modern research methods (such as text and data mining) have one thing in common. They both require the ability to access data and publications. Depending on the discipline and the nature of the research, the source material needed might be hot off the press or centuries old, published or unpublished, in-commerce or long out of print, born digital or in paper format.
For example, an epidemiologist studying the spread of coronavirus might additionally need data on the spread of other infectious diseases, such as the Asian flu (1950s) or Ebola (1970s). Behavioural scientists advising governments on public health messaging to slow the spread of COVID-19 consult all types of material, for example, surveys and social media, news reports and posters, as well as professional journals and reference books. Historians helping future generations to understand the extent of COVID-19 and its impact on societies will require access to official records and other primary source material from current and previous events, including the 1918 influenza pandemic, until now the most severe global health emergency in modern history.
Creating COVID-19 collections for future research
Responsibility for collecting, curating and preserving all this material lies primarily with cultural heritage institutions - libraries, archives and museums - that have the mandate and expertise to undertake the work. (That’s why many copyright laws give cultural heritage institutions rights to make preservation copies.) These memory institutions also have a public interest mission to make their collections available for scholarship, in this case, to understand and contextualize the COVID-19 pandemic to help overcome such crises in the future.
Right now librarians around the world are working hard to identify, collect and preserve the diverse, multi-channel sources of information on COVID-19, including research data, scientific articles, statistics and graphs, public health videos, social media and news reports.
For example, in Ireland, the National Library is building a digital COVID-19 collection by archiving websites that reflect this moment in Irish life. (It is estimated that a website is changed or deleted within 100 days, so if websites are not saved now, the content will be lost forever.) At University College Dublin (UCD), the Research Repository service, maintained by the Library, is collecting and preserving articles and working papers created by college researchers on COVID-19. And Dublin City Library and Archive is creating the Dublin Covid-19 Pandemic Collection to ensure that the archive represents a true picture of how the city and its people fared during the pandemic.
These examples of curated COVID-19 collections will be of enormous value not only to future generations of scholars and scientists, but also to those in the coming years who will study and draw lessons from the scientific, economic, political, sociological and cultural aspects of the biggest global health crisis of modern times.
Digital preservation and copyright
One of the most effective ways of ensuring enduring access to library collections is to digitize the work, or if it is born-digital, to transfer it to a preservation-quality file format and to safely store the digital object off-site. But preservation strategies for digital materials always require the making of copies, and too many national copyright laws fail to allow digital preservation for copyright-protected material.
In fact, over a quarter of WIPO member states do not expressly permit preservation at all, even for print formats. In countries that do allow preservation, statutes vary in fundamental ways, just as they do for TDM: who can make the preservation copy, what may be copied, and in what format. The lack of a clear right to import and export compounds the problem, stifling international cooperation between libraries on joint preservation projects.
Just as scientists should not be hampered by copyright law, neither should librarians and archivists. They should be allowed to follow best practice preservation techniques, and to provide electronic access to preserved works regardless of the researcher’s location, without unnecessary restrictions on usage.
Text and data mining and digital preservation are fundamental to the right to research. Publisher licences don’t provide a proper solution, and existing copyright laws are inadequate, especially for cross-border uses.
WIPO’s proven formula
A global problem requires a global solution. For this reason, EIFL, together with the international library and research communities, has been calling on the World Intellectual Property Organization (WIPO) to work on an international legal instrument to facilitate the cross-border sharing of TDM tools and databases, and to set out clear rules enabling preservation by cultural heritage institutions, including of collections dispersed across national borders. Only WIPO has the mandate to set these global copyright standards, and only WIPO can address cross-border issues.
WIPO has stepped in before. The Marrakesh Treaty for persons with print disabilities addressed disparities in national copyright laws on the making of accessible format copies for persons who are blind or visually impaired, as well as prohibitions on sharing these copies across borders.
The Marrakesh Treaty resolved the copyright obstacles by creating a mandatory exception allowing accessible copies to be made, and allowing the copies to be shared by beneficiaries, including libraries, in countries that are party to the treaty. The treaty also contains a novel provision allowing a work lawfully created in one country to be used in another. Such a rule would go far towards ensuring that researchers in countries that do not currently allow the creation of TDM databases can use databases created elsewhere.
The Marrakesh Treaty is WIPO’s most successful and popular treaty to date. Using the same proven formula, WIPO could also address the copyright obstacles to research highlighted by the coronavirus pandemic.