Keeping up with the latest research is becoming increasingly difficult for researchers. In 2018, more than 2.5 million research articles were published worldwide, and publication output is growing at about 4% annually [1]. Adding regional and local-language articles makes this number considerably larger.
Macro changes brought about by the COVID-19 pandemic
When the COVID-19 pandemic gripped the world in 2020, it affected academia and the production and publication of research in unforeseen ways. Clinical trials were fast-tracked, researchers collaborated across borders as never before, and publishers joined hands, perhaps for the first time, to shorten publication times and make high-quality research available to the scientific community faster. Preprint servers emerged as a means of rapidly disseminating COVID-19 medical research before articles were officially published in peer-reviewed journals. Preprints are also said to have played a significant role in shaping early conversations and key policies at the beginning of the pandemic [2].
These macro changes and the deluge of research output meant that researchers had to deal with a much higher volume of relevant research, while the expectation that they stay abreast of new developments in their field was higher than ever. According to a 2019 study by Elsevier, a researcher spends an average of 9 hours a week on research literature: about 4 hours searching for good papers and 5 hours reading them [3]. In other words, researchers spend nearly as much time searching for the right papers and research topics as they do reading them.
Helping researchers discover and read a wide range of reliable research articles
R Discovery aims to make this research searchable, discoverable, and readable. Available as a mobile app and online, it offers more than 100 million articles across 9.5 million topics, all vetted for quality and duplication.
Building a growing repository of research
R Discovery ingests, standardizes, de-duplicates, and merges millions of data records every week from some of the biggest data aggregators, such as Crossref, PubMed, and Microsoft Academic Graph (MAG, retired at the end of 2021). We have also brought in over 30 million open access (OA) articles from Unpaywall. Together, these sources give R Discovery users access to full-text OA papers and up-to-date metadata for a wide variety of published works. We’re working to further enrich our content bank through direct pipelines and partnerships with global publishers such as Springer Nature, Taylor & Francis, and IOP Publishing.
The R Discovery team is also working to introduce new filters and search methods in the app, improving the end-user experience and the retrievability of research.
De-duplication to ensure you see only relevant research content
R Discovery also de-duplicates all data twice a week so that users see only the relevant version of each published paper and its metadata in the app. Our team has developed robust de-duplication logic built around a priority matrix: we rank content by its source and then de-duplicate at the field level across hundreds of fields. In the last year alone, we de-duplicated over 20 million records to deliver a better research reading experience for our users.
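To make the idea of source-ranked, field-level de-duplication concrete, here is a minimal sketch. It is not R Discovery's actual logic: the source ranking, the field names, and the choice of the DOI as the matching key are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical source ranking (an assumption): lower number = more trusted.
SOURCE_PRIORITY = {"publisher": 0, "crossref": 1, "pubmed": 2, "aggregator": 3}


def merge_duplicates(records):
    """Group records by normalized DOI and merge each group field by field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["doi"].strip().lower()].append(rec)

    merged = []
    for doi, group in groups.items():
        # Most trusted source first; unknown sources sort last.
        group.sort(key=lambda r: SOURCE_PRIORITY.get(r["source"], len(SOURCE_PRIORITY)))
        result = {"doi": doi}
        for rec in group:
            for field, value in rec.items():
                if field in ("doi", "source"):
                    continue
                # Keep the first non-empty value, i.e. the one from the
                # highest-ranked source that actually has this field filled in.
                if value not in (None, "") and field not in result:
                    result[field] = value
        merged.append(result)
    return merged


records = [
    {"doi": "10.1000/XYZ", "source": "aggregator",
     "title": "A study", "abstract": "Full abstract."},
    {"doi": "10.1000/xyz", "source": "crossref",
     "title": "A Study of Things", "abstract": ""},
]
merged = merge_duplicates(records)
```

In this example the two records collapse into one: the title comes from the higher-ranked source, while the abstract is filled in from the lower-ranked source because the higher-ranked one left it empty.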
Disambiguation of data to optimize your search
Disambiguating key data points such as journal names, author names, and publisher names is a significant challenge that remains unsolved. But not for long. R Discovery is working on smart solutions so that researchers see only the cleanest data.
Author name disambiguation
There are more than 400 million author names in our database, but only 3.6 million of these are completely unique. The large overlap caused by common first names such as John and Robert and common last names such as Smith and Lee makes it difficult to distinguish one researcher from another and to attribute the right papers to the right person. R Discovery is using machine learning-based pattern matching and AI-based algorithmic approaches to solve this problem.
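The sketch below illustrates why a name alone is not enough and how extra signals can help. It is a simple rule-based stand-in, not the machine learning approach described above: the mention fields (`paper_id`, `name`, `affiliation`, `coauthors`) and the matching rule (same affiliation or a shared co-author) are assumptions chosen for illustration.

```python
def name_key(name):
    """Blocking key: lowercase last name plus first initial, e.g. 'smith j'."""
    parts = name.lower().replace(".", "").split()
    return f"{parts[-1]} {parts[0][0]}"


def disambiguate(mentions):
    """Greedily assign author mentions to clusters, one cluster per person."""
    clusters = []
    for m in mentions:
        key = name_key(m["name"])
        coauthors = {name_key(c) for c in m["coauthors"]}
        match = None
        for cluster in clusters:
            if cluster["key"] != key:
                continue
            # Assumed rule: same affiliation or at least one shared co-author.
            if m["affiliation"] in cluster["affiliations"] or coauthors & cluster["coauthors"]:
                match = cluster
                break
        if match is None:
            match = {"key": key, "affiliations": set(), "coauthors": set(), "papers": []}
            clusters.append(match)
        match["affiliations"].add(m["affiliation"])
        match["coauthors"] |= coauthors
        match["papers"].append(m["paper_id"])
    return clusters


mentions = [
    {"paper_id": "p1", "name": "John Smith", "affiliation": "University A",
     "coauthors": ["Robert Lee"]},
    {"paper_id": "p2", "name": "J. Smith", "affiliation": "University B",
     "coauthors": ["Robert Lee"]},
    {"paper_id": "p3", "name": "John Smith", "affiliation": "University C",
     "coauthors": ["Ana Gomez"]},
]
clusters = disambiguate(mentions)
paper_groups = [c["papers"] for c in clusters]
```

All three mentions share the same name key, yet the shared co-author links p1 and p2 to one person while p3 lands in a separate cluster. A greedy pass like this depends on the order of the input and would need far richer signals at real scale.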
Journal and publisher name disambiguation
There are nearly 50,000 journals across English-language publishers, but their records come from several hundred publishers and several data aggregators, each of which may write a journal's name differently. As a result, a single journal name can have hundreds of variations; we have over 1.2 million variations of journal names in our content bank alone. Publisher names vary in a similar way, which makes it difficult to search or filter results effectively by journal or publisher name. The R Discovery team is using data cleaning, normalization, and pattern matching techniques to provide the cleanest results for researchers.
Apart from tackling disambiguation of author, journal, and publisher names, R Discovery is also actively working to eliminate predatory journals from its content repository. Our team has so far identified more than 160,000 fake records and deleted them from the repository, so that R Discovery users see only journals and articles that are trustworthy and reliable.
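As a rough illustration of the normalization step, the following sketch maps surface variants of a journal name to one canonical key. The specific rules (stripping accents, replacing "&" with "and", dropping punctuation and a leading "The") are assumptions for the example, not R Discovery's actual rule set, and abbreviated titles are not handled.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_journal(name):
    """Reduce a journal name to a canonical key for grouping variants."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    # Replace punctuation and other symbols with spaces.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Drop a leading article.
    text = re.sub(r"^\s*the\s+", "", text)
    # Collapse repeated whitespace.
    return " ".join(text.split())


def group_variants(names):
    """Map each canonical key to the set of raw names that produced it."""
    groups = defaultdict(set)
    for raw in names:
        groups[normalize_journal(raw)].add(raw)
    return dict(groups)


variants = [
    "The Lancet Global Health",
    "Lancet Global Health.",
    "LANCET  GLOBAL HEALTH",
    "Journal of Physics & Chemistry",
    "Journal of physics and chemistry",
]
groups = group_variants(variants)
```

Here the five raw strings fall into two groups, one per journal. Catching abbreviations or misspellings would take the pattern matching mentioned above, for example fuzzy string comparison on top of these keys.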
References:
1. The countries leading the world in scientific research. World Economic Forum, January 2020. https://www.weforum.org/agenda/2020/01/top-ten-countries-leading-scientific-publications-in-the-world/
2. Majumder M, Mandl KD. Early in the epidemic: impact of preprints on global discourse about COVID-19 transmissibility. The Lancet Global Health, March 2020. https://www.thelancet.com/journals/langlo/article/PIIS2214-109X(20)30113-3/fulltext
3. Trust in research. Research survey by Elsevier and Sense About Science, June 2019. https://www.elsevier.com/__data/assets/pdf_file/0011/908435/Trust_evidence_report_summary_Final.pdf