Textual data analysis of Corona research over decades

Scientists first identified a human coronavirus in 1965. It caused a common cold. Later that decade, researchers found a group of similar human and animal viruses and named them after their crown-like appearance.

Between 1965 and 2020, various coronaviruses were identified across the globe, studied, and linked to animal sources such as bats, turkeys, and camels, while several techniques were proposed, tested, and developed.

This analysis uses 5 GB of textual data from more than 5,000 coronavirus research papers to understand how research on coronaviruses has evolved over the decades.

Let’s begin!

Data is downloaded from this link

There are more than 5K…


credits: freepik.com

On a rainy day, Emma is sipping her tea, lost in her thoughts.

She will end up sharing some of those thoughts with her circle; a few will be researched further, a few written down, and a few acted upon, whereas most will remain private in her memory.

In the few cases where she chooses to act on a thought (buy, play, engage, apply, register, write, share, upload, click, research), she ends up interacting with her environment.

She isn’t entirely aware of the data ecosystem, of which she is more a part today than she was yesterday.


On average, 30% of every day is spent on media consumption, with 6 hours on social media alone, a figure that is only increasing every year. Note that these are averages and will vary with age group, geographic location, internet access, and so on.

Abhinav Pathak

Data Science professional
