miniproject: viral epidemics and viruses
Kareena Singh
Jitu Ram Bhargav
- To answer the question “Which viruses are reported as being involved in causing viral epidemics?”
- Not every virus leads to an epidemic or a pandemic. Some viruses have been reported as being involved in a viral epidemic, whereas others have not. The goal is to find out which viruses can cause or have caused an epidemic outbreak.
- To create a dictionary of viruses from scratch (ami has no built-in virus dictionary).
- To download a corpus of 1000 articles using getpapers on viruses that cause viral epidemics.
- To run ami search for the viruses dictionary.
- To perform binary classification of papers using KNIME.
- To do sectioning of the papers using ami section.
- To identify and extract entities and display the data.
- Initially, the communal corpus called epidemic50noCov, consisting of 50 articles on viral epidemics, will be created.
- After analyzing the above corpus, we shall build our own individual corpus of 950 papers. It shall be created using the virus dictionary.
- The corpus of 950 articles was created and committed here in 4 parts. https://github.com/petermr/openVirus/tree/master/miniproject/virus
- Virus dictionary: a test dictionary of human viruses was created to begin with. https://github.com/petermr/openVirus/blob/master/dictionaries/test/virus.xml
- Use of getpapers for downloading a corpus of 950 articles from PubMedCentral. See https://github.com/petermr/openVirus/wiki/getpapers
- Use of ami/SPARQL with the amidict tool for creating a dictionary on viruses. See https://github.com/petermr/openVirus/wiki/INSTALLING-ami3
- Using ami search for testing the virus dictionary. See https://github.com/petermr/openVirus/wiki/ami-search and https://github.com/petermr/openVirus/wiki/How-ami-search-works
- Using ami section to split a document in a Ctree into sections (front, body, back). See https://github.com/petermr/openVirus/wiki/ami:section
- Data analysis using KNIME. KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results and models using interactive widgets and views. See https://github.com/petermr/openVirus/wiki/Tools:-KNIME
- Using Python, Keras and Jupyter notebook.
For Python see https://github.com/petermr/openVirus/wiki/Tools:-Python
Keras is a powerful and easy-to-use free open source Python library for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in just a few lines of code.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more
- R for summarizing the extracted information. R is a powerful language used widely for data analysis and statistical computing.
- For displaying the extracted information, we will use Excel for creating spreadsheets and other forms of data display such as histograms, timelines, pie charts and graphical representations. This is called a scoping review.
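The tools listed above are run from the command line, one after another. The following is a minimal sketch of that sequence, written in Python so that the commands can be inspected before anything is executed. It assumes the getpapers options `-q` (query), `-o` (output directory), `-x` (full-text XML) and `-k` (limit), and the ami forms `ami -p <project> section` and `ami -p <project> search --dictionary <dictionary>`; these should be checked against the wiki pages linked above. The query string and the dictionary path are hypothetical placeholders, not the values used in this miniproject.

```python
import shlex


def pipeline_commands(query, project, limit, dictionary):
    """Build the three command lines as argument lists (nothing is executed)."""
    return [
        # Download up to `limit` full-text XML articles into `project`.
        ["getpapers", "-q", query, "-o", project, "-x", "-k", str(limit)],
        # Split every downloaded article into sections.
        ["ami", "-p", project, "section"],
        # Search the corpus with a dictionary (assumption: a file path is accepted).
        ["ami", "-p", project, "search", "--dictionary", dictionary],
    ]


# Hypothetical query and dictionary path, for illustration only.
commands = pipeline_commands("viral epidemics", "corpus950", 950, "dictionaries/virus.xml")

for argv in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Once getpapers and ami are installed, each argument list could be passed to `subprocess.run(argv, check=True)` instead of being printed.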
- Analysis of the 50 papers of communal corpus epidemic50noCov, displayed in the form of a spreadsheet. Finished 🟢
- Sectioning of the 50 papers using ami section. Finished 🟢
- Downloaded a corpus of 950 articles on viruses using getpapers. Finished 🟢
- Sectioning of corpus 950 using ami section. Finished 🟢
- Created a test dictionary (link above) with 30 entries on human viruses, using ami dict and a test file containing a list of names of human viruses. Finished 🟢
- Created dictionary using Wikidata Query Service and SPARQL. (Finished) 🟢
- Ran ami search on corpus 950 and received co-occurrence results. (Finished) 🟢
- Committed the corpus 950 on GitHub. (Finished) 🟢
- Installation of Jupyter notebook as a machine learning tool. (Finished) 🟢
- Manual classification of corpus950 (Ongoing) 🔵
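The co-occurrence result mentioned in the list above can be illustrated with a small, self-contained sketch. This is not how ami search is implemented; it only shows the idea of counting how many documents mention two dictionary terms together. The three sentences and the term list are made up for illustration and are not taken from corpus 950.

```python
import re
from collections import Counter
from itertools import combinations


def find_terms(text, terms):
    """Return the dictionary terms found in text (case-insensitive, whole phrase)."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrence(documents, terms):
    """Count, for each pair of terms, how many documents mention both."""
    pairs = Counter()
    for text in documents:
        present = sorted(find_terms(text, terms))
        pairs.update(combinations(present, 2))
    return pairs


# Illustrative stand-ins for article text and dictionary entries.
docs = [
    "An outbreak of Zika virus followed earlier dengue virus epidemics.",
    "Dengue virus and Zika virus share the same mosquito vector.",
    "Ebola virus caused the 2014 epidemic in West Africa.",
]
terms = ["Zika virus", "dengue virus", "Ebola virus"]

counts = cooccurrence(docs, terms)
for pair, n in counts.items():
    print(pair, n)
```

Counting per document rather than per mention keeps one long article from dominating the table; a per-sentence count would be a reasonable alternative.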
- Download and install GitHub Desktop from https://desktop.github.com, log in to your GitHub account and clone the openVirus repository using its URL.
- Remember the folder on your system where you cloned the repo. Open your miniproject folder and move your corpus950 files there. In GitHub Desktop you can then see the changes listed on the left.
- Add a summary such as 'added files' and commit to master.
- Then click on Push and your data will be uploaded. (It will take time depending on your file size.)
- here https://github.com/petermr/openVirus/tree/master/miniproject/virus
- I am an MSc student and want to pursue a PhD in a related field.
- It is helpful in understanding the current scenario in viral epidemics.
- It gives an idea about accessing data stored online and how to use the stored information for our research.
- This project will help me understand research methodology that uses computational biology and bioinformatics.
- To create and maintain a dictionary of viruses that are responsible for causing viral epidemics.
- To find the papers and articles that are related to viruses and viral epidemics.
- To identify the different types of viruses that cause viral epidemics around the globe.
- To collect updated data from trusted sources related to viruses and viral epidemics.
- getpapers to obtain papers
- ami to create and maintain the dictionary
- ami search for testing the dictionary
- ami section for document sectioning
- amidict tool for creating the dictionary
- An overview of the specific work that an individual is doing.
- It consists of the work done to date and indicates the possibilities for further research on the topic, within or beyond its current limits.
- A platform for editing and analysing processed documents and papers, also used for uploading data and creating dictionaries.
- My dictionary is virus, created from Wikidata using the software ami.
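To make the idea of a dictionary concrete, here is a minimal sketch that turns a plain list of virus names into a small XML file. It assumes a layout of one `<dictionary title="...">` root with one `<entry term="..." name="..."/>` child per virus, which should be compared against the virus.xml file in the openVirus repository before relying on it. This is not what amidict does internally, and the names below are illustrative only, not the project's actual entries.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, names):
    """Return a <dictionary> element with one <entry> per distinct name."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for raw in names:
        name = raw.strip()
        key = name.lower()
        if not name or key in seen:
            continue  # skip blank lines and case-insensitive duplicates
        seen.add(key)
        # Assumed attribute names: `term` is the text to search for, `name` its label.
        ET.SubElement(root, "entry", {"term": name, "name": name})
    return root


# Illustrative input, as it might be read from a text file of names.
virus_names = ["Zika virus", "Ebola virus", "zika virus", "  ", "Measles virus"]

dictionary = build_dictionary("virus", virus_names)
xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

Entries taken from Wikidata would normally also carry the Wikidata identifier of each virus; that attribute is left out here because its exact name in the ami format is not given on this page.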
- It consists of 950 articles taken from Europe PubMed Central (EuPMC) with the help of getpapers.
- EuPMC is a collection of journal literature and research articles related to the life sciences from around the globe.
- No bugs or issues faced so far. I hope that, by communicating with the allocated mentor and members of the openVirus group, any problems can be solved before the completion of my four-week programme.
- I learnt about the purpose and usage of getpapers, ami, corpus 950 etc.
- I understood how to update and edit pages on GitHub.
- I learnt where I can collect the articles from.
- I also understood how to download and install software from GitHub.