miniproject: viral epidemics and non-pharmaceutical interventions
Zeyang Charles Li
NPIs are actions, apart from getting vaccinated and taking medicine, that people and communities can take to help slow the spread of illnesses like pandemic influenza (flu). NPIs are also known as community mitigation strategies [1]. Common NPIs include face masks, social distancing, and quarantine.
This miniproject aims to find out, from the literature, whether the reported NPIs have an effect on controlling viral epidemics.
- Conduct manual binary classification on the communal corpus `Epidemic50noCov` and create a spreadsheet - STARTED
- Create a dictionary specific to this miniproject, starting from Wikidata/Wikipedia (https://en.wikipedia.org/wiki/Non-pharmaceutical_interventions), and build the dictionary with `amidict` - STARTED
- Re-run the query with the project-specific dictionary and retrieve a new corpus of 950 papers with `ami search` / `getpapers`
- Section the papers to extract the paragraphs mostly related to NPIs - NOT STARTED
- `amidict` will be used for creating dictionaries. `ami` installation failed due to objective errors, so SPARQL was opted for instead. Current step: merging multiple SPARQL queries.
- `getpapers` for retrieving papers into new corpora. Retrieval with `getpapers` initially failed due to the proxy server; after a proxy setting change it ran and a reduced corpus (k=580) was created. Current step: attempts at reducing the corpus size / working locally.
- `KNIME` for data flow
- `R` for data analyses and visualisation
`getpapers` ran after the proxy change and a VPN reset. An initial corpus (k=580) was created.
Results from multiple SPARQL queries were obtained. Attempted to merge them all using the UNION feature, then to remove the redundant terms with DISTINCT.
BLOCK: errors during merging of the 'instance of' and 'main subject' queries.
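For reference, a minimal sketch of the kind of merged query being attempted, combining an 'instance of' (P31) branch and a 'main subject' (P921) branch with UNION and de-duplicating with DISTINCT. The class QID `Q000000` is a placeholder, not the item actually used:

```bash
# Write the merged query to a file (QIDs are placeholders).
cat > npi_terms.rq <<'EOF'
SELECT DISTINCT ?term ?termLabel WHERE {
  {
    # branch 1: items that are instances of the (placeholder) NPI class
    ?term wdt:P31 wd:Q000000 .
  }
  UNION
  {
    # branch 2: items used as the main subject of articles also tagged with that class
    ?article wdt:P921 ?term ;
             wdt:P921 wd:Q000000 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
EOF

# Run it against the Wikidata Query Service and save the term list as CSV.
curl -sG 'https://query.wikidata.org/sparql' \
     -H 'Accept: text/csv' \
     --data-urlencode query@npi_terms.rq > npi_terms.csv
```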
Successfully cloned the ami-jars repository (finally).
For cloning a big repo over a poor internet connection (low download speed):

- First increase the post buffer value: `git config --global http.postBuffer 157286400`
- Then turn off repository compression: `git config --global core.compression 0`
- Then partially download a chunk of the repo using `--depth 1`. This shortens the connection time with the remote host and reduces the risk of fatal clone failures: `git clone --depth 1 https://github.com/petermr/ami-jars.git`
- Once the first part is cloned, finish the download: `git fetch --unshallow`
- Once the unshallow task finishes, retype `git fetch --unshallow` and you should see `fatal: --unshallow on a complete repository does not make sense`
- Change PATH for `ami`
Binary classification has been done on the k=580 corpus, but there were too many false positives (19/20). This could be caused by not refining the search terms in the `getpapers` query.

```bash
getpapers -q "Viral Epidemics and Non-pharmaceutical Interventions" -k 950 -x -o NPIcorpus2
```
BLOCK:

- Re-ran SPARQL and changed all queries to 'instance of', but many terms were lost: terms included under 'main subject' are not present under 'instance of'.
- Significant noise in the Wikidata 'main subject' terms, and discrepancies between Wikidata and Wikipedia.
Created a new corpus (CorpusNPI2) of 760 articles and started binary classification on both viral epidemics and NPIs.
Attempted altering PATH for `ami`, but my PATH looks a bit tangled:

```
echo $PATH
/Users/charlesli/.nvm/versions/node/v7.10.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/charlesli/Desktop/apache-maven-3.6.3/bin
```
Curated an NPI dictionary without SPARQL.
Successfully installed `ami3-2020.08.09_09.54.10` and changed PATH by editing the shell profile.
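A minimal sketch of that profile edit, assuming a bash profile and assuming the ami3 release was unpacked on the Desktop; both the profile file and the install path are assumptions, so adjust them to the real locations:

```bash
# Prepend ami's bin directory without discarding the existing node/maven entries.
# The install path is an assumption; point it at wherever ami3-2020.08.09_09.54.10 was unpacked.
echo 'export PATH="$HOME/Desktop/ami3-2020.08.09_09.54.10/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile

# Sanity checks: the old entries should still be present and ami should now resolve.
echo "$PATH"
command -v ami
```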
Created a third corpus using the query terms "viral epidemics" and "non-pharmaceutical" to minimise the noise brought in by "interventions" (k=464).
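The exact command for this third corpus is not recorded here; by analogy with the NPIcorpus2 call above it presumably looked roughly like the following (the query string, `-k` value and output name are reconstructions, not a copy of what was actually run):

```bash
getpapers -q '"viral epidemics" AND "non-pharmaceutical"' -k 464 -x -o CorpusNPI3
```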
Downloaded Docker and Jupyter Notebook
Attempted to create an XML dictionary with `amidict`, containing the terms from the curated dictionary.
Ran smoke tests on Docker and Jupyter Notebook.
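For context, a smoke test here need be nothing more than confirming that both tools start; something along these lines (not necessarily the exact commands used):

```bash
# Docker smoke test: pull and run the hello-world image
docker run --rm hello-world

# Jupyter smoke test: confirm the notebook server is installed and reports a version
jupyter notebook --version
```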
Classified 40 papers from CorpusNPI3; the results were better (24 positive / 40). Attempted to commit to GitHub.
Created the dictionary using `amidict` and committed it to GitHub:
https://github.com/petermr/openVirus/blob/master/dictionaries/NPIdict1.xml
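For readers unfamiliar with the format, the committed file follows the general shape of the other openVirus `ami` dictionaries. The snippet below is a hand-written illustration rather than a copy of NPIdict1.xml; the terms and QIDs are placeholders, and the `<synonym>` child element is an assumption based on other dictionaries in the repository:

```bash
cat > NPIdict_example.xml <<'EOF'
<dictionary title="npi">
  <!-- one entry per term; a multi-word phrase sits in the term attribute as-is -->
  <entry term="social distancing" name="social distancing" wikidataID="Q000000">
    <!-- synonyms as child elements (assumption) -->
    <synonym>physical distancing</synonym>
  </entry>
  <entry term="quarantine" name="quarantine" wikidataID="Q000000"/>
</dictionary>
EOF
```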
BLOCK:

- multiple words (entries) in the `ami` dictionary (solved)
- synonyms for terms
Installed Anaconda Navigator and ran Jupyter Notebook
Created a .csv file containing dictionary terms
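A rough sketch of how such a term list can be turned into the dictionary XML skeleton (assuming `terms.csv` holds one term per line with no header; attributes such as `wikidataID` would still need to be added afterwards, and no XML escaping is done):

```bash
# Wrap each CSV term in an <entry> element and write a draft dictionary.
{
  echo '<dictionary title="npi">'
  while IFS=, read -r term; do
    printf '  <entry term="%s" name="%s"/>\n' "$term" "$term"
  done < terms.csv
  echo '</dictionary>'
} > NPIdict2_draft.xml
```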
Created a new dictionary with phrases as terms, NPIdict2.
Attempted validation on NPIdict2; it reported the dictionary as NULL but with 38 entries, using this syntax:

```bash
amidict --dictionary NPIdict2 --directory /Users/charlesli/Desktop/NPIdict2 display --validate
```

Could it be the `display` command?
BLOCK:
- Validation
Attempted debugging validation
Ran ami-section on the corpus (each paper individually).
Deleted all empty directories.
Sectioned all 437 papers in NPIcorpus2 using ami-section.
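Assuming the ami3 command-line syntax of this release, with NPIcorpus2 as the CProject directory, the sectioning call looks roughly like:

```bash
# split each paper's fulltext.xml into sections (front, body, back, etc.)
ami -p NPIcorpus2 section
```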
Ran ami-search on NPIcorpus2.
BLOCK: errors during ami-search: "cannot read stopward stream" and "SXXP0003 Error reported by XML parser: Content is not allowed in prolog. java.lang.RuntimeException: cannot transform NPIcorpus2/PMC5959063/fulltext.xml".
No updates due to illness.
Ran ami-search on the 437 papers, but the results only showed word counts for every word (not specific to my dictionary).
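Plain word counts suggest the dictionary was not being applied. A sketch of passing it explicitly, assuming ami3's `search --dictionary` option and assuming the dictionary lives at the local path shown (both are assumptions):

```bash
# search the sectioned corpus against the curated NPI dictionary
ami -p NPIcorpus2 search --dictionary /Users/charlesli/Desktop/NPIdict2/NPIdict2.xml
```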
- Change PATH for `ami` without altering the existing PATH entries for Maven and git - DONE
- Rerun `getpapers` and download a second (third by 12/08) corpus, then manually classify - STARTED
- Resolve SPARQL merge issues and query noise - DONE
- Add properties (Wikidata ID etc.) and synonyms to the curated dictionary - NOT STARTED
- Commit the dictionary to GitHub - DONE
- Convert the .csv file to XML - DONE
- Validate the new dictionary - STARTED
- Commit the corpus
[1] https://www.cdc.gov/nonpharmaceutical-interventions/index.html