Commit b91d878

oserban and chueatwork authored
Adding the REDASA Open Dataset (awslabs#994)
* Added REDASA Dataset
* Finalised the repository
* Renamed file
* Fixing YAML multiline
* Fixed YAML formatting
* Fixed YAML according to schema
* Removing services from the YAML file
* Update redasa-covid-data.yaml
* Update redasa-covid-data.yaml

Co-authored-by: Erin Chu <59396555+chueatwork@users.noreply.github.com>
1 parent d692860 commit b91d878

1 file changed: +41 -0 lines

datasets/redasa-covid-data.yaml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
Name: REDASA COVID-19 Open Data
Description: >
  The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in [our paper](https://www.jmir.org/2021/5/e25714).
  The first S3 bucket listed in Resources contains a large collection of medical documents in text format extracted from the [CORD-19 dataset](https://registry.opendata.aws/cord-19/), plus other sources deemed relevant by the REDASA consortium.
  The second S3 bucket contains a series of documents surfaced by [Amazon Kendra](https://aws.amazon.com/kendra/) that were considered relevant for each medical question asked. The final S3 bucket contains the GroundTruth annotations created by our curator community.
Documentation: https://github.com/PanSurg/redasa-sample-data/blob/master/open-data.md
Contact: redasa-open-data@imperial.ac.uk
ManagedBy: REDASA Consortium, Imperial College London, UK
UpdateFrequency: Yearly updates
Tags:
  - aws-pds
  - COVID-19
  - coronavirus
  - life sciences
  - information retrieval
  - natural language processing
  - text analysis
License: CC-BY-4.0
Resources:
  - Description: This is the raw data repository containing a common crawl of CORD-19 papers and other sources identified by the REDASA Project.
    ARN: arn:aws:s3:::pansurg-curation-raw-open-data
    Region: eu-west-2
    Type: S3 Bucket
  - Description: For all the questions curated during the REDASA project, we created a Kendra index. The documents available in this S3 bucket were surfaced by the Kendra index as being relevant to the research medical question.
    ARN: arn:aws:s3:::pansurg-curation-workflo-kendraqueryresults50d0eb-open-data
    Region: eu-west-2
    Type: S3 Bucket
  - Description: An S3 bucket that contains the final curation data in GroundTruth format
    ARN: arn:aws:s3:::pansurg-curation-final-curations-open-data
    Region: eu-west-2
    Type: S3 Bucket
DataAtWork:
  Tools & Applications:
    - Title: Curadr - Curation Platform
      URL: https://curadr.com/
      AuthorName: REDASA Consortium, Imperial College London
      AuthorURL: https://www.pansurg.org/redasa/
  Publications:
    - Title: "Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study"
      URL: "https://www.jmir.org/2021/5/e25714"
      AuthorName: Uddhav Vaghela, Simon Rabinowicz, Paris Bratsos, Guy Martin, Epameinondas Fritzilas, et al.
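Each entry under `Resources` gives only a bucket ARN and a region. The minimal Python sketch below derives the `s3://` URI and the HTTPS endpoint for each of the three REDASA buckets. It assumes the standard bucket ARN form `arn:aws:s3:::<bucket>` in the `aws` partition and S3's virtual-hosted-style URLs (`https://<bucket>.s3.<region>.amazonaws.com/`). `S3Resource` and `REDASA_BUCKETS` are hypothetical helpers written for illustration; they are not part of the registry or its tooling.

```python
from dataclasses import dataclass

# Prefix of an S3 bucket ARN in the standard "aws" partition (assumption):
# service "s3", empty region and account fields, then the bucket name.
_S3_ARN_PREFIX = "arn:aws:s3:::"


@dataclass(frozen=True)
class S3Resource:
    """Hypothetical helper: one Resources entry (bucket ARN plus region)."""

    arn: str
    region: str

    @property
    def bucket(self) -> str:
        # Accept only bucket-level ARNs; object ARNs (bucket/key) are rejected.
        if not self.arn.startswith(_S3_ARN_PREFIX):
            raise ValueError(f"not an S3 ARN: {self.arn!r}")
        name = self.arn[len(_S3_ARN_PREFIX):]
        if not name or "/" in name:
            raise ValueError(f"not a bucket-level ARN: {self.arn!r}")
        return name

    @property
    def s3_uri(self) -> str:
        # URI form accepted by the AWS CLI, e.g. `aws s3 ls s3://<bucket>/`.
        return f"s3://{self.bucket}/"

    @property
    def https_url(self) -> str:
        # Virtual-hosted-style endpoint in the bucket's region.
        return f"https://{self.bucket}.s3.{self.region}.amazonaws.com/"


# ARN and Region values copied from the Resources section above.
REDASA_BUCKETS = [
    S3Resource("arn:aws:s3:::pansurg-curation-raw-open-data", "eu-west-2"),
    S3Resource(
        "arn:aws:s3:::pansurg-curation-workflo-kendraqueryresults50d0eb-open-data",
        "eu-west-2",
    ),
    S3Resource("arn:aws:s3:::pansurg-curation-final-curations-open-data", "eu-west-2"),
]

if __name__ == "__main__":
    for res in REDASA_BUCKETS:
        print(res.s3_uri, res.https_url)
```

Registry buckets tagged `aws-pds` are normally public, so a URI like these can usually be listed with the AWS CLI without credentials, e.g. `aws s3 ls --no-sign-request --region eu-west-2 s3://pansurg-curation-raw-open-data/`. The YAML above does not state whether anonymous listing is still enabled for these particular buckets.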