metaharvest inaccurately renames metadata headings retrieved from ENA  #27

@Feat-FeAR

Currently, metaharvest extracts the metadata fetched from ENA from the original JSON using the function _extract_ena_metadata, which returns a CSV table as output. In this process, headings are renamed; in particular, aliases are remapped as 'geo':
study_alias --> geo_series and sample_alias --> geo_sample. However, this is not always correct, as in those cases in which RNA-Seq data have been brokered from ArrayExpress, or have been submitted directly to (ENA or NCBI) SRA. See, for instance, PRJEB48614:

"ena_sample_title","geo_series","geo_sample","ena_project","ena_sample","ena_run","read_count","library_layout","extra"
"1_E_TIGRb_S8","E-MTAB-11129","E-MTAB-11129:1_E_TIGRb_S8","PRJEB48614","SAMEA10792371","ERR7246655","52455546","PAIRED","0"
"1_E_Ua_S1","E-MTAB-11129","E-MTAB-11129:1_E_Ua_S1","PRJEB48614","SAMEA10792373","ERR7246657","52170236","PAIRED","0"
"EPOCE2_T4_2_S21","E-MTAB-11129","E-MTAB-11129:EPOCE2_T4_2_S21","PRJEB48614","SAMEA10792378","ERR7246662","41502100","PAIRED","0"
...
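
One possible direction, as a minimal sketch only: keep the 'geo' headings when the alias values actually look like GEO accessions (GSE... for series, GSM... for samples), and otherwise fall back to the original ENA field names. This is not metaharvest's code; the function name heading_for and the fallback heading names are hypothetical, chosen just for illustration.

```python
import re

# GEO accession patterns: series are GSE<digits>, samples are GSM<digits>.
GEO_SERIES = re.compile(r"^GSE\d+$")
GEO_SAMPLE = re.compile(r"^GSM\d+$")

# Hypothetical mapping: ENA alias field -> (GEO pattern, GEO-specific heading).
GEO_HEADINGS = {
    "study_alias": (GEO_SERIES, "geo_series"),
    "sample_alias": (GEO_SAMPLE, "geo_sample"),
}

def heading_for(field, values):
    """Return the CSV heading to use for an ENA field.

    The 'geo' heading is used only if every non-empty value in the column
    matches the corresponding GEO accession pattern; otherwise the original
    ENA field name is kept (an assumed, source-neutral fallback).
    """
    if field not in GEO_HEADINGS:
        return field
    pattern, geo_heading = GEO_HEADINGS[field]
    non_empty = [v for v in values if v]
    if non_empty and all(pattern.match(v) for v in non_empty):
        return geo_heading
    return field
```

With the aliases from the example above, heading_for("study_alias", ["E-MTAB-11129"]) keeps the heading as study_alias, whereas a column holding only GSE-style accessions would still be labelled geo_series.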

Unfortunately, changing these heading names could be a breaking change...


Labels: bug (Something isn't working)
