Closed
Description
Describe the bug
I do not know exactly what causes the bug, but when a GFF file is converted to GTF, some attributes in the ninth column are written with two separately quoted values, which can violate assumptions that tools make about the GTF format. For example, in the files included below, ;start_range=.,210217 becomes ; start_range "." "210217";.
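One possible explanation, not confirmed in this report: in GFF3 a comma inside an attribute value separates multiple values for the same tag, so start_range=.,210217 would be read as two values rather than one. The sketch below illustrates that reading in Python. It is not AGAT's code (AGAT is written in Perl), and parse_gff3_attributes is a hypothetical helper name used only for illustration.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, list[str]]:
    """Parse a GFF3 attribute column into a tag -> list-of-values mapping."""
    attributes: dict[str, list[str]] = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, raw = field.partition("=")
        # Split on commas before percent-decoding, so an encoded comma (%2C)
        # stays inside a single value.
        attributes[unquote(key)] = [unquote(value) for value in raw.split(",")]
    return attributes


attrs = parse_gff3_attributes("ID=gene-EXC43_RS01050;partial=true;start_range=.,210217")
print(attrs["start_range"])  # ['.', '210217']
```

Under this reading, a converter that writes each parsed value in its own pair of quotes would produce start_range "." "210217"; as seen in the output.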
General (please complete the following information):
- AGAT version: quay.io/biocontainers/agat:1.4.2--pl5321hdfd78af_0 (Docker 28.0.1)
- OS: macOS Sequoia 15.3.2 (24D81)
To Reproduce
test.gff from NCBI RefSeq:
NZ_LR214945.1 RefSeq pseudogene 210217 211509 . - . ID=gene-EXC43_RS01050;Dbxref=GeneID:66608824;Name=mgpA;end_range=211509,.;gbkey=Gene;gene=mgpA;gene_biotype=pseudogene;locus_tag=EXC43_RS01050;old_locus_tag=NCTC10119_00214;partial=true;pseudo=true;start_range=.,210217
NZ_LR214945.1 Protein Homology CDS 210217 211509 . - 0 ID=cds-EXC43_RS01050;Parent=gene-EXC43_RS01050;Dbxref=GeneID:66608824;Note=incomplete%3B partial in the middle of a contig%3B missing N-terminus and C-terminus;end_range=211509,.;gbkey=CDS;gene=mgpA;inference=COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1;locus_tag=EXC43_RS01050;partial=true;product=adhesin P1;pseudo=true;start_range=.,210217;transl_table=4
Command:
docker run --mount type=bind,src=/path/to/file,dst=/file quay.io/biocontainers/agat:1.4.2--pl5321hdfd78af_0 agat_convert_sp_gff2gtf.pl --gff file/test.gff -o file/result.gtf
result.gtf:
##gtf-version X
# GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.
NZ_LR214945.1 RefSeq pseudogene 210217 211509 . - . gene_id "agat-pseudogene-1"; Dbxref "GeneID:66608824"; ID "agat-pseudogene-1"; Name "mgpA"; end_range "211509" "."; gbkey "Gene"; gene "mgpA"; gene_biotype "pseudogene"; locus_tag "EXC43_RS01050"; old_locus_tag "NCTC10119_00214"; partial "true"; pseudo "true"; start_range "." "210217";
NZ_LR214945.1 AGAT mRNA 210217 211509 . - . gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "gene-EXC43_RS01050"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "agat-pseudogene-1"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";
NZ_LR214945.1 AGAT exon 210217 211509 . - . gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "agat-exon-1"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "gene-EXC43_RS01050"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";
NZ_LR214945.1 Protein Homology CDS 210217 211509 . - 0 gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "cds-EXC43_RS01050"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "gene-EXC43_RS01050"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";
Expected behavior
No attribute in the resulting GTF file should have two separately quoted values.
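As a stopgap until the converter itself is changed, the attribute column of the output could be post-processed. The following is a minimal sketch, assuming that joining the values back into one comma-separated quoted string is acceptable to downstream tools. That target shape is my assumption, not necessarily what AGAT would emit, and join_multi_values is a hypothetical helper name.

```python
import re

# A tag followed by two or more quoted values, e.g. start_range "." "210217";
MULTI_VALUE = re.compile(r'(\w+)((?:\s+"[^"]*"){2,})\s*;')


def join_multi_values(attributes: str) -> str:
    """Collapse `tag "a" "b";` into `tag "a,b";` within a GTF attribute column."""

    def _join(match: re.Match) -> str:
        values = re.findall(r'"([^"]*)"', match.group(2))
        joined = ",".join(values)
        return f'{match.group(1)} "{joined}";'

    return MULTI_VALUE.sub(_join, attributes)


column9 = 'gene_id "agat-pseudogene-1"; end_range "211509" "."; gbkey "Gene"; start_range "." "210217";'
print(join_multi_values(column9))
# gene_id "agat-pseudogene-1"; end_range "211509,."; gbkey "Gene"; start_range ".,210217";
```

Attributes that already carry a single quoted value are left untouched, so the function can be applied to every line's attribute column.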
My last issue turned out to be easily addressed just by checking the 'Troubleshooting' page so I hope this one is more helpful!