
Some GFF features translate to double quoted values in GTF output #537

@timslittle

Description


Describe the bug
I do not know exactly what causes the bug, but some attributes in the ninth column of a GFF file are written to the GTF output with two separately quoted values. This can violate the assumption, made by some tools, that each GTF attribute has a single quoted value. For example, in the files below, start_range=.,210217 becomes start_range "." "210217"; and end_range=211509,. becomes end_range "211509" ".";
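In GFF3, a comma inside an attribute value separates multiple values, so start_range=.,210217 is read as the two values "." and "210217". The output looks consistent with each of those values being written as its own quoted string. Below is a minimal Python sketch that mimics the observed output. It is not AGAT's actual (Perl) code, and the function names are hypothetical. It contrasts that behaviour with re-joining the values into one quoted string.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict[str, list[str]]:
    """Split a GFF3 ninth column into tag -> list of values.

    ';' separates tag=value pairs and ',' separates multiple values of
    one tag; percent-encoding (e.g. %3B) is decoded after splitting.
    """
    attrs: dict[str, list[str]] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        tag, _, value = pair.partition("=")
        attrs[tag] = [unquote(v) for v in value.split(",")]
    return attrs


def to_gtf_multi(attrs: dict[str, list[str]]) -> str:
    # One quoted string per value: mimics 'start_range "." "210217";'
    parts = []
    for tag, values in attrs.items():
        quoted = " ".join(f'"{v}"' for v in values)
        parts.append(f"{tag} {quoted};")
    return " ".join(parts)


def to_gtf_single(attrs: dict[str, list[str]]) -> str:
    # Re-join values with ',' so every attribute has exactly one quoted value
    return " ".join(f'{tag} "{",".join(values)}";' for tag, values in attrs.items())


attrs = parse_gff3_attributes("end_range=211509,.;start_range=.,210217")
print(to_gtf_multi(attrs))   # end_range "211509" "."; start_range "." "210217";
print(to_gtf_single(attrs))  # end_range "211509,."; start_range ".,210217";
```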

General (please complete the following information):

  • AGAT version: 1.4.2 (quay.io/biocontainers/agat:1.4.2--pl5321hdfd78af_0)
  • Docker 28.0.1
  • OS: macOS Sequoia 15.3.2 (24D81)

To Reproduce
test.gff (from NCBI RefSeq):

NZ_LR214945.1	RefSeq	pseudogene	210217	211509	.	-	.	ID=gene-EXC43_RS01050;Dbxref=GeneID:66608824;Name=mgpA;end_range=211509,.;gbkey=Gene;gene=mgpA;gene_biotype=pseudogene;locus_tag=EXC43_RS01050;old_locus_tag=NCTC10119_00214;partial=true;pseudo=true;start_range=.,210217
NZ_LR214945.1	Protein Homology	CDS	210217	211509	.	-	0	ID=cds-EXC43_RS01050;Parent=gene-EXC43_RS01050;Dbxref=GeneID:66608824;Note=incomplete%3B partial in the middle of a contig%3B missing N-terminus and C-terminus;end_range=211509,.;gbkey=CDS;gene=mgpA;inference=COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1;locus_tag=EXC43_RS01050;partial=true;product=adhesin P1;pseudo=true;start_range=.,210217;transl_table=4

Command:

docker run --mount type=bind,src=/path/to/file,dst=/file quay.io/biocontainers/agat:1.4.2--pl5321hdfd78af_0 agat_convert_sp_gff2gtf.pl --gff file/test.gff -o file/result.gtf

result.gtf:

##gtf-version X
# GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.
NZ_LR214945.1	RefSeq	pseudogene	210217	211509	.	-	.	gene_id "agat-pseudogene-1"; Dbxref "GeneID:66608824"; ID "agat-pseudogene-1"; Name "mgpA"; end_range "211509" "."; gbkey "Gene"; gene "mgpA"; gene_biotype "pseudogene"; locus_tag "EXC43_RS01050"; old_locus_tag "NCTC10119_00214"; partial "true"; pseudo "true"; start_range "." "210217";
NZ_LR214945.1	AGAT	mRNA	210217	211509	.	-	.	gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "gene-EXC43_RS01050"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "agat-pseudogene-1"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";
NZ_LR214945.1	AGAT	exon	210217	211509	.	-	.	gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "agat-exon-1"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "gene-EXC43_RS01050"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";
NZ_LR214945.1	Protein Homology	CDS	210217	211509	.	-	0	gene_id "agat-pseudogene-1"; transcript_id "gene-EXC43_RS01050"; Dbxref "GeneID:66608824"; ID "cds-EXC43_RS01050"; Note "incomplete; partial in the middle of a contig; missing N-terminus and C-terminus"; Parent "gene-EXC43_RS01050"; end_range "211509" "."; gbkey "CDS"; gene "mgpA"; inference "COORDINATES: similar to AA sequence:RefSeq:WP_010874498.1"; locus_tag "EXC43_RS01050"; partial "true"; product "adhesin P1"; pseudo "true"; start_range "." "210217"; transl_table "4";

Expected behavior
Each attribute in the resulting GTF file should have a single quoted value, rather than two separately quoted values as in start_range "." "210217";.
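Until the conversion itself changes, one possible workaround is to post-process the GTF so that each attribute carries one quoted value, re-joining the values with commas as they appeared in the GFF. Below is a minimal Python sketch; it is not part of AGAT. The function names and the comma-joining choice are assumptions, and whether a value such as start_range ".,210217"; suits a given downstream tool has not been checked. Quoted values can themselves contain ";" (see the Note attribute above), so the sketch matches quoted strings explicitly rather than splitting on ";".

```python
import re

# One GTF attribute: a key, one or more quoted values, then ';'.
ATTR_RE = re.compile(r'\s*(\S+)((?:\s+"[^"]*")+)\s*;')
VALUE_RE = re.compile(r'"([^"]*)"')


def collapse_attributes(column9: str) -> str:
    """Rewrite 'key "a" "b";' as 'key "a,b";' so each key has one value."""
    parts = []
    for match in ATTR_RE.finditer(column9):
        key, quoted = match.group(1), match.group(2)
        values = VALUE_RE.findall(quoted)
        parts.append(f'{key} "{",".join(values)}";')
    # If nothing matched, keep the column unchanged rather than blanking it
    return " ".join(parts) if parts else column9


def fix_gtf_line(line: str) -> str:
    # Leave comment/header lines and lines without 9 columns untouched
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    fields[8] = collapse_attributes(fields[8])
    return "\t".join(fields) + "\n"


def fix_gtf_file(in_path: str, out_path: str) -> None:
    # Hypothetical helper: fix_gtf_file("result.gtf", "result.fixed.gtf")
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_gtf_line(line))
```

Joining with "," simply restores the original GFF3 value. A different separator would work the same way if a downstream tool expects one.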

My last issue turned out to be easily resolved just by checking the Troubleshooting page, so I hope this one is more helpful!
