Commit 9bf5106

556 (#575)
* fix #556 - avoid gluing an empty attribute to the ID in GFF3 parsing
* fix #250 url_escaped. Keep escaped characters in the output
* add url_encode_out parameter to decode GFF3 URL-escaped characters (is it useful?)
* homogenize usage output: output|out|o (remove outfile)
* homogenize POD (type described in the options line: -o, --out, --output <file>)
* Add --force parameter to overwrite the output if it already exists
* Add nextflow pipeline to assess performance
* homogenize usage of booleans: --boolean_option activates the option. To deactivate a boolean (e.g. in agat.pl, because other scripts suggest only boolean activation), use the --no-boolean_option syntax.
1 parent 92a3b37 commit 9bf5106
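
The option conventions described in the commit message (the `o|out|output` aliases and negatable booleans such as `--force` / `--no-force`) map directly onto Getopt::Long specifications. The following is a minimal sketch, not AGAT's actual parsing code: the `parse_cli` helper and the defaults in `%opt` are assumptions made for illustration, while the option specs themselves are taken from the diff.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments into a hash of options.
sub parse_cli {
    my @argv = @_;

    # Assumed defaults, mirroring the "[Default ...]" notes in the help text.
    my %opt = ( out => undef, force => 0, url_encode_out => 1 );

    GetOptionsFromArray(
        \@argv,
        'o|out|output=s'  => \$opt{out},             # three aliases, string value
        'force!'          => \$opt{force},           # "!" allows --force and --no-force
        'url_encode_out!' => \$opt{url_encode_out},  # --no-url_encode_out switches it off
    ) or die "Failed to parse command line.\n";

    return \%opt;
}

my $opt = parse_cli( '--output', 'result.gff', '--force', '--no-url_encode_out' );
printf "out=%s force=%d url_encode_out=%d\n",
    $opt->{out}, $opt->{force}, $opt->{url_encode_out};
```

With a `!` suffix, Getopt::Long accepts both the `--no-name` and `--noname` spellings for negation, which is why a single spec covers activation and deactivation.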

114 files changed, +1735 -1494 lines changed

bin/agat

Lines changed: 9 additions & 2 deletions
@@ -165,6 +165,11 @@ force_gff_input_version, gtf_output_version, gff_output_version and output_forma
 getopt => 'progress_bar!',
 help => 'To activate / deactivate the progress bar. [Default activated]',
 },
+{
+
+getopt => 'url_encode_out!',
+help => 'To activate / deactivate the URL escaping of attributes (9th column) in GFF output. [Default activated]',
+},
 {
 getopt => 'log!',
 help => 'To create a log file while parsing the input file to keep track of modification made by AGAT. [Default activated]',
@@ -205,8 +210,10 @@ force_gff_input_version, gtf_output_version, gff_output_version and output_forma
 getopt => 'deflate_attribute!',
 help => 'deflate multi-values attributes: attribute_tag=att_value1,att_value2,att_value3 will will become attribute_tag=att_value1;attribute_tag2=att_value2;attribute_tag3=att_value3;',
 },
-{
-getopt => 'create_l3_for_l2_orphan!',
+{ getopt => 'force!',
+help => 'Force overwrite existing output file. [Default false]',
+},
+{ getopt => 'create_l3_for_l2_orphan!',
 help => 'To create l3 feature for l2 feature without any. [Default activated]',
 },
 {
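
For context on what `url_encode_out` toggles: the GFF3 specification reserves `;`, `=`, `&` and `,` in the 9th column and requires them, along with `%` and control characters such as tab and newline, to be percent-encoded when they appear inside a value. The sketch below illustrates that encoding and its inverse. It is an illustration under those assumptions only; the function names `gff3_escape` and `gff3_unescape` are hypothetical and this is not the code AGAT uses.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode characters that are reserved
# in GFF3 attribute values (column 9).
sub gff3_escape {
    my ($value) = @_;
    # "%" itself, the reserved ; = & , and all control characters.
    $value =~ s/([%;=&,\x00-\x1F\x7F])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: decode every %XX sequence back to its character.
sub gff3_unescape {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

my $raw     = 'kinase; ATP=binding, 50% identity';
my $escaped = gff3_escape($raw);
print "Note=$escaped\n";
print "round trip ok\n" if gff3_unescape($escaped) eq $raw;
```

Encoding `%` together with the other reserved characters in a single pass keeps the round trip lossless, since an already-escaped sequence is never re-read during the same substitution.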

bin/agat_convert_bed2gff.pl

Lines changed: 11 additions & 31 deletions
@@ -32,7 +32,8 @@
 "primary_tag=s" => \$primary_tag,
 "inflate_off!" => \$inflating_off,
 "inflate_type=s" => \$inflate_type,
-"outfile|output|o|out|gff=s" => \$outfile ) )
+"o|out|output=s" => \$outfile,
+))
 {
 pod2usage( { -message => "Failed to parse command line.\n",
 -verbose => 1,
@@ -473,63 +474,42 @@ =head1 OPTIONS
 
 =over 8
 
-=item B<--bed>
+=item B<--bed> <file>
 
 Input bed file that will be converted.
 
-=item B<--source>
+=item B<--source> <string>
 
 The source informs about the tool used to produce the data and is stored in 2nd field of a gff file.
 Example: Stringtie,Maker,Augustus,etc. [default: data]
 
-=item B<--primary_tag>
+=item B<--primary_tag> <string>
 
 The primary_tag corresponds to the data type and is stored in 3rd field of a gff file.
 Example: gene,mRNA,CDS,etc. [default: gene]
 
 =item B<--inflate_off>
-
 By default we inflate the block fields (blockCount, blockSizes, blockStarts) to create subfeatures
 of the main feature (primary_tag). The type of subfeature created is based on the
 inflate_type parameter. If you do not want this inflating behaviour you can deactivate it
 by using the --inflate_off option.
 
-=item B<--inflate_type>
+=item B<--inflate_type> <string>
 
 Feature type (3rd column in gff) created when inflate parameter activated [default: exon].
 
-=item B<-o> , B<--output> , B<--out> , B<--outfile> or B<--gff>
-
-Output GFF file. If no output file is specified, the output will be
-written to STDOUT.
+=item B<-o>, B<--out> or B<--output> <file>
 
+Output file to create (default GFF3 - see config to modify output format).
+If no output file is specified, the output will be written to STDOUT.
 
 =item B<-h> or B<--help>
 
 Display this helpful text.
 
-=back
-
-=head1 SHARED OPTIONS
-
-Shared options are defined in the AGAT configuration file and can be overridden via the command line for this script only.
-Common shared options are listed below; for the full list, please refer to the AGAT agat_config.yaml.
-
-=over 8
-
-=item B<--config>
-
-String - Path to a custom AGAT configuration file.
-By default, AGAT uses `agat_config.yaml` from the working directory if present, otherwise the default file shipped with AGAT
-(available locally via `agat config --expose`).
-
-=item B<--cpu>, B<--core>, B<--job> or B<--thread>
-
-Integer - Number of parallel processes to use for file input parsing (via forking).
-
-=item B<-v> or B<--verbose>
+=item B<-v> or B<--verbose> <int>
 
-Integer - Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
+Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
 
 =back
 
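
The POD above states that output goes to STDOUT when no output file is given, and the commit adds a `force` option to overwrite an existing output file. One way these two rules could combine is sketched below. This is a hypothetical illustration: the `open_output` helper, its error messages, and the choice to die when the file exists without `--force` are assumptions, not behaviour confirmed by the diff.

```perl
use strict;
use warnings;

# Hypothetical helper: return a filehandle for the output.
#   $path  - value of -o/--out/--output, or undef if not given
#   $force - true when --force was passed
sub open_output {
    my ( $path, $force ) = @_;

    # No output file requested: fall back to STDOUT.
    return \*STDOUT unless defined $path;

    # Assumed policy: refuse to clobber an existing file unless forced.
    die "Output file '$path' already exists; use --force to overwrite.\n"
        if -e $path && !$force;

    open( my $fh, '>', $path )
        or die "Cannot open '$path' for writing: $!\n";
    return $fh;
}

my $out = open_output( undef, 0 );    # no -o given, so this is STDOUT
print {$out} "##gff-version 3\n";
```

Returning a filehandle in both cases lets the rest of a script print to `$out` without caring whether it is a file or the terminal.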

bin/agat_convert_embl2gff.pl

Lines changed: 15 additions & 33 deletions
@@ -31,10 +31,11 @@
 "h|help" => \$help,
 "embl=s" => \$embl,
 "primary_tag|pt|t=s" => \$primaryTags,
-"d!" => \$discard,
-"k!" => \$keep,
+"d|discard!" => \$discard,
+"k|keep!" => \$keep,
 "emblmygff3!" => \$emblmygff3,
-"outfile|output|o|out|gff=s" => \$outfile))
+"o|out|output=s" => \$outfile,
+) )
 {
 pod2usage( { -message => "Failed to parse command line\n$header",
 -verbose => 1,
@@ -235,60 +236,41 @@ =head1 OPTIONS
 
 =over 8
 
-=item B<--embl>
+=item B<--embl> <file>
 
 Input EMBL file that will be read
 
 =item B<--emblmygff3>
-
-Bolean - Means that the EMBL flat file comes from the EMBLmyGFF3 software.
+Means that the EMBL flat file comes from the EMBLmyGFF3 software.
 This is an EMBL format dedicated for submission and contains particularity to deal with.
 This parameter is needed to get a proper sequence id in the GFF3 from an embl made with EMBLmyGFF3.
 
-=item B<--primary_tag>, B<--pt>, B<-t>
+=item B<--primary_tag>, B<--pt> or B<-t> <list>
 
 List of "primary tag". Useful to discard or keep specific features.
 Multiple tags must be coma-separated.
 
-=item B<-d>
+=item B<-d> or B<--discard>
 
-Bolean - Means that primary tags provided by the option "primary_tag" will be discarded.
+Means that primary tags provided by the option "primary_tag" will be discarded.
 
-=item B<-k>
+=item B<-k> or B<--keep>
 
-Bolean - Means that only primary tags provided by the option "primary_tag" will be kept.
+Means that only primary tags provided by the option "primary_tag" will be kept.
 
-=item B<-o>, B<--output>, B<--out>, B<--outfile> or B<--gff>
+=item B<-o>, B<--out> or B<--output> <file>
 
-Output GFF file. If no output file is specified, the output will be
+Output GFF file to create. If no output file is specified, the output will be
 written to STDOUT.
 
 =item B<-h> or B<--help>
 
 Display this helpful text.
 
-=back
-
-=head1 SHARED OPTIONS
-
-Shared options are defined in the AGAT configuration file and can be overridden via the command line for this script only.
-Common shared options are listed below; for the full list, please refer to the AGAT agat_config.yaml.
-
-=over 8
-
-=item B<--config>
-
-String - Path to a custom AGAT configuration file.
-By default, AGAT uses `agat_config.yaml` from the working directory if present, otherwise the default file shipped with AGAT
-(available locally via `agat config --expose`).
-
-=item B<--cpu>, B<--core>, B<--job> or B<--thread>
-
-Integer - Number of parallel processes to use for file input parsing (via forking).
 
-=item B<-v> or B<--verbose>
+=item B<-v> or B<--verbose> <int>
 
-Integer - Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
+Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
 
 =back
 
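
The `--primary_tag`, `--discard` and `--keep` options documented above describe a filter driven by a comma-separated tag list. The sketch below shows one way such a filter could work on already-parsed features. It is an assumption-laden illustration: the `filter_by_primary_tag` helper, the `'keep'`/`'discard'` mode strings, and the hash-based feature records are hypothetical and do not reflect how the script represents features internally.

```perl
use strict;
use warnings;

# Hypothetical helper: keep or discard features by primary tag.
#   $features - array ref of hash refs with a "primary_tag" key (assumed shape)
#   $tag_list - comma-separated string, e.g. "gene,mRNA"
#   $mode     - 'keep' or 'discard'
sub filter_by_primary_tag {
    my ( $features, $tag_list, $mode ) = @_;

    # Build a lookup of the requested tags, ignoring empty entries.
    my %wanted = map { $_ => 1 } grep { length } split /\s*,\s*/, $tag_list;

    return grep {
        my $listed = exists $wanted{ $_->{primary_tag} };
        $mode eq 'keep' ? $listed : !$listed;
    } @$features;
}

my @features = (
    { primary_tag => 'gene' },
    { primary_tag => 'source' },
    { primary_tag => 'CDS' },
);

my @kept = filter_by_primary_tag( \@features, 'gene,CDS', 'keep' );
print join( ',', map { $_->{primary_tag} } @kept ), "\n";
```

Using a hash for the tag list keeps the membership test constant-time per feature, which matters little here but scales to large annotation files.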

bin/agat_convert_genscan2gff.pl

Lines changed: 9 additions & 27 deletions
@@ -30,7 +30,8 @@
 "h|help" => \$help,
 "g|genscan=s" => \$genscan,
 "seqid=s" => \$seq_id,
-"outfile|output|o|out|gff=s" => \$outfile ) )
+"o|out|output=s" => \$outfile,
+) )
 {
 pod2usage( { -message => "Failed to parse command line.\n",
 -verbose => 1,
@@ -285,45 +286,26 @@ =head1 OPTIONS
 
 =over 8
 
-=item B<--genscan> or B<-g>
+=item B<--genscan> or B<-g> <file>
 
 Input genscan bed file that will be convert.
 
-=item B<--seqid>
+=item B<--seqid> <string>
 
-String - Sequence ID. [default: unknown]
+Sequence ID. [default: unknown]
 
-=item B<-o> , B<--output> , B<--out> , B<--outfile> or B<--gff>
+=item B<-o>, B<--out> or B<--output> <file>
 
-Output GFF file. If no output file is specified, the output will be
+Output GFF file to create. If no output file is specified, the output will be
 written to STDOUT.
 
 =item B<-h> or B<--help>
 
 Display this helpful text.
 
-=back
-
-=head1 SHARED OPTIONS
-
-Shared options are defined in the AGAT configuration file and can be overridden via the command line for this script only.
-Common shared options are listed below; for the full list, please refer to the AGAT agat_config.yaml.
-
-=over 8
-
-=item B<--config>
-
-String - Path to a custom AGAT configuration file.
-By default, AGAT uses `agat_config.yaml` from the working directory if present, otherwise the default file shipped with AGAT
-(available locally via `agat config --expose`).
-
-=item B<--cpu>, B<--core>, B<--job> or B<--thread>
-
-Integer - Number of parallel processes to use for file input parsing (via forking).
-
-=item B<-v> or B<--verbose>
+=item B<-v> or B<--verbose> <int>
 
-Integer - Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
+Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
 
 =back
 

bin/agat_convert_mfannot2gff.pl

Lines changed: 8 additions & 26 deletions
@@ -27,7 +27,7 @@
 if ( !$script_parser->getoptionsfromarray(
 $script_argv,
 'mfannot|m|i=s' => \$mfannot_file,
-'gff|g|o=s' => \$gff_file,
+'o|out|output=s' => \$gff_file,
 'h|help' => sub { pod2usage( -exitstatus=>0, -verbose=>99, -message => "$header\n" ); },
 'man' => sub { pod2usage(-exitstatus=>0, -verbose=>2); }
 ))
@@ -433,40 +433,22 @@ =head1 OPTIONS
 
 =over 8
 
-=item B<-m> or B<-i> or B<--mfannot>
+=item B<-m> or B<-i> or B<--mfannot> <file>
 
 The mfannot input file
 
-=item B<-g> or B<-o> or B<--gff>
+=item B<-o>, B<--out> or B<--output> <file>
 
-the gff output file
+Output GFF file to create. If no output file is specified, the output will be
+written to STDOUT.
 
 =item B<-h> or B<--help>
 
 Display this helpful text.
 
-=back
-
-=head1 SHARED OPTIONS
-
-Shared options are defined in the AGAT configuration file and can be overridden via the command line for this script only.
-Common shared options are listed below; for the full list, please refer to the AGAT agat_config.yaml.
-
-=over 8
-
-=item B<--config>
-
-String - Path to a custom AGAT configuration file.
-By default, AGAT uses `agat_config.yaml` from the working directory if present, otherwise the default file shipped with AGAT
-(available locally via `agat config --expose`).
-
-=item B<--cpu>, B<--core>, B<--job> or B<--thread>
-
-Integer - Number of parallel processes to use for file input parsing (via forking).
-
-=item B<-v> or B<--verbose>
+=item B<-v> or B<--verbose> <int>
 
-Integer - Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
+Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
 
 =back
 
@@ -482,4 +464,4 @@ =head1 BUG REPORTING
 
 =cut
 
-AUTHOR - Jacques Dainat
+AUTHOR - Jacques Dainat

bin/agat_convert_minimap2_bam2gff.pl

Lines changed: 5 additions & 15 deletions
@@ -376,7 +376,7 @@ =head1 OPTIONS
 
 =over 8
 
-=item B<-i> or B<--input>
+=item B<-i> or B<--input> <file>
 
 Input file in sam (.sam extension) or bam (.bam extension) format.
 
@@ -388,9 +388,9 @@ =head1 OPTIONS
 
 To force to use the input file as sam file.
 
-=item B<-o>, B<--out> or B<--output>
+=item B<-o>, B<--out> or B<--output> <file>
 
-Output GFF file. If no output file is specified, the output will be
+Output GFF file to create. If no output file is specified, the output will be
 written to STDOUT.
 
 =item B<-h> or B<--help>
@@ -406,19 +406,9 @@ =head1 SHARED OPTIONS
 
 =over 8
 
-=item B<--config>
+=item B<-v> or B<--verbose> <int>
 
-String - Path to a custom AGAT configuration file.
-By default, AGAT uses `agat_config.yaml` from the working directory if present, otherwise the default file shipped with AGAT
-(available locally via `agat config --expose`).
-
-=item B<--cpu>, B<--core>, B<--job> or B<--thread>
-
-Integer - Number of parallel processes to use for file input parsing (via forking).
-
-=item B<-v> or B<--verbose>
-
-Integer - Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
+Verbosity, choice are 0,1,2,3,4. 0 is quiet, 1 is normal, 2,3,4 is more verbose. Default 1.
 
 =back
 
