SCAFE/scripts at main · minoda-lab/SCAFE

SCAFE Tools and Workflows

This folder contains the following tools and workflows. A tool performs a single task, and a workflow runs multiple tools. Some scripts are implemented separately for bulk (.bk.) and single-cell (.sc.) modes, while others are common (.cm.) to both.

scafe.workflow.sc.subsample ---> workflow, single-cell mode, subsample ctss
scafe.workflow.sc.solo ---> workflow, single-cell mode, process a single sample
scafe.workflow.sc.pool ---> workflow, single-cell mode, pool ctss of multiple samples
scafe.workflow.bk.subsample ---> workflow, bulk mode, subsample ctss
scafe.workflow.bk.solo ---> workflow, bulk mode, process a single sample
scafe.workflow.bk.pool ---> workflow, bulk mode, pool ctss of multiple samples
scafe.tool.sc.subsample_ctss ---> tool, single-cell mode, subsample ctss
scafe.tool.sc.pool ---> tool, single-cell mode, pool ctss of multiple samples
scafe.tool.sc.link ---> tool, single-cell mode, linking tCRE by coactivity
scafe.tool.sc.count ---> tool, single-cell mode, count of UMI within tCRE
scafe.tool.sc.bam_to_ctss ---> tool, single-cell mode, convert bam to ctss
scafe.tool.cm.remove_strand_invader ---> tool, common mode, remove strand invader artefact
scafe.tool.cm.prep_genome ---> tool, common mode, prepare custom reference genome
scafe.tool.cm.filter ---> tool, common mode, filter for genuine TSS clusters
scafe.tool.cm.ctss_to_bigwig ---> tool, common mode, convert ctss to bigwig
scafe.tool.cm.cluster ---> tool, common mode, cluster ctss
scafe.tool.cm.annotate ---> tool, common mode, define and annotate tCRE
scafe.tool.bk.subsample_ctss ---> tool, bulk mode, subsample ctss
scafe.tool.bk.pool ---> tool, bulk mode, pool ctss of multiple samples
scafe.tool.bk.count ---> tool, bulk mode, count ctss within tCREs
scafe.tool.bk.bam_to_ctss ---> tool, bulk mode, convert bam to ctss bed
scafe.download.resources.genome ---> download, reference genome to resources dir
scafe.download.demo.input ---> download, demo input data for testing
scafe.demo.test.run ---> demo, run demo data for testing
scafe.check.dependencies ---> check dependencies
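
The naming convention above can be decoded mechanically. The following is a minimal sketch in Python; the helper `describe_script` is hypothetical and not part of SCAFE, and it only relies on the `.sc.`, `.bk.` and `.cm.` tags described in this README.

```python
# Hypothetical helper (not part of SCAFE): decode a script name
# using the naming convention described in this README.
MODES = {"sc": "single-cell", "bk": "bulk", "cm": "common"}

def describe_script(name):
    """Return (kind, mode, task) for names such as 'scafe.tool.sc.count'."""
    parts = name.split(".")
    if len(parts) < 3 or parts[0] != "scafe":
        raise ValueError(f"unexpected script name: {name}")
    kind = parts[1]
    if kind in ("tool", "workflow") and len(parts) == 4:
        # tools and workflows carry a mode tag: sc, bk or cm
        return kind, MODES[parts[2]], parts[3]
    # download/demo/check scripts carry no mode tag
    return kind, None, ".".join(parts[2:])

print(describe_script("scafe.workflow.bk.pool"))
```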

scafe.workflow.sc.subsample [top]

This workflow subsamples a ctss file, defines tCRE, and generates a tCRE UMI/cellbarcode count matrix. Subsampling is useful for investigating the effect of sequencing depth on tCRE definition.

 Usage:
   scafe.workflow.sc.subsample [options] --UMI_CB_ctss_bed_path --run_cellbarcode_path --subsample_num --genome --run_tag --run_outDir
   
   --UMI_CB_ctss_bed_path <required> [string]  ctss file for subsampling, one line one cellbarcode-UMI combination,
                                               *UMI_CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                               4th column is cellbarcode-UMI and 5th column is number of unencoded G
   --run_cellbarcode_path <required> [string]  tsv file contains a list of cell barcodes,
                                               barcodes.tsv.gz from cellranger
   --subsample_num        <required> [integer] number of UMI to be subsampled
   --genome               <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag              <required> [string]  prefix for the output files
   --run_outDir           <required> [string]  directory for the output files
   --training_signal_path (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                               regression model. If null, $usr_glm_model_path must be supplied for a
                                               pre-built logistic regression model. It overrides usr_glm_model_path
                                               (default=null)
   --testing_signal_path  (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for testing the performance
                                               of the logistic regression model. If null, annotated TSS from $genome will be
                                               used as binary genomic regions. (default=null)
   --max_thread           (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                               avoid memory overflow (default=5)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.sc.subsample \
   --overwrite=yes \
   --UMI_CB_ctss_bed_path=./demo/input/sc.subsample/demo.UMI_CB.ctss.bed.gz \
   --run_cellbarcode_path=./demo/input/sc.subsample/demo.barcodes.tsv.gz \
   --subsample_num=100000 \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/sc.subsample/

scafe.workflow.sc.solo [top]

This workflow processes a single sample, from a cellranger bam file to tCRE UMI/cellbarcode count matrix.

 Usage:
   scafe.workflow.sc.solo [options] --run_bam_path --run_cellbarcode_path --genome --run_tag --run_outDir
   
   --run_bam_path         <required> [string]  bam file from cellranger, can be read 1 only or paired-end
   --run_cellbarcode_path <required> [string]  tsv file contains a list of cell barcodes,
                                               barcodes.tsv.gz from cellranger
   --genome               <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag              <required> [string]  prefix for the output files
   --run_outDir           <required> [string]  directory for the output files
   --training_signal_path (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                               regression model. If null, $usr_glm_model_path must be supplied for a
                                               pre-built logistic regression model. It overrides usr_glm_model_path
                                               (default=null)
   --testing_signal_path  (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for testing the performance
                                               of the logistic regression model. If null, annotated TSS from $genome will be
                                               used as binary genomic regions. (default=null)
   --usr_glm_model_path   (optional) [string]  pre-built logistic regression model from the caret package in R. Used only if
                                               training_signal_path is not supplied. Models were pre-built for each genome
                                               and are used as default.
   --max_thread           (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                               avoid memory overflow (default=5)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.sc.solo \
   --overwrite=yes \
   --run_bam_path=./demo/input/sc.solo/demo.cellranger.bam \
   --run_cellbarcode_path=./demo/input/sc.solo/demo.barcodes.tsv.gz \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/sc.solo/
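
The precedence between `--training_signal_path` and `--usr_glm_model_path` described in the options above can be summarised as a small decision function. This is a sketch of the documented behaviour only, not SCAFE's own code; the function name and the returned labels are hypothetical.

```python
# Hypothetical illustration of the documented option precedence;
# this is not how SCAFE is implemented internally.
def choose_model_source(training_signal_path=None, usr_glm_model_path=None):
    """Return a (label, path) pair saying which model would be used."""
    if training_signal_path is not None:
        # a training signal overrides any user-supplied model
        return ("train_new_model", training_signal_path)
    if usr_glm_model_path is not None:
        # used only if no training signal is supplied
        return ("user_model", usr_glm_model_path)
    # otherwise fall back to the model pre-built for the genome
    return ("genome_default_model", None)

print(choose_model_source(usr_glm_model_path="my_model.RDS"))
```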

scafe.workflow.sc.pool [top]

This workflow pools multiple samples for defining tCRE, starting from ctss files to tCRE UMI/cellbarcode count matrix.

 Usage:
   scafe.workflow.sc.pool [options] --lib_list_path --genome --run_tag --run_outDir
   
   --lib_list_path        <required> [string] a list of libraries, in the format of
                                              <lib_ID><\t><suffix><\t><UMI_CB_ctss_bed><\t><cellbarcode><\t><CB_ctss_bed>
                                              lib_ID = unique ID of the library
                                              suffix = a unique integer to be used as the suffix of the cellbarcode
                                              UMI_CB_ctss_bed = *UMI_CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl
                                              CB_ctss_bed = *CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl
   --genome               <required> [string] name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag              <required> [string] prefix for the output files
   --run_outDir           <required> [string] directory for the output files
   --training_signal_path (optional) [string] quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                              regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                              regression model. If null, $usr_glm_model_path must be supplied for a
                                              pre-built logistic regression model. It overrides usr_glm_model_path
                                              (default=null)
   --testing_signal_path (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                              regions (e.g. annotated CRE, in bed format) used for testing the performance
                                              of the logistic regression model. If null, annotated TSS from $genome will be
                                              used as binary genomic regions. (default=null)
   --usr_glm_model_path  (optional) [string]  pre-built logistic regression model from the caret package in R. Used only if
                                              training_signal_path is not supplied. Models were pre-built for each genome
                                              and are used as default.
   --max_thread          (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                              avoid memory overflow (default=5)
   --overwrite           (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.sc.pool \
   --overwrite=yes \
   --lib_list_path=./demo/input/sc.pool/lib_list_path.txt \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/sc.pool/
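
A five-column library list for `--lib_list_path` could be generated as in the sketch below. It assumes the file has no header line and that the `<cellbarcode>` column holds the path to a barcodes file, neither of which is stated in this README. The function name and all file names are hypothetical.

```python
import csv
import io

def write_sc_lib_list(rows, handle):
    """Write one tab-separated line per library, five columns each."""
    seen_ids, seen_suffixes = set(), set()
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for lib_id, suffix, umi_cb_bed, cellbarcode, cb_bed in rows:
        if lib_id in seen_ids:
            raise ValueError(f"duplicate lib_ID: {lib_id}")
        if not isinstance(suffix, int) or suffix in seen_suffixes:
            raise ValueError(f"suffix must be a unique integer: {suffix}")
        seen_ids.add(lib_id)
        seen_suffixes.add(suffix)
        writer.writerow([lib_id, suffix, umi_cb_bed, cellbarcode, cb_bed])

# Hypothetical file names, for illustration only.
rows = [
    ("libA", 1, "libA.UMI_CB.ctss.bed.gz", "libA.barcodes.tsv.gz", "libA.CB.ctss.bed.gz"),
    ("libB", 2, "libB.UMI_CB.ctss.bed.gz", "libB.barcodes.tsv.gz", "libB.CB.ctss.bed.gz"),
]
buffer = io.StringIO()
write_sc_lib_list(rows, buffer)
print(buffer.getvalue(), end="")
```

Checking that each suffix is a unique integer mirrors the requirement in the option description; pooled cellbarcodes stay distinguishable only if no two libraries share a suffix.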

scafe.workflow.bk.subsample [top]

This workflow subsamples a ctss file, defines tCRE, and generates tCRE read counts. Subsampling is useful for investigating the effect of sequencing depth on tCRE definition.

 Usage:
   scafe.workflow.bk.subsample [options] --long_ctss_bed_path --subsample_num --genome --run_tag --run_outDir
   
   --long_ctss_bed_path    <required> [string] ctss file for subsampling, one line one read
                                               *long.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl, 
   --subsample_num        <required> [integer] number of reads to be subsampled
   --genome               <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag              <required> [string]  prefix for the output files
   --run_outDir           <required> [string]  directory for the output files
   --training_signal_path (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                               regression model. If null, $usr_glm_model_path must be supplied for a
                                               pre-built logistic regression model. It overrides usr_glm_model_path
                                               (default=null)
   --testing_signal_path  (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for testing the performance
                                               of the logistic regression model. If null, annotated TSS from $genome will be
                                               used as binary genomic regions. (default=null)
   --max_thread           (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                               avoid memory overflow (default=5)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.bk.subsample \
   --overwrite=yes \
   --long_ctss_bed_path=./demo/input/bk.subsample/demo.long.ctss.bed.gz \
   --subsample_num=100000 \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/bk.subsample/

scafe.workflow.bk.solo [top]

This workflow processes a single sample, from a bulk CAGE bam file to read count per tCRE.

 Usage:
   scafe.workflow.bk.solo [options] --run_bam_path --genome --run_tag --run_outDir
   
   --run_bam_path         <required> [string]  bam file (of CAGE reads), can be read 1 only or paired-end
   --genome               <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag              <required> [string]  prefix for the output files
   --run_outDir           <required> [string]  directory for the output files
   --training_signal_path (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                               regression model. If null, $usr_glm_model_path must be supplied for a
                                               pre-built logistic regression model. It overrides usr_glm_model_path
                                               (default=null)
   --testing_signal_path  (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for testing the performance
                                               of the logistic regression model. If null, annotated TSS from $genome will be
                                               used as binary genomic regions. (default=null)
   --max_thread           (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                               avoid memory overflow (default=5)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.bk.solo \
   --overwrite=yes \
   --run_bam_path=./demo/input/bk.solo/demo.CAGE.bam \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/bk.solo/

scafe.workflow.bk.pool [top]

This workflow pools multiple samples for defining tCRE, starting from ctss files to read count per tCRE per sample.

 Usage:
   scafe.workflow.bk.pool [options] --lib_list_path --genome --run_tag --run_outDir
   
   --lib_list_path         <required> [string] a list of libraries, in the format of
                                               <lib_ID><\t><long_ctss_bed><\t><collapse_ctss_bed>
                                               lib_ID = unique ID of the library
                                               long_ctss_bed = *long.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl
                                               collapse_ctss_bed = *collapse.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl
   --genome                <required> [string] name of genome reference, e.g. hg19.gencode_v32lift37
   --run_tag               <required> [string] prefix for the output files
   --run_outDir            <required> [string] directory for the output files
   --training_signal_path  (optional) [string] quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for training of the logistic
                                               regression model. If null, $usr_glm_model_path must be supplied for a
                                               pre-built logistic regression model. It overrides usr_glm_model_path
                                               (default=null)
   --testing_signal_path  (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic
                                               regions (e.g. annotated CRE, in bed format) used for testing the performance
                                               of the logistic regression model. If null, annotated TSS from $genome will be
                                               used as binary genomic regions. (default=null)
   --max_thread           (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                               avoid memory overflow (default=5)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.workflow.bk.pool \
   --overwrite=yes \
   --lib_list_path=./demo/input/bk.pool/lib_list_path.txt \
   --genome=hg19.gencode_v32lift37 \
   --run_tag=demo \
   --run_outDir=./demo/output/bk.pool/
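
Before launching a pooled bulk run, it can help to sanity-check the three-column library list. The sketch below assumes a header-free, tab-separated file as described for `--lib_list_path`; the function `parse_bk_lib_list` is hypothetical and not part of SCAFE.

```python
def parse_bk_lib_list(text):
    """Parse <lib_ID><TAB><long_ctss_bed><TAB><collapse_ctss_bed> lines.

    Returns {lib_ID: (long_ctss_bed, collapse_ctss_bed)} and raises
    ValueError on malformed lines or duplicated lib_IDs.
    """
    libs = {}
    for line_num, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_num}: expected 3 columns, got {len(fields)}")
        lib_id, long_bed, collapse_bed = fields
        if lib_id in libs:
            raise ValueError(f"line {line_num}: duplicate lib_ID {lib_id}")
        libs[lib_id] = (long_bed, collapse_bed)
    return libs

# Hypothetical file names, for illustration only.
example = "libA\tlibA.long.ctss.bed.gz\tlibA.collapse.ctss.bed.gz\n"
print(parse_bk_lib_list(example))
```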

scafe.tool.sc.subsample_ctss [top]

This tool subsamples a ctss bed file and maintains the cellbarcode and UMI information.

 Usage:
   scafe.tool.sc.subsample_ctss [options] --UMI_CB_ctss_bed_path --subsample_num --outputPrefix --outDir
   
   --UMI_CB_ctss_bed_path <required> [string]  ctss file for subsampling, one line one cellbarcode-UMI combination,
                                               *UMI_CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                               4th column is cellbarcode-UMI and 5th column is number of unencoded G
   --subsample_num        <required> [integer] number of UMI to be subsampled
   --outputPrefix         <required> [string]  prefix for the output files
   --outDir               <required> [string]  directory for the output files
   --overwrite            (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.sc.subsample_ctss \
   --overwrite=yes \
   --UMI_CB_ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.UMI_CB.ctss.bed.gz \
   --subsample_num=100000 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.subsample/subsample_ctss/
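
Conceptually, subsampling keeps `--subsample_num` records out of a file that has one cellbarcode-UMI combination per line. The sketch below shows one way to do that, assuming uniform sampling without replacement and a fixed seed for reproducibility. The README does not say how the tool itself samples, so this is an illustration rather than its actual method.

```python
import random

def subsample_records(records, subsample_num, seed=0):
    """Keep `subsample_num` records chosen uniformly without replacement.

    The original order of the kept records is preserved. If fewer
    records are available than requested, all of them are returned.
    """
    if subsample_num >= len(records):
        return list(records)
    rng = random.Random(seed)  # fixed seed so reruns give the same subset
    keep = sorted(rng.sample(range(len(records)), subsample_num))
    return [records[i] for i in keep]

# Toy records standing in for lines of a *UMI_CB.ctss.bed.gz file.
records = [f"record_{i}" for i in range(10)]
print(subsample_records(records, 4))
```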

scafe.tool.sc.pool [top]

This tool pools multiple ctss bed files and maintains the unique (suffixed) cellbarcode and UMI information.

 Usage:
   scafe.tool.sc.pool [options] --lib_list_path --genome --outputPrefix --outDir
   
   --lib_list_path <required> [string]  a list of libraries, in the format of
                                        <lib_ID><\t><suffix><\t><UMI_CB_ctss_bed><\t><cellbarcode><\t><CB_ctss_bed>
                                        lib_ID = unique ID of the library
                                        suffix = a unique integer to be used as the suffix of the cellbarcode
                                        UMI_CB_ctss_bed = *UMI_CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl
                                        CB_ctss_bed = *CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl
   --genome        <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix  <required> [string]  prefix for the output files
   --outDir        <required> [string]  directory for the output files
   --max_thread    (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                        avoid memory overflow (default=5)
   --overwrite     (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.sc.pool \
   --overwrite=yes \
   --lib_list_path=./demo/input/sc.pool/lib_list_path.txt \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.pool/pool/

scafe.tool.sc.link [top]

This tool links tCREs by their coactivity among single cells using cicero

 Usage:
   scafe.tool.sc.link [options] --CRE_bed_path --CRE_info_path --count_dir --genome --outputPrefix --outDir
   
   --CRE_bed_path     <required> [string]  bed file contains the regions of CRE,
                                           *.CRE.coord.bed.gz from scafe.tool.cm.annotate.pl
   --CRE_info_path    <required> [string]  tsv file contains the annotations of CREs,
                                           *.CRE.info.tsv.gz from scafe.tool.cm.annotate.pl
   --count_dir        <required> [string]  a dir contains the UMI count of the CRE
   --genome           <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix     <required> [string]  prefix for the output files
   --outDir           <required> [string]  directory for the output files
   --network_cutoff   (optional) [0-1]     minimum coactivity to define cis-coactivity network (default = 0.05)
   --link_cutoff      (optional) [0-1]     minimum coactivity to output as link (default = 0.2)
   --binarize_CRE_exp (optional) [yes/no]  binarize the CRE expression signal or not (default = no)
   --min_cell         (optional) [integer] minimum number of cells in which a CRE must be expressed (default = 5)
   --Rscript_bin      (optional) [string]  path to the Rscript bin, to allow users to supply an R version other than the
                                           system-wide R version. Package Caret must be installed. (default = Rscript)
   --max_thread       (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                           avoid memory overflow (default=5)
   --run_chr          (optional) [string]  comma-delimited list of chromosome names to run,
                                           use 'all' to run all chromosomes (default=all)
   --overwrite        (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   R packages: 'docopt','monocle3', 'cicero', 'Matrix', 'data.table', 'scales'

 To demo run, cd to SCAFE dir and run:
   scafe.tool.sc.link \
   --overwrite=yes \
   --max_thread=10 \
   --CRE_bed_path=./demo/input/sc.link/demo.CRE.coord.bed.gz \
   --CRE_info_path=./demo/input/sc.link/demo.CRE.info.tsv.gz \
   --count_dir=./demo/input/sc.link/matrix/ \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.link/
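
The effect of `--min_cell`, `--binarize_CRE_exp` and `--link_cutoff` can be pictured with plain Python data structures. This is only a sketch: it assumes that "expressed" means a UMI count above zero and that both thresholds are inclusive, which this README does not state. The actual linking is done with cicero in R, and both function names here are hypothetical.

```python
def filter_cre(counts, min_cell=5, binarize=False):
    """counts maps CRE id -> {cellbarcode: UMI count}.

    Drops CREs expressed (count > 0, an assumption) in fewer than
    `min_cell` cells, and optionally binarizes the remaining counts.
    """
    kept = {}
    for cre, per_cell in counts.items():
        expressed = {cb: n for cb, n in per_cell.items() if n > 0}
        if len(expressed) < min_cell:
            continue
        kept[cre] = {cb: 1 for cb in expressed} if binarize else expressed
    return kept

def filter_links(links, link_cutoff=0.2):
    """links is a list of (cre_a, cre_b, coactivity) tuples."""
    return [link for link in links if link[2] >= link_cutoff]

counts = {"CRE1": {"c1": 2, "c2": 0, "c3": 1}, "CRE2": {"c1": 1}}
print(filter_cre(counts, min_cell=2, binarize=True))
print(filter_links([("CRE1", "CRE2", 0.25), ("CRE1", "CRE3", 0.1)]))
```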

scafe.tool.sc.count [top]

This tool counts the UMI within a set of user-defined regions, e.g. tCRE, and returns a UMI/cellbarcode matrix

 Usage:
   scafe.tool.sc.count [options] --countRegion_bed_path --cellBarcode_list_path --ctss_bed_path --outputPrefix --outDir
   
   --countRegion_bed_path   <required> [string] bed file contains the regions for counting CTSS, e.g. tCRE ranges, 
                                                *.CRE.coord.bed.gz from scafe.tool.cm.annotate.pl
   --cellBarcode_list_path  <required> [string] tsv file contains a list of cell barcodes,
                                                barcodes.tsv.gz from cellranger
   --ctss_bed_path          <required> [string] ctss file for counting,
                                                *CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                                 4th column is cellbarcode and 5th column is number of UMI
   --outputPrefix           <required> [string] prefix for the output files
   --outDir                 <required> [string] directory for the output files
   --overwrite              (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.sc.count \
   --overwrite=yes \
   --countRegion_bed_path=./demo/output/sc.solo/annotate/demo/bed/demo.CRE.annot.bed.gz \
   --cellBarcode_list_path=./demo/input/sc.solo/demo.barcodes.tsv.gz \
   --ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.CB.ctss.bed.gz \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/count/
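
The counting step can be illustrated with a naive in-memory version. The sketch assumes half-open bed coordinates and strand-aware matching, and it loops over every region for every ctss. The tool itself lists bedtools as its dependency, so this shows the idea rather than the implementation, and the function name is hypothetical.

```python
from collections import defaultdict

def count_umi(regions, ctss, barcodes):
    """Build a {region: {cellbarcode: UMI count}} mapping.

    regions: (chrom, start, end, name, strand) tuples, end exclusive
    ctss:    (chrom, pos, cellbarcode, n_umi, strand) tuples
    barcodes: cell barcodes to keep (e.g. from barcodes.tsv.gz)
    """
    allowed = set(barcodes)
    matrix = defaultdict(lambda: defaultdict(int))
    for chrom, pos, cellbarcode, n_umi, strand in ctss:
        if cellbarcode not in allowed:
            continue  # skip barcodes that are not in the list
        for r_chrom, start, end, name, r_strand in regions:
            if chrom == r_chrom and strand == r_strand and start <= pos < end:
                matrix[name][cellbarcode] += n_umi
    return {name: dict(cells) for name, cells in matrix.items()}

regions = [("chr1", 100, 200, "CRE1", "+"), ("chr1", 150, 300, "CRE2", "-")]
ctss = [
    ("chr1", 120, "AAAC", 3, "+"),
    ("chr1", 199, "AAAC", 2, "+"),
    ("chr1", 200, "AAAC", 5, "+"),  # outside CRE1: end is exclusive
    ("chr1", 160, "GGGT", 1, "-"),
    ("chr1", 160, "TTTT", 9, "-"),  # barcode not in the list
]
print(count_umi(regions, ctss, ["AAAC", "GGGT"]))
```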

scafe.tool.sc.bam_to_ctss [top]

This tool converts a bam file to a ctss bed file: it identifies the read 5' end (capped TSS, i.e. ctss), extracts the unencoded G information, piles up ctss, and deduplicates the UMI.

 Usage:
   scafe.tool.sc.bam_to_ctss [options] --bamPath --genome --outputPrefix --outDir
   
   --bamPath      <required> [string]  bam file from cellranger, can be read 1 only or paired-end
   --genome       <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix <required> [string]  prefix for the output files
   --outDir       <required> [string]  directory for the output files
   --include_flag (optional) [string]  samflag to be included, comma delimited 
                                       e.g. '64' to include read1, (default=null)
   --exclude_flag (optional) [string]  samflag to be excluded, comma delimited, 
                                       e.g. '128,256,4' to exclude read2, secondary alignment 
                                       and unaligned reads (default=128,256,4)
   --min_MAPQ     (optional) [integer] minimum MAPQ to include (default=0)
   --max_thread   (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                       avoid memory overflow (default=5)
   --TS_oligo_seq (optional) [string]  Template switching oligo sequence for identification of 
                                       5'end (default=TTTCTTATATGGG) 
   --overwrite    (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools
   samtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.sc.bam_to_ctss \
   --overwrite=yes \
   --bamPath=./demo/input/sc.solo/demo.cellranger.bam \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/bam_to_ctss/
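
The `--include_flag` and `--exclude_flag` values are SAM flag bits: 64 marks read 1, 128 read 2, 256 a secondary alignment and 4 an unmapped read. The sketch below shows how such comma-delimited lists could be applied to a read's FLAG. It assumes that a read is dropped if any excluded bit is set and kept only if all included bits are set, mirroring `samtools view -F` and `-f`. Whether the tool combines the bits exactly this way is an assumption, and the function names are hypothetical.

```python
def parse_flags(value):
    """Turn a comma-delimited option such as '128,256,4' into a list of ints."""
    return [int(x) for x in value.split(",")] if value else []

def passes_flag_filter(flag, include_flag=None, exclude_flag="128,256,4"):
    """Return True if a read with SAM FLAG `flag` survives the filter."""
    # Assumption: drop the read if ANY excluded bit is set.
    if any(flag & bit for bit in parse_flags(exclude_flag)):
        return False
    # Assumption: keep the read only if ALL included bits are set.
    return all(flag & bit for bit in parse_flags(include_flag))

# 83  = paired (1) + proper pair (2) + reverse strand (16) + read 1 (64)
# 163 = paired (1) + proper pair (2) + mate reverse (32) + read 2 (128)
print(passes_flag_filter(83), passes_flag_filter(163))
```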

scafe.tool.cm.remove_strand_invader [top]

This tool identifies and removes strand invader artefacts from a ctss bed file by aligning the sequence immediately upstream of a ctss to the TS oligo sequence.

 Usage:
   scafe.tool.cm.remove_strand_invader [options] --ctss_bed_path --genome --outputPrefix --outDir
   
   --ctss_bed_path      <required> [string]  "collapse" ctss file from scafe.tool.sc.bam_to_ctss.pl, 
                                             4th column is number of cells and 5th column is number of UMI
   --genome             <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix       <required> [string]  prefix for the output files
   --outDir             <required> [string]  directory for the output files
   --min_edit_distance  (optional) [integer] edit distance threshold to define strand invader,
                                             the smaller the value, the more stringent the definition of strand invader
                                             (default=5)
   --min_end_non_G_num  (optional) [integer] immediate upstream non-G number threshold to define strand invader,
                                             the smaller the value, the more stringent the definition of strand invader
                                             (default=2)
   --max_thread         (optional) [integer] maximum number of parallel threads, capped at 
                                             10 to avoid memory overflow (default=5)
   --TS_oligo_seq       (optional) [string]  Template switching oligo sequence for identification 
                                             of 5'end (default=TTTCTTATATGGG) 
   --overwrite          (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.remove_strand_invader \
   --overwrite=yes \
   --ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.collapse.ctss.bed.gz \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/remove_strand_invader/
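
To make the two thresholds more concrete, the sketch below computes a Levenshtein edit distance between the sequence upstream of a ctss and the TS oligo, and counts non-G bases at the end of that upstream sequence. The decision rule is an assumption chosen to match "the smaller the value, the more stringent": a ctss is flagged when both numbers fall below their thresholds. The 3-base window `end_len`, the upstream length and the function names are all hypothetical, so the tool's own rule may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                       # deletion
                           cur[j - 1] + 1,                    # insertion
                           prev[j - 1] + (char_a != char_b)))  # substitution
        prev = cur
    return prev[-1]

def looks_like_strand_invader(upstream_seq, ts_oligo="TTTCTTATATGGG",
                              min_edit_distance=5, min_end_non_G_num=2,
                              end_len=3):
    """Assumed rule: close match to the TS oligo AND a G-rich 3' end."""
    seq = upstream_seq.upper()
    distance = edit_distance(seq, ts_oligo)
    non_g = sum(base != "G" for base in seq[-end_len:])
    return distance < min_edit_distance and non_g < min_end_non_G_num

print(looks_like_strand_invader("TTTCTTATATGGG"))
```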

scafe.tool.cm.prep_genome [top]

This tool prepares a reference genome assembly and its gene models for other tools in scafe.

 Usage:
   scafe.tool.cm.prep_genome [options] --gtf_path --fasta_path --chrom_list_path --mask_bed_path --outputPrefix --outDir
   
   --gtf_path         <required> [string] gtf of the gene models
   --fasta_path       <required> [string] fasta of the genome assembly
   --chrom_list_path  <required> [string] list of <chromosome name><\t><alternative chromosome name> 
                                          e.g. <chr1><\t><1>
                                          chromosome name and alternative chromosome name could be the same
                                          alternative chromosome name is necessary if the cellranger bam
                                          file uses alternative chromosome name that is different from those
                                          in $fasta_path
   --mask_bed_path    <required> [string] a bed file specifying the CRE regions. For human or mouse, consider
                                          using ENCODE CREs. For other species, consider using merged ATAC-seq
                                          from multiple tissues. If ATAC is not available, use the +/- 500nt of
                                          gene model 5'end.
   --outputPrefix     <required> [string] prefix for the output files (should be name of the genome reference)
   --outDir           <required> [string] directory for the output files (should be resource dir in scafe dir)
   --overwrite        (optional) [yes/no] erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools
   samtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.prep_genome \
   --overwrite=yes \
   --gtf_path=./demo/input/genome/TAIR10.AtRTDv2.gtf.gz \
   --fasta_path=./demo/input/genome/TAIR10.genome.fa.gz \
   --chrom_list_path=./demo/input/genome/TAIR10.chrom_list.txt \
   --mask_bed_path=./demo/input/genome/TAIR10.ATAC.bed.gz \
   --outputPrefix=TAIR10.AtRTDv2 \
   --outDir=./demo/output/genome/
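
The `--chrom_list_path` format described in the usage above is a two-column, tab-separated list. The following is a minimal sketch of one way to build such a file. It assumes, purely for illustration, that the fasta uses `chr`-prefixed names and the cellranger bam uses the same names without the prefix. The helper name `make_chrom_list` and the example names are hypothetical and not part of SCAFE.

```shell
# Hypothetical helper (not part of SCAFE): read chromosome names, one per line,
# from stdin and print <chromosome name><TAB><alternative chromosome name>.
# Assumption: the alternative name is the same name without a leading "chr".
make_chrom_list() {
  awk 'BEGIN { OFS = "\t" } NF { alt = $1; sub(/^chr/, "", alt); print $1, alt }'
}

# Example with three made-up chromosome names.
printf 'chr1\nchr2\nchrX\n' | make_chrom_list
```

If the two naming schemes are identical, the same name can simply appear in both columns. The input names could come, for example, from the first column of a `samtools faidx` index of the genome fasta.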

scafe.tool.cm.filter [top]

 Usage:
   scafe.tool.cm.filter [options] --ctss_bed_path --ung_ctss_bed_path --tssCluster_bed_path --genome --outputPrefix --outDir
   
   --ctss_bed_path           <required> [string]  ctss file containing all ctss,
                                                  *.collapse.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                                  5th column is the number of reads/UMI
   --ung_ctss_bed_path       <required> [string]  ctss file containing only ctss with unencoded G,
                                                  *.unencoded_G.collapse.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                                  5th column is the number of reads/UMI
   --tssCluster_bed_path     <required> [string]  bed file containing all TSS clusters,
                                                  *.tssCluster.bed.gz from scafe.tool.cm.cluster.pl
   --genome                  <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix            <required> [string]  prefix for the output files
   --outDir                  <required> [string]  directory for the output files
   --tssCluster_flank_size   (optional) [integer] size of regions (each side) flanking a TSS cluster summit for
                                                  counting UMI/reads for expression levels calculation (default = 75)
   --local_bkgd_extend_size  (optional) [integer] size of regions (each side) flanking a TSS cluster summit for 
                                                  defining the scope for calculating local background (default = 500)
   --min_gold_num            (optional) [integer] minimum number of gold standard regions for training and testing the
                                                  logistic regression model (default = 100)
   --training_pct            (optional) [float]   top and bottom percentage of the TSS clusters, ranked by signal in 
                                                  $training_signal_path, used for training of the logistic regression model
                                                  (default = 5)
   --training_signal_path    (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic 
                                                  regions (e.g. annotated CRE, in bed format) used for training of the logistic 
                                                  regression model. If null, $usr_glm_model_path must be supplied for a 
                                                  pre-built logistic regression model. It overrides usr_glm_model_path 
                                                  (default=null)
   --testing_signal_path     (optional) [string]  quantitative signal (e.g. ATAC -logP, in bigwig format), or binary genomic 
                                                  regions (e.g. annotated CRE, in bed format) used for testing the performance 
                                                  of the logistic regression model. If null, annotated TSS from $genome will be 
                                                  used as binary genomic regions. (default=null)
   --usr_glm_model_path      (optional) [string]  pre-built logistic regression model from the caret package in R. Used only if 
                                                  training_signal_path is not supplied. Models were pre-built for each genome
                                                  and used as default.
   --Rscript_bin             (optional) [string]  path to the Rscript bin, to allow users to supply an R version other than the 
                                                  system-wide R version. Package caret must be installed. (default = Rscript)
   --default_cutoff          (optional) [float]   logistic probability cutoff for the "default" stringency (default = 0.5)
   --exclude_chrom_list      (optional) [string]  a comma-delimited list of chromosomes to be excluded in the training and 
                                                  testing of the logistic regression model (default = chrM)
   --overwrite               (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bedtools
   bigWigAverageOverBed

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.filter \
   --overwrite=yes \
   --ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.collapse.ctss.bed.gz \
   --ung_ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.unencoded_G.collapse.ctss.bed.gz \
   --tssCluster_bed_path=./demo/output/sc.solo/cluster/demo/bed/demo.tssCluster.bed.gz \
   --training_signal_path=./demo/input/atac/demo.atac.bw \
   --testing_signal_path=./demo/input/atac/demo.atac.bw \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/filter/

scafe.tool.cm.ctss_to_bigwig [top]

This tool converts a ctss bed file into two bigwig files, one for each strand, for visualization purposes.

 Usage:
   scafe.tool.cm.ctss_to_bigwig [options] --ctss_bed_path --genome --outputPrefix --outDir
   
   --ctss_bed_path  <required> [string] "collapse" ctss file from scafe.tool.sc.bam_to_ctss.pl
   --genome         <required> [string] name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix   <required> [string] prefix for the output files
   --outDir         <required> [string] directory for the output files
   --overwrite      (optional) [yes/no] erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedGraphToBigWig

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.ctss_to_bigwig \
   --ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.collapse.ctss.bed.gz \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/ctss_to_bigwig/

scafe.tool.cm.cluster [top]

This tool generates TSS clusters from a ctss bed file, using the external tool paraclu with user-defined cutoffs.

 Usage:
   scafe.tool.cm.cluster [options] --cluster_ctss_bed_path --outputPrefix --outDir
   
   --cluster_ctss_bed_path       <required> [string]  ctss file used for clustering,
                                                      "collapse" ctss file from scafe.tool.sc.bam_to_ctss.pl, 
                                                      4th column is number of cells and 5th column is number UMI
   --outputPrefix                <required> [string]  prefix for the output files
   --outDir                      <required> [string]  directory for the output files
   --count_ctss_bed_path_list    (optional) [string]  comma delimited list of ctss bed files, 
                                                      used for filtering of clusters based on signal 
                                                      (default=$cluster_ctss_bed_path) 
   --count_scope_bed_path        (optional) [string]  a bed file specifying the scope for counting in $count_ctss_bed_path_list, 
                                                      used for filtering of clusters based on signal
                                                      (default=$cluster_ctss_bed_path) 
   --min_pos_count               (optional) [integer] minimum counts per position, used for filtering the raw signal 
                                                      in $cluster_ctss_bed_path before clustering (default = 1)
   --min_cluster_cpm             (optional) [float]   minimum counts per million (cpm) for a cluster (default = 1e-5)
   --min_summit_count            (optional) [integer] minimum counts at the summit of a cluster (default = 3)
   --min_cluster_count           (optional) [integer] minimum counts within a cluster (default = 5)
   --min_num_sample_expr_summit  (optional) [integer] minimum number of samples (or cells) detected at the 
                                                      summit of a cluster (default = 3)
   --min_num_sample_expr_cluster (optional) [integer] minimum number of samples (or cells) detected within 
                                                      a cluster (default = 5)
   --merge_dist                  (optional) [integer] maximum distance for merging closely located clusters, 
                                                      -1 to turn off merging (default = -1)
   --overwrite                   (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   paraclu
   paraclu-cut.sh
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.cluster \
   --overwrite=yes \
   --cluster_ctss_bed_path=./demo/output/sc.solo/bam_to_ctss/demo/bed/demo.collapse.ctss.bed.gz \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/cluster/
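
The `--min_cluster_cpm` cutoff above is expressed in counts per million (cpm). As a rough illustration, the sketch below applies the standard cpm definition. Which total count SCAFE uses as the denominator is not stated in this README, so the second argument is an assumption, and the helper name `cluster_cpm` is hypothetical.

```shell
# Hypothetical helper (not part of SCAFE): standard counts-per-million,
#   cpm = counts in the cluster / total counts * 1,000,000
# Assumption: the denominator is the total count of the ctss file being clustered.
cluster_cpm() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", c / t * 1e6 }'
}

# A cluster with 5 counts in a library of 2 million counts in total.
cluster_cpm 5 2000000
```

Under this definition a cluster needs proportionally more counts to pass the same cpm cutoff as the library gets deeper, which is why a cpm cutoff complements the absolute cutoffs `--min_summit_count` and `--min_cluster_count`.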

scafe.tool.cm.annotate [top]

This tool defines tCRE from TSS clusters and annotates them based on their overlap with gene models.

 Usage:
   scafe.tool.cm.annotate [options] --tssCluster_bed_path --tssCluster_info_path --genome --outputPrefix --outDir
   
   --tssCluster_bed_path     <required> [string]  bed file contains the ranges of filtered TSS clusters,
                                                  *.tssCluster.*.filtered.bed.gz from scafe.tool.cm.filter.pl
   --tssCluster_info_path    <required> [string]  tsv file contains the information of all TSS clusters,
                                                  *.tssCluster.log.tsv from scafe.tool.cm.filter.pl
   --genome                  <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix            <required> [string]  prefix for the output files
   --outDir                  <required> [string]  directory for the output files
   --up_end5Rng              (optional) [integer] TSS clusters will be classified as gene TSS, exonic, intronic 
                                                  and intergenic. $up_end5Rng determines the range upstream of 
                                                  annotated gene TSS to be used for gene TSS assignment 
                                                  (default = 500)
   --dn_end5Rng              (optional) [integer] TSS clusters will be classified as gene TSS, exonic, intronic 
                                                  and intergenic. $dn_end5Rng determines the range downstream of 
                                                  annotated gene TSS to be used for gene TSS assignment 
                                                  (default = 500)
   --exon_slop_rng           (optional) [integer] TSS clusters will be classified as gene TSS, exonic, intronic 
                                                  and intergenic. $exon_slop_rng determines the range to be extended
                                                  (i.e. slopped) from exon for assignment of exonic class. 
                                                  Use -1 to NOT extend (default = -1)
   --merge_dist              (optional) [integer] TSS clusters outside annotated gene promoters are grouped
                                                  as "dummy genes" (for operational uniformity) by merging closely 
                                                  located TSS clusters.  $merge_dist determines the maximum distances 
                                                  between TSS clusters to be merged (default = 500)
   --addon_length            (optional) [integer] see $merge_dist. add-on "dummy transcripts" will be assigned to TSS clusters of 
                                                  "dummy genes" (for operational uniformity). $addon_length determines 
                                                  the length of these add-on "dummy transcripts" (default = 500).
   --proximity_slop_rng      (optional) [integer] TSS clusters assigned to annotated gene TSS are "proximal"
                                                  TSS clusters. $proximity_slop_rng determines the range to be extended
                                                  (i.e. slopped) from gene TSS for assignment of proximal TSS clusters. 
                                                  (default = 500)
   --merge_strandness        (optional) [string]  see $merge_dist. $merge_strandness decides the merge to be 
                                                  strand-aware ("stranded") or strand-agnostic ("strandless").
                                                  (default = strandless)
   --proximal_strandness     (optional) [string]  closely located proximal TSS clusters are merged into 
                                                  tCREs. $proximal_strandness decides the merge to be 
                                                  strand-aware ("stranded") or strand-agnostic ("strandless").
                                                  (default = stranded)
   --CRE_extend_size         (optional) [integer] tCREs are defined by merging the extended ranges of TSS clusters.
                                                  $CRE_extend_size determines the size of this range (both sides of 
                                                  summit) (default = 500)
   --CRE_extend_upstrm_ratio (optional) [float]   see $CRE_extend_size. $CRE_extend_upstrm_ratio determines the ratio 
                                                  (X:1) of flanking sizes on the upstream and downstream of summit. 
                                                  e.g. $CRE_extend_upstrm_ratio=4, upstream and downstream size will be 
                                                  taken as 4:1 ratio. $CRE_extend_size=500 and $CRE_extend_upstrm_ratio=4,
                                                  upstream and downstream will be 400 and 100 respectively 
                                                  (default = 4)
   --overwrite               (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.cm.annotate \
   --overwrite=yes \
   --tssCluster_bed_path=./demo/output/sc.solo/filter/demo/bed/demo.tssCluster.default.filtered.bed.gz \
   --tssCluster_info_path=./demo/output/sc.solo/filter/demo/log/demo.tssCluster.log.tsv \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/sc.solo/annotate/
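
The interplay of `--CRE_extend_size` and `--CRE_extend_upstrm_ratio` described in the usage above can be checked with a little arithmetic. The sketch below splits the extension size at a ratio of X:1 and reproduces the 400/100 example from the option text. The helper name `cre_flank_sizes` is hypothetical, and the truncation of non-integer results is an assumption, since this README does not say how SCAFE rounds.

```shell
# Hypothetical helper (not part of SCAFE): split an extension size into
# upstream and downstream flanks at a ratio of X:1.
#   upstream   = size * ratio / (ratio + 1)
#   downstream = size - upstream
# Assumption: non-integer results are truncated by the %d format.
cre_flank_sizes() {
  awk -v size="$1" -v ratio="$2" 'BEGIN {
    up = size * ratio / (ratio + 1)
    printf "%d %d\n", up, size - up
  }'
}

# The defaults from the usage text: size 500 at a ratio of 4:1.
cre_flank_sizes 500 4   # prints "400 100"
```

A ratio of 1 would give a symmetric extension (250 and 250 for a size of 500).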

scafe.tool.bk.subsample_ctss [top]

This tool subsamples a ctss bed file from bulk CAGE ctss.

 Usage:
   scafe.tool.bk.subsample_ctss [options] --long_ctss_bed_path --subsample_num --outputPrefix --outDir
   
   --long_ctss_bed_path <required> [string]  ctss file for subsampling, one line per read in "long" format,
                                             *long.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl
   --subsample_num      <required> [integer] number of UMI to be subsampled
   --outputPrefix       <required> [string]  prefix for the output files
   --outDir             <required> [string]  directory for the output files
   --overwrite          (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.bk.subsample_ctss \
   --overwrite=yes \
   --long_ctss_bed_path=./demo/output/bk.solo/bam_to_ctss/demo/bed/demo.long.ctss.bed.gz \
   --subsample_num=100000 \
   --outputPrefix=demo \
   --outDir=./demo/output/bk.subsample/subsample_ctss/

scafe.tool.bk.pool [top]

This tool pools multiple bulk CAGE ctss bed files.

 Usage:
   scafe.tool.bk.pool [options] --lib_list_path --genome --outputPrefix --outDir
   
   --lib_list_path <required> [string]  a list of libraries, in the format of 
                                        <lib_ID><\t><long_ctss_bed><\t><collapse_ctss_bed>
                                        lib_ID = Unique ID of the cellbarcode
                                        long_ctss_bed = *long.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl, 
                                        collapse_ctss_bed = *collapse.ctss.bed.gz from scafe.tool.bk.bam_to_ctss.pl, 
   --genome        <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix  <required> [string]  prefix for the output files
   --outDir        <required> [string]  directory for the output files
   --max_thread    (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                        avoid memory overflow (default=5)
   --overwrite     (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.bk.pool \
   --overwrite=yes \
   --lib_list_path=./demo/input/bk.pool/lib_list_path.txt \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/bk.pool/pool/
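
The `--lib_list_path` file described in the usage above has three tab-separated columns per library. The sketch below writes such a file and checks its shape. The helper name `lib_list_line`, the library IDs and all paths are placeholders invented for illustration, not files shipped with SCAFE.

```shell
# Hypothetical helper (not part of SCAFE): print one lib_list line,
# <lib_ID><TAB><long_ctss_bed><TAB><collapse_ctss_bed>.
lib_list_line() {
  printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Two made-up libraries; IDs and paths are placeholders.
{
  lib_list_line libA ./libA/bed/libA.long.ctss.bed.gz ./libA/bed/libA.collapse.ctss.bed.gz
  lib_list_line libB ./libB/bed/libB.long.ctss.bed.gz ./libB/bed/libB.collapse.ctss.bed.gz
} > lib_list.txt

# Sanity check: every line should have exactly three tab-separated fields.
awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad + 0 }' lib_list.txt \
  && echo "lib_list.txt looks OK"
```

The resulting file would then be passed as `--lib_list_path=./lib_list.txt`, with the long and collapse ctss files taken from the scafe.tool.bk.bam_to_ctss output of each library.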

scafe.tool.bk.count [top]

This tool counts the CAGE reads within a set of user-defined regions, e.g. tCRE, and returns the reads per region.

 Usage:
   scafe.tool.bk.count [options] --countRegion_bed_path --ctss_bed_path --outputPrefix --outDir
   
   --countRegion_bed_path   <required> [string] bed file contains the regions for counting CTSS, e.g. tCRE ranges, 
                                                *.CRE.coord.bed.gz from scafe.tool.cm.annotate.pl
   --ctss_bed_path          <required> [string] ctss file for counting,
                                                *CB.ctss.bed.gz from scafe.tool.sc.bam_to_ctss.pl, 
                                                4th column cellbarcode and 5th column is number UMI
   --outputPrefix           <required> [string] prefix for the output files
   --outDir                 <required> [string] directory for the output files
   --overwrite              (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.bk.count \
   --overwrite=yes \
   --countRegion_bed_path=./demo/output/bk.solo/annotate/demo/bed/demo.CRE.annot.bed.gz \
   --ctss_bed_path=./demo/output/bk.solo/bam_to_ctss/demo/bed/demo.collapse.ctss.bed.gz \
   --outputPrefix=demo \
   --outDir=./demo/output/bk.solo/count/

scafe.tool.bk.bam_to_ctss [top]

This tool converts a bulk CAGE bam file to a ctss bed file: it identifies the read 5'end (capped TSS, i.e. ctss), extracts the unencoded G information, piles up ctss, and deduplicates the UMI.

 Usage:
   scafe.tool.bk.bam_to_ctss [options] --bamPath --genome --outputPrefix --outDir
   
   --bamPath      <required> [string]  bam file (of CAGE reads), can be read 1 only or paired-end
   --genome       <required> [string]  name of genome reference, e.g. hg19.gencode_v32lift37
   --outputPrefix <required> [string]  prefix for the output files
   --outDir       <required> [string]  directory for the output files
   --include_flag (optional) [string]  samflag to be included, comma delimited 
                                       e.g. '64' to include read1, (default=null)
   --exclude_flag (optional) [string]  samflag to be excluded, comma delimited, 
                                       e.g. '128,256,4' to exclude read2, secondary alignment 
                                       and unaligned reads (default=128,256,4)
   --min_MAPQ     (optional) [integer] minimum MAPQ to include (default=0)
   --max_thread   (optional) [integer] maximum number of parallel threads, capped at 10 to 
                                       avoid memory overflow (default=5)
   --overwrite    (optional) [yes/no]  erase outDir/outputPrefix before running (default=no)

 Dependencies:
   bedtools
   samtools

 To demo run, cd to SCAFE dir and run:
   scafe.tool.bk.bam_to_ctss \
   --overwrite=yes \
   --bamPath=./demo/input/bk.solo/demo.CAGE.bam \
   --genome=hg19.gencode_v32lift37 \
   --outputPrefix=demo \
   --outDir=./demo/output/bk.solo/bam_to_ctss/
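
The `--include_flag` and `--exclude_flag` options above take comma-delimited SAM flags. In the SAM format each flag is a single bit (64 = first read in pair, 128 = second read in pair, 256 = secondary alignment, 4 = unmapped), so a list of flags can be combined into one bitmask. The sketch below shows that combination. Whether SCAFE applies the flags one by one or as a combined mask is not stated in this README, so this is only an illustration, and the helper name `combine_sam_flags` is hypothetical.

```shell
# Hypothetical helper (not part of SCAFE): bitwise-OR a comma-delimited list
# of SAM flags into a single bitmask.
combine_sam_flags() {
  mask=0
  old_ifs=$IFS
  IFS=,
  for flag in $1; do
    mask=$(( mask | flag ))
  done
  IFS=$old_ifs
  echo "$mask"
}

# The default --exclude_flag list from the usage text.
combine_sam_flags 128,256,4   # prints 388
```

With plain samtools, a comparable exclusion could be expressed as `samtools view -F 388`, which drops reads that have any of those bits set.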

scafe.download.resources.genome [top]

This script downloads reference genome data and saves it in ./resources/genome.

 Usage:
   scafe.download.resources.genome --genome
   
   --genome <required> [string] name of genome reference, currently available genomes:
                                hg19.gencode_v32lift37
                                hg38.gencode_v32
                                mm10.gencode_vM25
                                TAIR10.AtRTDv2

 Dependencies:
   wget
   tar

 To demo run, cd to SCAFE dir and run:
   scafe.download.resources.genome \
   --genome=hg19.gencode_v32lift37

scafe.download.demo.input [top]

This script downloads demo data and saves it in the ./demo/input dir.

 Usage:
   scafe.download.demo.input

 Dependencies:
   wget
   tar

 To demo run, cd to SCAFE dir and run:
   scafe.download.demo.input

scafe.demo.test.run [top]

This script runs a test on the demo data in the ./demo/input dir. It runs user-selected workflows. Demo input data must be downloaded using ./scripts/scafe.download.demo.input. Genome reference hg19.gencode_v32lift37 must be downloaded using ./scripts/scafe.download.resources.genome.

 Usage:
   scafe.demo.test.run [options] --run_outDir
   
   --run_outDir           <required> [string]  directory for the output test runs
   --workflow             (optional) [string]  comma delimited list of workflows, 
                                               or use 'all' to run all workflows.
                                               Available workflows include:
                                               scafe.workflow.sc.subsample ---> workflow, single-cell mode, subsample ctss
                                               scafe.workflow.sc.solo ---> workflow, single-cell mode, process a single sample
                                               scafe.workflow.sc.pool ---> workflow, single-cell mode, pool ctss of multiple samples
                                               scafe.workflow.bk.subsample ---> workflow, bulk mode, subsample ctss
                                               scafe.workflow.bk.solo ---> workflow, bulk mode, process a single sample
                                               scafe.workflow.bk.pool ---> workflow, bulk mode, pool ctss of multiple samples
                                               (default=all)
   --overwrite            (optional) [yes/no]  erase run_outDir before running (default=no)

 Dependencies:
   R packages: 'ROCR','PRROC', 'caret', 'e1071', 'ggplot2', 'scales', 'reshape2'
   bigWigAverageOverBed
   bedGraphToBigWig
   bedtools
   samtools
   paraclu
   paraclu-cut.sh

 To demo run, cd to SCAFE dir and run:
   scafe.demo.test.run \
   --overwrite=yes \
   --run_outDir=./demo/output/

scafe.check.dependencies [top]

This script checks the integrity of tool and workflow scripts, 3rd-party executable dependencies and R packages.

 Usage:
   scafe.check.dependencies

 Dependencies:
   wget
   tar
   Rscript

 To demo run, cd to SCAFE dir and run:
   scafe.check.dependencies
