在线销售网站设计文献,有关电子商务网站建设的论文,没有域名可以做网站,药品网上商城在进行变异检测时#xff0c;以群体基因组重测序数据为例#xff0c;涉及到的个体基本都是上百个#xff0c;而其中大多数流程均是重复的步骤。 本文将基于GATK进行SNP calling的流程写入循环#xff0c;便于批量分析。
1 涉及变量
1.工作目录work_dir/ 2.参考基因组ref…在进行变异检测时以群体基因组重测序数据为例涉及到的个体基本都是上百个而其中大多数流程均是重复的步骤。 本文将基于GATK进行SNP calling的流程写入循环便于批量分析。
1 涉及变量
1.工作目录work_dir/ 2.参考基因组ref_genome.fa 3.Reads列表read_list.txt 4.测序平台Illumina 5.调用线程数
2 调用数据
1.参考基因组ref_genome.fa 2.重测序数据sample1/sample1_1.fq.gz、sample1/sample1_2.fq.gz…… 3.Reads列表read_list.txt 生成方法预先将存放各个个体Reads的文件夹放入一个文件夹work_dir/然后使用下列命令生成
ls work_dir/ read_list.txt3 主要脚本
usage:
bash GATK_pipeline.sh work_dir/ ref_genome.fa read_list.txt Illumina 10GATK_pipeline.sh #---------------------------------------------------------------#
# objection defined by user #
#---------------------------------------------------------------#set -au# 1.
# Master dir.:
WORK_dir$1# 2.
# Reference genome:
REF$2# 3.
# Read list:
READ_list$3# 4.
# Seqencing platform:
PL$4# 5.
# number of threads:
NT$5#---------------------------------------------------------------#
# main loop for SNPs calling by gatk pipeline #
#---------------------------------------------------------------##READ_list.txt is a list of read groups.
while read -r READdoSAMPLESM_${READ}
ID${READ}
READ1${WORK_dir}${READ}_1.fq
READ2${WORK_dir}${READ}_2.fq
OUT${READ}#1.
#Alignning reads to reference genome by BWA-MEM2-mem, producing a .sam data
bwa-mem2 \mem \-M \-t ${NT} \-R RG\tID:${ID}\tSM:${SAMPLE}\tPL:${PL} \${REF} \${READ1} \${READ2} \ ${OUT}.sam#2.
#Sorting .sam by gatk-SortSam, producing a .bam data
gatk \SortSam \-I ${OUT}.sam \-O ${OUT}.bam \-SO coordinate \-VALIDATION_STRINGENCY LENIENT \-CREATE_INDEX true \-TMP_DIR ./${OUT}tmp.sort
#3.
#Marking dupulications in .bam by gatk-MarkDuplicates
#producing a .dup.bam and .dup.txt data
gatk \MarkDuplicates \-I ${OUT}.bam \-O ${OUT}.dup.bam \-M ${OUT}.dup.txt \-REMOVE_DUPLICATES true \-VALIDATION_STRINGENCY LENIENT \-CREATE_INDEX true \-TMP_DIR ${OUT}tmp.dup#4.
#QC by samtools-flagstat, producing a .dup.bam.stat data
samtools \flagstat \${OUT}.dup.bam \ ${OUT}.dup.bam.stat#5.
#Calling SNPs by gatk-HaplotypeCaller, producing a .dup.vcf data
gatk \HaplotypeCaller \-R ${REF} \-I ${OUT}.dup.bam \-O ${OUT}.dup.vcfdone $READ_list
##