WES (Part 1): File Preparation
1. Download the reference file
2. Build the index
3. Create the dictionary file
4. Download the known-variant sites file and its VCF index file
mkdir aligned_reads reads scripts results data  # create the working directories

#!/bin/bash
# Script to call germline variants in a human WGS paired end reads 2 X 100bp
# Following GATK4 best practices workflow - https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
# This script is for demonstration purposes only

if false  # skip this block on later runs to avoid re-downloading, which costs time and memory
then
# Download the data
# Optionally add -P /absolute/path/reads to wget to save the files into the reads directory
wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_1.filt.fastq.gz
wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_2.filt.fastq.gz

echo "Run Prep files..."
# ---------------------------download and gunzip .gz-----------------------
#wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
#gunzip hg38.fa.gz

# ---------- index ref - .fai file before running haplotype caller---------
#samtools faidx hg38.fa

# ----------- ref dict - .dict file before running haplotype caller---------
/data/software/gatk-4.4.0.0/gatk CreateSequenceDictionary R=hg38.fa O=hg38.dict

# ---------download known sites files for BQSR from GATK resource bundle----
wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf
#wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx
fi
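The script avoids repeated downloads by wrapping the whole block in `if false` and commenting out individual commands. An alternative is to skip a download only when its file is already on disk. The following is a minimal sketch of that idea, not part of the original script: the function name `fetch_if_missing` is hypothetical, and it assumes that a non-empty file with the expected name is a complete download.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): download a URL into a
# directory only if a non-empty file with the same name is not already there.
fetch_if_missing() {
    local url="$1"
    local dest_dir="${2:-.}"                      # default: current directory
    local target="${dest_dir}/$(basename "$url")"

    if [ -s "$target" ]; then
        # Assumption: a non-empty file means the download already finished.
        echo "skip: $target already exists"
    else
        mkdir -p "$dest_dir"
        wget -P "$dest_dir" "$url"                # -P sets the download directory
    fi
}

# Example usage (commented out so that sourcing this sketch downloads nothing):
# fetch_if_missing ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_1.filt.fastq.gz reads
# fetch_if_missing ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/HG00096/sequence_read/SRR062634_2.filt.fastq.gz reads
```

An interrupted download would leave a partial file that this check treats as complete. In that case, delete the file and rerun, or resume it by hand with `wget -c`.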
All the preparation files are now ready, as shown in the image below:
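The result can also be checked from the shell. The sketch below assumes that all preparation files sit in a single directory and use the file names that appear in the script above. The function name `check_prep_files` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report which of the expected preparation files are
# present (and non-empty) in a directory. Returns 1 if any file is missing.
check_prep_files() {
    local dir="${1:-.}"      # assumption: every file lives in this one directory
    local missing=0
    local f
    for f in hg38.fa hg38.fa.fai hg38.dict \
             Homo_sapiens_assembly38.dbsnp138.vcf \
             Homo_sapiens_assembly38.dbsnp138.vcf.idx; do
        if [ -s "${dir}/${f}" ]; then
            echo "ok       ${f}"
        else
            echo "MISSING  ${f}"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (prints one status line per expected file):
# check_prep_files /absolute/path/to/reference_files
```

The function only tests that each file exists and is non-empty. It does not validate the contents, so a truncated download would still be reported as `ok`.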