Bioinformatics Data Skills by O'Reilly: Study Notes 2

Chapter 2: Setting Up and Managing a Bioinformatics Project

Organizing Data to Automate File Processing Tasks

  1. Shell Expansion Tips
$ echo dog-{gone,bowl,bark}
dog-gone dog-bowl dog-bark
$ mkdir -p zmays-snps/{data/seqs,scripts,analysis}
# Create several subdirectories under zmays-snps in one command
$ cd zmays-snps/data
$ touch seqs/zmays{A,B,C}_R{1,2}.fastq
$ ls seqs/
zmaysA_R1.fastq zmaysB_R1.fastq zmaysC_R1.fastq
zmaysA_R2.fastq zmaysB_R2.fastq zmaysC_R2.fastq
$ ls seqs/zmaysB*
seqs/zmaysB_R1.fastq seqs/zmaysB_R2.fastq

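OS X and Linux systems limit the number of arguments that can be supplied to a command (more precisely, the limit is on the total length of the arguments). A wildcard that matches a very large number of files can exceed this limit and fail with an "Argument list too long" error.
See “Using find and xargs” on page 411 of the book for the solution.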
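
As a rough illustration of that find and xargs approach, the sketch below rebuilds the six example FASTQ files in a throwaway directory and processes them without ever expanding a wildcard on the command line. The `workdir`, `n_files`, and `n_left` variables are hypothetical names introduced only for this example, and the files are created with a loop rather than brace expansion so the snippet also runs in a plain POSIX shell.

```shell
# Scratch directory (hypothetical) so the sketch never touches real data
workdir=$(mktemp -d)
mkdir -p "$workdir/seqs"
for sample in A B C; do
    for read in 1 2; do
        touch "$workdir/seqs/zmays${sample}_R${read}.fastq"
    done
done

# find walks the directory itself, so the shell never builds a long argument
# list; -print0 and -0 keep filenames containing spaces intact
n_files=$(find "$workdir/seqs" -name "*.fastq" -print0 | xargs -0 ls | wc -l | tr -d ' ')
echo "$n_files"

# The same pattern with rm deletes the matched files in batches
find "$workdir/seqs" -name "*.fastq" -print0 | xargs -0 rm
n_left=$(find "$workdir/seqs" -name "*.fastq" | wc -l | tr -d ' ')
echo "$n_left"
```

xargs splits its input into as many command invocations as needed to stay under the limit, which `getconf ARG_MAX` reports in bytes on both Linux and OS X.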

$ ls seqs/zmays[AB]_R1.fastq
seqs/zmaysA_R1.fastq seqs/zmaysB_R1.fastq
$ ls seqs/zmays[A-B]_R1.fastq
seqs/zmaysA_R1.fastq seqs/zmaysB_R1.fastq

  2. Leading Zeros and Sorting
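
A small sketch of why leading zeros matter: plain lexicographic sorting compares names character by character, so an unpadded 10 sorts before 2, while zero-padded numbers sort in numeric order. The `gene-*.txt` filenames and the `unpadded` and `padded` variables are hypothetical, chosen only for illustration, and `LC_ALL=C` is set so the ordering does not depend on the local collation rules.

```shell
# Unpadded names: character-by-character comparison puts gene-10 before gene-2
unpadded=$(printf '%s\n' gene-1.txt gene-2.txt gene-10.txt | LC_ALL=C sort | paste -s -d ' ' -)
echo "$unpadded"

# printf's %03d pads each number to three digits, so the lexicographic order
# now agrees with the numeric order
padded=$(for i in 1 2 10; do printf 'gene-%03d.txt\n' "$i"; done | LC_ALL=C sort | paste -s -d ' ' -)
echo "$padded"
```

The first echo prints `gene-1.txt gene-10.txt gene-2.txt` and the second prints `gene-001.txt gene-002.txt gene-010.txt`, so the padded names also list in the expected order with `ls` and wildcards.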
  3. Markdown for Project Notebooks, Formatting Basics
e.g.

# *Zea Mays* SNP Calling

We sequenced three lines of *zea mays*, using paired-end
sequencing. This sequencing was done by our sequencing core and we
received the data on 2013-05-10. Each variety should have **two**
sequence files, with suffixes `_R1.fastq` and `_R2.fastq`, indicating
which member of the pair it is.

## Sequencing Files

All raw FASTQ sequences are in `data/seqs/`:

    $ find data/seqs -name "*.fastq"
    data/seqs/zmaysA_R1.fastq
    data/seqs/zmaysA_R2.fastq
    data/seqs/zmaysB_R1.fastq
    data/seqs/zmaysB_R2.fastq
    data/seqs/zmaysC_R1.fastq
    data/seqs/zmaysC_R2.fastq

## Quality Control Steps

After the sequencing data was received, our first stage of analysis
was to ensure the sequences were high quality. We ran each of the
three lines' two paired-end FASTQ files through a quality diagnostic
and control pipeline. Our planned pipeline is:

1. Create base quality diagnostic graphs.
2. Check reads for adapter sequences.
3. Trim adapter sequences.
4. Trim poor quality bases.

Recommended trimming programs:

- Trimmomatic
- Scythe

  4. Using Pandoc to Render Markdown to HTML
To convert from Markdown to HTML with Pandoc, pass the --from markdown and --to html options and supply the input file as the last argument:
$ pandoc --from markdown --to html notebook.md > output.html
