RNA-seq data processing using Galaxy and Linux

A Fine-Tuned Universe

Bioinformatics

정재준 2020. 11. 11. 14:36

1. Upload data

Install FileZilla, connect to the Galaxy FTP server, and upload the fastq.gz files.

Host: usegalaxy.org

Username:

Password:

2. Fetch data

Log in to usegalaxy.org and fetch the files that were uploaded via FTP.

3. Quality check - FastQC

Run FastQC on the raw data for a quality check, using the default settings.

- Short read data from your current history: select the input file

- Contaminant list: none selected

- Adapter list: none selected

- Submodule and Limit specifing file: none selected

- Disable grouping of bases for reads >50bp: not selected

- Lower limit on the length of the sequence to be shown in the report: left blank

- length of Kmer to look for: (default) 7

Click Execute; the outputs are a raw data file and an HTML report.

4. Trimming - Sickle

- Single-end or paired-end reads?: Single-end

- Single-end FASTQ reads:

- Quality threshold: 30

- Length threshold: 20

- Don't do 5' trimming: No

- Truncate sequences with Ns at first N position: No

- Execute

- output: Trimmed fastQ files

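- The output files are also large, so downloading them to a PC and then uploading them to the server again is inefficient. Instead, connect to the server with PuTTY, move to the directory where the trimmed FASTQ files should be stored, and download them directly from Galaxy with wget [link]. The -b option runs wget in the background, and -O sets the name of the saved file.

wget -b --user=[username] --password=[password] [link copied from Galaxy] -O [filename]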

5. Quality check after trimming

Run FastQC again, this time on the trimmed data, using the default settings.

input: fastq.gz

output: raw data, HTML file

6. Merging two fastq files

In Galaxy, use Concatenate datasets to combine the two files into one.

In a Linux terminal:

(if decompressing first)

gzip -d [filename]

cat filename_forward_read.fastq filename_reverse_read.fastq > filename.fastq

cat filename_contig1.fasta filename_contig2.fasta > filename.fasta

(without decompressing) <- recommended

gzip -dc forward.fastq.gz reverse.fastq.gz | gzip -c > filename.gz
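As a small check of the command above, the following sketch merges two gzip-compressed files in two ways and confirms that both give the same content. The file names and the two tiny FASTQ records are made up for illustration. It relies on the fact that a gzip file may consist of several compressed members in a row, so plain cat on the .gz files is a possible alternative that skips the recompression step.

```shell
set -eu

# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny made-up FASTQ records standing in for the real read files.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > forward.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > reverse.fastq.gz

# Method from the text: decompress both, recompress as a single stream.
gzip -dc forward.fastq.gz reverse.fastq.gz | gzip -c > recompressed.fastq.gz

# Alternative: concatenate the compressed files directly (no recompression).
cat forward.fastq.gz reverse.fastq.gz > concatenated.fastq.gz

# Count the lines in each merged file after decompression.
lines_recompressed=$(gzip -dc recompressed.fastq.gz | wc -l | tr -d ' ')
lines_concatenated=$(gzip -dc concatenated.fastq.gz | wc -l | tr -d ' ')
echo "recompressed: $lines_recompressed lines, concatenated: $lines_concatenated lines"
```

Both counts should be 8 (two records of four lines each). Recompressing usually gives a slightly smaller file, while direct concatenation is faster on large inputs.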

7. BWA

(galaxy)

Use bwa mem

$source

bwa index genome.fasta [genome FASTA file]

bwa mem -k 20 -t 36 genome.fasta genome.fastq.gz > Result.sam

(e.g. bwa mem -k 20 -t 36 NIES-298.fasta A1.fastq.gz > A1_NIES298.sam)

8. Samtools

(galaxy) Run the analysis in the following order:

bwa mem (the index file is reportedly generated automatically)

samtools sort

bedtools BAM to BED

bedtools Compute both the depth and breadth of coverage

-a genome.gff

-b bed file

Text transformation with sed

(Linux)

samtools view -S -b filename.sam > filename.bam

samtools sort filename.bam -o filename-sorted.bam

samtools index -b filename-sorted.bam

Galaxy also has samtools sort,

but I don't know what the "sort key" option means yet. I'll run it first and study it afterwards.
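When there are several SAM files, the Linux samtools steps in this section could be wrapped in a small loop. The sketch below is one possible way to do that, not a tested pipeline: the function names sorted_bam_name and sam_to_sorted_bam are hypothetical helpers introduced here, and the loop assumes the SAM files sit in the current directory. It uses the fact that samtools sort accepts SAM input directly and writes BAM by default, so the separate samtools view conversion can be skipped.

```shell
set -eu

# Hypothetical helper: derive the sorted BAM name from a SAM file name,
# following the "filename-sorted.bam" convention used above.
sorted_bam_name() {
    printf '%s-sorted.bam\n' "${1%.sam}"
}

# Hypothetical helper: sort and index one SAM file.
sam_to_sorted_bam() {
    sam=$1
    sorted=$(sorted_bam_name "$sam")
    # samtools sort reads SAM directly and writes BAM by default,
    # so the intermediate unsorted BAM is not needed.
    samtools sort -o "$sorted" "$sam"
    samtools index "$sorted"
}

# Process every SAM file in the current directory, if samtools is installed.
if command -v samtools >/dev/null 2>&1; then
    for f in *.sam; do
        [ -e "$f" ] || continue   # skip when no .sam files exist
        sam_to_sorted_bam "$f"
    done
fi
```

For a file named A1_NIES298.sam this would produce A1_NIES298-sorted.bam plus its index.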

9. Bedtools

bedtools bamtobed -i filename-sorted.bam > filename-sorted.bed

bedtools coverage -a 0000.gff -b 0000.bed > 00000.txt && sed -n '/CDS/p' 00000.txt > 0000-CDS.txt
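One thing to watch with sed -n '/CDS/p' is that it keeps any line containing the text CDS, including lines where it appears only in the attributes column. The sketch below compares it with an awk filter that tests the feature-type column (the third column in GFF). The sample lines are made up, and the extra columns that bedtools coverage appends at the end of each line are left out for brevity; treat this as an illustration of the difference, not as a replacement for the command above.

```shell
set -eu

# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Made-up, tab-separated lines in GFF column order (column 3 = feature type).
printf 'chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=gene1;note=contains CDS\n'  > coverage.txt
printf 'chr1\tsrc\tCDS\t1\t90\t.\t+\t0\tID=cds1\n'                     >> coverage.txt
printf 'chr1\tsrc\ttRNA\t200\t270\t.\t+\t.\tID=trna1\n'                >> coverage.txt

# Pattern match anywhere on the line, as in the text: also keeps the gene line.
sed -n '/CDS/p' coverage.txt > sed-CDS.txt

# Exact match on the feature-type column only.
awk -F '\t' '$3 == "CDS"' coverage.txt > awk-CDS.txt

sed_count=$(wc -l < sed-CDS.txt | tr -d ' ')
awk_count=$(wc -l < awk-CDS.txt | tr -d ' ')
echo "sed kept $sed_count line(s); awk kept $awk_count line(s)"
```

With this sample, sed keeps two lines and awk keeps one. If the GFF never mentions CDS outside the feature-type column, both filters give the same result.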

License: Attribution, NonCommercial, NoDerivatives
