A Fine-Tuned Universe


Bioinformatics

Summary of the Headers in Nucmer Result Files

정재준 2019. 7. 19. 23:45

When you use MUMmer, its unfriendly output can catch you off guard.


Running the nucmer command produced a snps file, but I did not know what the header meant, and after some searching I finally found it described in the manual.

I don't want to look it up again, so I'm pasting it below.


Output is to stdout and is slightly different depending on which command switches are set. For instance, by default the output is arranged in a table style, however if the -T option is active, the output will be tab-delimited.

Adding -T gives tab-delimited output.


Also, the sequence files, alignment type and column headers are output by default, however if the -H option is active, the headers will be stripped from the output.

Adding -H removes the headers.


Other options like -l -C -x will add or remove columns from the output.


So, for description purposes, all possible column headers will be given and it is up to the user to pair the column header with the column number. The descriptions for each header tag follows.
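Since the manual leaves the pairing of header tags and column numbers to the reader, a small Python sketch can do it for a given output. It assumes the text came from a run with -T (tab-delimited) and without -H, so that a header line starting with [P1] is present. `column_index` is a hypothetical helper name, and the sample header string is made up for illustration rather than copied from a real run.

```python
def column_index(output_text):
    """Return (column number, header tag) pairs from tab-delimited output.

    Assumes the header line is the first line that starts with "[P1]".
    Column numbers are 1-based, the way one would count them by eye.
    """
    for line in output_text.splitlines():
        if line.startswith("[P1]"):
            tags = line.split("\t")
            return list(enumerate(tags, start=1))
    raise ValueError("no header line found (was -H used?)")


# Illustrative header only; a real run may contain more or fewer columns.
sample = "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]\t[FRM]\t[TAGS]\n"
pairs = column_index(sample)
for number, tag in pairs:
    print(number, tag)
```

A single tag such as [FRM] or [TAGS] may cover more than one data column, so it is worth checking the number of fields in a data row against the header before relying on these numbers.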



Now, on to the descriptions.

[P1] position of the SNP in the reference sequence.

For indels, this position refers to the 1-based position of the first character before the indel, e.g. for an indel at the very beginning of a sequence this would report 0. For indels on the reverse strand, this position refers to the forward-strand position of the first character before indel on the reverse-strand, e.g. for an indel at the very end of a reverse complemented sequence this would report 1.


[SUB] character or gap at this position in the reference

[SUB] character or gap at this position in the query

[P2] position of the SNP in the query sequence


[BUFF] distance from this SNP to the nearest mismatch (end of alignment, indel, SNP, etc) in the same alignment

[DIST] distance from this SNP to the nearest sequence end

[R] number of repeat alignments which cover this reference position

[Q] number of repeat alignments which cover this query position

[LEN R] length of the reference sequence

[LEN Q] length of the query sequence

[CTX R] surrounding reference context

[CTX Q] surrounding query context

[FRM] sequence direction (NUCmer) or reading frame (PROmer)

[TAGS] the reference and query FastA IDs respectively.
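To make the two [SUB] columns concrete, here is a minimal Python sketch that labels one record as a substitution, an insertion, or a deletion relative to the reference. It assumes a gap is written as `.`, which the excerpt above does not state, and `classify` is a hypothetical helper rather than anything provided by MUMmer.

```python
GAP = "."  # assumption: the character used for a gap in the [SUB] columns


def classify(ref_sub, qry_sub):
    """Label a record from its reference and query [SUB] values."""
    if ref_sub == GAP and qry_sub == GAP:
        raise ValueError("both columns are gaps")
    if ref_sub == GAP:
        return "insertion"   # base present only in the query
    if qry_sub == GAP:
        return "deletion"    # base present only in the reference
    return "substitution"


print(classify("A", "G"))
print(classify(".", "T"))
print(classify("C", "."))
```

If the gap character in your output differs, only the `GAP` constant needs to change.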


All positions are relative to the forward strand of the DNA input sequence, while the [BUFF] distance is relative to the sorted sequence.
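The reverse-strand example given for [P1] can be checked with the usual coordinate conversion between strands. This is a sketch under the assumption that positions are 1-based and that position p on the reverse complement of a sequence of length n corresponds to n - p + 1 on the forward strand. `to_forward` is a hypothetical name, not a MUMmer function.

```python
def to_forward(pos_on_reverse, seq_len):
    """Convert a 1-based reverse-complement position to the forward strand."""
    if not 1 <= pos_on_reverse <= seq_len:
        raise ValueError("position outside the sequence")
    return seq_len - pos_on_reverse + 1


# An indel at the very end of a reverse-complemented sequence of length 100:
# the character before it is the last one on the reverse strand.
print(to_forward(100, 100))  # 1, matching the manual's example
```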
