使用HiC-Pro软件分析Hi-C数据,在bowtie2比对完_R1、_R2.fastq后进行pair流程时出现错误提示:

2023年 05月 25日 星期四 11:51:28 CST
Pairing of R1 and R2 tags ...
Logs: logs/GSMxxxxx/mergeSAM.log
make: *** [/mnt/data/soft/hicpro/HiC-Pro_3.1.0/bin/../scripts//Makefile:144:bowtie_pairing] 错误 1

/home/miniconda3/envs/hic/bin/python /mnt/data/soft/hicpro//HiC-Pro_3.1.0/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R1_GRCh38_noalt_as.bwt2merged.bam -r bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R2_GRCh38_noalt_as.bwt2merged.bam -o bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_GRCh38_noalt_as.bwt2pairs.bam
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R1_GRCh38_noalt_as.bwt2merged.bam'
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R2_GRCh38_noalt_as.bwt2merged.bam'
## mergeBAM.py
## forward= bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R1_GRCh38_noalt_as.bwt2merged.bam
## reverse= bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_R2_GRCh38_noalt_as.bwt2merged.bam
## output= bowtie_results/bwt2/GSMxxxxx/GSMxxxxx_GRCh38_noalt_as.bwt2pairs.bam
## min mapq= 10
## report_single= False
## report_multi= False
## verbose= True
## Merging forward and reverse tags ...
Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.

于是查看GSMxxxxx_R1.fastq.gz和GSMxxxxx_R2.fastq.gz的两个gz文件内容:

zcat GSMxxxxx_R1.fastq.gz |head -10
@SRRxxxxxx49.lite.1 SN7018393:385:HKL2KBCXY:2:1101:1203:2051 length=101
NTATAGAATTATGGATTGATAGTTTGTTTCTGTTGCTGCACTTGTTTTTTTTTCTATTTGTAAATCTGCTGTCATACTTATCTTTATTTCTGCGTATATGA
+SRRxxxxxx49.lite.1 SN7018393:385:HKL2KBCXY:2:1101:1203:2051 length=101
?????????????????????????????????????????????????????????????????????????????????????????????????????
@SRRxxxxxx49.lite.2 SN7018393:385:HKL2KBCXY:2:1101:1182:2070 length=101
NTTTATGGTACGCTGGACTTTGTAGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCA
+SRRxxxxxx49.lite.2 SN7018393:385:HKL2KBCXY:2:1101:1182:2070 length=101
?????????????????????????????????????????????????????????????????????????????????????????????????????
@SRRxxxxxx49.lite.3 SN7018393:385:HKL2KBCXY:2:1101:1202:2080 length=101
TATTAATGGAGGAATTTATTTGTGGTTGTGTGTGGATATTTTTATTTAAACACAAGTGGCATAAACTCTGTGGGTTCCTGGTAATCAATAATAGAAGATGA
zcat GSMxxxxx_R2.fastq.gz |head -10
@SRRxxxxxx50.lite.1 SN7018393:385:HKL2KBCXY:2:1101:1203:2051 length=101
NNNNNNNNACAACTTTGACACAAACTNNNNNNTCNNNNNNNNNNNNNNNNNNNNNNNNNCNNGCNCAGGAAACATGATGAAACTACATGAAGGAACATCAN
+SRRxxxxxx50.lite.1 SN7018393:385:HKL2KBCXY:2:1101:1203:2051 length=101
?????????????????????????????????????????????????????????????????????????????????????????????????????
@SRRxxxxxx50.lite.2 SN7018393:385:HKL2KBCXY:2:1101:1182:2070 length=101
NNNNNACCAAAGACGAGCGCCTTTACGCTTGCCTTTAGTACCTCGCANCGGNTGCGGACGACCAGGGCGAGCGCCAGAACGTTTTTTACCTTTAGACATTN
+SRRxxxxxx50.lite.2 SN7018393:385:HKL2KBCXY:2:1101:1182:2070 length=101
?????????????????????????????????????????????????????????????????????????????????????????????????????
@SRRxxxxxx50.lite.3 SN7018393:385:HKL2KBCXY:2:1101:1202:2080 length=101
NCCCAAGCTGGTCTCAAACTTCTGAGCTCAAGGGATCCACCTGCCTTGGCCTTCCAAAGTGCTGGGATTACAGGTACGAGCCACCACACAGAGCCGCAAAN
@tag的名称不一致导致,_R1、_R2无法pair。
将SRRxxxxxx50替换成SRRxxxxxx49
zcat GSMxxxxx_R2.fastq.gz | sed 's/SRRxxxxxx50/SRRxxxxxx49/' |gzip > GSMxxxxx_R22.fastq.gz

多线程操作

zcat GSMxxxxx_R2.fastq.gz | parallel --pipe -k -j 64 sed 's/SRRxxxxxx50/SRRxxxxxx49/' |pigz > GSMxxxxx_R222.fastq.gz

替换字符即可。

标签: Hi-C, HiC-Pro