PURPOSE:
To build Core Regulatory Circuitry from H3K27ac ChIP-seq data
INSTALLATION:
1) Install Miniconda and configure the channels:
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
conda config --set show_channel_urls yes
2) Create the conda environment and install the dependencies:
conda create -n crcbuilder python=3.8
source activate crcbuilder
conda install -c pwwang bwtool
conda install -c bioconda meme
conda install -c bioconda pyfasta
conda install -c conda-forge networkx
conda install -c conda-forge matplotlib-base
REQUIREMENTS:
The FASTA file for the genome used (e.g. hg38.fa) must be available locally; its path is passed with the -f option when running the program. It can be downloaded from ftp://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/ and must be decompressed before use.
A bigWig file (.bw or .bigwig) of H3K27ac sequencing reads and the corresponding super-enhancer table (_peaks_SuperEnhancers.table.xls) generated by ROSE. Both filenames must begin with the same prefix, e.g.: Hela_1-H3K27ac.bw and Hela_1-H3K27ac_peaks_SuperEnhancers.table.xls
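The naming rule above can be checked before a run. The following is a minimal sketch, not part of CRCBuilder itself: the helper names pair_inputs and pair_inputs_in_dir are hypothetical, and it assumes that "same prefix" means the bigWig file name with its extension removed.

```python
from pathlib import Path

# Extensions and table suffix as stated in REQUIREMENTS.
BIGWIG_SUFFIXES = (".bw", ".bigwig")
SE_TABLE_SUFFIX = "_peaks_SuperEnhancers.table.xls"


def pair_inputs(filenames):
    """Map each bigWig file name to its ROSE table name, or None if missing.

    Assumption: the shared prefix is the bigWig name without its extension.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        lower = name.lower()
        for suffix in BIGWIG_SUFFIXES:
            if lower.endswith(suffix):
                prefix = name[: -len(suffix)]
                table = prefix + SE_TABLE_SUFFIX
                pairs[name] = table if table in names else None
                break
    return pairs


def pair_inputs_in_dir(bw_dir):
    """Apply pair_inputs to the regular files found in bw_dir."""
    return pair_inputs(p.name for p in Path(bw_dir).iterdir() if p.is_file())
```

Any bigWig file mapped to None has no matching super-enhancer table in the same listing.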
CONTENT:
CRCBuilder.py: main program
utils.py: utility methods
TFlist_NMid_hg.txt: TFs used and their human NMIDs
source/CIS_BP_HOCOMOCOv11_motif.meme: motif library
source/MotifDictionary.txt: TFs used and their associated motif names
USAGE:
The program is run by calling CRCBuilder.py from the directory containing all the files listed above:
python CRCBuilder.py -s [--step] -b [--bw_dir] -f [--fasta]
-s [--step]
Select the step to start with (CalculatePromoterActivity(CPA) / findCanidateTFs(FCT) / findMotifs(FM) / buildCRCs(BC)).
-b [--bw_dir]
The directory containing the bigWig files of H3K27ac sequencing reads.
-f [--fasta]
The path to the FASTA file of the genome version used; the file extension must be '.fa' or '.fasta'.
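For illustration, the three options could be parsed roughly as follows. This is a sketch using Python's argparse and is not taken from CRCBuilder.py. It assumes that -s and -b are required, that -f is optional, and that starting at a step means the later steps run afterwards in the listed order; build_parser, parse_args and steps_from are hypothetical names.

```python
import argparse

# Step abbreviations in the order given in the -s description.
STEPS = ["CPA", "FCT", "FM", "BC"]
FASTA_SUFFIXES = (".fa", ".fasta")


def build_parser():
    """Build a parser mirroring the options described in USAGE."""
    parser = argparse.ArgumentParser(prog="CRCBuilder.py")
    parser.add_argument("-s", "--step", choices=STEPS, required=True,
                        help="step to start with")
    parser.add_argument("-b", "--bw_dir", required=True,
                        help="directory containing the H3K27ac bigWig files")
    parser.add_argument("-f", "--fasta", default=None,
                        help="path to the genome FASTA file (.fa or .fasta)")
    return parser


def parse_args(argv):
    """Parse argv and reject FASTA paths with an unexpected extension."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.fasta is not None and not args.fasta.endswith(FASTA_SUFFIXES):
        parser.error("the FASTA file must end in '.fa' or '.fasta'")
    return args


def steps_from(step):
    """Return the chosen step and every step after it (assumed behaviour)."""
    return STEPS[STEPS.index(step):]
```

With this sketch, steps_from("FM") gives ["FM", "BC"].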
EXAMPLE:
python CRCBuilder.py -s CPA -b /mnt/data/Hela-H3K27ac/ -f /mnt/genome/hg38.fa
python CRCBuilder.py -s FM -b /mnt/data/Hela-H3K27ac/
(The -f option can be omitted when starting from the findMotifs (FM) or buildCRCs (BC) step.)
OUTPUT FILES:
SAMPLE_*_ASSIGNMENT_GENES.txt: list of gene names for genes assigned to SEs.
SAMPLE_*_ASSIGNMENT_TRANSCRIPTS.txt: NMIDs of transcripts assigned to SEs.
SAMPLE_*_bg.meme: DNA background sequence file used with FIMO.
SAMPLE_*_CANDIDATE_TF_AND_SUPER_TABLE.txt: table containing the candidate TFs and the location of their associated SEs.
SAMPLE_*_connections.txt: table containing TF-TF interconnections.
SAMPLE_*_EXPRESSED_GENES.txt: list of genes considered expressed (top 2/3).
SAMPLE_*_EXPRESSED_TRANSCRIPTS.txt: list of transcripts considered expressed.
SAMPLE_*_SUBPEAKS.fa: fasta file of SE constituent sequences used with FIMO.
mergeAUTOREG_*.txt: list of gene names of TFs predicted to bind their own SE.
mergeCRC_SCORES_*.txt: all possible CRCs, ranked by the average frequency of occurrence of the TFs they contain across all possible interconnected autoregulatory loops.
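The ranking described for mergeCRC_SCORES_*.txt can be read as follows. This is a sketch of one interpretation only, not the program's actual code: it assumes that a TF's frequency is the fraction of candidate loops containing it and that a CRC's score is the mean frequency of its members. The function name rank_crcs is hypothetical.

```python
from collections import Counter


def rank_crcs(loops):
    """Rank candidate loops by the mean frequency of their member TFs.

    loops: iterable of iterables of TF names (one per candidate loop).
    Returns a list of (loop, score) tuples, highest score first.
    """
    # Normalise each loop to a sorted tuple of unique TF names; drop empties.
    loops = [tuple(sorted(set(loop))) for loop in loops]
    loops = [loop for loop in loops if loop]
    total = len(loops)
    if total == 0:
        return []
    # Number of loops in which each TF occurs.
    counts = Counter(tf for loop in loops for tf in loop)
    scored = [
        (loop, sum(counts[tf] / total for tf in loop) / len(loop))
        for loop in loops
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

Under these assumptions, a loop made of TFs that recur in many other loops ranks above a loop containing rarely seen TFs.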
CRC GRAPH CONVERT:
Submit the CRC members extracted from mergeCRC_SCORES_*.txt: