• CRCBuilder_Python3            

  • PURPOSE:
          To build Core Regulatory Circuitry from H3K27ac ChIP-seq data

    INSTALLATION:
    1) Install the Miniconda environment:
          wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
          bash Miniconda3-latest-Linux-x86_64.sh
          source ~/.bashrc
          conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
          conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
          conda config --add channels http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
          conda config --set show_channel_urls yes
    2) Create the crcbuilder environment and install the dependencies:
          conda create -n crcbuilder python=3.8
          source activate crcbuilder
          conda install -c pwwang bwtool
          conda install -c bioconda meme
          conda install -c bioconda pyfasta
          conda install -c conda-forge networkx
          conda install -c conda-forge matplotlib-base

    REQUIREMENTS:
          The FASTA file for the genome used (e.g. hg38.fa) must be available locally; its path is specified with the -f option when running the program. It can be downloaded from ftp://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/ (the download must be decompressed).
          A bigwig file (.bw or .bigwig) of H3K27ac sequencing reads and the matching super-enhancer table (_peaks_SuperEnhancers.table.xls) generated by ROSE. The two filenames must begin with the same prefix, e.g.: Hela_1-H3K27ac.bw  Hela_1-H3K27ac_peaks_SuperEnhancers.table.xls
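
          The naming rule above can be checked before a run. The following is a minimal sketch, not part of CRCBuilder: pair_inputs is a hypothetical helper, and it assumes that "begin with the same prefix" means the table name is exactly the bigwig name with its extension replaced by _peaks_SuperEnhancers.table.xls.

```python
from pathlib import Path

# Suffixes taken from the REQUIREMENTS text.
SE_SUFFIX = "_peaks_SuperEnhancers.table.xls"
BW_SUFFIXES = (".bw", ".bigwig")


def pair_inputs(filenames):
    """Pair each bigwig with its ROSE super-enhancer table (hypothetical helper).

    Returns (pairs, missing): pairs maps the shared prefix to
    (bigwig_name, table_name); missing lists bigwigs with no table.
    """
    names = set(filenames)
    pairs, missing = {}, []
    for name in sorted(names):
        path = Path(name)
        if path.suffix.lower() not in BW_SUFFIXES:
            continue  # not a bigwig file
        prefix = path.stem  # assumption: the prefix is the name without extension
        table = prefix + SE_SUFFIX
        if table in names:
            pairs[prefix] = (name, table)
        else:
            missing.append(name)
    return pairs, missing


files = [
    "Hela_1-H3K27ac.bw",
    "Hela_1-H3K27ac_peaks_SuperEnhancers.table.xls",
    "Hela_2-H3K27ac.bigwig",  # illustrative name with no matching table
]
pairs, missing = pair_inputs(files)
```

          To check a real directory, the names could be collected with os.listdir on the directory passed to -b.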

    CONTENTS:
          CRCBuilder.py: main program
          utils.py: utility methods
          TFlist_NMid_hg.txt: TFs used and their human NMIDs
          source/CIS_BP_HOCOMOCOv11_motif.meme: motif library
          source/MotifDictionary.txt: TFs used and their associated motif names

    USAGE:
          The program is run by calling CRCBuilder.py from the directory containing all the files listed above:
          python CRCBuilder.py -s [--step] -b [--bw_dir] -f [--fasta]
          -s [--step]
            Select the step to start with (CalculatePromoterActivity(CPA) / findCanidateTFs(FCT) / findMotifs(FM) / buildCRCs(BC)).
          -b [--bw_dir]
            The directory containing the bigwig files of H3K27ac sequencing reads.
          -f [--fasta]
            The path to the FASTA file for the genome version used; the suffix must be '.fa' or '.fasta'.

    EXAMPLE:
          python CRCBuilder.py -s CPA -b /mnt/data/Hela-H3K27ac/ -f /mnt/genome/hg38.fa
          python CRCBuilder.py -s FM -b /mnt/data/Hela-H3K27ac/
            (the -f option can be omitted when starting from step findMotifs(FM) or buildCRCs(BC))
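
          The options and examples above can be mirrored with argparse. This is a sketch under stated assumptions, not the parser actually used in CRCBuilder.py: it assumes that -s takes the abbreviations shown in the example (CPA, FCT, FM, BC), that -s and -b are mandatory, and that -f is mandatory only when starting from CPA or FCT.

```python
import argparse

# Step abbreviations in the order given in the USAGE text.
STEPS = ["CPA", "FCT", "FM", "BC"]
# Assumption: only these starting steps need the genome FASTA file.
STEPS_NEEDING_FASTA = ("CPA", "FCT")


def parse_args(argv=None):
    """Parse and validate the three documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="CRCBuilder.py")
    parser.add_argument("-s", "--step", choices=STEPS, required=True,
                        help="step to start with")
    parser.add_argument("-b", "--bw_dir", required=True,
                        help="directory containing the H3K27ac bigwig files")
    parser.add_argument("-f", "--fasta", default=None,
                        help="path to the genome FASTA file (.fa or .fasta)")
    args = parser.parse_args(argv)

    if args.step in STEPS_NEEDING_FASTA and args.fasta is None:
        parser.error("-f/--fasta is required when starting from CPA or FCT")
    if args.fasta is not None and not args.fasta.endswith((".fa", ".fasta")):
        parser.error("the FASTA path must end in '.fa' or '.fasta'")
    return args


# Same arguments as the second example above: -f is omitted for FM.
args = parse_args(["-s", "FM", "-b", "/mnt/data/Hela-H3K27ac/"])
```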

    OUTPUT FILES:
          SAMPLE_*_ASSIGNMENT_GENES.txt: list of gene names for genes assigned to SEs.
          SAMPLE_*_ASSIGNMENT_TRANSCRIPTS.txt: NMIDs of transcripts assigned to SEs.
          SAMPLE_*_bg.meme: DNA background sequence file used with FIMO.
          SAMPLE_*_CANDIDATE_TF_AND_SUPER_TABLE.txt: table containing the candidate TFs and the locations of their associated SEs.
          SAMPLE_*_connections.txt: table containing TF-TF interconnections.
          SAMPLE_*_EXPRESSED_GENES.txt: list of genes considered expressed (top 2/3).
          SAMPLE_*_EXPRESSED_TRANSCRIPTS.txt: list of transcripts considered expressed.
          SAMPLE_*_SUBPEAKS.fa: fasta file of SE constituent sequences used with FIMO.
          mergeAUTOREG_*.txt: list of gene names of TFs predicted to bind their own SE.
          mergeCRC_SCORES_*.txt: all possible CRCs, ranked by the average frequency of occurrence of the TFs they contain across all possible interconnected autoregulatory loops.
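
          The ranking described for mergeCRC_SCORES_*.txt can be illustrated as follows. This is one plausible reading of that description, not the code used by CRCBuilder: rank_crcs is a hypothetical function, each candidate loop is treated as a set of TF names, a TF's frequency is taken to be the fraction of loops containing it, and the TF names are placeholders rather than real results.

```python
from collections import Counter


def rank_crcs(loops):
    """Rank candidate loops by the mean frequency of their member TFs.

    loops: iterable of iterables of TF names, one per candidate
    interconnected autoregulatory loop (hypothetical input format).
    Returns a list of (loop, score) sorted from highest to lowest score.
    """
    # Normalise each loop to a sorted tuple of unique TFs; drop empty loops.
    loops = [tuple(sorted(set(loop))) for loop in loops if loop]
    n_loops = len(loops)
    # Number of loops in which each TF occurs.
    counts = Counter(tf for loop in loops for tf in loop)

    scored = []
    for loop in loops:
        # Average, over the loop's TFs, of the fraction of loops containing each TF.
        score = sum(counts[tf] / n_loops for tf in loop) / len(loop)
        scored.append((loop, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder TF names, for illustration only.
candidate_loops = [
    ("TF_A", "TF_B"),
    ("TF_A", "TF_C", "TF_D"),
    ("TF_A", "TF_D"),
]
ranked = rank_crcs(candidate_loops)
```

          Here TF_A occurs in all three loops and TF_D in two of them, so the loop (TF_A, TF_D) receives the highest score.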

    CRC GRAPH CONVERT:
          Submit the CRC members extracted from mergeCRC_SCORES_*.txt:

          
          
