IT评测·应用市场-qidao123.com

Title: Smart-seq2 Analysis

Author: 大连全瓷种植牙齿制作中心    Time: 2025-3-22 14:52
1. Overview

Smart-seq2 is a single-cell RNA sequencing technique for profiling gene expression in individual cells.


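2. Basic Principle

Smart-seq2 relies on two properties of Moloney murine leukemia virus reverse transcriptase (MMLV-RT): its terminal transferase activity and its template-switching activity. An oligo(dT)VN primer anneals to the poly(A) tail of the mRNA and primes reverse transcription, producing the first cDNA strand. When the reverse transcriptase reaches the 5' end of the mRNA, it adds a few non-templated cytosine (C) residues to the 3' end of the cDNA. The TSO (template-switching oligo) then anneals to this poly(C) overhang, and the enzyme switches templates and copies the TSO, which appends an adapter sequence to the 3' end of the first strand. This adapter serves as the priming site for second-strand synthesis. The resulting cDNA is amplified by PCR and purified before sequencing.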
Advantages

Limitations

Pipeline Features | Description | Source
Assay Type | paired-end plate-based Smart-seq2 |
Overall workflow | Quality control module and transcriptome quantification module | Code available from Github
Workflow language | WDL | openWDL
Genomic reference sequence | GRCh38 human genome primary sequence | GENCODE
Gene Model | GENCODE v27 PRI GTF and Fasta files | GENCODE
Aligner | HISAT2 | Kim, et al., 2015; HISAT2 tool
QC | Metrics determined using Picard command line tools | Picard Tools
Estimation of gene expression | RSEM (rsem-calculate-expression) is used to estimate the gene expression profile. The input of RSEM is a bam file aligned by HISAT2. | Li and Dewey, 2011
Data Input File Format | File format in which sequencing data is provided | FASTQ
Data Output File Format | File formats in which Smart-seq2 pipeline output is provided | BAM; Zarr version 2

3. Data Preprocessing

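First, the raw sequencing data must go through quality control and alignment. FastQC and MultiQC can be used to check data quality, and HISAT2 can be used to align the reads to the reference genome.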
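Because HISAT2 takes the two mates of a paired-end library through `-1` and `-2`, it can help to confirm that every sample has both files before starting. The following is a minimal sketch, assuming the files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` as in the commands of this section; the function name `list_complete_pairs` is hypothetical and not part of any tool.

```shell
#!/bin/bash
# list_complete_pairs DIR
# Prints the sample prefix of every <sample>_1.fastq.gz in DIR that has a
# matching <sample>_2.fastq.gz, and warns on stderr about unpaired files.
# The file naming scheme is an assumption; adjust the suffixes to your data.
list_complete_pairs() {
    local dir=$1 r1 sample
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        sample=$(basename "$r1" _1.fastq.gz)
        if [ -e "$dir/${sample}_2.fastq.gz" ]; then
            echo "$sample"
        else
            echo "warning: no R2 file for $sample" >&2
        fi
    done
}
```

The printed prefixes can then drive a loop that runs the alignment once per sample, for example `list_complete_pairs ./RAW | while read -r s; do ...; done`.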
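# Quality control
mkdir -p ./fastqc_result   # FastQC does not create the output directory itself
fastqc -t 6 -o ./fastqc_result ./RAW/SRR*fastq.gz
multiqc ./fastqc_result
# Alignment
hisat2 -p 10 -x genome_index -1 sample_1.fastq.gz -2 sample_2.fastq.gz -S output.sam
samtools sort -O bam -@ 10 -o output.bam output.sam
samtools index output.bam

4. Expression Matrix Construction and Seurat Object Creation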

featureCounts is used to quantify the aligned BAM files and generate a gene expression matrix.
featureCounts -T 10 -p -t exon -g gene_name -a annotation.gtf -o counts.txt *.bam

The basic batch workflow is as follows:
#!/bin/bash
# Check that GNU Parallel is installed
if ! command -v parallel &> /dev/null; then
    echo "GNU Parallel could not be found, please install it first."
    exit 1
fi

# Create the namelist file listing all sample directory names
ls -d */ > namelist

# Define the function that processes the RNA-seq reads of one sample
process_rna() {
    local sample=$1
    # Strip the trailing slash
    sample=$(echo "$sample" | sed 's:/*$::')
    echo "${sample} RNA processing start"

    if [ ! -d "${sample}" ]; then
        echo "Directory ${sample} does not exist, skipping."
        return
    fi

    # Use return rather than continue: this is a function body, not a loop
    cd "${sample}" || { echo "Failed to enter ${sample}"; return 1; }

    source /data5/xxx/zengchuanj/Software/MACS3/MyPythonEnv/bin/activate
    trim_galore -j 20 --phred33 --gzip --trim-n -o result --paired *.fastq.gz

    cd result || return 1

    source /data5/tan/zengchuanj/conda/bin/activate
    conda activate HiC-Pro
    fastp -i "../${sample}_clean_R1.fastq.gz" -I "../${sample}_clean_R2.fastq.gz" -o "${sample}_clean_R1.fq.gz" -O "${sample}_clean_R2.fq.gz" -q 20 -w 16 -n 5
    fastqc -t 10 "${sample}_clean_R1.fq.gz" "${sample}_clean_R2.fq.gz"
    # Note: HISAT2 aligns the trim_galore output (*_val_1/2.fq.gz), not the fastp output above
    hisat2 -p 20 -x /data5/tan/zengchuanj/pipeline/HIC/juicer/references/mm10/mm10 -1 "${sample}_clean_R1_val_1.fq.gz" -2 "${sample}_clean_R2_val_2.fq.gz" 2>"${sample}_hisat.txt" | samtools view -b -o "${sample}_outname.bam" -
    samtools sort -@ 20 -o "${sample}.sort.bam" "${sample}_outname.bam"
    ## Quantification with featureCounts
    # Uses the Ensembl GTF
    echo "${sample} Feature counts start"
    ### (for a UCSC GTF, use -t exon)
    # Adjust -t depending on whether reads are counted by gene or by exon
    featureCounts -p --countReadPairs -T 20 -t gene -a /data5/xxx/zengchuanj/xxx/references/GTF/Mus_musculus.GRCm38.102.gtf -o "${sample}_count.txt" "${sample}.sort.bam"
    echo "${sample} RNA processing finish"
    # Return to the starting directory (two levels up from <sample>/result)
    cd ../..
}

# Export the function so that parallel can call it
export -f process_rna
# Process the samples in parallel, at most 10 jobs at a time
# (:::: reads one argument per line from namelist)
parallel -j 10 process_rna :::: namelist
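The batch script redirects the summary that HISAT2 prints on stderr to `${sample}_hisat.txt`, so the mapping rate of each cell can be checked once the run finishes. The following is a minimal sketch, assuming the default HISAT2 summary format whose last line reads `NN.NN% overall alignment rate`; the function names `alignment_rate` and `summarize_alignment` are hypothetical helpers, not part of HISAT2.

```shell
#!/bin/bash
# alignment_rate FILE
# Prints the percentage from the "overall alignment rate" line of a
# HISAT2 summary file, without the trailing % sign.
alignment_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# summarize_alignment DIR
# Prints "<sample><TAB><rate>" for every *_hisat.txt found under DIR,
# sorted by path.
summarize_alignment() {
    local f
    find "$1" -name '*_hisat.txt' | sort | while read -r f; do
        printf '%s\t%s\n' "$(basename "$f" _hisat.txt)" "$(alignment_rate "$f")"
    done
}
```

Running `summarize_alignment .` from the directory that holds the sample folders gives a two-column table in which cells with unusually low rates stand out.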
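The batch script writes one featureCounts table per sample (`${sample}_count.txt`), whereas downstream analysis needs a single gene-by-cell matrix. The following is a minimal sketch of that merge. It assumes the standard featureCounts layout (a `#` comment line, a header row starting with `Geneid`, counts in column 7) and that every table was produced with the same GTF, so genes appear in the same order. The function name `merge_counts` and the output file name in the usage note are hypothetical.

```shell
#!/bin/bash
# merge_counts FILE...
# Joins column 7 (the counts) of several single-sample featureCounts tables
# into one tab-separated matrix: one row per gene, one column per input file.
# Column names are the file names with the directory and "_count.txt" removed.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                              # first line of each input file
            nfile++; i = 0
            name = FILENAME
            sub(/.*\//, "", name); sub(/_count\.txt$/, "", name)
            header = header OFS name
        }
        /^#/ || $1 == "Geneid" { next }         # skip comment and header rows
        {
            i++
            if (nfile == 1) { gene[i] = $1; row[i] = $7 }
            else if ($1 != gene[i]) {           # gene order differs between files
                print "gene order mismatch in " FILENAME > "/dev/stderr"
                bad = 1; exit 1
            }
            else { row[i] = row[i] OFS $7 }
            if (i > n) n = i
        }
        END {
            if (bad) exit 1
            print "Geneid" header
            for (i = 1; i <= n; i++) print gene[i], row[i]
        }
    ' "$@"
}
```

With the directory layout used by the batch script, `merge_counts */result/*_count.txt > counts_matrix.tsv` would collect all samples into one table that can then be loaded as the count matrix for a Seurat object.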
[code]# Define a function, process_smart_to_seurat_data, that processes Smart-seq data and creates a Seurat object



