Bioinformatics: High-Throughput Cancer Research
Date: 2025-03-09
Cancer Research Based on High-Throughput Sequencing Technology
林钊 linzhao@
Cancer Background
CANCER GENOMICS
Cancers are caused by changes that have occurred in the DNA sequence of the genomes of cancer cells.
Characteristic: high heterogeneity across different cancer tissues and different stages of development
Target:
- a comprehensive catalogue of somatic mutations in cancer samples
- identification of further potentially druggable cancer genes
- utility of somatic mutations as biomarkers for prognosis
hypothesis-driven
data-driven, large scale analysis
Problems and difficulties of classical methods
- Unable to detect rare variants: only common variants (MAF > 5%) are detected, yet rare SNPs can be true disease risk variants.
- Classical methods looked only at cancer cells and sequenced genes known or suspected to be linked to cancer, so they may have overlooked key mutations, especially new ones.
- Genes are chosen by hypothesis, with long cycle times and a low success rate.
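To make the MAF > 5% cut-off concrete, here is a minimal Python sketch of how a minor allele frequency could be computed for one site. The genotype string format ("AA", "AG", ...) and the function names are assumptions made for illustration, and the site is assumed to be biallelic.

```python
from collections import Counter

COMMON_THRESHOLD = 0.05  # the MAF > 5% cut-off quoted in the slide


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic site.

    `genotypes` is a list of diploid genotype strings such as "AA" or "AG"
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    return min(counts.values()) / sum(counts.values())


def is_common(genotypes, threshold=COMMON_THRESHOLD):
    """True when the site would count as a common variant under the cut-off."""
    return minor_allele_frequency(genotypes) > threshold


# 2 heterozygotes among 100 individuals: 2 minor alleles out of 200
rare_site = ["AA"] * 98 + ["AG"] * 2
# 20 heterozygotes among 100 individuals: 20 minor alleles out of 200
common_site = ["AA"] * 80 + ["AG"] * 20
```

A site like `rare_site` falls below the threshold, which is the kind of variant the slide says classical methods miss.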
All of these can be solved by sequencing.
It's time to sequence!
MR Stratton et al. Nature 458, 719-724 (2009)
Overview of Cancer Solutions
Exome sequencing
Research design: 100 tumor and 100 control samples, 50X per sample
Deliverables: SNVs, indels
Whole genome sequencing
Research design: 10 groups (blood + tumor tissue), 30X per sample
Deliverables: SNVs, indels, CNVs, SVs, virus integrations or rearrangements
Cell line
Research design: whole genome sequencing, 50X with 170-800 bp PE libraries; 20X with 2k-40 kbp PE libraries
Deliverables: SNVs, SVs, novel sequence by assembly
Single-cell sequencing
Research design: 50X exome of 20 normal and 100 tumor single cells
Deliverables: SNVs, indels
Cancer Solution 1: Exome sequencing
100 tumor and 100 control samples, >50X per sample
Background:
- High heterogeneity within the same cancer tissue
- Requires hundreds of cases to be sequenced to identify a cancer gene that is mutated in
Scientific goal:
- Detect most of the somatic mutations
- Try to identify driver and passenger mutations
Analysis Pipeline
Exome sequencing: >50× depth
Quality control
Alignment to reference genome
SNVs:
- Alignment with SOAPaligner
- SNVs detected by SNVdetector or other software
- Potential somatic SNVs
- Excluding SNVs in dbSNP/YH/1000 Genomes
- Somatic mutations
Indels (short reads):
- Indels detected by SoapSV or other software
- Filtering out indels in normal tissues
- Excluding indels in dbSNP/YH/1000 Genomes
- Somatic indels
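The filtering logic of the pipeline above (drop anything also seen in normal tissue, then drop anything already catalogued in dbSNP/YH/1000 Genomes) can be sketched as plain set subtraction. This is an illustrative sketch only, not the SOAP toolchain itself: the `(chrom, pos, ref, alt)` tuple representation and the function name `somatic_candidates` are assumptions, and real callers also weigh depth and base quality.

```python
def somatic_candidates(tumor_calls, normal_calls, known_variants):
    """Keep tumor variants absent from the matched normal and from known databases.

    Each variant is a (chrom, pos, ref, alt) tuple, a hypothetical
    representation chosen for this sketch.
    """
    tumor = set(tumor_calls)
    germline = set(normal_calls)    # variants also present in normal tissue
    catalogued = set(known_variants)  # e.g. entries from dbSNP/YH/1000 Genomes
    return sorted(tumor - germline - catalogued)


# Made-up example calls, for illustration only
tumor = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
]
normal = [("chr1", 200, "C", "T")]
known = [("chr2", 50, "G", "GA")]

candidates = somatic_candidates(tumor, normal, known)
```

The same function serves both branches, since SNVs and indels differ only in the `ref` and `alt` fields.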
Sequencing Data Production
Normal and tumor sequencing analysis, samples GC-201 to GC-210

Sample   Total effective reads (M)   Total effective yield (Mb)   Effective sequence on target (Mb)   Average sequencing depth on target   Coverage of target region
GC-201   40.08                       2930.61                      1075.9                              31.54                                95.5%
GC-202   37.04                       2831.84                      971.22                              28.47                                94.8%
GC-203   32.16                       2395.21                      824.17                              24.16                                94.8%
GC-204   32.7                        2433.29                      851.02                              24.95                                95.1%
GC-205   37.62                       2864.62                      1040.93                             30.52                                95.0%
GC-206   35.96                       2728.28                      986.48                              28.92                                95.2%
GC-207   32.1                        2381.05                      865.74                              25.38                                94.6%
GC-208   37.15                       2823.45                      1024.37                             30.03                                95.0%
GC-209   34.95                       2644.37                      995.13                              29.18                                95.3%
GC-210   44.38                       3550.2                       1397.8                              40.98                                95.5%

Sample   Total effective reads (M)   Total effective yield (Mb)   Effective sequence on target (Mb)   Average sequencing depth on target   Coverage of target region
GC-201   11.7                        856.08                       334.59                              9.81                                 92.7%
GC-202   11.76                       861.88                       302.87                              8.88                                 90.8%
GC-203   11.75                       808.88                       290.43                              8.51                                 91.8%
GC-204   11.83                       823.36                       281.15                              8.24                                 92.5%
GC-205   21.44                       1558.41                      550.05                              16.13                                94.3%
GC-206   12.19                       899.66                       318.27                              9.33                                 93.2%
GC-207   12.46                       915.95                       321.69                              9.43                                 92.0%
GC-208   21.02                       1509.57                      529.31                              15.52                                94.3%
GC-209   21.52                       1558.94                      549.31                              16.1                                 94.6%
GC-210   9.33                        746.08                       293.7                               8.61                                 92.2%
Schematic diagram of SNVs filtering process and gene annotation
8277 somatic SNVs
7517 present in dbSNP and the 1000 Genomes Project
760 (9.2%) new SNVs
346 synonymous and UTR SNVs
414 (54.5%) nonsynonymous and splice-site SNVs
357 predicted cancer genes
113 recorded in COSMIC
244 novel predicted cancer genes
249 randomly selected SNVs for technical validation
216 (86.7%) validated
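The counts in this filtering schematic can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names and the `pct` helper are introduced here for illustration.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_somatic = 8277   # somatic SNVs
known = 7517           # present in dbSNP and 1000 Genomes
novel = total_somatic - known

synonymous_utr = 346
nonsyn_splice = 414

genes_total = 357
genes_cosmic = 113
genes_novel = 244

validation_tested = 249
validation_passed = 216

novel_pct = pct(novel, total_somatic)
nonsyn_pct = pct(nonsyn_splice, novel)
validated_pct = pct(validation_passed, validation_tested)
```

Each percentage is taken relative to the step directly above it: new SNVs over all somatic SNVs, nonsynonymous and splice-site SNVs over new SNVs, and validated SNVs over those selected for validation.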
SNV profile
SNV spectrum
SNVs location
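For readers unfamiliar with the term, an SNV spectrum is usually a tally of substitution types. A minimal sketch follows, assuming the common convention of collapsing each substitution onto the pyrimidine reference base, which gives six classes (C>A, C>G, C>T, T>A, T>C, T>G). The slides do not say which convention their figure used, and the function names here are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution onto the pyrimidine (C or T) reference base."""
    if ref in ("A", "G"):
        # Purine reference: report the equivalent change on the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def snv_spectrum(snvs):
    """Count substitution classes for an iterable of (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snvs)


# Made-up example SNVs, for illustration only
spectrum = snv_spectrum([("G", "A"), ("C", "T"), ("A", "G"), ("T", "G")])
```

Here G>A and C>T land in the same class because they describe the same change seen from opposite strands.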
Transcription factor network in 3 pathways
Altered expression of MUC17
Patients with MUC17 variants showed a better prognosis than those with wild-type MUC17.
Cancer Solution 2: Whole Genome Sequencing
10 groups (blood/normal tissue + tumor tissue), 30X per sample
Background:
Mutations need to be sought across the whole genome, including intronic and promoter regions.
Research: Large-scale analyses of genes in tumors have shown that the mutation load in cancer is abundant, heterogeneous, and widespread.
Workflow
DNA sample preparation
Library construction
HiSeq 2000 sequencing
Alignment
Basic bioinformatics analysis:
- SNV calling
- Short InDel calling
- CNV calling
- SV calling
Advanced bioinformatics analysis:
- SNV annotation
- InDel annotation
- CNV annotation
- SV annotation
Personalized bioinformatics analysis:
- Demographic analysis
- Selection
- Others
Mutations Summary
Cancer Solution 3: Cell line
Introduction: Human immortal cancer cell lines are an accessible, easily usable set of biological models.
Advantages:
1. Gives a very clear pattern of what happened in that cell line.
2. Builds a systematic characterization of the genetics and genomics.
3. High-accuracy SV and CNV information with a clear pattern.