BIOINFORMATICS Algorithms
Vol. 21 Suppl. 2 2005, pages ii224–ii229
doi:10.1093/bioinformatics/bti1137

Growing Bayesian network models of gene networks from seed genes
J. M. Peña1,*, J. Björkegren2 and J. Tegnér1,2
1Computational Biology, Department of Physics and Measurement Technology, Linköping University, 581 83 Linköping, Sweden and 2Center for Genomics and Bioinformatics, Karolinska Institutet, 171 77 Stockholm, Sweden
*To whom correspondence should be addressed.
ABSTRACT
Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundred, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem.
Results: We propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially contains only S, by repeating the following step R + 1 times and then pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.
Contact: jmp@ifm.liu.se
1 INTRODUCTION
Much of a cell’s complex behavior can be explained through the concerted activity of genes and gene products. This concerted activity is typically represented as a network of interacting genes. Identifying this gene network is crucial for understanding the behavior of the cell which, in turn, can lead to better diagnosis and treatment of diseases. For the last few years, Bayesian networks (BNs) (Neapolitan, 2003; Pearl, 1988) have received increasing attention from the computational biology community as models of gene networks (Badea, 2003; Bernard and Hartemink, 2005; Friedman et al., 2000; Hartemink et al., 2002; Ott et al., 2004; Pe’er et al., 2001; Peña, 2004). A BN model of a gene network represents a probability distribution for the genes in the network. The BN minimizes the number of parameters needed to specify the probability distribution by taking advantage of the conditional independencies between the genes. These conditional independencies are encoded in an acyclic directed graph (DAG) to help visualization and reasoning.

Learning BN models of gene networks from gene-expression data is problematic; most gene-expression databases contain measurements for thousands of genes (Hughes et al., 2000; Spellman et al., 1998), but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases (Friedman et al., 1999; Tsamardinos et al., 2003). This implies that in the references cited above, for instance, the authors have to decide in advance which genes are included in the learning process (in all the cases <1000) and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem.
In this paper, we propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially contains only S, by repeating the following step R + 1 times and then pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance.
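The growth step described above can be sketched as follows. This is only an illustration under stated assumptions: `find_pc` is a hypothetical callback that returns the parents and children of a gene (in practice it would have to be estimated from expression data), and the toy chain `TOY_EDGES` merely stands in for such an estimator. The pruning step is left out because its details are not given in this section.

```python
from typing import Callable, Dict, Set, Tuple

def grow_from_seed(
    seed: str,
    radius: int,
    find_pc: Callable[[str], Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Repeat the 'find parents and children' step radius + 1 times.

    Returns the set of genes reached and, for every expanded gene, its
    parents-and-children set. Pruning is NOT performed here.
    """
    genes: Set[str] = {seed}
    pc: Dict[str, Set[str]] = {}
    for _ in range(radius + 1):
        found: Set[str] = set()
        for gene in sorted(genes):
            if gene not in pc:  # query each gene only once
                pc[gene] = set(find_pc(gene))
            found |= pc[gene]
        genes |= found
    return genes, pc

# Hypothetical stand-in for a data-driven estimator: a known chain
# A -> B -> C -> D -> E from which parents and children are read off.
TOY_EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

def toy_find_pc(gene: str) -> Set[str]:
    children = {b for a, b in TOY_EDGES if a == gene}
    parents = {a for a, b in TOY_EDGES if b == gene}
    return parents | children

if __name__ == "__main__":
    genes, pc = grow_from_seed("A", radius=1, find_pc=toy_find_pc)
    print(sorted(genes))
```

With seed A and R = 1, the loop runs twice and reaches one gene beyond the window of radius 1, which is consistent with the text's remark that some genes are pruned afterwards.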
The rest of the paper is organized as follows. In Section 2, we review BNs. In Section 3, we describe our new algorithm. In Section 4, we evaluate our algorithm on simulated and biological data [Rosetta compendium (Hughes et al., 2000)] with satisfactory results. Finally, in Section 5, we discuss related works and possible extensions to our algorithm.
2 BAYESIAN NETWORKS
The following definitions and theorem can be found in most books on Bayesian networks (Neapolitan, 2003; Pearl, 1988). We assume that the reader is familiar with graph and probability theories. We abbreviate if and only if by iff, such that by st and with respect to by wrt.
© The Author 2005. Published by Oxford University Press. All rights reserved.

Let U denote a non-empty finite set of random variables. A BN for U is a pair (G, θ), where G is a DAG whose nodes correspond to the random variables in U, and θ are parameters specifying a conditional probability distribution for each node X given its parents in G, $p(X \mid Pa_G(X))$. A BN (G, θ) represents a probability distribution for U, $p(U)$, through the factorization $p(U) = \prod_{X \in U} p(X \mid Pa_G(X))$. Hereafter, $PC_G(X)$ denotes the parents and children of X in G, and $ND_G(X)$ the non-descendants of X in G.
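To make the factorization concrete, here is a small sketch that evaluates $p(U)$ as a product of conditional probabilities. The three-gene DAG, the binary variables and all numbers in `CPT` are invented for illustration; none of them come from the paper.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Hypothetical DAG G: A -> B and A -> C, all variables binary (0 or 1).
PARENTS: Dict[str, Tuple[str, ...]] = {"A": (), "B": ("A",), "C": ("A",)}

# CPT[X][parent values] = p(X = 1 | Pa_G(X) = parent values); made-up numbers.
CPT: Dict[str, Dict[Tuple[int, ...], float]] = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
}

def joint_probability(assignment: Dict[str, int]) -> float:
    """p(U) as the product over X in U of p(X | Pa_G(X))."""
    prob = 1.0
    for node, parents in PARENTS.items():
        pa_values = tuple(assignment[p] for p in parents)
        p_one = CPT[node][pa_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

def total_probability() -> float:
    """Sum of p(U) over all assignments; should be 1 for a valid BN."""
    nodes = list(PARENTS)
    return sum(
        joint_probability(dict(zip(nodes, values)))
        for values in product((0, 1), repeat=len(nodes))
    )

def parents_and_children(node: str) -> Set[str]:
    """PC_G(X): the parents and the children of a node in G."""
    children = {x for x, pa in PARENTS.items() if node in pa}
    return set(PARENTS[node]) | children

if __name__ == "__main__":
    print(joint_probability({"A": 1, "B": 1, "C": 0}))
    print(total_probability())
    print(sorted(parents_and_children("A")))
```

Only one number per parent configuration is stored for each node, which is the parameter saving the factorization provides over tabulating the full joint distribution.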
Any probability distribution p that can be represented by a BN with DAG G, i.e. by a parameterization θ of G, satisfies certain conditional independencies between the random variables in U that can be read from G via the d-separation criterion, i.e. if $d\text{-sep}_G(X, Y \mid Z)$, then $X \perp\!\!\!\perp_p Y \mid Z$ with X, Y and Z three mutually disjoint subsets of U. The statement $d\text{-sep}_G(X, Y \mid Z)$ is true when for every undirected path in G between a node in X and a node in Y there exists a node W in the path st either (1) W does not have two parents in the path and $W \in Z$, or (2) W has two parents in the path and neither W nor any of its descendants in G is in Z. A probability distribution p is said to be faithful to a DAG G when $X \perp\!\!\!\perp_p Y \mid Z$ iff $d\text{-sep}_G(X, Y \mid$ …
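The d-separation criterion stated above can be checked mechanically on small graphs. The sketch below follows the definition literally by enumerating every undirected path and testing conditions (1) and (2) on its interior nodes; it is meant as an illustration, not an efficient implementation, and the four-node `TOY_DAG` is a hypothetical example.

```python
from typing import Dict, Iterator, List, Set

Dag = Dict[str, Set[str]]  # maps every node to the set of its parents

def descendants(dag: Dag, node: str) -> Set[str]:
    """All nodes reachable from `node` by following edges parent -> child."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in (c for c, pa in dag.items() if current in pa):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def undirected_paths(dag: Dag, start: str, goal: str) -> Iterator[List[str]]:
    """Yield every simple path from start to goal, ignoring edge direction."""
    def neighbours(n: str) -> Set[str]:
        return set(dag[n]) | {c for c, pa in dag.items() if n in pa}

    def extend(path: List[str]) -> Iterator[List[str]]:
        if path[-1] == goal:
            yield path
            return
        for nxt in neighbours(path[-1]):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([start])

def path_is_blocked(dag: Dag, path: List[str], z: Set[str]) -> bool:
    """True if some interior node W satisfies condition (1) or (2)."""
    for i in range(1, len(path) - 1):
        w = path[i]
        two_parents = path[i - 1] in dag[w] and path[i + 1] in dag[w]
        if not two_parents and w in z:  # condition (1)
            return True
        if two_parents and w not in z and not (descendants(dag, w) & z):
            return True  # condition (2)
    return False

def d_separated(dag: Dag, xs: Set[str], ys: Set[str], z: Set[str]) -> bool:
    """d-sep_G(X, Y | Z) for mutually disjoint node sets xs, ys and z."""
    return all(
        path_is_blocked(dag, path, z)
        for x in xs
        for y in ys
        for path in undirected_paths(dag, x, y)
    )

# Hypothetical DAG: A -> C <- B and C -> D.
TOY_DAG: Dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

if __name__ == "__main__":
    print(d_separated(TOY_DAG, {"A"}, {"B"}, set()))
    print(d_separated(TOY_DAG, {"A"}, {"B"}, {"D"}))
```

In `TOY_DAG`, C has two parents on the path between A and B, so that path is blocked while neither C nor its descendant D is in Z, and it becomes unblocked once either of them is conditioned on.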