BIOINFORMATICS Algorithms


Vol. 21 Suppl. 2 2005, pages ii224–ii229
doi:10.1093/bioinformatics/bti1137

Growing Bayesian network models of gene networks from seed genes


J. M. Peña1,*, J. Björkegren2 and J. Tegnér1,2

1Computational Biology, Department of Physics and Measurement Technology, Linköping University, 58183 Linköping, Sweden and 2Center for Genomics and Bioinformatics, Karolinska Institutet, 17177 Stockholm, Sweden

ABSTRACT

Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundred, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem.

Results: We propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially only contains S, by repeating the following step R + 1 times and, then, pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.

Contact: jmp@ifm.liu.se

1 INTRODUCTION

Much of a cell’s complex behavior can be explained through the concerted activity of genes and gene products. This concerted activity is typically represented as a network of interacting genes. Identifying this gene network is crucial for understanding the behavior of the cell, which, in turn, can lead to better diagnosis and treatment of diseases. For the last few years, Bayesian networks (BNs) (Neapolitan, 2003; Pearl, 1988) have received increasing attention from the computational biology community as models of gene networks (Badea, 2003; Bernard and Hartemink, 2005; Friedman et al., 2000; Hartemink et al., 2002; Ott et al., 2004; Pe’er et al., 2001; Peña, 2004). A BN model of a gene network represents a probability distribution for the genes in the network. The BN minimizes the number of parameters needed to specify the probability distribution by taking advantage of the conditional independencies between the genes. These conditional independencies are encoded in an acyclic directed graph (DAG) to help visualization and reasoning. Learning BN models of gene networks from gene-expression data is problematic: most gene-expression databases contain measurements for thousands of genes (Hughes et al., 2000; Spellman et al., 1998), but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases (Friedman et al., 1999; Tsamardinos et al., 2003). This implies that in the references cited above, for instance, the authors have to decide in advance which genes are included in the learning process (in all cases, fewer than 1000) and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem.
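The following Python sketch makes the parameter-saving claim concrete by counting free parameters for discrete variables. It assumes that each gene's expression level has been discretized. That assumption is for illustration only, because this excerpt does not say how expression levels are modelled.

An unrestricted joint table over variables with cardinalities r_1, …, r_n needs ∏ r_i − 1 free parameters. A BN needs Σ_X (r_X − 1) ∏_{P ∈ Pa_G(X)} r_P. The five-gene chain and the function names are hypothetical.

```python
from math import prod


def full_joint_params(cards):
    """Free parameters of an unrestricted joint table: product of cardinalities minus one."""
    return prod(cards.values()) - 1


def bn_params(cards, parents):
    """Free parameters of a BN: (|X| - 1) free entries per configuration of Pa_G(X)."""
    return sum(
        (cards[x] - 1) * prod(cards[p] for p in parents[x])  # prod(()) == 1 for root nodes
        for x in cards
    )


# Hypothetical example: five binary (discretized) genes in a chain A -> B -> C -> D -> E.
cards = {g: 2 for g in "ABCDE"}
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}

print(full_joint_params(cards))   # 31
print(bn_params(cards, parents))  # 9
```

For this chain the count drops from 31 to 9. When every gene has a bounded number of parents, the BN count grows linearly in the number of genes, while the full joint grows exponentially.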

In this paper, we propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially only contains S, by repeating the following step R + 1 times and, then, pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance.
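The growing step can be sketched in Python under two assumptions:

- The parents-and-children sets PC_G(X) are read from a known DAG. This is a hypothetical oracle standing in for whatever procedure learns them from data, which this excerpt does not describe.
- The final pruning step is omitted, because its criterion is not given here.

The names `parents_and_children` and `grow` are illustrative, not the authors' code.

```python
def parents_and_children(dag, x):
    """PC_G(x): the parents and children of x; `dag` maps each node to its set of parents."""
    pc = set(dag[x])
    pc.update(child for child, pa in dag.items() if x in pa)
    return pc


def grow(dag, seed, radius):
    """Start from {seed} and, radius + 1 times, add PC_G of every gene already included.

    The pruning step described in the text is deliberately NOT implemented.
    """
    genes = {seed}
    for _ in range(radius + 1):
        added = set()
        for g in genes:
            added |= parents_and_children(dag, g)
        genes |= added  # update after the sweep, so each round expands exactly one step
    return genes


# Hypothetical gene DAG: S -> A -> B -> C -> D and S -> E <- F.
dag = {
    "S": set(), "A": {"S"}, "B": {"A"}, "C": {"B"}, "D": {"C"},
    "E": {"S", "F"}, "F": set(),
}

print(sorted(grow(dag, "S", 1)))  # ['A', 'B', 'E', 'F', 'S']
print(sorted(grow(dag, "S", 2)))  # ['A', 'B', 'C', 'E', 'F', 'S']
```

With R = 1 the unpruned set already contains F, a parent of S's child E. This happens even though S and F are d-separated given the empty set, since their only path S → E ← F meets the collider E.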

The rest of the paper is organized as follows. In Section 2, we review BNs. In Section 3, we describe our new algorithm. In Section 4, we evaluate our algorithm on simulated and biological data [Rosetta compendium (Hughes et al., 2000)] with satisfactory results. Finally, in Section 5, we discuss related work and possible extensions to our algorithm.

*To whom correspondence should be addressed.

2 BAYESIAN NETWORKS

The following definitions and theorem can be found in most books on Bayesian networks (Neapolitan, 2003; Pearl, 1988). We assume that the reader is familiar with graph and probability theory. We abbreviate if and only if by iff, such that by st, and with respect to by wrt.

Let U denote a non-empty finite set of random variables. A BN for U is a pair (G, θ), where G is a DAG whose nodes correspond to the random variables in U, and θ are parameters specifying a conditional probability distribution for each node X given its parents in G, p(X | Pa_G(X)). A BN (G, θ) represents a probability distribution for U, p(U), through the factorization $p(U) = \prod_{X \in U} p(X \mid Pa_G(X))$. Hereafter, PC_G(X) denotes the parents and children of X in G, and ND_G(X) the non-descendants of X in G.

© The Author 2005. Published by Oxford University Press. All rights reserved.
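The factorization can be checked numerically. The sketch below uses a hypothetical three-gene BN with binary states (A → B and A → C) and made-up conditional probabilities. It computes p(U) as the product of p(X | Pa_G(X)) over all nodes and confirms that the resulting joint distribution sums to one.

```python
from itertools import product

# Hypothetical BN over three binary genes with edges A -> B and A -> C.
parents = {"A": (), "B": ("A",), "C": ("A",)}

# cpt[X][values of Pa_G(X)] = p(X = 1 | Pa_G(X)); p(X = 0 | ...) is its complement.
# All numbers are made up for illustration.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
}


def joint(assignment):
    """p(U = assignment) computed as the product over X of p(X | Pa_G(X))."""
    p = 1.0
    for x, pa in parents.items():
        p_one = cpt[x][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[x] == 1 else 1.0 - p_one
    return p


genes = list(parents)
total = sum(joint(dict(zip(genes, values))) for values in product((0, 1), repeat=len(genes)))
print(round(total, 10))                 # 1.0
print(joint({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * (1 - 0.1), i.e. about 0.243
```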

Any probability distribution p that can be represented by a BN with DAG G, i.e. by a parameterization θ of G, satisfies certain conditional independencies between the random variables in U that can be read from G via the d-separation criterion, i.e. if d-sep_G(X, Y | Z), then X ⊥⊥_p Y | Z with X, Y and Z three mutually disjoint subsets of U. The statement d-sep_G(X, Y | Z) is true when for every undirected path in G between a node in X and a node in Y there exists a node W in the path st either (1) W does not have two parents in the path and W ∈ Z, or (2) W has two parents in the path and neither W nor any of its descendants in G is in Z. A probability distribution p is said to be faithful to a DAG G when X ⊥⊥_p Y | Z iff d-sep_G(X, Y | …
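The path-based definition of d-sep_G(X, Y | Z) can be turned directly into a brute-force check. The check enumerates every undirected path between X and Y and looks for a node W that satisfies condition (1) or (2). This Python sketch is exponential in the worst case and is meant only for small illustrative DAGs; practical implementations typically use a reachability procedure such as Bayes-ball instead. The DAG and function names are hypothetical.

```python
def children_of(dag):
    """Map each node to its children; `dag` maps each node to its set of parents."""
    return {v: {c for c, pa in dag.items() if v in pa} for v in dag}


def descendants(dag, w):
    """All proper descendants of w (w itself excluded)."""
    kids, seen, stack = children_of(dag), set(), [w]
    while stack:
        for c in kids[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def undirected_paths(dag, x, y):
    """Yield every simple path between x and y, ignoring edge directions."""
    kids = children_of(dag)
    nbrs = {v: set(dag[v]) | kids[v] for v in dag}

    def extend(path):
        if path[-1] == y:
            yield path
            return
        for n in nbrs[path[-1]]:
            if n not in path:
                yield from extend(path + [n])

    yield from extend([x])


def blocked(dag, path, z):
    """True if some interior node W of the path satisfies condition (1) or (2)."""
    for prev, w, nxt in zip(path, path[1:], path[2:]):
        collider = prev in dag[w] and nxt in dag[w]  # W has two parents in the path
        if not collider and w in z:                                    # condition (1)
            return True
        if collider and w not in z and not (descendants(dag, w) & z):  # condition (2)
            return True
    return False


def d_separated(dag, xs, ys, z):
    """d-sep_G(X, Y | Z) for mutually disjoint node sets X, Y and Z."""
    return all(
        blocked(dag, path, z)
        for x in xs for y in ys
        for path in undirected_paths(dag, x, y)
    )


# Hypothetical DAG: A <- S -> E <- F, and E -> G.
dag = {"S": set(), "F": set(), "A": {"S"}, "E": {"S", "F"}, "G": {"E"}}

print(d_separated(dag, {"S"}, {"F"}, set()))  # True: E is a collider not in Z
print(d_separated(dag, {"S"}, {"F"}, {"E"}))  # False
print(d_separated(dag, {"S"}, {"F"}, {"G"}))  # False: G is a descendant of E
print(d_separated(dag, {"A"}, {"E"}, {"S"}))  # True
```

In this DAG, S and F are d-separated by the empty set because their only path passes through the collider E. Conditioning on E, or on its descendant G, unblocks that path.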
