Abstract The MediaMill TRECVID 2005 Semantic Video Search En(2)

Date: 2026-01-23

The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information.

Figure 1: Data flow conventions as used in this paper. Different arrows indicate difference in data flows.

the aim is to detect an index ω from shot i using probability p_i(ω|x_i). We exploit supervised learning to learn the relation between ω and x_i. The training data of the multimedia archive, together with labeled samples, are used for learning classifiers. The other data, the test data, are set aside for testing. The general architecture for supervised learning in the MediaMill semantic video search engine is illustrated in Fig. 2.

We can choose from a large variety of supervised machine learning approaches to obtain p_i(ω|x_i). For our purpose, the method of choice should be capable of handling video documents. To that end, ideally it must learn from a limited number of examples, it must handle unbalanced data, and it should account for unknown or erroneously detected data. Under such heavy demands, the Support Vector Machine (SVM) framework [35, 4] has proven to be a solid choice [1, 29]. The usual SVM method provides a margin in the result. We prefer Platt's conversion method [19] to obtain a posterior probability from the result. SVM classifiers thus trained for ω result in an estimate p_i(ω|x_i, q), where q are parameters of the SVM yet to be optimized.
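The conversion from an SVM margin to a posterior probability can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function names `fit_platt` and `platt_probability`, the toy margins, and the step size are assumptions, and plain gradient descent is swapped in for a more careful optimizer. The sigmoid form and the smoothed targets follow Platt's method.

```python
import math


def platt_probability(margin, a, b):
    """Platt's sigmoid: P(concept | margin) = 1 / (1 + exp(a * margin + b))."""
    z = max(-500.0, min(500.0, a * margin + b))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(margins, labels, iters=5000, lr=0.01):
    """Fit (a, b) by minimizing cross-entropy with gradient descent.

    margins: SVM decision values; labels: 1 for concept present, 0 otherwise.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Smoothed targets instead of hard 0/1 labels, as proposed by Platt.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    a = 0.0
    b = math.log((n_neg + 1.0) / (n_pos + 1.0))
    for _ in range(iters):
        grad_a = 0.0
        grad_b = 0.0
        for f, t in zip(margins, targets):
            p = platt_probability(f, a, b)
            # With z = a*f + b, the cross-entropy derivative dL/dz is (t - p).
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy margins (assumed data): larger margins mostly belong to positive shots.
margins = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(margins, labels)
for m in margins:
    print(m, round(platt_probability(m, a, b), 3))
```

A negative fitted `a` means that a larger margin maps to a higher probability, which is the behavior expected when positive examples lie on the positive side of the SVM decision boundary.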

The influence of the SVM parameters on video indexing is significant [14]. We obtain good parameter settings for a classifier by using an iterative search on a large number of SVM parameter combinations. We measure average precision performance of all parameter combinations and select the combination that yields the best performance, q*. Here we use 3-fold cross validation [11] with 3 repetitions to prevent overfitting of parameters. The result of the parameter search over q is the improved model p_i(ω|x_i, q*). In the following we drop q* where obvious.
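The parameter search described above can be sketched in a library-agnostic way. The sketch assumes a hypothetical callback `fit_and_score(q, X_train, y_train, X_val)` that trains a classifier with parameters q and returns one score per validation sample. The `parzen_scores` function is only a small stand-in for SVM training so that the example runs on its own, and the single parameter `gamma`, the toy data, and the unstratified folds are likewise assumptions.

```python
import math
import random
from itertools import product


def average_precision(scores, labels):
    """Non-interpolated average precision of the ranking induced by scores."""
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def repeated_kfold(n, k=3, repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def search_parameters(X, y, grid, fit_and_score, k=3, repeats=3, seed=0):
    """Return (best_q, best_mean_ap) over every combination in `grid`."""
    names = sorted(grid)
    # Reuse the same splits for every q so the combinations are comparable.
    splits = list(repeated_kfold(len(X), k, repeats, seed))
    best_q, best_ap = None, -1.0
    for values in product(*(grid[name] for name in names)):
        q = dict(zip(names, values))
        aps = []
        for train, val in splits:
            scores = fit_and_score(q, [X[i] for i in train],
                                   [y[i] for i in train], [X[i] for i in val])
            aps.append(average_precision(scores, [y[i] for i in val]))
        mean_ap = sum(aps) / len(aps)
        if mean_ap > best_ap:
            best_q, best_ap = q, mean_ap
    return best_q, best_ap


def parzen_scores(q, X_train, y_train, X_val):
    """Stand-in for SVM training (assumption): signed RBF kernel sum."""
    out = []
    for x in X_val:
        s = 0.0
        for xt, yt in zip(X_train, y_train):
            weight = math.exp(-q["gamma"] * (x - xt) ** 2)
            s += weight if yt == 1 else -weight
        out.append(s)
    return out


# Toy one-dimensional features and labels (assumed data).
X = [0.1 * i for i in range(12)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
best_q, best_ap = search_parameters(X, y, {"gamma": [0.01, 1.0, 100.0]}, parzen_scores)
print(best_q, best_ap)
```

Evaluating every combination on identical splits keeps the comparison between parameter settings fair, and averaging over 3 folds times 3 repetitions reduces the chance that one lucky split decides the choice of q*.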

Figure 2: General architecture for supervised learning in the MediaMill semantic video search engine, using the conventions of Fig. 1.

2.2 Data Preparation

Supervised learning requires labeled examples. In part, we rely on the provided ground truth of the TRECVID 2005 common annotation effort [36]. It is extended manually to arrive at an incomplete, but reliable ground truth for an unprecedented number of 101 semantic concepts in lexicon Λ_S. In addition, we manually labeled a substantial part of the training set with respect to dominant type of camera work, i.e. pan, tilt, and/or zoom, if present.

In order to recognize concepts based on low-level visual analysis, we annotated 15 different proto-concepts: building (321), car (192), charts (52), crowd (270), desert (82), fire (67), US-flag (98), maps (44), mountain (41), road (143), sky (291), smoke (64), snow (24), vegetation (242), water (108), where the number in brackets indicates the number of annotation samples of that concept. We again used the TRECVID 2005 common annotation effort as a basis for selecting relevant shots containing the proto-concepts. In those shots, we annotated rectangular regions where the proto-concept is visible for at least 20 frames.

We split the training data a priori into four non-overlapping training and validation sets to prevent overfitting of classifiers. Training sets A, B, and C each contain 30% of the 2005 training data; validation set D contains the remaining 10%. We assign all shots in the training set randomly to either set A, B, C, or D.

3 Semantic Pathfinder Indexing

The central assumption in our semantic indexing architecture is that any broadcast video is the result of an authoring process. When we want to extract semantics from a digital broadcast video this authoring process needs to be reversed. For authoring-driven analysis we proposed the semantic pathfinder [30]. The semantic pathfinder is composed of three analysis steps. It follows the reverse authoring process. Each analysis step in the path detects semantic concepts. In addition, one can exploit the output of an analysis step in the path as the input for the next one. The semantic pathfinder starts in the content analysis step. In this analysis step, we follow a data-driven approach of indexing semantics. The style analysis step is the second analysis step. Here we tackle the indexing problem by viewing a video from the perspective of production. This analysis step aids especially in indexing of rich semantics. Finally, to enhance the indexes further, in the context analysis step, we view semantics in context. One would expect that some concepts, like vegetation, have their emphasis on content where the style (of the camera work that is) and context (of
