The MediaMill TRECVID 2005 Semantic Video Search Engine
Date: 2026-01-23
The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM), we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information.
Figure 1: Data flow conventions as used in this paper. Different arrows indicate difference in data flows.
the aim is to detect an index ω from shot i using probability p_i(ω|x_i). We exploit supervised learning to learn the relation between ω and x_i. The training data of the multimedia archive, together with labeled samples, are used for learning classifiers. The other data, the test data, are set aside for testing. The general architecture for supervised learning in the MediaMill semantic video search engine is illustrated in Fig. 2.
We can choose from a large variety of supervised machine learning approaches to obtain p_i(ω|x_i). For our purpose, the method of choice should be capable of handling video documents. To that end, ideally it must learn from a limited number of examples, it must handle unbalanced data, and it should account for unknown or erroneously detected data. Under such heavy demands, the Support Vector Machine (SVM) framework [35, 4] has proven to be a solid choice [1, 29]. The usual SVM method provides a margin in the result. We prefer Platt's conversion method [19] to obtain a posterior probability of the result. SVM classifiers thus trained for ω result in an estimate p_i(ω|x_i, q), where q are parameters of the SVM yet to be optimized.
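Platt's conversion maps an SVM margin f(x_i) to a probability through a sigmoid, p = 1 / (1 + exp(A f + B)). The following is a minimal sketch of that idea, not the MediaMill implementation: the function names, the plain gradient-descent fit, and the toy margins are assumptions made for illustration, and Platt's original procedure uses a more careful optimizer with regularized targets.

```python
import math


def platt_probability(margin, a, b):
    """Map an SVM margin to a posterior estimate with Platt's sigmoid."""
    z = a * margin + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(margins, labels, steps=5000, lr=0.01):
    """Fit sigmoid parameters (a, b) by gradient descent on the log-loss.

    labels are 1 for positive shots and 0 for negative shots.
    """
    a, b = -1.0, 0.0  # a < 0 so that larger margins give larger probabilities
    n = len(margins)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, y in zip(margins, labels):
            p = platt_probability(f, a, b)
            # For p = 1 / (1 + exp(z)) with z = a*f + b, d(log-loss)/dz = y - p.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical margins from an already trained SVM, with their true labels.
toy_margins = [-2.0, -1.0, -0.5, 0.2, 0.4, 1.0, 2.0]
toy_labels = [0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(toy_margins, toy_labels)
posteriors = [platt_probability(f, a, b) for f in toy_margins]
```

The fitted sigmoid is monotone in the margin, so the ranking of shots is unchanged; only the scale of the scores becomes comparable across concepts.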
The influence of the SVM parameters on video indexing is significant [14]. We obtain good parameter settings for a classifier by using an iterative search on a large number of SVM parameter combinations. We measure average precision performance of all parameter combinations and select the combination that yields the best performance, q*. Here we use 3-fold cross validation [11] with 3 repetitions to prevent overfitting of parameters. The result of the parameter search over q is the improved model p_i(ω|x_i, q*). In the following we drop q* where obvious.
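The parameter search can be sketched as an exhaustive loop over parameter combinations, each scored by mean average precision over 3-fold cross validation repeated 3 times. This is an illustrative sketch under stated assumptions: `train_and_score` is a hypothetical callable standing in for "train an SVM with parameters q and score the held-out shots", the grid contents are placeholders, and average precision is computed in its non-interpolated form.

```python
import itertools
import random


def average_precision(scores, labels):
    """Non-interpolated average precision of a ranking (labels are 0/1).

    Returns 0.0 when the ranked list contains no positive example.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def repeated_kfold(n, k=3, repetitions=3, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV repeated several times."""
    rng = random.Random(seed)
    for _ in range(repetitions):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test


def search_parameters(features, labels, grid, train_and_score):
    """Return the parameter combination with the best mean average precision."""
    names = sorted(grid)
    best_q, best_ap = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        q = dict(zip(names, values))
        aps = []
        for train, test in repeated_kfold(len(labels)):
            scores = train_and_score(
                [features[j] for j in train], [labels[j] for j in train],
                [features[j] for j in test], q)
            aps.append(average_precision(scores, [labels[j] for j in test]))
        mean_ap = sum(aps) / len(aps)
        if mean_ap > best_ap:
            best_q, best_ap = q, mean_ap
    return best_q, best_ap
```

Averaging over repeated folds reduces the chance that a parameter combination is selected because of one favourable split, which is the overfitting risk the text refers to.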
Figure 2: General architecture for supervised learning in the MediaMill semantic video search engine, using the conventions of Fig. 1.
2.2 Data Preparation

Supervised learning requires labeled examples. In part, we rely on the provided ground truth of the TRECVID 2005 common annotation effort [36]. It is extended manually to arrive at an incomplete, but reliable ground truth for an unprecedented amount of 101 semantic concepts in lexicon Λ_S. In addition, we manually labeled a substantial part of the training set with respect to dominant type of camera work, i.e. pan, tilt, and/or zoom, if present.

In order to recognize concepts based on low-level visual analysis, we annotated 15 different proto-concepts: building (321), car (192), charts (52), crowd (270), desert (82), fire (67), US-flag (98), maps (44), mountain (41), road (143), sky (291), smoke (64), snow (24), vegetation (242), water (108), where the number in brackets indicates the number of annotation samples of that concept. We again used the TRECVID 2005 common annotation effort as a basis for selecting relevant shots containing the proto-concepts. In those shots, we annotated rectangular regions where the proto-concept is visible for at least 20 frames.

We split the training data a priori into four non-overlapping training and validation sets to prevent overfitting of classifiers. Training sets A, B, and C each contain 30% of the 2005 training data; validation set D contains the remaining 10%. We assign all shots in the training set randomly to either set A, B, C, or D.

3 Semantic Pathfinder Indexing
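The random assignment of training shots to the non-overlapping sets A, B, C (30% each) and D (10%) can be sketched as follows. The function name, the fixed seed, and the shuffle-then-cut strategy are assumptions for illustration; the text only states that shots are assigned randomly in these proportions.

```python
import random


def split_shots(shot_ids, seed=0):
    """Randomly assign shots to non-overlapping sets A, B, C (30% each) and D (10%)."""
    shuffled = list(shot_ids)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Cumulative cut points at 30%, 60%, and 90% of the shuffled list.
    cut_a, cut_b, cut_c = (round(n * f) for f in (0.3, 0.6, 0.9))
    return {
        "A": shuffled[:cut_a],
        "B": shuffled[cut_a:cut_b],
        "C": shuffled[cut_b:cut_c],
        "D": shuffled[cut_c:],
    }


# Hypothetical shot identifiers 0..999 standing in for the 2005 training shots.
sets = split_shots(range(1000))
sizes = {name: len(shots) for name, shots in sets.items()}
```

Because every shot appears exactly once in the shuffled list, the four sets are disjoint by construction, so classifiers trained on A, B, or C never see the shots held out in D.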
The central assumption in our semantic indexing architecture is that any broadcast video is the result of an authoring process. When we want to extract semantics from a digital broadcast video this authoring process needs to be reversed. For authoring-driven analysis we proposed the semantic pathfinder [30]. The semantic pathfinder is composed of three analysis steps. It follows the reverse authoring process. Each analysis step in the path detects semantic concepts. In addition, one can exploit the output of an analysis step in the path as the input for the next one. The semantic pathfinder starts in the content analysis step. In this analysis step, we follow a data-driven approach of indexing semantics. The style analysis step is the second analysis step. Here we tackle the indexing problem by viewing a video from the perspective of production. This analysis step aids especially in indexing of rich semantics. Finally, to enhance the indexes further, in the context analysis step, we view semantics in context. One would expect that some concepts, like vegetation, have their emphasis on content where the style (of the camera work that is) and context (of