The MediaMill TRECVID 2005 Semantic Video Search Engine
C.G.M. Snoek, J.C. van Gemert, J.M. Geusebroek, B. Huurnink, D.C. Koelma, G.P. Nguyen,
O. de Rooij, F.J. Seinstra, A.W.M. Smeulders, C.J. Veenman, M. Worring
Intelligent Systems Lab Amsterdam, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
http://www.mediamill.nl
Abstract
In this paper we describe our TRECVID 2005 experiments. The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information. Experiments indicate that average precision results increase drastically, especially for pan (+51%) and tilt (+28%). For concept detection we propose a generic approach using our semantic pathfinder. The most important novelty compared to last year's system is the improved visual analysis using proto-concepts based on Wiccest features. In addition, the path selection mechanism was extended. Based on the semantic pathfinder architecture we are currently able to detect an unprecedented lexicon of 101 semantic concepts in a generic fashion. We performed a large set of experiments (runid: BvA). The results show that an optimal strategy for generic multimedia analysis is one that learns from the training set on a per-concept basis which tactic to follow. Experiments also indicate that our visual analysis approach is highly promising. The lexicon of 101 semantic concepts forms the basis for our search experiments (runid: B...). We participated in automatic, manual (using only visual information), and interactive search. The lexicon-driven retrieval paradigm aids substantially in all search tasks. When coupled with interaction, exploiting several novel browsing schemes of our semantic video search engine, results are excellent. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. We exploited the technology developed for the above tasks to explore the BBC rushes. The most intriguing result is that, of the 101 visual-only models in the lexicon trained on news data, 25 concepts also perform reasonably well on BBC data.
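The results above are stated in terms of average precision and mean average precision. As a reminder of what these measures compute, the following is a minimal Python sketch of non-interpolated average precision over a ranked list of shots. The function names and the toy relevance lists are illustrative assumptions and are not part of the MediaMill system; the official TRECVID evaluation tooling may differ in details such as the depth at which result lists are cut off.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance: sequence of booleans, True where the shot returned
        at that rank is relevant to the concept or topic.
    total_relevant: number of relevant shots in the whole collection
        (hypothetical parameter); defaults to the relevant shots found
        in the list itself.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank of each relevant shot.
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-topic (or per-concept) average precision values."""
    scores = [average_precision(r) for r in ranked_lists]
    return sum(scores) / len(scores) if scores else 0.0
```

For a toy list `[True, False, True, False]`, the relevant shots sit at ranks 1 and 3, so the score is (1/1 + 2/3) / 2. Passing `total_relevant` penalizes a run for relevant shots it never returned.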
1 Introduction
Despite the emergence of commercial video search engines, such as Google [9] and Blinkx [3], multimedia retrieval is by no means a solved problem. In fact, present-day video search engines rely mainly on text, in the form of closed captions [9] or transcribed speech [3], for retrieval. This results in disappointing performance when the visual content is not reflected in the associated text. In addition, when the videos originate from non-English speaking countries, such as China or The Netherlands, querying the content becomes even harder as automatic speech recognition results are much poorer. For videos from these sources, an additional visual analysis potentially yields more robustness. For effective video retrieval there is a need for multimedia analysis, in which text retrieval is an important factor, but not the decisive element.
We advocate that the ideal multimedia retrieval system should first learn a large lexicon of concepts, based on multimedia analysis, to be used for the initial search. Then, the ideal system should employ similarity and interaction to refine the search until satisfaction. We propose a multimedia retrieval paradigm built on three principles: learning of a lexicon of semantic concepts, multimedia data similarity, and user interaction. Within the proposed paradigm, we explore the combination of query-by-concept, query-by-similarity, and interactive filtering using advanced visualizations of the MediaMill semantic video search engine. To demonstrate the effectiveness of our multimedia retrieval paradigm, several components are evaluated within the 2005 NIST TRECVID video retrieval benchmark [16].
The organization of this paper is as follows. First, we discuss our general learning architecture and data preparation steps. Our system architecture for generic semantic indexing is presented in Section 3. We describe our approach for camera work indexing in Section 4. Our multimedia retrieval paradigm is presented in Section 5. Our explorative work on BBC rushes is addressed in Section 6.
2 Preliminaries
The MediaMill semantic video search engine exploits a common architecture with a standardized input-output model to allow for semantic integration. The conventions to describe the modular system architecture are indicated in Fig. 1.
2.1 General Learning Architecture
We view video indexing as a pattern recognition problem. We first need to segment a video. We opt for camera shots [18], indicated by i, following the standard in TRECVID evaluations. Given pattern x, part of a shot,