Date: 2026-01-23

The MediaMill TRECVID 2005 Semantic Video Search Engine

C.G.M. Snoek, J.C. van Gemert, J.M. Geusebroek, B. Huurnink, D.C. Koelma, G.P. Nguyen,

O. de Rooij, F.J. Seinstra, A.W.M. Smeulders, C.J. Veenman, M. Worring

Intelligent Systems Lab Amsterdam, University of Amsterdam

Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

http://www.mediamill.nl

Abstract

In this paper we describe our TRECVID 2005 experiments. The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information. Experiments indicate that average precision results increase drastically, especially for pan (+51%) and tilt (+28%). For concept detection we propose a generic approach using our semantic pathfinder. The most important novelty compared to last year's system is the improved visual analysis using proto-concepts based on Wiccest features. In addition, the path selection mechanism was extended. Based on the semantic pathfinder architecture we are currently able to detect an unprecedented lexicon of 101 semantic concepts in a generic fashion. We performed a large set of experiments (runid: BvA). The results show that an optimal strategy for generic multimedia analysis is one that learns from the training set, on a per-concept basis, which tactic to follow. Experiments also indicate that our visual analysis approach is highly promising. The lexicon of 101 semantic concepts forms the basis for our search experiments (runid: B…). We participated in automatic, manual (using only visual information), and interactive search. The lexicon-driven retrieval paradigm aids substantially in all search tasks. When coupled with interaction, exploiting several novel browsing schemes of our semantic video search engine, results are excellent. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. We exploited the technology developed for the above tasks to explore the BBC rushes. The most intriguing result is that, from the lexicon of 101 visual-only models trained for news data, 25 concepts also perform reasonably well on BBC data.
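For readers unfamiliar with the measures quoted above, the following is a minimal sketch of non-interpolated average precision and its mean over several topics. It assumes binary relevance judgments and a fully ranked result list; the function names are hypothetical, and the official TRECVID scoring procedure (including any cut-off on the length of the ranked list) may differ in detail.

```python
from typing import Iterable, Sequence, Tuple


def average_precision(relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision of one ranked result list.

    relevance[k] is True when the item at rank k + 1 is relevant.
    total_relevant is the number of relevant items in the whole collection
    (assumed known from the ground truth).
    """
    if total_relevant <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / total_relevant


def mean_average_precision(runs: Iterable[Tuple[Sequence[bool], int]]) -> float:
    """Mean of the per-topic average precision values."""
    scores = [average_precision(rel, n) for rel, n in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

With this definition, a ranked list whose relevant items sit at ranks 1 and 3, out of two relevant items in total, scores (1/1 + 2/3) / 2.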

1 Introduction

Despite the emergence of commercial video search engines, such as Google [9] and Blinkx [3], multimedia retrieval is by no means a solved problem. In fact, present-day video search engines rely mainly on text, in the form of closed captions [9] or transcribed speech [3], for retrieval. This results in disappointing performance when the visual content is not reflected in the associated text. In addition, when the videos originate from non-English speaking countries, such as China or The Netherlands, querying the content becomes even harder as automatic speech recognition results are much poorer. For videos from these sources, an additional visual analysis potentially yields more robustness. For effective video retrieval there is a need for multimedia analysis, in which text retrieval is an important factor, but not the decisive element. We advocate that the ideal multimedia retrieval system should first learn a large lexicon of concepts, based on multimedia analysis, to be used for the initial search. Then, the ideal system should employ similarity and interaction to refine the search until satisfaction. We propose a multimedia retrieval paradigm built on three principles: learning of a lexicon of semantic concepts, multimedia data similarity, and user interaction. Within the proposed paradigm, we explore the combination of query-by-concept, query-by-similarity, and interactive filtering using advanced visualizations of the MediaMill semantic video search engine. To demonstrate the effectiveness of our multimedia retrieval paradigm, several components are evaluated within the 2005 NIST TRECVID video retrieval benchmark [16].

The organization of this paper is as follows. First, we discuss our general learning architecture and data preparation steps. Our system architecture for generic semantic indexing is presented in Section 3. We describe our approach for camera work indexing in Section 4. Our multimedia retrieval paradigm is presented in Section 5. Our explorative work on BBC rushes is addressed in Section 6.

2 Preliminaries

The MediaMill semantic video search engine exploits a common architecture with a standardized input-output model to allow for semantic integration. The conventions to describe the modular system architecture are indicated in Fig. 1.

2.1 General Learning Architecture

We perceive video indexing as a pattern recognition problem. We first need to segment a video. We opt for camera shots [18], indicated by i, following the standard in TRECVID evaluations. Given pattern x, part of a shot,
