Abstract The MediaMill TRECVID 2005 Semantic Video Search En(9)

Date: 2025-05-05

The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information.

Table 2: Validation set average precision performance for 3 types of camera work using several versions of our camera work detector.

                                 Pan     Tilt    Zoom    MAP
Late Fusion                      0.862   0.786   0.862   0.837
Late Fusion + Selected Context   0.859   0.752   0.866   0.826
Late Fusion + Context            0.856   0.656   0.856   0.789
Early Fusion                     0.703   0.558   0.783   0.681
Global                           0.569   0.613   0.813   0.665
Global + Context                 0.591   0.562   0.792   0.648
Early Fusion + Context           0.616   0.461   0.765   0.614

The results of our visual-run reflect the importance of visual analysis. For four concepts (explosion, US flag, building, car) we outperform the pathfinder system. This improvement might be attributed to the use of improved visual features and to the fact that we use the entire training set in SVM-training. However, since the visual analysis step is embedded in the pathfinder system, the visual analysis should never perform better. Therefore we believe that results of the pathfinder system will improve when the new features are included.

4 Camera Work

For the detection of camera work we start with an existing implementation based on spatiotemporal image analysis [34, 12]. Given a set of global intensity images from shot i, the algorithm first extracts spatiotemporal images. On these images a direction analysis is applied to estimate direction parameters. These parameters form the input for a supervised learning module to learn three types of camera work. We modified the algorithm in various ways. We superimposed a tessellation of 8 regions on each input frame to decrease the effect of local disturbances. Parameters thus obtained are exploited using an early fusion and a late fusion approach. In addition we explored whether the 101 concept scores obtained from the semantic pathfinder aid in detection of camera work.
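The tessellation and the two fusion schemes can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the text: the 2 x 4 layout of the 8 regions, the function names, and the use of plain averaging to combine per-region scores in late fusion. The direction analysis and the supervised learner are left out; the sketch only shows how region-level parameters or scores would be arranged for each scheme.

```python
from typing import List, Sequence

Frame = List[List[float]]  # intensity image as rows of pixel values


def tessellate(frame: Frame, rows: int = 2, cols: int = 4) -> List[Frame]:
    """Split a frame into rows*cols rectangular regions, in row-major order.

    The 2 x 4 layout is an assumption; the text only states 8 regions.
    """
    height, width = len(frame), len(frame[0])
    regions = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            regions.append([row[left:right] for row in frame[top:bottom]])
    return regions


def early_fusion(region_features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-region parameters into one vector for a single learner."""
    return [value for features in region_features for value in features]


def late_fusion(region_scores: Sequence[float]) -> float:
    """Combine the scores of per-region detectors (averaging is an assumption)."""
    return sum(region_scores) / len(region_scores)


# Toy 4 x 8 frame whose pixel value encodes its position.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
regions = tessellate(frame)
```

In this arrangement early fusion trains one classifier on the concatenated vector, whereas late fusion trains one detector per region and merges their outputs afterwards.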

4.1 Experiments

Experiments on validation set D indicate that average precision results increase drastically, especially for pan (+51%) and tilt (+28%), see Table 2. The best approach is a late fusion scheme without the usage of context. Relative to other participants we performed quite well in precision, but quite poorly in terms of recall. Results indicate that the base detector is too conservative. However, it also shows that any global image based camera work detector has the potential to profit from a tessellation of region-based detectors.
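The quoted gains can be checked against the values in Table 2. The short sketch below assumes that the baseline is the Global row and the improved detector is the Late Fusion row, which reproduces the +51% and +28% figures, and that MAP is the unweighted mean over the three camera work types. The function names are illustrative.

```python
# Average precision values copied from Table 2 (pan, tilt, zoom).
GLOBAL = {"pan": 0.569, "tilt": 0.613, "zoom": 0.813}
LATE_FUSION = {"pan": 0.862, "tilt": 0.786, "zoom": 0.862}


def relative_gain_percent(baseline: float, improved: float) -> int:
    """Relative improvement over the baseline, rounded to a whole percent."""
    return round(100 * (improved - baseline) / baseline)


def mean_average_precision(scores: dict) -> float:
    """Unweighted mean of the per-type average precision, to 3 decimals."""
    return round(sum(scores.values()) / len(scores), 3)


gains = {kind: relative_gain_percent(GLOBAL[kind], LATE_FUSION[kind])
         for kind in GLOBAL}
print(gains)  # {'pan': 51, 'tilt': 28, 'zoom': 6}
print(mean_average_precision(GLOBAL), mean_average_precision(LATE_FUSION))  # 0.665 0.837
```

Zoom gains far less than pan and tilt under this reading, which is consistent with the text singling out pan and tilt.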

5 Lexicon-driven Retrieval

We propose a lexicon-driven retrieval paradigm to equip users with semantic access to multimedia archives. The aim is to retrieve from a multimedia archive S, which is composed of n unique shots {s_1, s_2, ..., s_n}, the best possible answer set in response to a user information need. To that end, we use the 101 concepts in the lexicon as well as the 3 types of camera work for our automatic, manual, and interactive search systems.

5.1 Automatic Search

Our automatic search engine uses only topic text as input [10], as we postulate that it is unreasonable to expect a user to provide a video search system with example videos in a real world scenario. We rely purely on text and the lexicon of 101 semantic concept detectors that we have developed using the semantic pathfinder, see Section 3, to search through the video collection. We developed our search system using the video data, topics, and ground truths from the 2003 and 2004 TRECVID evaluations as a training set.

5.1.1 Indexing Components

Our automatic search system incorporates regular TFIDF-based indices for standard retrieval using the bfx-bfx [24] formula, Latent Semantic Indexing [5] for text retrieval with implicit query expansion, and the 101 different semantic concept indices for query-by-concept. Each index was matched to one or more concepts, or synsets, in the WordNet [13] lexical database on an individual basis, according to whether the concept directly matches the content of the detectors. For example, the detector for the concept baseball finds shots of baseball games, and these shots invariably include baseball players, baseball equipment, and a baseball diamond, so these concepts are also matched. Additional synsets are added to WordNet for semantic concepts that do not have a direct WordNet equivalent.

5.1.2 Automatic Query Interface Selection

We perform the standard stopping and stemming procedures on the topic text (using the SMART stop list [23] with the addition of the words find and shots; and the Porter stemming algorithm [20] respectively). In addition, we perform part-of-speech tagging and chunking using the TreeTagger [26]. This grammatical information is used to identify two different query categorizations: complex vs. simple queries and general vs. specific queries. Any topic containing more than one noun chunk is classified as complex, as it refers to more than one object, while requests containing only a single noun chunk are classified as simple. If a request contains a name (a proper noun) it refers to a specific object, rather than a general category, so we categorize all requests containing proper nouns as specific requests, and all others as general requests.
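The two categorization rules can be sketched as follows. This is a minimal illustration under several assumptions: the topic has already been tagged and chunked, so the input is a list of noun chunks with (word, tag) pairs; the proper-noun tag names depend on the tagger and are guesses here; the stop list is a tiny stand-in for the SMART list plus the two added words; and the example topics are made up. Stemming is omitted.

```python
from typing import List, Tuple

Token = Tuple[str, str]   # (word, part-of-speech tag)
NounChunk = List[Token]

# Stand-in stop list (NOT the SMART list), with "find" and "shots" added.
STOP_WORDS = {"of", "the", "a", "an", "with", "find", "shots"}

# Assumed proper-noun tags; the real tag set depends on the tagger used.
PROPER_NOUN_TAGS = {"NP", "NPS", "NNP", "NNPS"}


def remove_stop_words(words: List[str]) -> List[str]:
    """Drop stop words, comparing case-insensitively."""
    return [word for word in words if word.lower() not in STOP_WORDS]


def classify_topic(noun_chunks: List[NounChunk]) -> Tuple[str, str]:
    """Return (complex|simple, specific|general) for a chunked topic.

    A topic with no noun chunks falls through to "simple"; the text does
    not cover that case.
    """
    complexity = "complex" if len(noun_chunks) > 1 else "simple"
    has_proper_noun = any(
        tag in PROPER_NOUN_TAGS for chunk in noun_chunks for _, tag in chunk
    )
    specificity = "specific" if has_proper_noun else "general"
    return complexity, specificity


# Hypothetical topics, already tagged and chunked.
one_building = [[("tall", "JJ"), ("building", "NN")]]
person_and_flag = [[("Jane", "NP"), ("Doe", "NP")], [("flag", "NN")]]
```

With these inputs, `classify_topic(one_building)` yields a simple, general query, while `classify_topic(person_and_flag)` yields a complex, specific one.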

Subsequently, we extract the WordNet words in the topic text through dictionary lookup of noun chunks and nouns. We identify the correct synset for WordNet words with multiple meanings through disambiguation. We evaluated
