Abstract The MediaMill TRECVID 2005 Semantic Video Search En(6)

Date: 2026-01-23

The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM), we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information.

algorithm [25]. To segment the auditory layout, periods of speech and silence are detected based on the provided automatic speech recognition results. We obtain a voice-over detector by combining the speech segmentation with the camera shot segmentation [31]. The set of layout features is thus given by: L = {shot length, overlayed text, silence, voice-over}.

As concerns the content C, a frontal face detector [27] is applied to detect people. We count the number of faces, and for each face its location is derived [31]. In addition, we measure the average amount of object motion in a camera shot [29]. Based on provided speaker identification we identify each of the three most frequent speakers. Each camera shot is checked for presence of speech from one of the three [31]. We also exploit the provided named entity recognition. The set of content features is thus given by: C = {faces, face location, object motion, frequent speaker, voice named entity}.

For capture T, we compute the camera distance from the size of detected faces [27, 31]. It is undefined when no face is detected. In addition to camera distance, several types of camera work are detected [2], e.g. pan, tilt, zoom, and so on. Finally, for capture we also estimate the amount of camera motion [2]. The set of capture features is thus given by: T = {camera distance, camera work, camera motion}.

The context S serves to enhance or reduce the correlation between semantic concepts. Detection of vegetation can aid in the detection of a forest for example. Likewise, the co-occurrence of a space shuttle and a bicycle in one shot is improbable. As the performance of semantic concept detectors is unknown and likely to vary between concepts, we exploit iteration to add them to the context. The rationale here is to add concepts that are relatively easy to detect first. They aid in detection performance by increasing the number of true positives or reducing the number of false positives. To prevent bias from domain knowledge, we use the performance on validation set D of all concepts from $\Lambda_S$ in the content analysis step as the ordering for the context. To assign detection results for the first and least difficult concept, we rank all shot results on $p_i(\omega_1 \mid \vec{m}_i)$. This ranking is then exploited to categorize results for $\omega_1$ into one of five levels. The basic set of context features is thus given by: S = {content analysis step $\omega_1$}.
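The rank-based categorization into five levels can be illustrated with a minimal sketch. The text does not define where the level boundaries lie, so the equal-sized rank bins used here are an assumption, and the function name `rank_to_levels` is hypothetical.

```python
def rank_to_levels(scores, n_levels=5):
    """Map each shot's probability to a level 0..n_levels-1 by rank.

    Assumption: levels are equal-sized rank bins. The source only says
    "one of five levels" and does not specify the bin boundaries.
    """
    n = len(scores)
    # Shot indices sorted from highest to lowest probability.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    levels = [0] * n
    for rank, shot in enumerate(order):
        # Level 0 holds the top-ranked shots, the last level the lowest-ranked.
        levels[shot] = rank * n_levels // n
    return levels


# Ten shots with hypothetical detector probabilities for one concept.
shot_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.95, 0.2, 0.6, 0.4, 0.8]
shot_levels = rank_to_levels(shot_scores)
```

Using the rank rather than the raw probability makes the resulting feature insensitive to how well calibrated the individual concept detector is.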

The concatenation of {L, C, T, S} for shot i yields style vector $\vec{s}_i$. This vector forms the input for an iterative classifier [31] that trains a style model for each concept in lexicon $\Lambda_S$. We classify all $\omega$ in $\Lambda_S$ again in the style analysis step. We use 3-fold cross validation with 3 repetitions on training set B to optimize parameter settings in this analysis step. We use the resulting probability as output for concept detection in the style analysis step.
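As a rough illustration of the two mechanical parts of this step, the sketch below concatenates the four feature groups into one vector and generates the index splits for 3-fold cross validation with 3 repetitions. It assumes the features are already encoded as numbers and that each repetition reshuffles the data; neither detail is stated in the text, and the function names are hypothetical. The iterative classifier of [31] is not reproduced here.

```python
import random


def style_vector(layout, content, capture, context):
    """Concatenate the feature groups {L, C, T, S} for one shot."""
    return list(layout) + list(content) + list(capture) + list(context)


def repeated_kfold(n_items, k=3, repetitions=3, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold CV, repeated.

    Assumption: every repetition uses a fresh random shuffle.
    """
    rng = random.Random(seed)
    for _ in range(repetitions):
        idx = list(range(n_items))
        rng.shuffle(idx)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


# Hypothetical numeric encodings of L, C, T, S for a single shot.
s_i = style_vector([4.2, 1, 0, 0], [2, 0.3, 0.1, 1, 0], [0.5, 1, 0.2], [3])
splits = list(repeated_kfold(n_items=9))
```

Each candidate parameter setting would then be scored on all nine (train, test) pairs and the setting with the best mean score kept.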

Figure 7: Feature extraction and classification in the context analysis step, special case of Fig. 2.

Both the content analysis step and the style analysis step yield a probability for each shot i and all concepts $\omega$ in $\Lambda_S$. The probability indicates whether a concept is present. We fuse these semantic features of an analysis step for a shot i into a context vector, see Fig. 7.

We consider three paths in the context analysis step. The first path stems directly from the content analysis step. We fuse the 101 $p_i(\omega \mid \vec{m}_i)$ concept scores into context vector $\vec{d}_i$. The second path stems from the style analysis step where we fuse the 101 $p_i(\omega \mid \vec{s}_i)$ scores into context vector $\vec{p}_i$. The third path selects the best performer on validation set D from either content analysis step or style analysis step. These best performers are fused in context vector $\vec{b}_i$.

From these three vectors we learn relations between concepts automatically. To that end the vectors serve as the input for a supervised learning module, which associates a contextual probability $p_i(\omega \mid \vec{c}_i)$ to a shot i for all $\omega$ in $\Lambda_S$, where $\vec{c}_i \in \{\vec{d}_i, \vec{p}_i, \vec{b}_i\}$. To optimize parameter settings, we use 3-fold cross validation with 3 repetitions on the previously unused data from training set C.
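The construction of the three context vectors for one shot can be sketched as follows. This is a minimal illustration, assuming per-concept scores are held in dictionaries and that ties on validation performance fall back to the content score; the function and argument names are hypothetical, and only three of the 101 concepts are shown.

```python
def build_context_vectors(content_scores, style_scores, val_ap_content, val_ap_style):
    """Return the context vectors (d, p, b) for one shot.

    content_scores, style_scores: concept -> probability for this shot.
    val_ap_content, val_ap_style: concept -> average precision of that
    analysis step on validation set D.
    """
    concepts = sorted(content_scores)  # fixed concept order for all vectors
    d = [content_scores[c] for c in concepts]  # path 1: content analysis step
    p = [style_scores[c] for c in concepts]    # path 2: style analysis step
    # Path 3: per concept, take the score of the step that performed best
    # on the validation set (assumption: ties go to the content step).
    b = [
        content_scores[c] if val_ap_content[c] >= val_ap_style[c] else style_scores[c]
        for c in concepts
    ]
    return d, p, b


# Hypothetical scores for three concepts.
content = {"anchor": 0.2, "forest": 0.8, "sports": 0.4}
style = {"anchor": 0.9, "forest": 0.6, "sports": 0.5}
ap_content = {"anchor": 0.30, "forest": 0.55, "sports": 0.40}
ap_style = {"anchor": 0.70, "forest": 0.35, "sports": 0.40}
d_i, p_i, b_i = build_context_vectors(content, style, ap_content, ap_style)
```

Each of the three vectors would then be passed separately to the supervised learning module, yielding one contextual probability per concept and path.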

The output of the context analysis step is also the output of the entire semantic pathfinder on video documents. On the way we have included in the semantic pathfinder, the results of the analysis on raw data, facts derived from production by the use of style features, and a context perspective of the author’s intent by using semantic features. For each concept we obtain several probabilities based on (partial) content, style, and context. We select from all possibilities the one that maximizes average precision based on performance on validation set D. The semantic pathfinder provides us with the opportunity to decide whether a one-shot analysis step is best for the concept only concentrating on (visual) content, or a two-analysis step classifier increasing discriminatory power by adding production style to content, or that a concept profits most from a consecutive analysis on content, style, and context level.
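The per-concept selection can be sketched as below. The sketch uses the common non-interpolated definition of average precision (the mean of the precision values at the ranks of the relevant shots); the exact variant used in the evaluation, such as any cut-off on the result list, is not given in this section, so treat that choice and the function names as assumptions.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision of a ranking.

    scores: per-shot probabilities; labels: 1 if the concept is present.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    return precision_sum / hits if hits else 0.0


def select_best_path(path_scores, labels):
    """Pick the analysis path with the highest AP on the validation shots."""
    return max(path_scores, key=lambda name: average_precision(path_scores[name], labels))


# Hypothetical validation-set scores of one concept for three paths.
labels = [1, 0, 1, 0]
paths = {
    "content": [0.9, 0.8, 0.7, 0.1],
    "style": [0.9, 0.2, 0.8, 0.1],
    "context": [0.1, 0.9, 0.2, 0.8],
}
best = select_best_path(paths, labels)
```

In this toy case the style path ranks both relevant shots first and is therefore selected; for another concept the content or context path could win instead.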

3.3 Context Analysis Step

The context analysis step adds context to our interpretation of the video. Our ultimate aim is the reconstruction of the author’s intent by considering detected concepts in context.

3.4 Experiments

We traversed the entire semantic pathfinder for all 101 concepts. The average precision performance of the semantic pathfinder and its sub-systems, on validation set D, is shown in Fig. 8.

We evaluated for each concept four analysis strategies in the content analysis step: text-only, visual-only, early fu-
