Abstract The MediaMill TRECVID 2005 Semantic Video Search En(6)
Date: 2026-01-23
UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information.
algorithm [25]. To segment the auditory layout, periods of speech and silence are detected based on the provided automatic speech recognition results. We obtain a voice-over detector by combining the speech segmentation with the camera shot segmentation [31]. The set of layout features is thus given by: L = {shot length, overlayed text, silence, voice-over}.
As concerns the content C, a frontal face detector [27] is applied to detect people. We count the number of faces, and for each face its location is derived [31]. In addition, we measure the average amount of object motion in a camera shot [29]. Based on provided speaker identification we identify each of the three most frequent speakers. Each camera shot is checked for presence of speech from one of the three [31]. We also exploit the provided named entity recognition. The set of content features is thus given by: C = {faces, face location, object motion, frequent speaker, voice named entity}.
For capture T, we compute the camera distance from the size of detected faces [27, 31]. It is undefined when no face is detected. In addition to camera distance, several types of camera work are detected [2], e.g. pan, tilt, zoom, and so on. Finally, for capture we also estimate the amount of camera motion [2]. The set of capture features is thus given by: T = {camera distance, camera work, camera motion}.
The context S serves to enhance or reduce the correlation between semantic concepts. Detection of vegetation can aid in the detection of a forest for example. Likewise, the co-occurrence of a space shuttle and a bicycle in one shot is improbable. As the performance of semantic concept detectors is unknown and likely to vary between concepts, we exploit iteration to add them to the context. The rationale here is to add concepts that are relatively easy to detect first. They aid in detection performance by increasing the number of true positives or reducing the number of false positives. To prevent bias from domain knowledge, we use the performance on validation set D of all concepts from $\Lambda_S$ in the content analysis step as the ordering for the context. To assign detection results for the first and least difficult concept, we rank all shot results on $p_i(\omega_1 \mid \vec{m}_i)$. This ranking is then exploited to categorize results for $\omega_1$ into one of five levels. The basic set of context features is thus given by: S = {content analysis step $\omega_1$}.
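The rank-based categorization can be illustrated with a minimal sketch. The text does not say how the five levels are delimited, so the code below assumes equal-sized bins over the descending rank order; the function name `rank_to_levels` and the example scores are hypothetical.

```python
def rank_to_levels(scores, n_levels=5):
    """Map each shot's probability to a level in 1..n_levels by rank.

    Assumption (not stated in the paper): levels are equal-sized bins over
    the descending rank order, with level 1 holding the top-ranked shots.
    """
    n = len(scores)
    # Shot indices sorted from highest to lowest probability; ties keep input order.
    order = sorted(range(n), key=lambda i: -scores[i])
    levels = [0] * n
    for rank, shot in enumerate(order):
        levels[shot] = rank * n_levels // n + 1
    return levels


# Hypothetical probabilities p_i(omega_1 | m_i) for ten shots.
scores = [0.91, 0.15, 0.42, 0.77, 0.05, 0.63, 0.30, 0.88, 0.51, 0.22]
levels = rank_to_levels(scores)
```

Binning on rank rather than on the raw probability keeps the level sizes balanced even when a detector's scores are poorly calibrated, which is one plausible reason to rank first; other binning rules would fit the description equally well.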
The concatenation of {L, C, T, S} for shot i yields style vector $\vec{s}_i$. This vector forms the input for an iterative classifier [31] that trains a style model for each concept in lexicon $\Lambda_S$. We classify all $\omega$ in $\Lambda_S$ again in the style analysis step. We use 3-fold cross validation with 3 repetitions on training set B to optimize parameter settings in this analysis step. We use the resulting probability as output for concept detection in the style analysis step.
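As a rough illustration of the two mechanical parts of this step, the sketch below concatenates the four feature groups into one style vector and generates the splits for 3-fold cross validation with 3 repetitions. It does not implement the iterative classifier of [31]. The numeric feature encodings, the function names, and the use of plain (unstratified) random shuffling are all assumptions made for the example.

```python
import random


def style_vector(layout, content, capture, context):
    """Concatenate the four feature groups {L, C, T, S} into one flat vector."""
    return list(layout) + list(content) + list(capture) + list(context)


def repeated_kfold(n_samples, k=3, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV.

    Each repetition reshuffles the sample indices, so 3 folds x 3 repetitions
    gives 9 train/test splits per parameter setting.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, test


# Hypothetical numeric encodings of the features for a single shot.
s_i = style_vector(
    layout=[4.2, 1, 0, 1],        # shot length, overlayed text, silence, voice-over
    content=[2, 0.5, 0.1, 1, 0],  # faces, face location, object motion, frequent speaker, voice named entity
    capture=[0.3, 2, 0.7],        # camera distance, camera work, camera motion
    context=[3],                  # level of concept omega_1 from the content analysis step
)
splits = list(repeated_kfold(n_samples=12))
```

A parameter setting would then be scored by averaging a performance measure over the nine held-out folds, and the best-scoring setting retained.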
3.3 Context Analysis Step
The context analysis step adds context to our interpretation of the video. Our ultimate aim is the reconstruction of the author’s intent by considering detected concepts in context.
Figure 7: Feature extraction and classification in the context analysis step, special case of Fig. 2.
Both the content analysis step and the style analysis step yield a probability for each shot i and all concepts $\omega$ in $\Lambda_S$. The probability indicates whether a concept is present. We fuse these semantic features of an analysis step for a shot i into a context vector, see Fig. 7.
We consider three paths in the context analysis step. The first path stems directly from the content analysis step. We fuse the 101 $p_i(\omega \mid \vec{m}_i)$ concept scores into context vector $\vec{d}_i$. The second path stems from the style analysis step where we fuse the 101 $p_i(\omega \mid \vec{s}_i)$ scores into context vector $\vec{p}_i$. The third path selects the best performer on validation set D from either content analysis step or style analysis step. These best performers are fused in context vector $\vec{b}_i$.
From these three vectors we learn relations between concepts automatically. To that end the vectors serve as the input for a supervised learning module, which associates a contextual probability $p_i(\omega \mid \vec{c}_i)$ to a shot i for all $\omega$ in $\Lambda_S$, where $\vec{c}_i \in \{\vec{d}_i, \vec{p}_i, \vec{b}_i\}$. To optimize parameter settings, we use 3-fold cross validation with 3 repetitions on the previously unused data from training set C.
The output of the context analysis step is also the output of the entire semantic pathfinder on video documents. On the way we have included in the semantic pathfinder the results of the analysis on raw data, facts derived from production by the use of style features, and a context perspective of the author’s intent by using semantic features. For each concept we obtain several probabilities based on (partial) content, style, and context. We select from all possibilities the one that maximizes average precision based on performance on validation set D. The semantic pathfinder provides us with the opportunity to decide whether a one-shot analysis step is best for the concept only concentrating on (visual) content, or a two-analysis step classifier increasing discriminatory power by adding production style to content, or that a concept profits most from a consecutive analysis on content, style, and context level.
3.4 Experiments
We traversed the entire semantic pathfinder for all 101 concepts. The average precision performance of the semantic pathfinder and its sub-systems, on validation set D, is shown in Fig. 8.
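For readers unfamiliar with the measure, the following is a minimal sketch of non-interpolated average precision and of picking, per concept, the analysis path with the highest score on validation data. It is not the official TRECVID evaluation procedure, which uses its own tooling and a fixed result depth. The function names and the example rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Non-interpolated average precision of one ranked list.

    `ranked_relevance` holds True/False per shot, in ranked order.
    Assumption: every relevant shot appears in the list, so the number of
    hits is used as the normalizer.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def best_path(relevance_per_path):
    """Return the analysis path whose ranking has the highest average precision."""
    return max(
        relevance_per_path,
        key=lambda name: average_precision(relevance_per_path[name]),
    )


# Hypothetical validation rankings of one concept under three analysis paths.
rankings = {
    "content": [True, False, True, False, False],
    "style": [True, True, False, False, False],
    "context": [False, True, True, False, False],
}
chosen = best_path(rankings)
```

In this toy case the style ranking places both relevant shots first, so it would be the path selected for the concept.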
We evaluated for each concept four analysis strategies in the content analysis step: text-only, visual-only, early fu-