Structure analysis of soccer video with Hidden Markov Models
Date: 2025-04-20
STRUCTURE ANALYSIS OF SOCCER VIDEO WITH HIDDEN MARKOV MODELS
Lexing Xie, Shih-Fu Chang
Department of Electrical Engineering, Columbia University, New York, NY {xlx, sfchang}@ee.columbia.edu
Ajay Divakaran, Huifang Sun
Mitsubishi Electric Research Lab, Murray Hill, NJ {ajayd, hsun}@merl.com
ABSTRACT
In this paper, we present algorithms for parsing the structure of produced soccer programs. The problem is important in the context of a personalized video streaming and browsing system. While prior work focuses on the detection of special events such as goals or corner kicks, this paper is concerned with generic structural elements of the game. We begin by defining two mutually exclusive states of the game, play and break, based on the rules of soccer. We select a domain-tuned feature set, dominant color ratio and motion intensity, based on the special syntax and content characteristics of soccer videos. Each state of the game has a stochastic structure that is modeled with a set of hidden Markov models. Finally, standard dynamic programming techniques are used to obtain the maximum likelihood segmentation of the game into the two states. In extensive tests over diverse data sets, the system achieves 83.5% classification accuracy with good boundary timing.
1. INTRODUCTION
In this paper, we present new algorithms for soccer video structure analysis. The problem is useful for automatic content filtering for soccer fans and professionals, and it is of further interest in the broader context of video structure analysis and content understanding. By structure, we mean primarily the temporal sequence of high-level game states, namely play and break, and the goal of this paper is to parse the continuous video stream into an alternating sequence of the two states automatically. This approach is distinct from existing work, most of which focuses on the detection of domain-specific events. Parsing structure separately from event detection has two advantages: (1) typically no more than 60% of content corresponds to play, so we can achieve significant information reduction; (2) content characteristics in play and break are different, so we can optimize event detectors with this prior knowledge.
Related work in the literature lies mainly in sports video analysis, including soccer and various other games, and in general video segmentation. For soccer video, prior work has addressed shot classification [2], scene reconstruction [8], and rule-based semantic classification [6]. For other sports video, supervised learning was used in [9] to recognize canonical views such as baseball pitching and tennis serve. For general video classification, hidden Markov models (HMMs) are used in [3] to distinguish different types of programs such as news, commercials, etc. Our previous work [7] built heuristic rules using a domain-specific feature, dominant color ratio, to segment play and break. The work presented in this paper focuses on two specific aspects that were not investigated in the previous work: (1) using formal statistical techniques to model domain-specific syntactic constraints rather than constructing heuristic rules directly; (2) using simple but effective features to capture the content syntax.
We first define play and break as the set of soccer semantic alphabets used in this paper, and then we select two features based on observations of soccer video syntax: dominant color ratio and motion intensity. The stochastic structure within a play or a break is modeled with a set of HMMs, and the transition among these HMMs is captured with dynamic programming. Average classification accuracy per segment is above 80%, and most of the play/break boundaries are correctly detected within a 3-second offset.
Section 2 presents relevant observations of soccer video syntax and the selection of features; section 3 includes algorithms for HMM training and classification; section 4 describes our experiments and results in greater detail; section 5 concludes the paper.
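As an illustration of the dynamic programming step mentioned above, the following is a minimal sketch of generic two-state Viterbi decoding. It is not the formulation used in this paper, whose details belong to section 3. It assumes that some set of HMMs has already produced a log-likelihood for each short video window under each state; the function name segment_play_break, the transition probabilities, and the toy scores are all hypothetical.

```python
import numpy as np

STATES = ("play", "break")

def segment_play_break(loglik, log_trans, log_prior):
    """Generic Viterbi decoding over two game states (illustrative only).

    loglik[t, s]    : log-likelihood of window t under state s (assumed given)
    log_trans[i, j] : log P(state j at window t+1 | state i at window t)
    log_prior[s]    : log P(state s at the first window)
    """
    loglik = np.asarray(loglik, dtype=float)
    n_windows, n_states = loglik.shape
    score = np.empty((n_windows, n_states))
    back = np.zeros((n_windows, n_states), dtype=int)

    score[0] = log_prior + loglik[0]
    for t in range(1, n_windows):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + loglik[t]

    # Trace the best path backwards from the best final state
    path = np.empty(n_windows, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(n_windows - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return [STATES[s] for s in path]

# Toy scores (hypothetical): window 2 weakly favors "break" on its own
toy_loglik = [[-1, -3], [-1, -3], [-2, -1.5], [-1, -3],
              [-3, -1], [-3, -1], [-3, -1]]
toy_trans = np.log([[0.9, 0.1], [0.1, 0.9]])  # assumed "sticky" transitions
toy_prior = np.log([0.5, 0.5])
labels = segment_play_break(toy_loglik, toy_trans, toy_prior)
print(labels)
```

In this toy input, choosing the best state for each window independently would label window 2 as break, while the decoded path keeps it in play because a switch there and back costs more than the small likelihood gain. This is the kind of temporal smoothing that a transition model provides.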
2. VIDEO SYNTAX AND FEATURE SELECTION
2.1 Soccer game semantics
We define the set of mutually exclusive and complete semantic states in a soccer game: play and break [5]. The game is in play when the ball is in the field and the game is going on; break, or out of play, is the complement set, i.e. whenever “the ball has completely crossed the goal line or touch line, whether on the ground or in the air” or “the game has been halted by the referee”.
Segmenting a soccer video into play/break is hard for two reasons: (1) the absence of a canonical scene (such as the serve scene in tennis or the pitch scene in baseball video [9]); (2) the loose temporal structure, i.e. play/break transitions and highlights of a game (goal, corner kick, shot, etc.) do not have a deterministic relationship with other perceivable events (unlike tennis, where volleys are always preceded by a serve). Yet identifying play/break is worthwhile: not only can we achieve about 40% information reduction (Table 1), but play/break information also has potential applications such as play-by-play browsing and editing, or play-break game statistics analysis.
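To make the browsing and statistics applications concrete, here is a small sketch of how a per-window label sequence could be turned into play/break segments and a break fraction. The helper names to_segments and break_fraction, the one-second window length, and the toy label sequence are assumptions for illustration only; they are not taken from the paper, and the toy numbers are not experimental results.

```python
from itertools import groupby

def to_segments(labels, seconds_per_label=1.0):
    """Collapse consecutive identical labels into (state, start, end) tuples."""
    segments = []
    start = 0
    for state, run in groupby(labels):
        length = len(list(run))
        segments.append((state,
                         start * seconds_per_label,
                         (start + length) * seconds_per_label))
        start += length
    return segments

def break_fraction(segments):
    """Fraction of total duration spent in the "break" state."""
    total = sum(end - begin for _, begin, end in segments)
    in_break = sum(end - begin for state, begin, end in segments
                   if state == "break")
    return in_break / total if total else 0.0

# Toy label sequence (hypothetical): 6 windows of play, then 4 of break
toy_labels = ["play"] * 6 + ["break"] * 4
toy_segments = to_segments(toy_labels)
print(toy_segments)
print(break_fraction(toy_segments))
```

A browsing tool could keep only the play segments from such a list, and the break fraction is the share of the program that would be skipped.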
2.2 Soccer video syntax
Soccer video syntax refers to the typical production style and editing patterns that help the viewer understand and appreciate the game. Two major factors influencing the syntax are the producer and the game itself, and the purpose of syntax is to emphasize the events as well as to attract viewers’ attention (such as the use of cutaways). Specifically, soccer video syntax can be characterized ……