Jitén Söderberg, Joakim (2007) Multidimensional hidden Markov model applied to image and video analysis. Doctorate in Multimedia, Signal and Images, Institut Eurécom, ENST, 164 p.
Official URL: http://www.eurecom.fr/people/jiten.en.htm
Abstract
Recent progress and prospects in cognitive vision, multimedia, human-computer interaction, communications and the Web call for, and can profit from, applications of advanced image and video analysis. Adaptive, robust systems are required for the analysis, indexing and summarization of large amounts of audio-visual data.
Image classification is perhaps the most important part of digital image analysis. The objective is to identify and portray the visual features occurring in an image in terms of differentiated classes or themes. Applications can be found in a wide range of domains such as medical image understanding, surveillance applications, remote sensing and interactive TV.
Traditional image classification methods analyze independent blocks of an image, which results in a context-free formalism. However, there is fairly widespread agreement that observations should be presented as collections of features that appear in a given mutual position or shape (e.g. sun in the sky, sky above landscape, or boat in the water) [20], [21]. When only the local features in a small region of an image are analyzed, it is sometimes difficult even for a human to tell what the image is about.
In this dissertation we apply a statistical machine learning approach to model context in sequential data. With a statistical model in hand, we can perform several tasks that are important to image analysis, such as estimation, classification and segmentation.
We employ a new, efficient algorithm that models images with a two-dimensional hidden Markov model (HMM). The HMM treats each observation as statistically dependent on its neighboring observations through transition probabilities organized in a Markov mesh, which gives a dependency in two dimensions. The main difficulty in applying a 2-D HMM to images is the computational complexity, which grows exponentially with the number of image blocks.
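As an illustration of what a Markov mesh dependency means, the following is one common way it is written in the 2-D HMM literature. The notation is an assumption made for this sketch and is not taken from the thesis: $s_{i,j}$ denotes the hidden state of the image block at row $i$ and column $j$, $o_{i,j}$ its observed feature vector, and $\prec$ the raster-scan order of the blocks.

```latex
% Second-order Markov mesh assumption (illustrative notation):
% the state of block (i,j) depends on all earlier blocks only
% through its upper and left neighbours.
P\bigl(s_{i,j} \mid \{ s_{k,l} : (k,l) \prec (i,j) \}\bigr)
  = P\bigl(s_{i,j} \mid s_{i-1,j},\, s_{i,j-1}\bigr)

% Each observation depends only on the state of its own block.
P\bigl(o_{i,j} \mid \{ s_{k,l} \},\, \{ o_{k,l} : (k,l) \neq (i,j) \}\bigr)
  = P\bigl(o_{i,j} \mid s_{i,j}\bigr)
```

Because every state is coupled to two predecessors at once, exact inference cannot be reduced to a single chain, which is where the exponential cost comes from.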
The main technical contribution of this thesis is a way of estimating the parameters of a 2-D HMM in O(whN^2) complexity instead of O(wN^(2h)), where N is the number of states in the model and w and h are the width and height of the image, respectively.
We investigate the performance of our proposed model (DT HMM) and search for its operating point. Application to the classification of TV broadcast frames reveals intrinsic weaknesses of HMMs, for which we propose remedies.
In an effort to introduce both global and local context in images, the DT HMM was extended to model multiple image resolutions. The results indicate that the deficiency observed earlier can be overcome and that the performance of the extended model is comparable with that of other known algorithms.
Finally, we illustrate that the DT HMM formalism is open to a great variety of extensions and research directions. Since 3-D HMMs have been little studied, we investigate the extension of the framework to three dimensions. We consider the case of video data, where two dimensions are spatial and the third is temporal. To investigate the impact of the temporal dependency, we explore the ability of the model to track objects that cross each other or pass behind a static object.
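To make the O(whN^2) figure concrete, here is a minimal Python sketch of how a likelihood can be evaluated when every block has exactly one parent, so that the dependencies form a tree. It rests on assumptions that the abstract does not spell out: each block picks either its upper or its left neighbour as parent at random, there is one transition matrix per direction (`A_v`, `A_h`), and emission probabilities are precomputed in `emis`. All function and variable names are hypothetical, and this is not presented as the thesis's implementation.

```python
import numpy as np

def random_dependency_tree(h, w, rng):
    """Pick one parent per block: 0 = upper neighbour, 1 = left neighbour.

    Border blocks have only one possible parent; the root (0, 0) gets -1.
    """
    direction = rng.integers(0, 2, size=(h, w))
    direction[0, :] = 1   # first row: only a left neighbour exists
    direction[:, 0] = 0   # first column: only an upper neighbour exists
    direction[0, 0] = -1  # root has no parent
    return direction

def tree_likelihood(emis, direction, pi, A_v, A_h):
    """Likelihood of the observations under a tree-structured 2-D HMM.

    emis[i, j, s] : probability of the observation at (i, j) given state s
    pi[s]         : initial state distribution at the root
    A_v[s, t]     : P(child state t | upper parent state s)
    A_h[s, t]     : P(child state t | left parent state s)
    """
    h, w, _ = emis.shape
    # beta[i, j, s] will hold the probability of all observations in the
    # subtree rooted at (i, j), given that block (i, j) is in state s.
    beta = emis.astype(float).copy()
    # Reverse raster order: children are always visited before parents,
    # because a parent is either above or to the left of its child.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            d = direction[i, j]
            if d == 0:    # send message to the upper parent
                beta[i - 1, j] *= A_v @ beta[i, j]
            elif d == 1:  # send message to the left parent
                beta[i, j - 1] *= A_h @ beta[i, j]
    return float(pi @ beta[0, 0])
```

Each block contributes a single N-by-N matrix-vector product, which gives the w·h·N^2 operation count. The sketch uses no scaling or log-domain arithmetic, so it would underflow on realistically sized images.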
| EPrint Type: | Thesis (Doctorate) |
|---|---|
| Thesis Supervisor: | Merialdo, Bernard |
| Date: | 14 February 2007 |
| Thesis Committee: | Richard, Gaël; Rigoll, Gerhard; Joly, Philippe; Quenot, Georges |
| Doctoral School: | ED 130 INFORMATIQUE, TELECOMMUNICATIONS ET ELECTRONIQUE (EDITE) |
| Discipline: | Multimedia, Signal and Images |
| Collection: | ENST |
| Institution: | ENST |
| Laboratory: | Institut Eurécom |
| Subjects: | 2. Information and communication sciences and technologies |
| Free keywords: | Hidden Markov Model, three-dimensional HMM, Video modeling, Object tracking, Image classification, multiresolution hidden Markov model, EM-algorithm. |
| ID Code: | 2454 |
| Deposited by: | Joakim Jiten Söderberg |
| Deposited on: | 25 June 2007 |
Table of Contents
1.1 Motivation - 1
1.2 Contribution and Outline - 2
2.1 Image Understanding - 5
2.1.1 The Meaning of the Picture and Ontologies - 6
2.1.2 Hyponomic Ontology - 6
2.1.3 Meronomic Ontology - 7
2.1.4 Notes from Cognitive Psychology - 8
2.2 Knowledge Representation and Control Strategy - 8
2.3 Applications - 9
2.4 Statistical Learning - 12
2.4.1 Concept Learning - 12
2.4.2 Classification Algorithms - 13
2.4.3 Bayesian Classifier - 14
2.4.4 Gaussian Mixture Model - 16
2.4.5 Naive Bayesian Classifier - 21
2.4.6 Dynamic Bayesian Nets - 21
2.4.7 Hidden Markov Model - 22
2.4.8 HMM and Image Modeling - 23
2.4.9 Reestimation Formulas - 24
2.5 2-D HMM - 27
2.5.1 Necessary 2-D Extensions for Image Classification - 28
2.5.2 Markov Random Field - 29
2.5.3 Markov Mesh Random Field - 31
2.5.4 Previous Work on 2-D HMM - 33
3.1 Dependency Tree - 40
3.2 Solution to Problem 1 (Evaluation Problem) - 42
3.3 Solution to Problem 2 (Decoding Problem) - 43
3.4 Solution to Problem 3 (Learning Problem) - 44
3.5 Implementation Issues for HMMs - 49
3.6 Experiment - 50
3.6.1 Context Dependent Image Categorization - 50
3.6.2 System Design - 51
3.6.3 Extracted Low-Level Features - 52
3.6.4 The Dataset - 53
3.6.5 Results - 54
3.6.6 Conclusion - 57
3.7 Combination with a Global Model - 58
3.7.1 Vector Quantization - 58
3.7.2 Global Model - 60
3.7.3 Results - 60
3.7.4 Conclusion - 63
3.8 Influence of the Dependency Tree - 64
3.8.1 DT HMM Probability Estimation - 64
3.8.2 Estimation by Average - 65
3.8.3 Estimation by Unique Sampling - 66
3.8.4 Estimation with Dual Tree - 67
3.8.5 Conclusion - 68
4.1 Semantic Image Segmentation - 69
4.1.1 Related Work - 70
4.1.2 Semantic Segmentation - 70
4.1.3 States with Semantic Labels - 71
4.1.4 Model Training - 72
4.1.5 Experiment - 74
4.1.6 Conclusion - 76
4.2 Multiresolution Hidden Markov Model - 78
4.2.1 Previous Work on 2-D MHMM - 78
4.2.2 DT MHMM - 82
4.2.3 Algorithms - 83
4.2.4 Experiment - 84
4.2.5 Conclusion - 87
5.1 Introduction - 89
5.2 3-D DT HMM - 90
5.2.1 3-D Viterbi Algorithm - 91
5.2.2 Relative Frequency Estimation - 93
5.3 Experiment - 94
5.3.1 Model Training - 94
5.3.2 Object Tracking - 96
5.3.3 Conclusion - 100
6.1 Summary and Contributions - 101
6.2 Future Work - 104
Appendix A Training Data - 107
A.1 TRECVid Archive - 107
A.2 Low-level Features - 109
Appendix B Implementation Notes - 115
Appendix C Notation and Abbreviations - 119
Bibliography - 123