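This study designed an algorithm that extracts the audio information of a video and summarizes it automatically, evaluated the quality of the audio summaries built with the proposed algorithm, and proposed ways to implement efficient video summarization. The specific findings are as follows. First, in the intrinsic evaluation, the proposed audio summaries were of higher quality than the location-based audio summaries. In the user (extrinsic) evaluation, the proposed summaries outperformed the location-based summaries in summary accuracy, but the two kinds of summary showed no difference in performance on item selection. In addition, user satisfaction with audio summaries for video browsing was surveyed. Finally, based on these results, ways of applying the proposed audio summarization technique to the Internet and digital libraries were presented.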
This study proposed an algorithm for automatically summarizing the audio information of a video and then conducted an experiment to evaluate the audio extraction constructed with the proposed algorithm. The results showed, first, that the recall and precision rates of the proposed audio summarization method were higher than those of the mechanical method, in which the audio extraction was constructed based on sentence location. Second, the proposed method outperformed the mechanical method in the summary-making task, although in the gist recognition task (multiple choice) there was no statistically significant difference between the proposed and mechanical methods. In addition, the study surveyed participants' satisfaction with the use of audio extraction for video browsing and discussed the practical implications of the proposed method in Internet and digital library environments.
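As an illustration of the intrinsic measures named above, the following minimal sketch computes precision and recall for an extractive summary against a reference set of sentences. The function name `precision_recall`, the sentence indices, and the two example summaries are hypothetical and are not taken from the study's data; the study's actual evaluation procedure may differ.

```python
def precision_recall(selected, reference):
    """Return (precision, recall) of selected sentence IDs against reference IDs."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    # Precision: share of selected sentences that are in the reference summary.
    precision = overlap / len(selected) if selected else 0.0
    # Recall: share of reference sentences that were selected.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data (not results from the study).
reference = {2, 5, 7, 11}        # sentences judged important by assessors
location_based = [0, 1, 2, 3]    # mechanical baseline: the first four sentences
proposed = [2, 5, 8, 11]         # output of some content-based selection method

print("location-based:", precision_recall(location_based, reference))
print("proposed:", precision_recall(proposed, reference))
```

With equal-length summaries, as here, precision and recall coincide; they diverge when the extract and the reference set differ in size.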