Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2012, v.29 no.4, pp.191-208
https://doi.org/10.3743/KOSIM.2012.29.4.191

Abstract

Two fundamental aspects of speech summary generation are the extraction of key speech content and the style in which the extracted synopses are presented. We first investigated whether three acoustic features (speaking rate, pitch pattern, and intensity) are equally important and, if not, which one can be modeled most effectively to compute the significance of segments for lecture summarization. We found that intensity, measured as the difference between the maximum and minimum dB values, is the most effective feature for speech summarization. We then evaluated this intensity-based method against a keyword-based method in two respects: which method produces better speech summaries, and how similar the weights that the two methods assign to segments are. Finally, we investigated how to present speech summaries to viewers. Overall, we suggest how to efficiently extract key segments from a speech video using acoustic features and how to present the extracted segments to viewers.
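As a rough illustration of the intensity-based idea, the sketch below scores each segment by the difference between its maximum and minimum dB values and keeps the highest-scoring segments. It is not the authors' implementation. The abstract does not say how the ranges are converted into weights or how many segments are kept, so the top-k selection, the function names `intensity_range` and `top_segments`, and the sample dB values are all assumptions made for illustration. The intensity samples are assumed to have been extracted beforehand with a phonetics tool.

```python
from typing import List, Sequence


def intensity_range(db_values: Sequence[float]) -> float:
    """Return max dB minus min dB for one segment's intensity samples."""
    if not db_values:
        raise ValueError("segment has no intensity samples")
    return max(db_values) - min(db_values)


def top_segments(segments: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k segments with the largest intensity range.

    Indices are returned in their original (temporal) order so that the
    selected segments can be played back in sequence. Top-k selection is
    an assumption of this sketch, not something stated in the abstract.
    """
    scores = [intensity_range(s) for s in segments]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical intensity samples (in dB) for three segments.
segments = [
    [60.0, 62.0, 61.0],  # range 2.0
    [55.0, 70.0, 65.0],  # range 15.0
    [58.0, 66.0, 59.0],  # range 8.0
]
selected = top_segments(segments, k=2)
```

With these made-up values, the second and third segments have the widest intensity ranges, so they are the ones selected for the summary.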

Keywords
speech summarization, acoustic features, prosodic features, TED Talks, Praat, video, pitch, intensity, intrinsic evaluation, speaking rate

