바로가기메뉴

본문 바로가기 주메뉴 바로가기

ACOMS+ 및 학술지 리포지터리 설명회

  • 한국과학기술정보연구원(KISTI) 서울분원 대회의실(별관 3층)
  • 2024년 07월 03일(수) 13:30
 

logo

A Study on Effective Internet Data Extraction through Layout Detection

INTERNATIONAL JOURNAL OF CONTENTS / INTERNATIONAL JOURNAL OF CONTENTS, (P)1738-6764; (E)2093-7504
2005, v.1 no.2, pp.5-9
Sun Bok-Keun (Dept. Computer Engineering, Asan, Hoseo University)
Han Kwang-Rok (Dept. Computer Engineering, Asan, Hoseo University)

Abstract

Currently most Internet documents including data are made based on predefined templates, but templates are usually formed only for main data and are not helpful for information retrieval against indexes, advertisements, header data etc. Templates in such forms are not appropriate when Internet documents are used as data for information retrieval. In order to process Internet documents in various areas of information retrieval, it is necessary to detect additional information such as advertisements and page indexes. Thus this study proposes a method of detecting the layout of Web pages by identifying the characteristics and structure of block tags that affect the layout of Web pages and calculating distances between Web pages. This method is purposed to reduce the cost of Web document automatic processing and improve processing efficiency by providing information about the structure of Web pages using templates through applying the method to information retrieval such as data extraction.

keywords
Information Retrieval, Data Extraction, Layout, HTML, XML Technologies

INTERNATIONAL JOURNAL OF CONTENTS