
Incorporating Deep Median Networks for Arabic Document Retrieval Using Word Embeddings-Based Query Expansion

Journal of Information Science Theory and Practice, (P) 2287-9099; (E) 2287-4577
2024, v.12 no.3, pp.36-48
https://doi.org/10.1633/JISTaP.2024.12.3.3
Yasir Hadi Farhan (Department of Medical Physics, College of Applied Sciences, University of Fallujah, Fallujah, Iraq)
Mohanaad Shakir (Department of Management Information System (MIS), College of Business (COB), University of Buraimi (UOB), Buraimi, Oman)
Mustafa Abd Tareq (Department of Computer Science, University of Technology-Iraq, Baghdad, Iraq)
Boumedyen Shannaq (Department of Management Information System (MIS), College of Business (COB), University of Buraimi (UOB), Buraimi, Oman)

Abstract

Information retrieval (IR) often suffers from query-document vocabulary mismatch: the terms in a user's query differ from those used in relevant documents, which reduces search effectiveness. Automatic query expansion (AQE) techniques mitigate this problem by augmenting the query with related terms or synonyms. Word embeddings, particularly Word2Vec, have become prominent in AQE because they represent words as real-valued vectors in which semantically related words lie close together. However, AQE methods typically expand each query term individually, which can cause query drift when the added terms are not chosen carefully. To address this, this study proposes using median vectors derived from deep median networks to measure how similar candidate terms are to the query as a whole. The median vectors are integrated into candidate term generation and combined with the BM25 probabilistic model and two IR strategies (EQE1 and V2Q); in the reported experiments, this approach outperformed the baseline methods.
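The central idea above, ranking expansion candidates against one vector that summarizes the whole query instead of against each query term separately, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The element-wise median of the query-term embeddings stands in for the output of the paper's deep median network. The `EMBEDDINGS` table is a hypothetical toy vocabulary in place of trained Word2Vec vectors, and English tokens are used for readability although the paper targets Arabic text. The BM25 scorer uses common defaults (k1 = 1.2, b = 0.75) with the non-negative IDF variant found in Lucene. The function names `median_vector`, `expand_query`, and `bm25_scores` are illustrative only.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy embeddings (assumption): a real system would load
# trained Word2Vec vectors for the target (Arabic) vocabulary.
EMBEDDINGS = {
    "car":     [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "engine":  [0.7, 0.3, 0.0],
    "fruit":   [0.0, 0.9, 0.4],
    "apple":   [0.1, 0.8, 0.5],
    "repair":  [0.6, 0.1, 0.3],
}

def median_vector(vectors):
    """Element-wise median; a stand-in (assumption) for a deep median network."""
    return [statistics.median(dim) for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_query(query_terms, k=2):
    """Append the k vocabulary terms closest to the query-level median vector."""
    known = [t for t in query_terms if t in EMBEDDINGS]
    if not known:
        return list(query_terms)
    q_vec = median_vector([EMBEDDINGS[t] for t in known])
    candidates = [w for w in EMBEDDINGS if w not in query_terms]
    ranked = sorted(candidates, key=lambda w: cosine(q_vec, EMBEDDINGS[w]), reverse=True)
    return list(query_terms) + ranked[:k]

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score whitespace-tokenized documents with BM25 (Lucene-style IDF)."""
    tokenized = [d.split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

if __name__ == "__main__":
    docs = [
        "car engine repair guide",
        "fresh apple fruit market",
        "vehicle maintenance and repair",
    ]
    expanded = expand_query(["car", "repair"], k=2)
    print(expanded)
    print(bm25_scores(expanded, docs))
```

In this toy setup the expanded query picks up "vehicle", so the third document matches on two query terms instead of only "repair", while the unrelated fruit document still scores zero. With only two query terms the median equals the mean; the median starts to differ when a longer query contains an outlier term, which is one plausible reason to prefer a median over an average for resisting query drift.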

Keywords
automatic query expansion, information retrieval, word embedding, deep median networks, Arabic document retrieval, natural language processing

Submission Date
2023-10-10
Revised Date
2024-04-10
Accepted Date
2024-05-09
