Introduction
The raccoon dog, Nyctereutes procyonoides, is a member of the Canidae family with a widespread distribution across various regions, including Europe (Schwemmer et al., 2021), China (Diao et al., 2022), Vietnam (Van Pham et al., 2023), Japan (Okabe & Agetsuma, 2007), and South Korea (Jeong et al., 2017). The animal is omnivorous, consuming a variety of food sources, including small mammals, birds, reptiles, amphibians, insects, fruits, berries, and carrion (Sidorovich et al., 2008; Sutor et al., 2010). Globally, six subspecies of raccoon dog have been identified (Ellerman & Morrison-Scott, 1951), two of which occur on the Korean Peninsula. Nyctereutes procyonoides koreensis is predominantly found in the southern part of the Korean Peninsula (Hong et al., 2013, 2018; Yang et al., 2017), while smaller numbers of Nyctereutes procyonoides ussuriensis can be found in the southern and northern Hamgyong-do regions (Won, 1967).
Geospatial technologies, such as geographic information systems (GIS), are highly advantageous for understanding and investigating the distribution of habitats and their connections with environmental factors (Choi et al., 2011a; Lee & Rezaie, 2021). GIS approaches can be classified as statistical, heuristic or index-based, and artificial intelligence (AI). Statistical approaches that support data analysis include bivariate analysis and multivariate analysis. Both examine relationships between variables, but they have distinct characteristics and serve different purposes. The frequency ratio (FR) (Asmare, 2023), spatial principal component analysis (Ruymgaart, 1981), and logistic regression (Choi et al., 2011a; Lee & Sambath, 2006) represent both bivariate and multivariate approaches. In a previous study, FR was applied for habitat mapping of two polychaete species, Prionospio japonica and Prionospio pulchra, showing prediction accuracies of 77.71% and 74.87%, respectively (Choi et al., 2011b). FR can consider spatial correlation between variables, which can be beneficial when analyzing spatial data (Lee & Pradhan, 2007). The FR approach is therefore effective for identifying spatial phenomena. Weights are allocated to parameters and decision alternatives for generating distribution maps through methods including heuristic or index-based approaches (Huang et al., 2020; Yalcin et al., 2011) and AI (Lee & Rezaie, 2021).
The heuristic or index-based approaches have been applied in various scenarios to assist in decision-making processes. Their application involves the integration of various criteria or variables to evaluate and identify disasters (Quarantelli et al., 2007), landslide susceptibility (Althuwaynee et al., 2016), ground subsidence hazard mapping (Park et al., 2012), habitat suitability mapping (Ahmed et al., 2021), and forest management (Henderson & Hoganson, 2021). These methods employ a predetermined set of decision rules or criteria to allocate weights and scores to various environmental variables. The heuristic or index-based approaches have been used to integrate and prioritize criteria in multi-criteria decision-making for flood susceptibility (Khosravi et al., 2019), the analytical hierarchy process for groundwater potential mapping (Arabameri et al., 2018), and fuzzy logic for analyzing the suitability of nesting habitat (Zabihi et al., 2017). This enabled the development of maps through weighted combinations of factors. Fuzzy algorithms are beneficial for dealing with various real-world issues (Murmu & Biswas, 2015). However, the generalization capability of fuzzy logic is considered inadequate due to its reliance on heuristic algorithms for defuzzification, rule evolution, and antecedent processing (Singh et al., 2012).
Machine learning provides the benefit of adaptability based on specific requirements, including feature selection (Jie et al., 2018), spatial data handling (Du et al., 2020), model interpretability (Li et al., 2022), and model evaluation (Reich & Barai, 1999). The selection of a specific algorithm is based on various factors, including the characteristics of the data, the pattern complexity of the input data, and the model’s intended application or level of accuracy. The following algorithms have been frequently used for habitat mapping: random forest, support vector machines (Oh et al., 2019), artificial neural networks (ANN) (Lee et al., 2013), gradient boosting (Cai et al., 2014), and clustering algorithms (Barve et al., 2023). In a previous study, the ANN method was applied to estimate the potential habitat distribution of macrobenthos (Macrophthalmus dilatatus, Cerithideopsilla cingulata, and Armandia lanceolata) (Lee et al., 2013). The habitat distribution model was generated with various high-resolution factors, such as the intertidal digital elevation model, slope, aspect, exposure time, channel distance, channel density, sediment distribution, and IKONOS band 4. The validation results indicated that the average prediction accuracies for M. dilatatus, C. cingulata, and A. lanceolata were 74.9%, 78.32%, and 73.27%, respectively (Lee et al., 2013). ANN has emerged as a powerful approach for a wide range of applications, including habitat mapping. However, a challenge of ANN is determining an appropriate size and optimal structure for the neural network (Singh et al., 2012).
Deep learning is an influential AI approach that has made remarkable advances in comprehending and processing complex data. Deep learning has been widely recognized as an effective machine learning technique that is controlled, time-efficient, and cost-effective (Dargan et al., 2020). Nevertheless, achieving optimal results with these models typically entails significant computational resources, extensive labeled datasets, and meticulous model selection and optimization. The most frequently employed deep learning algorithms are recurrent neural networks (Shi et al., 2018), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) (Lee & Rezaie, 2021). Lee and Rezaie (2021) used CNN and LSTM for mapping potential habitats for Siberian roe deer. The results indicated that the predictive performance of both models was similar, although the LSTM model had higher prediction potential, with a prediction accuracy of 76% for training data and 73% for test data (Lee & Rezaie, 2021). A recognized limitation of machine learning algorithms, especially complex models with substantial capacity, is the potential for overfitting (Ookura & Mori, 2020), which presents an obstacle to the algorithm’s capacity to generate precise predictions (Alzubaidi et al., 2021).
Metaheuristic algorithms have made substantial advancements in problem-solving across various fields by offering effective solutions to complex optimization problems that can be challenging for machine and deep learning approaches to address (Rezaie et al., 2022a; Zhang et al., 2022). Sabzi et al. (2021) compared the effectiveness of the harmony search algorithm and the imperialism competitive algorithm (ICA) in optimizing the hyperparameters of ANN. Their findings confirmed the superior performance of ICA in enhancing the accuracy and reliability of the outcomes.
The present study establishes a novel approach to habitat distribution modeling designed to reveal the intricate relationships between the distribution of raccoon dog habitat in South Korea and various factors using the FR approach. The main purpose is to establish an accurate distribution map of potential raccoon dog habitat in South Korea by employing the group method of data handling (GMDH), CNN, and LSTM algorithms. Integrating ICA as an optimization algorithm into GMDH, CNN, and LSTM approaches can contribute to optimizing the accuracy and robustness of the model in habitat mapping for raccoon dogs. These maps can be employed to support biodiversity conservation and make progress toward safeguarding the delicate balance between species preservation and human activities.
Materials and Methods
Study area
South Korea is bordered by North Korea to the north, the Yellow Sea to the west, the East Sea (Sea of Japan) to the east, and the Korea Strait to the south. It has a diverse geographical landscape, encompassing a range of topographical features, including mountains, hills, plains, and coastal regions. The western and southern coasts are characterized by numerous bays and inlets, while the eastern coast is more rugged and features steep cliffs. Approximately 70% of the nation’s territory is characterized by the presence of mountains and hills. The Taebaek Mountains run through the eastern borders of the nation, while the Sobaek Mountains are situated in the south. The western and central regions of South Korea exhibit a predominantly flat topography, characterized by fertile landscapes well suited to agricultural activities.
According to the National Geographic Information Institute of South Korea, the terrain of Korea is elevated mainly along the eastern coastline, while lower elevations dominate along the western coastline. Consequently, the majority of the rivers in the region flow into the Yellow Sea and the South Sea. The eastern coastline is uniform and continuous, and the rivers that flow into the East Sea are relatively short and steep. The west coast of South Korea exhibits a complex shoreline characterized by indentations, offshore islands, and deltas. South Korea has five main rivers, namely Nakdonggang, Hangang, Geumgang, Seomjingang, and Yeongsangang. Several rivers that flow towards the western and southern coasts are long, with gentle slopes and wide-ranging basins, which contribute to substantial discharge volumes.
Over the last 30 years, South Korea’s climate has shown an ongoing increase in temperature. Based on climograph analysis, the average monthly temperatures during the 30-year period from 1981 to 2010 were higher than those of the earlier 30-year period spanning 1971-2000. South Korea’s yearly precipitation increased by 50 mm on average between 1981 and 2010. Precipitation was higher in the summer and lower in the spring and fall in most regions because of the East Asian Monsoon, which makes the summers hot and humid and the winters cold and dry (Zhisheng et al., 2015).
Dataset for spatial modelling
The National Institute of Ecology (NIE) has conducted a comprehensive survey of the raccoon dog distribution since 2018 with the mission of keeping track of animal populations. Using a handheld global positioning system, the survey recorded the specific positions of raccoon dog habitat. The observations conducted by NIE identified 2,238 locations of raccoon dog habitat, which are illustrated in Fig. 1. The development of a machine learning model and the validation of its performance require both data obtained from raccoon dog habitats and data collected from non-habitat locations. Therefore, 2,238 points were randomly selected in areas that have very low potential for raccoon dog habitat.
The habitat and non-habitat locations are randomly divided into training and testing datasets. Specifically, 70% of the data (1,566 points of raccoon dog habitat and 1,566 points of non-raccoon dog habitat) is allocated for training, while the remaining 30% (671 points of raccoon dog habitat and 671 points of non-raccoon dog habitat) is reserved for the validation step to compare the predictive ability of the developed models. Point attributes are extracted by overlaying the training and testing datasets with the habitat-influencing factors.
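The random 70/30 partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_points`, the seeds, and the point counts in the example are hypothetical and are not taken from the study's workflow.

```python
import numpy as np

def split_points(n_points, train_fraction=0.7, seed=0):
    """Randomly split point indices into training and testing subsets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_points)          # random order of point indices
    n_train = int(round(train_fraction * n_points))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical example: 100 habitat and 100 non-habitat points,
# split separately so that both classes keep the same 70/30 proportion.
habitat_train, habitat_test = split_points(100, seed=1)
absence_train, absence_test = split_points(100, seed=2)
```

Splitting the habitat and non-habitat points separately keeps the two classes balanced in both subsets, which matches the equal counts reported for each class.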
Factors influencing habitat selection
Mapping the distribution of raccoon dog habitat involves considering various environmental factors that describe the habitat. In previous studies, the raccoon dog habitat population increased when the terrain ruggedness index (TRI) was 0-0.4148 m. TRI values of 0-80 m indicate that the terrain used by raccoon dogs is predominantly flat, with minimal changes in elevation over a given distance. The TRI is used to investigate and categorize terrain by providing a measurable assessment of topographic variation (Riley et al., 1999). Topographic ruggedness has been widely used in ecological studies as a variable to characterize habitat preference (Beasom et al., 1983; Dilts et al., 2023). Mountainous terrain provides raccoon dogs with natural shelters in the form of rock outcrops, crevices, and caves, which the raccoon dogs may utilize as den sites. The existence of mountains was shown to play a significant role in determining the distribution of raccoon dog habitats (Hong et al., 2018). The topographic wetness index (TWI) represents soil moisture and the equilibrium between catchment water supply and local drainage (Kopecký et al., 2021). In ecological studies, water supply is intricately connected to wildlife habitats and plays an essential role in shaping the environments of wildlife and influencing their behavior and survival strategies (Fernald et al., 2012).
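As an illustration of how a ruggedness surface can be derived from a digital elevation model, the sketch below follows the commonly cited formulation of Riley et al. (1999), in which the TRI of a cell is the square root of the summed squared elevation differences to its eight neighbours. The function name and the edge handling (replicating border cells) are assumptions made for this example; the software actually used in the study is not stated.

```python
import numpy as np

def terrain_ruggedness_index(dem):
    """TRI: sqrt of the summed squared elevation differences to the 8 neighbours.

    Border cells are handled by replicating the edge values (an assumption).
    """
    dem = np.asarray(dem, dtype=float)
    padded = np.pad(dem, 1, mode="edge")
    rows, cols = dem.shape
    squared_sum = np.zeros_like(dem)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre cell itself
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            squared_sum += (neighbour - dem) ** 2
    return np.sqrt(squared_sum)

# Hypothetical 3 x 3 elevation grid (metres) with a single raised centre cell
example_dem = np.array([[0.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0]])
example_tri = terrain_ruggedness_index(example_dem)
```

A perfectly flat grid yields a TRI of zero everywhere, consistent with the interpretation of low TRI values as flat terrain.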
Valley depth data can be used to derive the habitat preferences of wildlife species, which in turn provide insights into their distribution patterns. According to Marino and Rodríguez (2022) and Traba et al. (2017), deeper valleys are favored due to their ability to provide enhanced production and a more concentrated availability of preferred food sources. Raccoon dogs prefer habitats located at elevations lower than 300 m above sea level. This elevation preference is influenced by the availability of essential resources such as water, food, and suitable shelter in these lower-altitude areas (Melis et al., 2007). Slope and slope height influence habitat in the context of livestock-wildlife issues (Marino & Rodríguez, 2022). These factors represent morphometric parameters such as drainage morphometry (Sreedevi et al., 2013) and microclimates (Burnett et al., 2008).
Morphometric characteristics play an essential role in characterizing landscapes, providing significant insights into hydrological processes and environmental conditions (Wilson, 2018). Peaks, ridges, passes, plains, channels, and pits, which are derived from morphometric characteristics, provide essential components for geological investigations, hydrological assessments, and environmental analysis (Wang et al., 2010). Surface area data are used in characterizing wildlife habitats and have implications for the movement of wildlife between habitats. A habitat with suitable surface area allows wildlife to navigate through varied terrains, encouraging genetic diversity and species survival (Liu et al., 2018).
The presence of water is an essential aspect that significantly influences the distributions and population sizes of animal and plant species. Water availability is crucial in the formation of habitats and the maintenance of the overall well-being and variety of species (Xie et al., 2018). Therefore, the normalized difference vegetation index (NDVI) and normalized difference water index (NDWI) are derived from Sentinel-2 satellite imagery: the green and near-infrared bands are used to generate the NDWI map, while the red and near-infrared bands are used to generate the NDVI map with a 30-m spatial resolution. NDWI exhibits high sensitivity to changes in hydrological conditions (Talukdar & Pal, 2019) and has been used as an indicator of climate variables that can serve as environmental variables within species distribution models (Teng et al., 2021). NDVI has been used to assess the greenness and health of vegetation (Kusuma et al., 2019; Mohanasundaram et al., 2022).
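Both indices are normalized band differences. A minimal sketch is given below, assuming reflectance arrays for the green, red, and near-infrared bands have already been loaded (for Sentinel-2 these typically correspond to bands B3, B4, and B8; the band identifiers and function names are assumptions, not details given in the text). NDVI is computed as (NIR - Red) / (NIR + Red) and NDWI as (Green - NIR) / (Green + NIR), the latter following the green/near-infrared form implied by the description above.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    result = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=result, where=total != 0)
    return result

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Normalized difference water index from green and NIR reflectance."""
    return normalized_difference(green, nir)

# Hypothetical reflectance values for two pixels
nir = np.array([0.5, 0.1])
red = np.array([0.1, 0.1])
green = np.array([0.1, 0.3])
ndvi_values = ndvi(nir, red)
ndwi_values = ndwi(green, nir)
```

Both indices range from -1 to 1; dense vegetation pushes NDVI toward 1, while open water pushes NDWI toward positive values.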
The Ministry of Environment provided a land use/land cover map that is used to generate the drainage density, distance to drainage, and distance to roads maps. Červinka et al. (2015) showed the impact of roads on the distribution of raccoon habitats. In wildlife habitats across the world, particularly in regions exposed to hunting, roads, and high traffic volumes, significant changes in animal spatial behavior and distribution occur (Bonnot et al., 2013). Changes in habitat caused by drainage, especially in agricultural or urban areas, can have indirect effects on raccoon dogs and other wildlife species. The food resources and shelter options for raccoon dogs can be influenced by changes in water availability, water quality, and the surrounding vegetation resulting from drainage conditions (Lemly, 1994). Furthermore, the installation of drainage systems in agricultural landscapes can lead to transformations in land use and land cover, consequently affecting the accessibility of appropriate habitats (Ahearn et al., 2005). Based on previous studies and data availability, 14 variables, including elevation, slope, valley depth, TWI, TRI, slope height, surface area, LS factor, NDVI, NDWI, distance to drainage, distance to roads, drainage density, and morphometric features, are chosen for mapping potential habitats for raccoon dogs (Fig. 2).
Methods
This study applies four methods (FR, GMDH-ICA, CNN-ICA, and LSTM-ICA) to identify the distribution of potential raccoon dog habitat. The accuracy of model prediction is assessed using the area under the receiver operating characteristic (ROC) curve (AUC). Fig. 3 illustrates the methodological processes employed in this study.
Frequency ratio
Frequency ratio is a bivariate statistical technique used for detecting potential statistical associations between a phenomenon and each associated variable. The FR values for each category or range of factors are obtained based on their association with the phenomenon (Lee & Talib, 2005), which in this study is raccoon dog habitat distribution. In terms of correlation analysis, the FR refers to the proportion of the study area in which raccoon dogs live. The FR is calculated by dividing the area of the phenomenon (raccoon dog habitat distribution) associated with a specific habitat variable subclass by the total study area within the same subclass, as shown in Equation 1 (Huang et al., 2020).
$$FR_i = \frac{N(R_i)/N(F_i)}{N(R)/N(A)} \qquad (1)$$
where N(Ri) represents the number of raccoon dog habitat pixels in subclass i of the influencing factor; N(Fi) represents the total number of pixels in subclass i; N(R) represents the total number of raccoon dog habitat pixels; and N(A) represents the total number of pixels in the study area.
A value of 1 represents an average correlation. A value exceeding 1 indicates a strong correlation between raccoon dog habitat potential and a habitat variable, while a value less than 1 indicates a weak correlation.
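The FR calculation can be illustrated with a short sketch built on the quantities N(Ri), N(Fi), N(R), and N(A) defined above. The function name, the raster layout, and the toy data are hypothetical; the ratio is written as (N(Ri)/N(Fi)) / (N(R)/N(A)), which is an assumed but algebraically standard arrangement of those four counts.

```python
import numpy as np

def frequency_ratio(class_map, habitat_mask):
    """Compute the FR of every subclass of one classified factor raster."""
    class_map = np.asarray(class_map)
    habitat_mask = np.asarray(habitat_mask, dtype=bool)
    n_a = class_map.size        # N(A): all pixels in the study area
    n_r = habitat_mask.sum()    # N(R): all habitat pixels
    ratios = {}
    for cls in np.unique(class_map):
        in_class = class_map == cls
        n_fi = in_class.sum()                    # N(Fi): pixels in subclass i
        n_ri = (in_class & habitat_mask).sum()   # N(Ri): habitat pixels in subclass i
        ratios[int(cls)] = float((n_ri / n_fi) / (n_r / n_a))
    return ratios

# Hypothetical factor raster with two subclasses and three habitat pixels
toy_classes = np.array([[1, 1, 2, 2],
                        [1, 1, 2, 2]])
toy_habitat = np.array([[1, 1, 1, 0],
                        [0, 0, 0, 0]])
toy_fr = frequency_ratio(toy_classes, toy_habitat)
```

In this toy case, subclass 1 holds two of the three habitat pixels on half of the area, so its FR exceeds 1, while subclass 2 falls below 1.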
Group method of data handling
Group method of data handling was developed by Alexey G. Ivakhnenko (1970) in the 1970s and has found applications in various fields, including engineering (Dodangeh et al., 2020), economics (Zhang et al., 2013), and data analysis (Mulashani et al., 2022). The GMDH algorithm implements a self-organization principle to identify the optimal model complexity by systematically evaluating numerous models that fulfil the specified criteria (Ivakhnenko, 1978). The GMDH algorithm consists of multiple functions that effectively handle several issues and enhance the precision of problem-solving outcomes. The functions include linear, polynomial, and ratio-polynomial variations (Ivakhnenko & Ivakhnenko, 2000). The relationship between input and output variables can be described by a complex discrete form of the Volterra functional series, commonly referred to as the Kolmogorov-Gabor polynomial (Farlow, 1984). The model’s input and output variables are linked, as illustrated in Equation 2:
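$$y = a_0 + \sum_{i=1}^{m} a_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk} x_i x_j x_k + \cdots \qquad (2)$$
where a denotes the coefficients calculated using the least squares error approach (Mohebbian et al., 2020); x denotes the input factors; m represents the number of input factors (Tran et al., 2023); and y represents the expected result.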
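In practice, GMDH builds the polynomial from small building blocks: each candidate neuron fits a quadratic "partial description" of two inputs by least squares. The sketch below shows that single fitting step under stated assumptions. The function names and synthetic data are hypothetical, and the layer-wise selection of neurons by an external criterion, as well as the ICA-based tuning used in this study, are deliberately omitted.

```python
import numpy as np

def _design_matrix(x1, x2):
    """Columns for the quadratic terms: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def fit_partial_description(x1, x2, y):
    """Least-squares estimate of the six coefficients of one GMDH neuron."""
    coeffs, *_ = np.linalg.lstsq(_design_matrix(x1, x2), y, rcond=None)
    return coeffs

def predict_partial_description(coeffs, x1, x2):
    """Evaluate the fitted quadratic polynomial for new inputs."""
    return _design_matrix(x1, x2) @ coeffs

# Synthetic data generated from a known polynomial (no noise)
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2
fitted = fit_partial_description(x1, x2, y)
predicted = predict_partial_description(fitted, x1, x2)
```

A full GMDH network would fit such neurons for many input pairs, keep the best performers on held-out data, and feed their outputs to the next layer until the external criterion stops improving.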
Convolutional neural network
Convolutional neural network is a deep learning algorithm that belongs to the broader category of machine learning approaches. Deep learning is a specialized area within the field of machine learning that emphasizes the use of ANNs; a CNN is an ANN that includes at least one layer dedicated to the convolution operation. The fundamental architecture of the CNN model consists of convolution, pooling, and fully connected layers (Lecun et al., 1998; Yamashita et al., 2018).
The role of the input layer is to receive the raw data and transform it into a numerical format, typically represented as a multi-dimensional array (tensor). The convolutional layer is the core part of the CNN. It performs convolution operations on the input data using a collection of learnable filters, commonly referred to as kernels (Thi Ngo et al., 2021). Following the convolution process, an activation function is applied element-wise to the output of the convolutional layer. The activation function introduces non-linearity into the model, allowing the network to learn complex relationships in the data (Zhang & Wu, 2019).
Pooling helps to reduce computational complexity and improve translation invariance. The pooling layer reduces the spatial dimensions of the data by downsampling the feature maps generated by the convolutional layers (Rezaie et al., 2022b; Zafar et al., 2022). Common pooling operations include max pooling, which selects the maximum value within a small region, and average pooling, which calculates the average value within the region (Yu et al., 2014).
The fully connected layer assists in making predictions based on the high-level features extracted from the previous layers (Panahi et al., 2021). The output layer of a CNN, which generates the final output, typically consists of neurons that correspond to each category.
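The convolution, activation, and pooling stages described above can be sketched with plain NumPy for a single-channel input. This is an illustrative forward pass only: the function names, kernel, and input are hypothetical, the activation shown is ReLU (the text does not name the activation used), and the "convolution" is implemented as cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Element-wise non-linearity applied after the convolution."""
    return np.maximum(x, 0.0)

def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum in non-overlapping size x size windows."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]  # drop rows/columns that do not fill a window
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Hypothetical 4 x 4 input patch and 2 x 2 kernel
patch = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
feature_map = relu(conv2d_valid(patch, kernel))
pooled = max_pool2d(feature_map, size=2)
```

In a complete CNN, the pooled feature maps would be flattened and passed to the fully connected and output layers, with the kernel weights learned during training rather than fixed as here.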
Long short-term memory
Long short-term memory is another type of deep neural network algorithm in which the output of the network is fed back into the network as the subsequent input (Kong et al., 2019). The architectural design of LSTM models demonstrates exceptional proficiency in capturing complex spatial patterns and temporal dynamics within various environments. The LSTM architecture is based on the idea of introducing special memory cells with gating mechanisms, allowing the model to retain and update information over long sequences without losing important information. The key components of an LSTM cell include input gates (it), forget gates (ft), candidate cell states (C̃t), output gates (ot), and cell states (Ct) (Graves, 2012). The input gate determines how much of the new information (input data) should be added to the cell state. The forget gate (ft) is computed from the previous hidden state (ht–1) and the current input (xt). The candidate cell state (C̃t) is represented in Equation 6 as the new information that could be added to the cell state (Ct) in Equation 5.
The output gate (ot) determines how much of the updated cell state (Ct) should be exposed as the output of the current time step. Using the updated cell state (Ct), the cell output (ht) can be computed with Equation 8, as follows:
where W (Wf, Wi, Wc, Wo) is a weight matrix; b (bf, bi, bc, bo) represents a bias vector; σ is a sigmoid function; xt is the input to the memory cell layer at time t; ⨀ is the operation of element-wise multiplication; and tanh is a hyperbolic tangent function.
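For reference, the gate computations described above are usually written in the following standard form, using the same symbols (W, b, σ, ⨀, tanh). This is the textbook LSTM formulation, given here as an assumption about the intended equations; the order and numbering may differ from the original manuscript, and [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input.

```latex
\begin{align}
f_t &= \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \, [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{align}
```

The forget gate scales the previous cell state, the input gate scales the candidate state, and the output gate controls how much of the squashed cell state is passed on as the hidden output.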
Imperialism competitive algorithm
Imperialism competitive algorithm is a metaheuristic optimization algorithm inspired by imperialistic competition and the socio-political evolution of empires, in particular their competition for resources and dominance. In the algorithm, candidate solutions are represented as countries, and the optimization process simulates the interactions between these countries based on their “imperialistic” power and resources (Atashpaz-Gargari & Lucas, 2007). The ICA method has been applied to optimization problems, specifically in solving non-linear equations, for which the optimization technique is more robust and effective (Abdollahi et al., 2013). The algorithm consists of several sections, which represent a potential solution to the problem. These sections aim to achieve the optimum outcome for the specific issue. The steps of ICA are generating initial empires, assimilation, revolution, estimating the total cost of all empires, empire competition, and convergence (Wang et al., 2019). Moreover, the ICA has been effectively used to identify optimal results in certain applications, such as evaluating the quality of fruits and vegetables (Sabzi et al., 2021).
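The steps listed above can be sketched as a compact, simplified optimizer. The code below is not the implementation used in this study: all parameter names and default values are hypothetical, colonies are assigned to empires at random, competition is reduced to a deterministic transfer of the weakest colony, empire collapse is omitted, and convergence is replaced by a fixed number of iterations. In the study, the cost function would be a model's validation error as a function of its hyperparameters; here a simple sphere function stands in for it.

```python
import numpy as np

def ica_minimize(cost, dim, bounds=(-5.0, 5.0), n_countries=30, n_empires=5,
                 n_iter=100, beta=2.0, revolution_rate=0.1, zeta=0.1, seed=0):
    """Simplified imperialist-competition search for a minimum of `cost`."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    # Step 1: initial empires; the lowest-cost countries become imperialists.
    countries = rng.uniform(low, high, size=(n_countries, dim))
    costs = np.array([cost(c) for c in countries])
    order = np.argsort(costs)
    countries, costs = countries[order], costs[order]
    imperialists, imp_costs = countries[:n_empires].copy(), costs[:n_empires].copy()
    colonies, col_costs = countries[n_empires:].copy(), costs[n_empires:].copy()
    owner = rng.integers(0, n_empires, size=len(colonies))  # simplified assignment
    best_x, best_cost = imperialists[0].copy(), imp_costs[0]

    for _ in range(n_iter):
        for k in range(len(colonies)):
            e = owner[k]
            # Step 2: assimilation, move the colony toward its imperialist.
            step = beta * rng.uniform(size=dim) * (imperialists[e] - colonies[k])
            colonies[k] = np.clip(colonies[k] + step, low, high)
            # Step 3: revolution, occasionally jump to a random position.
            if rng.uniform() < revolution_rate:
                colonies[k] = rng.uniform(low, high, size=dim)
            col_costs[k] = cost(colonies[k])
            # A colony that beats its imperialist takes over the empire.
            if col_costs[k] < imp_costs[e]:
                imperialists[e], colonies[k] = colonies[k].copy(), imperialists[e].copy()
                imp_costs[e], col_costs[k] = col_costs[k], imp_costs[e]
        # Step 4: total cost of each empire (imperialist plus weighted colonies).
        total = imp_costs.copy()
        for e in range(n_empires):
            members = owner == e
            if members.any():
                total[e] += zeta * col_costs[members].mean()
        # Step 5: competition, the weakest empire loses its weakest colony.
        weakest, strongest = np.argmax(total), np.argmin(total)
        members = np.flatnonzero(owner == weakest)
        if weakest != strongest and members.size > 0:
            owner[members[np.argmax(col_costs[members])]] = strongest
        # Keep track of the best solution found so far.
        e_best = np.argmin(imp_costs)
        if imp_costs[e_best] < best_cost:
            best_x, best_cost = imperialists[e_best].copy(), imp_costs[e_best]
    return best_x, float(best_cost)

def sphere(x):
    """Stand-in cost function with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_position, best_value = ica_minimize(sphere, dim=2)
```

When ICA is coupled with a learning model, each country would encode one hyperparameter configuration (for example, layer sizes or learning rate), and evaluating its cost would require training and validating the model once.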
Model evaluation
Area under the curve serves as an effectiveness indicator for evaluating the predictive performance of machine and deep learning algorithms (Bradley, 1997). The AUC values are determined by generating the ROC curve and subsequently calculating the area under the curve. Models with higher AUC values are regarded as having higher predictive accuracy. The model’s predictive performance is evaluated across five ranges of AUC value: fail (0.5-0.6), poor (0.6-0.7), fair (0.7-0.8), good (0.8-0.9), and excellent (0.9-1.0) (Akay, 2021; Zzaman et al., 2021). A value below 0.5 is considered to indicate inconsistency (Swets, 1988). The AUC is calculated for the training and testing datasets, yielding the success rate and the prediction rate, respectively (Arora et al., 2021). The success rate represents the model’s capacity to accurately represent what is observed, while the prediction rate demonstrates the model’s effectiveness in generating accurate predictions (Arabameri et al., 2019; Chen et al., 2019).
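The AUC can be computed without explicitly tracing the ROC curve, because it equals the probability that a randomly chosen habitat point receives a higher score than a randomly chosen non-habitat point (ties counted as one half). The sketch below uses this pairwise form; the function name and example values are hypothetical, and the study itself may have used a different routine.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (habitat, non-habitat) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]      # model scores at habitat points
    negatives = scores[~labels]     # model scores at non-habitat points
    greater = (positives[:, None] > negatives[None, :]).sum()
    ties = (positives[:, None] == negatives[None, :]).sum()
    return float((greater + 0.5 * ties) / (positives.size * negatives.size))

# Hypothetical example: two habitat points (1) and two non-habitat points (0)
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.2]
example_auc = roc_auc(example_labels, example_scores)
```

Applying the function to the training points gives the success rate and applying it to the testing points gives the prediction rate.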
Results
Effects of habitat characteristics on raccoon dog habitat distribution
The FR model is used to assess the correlation between the habitat of the raccoon dog and habitat characteristics and to identify the role of significant habitat variables. As shown in Table 1, TRI has a significant impact on the movement and foraging behavior of raccoon dogs. These animals have been identified to have a wide home range, and rugged terrain might influence their movement patterns. In the TRI factor, classes 0 and 0.01-3.36 have the greatest influence on raccoon dog habitat, with FR values of 1.979 and 1.627, respectively. Raccoon dog habitat is concentrated within the slope classes of 0-0.31 and 0.32-9.47 degrees, as shown by FR values higher than 1. For surface area, the class of 900 showed the highest FR value of 1.975. The relationship between the distribution of raccoon dog habitat and TWI is strongest in the class of 12.32-27.22, with an FR value of 2.035. For slope height and elevation, the highest FR value is associated with the first class. The highest FR value for valley depth is found in the last class of 122.99-712.71 m. Drainage density has a significant influence on raccoon dog habitat, with the highest FR value of 2.176 in the class of 8.37-101.45. The influence of the LS factor on raccoon dog habitat is shown by an FR value of 1.923 in the class of 0. In terms of distance to roads, the highest FR value is associated with the first class, that is, very close distances to the road. For distance to drainage, raccoon dog habitat is significantly influenced by the class of 0-0.01, as indicated by the FR value of 2.397. The vegetation factor, represented by NDVI, influences raccoon dog habitat with an FR value of 2.093 for the class of 0.34-0.67. The highest FR value for NDWI is 2.155 in the class of 0.22-0.47. The highest FR value for the morphometric features factor, which represents the landscape characteristics, is associated with the ridge class, with an FR value of 4.312.
Map of potential raccoon dog habitat
The generation of a potential raccoon dog habitat map is performed using the FR method, utilizing the following formula:
where FRi refers to the FR value for each factor’s class, N represents the total number of influencing variables, and i denotes each factor selected to develop the model.
The map of potential raccoon dog habitat is also generated using GMDH-ICA, CNN-ICA, and LSTM-ICA, and divided into five classes (i.e., very low, low, moderate, high, and very high) using the quantile method (Rezaie et al., 2023). The quantile technique is able to effectively represent the location, variation, and skewness of a dataset (Lodder & Hieftje, 1988). Fig. 4 represents the distribution map of raccoon dog habitat using FR, GMDH-ICA, CNN-ICA, and LSTM-ICA. Moreover, Fig. 5 illustrates the percentages of raccoon dog habitat in each class of the models.
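The two mapping steps can be sketched as follows, under stated assumptions: the FR-based index is taken to be the per-pixel sum of the FR values of all N factor layers (a common form that is consistent with the symbols FRi and N defined above, but not confirmed by the text), and the five classes are delimited by the 20th, 40th, 60th, and 80th percentiles of the index. Function names and toy values are hypothetical.

```python
import numpy as np

def habitat_potential_index(fr_layers):
    """Sum the per-pixel FR values over all factor layers (assumed form)."""
    return np.sum(np.stack(fr_layers, axis=0), axis=0)

def quantile_classes(index_map, n_classes=5):
    """Assign classes 1..n_classes so that each holds roughly equal pixel counts."""
    values = np.asarray(index_map, dtype=float)
    # Interior break points, e.g. the 20th, 40th, 60th, and 80th percentiles
    breaks = np.quantile(values, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(values, breaks, right=True) + 1

# Hypothetical FR layers for two factors on a 1 x 2 raster
toy_index = habitat_potential_index([np.array([[1.0, 2.0]]),
                                     np.array([[0.5, 0.5]])])

# Hypothetical index values 1..10 classified into five quantile classes
toy_classes = quantile_classes(np.arange(1, 11))
```

Class 1 corresponds to very low potential and class 5 to very high potential; the same classification can be applied to the probability outputs of the GMDH-ICA, CNN-ICA, and LSTM-ICA models.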
The evaluation of the predictive accuracy of models is determined by AUC analysis. In the training step, the AUC values for the FR, CNN-ICA, LSTM-ICA, and GMDH-ICA models are 0.775, 0.763, 0.759, and 0.729, respectively (Fig. 6). During the validation step, the FR, CNN-ICA, LSTM-ICA, and GMDH-ICA models achieve AUC values of 0.762, 0.757, 0.754, and 0.727, respectively.
Discussion
The maps of predicted raccoon dog habitat distribution, based on the FR, GMDH-ICA, CNN-ICA, and LSTM-ICA models, all reveal a similar pattern. The distribution of raccoon dog habitat is sparser on the west coast of South Korea. The expansion of their range is primarily attributed to their movement out of their established territory caused by insufficient food resources and disruptions from human activity (Jeong et al., 2017). Raccoon dogs spend a lot of time in wetland areas (during spring and summer) and on the mainland near the sea (Dahl & Åhlén, 2019; Melis et al., 2015). Moreover, raccoon dog habitat is correlated with vegetation density, as represented by the NDVI class of 0.34-0.67. NDVI values between 0.25 and 0.55 indicate prairie, grassland, and farmland, while values >0.55 may represent forests and woodland areas (Ghebrezgabher et al., 2020). Prior studies have shown that raccoon dogs residing in rural regions exhibit a preference for forests and grasslands as their primary habitats (Jeong et al., 2017). They tend to be common in wide-open spaces, farmland, along lakeshores, and at low altitudes (<300 m); however, individuals have been found as high as 800 m (Melis et al., 2007). The distance-to-drainage class of 0 m was strongly associated with raccoon dog habitat. The existence of drainage ditches indicates that water bodies are a significant feature of the raccoon dogs’ natural environment (Süld et al., 2017; Sutor & Schwarz, 2012). Moreover, these medium-sized Canidae have been observed to prefer smaller home ranges near human settlements (Saeki et al., 2007).
The deep learning approach is based on the principle that a data-driven model can be constructed using a variety of interconnected layers of computation (Sarker, 2021). The selection of GMDH, CNN, and LSTM for modeling raccoon dog habitat distribution, coupled with ICA for hyperparameter optimization, makes it possible to capture nuanced patterns in the raccoon dog habitat distribution data (Tien Bui et al., 2018). These methods offer a well-rounded approach, enhancing accuracy and enabling more precise raccoon dog habitat mapping. During the validation step, the CNN-ICA, LSTM-ICA, and GMDH-ICA models achieve AUC values of 0.757, 0.754, and 0.727, respectively. The higher AUC values for CNN-ICA and LSTM-ICA, compared with GMDH-ICA, can be attributed to their advanced capabilities in capturing intricate patterns within the data, which entail improved discriminatory power and higher predictive accuracy (Lee & Rezaie, 2021).
The combination of GMDH, CNN, and LSTM with ICA provides several advantages and addresses challenges in mapping the distribution of raccoon dog habitats. One advantage is the effective use of machine learning and deep learning techniques, which enhances the accuracy of the predicted habitat distribution. Nevertheless, the algorithms have critical drawbacks. One limitation is that they require an additional approach for hyperparameter tuning. Future research can explore and compare different hyperparameter tuning methods, such as the gravitational search algorithm, shuffled frog leaping algorithm, or differential evolution, to identify the most efficient approach for optimizing model hyperparameters. Moreover, no integrated framework of recommendations exists to guide the selection of suitable factors that influence habitat potential. Thus, it is important to devote attention to the factor selection process, because factors with an undue impact might lead to erroneous outcomes. Furthermore, the inclusion of additional spatial data, such as land use and land cover changes, climate data, and human activities, can provide a more comprehensive understanding of raccoon dog habitat preferences and distribution patterns.
Conclusion
The current study significantly advances our understanding of raccoon dog habitat mapping by using the FR method together with optimized machine learning and deep learning approaches (i.e., GMDH-ICA, CNN-ICA, and LSTM-ICA). The models' predictive ability and robustness were evaluated using the AUC, which confirmed their effectiveness. During the validation step, CNN-ICA outperformed the other models (AUC = 0.757). The integration of geospatial technologies allowed the investigation of complex relationships between raccoon dog habitat and 14 ecological factors. This approach presents a comprehensive perspective on the influence of ecological factors on habitat selection, thereby contributing helpful information for the development of conservation strategies.
Figures and Tables
Table 1
| Factor | Classes | Percentage of pixels | Percentage of habitats | FR |
|---|---|---|---|---|
| Elevation (m) | 0-56 | 0.201 | 0.350 | 1.737 |
|  | 56.1-135 | 0.201 | 0.226 | 1.123 |
|  | 135.1-246 | 0.200 | 0.213 | 1.062 |
|  | 246.1-433 | 0.199 | 0.133 | 0.667 |
|  | 433.1-1,900 | 0.198 | 0.079 | 0.399 |
| Slope (degree) | 0-0.31 | 0.217 | 0.429 | 1.975 |
|  | 0.32-9.47 | 0.197 | 0.322 | 1.634 |
|  | 9.48-17.11 | 0.195 | 0.175 | 0.895 |
|  | 17.12-24.44 | 0.196 | 0.051 | 0.260 |
|  | 24.45-77.89 | 0.194 | 0.023 | 0.118 |
| Valley depth (m) | 0-13.97 | 0.192 | 0.090 | 0.468 |
|  | 13.98-33.54 | 0.211 | 0.088 | 0.419 |
|  | 33.55-64.28 | 0.212 | 0.194 | 0.915 |
|  | 64.29-122.98 | 0.189 | 0.255 | 1.352 |
|  | 122.99-712.71 | 0.198 | 0.373 | 1.891 |
| TWI | 1.87-4.95 | 0.171 | 0.020 | 0.115 |
|  | 4.96-5.84 | 0.208 | 0.075 | 0.358 |
|  | 5.85-7.34 | 0.220 | 0.167 | 0.757 |
|  | 7.35-12.31 | 0.202 | 0.334 | 1.657 |
|  | 12.32-27.22 | 0.199 | 0.405 | 2.035 |
| TRI | 0 | 0.216 | 0.426 | 1.979 |
|  | 0.01-3.36 | 0.197 | 0.320 | 1.627 |
|  | 3.37-5.97 | 0.197 | 0.169 | 0.860 |
|  | 5.98-8.95 | 0.195 | 0.057 | 0.291 |
|  | 8.96-95.12 | 0.195 | 0.027 | 0.137 |
| Slope height (m) | 0-2.81 | 0.143 | 0.335 | 2.334 |
|  | 2.82-8.43 | 0.223 | 0.337 | 1.513 |
|  | 8.44-19.67 | 0.286 | 0.242 | 0.847 |
|  | 19.68-50.59 | 0.185 | 0.056 | 0.304 |
|  | 50.60-716.71 | 0.163 | 0.029 | 0.181 |
| Surface area | 900 | 0.217 | 0.429 | 1.975 |
|  | 900.01-926.59 | 0.197 | 0.322 | 1.634 |
|  | 926.60-953.18 | 0.195 | 0.175 | 0.895 |
|  | 953.19-1,006.36 | 0.196 | 0.051 | 0.260 |
|  | 1,006.37-4,290.24 | 0.194 | 0.023 | 0.118 |
| LS factor | 0 | 0.197 | 0.379 | 1.923 |
|  | 0.01-7.22 | 0.236 | 0.334 | 1.419 |
|  | 7.23-14.44 | 0.199 | 0.133 | 0.667 |
|  | 14.45-22.86 | 0.193 | 0.078 | 0.403 |
|  | 22.87-306.77 | 0.175 | 0.076 | 0.433 |
| NDVI | –1-0.33 | 0.196 | 0.284 | 1.449 |
|  | 0.34-0.67 | 0.199 | 0.416 | 2.093 |
|  | 0.68-0.76 | 0.181 | 0.162 | 0.895 |
|  | 0.77-0.81 | 0.186 | 0.074 | 0.398 |
|  | 0.82-1 | 0.239 | 0.065 | 0.271 |
| NDWI | –1-0.21 | 0.193 | 0.264 | 1.363 |
|  | 0.22-0.47 | 0.196 | 0.423 | 2.155 |
|  | 0.48-0.54 | 0.177 | 0.175 | 0.985 |
|  | 0.55-0.62 | 0.194 | 0.062 | 0.321 |
|  | 0.63-1 | 0.239 | 0.077 | 0.320 |
| Distance to drainage (m) | 0-0.01 | 0.174 | 0.418 | 2.397 |
|  | 0.02-65.55 | 0.208 | 0.267 | 1.283 |
|  | 65.56-131.11 | 0.212 | 0.165 | 0.776 |
|  | 131.12-262.21 | 0.204 | 0.090 | 0.439 |
|  | 262.22-16,716.06 | 0.201 | 0.061 | 0.303 |
| Distance to roads (m) | 0-0.01 | 0.209 | 0.462 | 2.213 |
|  | 0.02-28.45 | 0.244 | 0.373 | 1.526 |
|  | 28.46-99.58 | 0.192 | 0.101 | 0.524 |
|  | 99.59-241.83 | 0.179 | 0.044 | 0.244 |
|  | 241.84-3,627.52 | 0.175 | 0.020 | 0.116 |
| Drainage density | 0-0.01 | 0.191 | 0.056 | 0.295 |
|  | 0.02-2.79 | 0.213 | 0.103 | 0.482 |
|  | 2.80-5.17 | 0.215 | 0.192 | 0.892 |
|  | 5.18-8.36 | 0.200 | 0.255 | 1.275 |
|  | 8.37-101.45 | 0.181 | 0.394 | 2.176 |
| Morphometric features | Peak | 0.225 | 0.334 | 1.483 |
|  | Ridge | 0.001 | 0.006 | 4.312 |
|  | Pass | 0.357 | 0.443 | 1.240 |
|  | Plan | 0.004 | 0.005 | 1.339 |
|  | Channel | 0.411 | 0.211 | 0.513 |
|  | Pit | 0.0009 | 0.0007 | 0.6937 |
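
The FR column of Table 1 can be read as the share of habitat locations falling in a class divided by the share of study-area pixels in that class, so that FR > 1 marks classes in which habitat is over-represented. The Python sketch below recomputes the elevation rows under that reading. The function and variable names are hypothetical. Because the tabulated shares appear to be fractions rounded to three decimals, the recomputed ratios differ slightly from the reported FR values.

```python
def frequency_ratio(habitat_share: float, pixel_share: float) -> float:
    """FR = share of habitat locations in a class / share of pixels in that class."""
    if pixel_share <= 0:
        raise ValueError("pixel_share must be positive")
    return habitat_share / pixel_share


# Elevation rows copied from Table 1:
# (class, share of pixels, share of habitats, reported FR)
ELEVATION_ROWS = [
    ("0-56", 0.201, 0.350, 1.737),
    ("56.1-135", 0.201, 0.226, 1.123),
    ("135.1-246", 0.200, 0.213, 1.062),
    ("246.1-433", 0.199, 0.133, 0.667),
    ("433.1-1,900", 0.198, 0.079, 0.399),
]

# Recompute FR from the rounded shares.
recomputed = {
    cls: frequency_ratio(hab, pix) for cls, pix, hab, _ in ELEVATION_ROWS
}

# Classes with FR > 1 hold more habitat than their area alone would suggest.
favourable = [cls for cls, fr in recomputed.items() if fr > 1.0]

for cls, _, _, reported in ELEVATION_ROWS:
    print(f"{cls:>12}  recomputed={recomputed[cls]:.3f}  reported={reported:.3f}")
print("Classes with FR > 1:", favourable)
```

Under this reading, the three lowest elevation classes (up to 246 m) have FR > 1, which agrees with the preference for low altitudes noted in the discussion.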