Data validity
From 1998 to 2022, a total of 746 studies on the genetic diversity of marine species in the China Sea were published (Fig. 2). Of these, 56.03% (418 of 746) were published in Chinese and 43.97% (328 of 746) in English. After a period of rapid growth from 1998 to 2012, the numbers of Chinese, English, and total publications have recently plateaued. Because the relationship between species and studies is many-to-many, the studies yielded a total of 840 datasets spanning 343 species. To adhere to the filter criteria and ensure the accuracy of the information, data on taxa, molecular markers, and geographic information were first extracted and then thoroughly double-checked. Scientific nomenclature was standardized to resolve synonyms. For studies that cited part of their data from other work, we examined the cited studies to ensure data accuracy. In addition, a logic test was performed to verify the accuracy of the sampling data: Min_n × Pop_n ≤ Total_n ≤ Max_n × Pop_n.
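The logic test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define the symbols, so Pop_n is read here as the number of geographic populations, Min_n and Max_n as the smallest and largest per-population sample sizes, and Total_n as the total number of individuals. The record type and the two example records are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """One dataset entry (field meanings are assumed, see the lead-in)."""
    pop_n: int    # number of geographic populations sampled
    min_n: int    # smallest per-population sample size
    max_n: int    # largest per-population sample size
    total_n: int  # total number of individuals sampled


def passes_logic_test(rec: DatasetRecord) -> bool:
    """Check Min_n * Pop_n <= Total_n <= Max_n * Pop_n."""
    return rec.min_n * rec.pop_n <= rec.total_n <= rec.max_n * rec.pop_n


# Hypothetical records for illustration only.
records = [
    DatasetRecord(pop_n=3, min_n=10, max_n=30, total_n=60),   # within [30, 90]
    DatasetRecord(pop_n=3, min_n=10, max_n=30, total_n=120),  # above the upper bound of 90
]

flags = [passes_logic_test(r) for r in records]
print(flags)
```

A record that fails the check would be a candidate for re-inspection against its source study, since the total cannot lie outside the range implied by the per-population extremes.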
Taxonomic and molecular marker bias
The datasets were unevenly distributed across taxa (Fig. 3a). From 1998 to 2022, fishes were the most represented taxon, accounting for 46.43% (390 of 840) of the datasets, followed by molluscs (23.45% [197 of 840]) and crustaceans (14.76% [124 of 840]). The respective proportions of other taxa (plants, cnidarians, mammals, etc.) were relatively small (5.24% or less). The most frequently studied fish species were Larimichthys polyactis, Larimichthys crocea, and Pampus argenteus, accounting for 4.87% (19 of 390), 3.33% (13 of 390), and 3.33% (13 of 390) of the fish datasets, respectively. Mytilus coruscus (5.58% [11 of 197]) and Ruditapes philippinarum (5.08% [10 of 197]) had the largest numbers of datasets in the mollusc group, while Portunus trituberculatus (18.55% [23 of 124]) and Oratosquilla oratoria (14.52% [18 of 124]) ranked first and second in the crustacean group, respectively. All of the above species are traditionally important commercial fishery species in the China Sea.
The composition of molecular markers used in the different groups showed a preference for class I (Fig. 3b). For fishes, molluscs, crustaceans, and other groups, class I was the most commonly used molecular marker class, appearing in 53.59% (209 of 390), 43.65% (86 of 197), 47.58% (59 of 124), and 43.53% (37 of 85) of the datasets in each group, respectively. The observed and expected numbers (based on a mean proportion of 1/6) of class I datasets differed significantly in each group (chi-square test, p < 0.0001 for all groups mentioned above). The popularity of mtDNA in population genetics studies can be attributed to its well-known properties, such as maternal inheritance, small size, and high mutation rate [19].
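The comparison of observed and expected class I counts can be illustrated with a simple goodness-of-fit calculation. The sketch below is an assumption about the setup, not the authors' procedure: it treats each group as two categories (class I versus all other classes), takes 1/6 of the group's datasets as the expected class I count, and computes the p-value for one degree of freedom with the standard library. The function name is hypothetical; the counts are those reported in the text.

```python
import math


def chi_square_vs_expected(observed: int, total: int, expected_share: float = 1 / 6):
    """Two-category chi-square goodness-of-fit test (1 degree of freedom).

    Compares the observed count in one category with the count expected
    under `expected_share`, and returns (chi2, p_value).
    """
    expected_in = total * expected_share
    expected_out = total - expected_in
    observed_out = total - observed
    chi2 = ((observed - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# (class I datasets, all datasets) per group, as reported in the text.
groups = {
    "fishes": (209, 390),
    "molluscs": (86, 197),
    "crustaceans": (59, 124),
    "others": (37, 85),
}

results = {name: chi_square_vs_expected(obs, n) for name, (obs, n) in groups.items()}
for name, (chi2, p) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3g}")
```

Under these assumptions every group gives a p-value far below 0.0001, which is consistent with the reported result. A six-category test across all marker classes would need the per-class counts, which are not listed in this section.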
The use of the different marker classes over the years followed three trends (Fig. 3c). The proportions of class I (p ≤ 0.0001) and class IV (p = 0.0006) increased significantly over time. The proportions of class II (p = 0.0002) and class V (p = 0.0043) decreased significantly over time. No statistically significant change was observed in the proportion of datasets using class III (p = 0.3396) or class VI (p = 0.8848). Class I was the most popular and widely used molecular marker class, accounting for almost half (47.14% [396 of 840]) of all datasets. Although its use showed an increasing trend, the high-resolution class IV, which is based on next-generation sequencing (NGS), accounted for only 4.05% (34 of 840). NGS approaches are well known for their high resolution and efficiency [20, 21, 22, 23]. We fully expect their application, especially in the marine field, to increase.
Unbalanced and insufficient sampling
A total of 3,658 sampling sites were identified in the dataset and mapped (Fig. 4). Sampling was unbalanced at both the provincial and the prefecture-level city scale. It was concentrated in a few provinces, and the three most sampled provinces were Shandong (17.71% [648 of 3,658]), Zhejiang (14.63% [535 of 3,658]), and Guangdong (11.92% [436 of 3,658]). Within individual provinces, some cities accounted for a very high share of the sampled populations, such as Beihai (64.22% [131 of 204]) in Guangxi Zhuang Autonomous Region, Qinhuangdao (56.60% [30 of 53]) in Hebei Province, and Dalian (54.24% [160 of 295]) in Liaoning Province. This unbalanced sampling pattern may be associated with reduced fish stocks and increased costs. To save costs, researchers tend to favor pioneer sites with a high sampling frequency, which serve as anchor sites for taxonomic and spatiotemporal comparisons [11].
Sample size is an important factor in research design because it affects both research costs and the robustness of conclusions [24]. Because the samples collected have unique characteristics, it is important to elucidate the current status of sampling. Linear regression showed a relationship between the number of geographic populations and the total number of samples for each taxonomic group (Fig. 5a) and marker class (Fig. 5b). However, datasets containing only one geographic population accounted for 27.73% (233 of 840) of all datasets across species. Datasets with three or fewer populations made up more than half of the total (51.55% [433 of 840]), indicating that the sampling of wild populations in the China Sea has been insufficient. We believe that small or unbalanced sample sizes may obscure temporal trends and spatial genetic variance [25]. There remains a need to strengthen the sampling of wild populations in the China Sea.