Identification of Chinese medicinal materials
China is rich in TCM resources, but the lack of reliable methods for evaluating quality standards is an urgent problem for the TCM industry. Quantitative image analysis has been used to establish a microscopic pattern identification system for Coptis chinensis tissue cells, which provides a new three-dimensional quantitative research technique and visual data for the identification of authentic medicinal herbs. In addition, triangular surface element and nonuniform rational basis spline (NURBS) surface reconstruction algorithms, together with a software-generated interrupt three-dimensional real-time graphics card, have been used to realize the three-dimensional reconstruction and dynamic display of Chinese medicinal herbs.1,2 Three Asarum taxa, Asarum heterotropoides Fr. Schmidt var. mandshuricum (Maxim.) Kitag, Asarum sieboldii Miq. var. seoulense Nakai, and Asarum sieboldii Miq., have also been identified by data mining. A total of 26 samples were selected as the training set and 19 Asarum samples were used as the test set, and quantitative classification features were obtained from Asarum essential oil using gas chromatography-mass spectrometry analysis. The results are consistent with those reported previously.3
However, the quality of TCM is determined by the type and content of its chemical components. Qualitative and quantitative analyses of only a few active ingredients cannot fully reflect quality differences in TCM, because of the holistic effect emphasized by the TCM theoretical system, the synergistic effects of TCM, and the compatibility relationships among medicinal flavors. For the quantitative study of the prescription compatibility principle, fuzzy mathematics provides useful quantitative tools, and the integrated use of clustering analysis, pattern recognition technology, and statistical analytical methods makes it possible to split TCM prescriptions and analyze the interactions between medicinal herbs in order to find the best compatibility relationship and dose, and thereby clarify the prescription.
Pattern recognition technology4 is becoming one of the most scientific and effective methods for TCM quality evaluation and TCM variety classification. Su et al.5 took the content of 0 macro and trace elements of 20 species as the classification characteristics and used the nonlinear reflection method in pattern identification to identify and classify 78 samples of broadleaf holly leaf, and the results were consistent with the actual situation. Moreover, Guo et al.6 used 3H-geniposide as the tracer to observe the quantitative distribution of geniposide in mice, and they discussed the relationship between this distribution and the theory of gardenia. The results show that the distribution characteristics of the same organ are basically consistent with the relationship between gardenia and the viscera, thus providing a morphological basis for the traditional theory of gardenia. Furthermore, a quantitative method of finding alternatives for medicinal materials has been established: the nature, taste, and meridian of each TCM are quantified, and the similarity between medicinal materials is then investigated in order to find alternative medicinal materials.7 Additionally, the content of 15 trace elements in 10 kinds of Xinwen Jiebiao medicinal herbs and 7 kinds of Wenli medicinal herbs was determined by atomic absorption spectroscopy, and the relationship between their efficacy and element content was analyzed. The results showed that the efficacy of these two kinds of medicinal herbs was related to the contents of manganese, barium, and other elements, and a discrimination model for these two kinds of medicinal herbs was successfully established.8
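The similarity-based search for alternative medicinal materials described above can be illustrated with a minimal sketch. It is not the method of the cited study: the numeric coding of nature, the taste and meridian vocabularies, the use of cosine similarity, and the herbs herb_A, herb_B, and herb_C are all hypothetical and serve only to show how quantified properties might be compared.

```python
from math import sqrt

# Hypothetical coding scheme (an assumption, not taken from the cited study).
NATURE_SCORE = {"cold": -1.0, "cool": -0.5, "neutral": 0.0, "warm": 0.5, "hot": 1.0}
TASTES = ["sour", "bitter", "sweet", "pungent", "salty"]
MERIDIANS = ["liver", "heart", "spleen", "lung", "kidney", "stomach"]


def encode(nature, tastes, meridians):
    """Turn nature, taste, and meridian into one numeric feature vector."""
    vec = [NATURE_SCORE[nature]]
    vec += [1.0 if t in tastes else 0.0 for t in TASTES]
    vec += [1.0 if m in meridians else 0.0 for m in MERIDIANS]
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_substitutes(target, herbs):
    """Rank all other herbs by similarity to the target, most similar first."""
    target_vec = encode(*herbs[target])
    scores = [
        (name, cosine(target_vec, encode(*props)))
        for name, props in herbs.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up herbs used purely for illustration.
HERBS = {
    "herb_A": ("warm", {"pungent"}, {"lung", "spleen"}),
    "herb_B": ("warm", {"pungent", "sweet"}, {"lung", "spleen"}),
    "herb_C": ("cold", {"bitter"}, {"heart", "liver"}),
}

ranking = rank_substitutes("herb_A", HERBS)
```

In this toy data set herb_B shares nature, one taste, and both meridians with herb_A, so it ranks above herb_C. A real application would need an agreed coding of the properties and validation against pharmacological evidence.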
Application of computer technology and data mining in TCM and a prescription database
Through the computer retrieval system of Atlas Classics and Materia Medica developed by Nanjing University of Chinese Medicine, readers can browse all of the original text of Materia Medica or the original text of a certain Chinese medicine. Based on National Chinese Herbal Medicine research, a large database was established that comprises 13,268 Chinese herbal medicine records (772 families), including 11,471 plant medicines, 1,634 animal medicines, and 163 mineral medicines. Each record contains information about the class, Latin scientific name, plant-animal-mineral, medicine, literature, location, efficacy, and other basic information.9 The “Chinese Herbal Medicine Database Retrieval System,” established by the Fujian College of TCM, adopts the mode of block design and uses the dictionary library structure to realize the intensive management of TCM special terms; in addition, it provides modern retrieval tools for scientific research, clinical practice, and teaching of Chinese medicinal herbs.10 The “TCM Prescription Coding and Literature Database System,” developed by the Nanjing University of Chinese Medicine, includes 101,903 ancient and modern prescriptions; this database can be used to search the name of the prescription, title, prescription medicinal herbs, functions, indications, etc. There are more than 40 large-scale TCM databases, including about 1.1 million pieces of information, such as TCM journal literature databases, disease diagnosis and treatment databases, various kinds of TCM prescription databases, ethnic medicine databases, all kinds of pharmaceutical enterprises, and a national standard database. This system can realize the single-library and multi-library selection query. Moreover, the “Chinese Medicine Commonly Used Prescription Database Retrieval System” includes Chinese medicines as well as the retrieval prescription, author, medicine, function, indications, pharmacological effect, and usage. 
Furthermore, it can be used to investigate the evolution of a prescription and to compare the compatibility of prescriptions by querying the corresponding prescription according to function and pharmacology. A previous study11 used the FoxBase+ database system, with the Universal Chinese Disk Operation System as the Chinese character support environment, to develop an ancient Chinese medicine management and analysis system that can provide information on medicinal herbal ingredients and other data according to criteria such as dynasty, disease name, and disease syndrome. An analysis of medical case prescriptions,12 including 416 prescriptions and 465 types of medicinal material, found 23 kinds of “core prescriptions,” among which Siwu soup, Liujunzi soup, and Buzhongyiqi soup were given priority, in addition to “core medicinal herbs” such as licorice, ginseng, bighead atractylodes rhizome, angelica, and poria cocos as well as 13 kinds of medicine. These results are consistent with clinical medications and experience. Computer technology also plays a role in predicting the expiration date of medicinal preparations. Chen et al.13 used ultraviolet spectrophotometry and HPLC-DAD-MS/MS to determine the absorption of silver yellow injection and to predict the stability of chlorogenic acid and baicalin, and then imported the experimental data into a computer to calculate the predicted expiration date of silver yellow injection.
Another research team applied a TCM medical record management platform and SAS statistical software to analyze the cases of a professor treating type 2 diabetes, in order to explore the medication rules for this disease and thereby enrich and optimize its diagnosis and treatment plan based on the experience of famous doctors.14 Computer technology is thus playing an increasingly important role in the study of TCM, and data mining technology provides a new and effective method for the objectification and standardization of TCM research. Combined with the TCM database, through the statistical analysis of the medicinal herbs in the top 20 dynasties, the changing patterns of ancient and modern medicinal herbs were analyzed by data mining.15 Li et al. used cluster analysis to statistically analyze the medication rules of Banxiaxiexin soup, discussed the distribution and characteristics of clinical cases, and concluded that there are four main combinations of medicinal herbs for the treatment of digestive diseases and that their medication characteristics reflect the treatment strategy of TCM syndrome differentiation.16 In another study, a total of 43 prescriptions were collected and analyzed using association rule mining and frequency analysis. It was found that 24 commonly used medicinal herbs mainly cure heat and dampness, guide Qi stagnation, and reconcile Qi and blood; these findings were in line with the common clinical rules for the treatment of dysentery in ancient and modern times.17 The prescriptions used for the treatment of thirst elimination were collected, and certain scientific compatibilities between single medicines and prescriptions were found through data mining technology; the conclusion was basically consistent with the principle of syndrome differentiation and treatment.18 Similarly, Dai et al.
referred to the Dictionary of Traditional Chinese Medicine, selected 1,355 prescriptions of the spleen and stomach, and used data mining methods such as cluster analysis, correspondence analysis, and frequent set methods to statistically analyze commonly used medicinal herbs and clusters of closely related prescriptions. They concluded that the basic prescription for replenishing the spleen and Qi is represented by the soup. In recent years, with the development of computer technology and molecular biology and their combination with data analysis, the diagnosis of spleen deficiency in TCM can be made more reasonable and systematic. For example, the assumption-based truth maintenance system (ATMS) artificial intelligence method has been applied to the mining of clinical data on spleen deficiency in TCM.19 In addition, Cao et al.20 used data mining technology to calculate the contribution rate of each syndrome and syndrome group to the diagnosis of spleen deficiency syndrome, and they recorded 1,564 cases to establish a mathematical model. Moreover, another study21 analyzed the gene expression profiles of samples from patients with spleen deficiency and found that patients with the type 2 diabetes mellitus-spleen deficiency pattern experienced significant hypoimmunity and/or immune dysfunction and possessed a specific gene expression profile; the authors therefore concluded that the gene expression profile was helpful for the differentiation and diagnosis of spleen deficiency syndrome. In the feature extraction stage, the Wilcoxon signed-rank test and the ratio of the between-group to the within-group sum of squares were used.
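The ratio of the between-group to the within-group sum of squares mentioned above is a common filter for ranking genes before classification. The following is a minimal sketch of that ratio only (the Wilcoxon test is not shown); the function names, the group labels "SD" and "HC", and the expression values are illustrative assumptions and do not come from the cited study.

```python
def bss_wss_ratio(values, labels):
    """Between-group over within-group sum of squares for one feature."""
    overall_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        # Each sample contributes its group's squared distance to the overall mean.
        between += len(members) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in members)
    return between / within if within > 0 else float("inf")


def rank_features(expression, labels):
    """Order feature names from most to least discriminative."""
    return sorted(
        expression,
        key=lambda name: bss_wss_ratio(expression[name], labels),
        reverse=True,
    )


# Made-up data: "SD" = spleen deficiency pattern, "HC" = control (hypothetical).
LABELS = ["SD", "SD", "SD", "HC", "HC", "HC"]
EXPRESSION = {
    "gene_1": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # group means differ clearly
    "gene_2": [4.0, 6.0, 5.0, 5.0, 4.0, 6.0],  # group means are identical
}

ranked = rank_features(EXPRESSION, LABELS)
```

A large ratio means that the group means are far apart relative to the spread inside each group, so gene_1 is ranked ahead of gene_2 in this toy example.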
Furthermore, Liu et al.22 used data from 324 samples as the industry assessment samples and data from 99 newly collected samples as the external assessment samples, and they assessed the degree of agreement between expert analysis and a computer simulation of the expert diagnostic procedure in order to evaluate the practical performance of the measurement and diagnostic methods of the dialectical scale of spleen and stomach diseases. The positive predictive value was 77.2%, 93%, the total coincidence rate was 88%, the coincidence rate of the main syndrome was 93.8%, and the coincidence rate of the concurrent syndrome was 79.7%. The positive predictive value of the external assessment was 100%, the real predicted value was 85.5%, the total coincidence rate of the deficiency syndrome was 87.9%, the coincidence rate of the main syndrome was 90.9%, and the coincidence rate of the concurrent syndrome was 73.8%. These results show that the computer measurement method for spleen and stomach diseases has a good diagnostic effect. Tang et al.23 used Bayesian discriminant analysis and a simplified scoring method to establish a differential diagnostic system for spleen and stomach Yin deficiency. Clinical research was conducted according to 13 main symptoms, the frequency of the symptoms, and their contribution rate to the diagnosis. In the systematic identification results, the contribution rates of abdominal distension, unregulated stool, epigastric discomfort, and hunger were >10%, which was considered to be of great differential diagnostic significance, whereas the contribution rates of troublesome fever, fatigue, and thirst were <10%, which was of relatively little significance for the differential diagnosis. The total contribution rate of all 13 symptoms to the differential diagnosis was 92%, and the sensitivity and specificity in 25 cases of Yin deficiency of the spleen and stomach exceeded 90%, thus demonstrating the feasibility of this differential diagnosis in TCM. Jiang et al.
first preprocessed the original data (1,355 spleen and stomach prescriptions) to standardize, structure, and digitize the prescription data. Then, according to the data characteristics of the prescriptions, cluster analysis, correspondence analysis, and frequent set methods were selected for quantitative analysis. The results show that, after data mining of the core medicinal herbs, the relationships between medicinal herbs, the “prescription syndrome,” and the formula structure, the correlation results are basically consistent with the general rules and characteristics of TCM spleen and stomach prescriptions.24 Zha et al. analyzed 292 clinical studies on TCM treatment of chronic gastritis. After information digitization processing, a total of 28 symptoms and lingual pulse signs with a frequency of more than nine were counted and then clustered into three categories. The first category included hiccups, fatigue, loss of appetite, and other symptoms, which, according to the theory of TCM, can be judged as spleen and stomach Qi deficiency; the second category included dry mouth, epigastric pain, constipation, and other symptoms, which can differentiate between the liver and the stomach; and the third category included abdominal pain, acid swallowing, abdominal distension, vomiting, etc.; such symptoms can be liver depression and fire, spleen, and stomach. These statistical results are basically consistent with the actual clinical symptoms.25 More than 600 clinical medical cases collected by Li et al. were standardized and processed, and association analysis with the frequent pattern (FP)-tree algorithm was used to mine the associations between symptoms and prescriptions, symptoms and syndromes, and medicinal herbs and syndromes.
The results showed 151 association rules between symptoms and prescriptions, 116 association rules between symptoms and syndrome types, and 144 association rules between medicinal herbs and syndrome types.26 A new method for inheriting the academic thoughts and clinical experience of famous veteran doctors with the help of artificial intelligence technology was explored, and a prototype system framework based on rules and deep learning models was proposed and demonstrated to be feasible.27 A decision tree method based on information entropy was applied to explore TCM syndrome differentiation of chronic gastritis. After the bootstrap method was used to amplify 406 cases, the decision tree algorithm C4.5 was used to determine the coincidence rate of model classification, and the results were as follows: 83.60% for the training set, 80.67%, and 81.25% for the test set. The sensitivity and specificity of the model were also high, so the model can be applied to the differential diagnosis of TCM syndrome forms of chronic gastritis.28 In another study, the symptoms of patients were grouped according to stomach status, abdominal status, diet, excretion, tongue diagnosis, etc. First, the hierarchical clustering method was used to group the symptoms, and the principal components of the symptoms in each group were then used as the input to learn a Bayes network in order to analyze the grouped symptoms. With 2,021 medical cases, the identification of chronic gastritis also achieved a high accuracy. In addition, building on the learning method of the Bayes network under incomplete data, an improved structural equation modeling algorithm combining simulated annealing and the Bayes classifier (BC) algorithm was proposed, in which simulated annealing was used for structure selection and the BC algorithm for initial parameter estimation; this improved the learning ability of Bayes networks with such data.29
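The entropy-based decision tree mentioned above (C4.5) selects the attribute to split on by its gain ratio, that is, the information gain divided by the entropy of the split itself. The following minimal sketch computes this criterion for a single categorical attribute. It does not build a full tree or reproduce the cited model, and the symptom values and syndrome labels are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(feature, labels):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature, labels):
        partitions.setdefault(value, []).append(label)

    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    information_gain = entropy(labels) - conditional
    split_information = entropy(feature)
    return information_gain / split_information if split_information > 0 else 0.0


# Made-up cases: syndrome type per patient and two yes/no symptoms.
SYNDROMES = ["type_A", "type_A", "type_B", "type_B"]
SYMPTOM_X = ["yes", "yes", "no", "no"]   # separates the two types perfectly
SYMPTOM_Y = ["yes", "no", "yes", "no"]   # carries no information about type

ratio_x = gain_ratio(SYMPTOM_X, SYNDROMES)
ratio_y = gain_ratio(SYMPTOM_Y, SYNDROMES)
```

A tree learner would split on SYMPTOM_X first, because it has the higher gain ratio. Dividing by the split information is what keeps C4.5 from favoring attributes with many distinct values.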
At present, the data mining techniques used for chronic gastritis mainly include association rule mining, cluster mining, the decision tree algorithm, and factor analysis. These methods have extended the methodology of information mining in TCM clinical diagnosis and treatment, and they provide strong technical support for the inheritance and mining of TCM diagnosis and treatment experience.30
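Of the techniques listed above, association rule mining is the simplest to illustrate, because each rule is scored by its support and confidence. The sketch below counts item pairs by brute force. It is not the FP-tree algorithm used in the cited work, and the case records, item names, and thresholds are hypothetical.

```python
from itertools import combinations


def mine_pair_rules(cases, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules between item pairs."""
    total = len(cases)
    item_count = {}
    pair_count = {}
    for case in cases:
        items = sorted(set(case))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / total  # share of cases containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Made-up case records, each a set of observed symptoms and syndrome labels.
CASES = [
    {"symptom:fatigue", "syndrome:qi_deficiency"},
    {"symptom:fatigue", "syndrome:qi_deficiency"},
    {"symptom:fatigue", "syndrome:other"},
    {"symptom:dry_mouth", "syndrome:other"},
]

rules = mine_pair_rules(CASES, min_support=0.5, min_confidence=0.6)
```

Algorithms such as FP-growth reach the same kind of rules without enumerating every candidate pair, which matters once the number of symptoms, herbs, and cases grows.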
Currently, domestic research on data mining technology is still in its infancy. For researchers in the field of medicine, many data processing software programs (such as Weka, B Miner, SPSS Clementine, and SAS Enterprise Miner) implement commonly used data mining methods. As researchers’ understanding of “data mining” and its applications increases, these novel data analysis tools will play a positive role in promoting medical research.