Year : 2021  |  Volume : 10  |  Issue : 4  |  Page : 442-456

Prediction of tuberculosis cases based on sociodemographic and environmental factors in Gombak, Selangor, Malaysia: A comparative assessment of multiple linear regression and artificial neural network models

1 Department of Environmental and Occupational Health, Universiti Putra Malaysia, Selangor, Malaysia
2 Department of Medical Microbiology, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, Selangor, Malaysia
3 Department of Agriculture Technology, Faculty of Agriculture, Universiti Putra Malaysia, Selangor, Malaysia
4 Institute for Medical Research, National Institutes of Health, Selangor, Malaysia

Date of Submission: 05-Sep-2021
Date of Decision: 21-Sep-2021
Date of Acceptance: 21-Oct-2021
Date of Web Publication: 13-Dec-2021

Correspondence Address:
Malina Osman
Department of Medical Microbiology, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/ijmy.ijmy_182_21


Background: Early prediction of tuberculosis (TB) cases is crucial for prevention and control. This study aims to predict the number of TB cases in Gombak based on sociodemographic and environmental factors. Methods: The sociodemographic data of 3325 TB cases from January 2013 to December 2017 in Gombak district were collected from the MyTB web and TB Information System database. Environmental data from July 2012 to December 2017 were obtained from the Department of Environment, Malaysia; the Department of Irrigation and Drainage, Malaysia; and the Malaysian Meteorological Department. Multiple linear regression (MLR) and artificial neural network (ANN) were used to develop the prediction models of TB cases. The models that used sociodemographic variables as the input dataset were referred to as MLR1 and ANN1, those that used environmental variables as MLR2 and ANN2, and those that used both sociodemographic and environmental variables as MLR3 and ANN3. Results: The ANN was superior to MLR in predicting TB cases, with higher adjusted coefficient of determination (R²) values; the ranges were 0.35 to 0.47 compared with 0.07 to 0.14, respectively. The best TB prediction model, ANN3, was derived from nationality, residency, income status, CO, NO2, SO2, PM10, rainfall, temperature, and atmospheric pressure, with the highest adjusted R² value of 0.47, errors below 6, and accuracies above 96%. Conclusions: It is envisaged that the application of the ANN algorithm based on both sociodemographic and environmental factors may enable more accurate modeling for predicting TB cases.

Keywords: Artificial neural network, multiple linear regression, prediction, tuberculosis

How to cite this article:
Mohidem NA, Osman M, Muharam FM, Elias SM, Shaharudin R, Hashim Z. Prediction of tuberculosis cases based on sociodemographic and environmental factors in Gombak, Selangor, Malaysia: A comparative assessment of multiple linear regression and artificial neural network models. Int J Mycobacteriol 2021;10:442-56

How to cite this URL:
Mohidem NA, Osman M, Muharam FM, Elias SM, Shaharudin R, Hashim Z. Prediction of tuberculosis cases based on sociodemographic and environmental factors in Gombak, Selangor, Malaysia: A comparative assessment of multiple linear regression and artificial neural network models. Int J Mycobacteriol [serial online] 2021 [cited 2023 Jan 28];10:442-56. Available from: https://www.ijmyco.org/text.asp?2021/10/4/442/332352

  Introduction

Tuberculosis (TB) is a chronic respiratory disease caused by Mycobacterium tuberculosis infection. The disease is mainly transmitted from person to person through the respiratory route by coughing or sneezing. It primarily attacks the lungs, but it can also affect other organs including the kidneys, bones, joints, spine, and brain. TB can also accelerate the progression of human immunodeficiency virus (HIV) infection to acquired immunodeficiency syndrome (AIDS). The World Health Organization (WHO) estimated that there were approximately 10 million incident TB cases in 2019, of whom nearly 1.5 million died. In addition, TB has been ranked as the 10th leading cause of death worldwide.[1]

Although Malaysia is not one of the top 30 high TB burden countries identified by the WHO, the annual death rate due to TB infection (5–7 deaths per 100,000 population) is the highest among infectious diseases in Malaysia.[2] Despite efforts to control and stop the transmission of TB toward achieving the goal of eliminating TB in Malaysia by 2030, the disease remains a major public health challenge because many factors affect the number of TB cases. Previous research has shown that TB transmission is associated with sociodemographic factors such as gender,[3] nationality,[4] employment,[5] health-care workers,[6] income,[7] residency,[8] and smoking status.[9] In addition, environmental factors such as weather and air pollution appear to have a significant correlation with TB occurrence.[10],[11],[12]

One of the major weaknesses of traditional mathematical models such as Markov chain models, autoregressive integrated moving average (ARIMA) models, general regression models, and Grey models in predicting TB cases is the difficulty in meeting their assumptions,[13],[14],[15] which can bias the estimation of TB cases. For example, linear regression models assume normality of all variables, a linear input–output relationship, constant variance of the errors, and little or no multicollinearity. Linear regression models also usually require a long-term and/or complete dataset to obtain unbiased estimates.[16] In contrast, machine learning techniques may yield proper estimates even from noise-contaminated and incomplete datasets.[17],[18] As an advanced statistical tool, machine learning has been used efficiently for analyzing and modeling complex problems in several health-related disciplines, including ecology, nursing, geospatial health, biomedicine, and epidemiology.[19],[20],[21],[22],[23] The increasing popularity of machine learning is due to its ability to approximate almost any complex and nonlinear relationship between variables.[24],[25]

Inspired by the neural processes of the human brain, the artificial neural network (ANN) has been one of the most popular machine learning techniques used in disease prediction studies since the 1980s. It has a large number of highly interconnected processing elements (neurons) operating in unison to solve specific problems.[26],[27] For instance, it can adjust its structure to adapt to the different characteristics of data and has a stronger fitting ability. Compared with traditional statistical models, ANN is independent of the statistical distribution of the data, and it does not require a priori knowledge of the data to derive patterns. ANN is a simplified mathematical model that learns the relationship between the input and output layers from examples (the training data) and can analyze nonlinear datasets. A properly trained network can then be used to predict outcome(s) from new data (test data).[28],[29],[30]

To date, ANN has been used to predict TB cases in a few studies. For instance, Lai et al.[31] applied ANN, support vector machine, and random forest to predict the occurrence of anti-TB drug-induced hepatotoxicity using patients' clinical and genomic data as independent variables. The results indicated that ANN exhibited the best performance, with values of 88.67%, 80%, and 90.40% obtained for accuracy, sensitivity, and specificity, respectively. The area under the curve value for the ANN model was 0.89 and 0.90 for the training and testing, respectively, which was significantly better than that of other machine learning techniques. Mollalo et al.[32] predicted TB cases using ANN models that were developed based on the proportion of immigrant population, underserved segments of the population, and minimum temperature, while the performance of the models was compared using linear regression test. The predictive performance of ANN and linear regression analysis was compared using (i) root mean square error: 0.33 and 0.35 and (ii) mean absolute error: 0.25 and 0.27 for training; (i) root mean square error: 0.35 and 0.36 and (ii) mean absolute error: 0.26 and 0.27 for validation; and (i) root mean square error: 0.35 and 0.36 and (ii) mean absolute error: 0.27 and 0.28 for testing, respectively. The results indicated that ANN performed the best in terms of the prediction accuracy.

A few studies have been published on designing ANN architectures for predicting other diseases worldwide. For instance, Zhang et al.[33] applied a back propagation ANN to predict the severity of rubella incidence. In this study, eight models including group 1 (models 1–4) and group 2 (models 5–8) based on different meteorological variables were utilized to establish rubella prediction models. The highest prediction accuracy was displayed in model 7, that is, 76% which obtained 82% for training and 70% for testing, with wind speed, cloud cover, and temperature being the strongest contributors. Moreover, Li et al.[34] developed an AIDS prediction model using back propagation ANN and ARIMA. The predictive performance of ANN and ARIMA was compared using (i) mean absolute error: 0.03 and 0.01, (ii) mean square error: 2.00 × 10⁻³ and 1.90 × 10⁻³, and (iii) mean absolute percentage error: 22.46 and 1.21, respectively, thus indicating better fitting and prediction effects of AIDS incidence by the ANN model.

Public health studies on TB cases and its epidemiological evidence at the national level are relatively insufficient, particularly at the district level in Malaysia. In this study, the Gombak district was selected as it had the highest TB incidence and ranks third for the total number of transient population as compared to other districts in Selangor, thus potentially enhancing the spread of TB. To the best of our knowledge, no study has employed ANN to predict the number of TB cases in Malaysia based on sociodemographic and environmental factors, as well as compared the performance of the models between each factor. Hence, to address this literature gap, it is crucial to employ a highly accurate prediction model across Gombak with the inclusion of local parameters to provide useful insights for TB control and early monitoring of the epidemic at a larger scale. Using surveillance data from 2013 to 2017, this study aims to examine the relationship between sociodemographic and environmental factors with TB cases, and to evaluate the applicability of multiple linear regression (MLR) and ANN to predict the number of TB cases using sociodemographic and environmental factors as the input variables.

  Materials and Methods

Study area

This study was performed using retrospective TB data obtained from the Gombak district. The geographical location is between longitude 101°34′ and 14°6′ East and latitude 3°16′ and 27.3°North′ which lies from the middle to the eastern part of Selangor state. Gombak is gazetted as one of the administrative districts in Selangor, with a coverage area of 650.08 km². It occupies about 8.02% of Selangor's entire land area and has the fourth largest population in Selangor. Gombak's resident population was about 815,200 in 2018, which ranks fourth (after Petaling, Hulu Langat, and Klang) among the eight districts in Selangor; approximately 90% of the population lives in rural areas.[35]

The annual average temperature is 27.1°C, reflecting the tropical climate, which is hot and humid throughout the year. Gombak receives substantial rainfall, averaging about 2535 mm/year.[36] The terrain is hilly in the eastern region and in parts of the northern and western regions, most of which are still covered with forest, with altitudes ranging between 100 and 500 m above sea level. The central and southwest regions are relatively low-lying, with average elevations ranging between 30 and 70 m above sea level.[37] Gombak district is part of the Klang Valley, a zone of major municipal areas, especially in the southern and western parts, where the rapid urbanization of the city of Kuala Lumpur has spread into southern Gombak. The district has undergone rapid development in recent decades, especially involving industrial processes. The health sector comprises two government hospitals, eighteen government clinics, three private hospitals, and three private clinics.[38] In 2018, Selangor accounted for the highest TB burden with 5071 cases, and Gombak was the district that reported the most TB cases, with about 700 cases each year.[39] [Figure 1] shows the geographical location of Gombak in the state of Selangor and the four mukims of Gombak.
Figure 1: Geographical location of Gombak in the state of Selangor and mukims of Gombak


Data collection

Sociodemographic data of tuberculosis cases

TB is classified as one of the notifiable infectious diseases in Malaysia. Patients were diagnosed through X-ray results, pathogen detection, and diagnostic pathology tests based on the diagnosis criteria set by the Ministry of Health (MOH) in 2008. It is mandatory for all health-care providers in hospitals, clinics, institutions of disease prevention and control, and all other designated health-care establishments to report active TB cases in a timely manner in the TB Information System (TBIS) documents of the Gombak district. The data are then transferred electronically into the MyTB web by MOH officers. TB data from January 1, 2013, to December 31, 2014, were collected at Rawang Health Clinic in Rawang, while TB data from January 1, 2015, to December 31, 2017, were collected at the TB/Leprosy Unit of Gombak District Health Office in Batu Caves. The data were retrieved via MyTB and double-checked against TBIS to prevent errors. The data were kept confidential and can only be accessed with approval from the Director of Gombak District Health Office.

The sociodemographic variables include age (<15, 15–64, and >64 years), gender (male and female), race (Malay, Chinese, Indian, and others), country of origin (Malaysia, Indonesia, Myanmar, Bangladesh, and others), nationality (Malaysian and non-Malaysian), educational level (secondary school and below and higher than secondary school), employment status (employed and unemployed), health-care worker status (health-care worker and nonhealth-care worker), residency (urban and rural), income status (permanent income and not permanent income), and smoking status (smoking and not smoking). In addition, year and month were defined according to the date when the TB diagnosis was confirmed.

Out of 3590 cases, 181 cases (5.04%) were excluded: 2.53% were residents outside the study zone, and another 2.51% were not diagnosed within the study period. After further removing cases that could not be geocoded (2.46% of the remaining cases) due to incorrect, missing, or unclear address information, 3325 cases (97.54%) were included in the final analysis.

Environmental data

Air pollution data including the air quality index (AQI); concentrations (ppm) of carbon monoxide (CO), nitrogen dioxide (NO2), and sulfur dioxide (SO2); and the concentration (μg/m³) of particulate matter 10 (PM10) were collected from the Department of Environment, Malaysia, which has seven monitoring stations in Kelang, Petaling Jaya, Shah Alam, Kuala Selangor, Banting, Cheras, and Batu Muda. The rainfall (mm) data were collected from the Department of Irrigation and Drainage, Malaysia, based on data from 12 monitoring stations (Batu Arang, Bukit Antarabangsa, Bandar Tasik Puteri, Country Home, Jalan Gombak, Kampung Merbau, Kampung Setia Kuang, Taman Bukit Rawang, Taman Desa Kundang, Taman Garing Utama, Taman Templer, and Ampang). Other weather data including temperature (°C), relative humidity (%), wind speed (m/s), and atmospheric pressure (hPa) were obtained from the Malaysian Meteorological Department based on data from five monitoring stations located in Kepong, Sepang, Petaling Jaya, Subang, and Sungai Buloh Estate.

All these stations [Figure 2] in Selangor and Kuala Lumpur recorded daily environmental measurements from July 2012 to December 2017, either automatically or manually. The data from all these stations were tabulated to produce monthly averages for the study period. Some studies have estimated that the average incubation period of TB infection ranges from 1 to 2 months, with a 2-month interval from the appearance of symptoms to the diagnostic test.[40],[41] In addition, reporting TB to a health-care institution takes time, which can cause a further delay.[42] Leung et al.[43] recommended a maximum lag time of 6 months. This study therefore considered air pollution and weather factors at lags of 0 to 6 months before the diagnosis of TB, which was rational and could be applied to monitor the relatively long-term effects of air pollution and weather factors.
Figure 2: Spatial distribution maps of environmental monitoring stations in Selangor and Kuala Lumpur
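
The lag structure described above can be illustrated with a short sketch. It assumes, as the study period suggests, that the monthly environmental series starts exactly 6 months before the first month of TB data (July 2012 versus January 2013), so that every case month has a value available at each lag from 0 to 6. The function name `align_lag` and the toy series are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def align_lag(env_monthly: Sequence[float], n_case_months: int,
              max_lag: int, lag: int) -> List[float]:
    """Pair each case month with the environmental value `lag` months earlier.

    Assumes env_monthly begins exactly max_lag months before the first case
    month (e.g. July 2012 when cases begin in January 2013 and max_lag is 6).
    """
    if not 0 <= lag <= max_lag:
        raise ValueError("lag must lie between 0 and max_lag")
    if len(env_monthly) != n_case_months + max_lag:
        raise ValueError("environmental series has an unexpected length")
    start = max_lag - lag  # index of the value paired with the first case month
    return list(env_monthly[start:start + n_case_months])


# Hypothetical series: month index 0 = July 2012, ..., 65 = December 2017.
env = list(range(66))
lag0 = align_lag(env, n_case_months=60, max_lag=6, lag=0)
lag2 = align_lag(env, n_case_months=60, max_lag=6, lag=2)
lag6 = align_lag(env, n_case_months=60, max_lag=6, lag=6)
print(lag0[0], lag2[0], lag6[0])  # 6 4 0
```

With this indexing, the first case month (January 2013) is paired with January 2013 at lag 0, November 2012 at lag 2, and July 2012 at lag 6.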


As shown by the air quality and weather monitoring stations in [Figure 2], only a few stations are distributed over the study area. The model specification adopted in this study did not allow for any missing values in the variables. Hence, any missing values were imputed by interpolation from the nearest stations.[44] In this study, kriging was used to interpolate the value of a variable at unsampled locations based on measurements at nearby locations by fitting a semivariogram model, which is a function of spatial distance. Kriging is an established interpolation method because it can produce lower errors and better prediction accuracy than other geostatistical interpolation methods such as inverse distance weighting and kernel smoothing. Further, the fitted model in kriging depends not only on the distance between the measured points and the prediction location but also on the spatial relationships among the measured values around the prediction location.[45],[46],[47] Even for datasets with relatively few samples, kriging can produce low-error estimates.[48] The kriging predictions were assessed to validate the interpolations,[49] and this study found very low root mean square error values (<0.157) for the interpolated air pollution and weather variables.

Different output cell sizes for the environmental data (2000 m, 1000 m, 500 m, 200 m, and 100 m) were tested using kernel density. Then, Global Moran's I was used to determine the appropriate output cell size for the interpolation of environmental data in kriging; a cell size of 100 m yielded the best output. Hence, this study set a cell size of 100 m for the output raster dataset in ArcGIS® version 10.7 (Environmental Systems Research Institute, Inc., Redlands, CA, USA). Subsequently, the maps were produced as raster files in ArcGIS.

Geographical data

A polygon shapefile of each mukim's boundary under the administration of Gombak District, with a scale of 1:4000,000, was obtained from the Ministry of Agriculture and Agro-based Industries. To display the geographical distribution of TB cases, the TB and environmental data were imported into the attribute table of the spatial district data using ArcGIS version 10.7. The Universal Transverse Mercator coordinates of each patient's residence were geocoded from the home address recorded in the database using Google Earth™ version 7.15 (Alphabet Inc., Mountain View, CA, USA). The extracted data were georeferenced with the district polygons using the Geographic Information System.

Topographic data

Altitude data showing topographic variation were derived from the 30-m digital elevation model obtained from the Shuttle Radar Topography Mission provided by the United States Geological Survey Earth Explorer.

Data analysis

Descriptive analysis

Descriptive statistical analysis for the sociodemographic and environmental data over a 5-year period from 2013 to 2017 was performed using SPSS version 23.0 for Windows (Rel. 11.5.0. 2002; SPSS Chicago, IL, USA). Frequency distributions such as sum, percentage, mean, confidence interval, standard deviation, and calculated probability, that is, P value were used to describe the sociodemographic characteristic of TB cases; whereas mean, median, standard deviation, percentiles, minimum value, and maximum value were used to summarize the environmental data.

Multiple linear regression

MLR was tested based on the same independent variables used in the three geographically weighted regression (GWR) models that had been developed in a previous study.[50] With reference to GWR models, that is, GWR1, GWR2, and GWR3, three MLR models were constructed, namely MLR1, MLR2, and MLR3, respectively. For GWR1, TB cases were associated with sociodemographic factors such as gender, nationality, employment status, health-care worker status, income status, residency, and smoking status. For GWR2, TB cases were associated with environmental factors such as AQI (lag 1), CO (lag 2), NO2 (lag 2), SO2 (lag 1), PM10 (lag 5), rainfall (lag 2), relative humidity (lag 4), temperature (lag 2), wind speed (lag 4), and atmospheric pressure (lag 6). For GWR3, TB cases were associated with both sociodemographic and environmental factors, such as nationality, income status, residency, CO (lag 2), NO2 (lag 2), SO2 (lag 1), PM10 (lag 5), rainfall (lag 2), temperature (lag 2), and atmospheric pressure (lag 6). Therefore, MLR1 assessed the relationship between sociodemographic factors and TB cases, MLR2 assessed the relationship between environmental factors and TB cases, and MLR3 assessed the relationship between both sociodemographic and environmental factors, and TB cases. The MLR analysis was carried out using statistical analysis software SPSS version 23.0 for Windows (Rel. 11.5.0. 2002; SPSS Chicago, IL, USA).

Artificial neural network

The ANN is mainly structured in three layers, that is, the input, hidden, and output layers. The input layer represents the independent variables, the hidden layer represents the relationship developed between the input and output layers, and the output layer represents the dependent variables. In this study, a multilayer perceptron ANN was applied to further analyze the relationship between the independent and dependent variables to build the prediction model for TB cases.

In this study, the ANN consisted of three network architecture models based on the same independent variables used in the GWR models that had been developed in a previous study.[50] With reference to the three GWR models, that is, GWR1, GWR2, and GWR3, three ANN models were constructed, namely ANN1, ANN2, and ANN3, respectively. Hence, ANN1 assessed the relationship between sociodemographic factors and TB cases, ANN2 assessed the relationship between environmental factors and TB cases, and ANN3 assessed the relationship between both sociodemographic and environmental factors and TB cases. The analysis was performed using Alyuda NeuroIntelligence 2.2 (Alyuda Research LLC, Cupertino, CA, USA).

This study systematically evaluated each model based on the combination of model parameters (independent variables) over a 60-month period, that is, January 2013 to December 2017. Firstly, the entire data were split into the following three subsets: (i) the network training dataset, which accounted for 60% of the data, n = 36; (ii) the network validation dataset, which accounted for 20% of the data, n = 12; and (iii) the network testing dataset, which accounted for 20% of the total data, n = 12. The monthly TB cases were used as the input data for model fitting. The purpose of the network training dataset was to learn and update the weights and biases in the network, whereas the network validation dataset was to avoid any overfitting of the models, and the network testing dataset was to evaluate the accuracy and predictive power of the network after the training process.
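
As a minimal sketch of the 60/20/20 partition described above, the following splits the 60 monthly records into subsets of 36, 12, and 12. The source does not state how months were assigned to each subset, so the chronological ordering used here is an assumption, and the function name `split_months` is hypothetical.

```python
from typing import List, Tuple


def split_months(n_months: int = 60,
                 train_frac: float = 0.6,
                 val_frac: float = 0.2) -> Tuple[List[int], List[int], List[int]]:
    """Split month indices into training, validation, and testing subsets.

    Assumption: months are assigned in chronological order; the remaining
    months after training and validation form the testing subset.
    """
    n_train = round(n_months * train_frac)
    n_val = round(n_months * val_frac)
    months = list(range(n_months))  # 0 = January 2013, ..., 59 = December 2017
    train = months[:n_train]
    val = months[n_train:n_train + n_val]
    test = months[n_train + n_val:]
    return train, val, test


train, val, test = split_months()
print(len(train), len(val), len(test))  # 36 12 12
```

If the software assigned records at random instead, the same subset sizes would apply but the month indices in each subset would differ.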

Secondly, all the input data underwent preprocessing before being fed into the ANN models, which can strengthen the models' performance by speeding up convergence. The input data were scaled to the range −1 to 1, as shown in equation 1.

where xi is the actual value, Xmin is the minimum of the actual value, Xmax is the maximum of the actual value, and Xs is the respective scaled value.
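
A minimal sketch of this scaling step is given below. It assumes the standard min–max mapping onto the interval −1 to 1, X_s = 2 (x_i − X_min) / (X_max − X_min) − 1, which is consistent with the symbols defined above but is not confirmed by the source. The function names and the example values are hypothetical.

```python
from typing import List, Sequence


def scale_to_unit_range(values: Sequence[float]) -> List[float]:
    """Map values linearly so that the minimum becomes -1 and the maximum +1."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant series")
    return [2 * (x - x_min) / (x_max - x_min) - 1 for x in values]


def unscale(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Invert the mapping, returning values on the original data scale."""
    return [(s + 1) * (x_max - x_min) / 2 + x_min for s in scaled]


# Hypothetical monthly values spanning 38 to 81.
raw = [38, 59.5, 81]
scaled = scale_to_unit_range(raw)
print(scaled)  # [-1.0, 0.0, 1.0]
print(unscale(scaled, min(raw), max(raw)))  # [38.0, 59.5, 81.0]
```

The inverse function corresponds to the back-transformation of network outputs to the original data scale before evaluation.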

Thirdly, the network architecture was designed using the forward stepwise feature selection method. The input data were fed into the input layer, passed through the hidden layer, and subsequently passed to the output layer using simple mathematical operations. Prior to training, the network architecture of each model was specified as follows: (i) ANN1 as 7-x-1, (ii) ANN2 as 10-x-1, and (iii) ANN3 as 10-x-1. The number of hidden units, x, for each architecture was selected according to the best fitness value of the network architecture. The logistic (sigmoid) function[51],[52] was selected as the activation function that linked the hidden layer to the output layer.
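
To illustrate how data pass through a 10-x-1 architecture, the sketch below computes one forward pass of a single-hidden-layer perceptron. Applying the logistic function at both the hidden and output layers is an assumption, as are the choice of x = 4 hidden units and the random weights; none of these values come from the study, and the sketch does not reproduce the software's internals.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float],
            w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
            w_out: Sequence[float], b_out: float) -> float:
    """One forward pass: inputs -> hidden units -> single output."""
    hidden: List[float] = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical 10-4-1 network with random weights and scaled inputs.
rng = random.Random(0)
n_inputs, n_hidden = 10, 4
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)
x = [rng.uniform(-1, 1) for _ in range(n_inputs)]

y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(0.0 < y < 1.0)  # True
```

Training then consists of adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce the difference between the network output and the observed value.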

Fourthly, an architecture search was carried out, and the feature mask with the best fitness value was selected. Network training was subsequently conducted on the training dataset using the quick propagation algorithm. This algorithm was chosen because it is a heuristic modification of the backpropagation algorithm that is faster than standard backpropagation.[53] The network established the relationship between the input and output layers by iteratively adjusting its weights on the training dataset. In the training process, the data fed into the network were multiplied by their respective weights based on their effect on the dependent variable over a series of iterations. The weights were calibrated via the validation process, in which the number of hidden units was identified and any decline in the predictive capability of the network was detected. The network was then able to generalize what it had learned to the testing dataset, which had the same input attributes.

Finally, a predicted value was generated and compared with the actual values, thus producing an error in each training layer. This error was used in the network training, and the iterative process was conducted to achieve the minimum error.[54] The outputs were back-transformed to the input data scale before the evaluation metrics were used.

Sensitivity analysis

The contribution of the different input variables to the network outcome was determined using sensitivity analysis with the inbuilt importance function in the Alyuda NeuroIntelligence software. Specifically, the sensitivity analysis assessed the relative importance of each input variable (independent variable) toward the output variable (dependent variable) in the fitted models. Each input variable for sociodemographic factors through ANN1, environmental factors through ANN2, and both sociodemographic and environmental factors through ANN3 was evaluated in making predictions of the TB cases. Each input variable was subsequently ranked by its percentage of importance; the higher the percentage, the more important the variable. The percentages of the importance of input variables compared to the output variable can be described with the following equation:[55]

where Wij represents the weights between neuron i (=1, 2,.., m) and the hidden layer j (=1, 2,…, n), and Vjk represents the weights between hidden neuron j and output neuron k (=1, 2,…, l).
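
One common weight-based measure that uses exactly these two sets of weights is a Garson-type partitioning of the connection weights. The sketch below implements that measure for a single output neuron purely as an illustration; whether it matches the equation cited above or the software's inbuilt importance function is an assumption, and the weight values are hypothetical.

```python
from typing import List, Sequence


def garson_importance(w_in_hidden: Sequence[Sequence[float]],
                      v_hidden_out: Sequence[float]) -> List[float]:
    """Garson-type relative importance (%) of each input for one output neuron.

    w_in_hidden[i][j] is the weight from input i to hidden neuron j;
    v_hidden_out[j] is the weight from hidden neuron j to the output.
    """
    n_inputs = len(w_in_hidden)
    n_hidden = len(v_hidden_out)
    # Total absolute incoming weight for each hidden neuron.
    col_sums = [sum(abs(w_in_hidden[i][j]) for i in range(n_inputs))
                for j in range(n_hidden)]
    # Share of each hidden neuron's output weight attributed to input i.
    raw = [
        sum(abs(w_in_hidden[i][j]) * abs(v_hidden_out[j]) / col_sums[j]
            for j in range(n_hidden))
        for i in range(n_inputs)
    ]
    total = sum(raw)
    return [100.0 * r / total for r in raw]


# Hypothetical weights for a 2-input, 2-hidden-unit network.
W = [[3.0, 1.0],
     [1.0, 1.0]]
V = [1.0, 1.0]
print(garson_importance(W, V))  # [62.5, 37.5]
```

The percentages sum to 100, so inputs can be ranked directly by their share.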

Model comparison and evaluation metrics

To evaluate and compare the accuracy of the predictions of the MLR and ANN models, adjusted R² was employed in predicting the number of TB cases. In assessing the quality of the prediction for each ANN model, two types of evaluation metrics were computed in this study: absolute error (equation 3) and testing accuracy (equation 4) of the training models. Then, the best model was selected. The absolute error is the average value of the absolute difference between the actual and predicted values; the lower the error, the better trained the network. Furthermore, the accuracy of the prediction models was measured using the min–max accuracy, obtained by dividing the minimum average value by the maximum average value among the averaged predictions and actual values, and multiplying the result by 100.[56] The min–max accuracy represents the deviation of the predicted values from the actual values; the perfect accuracy is 100%.

where absolute error represents absolute error for training, validation, and testing datasets.

where min is the minimum average value of the actual and predicted number of TB cases and max is the maximum average value of the actual and predicted number of TB cases.
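
The evaluation metrics described in this subsection can be sketched as follows. The absolute error follows the verbal definition given above. The min–max accuracy is computed here as the mean of the per-observation ratio min(actual, predicted)/max(actual, predicted) multiplied by 100, which is one common reading of the description and is an assumption. The adjusted R² uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1). All function names and example values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def min_max_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of min/max ratios per observation, as a percentage (100 = perfect).

    Assumes all values are positive, as is the case for monthly case counts.
    """
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical actual and predicted monthly case counts.
actual = [50, 60]
predicted = [45, 66]
print(mean_absolute_error(actual, predicted))  # 5.5
print(round(min_max_accuracy(actual, predicted), 2))  # 90.45
print(adjusted_r2(0.5, n=11, p=5))  # 0.0
```

A perfect prediction gives an absolute error of 0 and a min–max accuracy of 100%.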

  Results

Sociodemographic characteristics of tuberculosis cases and environmental factors

In total, 3325 TB patients diagnosed from January 2013 to December 2017 were included in the study. The months with the lowest TB cases were as follows: (i) December for 2013 (39 cases), 2014 (42 cases), and 2017 (48 cases); (ii) February and December for 2015 (41 cases); and (iii) July for 2016 (38 cases). The months with the highest TB cases were as follows: (i) July for 2013 (61 cases); (ii) April for 2014 (71 cases) and 2015 (70 cases); and (iii) November for 2016 (65 cases) and 2017 (81 cases). The monthly TB cases fluctuated across the study period, with the minimum and maximum occurring in July 2016 (38 cases) and November 2017 (81 cases), respectively. A moderate temporal pattern of TB cases was observed in Gombak over the 60-month period [Figure 3].
Figure 3: Reported monthly cases of tuberculosis in Gombak district, January 2013 to December 2017


The annual TB cases in Gombak from 2013 to 2017 were 574, 654, 679, 618, and 700 cases, accounting for 17.80%, 20.28%, 21.05%, 19.16%, and 21.71%, respectively, of the total cases during the 5-year period. The years with the minimum and maximum cases of TB were 2013 and 2017, respectively. Yearly TB cases generally showed a gradually increasing trend over the study period, with an annual average of 645 cases. Cases increased gradually from 2013 to 2015, decreased slightly from 2015 to 2016, and reached a peak in 2017 [Figure 4].
Figure 4: Reported yearly cases of tuberculosis in Gombak district, 2013–2017


TB cases were mainly concentrated in Batu mukim (1466 cases), whereas Setapak mukim had the fewest (236 cases). To visualize the spatial trend, this study plotted the geographical distribution of TB cases between 2013 and 2017 at the mukim level in Gombak district [Figure 5]. Specifically, the highest number of TB cases was distributed in the southern part of the district, which is the most highly populated area. Furthermore, the lowest number of TB cases was consistently observed in the north-eastern part of the district, which is hilly and heavily forested. The distribution of disease demonstrated a relatively consistent trend in space and time.
Figure 5: Geographical distribution of tuberculosis cases in Gombak, 2013–2017

To describe the basic characteristics of the reported TB cases during 2013–2017, descriptive statistics for the sociodemographic characteristics were analyzed. As shown in [Table 1], most TB cases were male (65.40%) and of Malaysian nationality (86.73%). Most cases were categorized as unemployed (50.20%) and as not having a permanent income (60.16%). Only 2.29% of the TB cases were health-care workers, and 90.76% lived in urban areas. Furthermore, nonsmokers predominated (72.37%).
Table 1: Sociodemographic characteristics of tuberculosis cases in Gombak district, January 2013 to December 2017

The characteristics of the air pollution and weather factors over the 66-month period are summarized in [Table 2]. The monthly mean (range) was 60.17 (27.34–82.20) for AQI, 1.26 (0.90–1.85) ppm for CO, 0.03 (0.02–0.04) ppm for NO2, 0.004 (0.002–0.02) ppm for SO2, and 55.96 (32.70–99.13) μg/m3 for PM10. The monthly median (interquartile range) was 64.04 (55.12–69.39) for AQI, 1.22 (1.11–1.30) ppm for CO, 0.03 (0.03–0.04) ppm for NO2, 0.003 (0.003–0.005) ppm for SO2, and 53.48 (45.04–65.49) μg/m3 for PM10.
Table 2: Monthly air pollution and weather factors in Gombak district, July 2012 to December 2017

Relationship between sociodemographic and environmental factors with tuberculosis cases

TB prediction models were generated using MLR (MLR1, MLR2, and MLR3) [Table 3] and [Table 4] and ANN (ANN1, ANN2, and ANN3) [Table 4]. Models built on sociodemographic factors were referred to as MLR1 and ANN1, those built on environmental factors as MLR2 and ANN2, and those built on both sets of factors together as MLR3 and ANN3. Overall, the ANN models yielded the highest adjusted R2 values: 0.35 for ANN1, 0.38 for ANN2, and 0.47 for ANN3. Regardless of the set of input variables, the MLR models consistently yielded the lowest adjusted R2 values: 0.14 for MLR1, 0.09 for MLR2, and 0.07 for MLR3.
Table 3: Multiple linear regression equation between sociodemographic and environmental factors with number of tuberculosis cases

Table 4: Comparison of adjusted R2 values obtained from multiple linear regression and artificial neural network models of tuberculosis cases
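
For reference, adjusted R2 penalizes the ordinary R2 for the number of predictors in the model. The sketch below applies the standard formula; the inputs in the example (a raw R2 of 0.50, 60 observations, and 10 predictors) are hypothetical and are not values reported in this study.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Hypothetical inputs for illustration only (not taken from the study).
example = adjusted_r2(0.50, n_obs=60, n_predictors=10)
print(round(example, 3))
```

With the same raw R2, a model with more predictors receives a lower adjusted value, which is why the adjusted form is the fairer basis for comparing models with different numbers of inputs.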

Modeling algorithms

The ANN models were compared and designated as follows: ANN1 contained seven hidden layers, ANN2 contained one hidden layer, and ANN3 contained two hidden layers. Overall, the lowest absolute errors for training and testing were obtained with ANN3 (3.03 and 5.71, respectively), whereas the lowest for validation was obtained with ANN1 (4.36). Conversely, the highest absolute error for training was obtained with ANN1 (4.12), whereas the highest for validation and testing were obtained with ANN2 (5.75 and 7.98, respectively). In short, the order from the lowest to the highest absolute error was ANN3 < ANN2 < ANN1 for training, ANN1 < ANN3 < ANN2 for validation, and ANN3 < ANN1 < ANN2 for testing.

In general, the highest prediction accuracies for training and testing were obtained with ANN2 (99.99% and 98.22%, respectively), and the highest for validation was obtained with ANN3 (99.91%). The lowest prediction accuracies were obtained with ANN3 for training (99.36%), ANN2 for validation (95.05%), and ANN1 for testing (94.82%). Briefly, the order from the highest to the lowest prediction accuracy was ANN2 > ANN1 > ANN3 for training, ANN3 > ANN1 > ANN2 for validation, and ANN2 > ANN3 > ANN1 for testing. All the ANN models had very high training, validation, and testing accuracies, above 94%. The performance of the models trained with the quick propagation algorithm is compared in [Table 5].
Table 5: Comparison of model performances in predicting tuberculosis cases

ANN1 performed better in predicting TB cases than ANN2, but the best model was ANN3, which had absolute errors lower than 6 and prediction accuracies higher than 96%. Hence, ANN3 was selected as the most powerful predictor. The results revealed that nationality, residency, income status, CO (lag 2), NO2 (lag 2), SO2 (lag 1), PM10 (lag 5), rainfall (lag 2), temperature (lag 2), and atmospheric pressure (lag 6) formed the most efficient set of variables for predicting TB cases, with a network architecture of 10-2-1. The predicted numbers of TB cases for 2013, 2014, 2015, 2016, and 2017 were 562, 675, 663, 639, and 722 for ANN1; 566, 650, 702, 609, and 741 for ANN2; and 572, 646, 646, 629, and 697 for ANN3, respectively. The actual and predicted monthly TB cases for each model are plotted in [Figure 6].
Figure 6: Actual and predicted number of tuberculosis cases in Gombak for ANN1, ANN2, and ANN3 models, January 2013 to December 2017
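
As a rough check on the yearly figures, the predicted totals can be set against the reported yearly counts. The sketch below is illustrative only: it uses the yearly totals quoted in the text, not the monthly training, validation, and testing errors in Table 5, and the helper name mean_absolute_error is introduced here for convenience.

```python
# Reported yearly cases (2013-2017) and the yearly predictions quoted in the text.
actual = [574, 654, 679, 618, 700]
predicted = {
    "ANN1": [562, 675, 663, 639, 722],
    "ANN2": [566, 650, 702, 609, 741],
    "ANN3": [572, 646, 646, 629, 697],
}

def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

yearly_mae = {name: mean_absolute_error(actual, pred) for name, pred in predicted.items()}
best = min(yearly_mae, key=yearly_mae.get)

for name, mae in yearly_mae.items():
    print(f"{name}: mean absolute difference = {mae:.1f} cases per year")
print("smallest yearly difference:", best)
```

On these yearly totals, ANN3 shows the smallest mean absolute difference. The ordering of ANN1 and ANN2 need not match the monthly results, because monthly over- and under-predictions can cancel when summed to a yearly total.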

The sensitivity analysis [Figure 7] ranked the input variables of each model from most to least important for TB case prediction, as follows: (i) ANN1: gender, smoking status, residency, nationality, income status, employment status, and health-care worker status; (ii) ANN2: SO2 (lag 1), NO2 (lag 2), temperature (lag 2), AQI (lag 1), CO (lag 2), wind speed (lag 4), relative humidity (lag 4), PM10 (lag 5), rainfall (lag 2), and atmospheric pressure (lag 6); and (iii) ANN3: SO2 (lag 1), CO (lag 2), rainfall (lag 2), residency, NO2 (lag 2), nationality, temperature (lag 2), income status, atmospheric pressure (lag 6), and PM10 (lag 5). The most important input variable was gender for ANN1 (22%) and SO2 (lag 1) for ANN2 (36%) and ANN3 (32%), whereas the least important input variable was health-care worker status for ANN1 (8.53%), atmospheric pressure (lag 6) for ANN2 (0.88%), and PM10 (lag 5) for ANN3 (1.37%).
Figure 7: Importance value of input variables for ANN1, ANN2, and ANN3 models

Discussion

The geographical distribution of TB cases in Gombak revealed spatial heterogeneity. Spatial analysis of TB is an essential approach for assessing the spread of the disease in specific geographic areas with a high number of cases. Using GIS, spatial data can be analyzed and interpreted, which helps health-care authorities establish more effective control strategies and monitor TB transmission in high-risk areas.[57] Furthermore, it is crucial to determine the factors influencing TB cases; here, the spatial variability of each independent variable with TB cases was identified by GWR. It is therefore important to conduct geographically based screening of independent variables and to use the screened variables as the datasets for developing prediction models of TB cases.[58] The capability of geospatial models such as GWR to extract datasets that can be fitted in ANN models makes this one of the new prospects in spatial epidemiology and machine learning.

The lower adjusted R2 values of the MLR models, together with the higher adjusted R2 values and accuracies of the ANN models, suggest that the relationship between sociodemographic and environmental factors and TB cases was nonlinear and that the distribution of the input variables was nonnormal. MLR predicts the relationship between independent and dependent variables by assuming that it is linear, which limits its capability to capture nonlinear relationships. The ANN, in contrast, is superior in assessing large datasets with multivariate interactions and complex patterns.[59],[60] This could be attributed to the ability of the ANN to learn from the data and construct models by adjusting weights, without initial assumptions of linearity.
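
The limitation of a linear fit can be illustrated with a toy example. The sketch below uses synthetic V-shaped data (not study data) and compares an ordinary least-squares line with a 1-2-1 network whose weights are set by hand rather than trained; it only shows what a small network with a nonlinear hidden layer is able to represent, and it is not the quick propagation training used in this study.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def relu(z):
    return max(0.0, z)

def tiny_network(x):
    """1-2-1 network with hand-set weights: relu(x) + relu(-x) equals |x|."""
    hidden = [relu(1.0 * x), relu(-1.0 * x)]
    return 1.0 * hidden[0] + 1.0 * hidden[1]

xs = [i / 2 for i in range(-10, 11)]  # -5.0 ... 5.0, symmetric around 0
ys = [abs(x) for x in xs]             # synthetic V-shaped response

a, b = fit_simple_linear(xs, ys)
r2_linear = r_squared(ys, [a + b * x for x in xs])
r2_network = r_squared(ys, [tiny_network(x) for x in xs])
print(f"linear R2 = {r2_linear:.3f}, network R2 = {r2_network:.3f}")
```

Because the response rises on both sides of zero, the best straight line is flat and explains essentially none of the variance, whereas two hidden nodes are enough to reproduce the shape exactly.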

The nonlinear information was extracted using the ANN models (ANN1, ANN2, and ANN3), and each model displayed a different capability in predicting TB cases according to the characteristics of its dataset. The absolute error and accuracy of the ANN improved when the sociodemographic and environmental factors were used together as the independent variables, that is, in the 10-2-1 architecture of ANN3. Thus, ANN3 exhibited the most powerful predictive capability, followed by ANN1 and ANN2. This suggests that the combination of sociodemographic and environmental factors is a stronger predictor of, and more highly associated with, the occurrence of TB in Gombak than either set of factors individually. With ANN2, the absolute errors were the highest in validation and testing, and the prediction accuracy was the lowest in validation. ANN2 was therefore considered the weakest model in predicting the number of TB cases, which implies low reliability. The increase in absolute error from training to validation was related to overfitting of the training model, probably owing to the high number of iterations and the small size of the training datasets.[61] Nevertheless, judging by the error values, the overfitting could be considered inconsequential; therefore, the prediction models were regarded as stable and reliable.

The sensitivity analysis suggests that, when targeting mitigation measures, priority should be given to gender (ANN1) and to SO2 (lag 1) (ANN2 and ANN3). Gender has been found to influence the duration to cure among TB patients in the study of Nazar et al.[62] and the risk of death according to Kosgei et al.[63] In a modeling study of the short-term and long-term effects of pollutants, Liu et al.[64] found that SO2 was significantly associated with new TB infections, recurrent TB risk, and mortality. In a study of the relationship between SO2 and TB incidence, Yang et al.[65] observed that the number of TB cases increased by 0.08% for each 1 μg/m3 increase in SO2 concentration. Interestingly, SO2 (lag 1) ranked highest in importance in both ANN2 and ANN3. This indicates that when SO2 (lag 1) is fed into a model, it is the strongest contributor to the prediction of TB cases compared with the other input variables.

Accordingly, the findings of the sensitivity analysis, outlined here as the importance ranking of the input variables, may lay the ground for targeted surveillance even if the ANN approach to TB prediction developed in this study is not adopted by public health agencies. The ranking of the contribution of each input variable could also be used in future studies to identify which variables can safely be omitted from subsequent analyses (those whose removal from the model had little effect on prediction) and which essential variables must be retained (those whose removal had a large effect). An excluded variable that produces the highest mean absolute error in the fitted model makes the maximum contribution to predicting TB cases, whereas an excluded variable that produces the lowest mean absolute error makes the minimum contribution.
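
The exclusion logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to produce Figure 7: the predictor is a hypothetical fixed function standing in for a trained network, the three inputs (var_a, var_b, var_c) and their values are made up, and a variable is "excluded" by fixing it at its mean instead of retraining the model without it.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def model(row):
    """Hypothetical stand-in for a trained predictor (not the study's ANN)."""
    var_a, var_b, var_c = row
    return 40.0 + 6.0 * var_a + 2.0 * var_b + 0.5 * var_c

def exclusion_importance(predict, rows, targets, names):
    """Rank inputs by the MAE obtained when each one is neutralised in turn."""
    n = len(rows)
    mae_without = {}
    for j, name in enumerate(names):
        col_mean = sum(r[j] for r in rows) / n
        # "Exclude" input j by fixing it at its mean so it carries no information.
        neutral = [r[:j] + (col_mean,) + r[j + 1:] for r in rows]
        mae_without[name] = mean_absolute_error(targets, [predict(r) for r in neutral])
    total = sum(mae_without.values())
    shares = {k: 100 * v / total for k, v in mae_without.items()}
    ranking = sorted(names, key=lambda k: mae_without[k], reverse=True)
    return ranking, shares

# Made-up input rows; the targets are generated by the stand-in model itself.
rows = [(1, 4, 2), (2, 1, 5), (3, 3, 1), (4, 5, 4), (5, 2, 3)]
targets = [model(r) for r in rows]

ranking, shares = exclusion_importance(model, rows, targets, ["var_a", "var_b", "var_c"])
for name in ranking:
    print(f"{name}: {shares[name]:.1f}% of total exclusion error")
```

The input whose exclusion degrades the prediction most is ranked first, which mirrors the reading of the importance values: a larger error after exclusion means a larger contribution.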

Many epidemiological studies have shown an association between sociodemographic factors and TB cases. Men commonly have a higher risk of TB infection. Possible explanations include that men are more often involved in alcohol consumption, smoking, and drug use and are more likely to be incarcerated than women. TB cases are usually concentrated in urban areas, which benefit from transport services and job opportunities and therefore attract population growth. Furthermore, patients living in urban areas are more likely to face complex social factors such as homelessness, incarceration, and drug abuse, which could be related to failure to seek TB treatment.[66] Although this study found a higher number of TB cases among the Malaysian population, the presence of non-Malaysians infected with TB should not be neglected. Some illegal workers from high TB burden countries did not register for TB treatment at any health-care service in Malaysia because they lacked proper documentation and permits.[67] Crowded housing conditions, particularly in rented houses and makeshift shelters at construction sites, could contribute to the spread of TB. For example, Indonesian and Bangladeshi communities in Sungai Buluh and Rawang mostly live in squatter settlements near their workplaces.[68]

Unemployment is usually associated with unsuccessful TB treatment. Low socioeconomic status prevents unemployed patients from complying with follow-up appointments at hospitals or clinics because of the cost of medical supplies and transportation. In addition, some low-income groups are more likely to seek care through traditional approaches, which are cheaper. Hence, their financial burden stops them from seeking proper health-care services. Failure to ascertain the actual occupation of TB patients is one of the limitations of the data reported by the MOH, Malaysia. Some occupations, such as sex work, have confidentiality issues.[69] Patients might be reluctant to inform health officers about their symptoms owing to the life-threatening consequences. Health-care workers such as doctors, nurses, and medical assistants have direct contact with TB patients; however, other staff in health-care settings could also be exposed to the infection. In 2012, the MOHM implemented a policy to screen all health-care workers in the public sector. Health-care workers who are exposed to patients with suspected or confirmed TB disease or who handle specimens for TB diagnosis are advised to undergo TB screening.[2] The main reason for this occupational TB infection could be low awareness of, and poor commitment to, infection control guidelines in the workplace.[70] Smoking one packet (20 cigarettes) per day may result in a daily inhalational iron exposure of about 6 μg, and iron loading in alveolar macrophages increases susceptibility to the growth of Mycobacterium tuberculosis.[71] Furthermore, smoking can reduce the immune response of pulmonary lymphocytes, impair mucociliary clearance, reduce the cytotoxic activity of natural killer cells, and modify pulmonary dendritic cell activity.[72] Improving compliance among TB patients who smoke is a great challenge and should be addressed through family support and smoking cessation interventions.

Environmental exposure to adverse air pollution and weather is an important but underappreciated risk factor that may enhance the development of TB. A growing body of evidence from previous epidemiological studies has shown an association between environmental exposure and the risk of TB.[73],[74],[75],[76] Nevertheless, evidence on the effect of a complex combination of environmental variables on TB cases is still limited. Machine learning techniques offer opportunities for developing algorithms that classify a dependent variable through complex interactions among the independent variables. As mentioned previously, the contribution of AQI (lag 1), CO (lag 2), NO2 (lag 2), SO2 (lag 1), PM10 (lag 5), rainfall (lag 2), relative humidity (lag 4), temperature (lag 2), wind speed (lag 4), and atmospheric pressure (lag 6) was successfully modeled in ANN2; removing AQI (lag 1), relative humidity (lag 4), and wind speed (lag 4) and adding nationality, residency, and income status was efficient in predicting TB cases in ANN3. This suggests that environmental factors are also important predictors and are highly associated with the occurrence of TB in Gombak. Therefore, TB prevention and control will be more successful when local environmental exposures are taken into consideration.
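
The lagged inputs can be pictured as a simple realignment of two monthly series. The sketch below assumes that "lag k" pairs each TB month with the environmental value recorded k months earlier, and it relies on the environmental record (July 2012 to December 2017) starting six months before the TB record (January 2013 to December 2017). The function name lag_pairs and the placeholder values are illustrative and do not come from the study.

```python
def lag_pairs(env_series, tb_series, lag, offset):
    """Pair each TB month with the environmental value `lag` months earlier.

    `offset` is the number of environmental months recorded before the first
    TB month, so env_series[offset] and tb_series[0] refer to the same month.
    """
    if not 0 <= lag <= offset:
        raise ValueError("lag must be between 0 and offset")
    if len(env_series) < offset + len(tb_series):
        raise ValueError("environmental series is too short")
    return [(env_series[offset + t - lag], cases) for t, cases in enumerate(tb_series)]

# Placeholder data: 66 environmental months (index 0 = July 2012) and
# 60 TB months (index 0 = January 2013); the values are not real measurements.
env_months = list(range(66))
tb_months = list(range(100, 160))

pairs_lag6 = lag_pairs(env_months, tb_months, lag=6, offset=6)
pairs_lag1 = lag_pairs(env_months, tb_months, lag=1, offset=6)
print(pairs_lag6[0], pairs_lag1[0], len(pairs_lag6))
```

With a six-month head start in the environmental record, every lag from 0 to 6 can be formed for all 60 TB months without discarding any observations.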

Owing to the lack of research on infectious disease prediction models in Malaysia, this study serves as a basis for improving future national mitigation programs. The prediction of TB cases will be deemed accurate when the predicted values do not differ much from the actual values once interventions are tested in the respective population. In general, the newly developed models contribute to targeted strategies for controlling the transmission of TB in the Gombak district because they incorporate local data characteristics. If successful, these advantages could also be extended to other regions with similar characteristics.

Nevertheless, despite the effectiveness of the ANN in constructing prediction models that represent a broad variety of perspectives, the new models have several limitations. First, the models were developed using data from 2013 to 2017. The time frame should be extended to at least 10 years to obtain a stronger mathematical model and thus a better understanding of the impact of sociodemographic and environmental factors on TB cases. Second, apart from sociodemographic and environmental factors, TB cases are affected by many other factors, including sociological and immunological factors; including these may greatly improve the efficacy of the prediction model. Third, as this study focused only on TB cases in Gombak, additional studies are required to confirm whether the ANN1, ANN2, and ANN3 models are suitable for other areas. It is also recommended that studies be performed at larger geographical scales, such as the state or the country, to support broader interventions. Finally, the unit of analysis could be changed from the individual level (per TB patient) to aggregated population groups, so that the findings can be used to predict the number of TB cases on a spatial basis, that is, at different locations, rather than on a temporal basis as in this study.

Conclusions

This study found that the ANN was better than regression analysis at evaluating the relationship between the associated risk factors, that is, sociodemographic and environmental factors, and the number of TB cases. The newly developed models, ANN1, ANN2, and ANN3, are potentially novel and valid tools that can assist the respective public health authorities in better controlling the ongoing transmission of TB across Gombak. ANN3, which combines sociodemographic and environmental factors, showed the best performance in predicting TB cases. With the importance ranking of predictors provided by the ANN models, mitigation actions could be taken at the earliest opportunity, and the authorities would be able to plan the most effective control method, one that saves cost, time, and workforce.

Ethical clearance

The use of secondary data in this study was approved by the Medical Research and Ethics Committee (MREC) of the Ministry of Health Malaysia and registered under the National Medical Research Registry (NMRR-17-3029-39236).

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.

References

Global Tuberculosis Report 2019. Geneva: World Health Organization, Licence: CC BY-NC-SA 3.0 IGO; 2020. Available from: https://www.who.int/teams/global-tuberculosis-programme/tb-reports. [Last accessed on 2021 Mar 15].  Back to cited text no. 1
Ministry of Health Malaysia. Annual Report 2018: Tuberculosis Control Programme in Malaysia. Ministry of Health; 2018.  Back to cited text no. 2
Ben Jmaa M, Ben Ayed H, Koubaa M, Hammami F, Damak J, Ben Jemaa M. Is there gender inequality in the epidemiological profile of tuberculosis? Tunis Med 2020;98:232-40.  Back to cited text no. 3
Tok PS, Liew SM, Wong LP, Razali A, Loganathan T, Chinna K, et al. Determinants of unsuccessful treatment outcomes and mortality among tuberculosis patients in Malaysia: A registry-based cohort study. PLoS One 2020;15:e0231986.  Back to cited text no. 4
Sweeney S, Vassall A, Guinness L, Siapka M, Chimbindi N, Mudzengi D, et al. Examining approaches to estimate the prevalence of catastrophic costs due to tuberculosis from small-scale studies in South Africa. Pharmacoeconomics 2020;38:619-31.  Back to cited text no. 5
Chia SZ, How KB, Chlebicki MP, Ling ML, Gan WH. A retrospective review of tuberculosis exposure among health care workers in a tertiary hospital. Am J Infect Control 2020;48:650-5.  Back to cited text no. 6
Goroh MM, van den Boogaard CH, Ibrahim MY, Tha NO, Swe, Robinson F, et al. Factors affecting continued participation in tuberculosis contact investigation in a low-income, high-burden setting. Trop Med Infect Dis 2020;5:124.  Back to cited text no. 7
Singh H, Ramamohan V. A model-based investigation into urban-rural disparities in tuberculosis treatment outcomes under the Revised National Tuberculosis Control Programme in India. PLoS One 2020;15:e0228712.  Back to cited text no. 8
Adegbite BR, Edoa JR, Achimi Agbo P, Dejon-Agobé JC, N Essone P, Lotola-Mougeni F, et al. Epidemiological, mycobacteriological, and clinical characteristics of smoking pulmonary tuberculosis patients, in Lambaréné, Gabon: A cross-sectional study. Am J Trop Med Hyg 2020;103:2501-5.  Back to cited text no. 9
Huang K, Yang XJ, Hu CY, Ding K, Jiang W, Hua XG, et al. Short-term effect of ambient temperature change on the risk of tuberculosis admissions: Assessments of two exposure metrics. Environ Res 2020;189:109900.  Back to cited text no. 10
Yang J, Zhang M, Chen Y, Ma L, Yadikaer R, Lu Y, et al. A study on the relationship between air pollution and pulmonary tuberculosis based on the general additive model in Wulumuqi, China. Int J Infect Dis 2020;96:42-7.  Back to cited text no. 11
Kuddus MA, McBryde ES, Adegboye OA. Delay effect and burden of weather-related tuberculosis cases in Rajshahi province, Bangladesh, 2007-2012. Sci Rep 2019;9:12720.  Back to cited text no. 12
Ghazvini K, Yousefi M, Firoozeh F, Mansouri S. Predictors of tuberculosis: Application of a logistic regression model. Gene Rep 2019;17:100527.  Back to cited text no. 13
Jue W. Prediction model of pulmonary tuberculosis based on gray kernel AR-SVM model. Cluster Comput 2019;22:4383-7.  Back to cited text no. 14
Liao Z, Zhang X, Zhang Y, Peng D. Seasonality and trend forecasting of tuberculosis incidence in Chongqing, China. Interdiscip Sci 2019;11:77-85.  Back to cited text no. 15
Rath S, Tripathy A, Tripathy AR. Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab Syndr 2020;14:1467-74.  Back to cited text no. 16
Mesquita DP, Gomes JP, Rodrigues LR. Artificial neural networks with random weights for incomplete datasets. Neural Process Lett 2019;50:2345-72.  Back to cited text no. 17
Wu RT, Jahanshahi MR. Deep convolutional neural network for structural dynamic response estimation and system identification. J Eng Mech 2019;145:04018125.  Back to cited text no. 18
Trägårdh E, Borrelli P, Kaboteh R, Gillberg T, Ulén J, Enqvist O, et al. RECOMIA-a cloud-based platform for artificial intelligence research in nuclear medicine and radiology. EJNMMI Phys 2020;7:51.  Back to cited text no. 19
Uttam S, Stern AM, Sevinsky CJ, Furman S, Pullara F, Spagnolo D, et al. Spatial domain analysis predicts risk of colorectal cancer recurrence and infers associated tumor microenvironment networks. Nat Commun 2020;11:3515.  Back to cited text no. 20
Festin PJ, Cortez RS, Villaverde JF. Non-Invasive Detection of Diabetes Mellitus by Tongue Diagnosis Using Convolutional Neural Network: In Proceedings of the 2020 10th International Conference on Biomedical Engineering and Technology; 2020. p. 135-9.  Back to cited text no. 21
Saba AI, Elsheikh AH. Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks. Process Saf Environ Prot 2020;141:1-8.  Back to cited text no. 22
Ahn H. Artificial intelligence method to classify ophthalmic emergency severity based on symptoms: A validation study. BMJ Open 2020;10:e037161.  Back to cited text no. 23
Abbasi T, Luithui C, Abbasi SA. A model to forecast methane emissions from topical and subtropical reservoirs on the basis of artificial neural networks. Water 2020;12:145.  Back to cited text no. 24
Lin H, Dai Q, Zheng L, Hong H, Deng W, Wu F. Radial basis function artificial neural network able to accurately predict disinfection by-product levels in tap water: Taking haloacetic acids as a case study. Chemosphere 2020;248:125999.  Back to cited text no. 25
Florence SE, Samsingh RV, Babureddy V. Artificial intelligence based defect classification for weld joints. IOP Conf Ser Mater Sci Eng 2018;402:012159.  Back to cited text no. 26
Kaafjeld F, Engebretsen E. EPIC: Neural networking by design. Offshore Eng 2017;42:26-7.  Back to cited text no. 27
Muhi SH, Abdullah HN, Abd BH. Modeling for predicting the severity of hepatitis based on artificial neural networks. Int J Intell Eng Syst 2020;13:154-66.  Back to cited text no. 28
Ismail M, Vardhan VH, Mounika VA, Padmini KS. An effective heart disease prediction method using artificial neural network. Int J Innov Technol Exploring Eng 2019;8:1529-32.  Back to cited text no. 29
Tong Z, Liu Y, Ma H, Zhang J, Lin B, Bao X, et al. Development, validation and comparison of artificial neural network models and logistic regression models predicting survival of unresectable pancreatic cancer. Front Bioeng Biotechnol 2020;8:196.  Back to cited text no. 30
Lai NH, Shen WC, Lee CN, Chang JC, Hsu MC, Kuo LN, et al. Comparison of the predictive outcomes for anti-tuberculosis drug-induced hepatotoxicity by different machine learning techniques. Comput Methods Programs Biomed 2020;188:105307.  Back to cited text no. 31
Mollalo A, Sadeghian A, Israel GD, Rashidi P, Sofizadeh A, Glass GE. Machine learning approaches in GIS-based ecological modeling of the sand fly Phlebotomus papatasi, a vector of zoonotic cutaneous leishmaniasis in Golestan province, Iran. Acta Trop 2018;188:187-94.  Back to cited text no. 32
Wang W, Guo W, Cai J, Guo W, Liu R, Liu X, et al. Epidemiological characteristics of tuberculosis and effects of meteorological factors and air pollutants on tuberculosis in Shijiazhuang, China: A distribution lag non-linear analysis. Environ Res 2021;195:110310.  Back to cited text no. 33
Li Z, Li Y. A comparative study on the prediction of the BP artificial neural network model and the ARIMA model in the incidence of AIDS. BMC Med Inform Decis Mak 2020;20:1-3.  Back to cited text no. 34
Census, 2018. Population Distribution by Local Authority Areas and Mukims. Department of Statistics Malaysia; 2019. Available from: https://www.selangor.gov.my. [Last accessed on 2020 Jul 09].  Back to cited text no. 35
Malaysia Meteorological Department. General Climate of Malaysia. Ministry of Science, Technology and Innovation, Kuala Lumpur; 2018. Available from: http://www.met.gov.my/. [Last accessed on 2019 Apr 29].  Back to cited text no. 36
Nur HA, Choy L. Analysis of land use and land cover changes in Gombak, Selangor using remote sensing data. Sains Malaysiana 2016;45:1869-77.  Back to cited text no. 37
Land and District Office of Gombak; 2020. Available from: https://www.selangor.gov.my/. [Last accessed on 2020 Jul 09].  Back to cited text no. 38
Ministry of Health Malaysia. Annual Report 2018: TB Control Programme in Malaysia. Kuala Lumpur: Ministry of Health Malaysia; 2019.  Back to cited text no. 39
Li Z, Mao X, Liu Q, Song H, Ji Y, Xu D, et al. Long-term effect of exposure to ambient air pollution on the risk of active tuberculosis. Int J Infect Dis 2019;87:177-84.  Back to cited text no. 40
Li XX, Wang LX, Zhang H, Du X, Jiang SW, Shen T, et al. Seasonal variations in notification of active tuberculosis cases in China, 2005-2012. PLoS One 2013;8:e68102.  Back to cited text no. 41
You S, Tong YW, Neoh KG, Dai Y, Wang CH. On the association between outdoor PM2.5 concentration and the seasonality of tuberculosis for Beijing and Hong Kong. Environ Pollut 2016;218:1170-9.  Back to cited text no. 42
Leung CC, Yew WW, Chan TY, Tam CM, Chan CY, Chan CK, et al. Seasonal pattern of tuberculosis in Hong Kong. Int J Epidemiol 2005;34:924-30.  Back to cited text no. 43
Asgharinia S, Petroselli A. A comparison of statistical methods for evaluating missing data of monitoring wells in the Kazeroun Plain, Fars Province, Iran. Groundw Sustain Dev 2020;10:100294.  Back to cited text no. 44
Hassim M, Yuzir A, Razali MN, Ros FC, Chow MF, Othman F. Comparison of rainfall interpolation methods in Langat River Basin. IOP Conf Ser Earth Environ Sci 2020;479-490;012018.  Back to cited text no. 45
Kriege DG. Two Dimensional Weighted Average Trend Surfaces for Ore Evaluation. In Proceedings Journal of the Southern African Institute of Mining and Metallurg, Proceedings Symposium Mathematical Statistics and Computer; 1966. p. 7-8.  Back to cited text no. 46
ESRI, 2016. Arcview GIS: The Geographic Information System for Interpolation. Environmental System Research, California; 2016.  Back to cited text no. 47
Puente CE, Bras RL. Disjunctive kriging, universal kriging, or no kriging: Small sample results with simulated fields. Math Geol 1986;18:287-305.  Back to cited text no. 48
Li J, Heap AD. A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecol Inform 2011;6:228-41.  Back to cited text no. 49
Mohidem NA, Osman M, Hashim Z, Muharam FM, Mohd Elias S, Shaharudin R. Association of sociodemographic and environmental factors with spatial distribution of tuberculosis cases in Gombak, Selangor, Malaysia. PLoS One 2021;16:e0252146.  Back to cited text no. 50
Karlik B, Olgac AV. Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int J Artif Intell Expert Syst 2011;1:111-22.  Back to cited text no. 51
Zadeh MR, Amin S, Khalili D, Singh VP. Daily outflow prediction by multi layer perceptron with logistic sigmoid and tangent sigmoid activation functions. Water Resour Manag 2010;24:2673-88.  Back to cited text no. 52
Ghaffari A, Abdollahi H, Khoshayand MR, Bozchalooi IS, Dadgar A, Rafiee-Tehrani M. Performance comparison of neural network training algorithms in modeling of bimodal drug delivery. Int J Pharm 2006;327:126-38.  Back to cited text no. 53
Ahmadi P, Muharam FM, Ahmad K, Mansor S, Abu Seman I. Early detection of ganoderma basal stem rot of oil palms using artificial neural network spectral analysis. Plant Dis 2017;101:1009-16.  Back to cited text no. 54
Song K, Park YS, Zheng F, Kang H. The application of artificial neural network (ANN) model to the simulation of denitrification rates in mesocosm-scale wetlands. Ecol Inform 2013;16:10-6.  Back to cited text no. 55
Bitetti R. Simple Linear Regression: A Case Study in R 2018. Available from: https://rpubs.com/bitettir/simpleregression. [Last accessed on 2021 Apr 23].  Back to cited text no. 56
Selmane S, L'hadj M. Spatiotemporal analysis and seasonality of tuberculosis in Algeria. Int J Mycobacteriol 2021;10:234-42.  Back to cited text no. 57
Hoffner S, Hadadi M, Rajaei E, Farnia P, Ahmadi M, Jaberansari Z, et al. Geographic characterization of the tuberculosis epidemiology in Iran using a geographical information system. Biomed Biotechnol Res 2018;2:213.  Back to cited text no. 58
Zare Abyaneh H. Evaluation of multivariate linear regression and artificial neural networks in prediction of water quality parameters. J Environ Health Sci Eng 2014;12:40.  Back to cited text no. 59
Anwar A, Mikami Y. Comparing accuracy performance of ANN, MLR, and GARCH model in predicting time deposit return of Islamic Bank. Int J Trade Econ Finance 2011;2:44-51.  Back to cited text no. 60
Tuite C, Agapitos A, O'Neill M, Brabazon A. A Preliminary Investigation of Overfitting in Evolutionary Driven Model Induction: Implications for Financial Modelling. In European Conference on the Applications of Evolutionary Computation. Vol. 27. Berlin, Heidelberg: Springer; 2011. p. 120-30.  Back to cited text no. 61
Nazar E, Baghishani H, Doosti H, Ghavami V, Aryan E, Nasehi M, et al. Bayesian spatial survival analysis of duration to cure among new smear-positive pulmonary tuberculosis (PTB) patients in Iran, during 2011-2018. Int J Environ Res Public Health 2020;18:54.  Back to cited text no. 62
Kosgei RJ, Callens S, Gichangi P, Temmerman M, Kihara AB, David G, et al. Gender difference in mortality among pulmonary tuberculosis HIV co-infected adults aged 15-49 years in Kenya. PLoS One 2020;15:e0243977.  Back to cited text no. 63
Liu Y, Zhao S, Li Y, Song W, Yu C, Gao L, et al. Effect of ambient air pollution on tuberculosis risks and mortality in Shandong, China: A multi-city modeling study of the short- and long-term effects of pollutants. Environ Sci Pollut Res Int 2021;28:27757-68.  Back to cited text no. 64
Yang J, Zhang M, Chen Y, Ma L, Yadikaer R, Lu Y, et al. A study on the relationship between air pollution and pulmonary tuberculosis based on the general additive model in Wulumuqi, China. Int J Infect Dis 2020;96:42-7.  Back to cited text no. 65
Torres M, Carranza C, Sarkar S, Gonzalez Y, Osornio Vargas A, Black K, et al. Urban airborne particle exposure impairs human lung and blood Mycobacterium tuberculosis immunity. Thorax 2019;74:675-83.  Back to cited text no. 66
Dollah R, Abdullah K. The securitization of migrant workers in Sabah, Malaysia. J Int Migr Integr 2018;19:717-35.  Back to cited text no. 67
Ariffin F, Ahmad Zubaidi AZ, Md Yasin M, Ishak R. Management of pulmonary tuberculosis in health clinics in the Gombak district: How are we doing so far? Malays Fam Physician 2015;10:26-33.  Back to cited text no. 68
Suppiah PC, Kaur S, Arumugam N, Shanthi A. News coverage of foreign sex workers in Malaysia: A critical analysis. GEMA Online J Lang Stud 2019;19:136-52.  Back to cited text no. 69
Abebe G, Bonsa Z, Kebede W. Treatment outcomes and associated factors in tuberculosis patients at Jimma University Medical Center: A 5-year retrospective study. Int J Mycobacteriol 2019;8:35-41.  Back to cited text no. 70
Zhang WZ, Butler JJ, Cloonan SM. Smoking-induced iron dysregulation in the lung. Free Radic Biol Med 2019;133:238-47.  Back to cited text no. 71
Alipour Fayez E, Moosavi SA, Kouranifar S, Delbandi AA, Teimourian S, Khoshmirsafa M, et al. The effect of smoking on latent tuberculosis infection susceptibility in high risk individuals in Iran. J Immunoassay Immunochem 2020;41:885-95.  Back to cited text no. 72
Denholm J. Seasonality, climate change and tuberculosis: New data and old lessons. Int J Tuberc Lung Dis 2020;24:469.  Back to cited text no. 73
Harries AD. Chronic kidney disease, tuberculosis and climate change. Int J Tuberc Lung Dis 2020;24:132-3.  Back to cited text no. 74
Zhang CY, Zhang A. Climate and air pollution alter incidence of tuberculosis in Beijing, China. Ann Epidemiol 2019;37:71-6.  Back to cited text no. 75
Zhang X, Ma SL, Liu ZD, He J. Correlation analysis of rubella incidence and meteorological variables based on Chinese medicine theory of Yunqi. Chin J Integr Med 2019;25:911-6.  Back to cited text no. 76


  [Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5], [Figure 6], [Figure 7]

  [Table 1], [Table 2], [Table 3], [Table 4], [Table 5]

