Exploratory analysis of machine learning methods in predicting subsurface temperature and geothermal gradient of Northeastern United States

Geothermal scientists have used bottom-hole temperature data from extensive oil and gas well datasets to generate heat flow and temperature-at-depth maps to locate potential geothermally active regions. Considering that there are some uncertainties and simplifying assumptions associated with the current state of physics-based models, in this study, the applicability of several machine learning models is evaluated for predicting temperature-at-depth and geothermal gradient parameters. Through our exploratory analysis, it is found that XGBoost and Random Forest result in the highest accuracy for subsurface temperature prediction. Furthermore, we apply our model to regions around the sites to provide 2D continuous temperature maps at three different depths using XGBoost model, which can be used to locate prospective geothermally active regions. We also validate the proposed XGBoost and DNN models using an extra dataset containing measured temperature data along the depth for 58 wells in the state of West Virginia. Accuracy measures show that machine learning models are highly comparable to the physics-based model and can even outperform the thermal conductivity model. Also, a geothermal gradient map is derived for the whole region by fitting linear regression to the XGBoost-predicted temperatures along the depth. Finally, through our analysis, the most favorable geological locations are suggested for potential future geothermal developments.

flux and temperature-at-depth maps. Jordan et al. (2016) conducted a thorough analysis to explore the associated risks and potentials of prospective geothermal resources in the states of New York, Pennsylvania and West Virginia. Even though most geothermally active regions are located in the western United States (near Earth's tectonic plate boundaries), Jordan et al. (2016) showed that the stored energy in the low-temperature geothermal regions in the northeast could be utilized for many direct-use applications. Although Snyder et al. (2017) illustrated that myriad industrial and residential direct-use applications of geothermal energy could result in reduction of electricity consumption, there are not many geothermal sites in northeastern states due to a high financial risk. Heat flux and temperature-at-depth are two most important geothermal parameters, which have extensively been investigated through physics-based models.
In the previous geothermal studies, the generalized thermal conductivity model has been adopted to compute the heat flow associated with BHT data points (Blackwell and Richards 2010;Cornell University 2015;Frone and Blackwell 2010;Jordan et al. 2016;Stutz et al. 2012;Tester et al. 2006). To use this model, first the measured bottom-hole temperature is corrected (through various available correlations (Deming 1989)) and is used to calculate the temperature gradient through the following relation: Next, the geological formation thickness and thermal conductivity values are approximated at each well location's latitude and longitude mainly from Correlation of Stratigraphic Units of North America (COSUNA) (Childs 1985). Then, average thermal conductivity is calculated between surface and the well's depth (Stutz et al. 2012). Finally, the heat flux is calculated through the following equation: The above formula is oversimplified and only represents the main theoretical framework of the physics-based model, which is used in geothermal energy studies. Despite physics-based model's long-time applicability, they all have some underlying assumptions that could result in uncertainties and, therefore, inaccurate predictions. Some of the assumptions are explained by (Stutz et al. 2012) and (Blackwell and Richards 2010). In particular, there is no easy-to-use method to independently measure the heat flux parameter; it is only approximated through the thermal conductivity model using the BHT data as shown in Eq. (2).
In addition to the geothermal energy industry, subsurface temperature is an extremely important parameter in the oil and gas industry (Bassam et al. 2010;Forrest et al. 2005;Khan and Raza, 1986;Moses, 1961). Characteristics of hydrocarbons are greatly dependent on the temperature and they must be approximated to be used in reservoir and drilling simulations. In practice, it is common to use geothermal gradient maps to obtain the geothermal gradient value at the desired location and then calculate the subsurface temperature at the depth of interest (Forrest et al. 2005; Khan and Raza, 1986).
Machine learning and geostatistics have been used in the variety of applications to help investors make more confident decisions. Due to the inaccessible nature of the (2) Q s =k dT dz .
geothermal energy, there is a considerable amount of risk and uncertainty associated with the exploration (Witter et al. 2019), drilling (Lukawski et al. 2016) and production (Bloomquist et al. 2012). There are few comprehensive surveys that focused on analyzing the associated risks to provide insights about the potential of developing geothermal sites (Jordan et al. 2016;Young et al. 2010). Machine learning has been an emerging technology that helped the geothermal energy field in the mentioned stages (Assouline et al. 2019;Beardsmore 2014;Faulds et al. 2020;Rezvanbehbahani et al. 2017;Shi et al. 2021;Tut Haklidir and Haklidir 2020). In the next section, we briefly review the studies which applied machine learning successfully in the fields of geothermal exploration and drilling.

Exploration stage
Recent machine learning advancements in some of the closely related fields of geology and geoscience have tremendously helped the geothermal energy industry in the exploration and drilling stages. For example, applications of machine learning in characterization of geomechanical properties (Keynejad 2018), automated fault detection and interpretation (Ma et al. 2018;Zhang et al. 2014), geophysical data inversion (Araya-Polo et al. 2018) and categorizing different lithofacies (Hall 2016). Perozzi et al. (Perozzi et al. 2019) took it further and proposed machine learning schemes to accelerate geological interpretations (specifically from well-logs) and, consequently, reducing the geothermal exploration costs. Rezvanbehbahani et al. (2017) proposed a machine learning approach to estimate the geothermal heat flux (GHF) in Greenland using the global GHF data provided by the International Heat Flow Commission (Gosnold and Panda 2002). For modeling, Gradient Boosted Regression Tree method was used with an average 15% relative error, RMSE and r 2 of 0.14 and 0.75, respectively. In that study, even though the authors provided a preliminary map to annotate most favorable locations in Greenland in terms of geothermal potential, however, wellbore bottom-hole temperature data were not utilized. In another effort, machine learning was used to map very shallow geothermal potential (Assouline et al. 2019). In shallow depths, geothermal energy can be a very good source to provide thermal energy for residential areas (Vieira et.al. 2017). Assouline et al. used Radom Forrest to predict three important thermal variables that are crucial in analyzing the geothermal potential of the region. These variables include (1) temperature gradient, (2) thermal conductivity, and 3) thermal diffusivity throughout Switzerland.
Another interesting study was conducted which primarily focused on developing a probabilistic modeling approach to identify the underlying risks in the field of geothermal resource exploration and the application of machine learning in the geothermal energy industry (Beardsmore 2014). An open-source software was developed named "Obsidian" which is capable of joint inversion of numerous geophysical datasets with probabilistic outputs. This study had access to a rich dataset containing formation characteristics, local temperature info and multiple case studies located in different regions of Australia. In addition to 3D temperature-at-depth maps, they were able to generate a 3D probabilistic map where each given point represents the probability of having granite rock type. The combination of the two mentioned maps was intended to directly help investors choose the right depth, latitude and longitude with the highest success probability.

Drilling stage
After finding the prospective geothermally active regions, geothermal wells are drilled for production. Drilling stage can comprise up to 45% of the total cost of the geothermal project (Muhammad 2019). Machine learning has helped the industry to efficiently design this stage from different aspects. Drilling optimization considerations in geothermal wells can be categorized into (1) reducing drilling time and (2) minimizing operational failures. This subject is shared between geothermal and oil and gas industries where drilling operations are remarkably similar. There are myriad studies where machine learning techniques have successfully addressed the mentioned issues and provided reliable solutions to optimize the drilling stage (Barbosa et al. 2019;Hegde et al. 2020;Gray 2017, 2018;Noshi and Schubert 2018). Recently, the Department of Energy has funded a project with the theme of application of deep machine learning to optimize drilling operations (specifically for geothermal wells) which was awarded to Oregon State University with collaboration with one more US university, one DOE National Laboratory, in addition to four geothermal and oil and gas companies from Iceland, US and Norway (DOE, 2019). In the first-year report of this study, the major effort was made around four primary tasks (well data gathering, feature engineering, data repository development, and preliminary machine learning model testing). It was mainly found that more extensive data from bit life cycle and bottom-hole assembly (BHA) are needed to improve the machine learning models. Finally, they compared different machine and deep learning models to predict important drilling parameters and it was found that Random Forrest model outperforms others as number of inputs increases. There was an extra effort to include the lithological information (mainly from mud log data) by dummy encoding and text embedding to, potentially, increase the accuracy (Carbonari et al. 2021).
In this study, we provide an alternative solution of using machine learning methods for predicting subsurface temperature using BHT data from more than 20,750 oil and gas wells in the northeastern United States. Furthermore, the physics-based and machine learning models are compared through an extra dataset containing vertical temperature profile of 58 wells in the state of West Virginia. Finally, we provide the geothermal gradient map using the validated XGBoost model for the northeast region of the United States.

Case study
The Marcellus formation is one of the highest potential hydrocarbon prospects in the United States and is located throughout the northern Appalachian Basin. For several decades, thousands of wells have been drilled in this region which contain, at least one temperature measurement (usually at the final depth). For our analysis, we have used a dataset with raw and corrected BHT, surface temperature, well identification number (API), latitude, longitude, and geological setting information (including layer thickness and conductivity) and many other information from 20,750 oil and gas wells in the northeast. This dataset (Cornell University 2015) has been developed and reported as part of a DOE funded research grant led by Cornell University. In Fig. 1, we show the geospatial spread of the well locations (of the dataset). In the right plot, the scatter points are referred to 20,750 well locations of the main dataset and the shaded area depicts the region where temperature predictions are provided by our study. The left plot in Fig. 1 is a magnified view of the West Virginia state region where the blue points represent a new set of well locations where we had more than one temperature measurement for each well. In fact, for many wells, subsurface temperature measurements were available along hundreds of meters within the well. We primarily used this dataset for further verification of our geothermal gradient predictions.

Dataset-1 summary
In Table 1, a summary of important parameters (after outlier removal) is provided. We have used 55 features that are included in Table 1. Among the variables, the geological characteristics are included through the multiplication product of each formation conductivity and thickness (6-55). This is consistent with the thermal conductivity theory (Eq. (2)). At each well's latitude and longitude, there are up to 49 formation layers where each layer has specific thickness and conductivity.

Dataset-2 summary
We also exclusively gathered data for additional 58 wells across the West Virginia region (annotated by blue points on Fig. 1). In this dataset, for each well, temperature profile is provided within a depth interval (with the mean and standard deviation of 1167 and 511 m, respectively). We obtained this dataset from West Virginia Geological and Economical Survey (West Virginia Geological and Economical Survey Website n.d.). The digitized data were available in the LAS file format where temperature measurements (along with other geological parameters) were reported at different depths. We primarily used it for comparing our modeling results with those from the physics-based model. We refer to this source as the temperature-profile dataset throughout this paper. Among

BHT correction methods
For BHT correction, the authors (Jordan et al. 2016) divided the Appalachian Basin into three regions (West Virginia, Pennsylvania Rome Trough and Allegheny Plateau) and developed exclusive correction correlations based on available information at each of these regions (for example, in Allegheny Plateau region, information about drilling fluids were accessible to the authors in contrast to the West Virginia section where drilling fluid data were not available). For each region, a small set of equilibrium well-log temperature measurements were statistically evaluated and a new set of appropriate BHT corrections were proposed. In West Virginia region, a Generalized Least Square (GLS) regression model was fitted through Eq. (3). For Pennsylvania Rome Trough, no statistically significant relation was found with depth and therefore no adjustment was applied. Fortunately, for Allegheny Plateau, the drilling fluid data were available, and the correlation equations were proposed for different fluids as shown below.

Outlier removal approach
For preprocessing, we removed outliers (101 data points) using the common 3σ-rule method where data outside the three standard deviation are considered outliers (Lehmann 2013;Pukelsheim, 1994;Watanabe et al. 2019) using the heat flux parameter (Fig. 2).
The reported temperatures in the temperature-profile dataset are prone to errors and we were required to correct them. Even though there are myriad temperature-correction methods, we decided to use the correction methodology reported by (Jordan et al. 2016) to be consistent with their method. This allowed us to compare our results to those reported by the physics-based model in (Jordan et al. 2016). Since all wells in the temperature-profile dataset are located in the West Virginia region, we decided to use Eq. (3).

Machine learning models
In this section, we provide a thorough summary of the machine learning models that have been used in this study to estimate subsurface temperature and geothermal gradient. We decided to use multiple algorithms to train our regression models, including Deep Neural Networks (DNN), Ridge regression (R-reg) models and decision-tree-based models (e.g., XGBoost and Random Forest).
In this paper, we compare the results of four machine learning algorithms. These algorithms are different in nature and it is extremely important to appropriately compare their accuracies and errors. For each algorithm, we primarily focused on developing (4) �T Alle. Pt. Air = 0.0104 1090 3 + z 3 0.33 − 1090 , Z < 2500m, (5) �T Alle. Pt. Mud = 0.0155 1660 3 + z 3 0.33 − 1660 , Z < 4000m.

Fig. 2
Heat-flow histogram after outlier removal the best performing model. This not only applies to hyper-parameter tuning, but also to the data preprocessing. In particular, we standardized the input features for Ridge Regression and DNN. For XGBoost and Random Forest models, we did not observe any improvement after standardizing the features and, therefore, we did not decide to standardize the input features. The tunned hyper-parameters are reported in the GitHub repository (Shahdi and Lee 2021). Figure 3 illustrates the developed machine learning pipeline which has been used for this study. In the data preprocessing section, outliers are removed, and features are scaled (for R-reg and DNN). Next, hyper-parameters related to each model are tuned using cross-validation. At the end, the final model is also evaluated using cross-validation. This process is repeated for all models.

Ridge regression
In our dataset, there are uncertainties (noise) associated with the BHT data potentially from temperature logging tools, and/or the BHT correction correlations, etc. We used Ridge regression as one of the candidate machine learning models. Despite its simplicity, it is robust to overfitting (regulated by a penalty term known as L2 Regularization) (Hoerl and Kennard 1970). (Wyffels et al. 2008) showed how Ridge Regression is robust to noise and overfitting in reservoir computing and signal processing applications. In another study, it was shown how Ridge Regression can be a superior solution when the multi-collinearity problem between independent variables exists comparing to other complex models (Morgül Tumbaz and İpek 2021). Baruque et al. (Baruque et al. 2019) successfully used Ridge regression for a geothermal application where heat exchanger where hyper-parameter α is a positive number that specifies the trade-off between the ordinary least squares (OLS) and regularization terms. In our implementation, we initially standardized the inputs (with BHT targets) and then fed them into the hyperparameter tunning section. We used the grid-search method to search for the best alpha (shown in Table 2).

XGBoost and Random Forest
Ensemble modeling approach is a process where numerous base models are generated to estimate an outcome. The base models are independent and diverse and tend to decrease the generalization error of the prediction. This methodology exploits the wisdom of crowds to make an approximation. Even though there are multiple base models associated with an ensemble model, it behaves as a single predictor. Typically, a weighted average of all base models' predictions will be reported as the final outcome (Vijay and Bala 2014). Random forest and XGBoost are both ensemble models which have widely been used for regression and classification problems. Random Forest constructs multiple decision trees at the time of training and provides the average estimation of individual trees (Breiman 2001). Whereas in XGBoost, the estimators (trees) are sequentially added to the ensemble model to improve the accuracy by adding a base learner to correct the shortcomings of the already existing base models. In XGBoost, the shortcomings are determined by gradients (Li 2016). In this study, target imbalance problem is present within our dataset since ninety-sex percent of BHT data correspond to the shallower wells (< 2000 m) . On the other hand, the deeper wells contain valuable information with higher temperature values which should not be removed (or be considered as outliers). We mainly used ensemble-based algorithms including Random Forest (Liaw and Wiener 2002) and XGBoost (Chen and Guestrin 2016) because they are believed to work relatively well in a case where target imbalance exists (Moniz et al. 2017). In addition, treebased models usually improve the accuracy by decreasing the variance in the prediction (6) θ ridge = argmin θ y − X� 2 2 + α� 2 2 ,  (Polikar 2012). XGBoost and Random Forest are both tree-based methods which have been successfully applied in geosciences (Gul et al. 2019;Hall 2016;Sun et al. 2020). Single decision tree is often referred to as a weak classifier as it can be susceptible to overfitting (Ho 1998). Random Forest builds an ensemble of multiple decision trees (weak classifiers) in parallel and takes the mean of the predictors for the prediction. Furthermore, during the ensemble construction, random features or columns are dropped while learning every decision tree, so that every tree is de-correlated from other trees as much as possible. XGBoost, on the other hand, builds decision trees in a sequential manner. XGBoost keeps adding decision trees at every step, making a fine separation in space to predict the response variable (Chen and Guestrin 2016). Every new step considers the previous steps which result in accuracy improvement after each iteration. XGBoost is a library that allows XGBoost to be run in parallel in terms of computing.

Deep neural network (DNN)
DNN is a network of connected processing elements (neurons) which are placed in multiple layers and is used to solve classification and regression problems. This is done through a learning process where the model parameters get adjusted in the training phase. In the training stage, the errors are propagated back in the network resulting in updating the model parameters (weights). This process continues till no further improvement is observed in the errors (Maind and Wankar 2014). We developed a deep neural network (DNN) architecture to predict the subsurface temperature. In our features, we include the thermal conductivity and thickness values of up to 55 formation layers for each well. In this relatively large feature dimension, we decided to use DNN to capture the non-linearity between these geological characteristics and bottom-hole temperatures. Bassam et al. (Bassam et al. 2010) was among the first studies that evaluated the application of a shallow artificial neural networks (ANN) in formation temperatures in geothermal wells. In that study, collected BHT logs (during long-shut-in times) have been used for training and validation. Kalogirou et al. (Kalogirou et al. 2012) generated ground temperature map at shallow depths by considering land configuration using ANN.
Deep neural networks attempt to capture the relationships between inputs and outputs using a deep assembly of hidden layers of neurons, where every neuron in a hidden layer receives signals (or activations) from neurons in the previous layer, and transmits activations to all neurons in the subsequent layer. DNN models can capture high amounts of non-linearity using a large (or deep) number of inter-connected hidden layers. We tried different DNN architectures and finally picked a four-layer DNN as illustrated in Fig. 4. In the input layer, the number of nodes is the same as feature numbers followed by two hidden layers where each layer contains 50 nodes. Arrows correspond to connections among nodes and are associated with learnable edge weights. In addition, we selected ReLU activation function in our architecture. For the last neuron at the output layer, the weighted responses from the neurons at the second hidden layer are fed into a linear activation function and the final prediction for temperature is obtained. In Fig. 5, one neuron of the hidden layer is illustrated with the given inputs.
In Table 2, we included the values that are used for hyper-parameter tuning for Ridge-Regression, Random Forest and XGBoost. For DNN, we did not perform hyper-parameter tuning in the same fashion (mainly due to the computational time). We examined tens of different architectures and reached to one illustrated above.

Feature space interpolation
Temperature-at-depth maps have extensively been used in geothermal energy studies to illustrate the temperature distribution at a given depth. In this study, we also provide temperature-at-depth maps at different depths in the northeastern United States. This allows investors to have another source of temperature prediction map for any potential future development. In addition, the new machine learning temperature maps can be compared to those from the thermal conductivity model to locate the similarities and differences. A simple concave hull algorithm was used to obtain a tight boundary around the given data points. To avoid sharp edges, we derived average values for the boundary   Fig. 1). We initially used an online source code (Dwyer n.d.) and made major modifications to meet our project's needs.
For constructing the subsurface temperature prediction map, the features should be available within different locations (with varying latitude and longitude). Therefore, we interpolated the required features (shown in Table 1) throughout the northeastern region using a Gaussian kernel weighted k-nearest neighbor regression model. These interpolated features are then fed into the trained machine learning models to generate the predicted temperature-at-depth maps. We chose KNN regression method since it is simple and is expected to perform well in our region of interest due to high concentration of wells. We used cross-validation for hyper-parameter tuning of the KNN method (K = 3 and kernel width = 0.037) using 20,750 data points.

Results and discussion
We trained the proposed machine learning models using the main dataset and observed that even though only single temperature measurement points (at each well location) were used for training, the machine learning models successfully predicted underground temperatures. Among the machine learning models, XGBoost and Random Forest outperformed other models and provided more accurate results. For further verifications, we compared the XGBoost, DNN and physics-based model's predictions versus the subsurface temperatures obtained from 58 additional wells in the temperature-profile dataset. This was important because unlike the main dataset, the temperature-profile dataset comprises temperature measurements within depth intervals. This allows us to investigate the machine learning model predictions versus depth. Fortunately, the results show that machine learning models predictions were in close agreement with the measured data.

Temperature-at-depth result analysis
After training and tuning hyper-parameters, we evaluated the accuracy of each model using the test data for using cross-validation. As shown in Fig. 6 and Table 4, XGBoost and Random Forest perform the best among other machine learning models. Statistical hypothesis tests (t tests) were performed. The comparisons of XGBoost with Ridge and DNN suggest that there is sufficient evidence to reject the null hypothesis and the observed differences between XGboost and the other two models in the regression accuracy is likely due to the differences in the models. However, the result of the hypothesis test on XGBoost and Random Forest suggests that there is insufficient evidence to reject the null hypothesis. Table 3 summarizes the p values for the tests.
We then used the trained models to predict subsurface temperature at three different depths (Z = 1000, 2000, 3000 meters) in the northeastern United States. In Fig. 7, temperature predictions are plotted using XGBoost models. For comparison purposes between the physics-based and machine learning subsurface temperature predictions, we used KNN method (k = 8 and width = 1 determined from crossvalidation) for temperature interpolation for the physics-based model. To be more elaborate, in the main dataset, at each well's location, the predicted physics-based underground temperatures were provided along the depth. We used this data and KNN interpolation method to approximate the physics-based values at different latitudes, longitudes and depths.

Generalizability analysis
As discussed earlier, the target imbalance problem was present in our dataset since fewer data points were available for depths below 2000 m (or BHT larger than 60 °C). We conducted an experiment to compare XGBoost accuracy for well-represented and underrepresented data points in a test set. In Fig. 8, average percentage error (APE) versus depth is plotted for the test set where well represented and underrepresented data are illustrated by different colors. Furthermore, Fig. 9 shows the target distributions of the same test set (with one-to-one match with data points in Fig. 8). Next, we compared the mean absolute percentage error (MAPE) for wellrepresented and underrepresented test data and found both values to be remarkably similar (with less than 2% difference). Through this empirical analysis, we confirmed the generalizability of the XGBoost model.

Temperature-profile prediction
In our analysis, we decided to use the corrected temperature-profile dataset (described in "Drilling stage" Section) to evaluate XGBoost and DNN accuracies against the thermal conductivity model. Jordan et al. reported the predicted subsurface temperatures (from the physics-based model) across the depth for each well's latitude and longitude in the main dataset. The size of the available predicted temperature data is 2075*500 where each well had 500 temperature prediction values at different depths. We used KNN regression model (using the mentioned data) to interpolate temperature-profile predictions for the physics-based model at the new locations (in the temperature-profile dataset). In the following schematic, we illustrate the procedure that we have used to compare predictions from machine learning and the physics-based models. After analyzing the results, the mean absolute errors of XGBoost, DNN, and physicsbased models were calculated to be 7.3, 7.27, and 8.76, respectively, for the 58 wells. These numbers show that machine learning models can be comparable, in terms of accuracy, to the physics-based thermal conductivity model. It is important to note that we have used multiple interpolations to be able to perform such comparison (Fig. 10). Therefore, there is some level of uncertainties associated with the reported numbers.
For illustration purposes, we include six temperature-profile predictions (in Fig. 11), which are fair representatives of the remaining cases. Among all plots, we could see that the thermal conductivity model performs relatively better in tracking the true temperature data in 11.3 and 11.4. On the other hand, both XGBoost and DNN models provide more accurate results in 11.1 and 11.6. Nevertheless, there are some cases where all models fail to follow the actual data. For example, in plot 11.2, we could see that neither physics-based nor machine learning models predict the temperature profile accurately. Temperature-profile prediction plots of other wells are included in our GitHub repository (Shahdi and Lee 2021). Among machine learning predictions, DNN and XGBoost predictions follow very similar trends even though DNN curves are smoother and have less variation with depth. This is expected because decision-tree-based models tend to show such discrete predictive behavior when used for regression.
In Tables 4 and 5, we include each well's API well identification number with the distance from the closest well in the main dataset. The shown plots are from the wells that are close to at least one of the wells in the main dataset. This is important because it shows that the interpolated temperature values for the physics-based predictions are reliable and close to those reported by the original study (Jordan et al. 2016).

Geothermal gradient map
It is very popular to use geothermal gradient maps to predict the subsurface temperature at the desired location. In this study, we provide the geothermal gradient map for the northeastern United States.
Similar to the plots (shown in Fig. 11), we generate temperature-profile predictions for 28,000 locations across the region and then fit a linear regression line to the temperature data for each location. These 28,000 wells are defined symmetrically throughout the region of interest (bounded by the concave hull algorithm which is shown in Fig. 1). This    was necessary for generating a continuous temperature gradient map. Through our analysis, we found that the fitted lines accurately represent the predicted temperatures with average R 2 of 0.97. The reported slopes are equal to the associated geothermal gradients and are illustrated in Fig. 12. The second map in Fig. 12 is a snapshot of an interactive Folium map within our region of interest. In Fig. 13, areas with predicted geothermal gradient higher than 27 • C km (obtained from Random Forest, XGBoost and DNN) are annotated. All three model predictions recommend similar areas in West Virginia and New York states to have high values for temperature gradient. We cautiously suggest these machine learning guided prospective regions for future geothermal developments.
Next, we calculated the mean absolute errors between the geothermal gradients predicted using different models (e.g., physics-based, XGBoost and DNN) and measured temperatures for the temperature-profile dataset (as shown in Table 6).

Conclusion
The goal of this paper is to highlight the importance and applicability of machine learning methods in producing reliable predictions of important geothermal parameters from the rich volumes of data available from geothermal sites. It is critical to understand that this paper does not claim to prove that machine learning models are ubiquitously superior to conventional physics-based models in geothermal energy research. In this study, we explored the applicability of four machine learning models in predicting subsurface temperatures in northeastern United States using bottom-hole temperature data and geological information from 20,750 wells. It was shown that XGBoost and Random Forest outperformed all other models, with only 3.21 • C and 3.25 • C mean absolute error. Furthermore, we compared the predictions from machine learning and physics-based models to the measured temperature data obtained from an extra dataset with 58 wells in the state of West Virginia and showed that XGBoost can successfully predict the temperature at different depths. Lastly, we provided a geothermal gradient map for the corresponding region which can be used as a quick tool to calculate the underground temperature at any desired location and depth. In the map, eastern West Virginia along with portions of southwestern New York state show the highest potential.
We believe that this study provides a complementary analysis for geothermal energy exploration for future investments. Furthermore, oil and gas industry can benefit tremendously from this paper too. The presented machine learning models can be incorporated in reservoir and drilling simulators for more accurate subsurface temperature predictions, and consequently, more reliable fluid properties characterization.