Machine learning to identify geologic factors associated with production in geothermal fields: a case-study using 3D geologic data, Brady geothermal field, Nevada

In this paper, we present an analysis using unsupervised machine learning (ML) to identify the key geologic factors that contribute to the geothermal production in Brady geothermal field. Brady is a hydrothermal system in northwestern Nevada that supports both electricity production and direct use of hydrothermal fluids. Transmissive fluid-flow pathways are relatively rare in the subsurface, but are critical components of hydrothermal systems like Brady and many other types of fluid-flow systems in fractured rock. Here, we analyze geologic data with ML methods to unravel the local geologic controls on these pathways. The ML method, non-negative matrix factorization with k-means clustering (NMFk), is applied to a library of 14 3D geologic characteristics hypothesized to control hydrothermal circulation in the Brady geothermal field. Our results indicate that macro-scale faults and a local step-over in the fault system preferentially occur along production wells when compared to injection wells and non-productive wells. We infer that these are the key geologic characteristics that control the through-going hydrothermal transmission pathways at Brady. Our results demonstrate: (1) the specific geologic controls on the Brady hydrothermal system and (2) the efficacy of pairing ML techniques with 3D geologic characterization to enhance the understanding of subsurface processes.

the volume of rock that transmits fluids at rates suitable for power production or direct use is much smaller than the volume of rock that does not transmit fluid (or transmits at sub-commercial rates or temperature). This presents a significant challenge to efficient exploration, development, and management of these renewable energy resources.
Compartmentalization of the fluid-flow system may be associated with a variety of geologic characteristics. For instance, spatial changes in fracture permeability throughout a fault network, and/or permeability variation in the stratigraphic succession may control compartmentalization. The purpose and innovation of this study is to reveal the geologic factors that influence the compartmentalization of fluid flow this hydrothermal system. We evaluate three-dimensional (3D) geologic characteristics through an unsupervised machine learning (ML) method called non-negative matrix factorization with k-means clustering (NMFk). Specifically, NMFk is applied to a suite of geologic factors that have been calculated along production, injection, and non-productive wells at Brady geothermal field in northwestern Nevada. The ML results indicate that the macro-scale faults and the ~ km-scale step-over in the fault system are closely spatially associated with production wells relative to injection wells and non-productive wells. Mapping the 3D distribution of these factors in geothermal prospects, developed geothermal fields, and other types of fluid-flow systems may help promote more efficient resource development and management.

The Brady geothermal system
Brady geothermal field is located ~ 80 km northwest of Reno, Nevada, USA (Fig. 1). Brady has seen geothermal electricity production since 1992 and research or exploration since at least 1959 (Benoit and Butler 1983). The hydrothermal system supplies hot fluid to two power stations and a direct-use vegetable drying facility. Electricity production capacity at Brady is 26.1 MWe, and ~ 7 MWth is delivered to the drying facility (Ayling 2020). Fluids are produced from two levels in the subsurface ~ 300-600 m depth and ~ 1750 m depth. Temperatures of produced fluid have been ~ 130-185 °C during this time (Benoit 2014, and based on Nevada Division of Minerals, publicly available data http:// www. nbmg. unr. edu/ Geoth ermal/ Produ ction Injec tion/ index. html), though temperatures as high as 219 °C have been measured (Shevenell et al. 2012). These relatively high temperatures occur at relatively shallow levels (as shallow as 300-600 m depth for some production wells) as a result of convective upwelling driven by temperaturerelated differences in fluid density, and/or hydraulic head driven fluid-flow through hot rock (advection). In either case, relatively high heat flow in the Basin and Range physiographic province, which is associated with Miocene-to-recent crustal thinning (Lachenbruch and Sass 1977;Blackwell 1983) provides the heat. The fluids circulate through transmissive pathways in the rock. At Brady, these have been primarily attributed to a network of fractures within a step-over in the Basin and Range-type normal fault (e.g., Wernicke 1992;Colgan et al. 2006) called the Brady fault (Faulds et al. 2003(Faulds et al. , 2010a(Faulds et al. , b, 2017Siler et al. 2018;Siler and Pepin 2021;Fig. 1). The production well field; an elliptical, ~ 3 km wide × 6 km long (across strike × along strike, relative to the north-northeast fault strike) temperature anomaly; surface geothermal features include active fumaroles, silica sinter, silicified sediments, and warm ground (Kratt et al. 2006;Lechler and Coolbaugh 2007;Faulds et al. 2010aFaulds et al. , b, 2017; and diffuse degassing of anomalous CO 2 , H 2 S, and Rn are also centered on the Brady step-over (Jolie et al. 2015a(Jolie et al. , b, 2016. The stratigraphic section at Brady consists of metamorphic basement rocks at ~ 1400 m and deeper in the well field, overlain by Oligocene-to-Late Miocene volcanic rocks from ~ 1400 to ~ 300 m depth, and Late Miocene-to-Holocene sedimentary rocks from ~ 300 m depth to the surface. The Brady fault is a west-dipping, north-northeaststriking system of normal faults that cuts this stratigraphic section (Fig. 1). The stepover of the Brady fault zone (Faulds et al. 2010a(Faulds et al. , b, 2017Siler and Faulds 2013;Siler Fig. 1 Map of Brady geothermal area. The fault strands that constitute the Brady fault (data from Faulds et al. 2017) are shown in yellow. Contours represent modeled temperature, a linear interpolation of the temperature data at 750 m depth. These are the same data as the modeltemp variable. Wells are colored by their usage (production, injection, and non-productive) for depths shallower than 750 m. The general geometry of the step-over is shown to the left of the fault system. Interstate 80 (orange line) is shown for reference et al. 2016) is an area where parallel but non-collinear strands come together (e.g., Sanderson 1991, 1994;Fossen and Rotevatn 2016). The southern segment of the Brady fault zone steps to the left to meet the northern segment ( Fig. 1; Faulds et al. 2017). Production wells are located in the step-over, injection wells are both in the stepover and ~ 1.5 km away, at the north-northeast extent of the thermal anomaly (Fig. 1).

Methods
An existing 3D geologic map of Brady, synthesizing geologic map, downhole lithologic and structural data from well cuttings and core, and geophysical data (Siler and Faulds 2013;Jolie et al. 2015b;Siler et al. 2016Siler et al. , 2021Witter et al. 2016) was used to develop a suite of geologic variables that may control the distribution and localization of transmissive pathways and production-grade hydrothermal fluid flow. The 14 different variables described below are calculated from the 3D geologic map and projected to 47 production, injection, and non-productive wells within the field (Fig. 1).

Geothermal well data
The Brady well dataset is much denser at shallow levels than at deep levels (just eight of 47 wells extend to ~ 1750 m, the deeper of the two geothermal reservoirs). The well data at deeper levels are too sparse to be used in the NMFk analysis. As a result, this analysis focuses on the shallow (~ 300-600 m deep) reservoir, where the data are sufficiently dense for NMFk. 750 m is used as the cutoff depth to ensure that the full length of all wells that produce from the shallow reservoir is included in the dataset. There are nine production wells and six injection wells that have been used for production or injection at depths of less than 750 m since the geothermal power station was brought online in 1992. Wells producing from the shallow reservoir account for ~ 57% of the total produced volume (June 1992-August 2019; based on publicly available data Nevada Division of Minerals http:// www. nbmg. unr. edu/ Geoth ermal/ Produ ction Injec tion/ index. html). We consider the remaining 32 wells to be non-productive wells. Four of these remaining 32 wells are used for production or injection; however, these wells produce or inject in the deeper reservoir. At depths less than 750 m they are not open to flow to or from the formation. These four deeper production wells are therefore considered "nonproductive" for our purposes. The other 28 wells have never been used for production or injection so we assume they have sub-commercial temperatures and/or flow rates. Each of the 14 geologic factors described below is calculated at 1-m intervals along all 47 wells. The resultant database of 336,784 entries (24,056 locations with 14 variables) is used as input data for NMFk analysis. These are the input data for the NMFk analysis. Figure 2 shows the values for each of the 14 variables for one of the production wells.

Fault factors (faults, curve, td, ts, faultnear)
For each of the 32 faults defined by the 3D geologic map (Siler and Faulds 2013;Siler et al. 2016Siler et al. , 2021Witter et al. 2016), a 30-m-wide fault zone is generated. This zone approximates the effective width of secondary faulting and fracturing around each fault. Though not constrained by field data, this width is consistent with empirically derived fault zone widths for km-long faults, like the Brady fault zone (Scholz et al. 1993;Anders and Wiltschko 1994). The fault variable has a value of '1' where a well is located within a 30-m-wide fault zone and '0' for well intervals not located within a fault zone. The curve variable is the along-strike and down-dip curvature calculated along each fault. The td and ts variables are the dilation tendency and slip tendency, respectively, for each fault. These values are calculated using methods of Morris et al. (1996) and Ferrill et al. (1999) and a local stress model calculated at Brady (Jolie et al. 2015b). The 30-m-wide fault zone for each fault is populated with the calculated curve, ts, and td values. The td, ts, and curve values are proxies for the occurrence of conductive fractures in each fault zone. Segments of faults with a high value for curve are postulated to be associated with accentuated faulting and fracturing as a result of stress loading at the highly curved fault segments (e.g., Sibson 1994), and may therefore preferentially host fluid flow. Dilation tendency (td) and slip tendency (ts) are the ratios of the resolved normal stresses and Fig. 2 The 14 variables used in this study along one of the Brady production wells. These are the input data for the NMFk algorithm. Cool colors correspond to low values and warm colors correspond to high values for each variable. For the binary variables (fault and goodlith), red corresponds to a value of one (a fault or the producing lithology), purple corresponds to a value of zero. For faultnear and contactnear warm colors indicate nearness to faults or contacts the ratio of normal stress to shear stress on faults, respectively. Fault segments that are either highly dilatant (high td) and/or stress loaded for slip (high ts) are likely to host fluid flow (Ferrill et al. 2019(Ferrill et al. , 2020. For all wells, the faultnear variable is calculated as the difference between the distance to the nearest 3D mapped fault plane and the maximum distance to a fault in the dataset. This is done so that high faultnear values occur at intervals of wells that are near to faults (e.g., see Fig. 2), in the same way that high values for the other variables occur where hydrothermal processes are expected.

Fault network factors (faultdense, faultintdense)
Areas in the subsurface with especially dense faulting and fracturing are expected to have relatively high permeability, and thus host hydrothermal circulation. The spatial density of fault planes (faultdense) and the spatial density of the lines of intersection between faults and the lines of termination of faults (faultintdense) are calculated as faults per unit volume and fault intersections or terminations per unit volume, respectively (Alberti 2011; Siler et al. 2021). Fault intersections and terminations represent structural discontinuities, where stresses become concentrated and accentuated fracturing is expected (Peacock and Sanderson 1991;Fossen and Rotevatn 2016). Similarly, areas with many closely spaced faults are also expected to have a relatively high density of fractures, and either may host high permeability.

Stress and strain factors (dilation, normal, coulomb)
The step-over in the Brady fault is an important factor controlling the presence of hydrothermal circulation at Brady (Faulds et al. 2003(Faulds et al. , 2010a(Faulds et al. , b, 2016(Faulds et al. , 2017(Faulds et al. , 2018(Faulds et al. , 2021Jolie et al. 2015a, b;Siler and Faulds 2013;Siler et al. 2016). Stress and strain become concentrated at the step-over when slip occurs on the Brady fault, and the location of the stress and strain perturbation is largely concomitant with the production well field and the local temperature anomaly (Siler et al. 2018). The 2D modeled dilation (dilation), normal stress reduction (i.e., unclamping of a fault) (normal) and coulomb shear stress increase (coulomb) as a result of 1 m normal slip on the Brady fault are calculated at 250-m-depth intervals from the surface to 750 m depth (Stein et al. 1992;King et al. 1994;Lin and Stein 2004). dilation, normal and coulomb values between the calculated depth slices are linearly interpolated, approximating the volumetric dilation, normal stress reduction, and Coulomb shear stress increase in the study area. The stress and strain perturbations that occur with fault slip result in zones of accentuated secondary faulting and fracturing and may be important factors in localizing hydrothermal circulation in the step-over (Siler et al. 2018).

Stratigraphic factors (contactnear, unitthick, goodlith)
In addition to the above geologic and structural variables, permeability associated with stratigraphic factors may also play an important role in localizing hydrothermal circulation. Stratigraphic contacts can be manifested as zones of breccia. These brecciated contact zones in successions of volcanic rocks, like those that occur at Brady, may have matrix porosity and permeability that are important aspects of the flow system. The distance from the nearest stratigraphic contact (contactnear) is calculated as the difference between the distance to the nearest stratigraphic contact along each well and the maximum distance to a contact in the dataset. In this case, high values of contactnear would be expected to correlate with hydrothermal fluid flow. Alternatively, relatively thick geologic units, i.e., relatively large, intact volumes of rock distal to stratigraphic contacts, may focus strain on a relatively small number of dominant, high-aperture fractures. Areas with high values for the thickness of each stratigraphic unit (unitthick) from the 3D geologic map could be favorable for localizing hydrothermal circulation in this case. Though the 3D geologic map (Siler et al. 2021) contains simplified stratigraphy relative to the more detailed geologic map and cross-section published for the Brady area Faulds et al. 2012Faulds et al. , 2017, the stratigraphic contacts from the 3D geologic map used to calculate contactnear and unitthick, represent major lithologic boundaries. It is probable that if there were a stratigraphic effect on the production zones, it would come from these boundaries between the major lithologic packages. The ~ 300-600-m-depth production reservoir at Brady occurs in Miocene mafic to intermediate volcanic rocks. It is possible that these specific stratigraphic units have high matrix porosity and permeability and/or are favorable for developing highly transmissive fracture systems when faulted relative to other lithologic units. The goodlith variable is '1' for well intervals with these stratigraphic units and '0' for intervals in other units.

Temperature (modeltemp)
Advection or convection is much more efficient means of heat transport than conduction. Higher temperatures, therefore, are expected within or near transmissive fluid-flow conduits. Equilibrated temperature logs from 39 deep (as deep as ~ 2 km) geothermal wells and 79 shallow (~ 150 m) temperature gradient wells (Shevenell et al. 2012) were used to build a 3D temperature model (Siler et al. 2021). The modeled temperature (modeltemp) is projected to each of the 47 wells.

Machine learning methods: NMFk
Machine learning (ML) approaches are mainly classified as unsupervised or supervised; the difference being whether the algorithm is formally trained by using labeled data (supervised) or not trained (unsupervised). Both supervised and unsupervised methods have been employed to address geothermal-related concepts. Faulds et al. (2020), applied supervised methods to regional-scale data to reveal hidden hydrothermal systems. Supervised methods have also been used to inform geothermal reservoir modeling (Gudmundsdottir and Horne 2020;Beckers et al. 2021), predict loss circulation zones in wells (Kiran and Salehi 2020), and identify small-scale fractures in seismic reflection data (Zheng et al. 2021). Unsupervised ML has been applied to identify hidden geothermal signals in both regional-scale data (Vesselinov et al. 2020a, b;Ahmmed et al. 2020a, b, d) and local-scale data (Vesselinov et al. 2019;Ahmmed et al. 2020c;Siler and Pepin 2021). Unsupervised learning was chosen for this study to infer the natural structure present within the dataset so that new information regarding the geologic controls on hydrothermal processes may readily be discovered.
The NMFk algorithm we employ combines two unsupervised machine learning (ML) methods, non-negative matrix factorization (NMF) and customized k-means clustering. NMF factorizes a non-negative data matrix, X, into two components W and H, where W is a location matrix and H is an attribute matrix.
Given a non-negative data matrix X = [x 1 , . . . , x n ] ∈ R m×n , each column of X is a variable/sample vector, where m and n are the number of locations and attributes, respectively. NMF factorizes or decomposes X based on the user-specified number of dimensions k into W and H matrices by minimizing the following loss function (Lee and Seung 1999): where � • � F denotes Frobenius norms. H can be considered as a basis matrix of X that is optimized for the linear approximation of X. Because only a few basis vectors represent all data vectors, good approximation vectors are those that capture the latent structure of X.
After completion of NMF process, 1000 estimated H are clustered into k clusters using k-means clustering that has been customized using blind source detection (Alexandrov and Vesselinov 2014). Because k is unknown in k-means clustering, the algorithm consecutively examines specified k by obtaining 1000 H for each feature/variable. During clustering, the similarity between two variables is assessed according to the cosine norm. After clustering, the Silhouette values (Rousseeuw 1987) are calculated and used to estimate a particular choice of k. The Silhouette value quantifies how similar an object is to its own cluster compared to other clusters and varies from − 1 to + 1; positive values indicate that the object is well matched to its own cluster and poorly matched to neighboring clusters. The combination of the least L and the highest Silhouette value is used to determine the optimal number of clusters, or hidden signals. If k is low, the Silhouette value will be high, but so may be L because of under-fitting (e.g., Lever et al. 2016). For high k, the Silhouette value will be low and the solution may be over-fit (e.g., Lever et al. 2016). So, the best estimate for k is a number that optimizes both the L and Silhouette values.
Other existing matrix factorizations include singular value decomposition (SVD) (Klema and Laub 1980) and principal and independent component analyses (PCA and ICA) (Wold and Geladi 1987;Comon 1994;Ritchie and Pepin 2020;Siler and Pepin 2021). There are a few advantages of NMFk over these other tools. For instance, NMFk handles both real and categorical variables and datasets with missing entries, yet still provides interpretable results (Alexandrov and Vesselinov 2014; Vesselinov et al. 2018). For geological applications of ML like this one, the ability to handle categorical data (e.g., the faults and goodlith variables) and datasets with missing entries is critical. These parameters may not be constrainable with numerical values in all locations of interest, yet still constitute an important part of the solution. In our case, the categorical faults variable is a key part of the solution. Our dataset does not have any missing entries, so this advantage of NMFk was not utilized in this study.

Results
As outlined in "Machine learning methods: NMFk" section, NMFk analysis reveals hidden associations within a dataset. In our case, these associations characterize interdependencies among geologic attributes and production, injection, and non-productive well locations within the analyzed 3D domain. Results were calculated for k of values (1) L = X − WH T 2 F , of 2, 4, 5, and 6. The 2-cluster result was under-fit. The 3-cluster result is rejected by the NMFk algorithm because of its low Silhouette value. The 4, 5, and 6 cluster results are all robust solutions based on their L and Silhouette values. Numbers of clusters larger than 6 were over-fit. For each of the 4, 5, and 6 cluster solutions the H matrices show similar weighting patterns between variables (Fig. 3). We chose to interpret the 4-cluster solution herein, since the smaller number of clusters is more easily interpretable in the framework of the geologic controls of hydrothermal fluid flow at Brady. Figure 3A shows the 4-cluster H (attribute)-matrix. Below, we use 'signal' to refer to the H-matrix and W-matrix columns (S1, S2, S3, and S4). The signal weight for each factor is calculated by the NMFk algorithm. In addition to defining the four signals, the NMFk results define "cluster" for each well. Figure 4 shows the W (location)matrix, the four signals (S1, S2, S3, S4) relative to each of the 47 geothermal wells, and the cluster (A, B, C, or D) that is defined by the NMFk algorithm. Figure 4A, B is the same matrix, Fig. 4A is sorted by the cluster label, Fig. 4B is sorted by S2 value. For both Figs. 3 and 4, warm colors indicate a relatively high (strong) weight between the signal and the variable (Fig. 3) or the signal and the well (Fig. 4) and cool colors indicate a relatively low weight. Figure 5 shows the Brady well field and the cluster Fig. 3 H (attribute) matrix for 4-signals, 5-signals and 6-signals. The 4-signal solution, which is interpreted here, is highlighted. Warm colors indicate that the variable has a high weight with that particular signal; cool colors indicate that the variable has a low weight with that specific signal (See figure on next page.) Fig. 4 W (location) matrix. The four signals from the NMFk results relative to the 47 geothermal wells: A sorted by well clusters, B sorted by S2 values. Warm colors indicate that the variable has a high weight with that particular signal; cool colors indicate that the variable has a low weight with that particular signal. The well usage (production, injection or non-productive) and the well cluster labels (A, B, C, or D) are listed Siler et al. Geotherm Energy (2021)  that each well falls into. Figure 6 shows a biplot of S1 vs S2. These two signals most effectively separate the production wells from the injection and non-productive wells. On Fig. 6, the wells and variables are plotted by their S1 and S2 values from Figs. 3A, 4, respectively. The production wells (red) have relatively high S2 values and relatively low S1 values. The variables (plotted as asterisks) that plot in the same quadrant, i.e., also have relatively high S2 values and relatively low S1 values, are those that most Fig. 5 Map of the Brady well field and fault system. Wells are shown by their use (production, injection, or non-productive) and their cluster (A, B, C, or D) as assigned by the NMFk results. Six of the nine production wells are in Cluster C (triangles). Cluster C is highlighted with a white halo effectively separate the production wells from the injection wells and non-productive wells (Fig. 6).

Discussion
The NMFk results show that six of the nine wells that have been used for geothermal production at Brady from the shallow (~ 300-600 m depth) reservoir (June 1992-August 2019) fall in cluster C (Figs. 4A and 5). Cluster C is associated with relatively high (red) S2 values, and relatively low values (green) for S1, S3, and S4 (Fig. 4B). Additionally, all 9 of the production wells fall within the 17 highest S2 values (Fig. 4B). This further indicates that relatively high S2 values are strongly associated with production wells relative to the other wells. Additionally, the NMFk results indicate which fault factors, fault network factors, and stress and strain factors are more dominant in S2 (and therefore predominate along the production wells) relative to injection wells and non-productive wells.

Fault factors
The occurrence of faults intersecting a well, i.e., the faults variable, is the predominant faulting related factor associated with S2 (Figs. 3A and 6). This is evident in Fig. 6 on which faults plots in the upper left quadrant, with relatively high S2 and relatively low S1; a similar pattern is observed for the majority of the production wells. Though faultnear also has high S2 values, it also has relatively high values for S1, S3, and S4 (Fig. 3A), so it less distinctly related to S2 relative to faults. On Fig. 6 this is evident from faultnear plotting in the upper-right quadrant, farther to the right relative to the production wells. These results suggest that the presence or absence of distinct, macro-scale fault zones is strongly related to production wells, more so than the other fault factors such as nearness to faults (faultnear), the curvature of faults (curve), slip tendency (td), or dilation tendency (ts).

Fault network factors
The 3D spatial density of fault planes (faultdense) is the predominant fault network factor related to S2 and the production wells (Figs. 3A and 6). This is evident in Fig. 6 on Fig. 6 Biplot of signal-1 (S1) vs signal-2 (S2). Production wells (red) are enclosed by the red dashed polygon. The production wells and the faults, dilation, and faultdense variables have relatively high S2 values and relatively low S1 values indicating that these variables control the separation of the production wells from the rest of the data set which faultdense plots in the upper left quadrant, with relatively high S2 and relatively low S1, similar to the majority of the production wells. Though faultindense has a high S2 value, it also has relatively high S1 and S3 values (Fig. 3A), and plots to the right of the production wells relative to faultdense. This indicates that fault density is more strongly related to production wells than fault intersection density.

Stress and strain factors
Dilation occurring as a result of modeled fault slip (dilation) is the predominant stress/ strain factor related to S2 and the production wells (Figs. 3A and 6). This is evident in Fig. 6 in which dilation plots in the upper left quadrant, with relatively high S2 and relatively low S1, similar to the majority of the production wells. These results indicate that dilation is more strongly related to production wells relative to normal or coulomb, the other stress/strain network factors examined herein.

Stratigraphic factors
Overall, no stratigraphic factor is particularly strongly related to the production wells.
Of the three stratigraphic factors, the thickness of geologic units (unitthick) has high S2 values (Figs. 3A and 6) relative to the nearness to geologic contacts (contactnear), and the specific geologic units that are associated with geothermal production (goodlith). However, S1 values for unitthick are high relative to the production wells (unitthick plots in the upper right in Fig. 6), so unitthick appears to be less strongly related to the production wells relative to dilation, faultdense, and faults.

Temperature
Temperature (modeltemp) has relatively low values for all signals. This suggests that the modeled temperature is not significantly higher or lower along any subset of wells relative to the other wells.

Geologic controls on hydrothermal processes at Brady
The NMFk results suggest that there are two dominant characteristics of the geologic structure that control hydrothermal processes at < 750 m depth at Brady: the distinct, macro-scale faults and the step-over in the Brady fault system. The macro-scale faults are those that are mappable in 3D using geologic and geophysical evidence (Siler et al. 2021). Interestingly, this relatively simple fault variable, the binary occurrence or nonoccurrence of a 30-m-wide fault zone crossing a well (Fig. 2), is more closely related to the production wells than the static stress state of the faults (td or ts), the curvature of the faults (curve), or the nearness to the mapped fault planes (faultnear). The step-over in the Brady fault ( Fig. 1) is the other dominant geologic factor controlling hydrothermal processes at Brady. This concept is well established (Faulds et al. 2003(Faulds et al. , 2010a(Faulds et al. , b, 2016(Faulds et al. , 2017Kratt et al. 2006;Lechler and Coolbaugh 2007;Jolie et al. 2015a, b;Siler et al. 2018), but this study and a similar unsupervised machine learning study (Siler and Pepin 2021) directly link the 3D geologic attributes of the step-over to the production wells and production intervals. The geometry and location of the step-over controls the spatial density of fault planes (faultdense) because faults are most dense in the stepover (Fig. 1). The step-over also controls the dilatational strain that occurs as a result of modeled fault slip on the Brady fault (dilation) (Siler et al. 2018). The NMFk results suggest these two factors are more effective indicators of step-over's control on hydrothermal processes relative to the other stress and strain variables (normal and coulomb) and the spatial density of fault intersections (faultintdense). A principal component analysis (PCA) study conducted on a similar dataset but examining the entirety of the Brady production wells (rather than < 750 m depth as we do here) also indicate the macroscale faults and 3D density of faults are the dominant controls on geothermal production zones (Siler and Pepin 2021). Interestingly, this study suggests that dilation is more strongly associated with geothermal production than coulomb stress change, whereas Siler and Pepin (2021) show the opposite. The magnitude of dilation occurring as a result normal fault slip is highest at shallow levels and decreases with depth. This may explain why this study, which focuses on the shallow reservoir (< 750 m), reveals dilation as a primary control, whereas Siler and Pepin (2021), which includes analysis to ~ 1750 m depth suggests that changes in coulomb shear stress are a primary control.
Of the stratigraphic factors, thicker geologic units (unitthick) the strongest influence hydrothermal processes. This may indicate that faults cutting through thicker geologic units preferentially transmit the high flow rates necessary for geothermal production relative to faults cutting thinner units. However, based on the ML analyses, this control appears to be tertiary to the macro-scale faults and the step-over. The modeltemp variable is not strongly related to production wells relative to the other wells. It is likely that our extrapolation of the existing temperature data does not sufficiently resolve advective or convective relative to conductive heat transport, and thus modeled temperature is relatively ineffective for resolving discrete fluid-flow pathways. In future work, a different strategy for considering temperature data is needed.

Conclusions
Non-negative matrix factorization with k-means clustering (NMFk) analyses was conducted on a 3D geological dataset from Brady geothermal field to elucidate the geologic characteristics that control hydrothermal circulation in the shallow (~ 300-600 m depth) geothermal reservoir. These analyses show that known, macro-scale faults, i.e., those that have been mapped in 3D based on geological and geophysical evidence, are strongly associated with the production wells at Brady. Geologic factors that occur most prominently within the Brady step-over, such as high spatial densities of faults, and dilatation brought on by modeled fault slip, are also strongly associated with production wells relative to other wells. These results indicate that the shallow hydrothermal reservoir at Brady is hosted by relatively prominent faults and that locations where such faults lie within the subsurface projection of the step-over, i.e. the volume of rock with relatively high fault and fracture density and where fractures tend to dilate as a result of periodic fault slip, are especially well suited for geothermal production. These two factors, macro-scale faults and the Brady step-over, together, and not either independently, control the presence of the Brady hydrothermal system that has been developed for electricity production and direct thermal uses.
The NMFk methodology successfully differentiates production wells from among a larger number of non-productive wells using these geologic data. This suggests that these geologic parameters may be assessed at other geothermal sites in order to help evaluate site prospectively and/or reservoir processes. In sites with limited subsurface data, the 3D geologic factors may be less well constrained, but analysis similar to what has been presented herein may still indicate areas of the subsurface that are likely to have the appropriate geologic characteristics for fluid-flow. These types of analysis may also be effective as training data for using NMFk or supervised machine learning techniques to identify areas within an unexplored volume of the subsurface that have the geologic characteristics that are expected to host productive geothermal wells.