The first step was to determine the boundaries of the study area. The Lublin trough is defined in various ways: some tectonic divisions of Poland split it into multiple units. The definition used by the authors follows the outline by Pożaryski (Pożaryski 1974). Once the area had been delimited by the vector file from the project, meteorological stations recording the required parameters were sought to provide reliable data on weather conditions over a long time period. The analysed period was set to 15 years [1 Jan 2005 to 31 Dec 2019; elsewhere, for similar considerations, 10 years (Chicherin 2020)], as these years are properly covered by the available meteorological stations and constitute the longest common period for which stations in or around the Lublin trough simultaneously provide the required data. Also, because of climate change, older data may deviate from recent years (e.g. Chicherin 2020), which might distort the model. An outline of the study area and the locations of the meteorological stations used are presented in Fig. 1.
Temperature and wind speed data were obtained from the database of the Polish Institute of Meteorology and Water Management – National Research Institute, whilst irradiation data were downloaded for exactly the same coordinates from the Copernicus Atmosphere Monitoring Service (CAMS) database. The time series were checked for validity, and the few gaps were filled with the corresponding hourly values from their closest temporal vicinity. The shortest gaps were filled with the average of the respective measurements from the closest vicinity. The total number of gaps in all datasets (five parameters in hourly meteorological datasets over 15 years in 10 locations) was less than 2500, which is less than 0.2% of all records. The number of selected meteorological stations was limited by data availability. The resulting continuous series were encoded in a form suitable for the next step.
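The gap-filling procedure described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the text: "closest vicinity" is read as temporal neighbours, a "short" gap is taken to be at most three hours (the hypothetical parameter `max_short`), and longer gaps are filled with the value from the same hour of the adjacent day (`period = 24`). The function name `fill_gaps` is illustrative only.

```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]],
              max_short: int = 3,
              period: int = 24) -> List[Optional[float]]:
    """Fill missing values (None) in an hourly series.

    Assumed rules (illustrative, not the authors' exact procedure):
    - gaps of at most `max_short` hours get the mean of the two
      measurements bounding the gap;
    - longer gaps get the value from the same hour one `period`
      earlier, or one `period` later if the earlier one is missing.
    Values that cannot be filled stay None.
    """
    filled = list(series)
    n = len(series)
    i = 0
    while i < n:
        if series[i] is not None:
            i += 1
            continue
        # Find the end (exclusive) of this run of missing values.
        j = i
        while j < n and series[j] is None:
            j += 1
        left = series[i - 1] if i > 0 else None
        right = series[j] if j < n else None
        if j - i <= max_short and left is not None and right is not None:
            # Short gap: average of the bounding measurements.
            for k in range(i, j):
                filled[k] = (left + right) / 2.0
        else:
            # Longer gap: same hour from the adjacent day, taken from
            # the original (unfilled) series.
            for k in range(i, j):
                for candidate in (k - period, k + period):
                    if 0 <= candidate < n and series[candidate] is not None:
                        filled[k] = series[candidate]
                        break
        i = j
    return filled
```

Applied to each parameter and station separately, such a routine yields continuous hourly series; the thresholds would need to be tuned to the actual gap statistics.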
The data and results for meteorological stations within the Lublin trough and in its surroundings were analysed statistically and, for illustration, presented as maps (Figs. 2 and 3).
To assess climate factors influencing the effective use of geothermal resources in the Lublin trough, a dedicated artificial neural network (ANN) was deployed by the authors to simulate heat demand at the considered meteorological stations. An ANN is a non-deterministic modelling tool that provides more varied results than those obtained from a heating curve; thus, the results are more consistent with actual heat demand fluctuations. The ANN was fed with input data provided by Geotermia Mazowiecka S.A., the geothermal heating plant in Mszczonów, near the northern boundary of the research area, which was assumed to experience similar geothermal and climatic conditions and to exhibit similar consumer habits. The abundant available datasets include time series for an entire representative district heating system supplying heat to various types of consumers: single-family houses, multi-family houses, swimming pools and commercial receivers. Datasets for the years 2014–18 covering heat demand along with external temperature, solar irradiation and wind speed from the nearest meteorological station were used to train the ANN. The training set was chosen for the stability and homogeneity of its observations.
In recent years, the use of machine learning techniques has gained immense popularity in engineering and scientific applications (Mosavi et al. 2019). The most commonly used tools include artificial neural networks (ANNs), which attempt to map the structure and operation of the human brain (Beale et al. 1996). Among the many types of ANN, a standard multilayer perceptron (MLP) usually achieves good performance with relatively low computational effort. ANNs have also been widely applied in the area of heating systems, both for load prediction (Song et al. 2020) and for system control (Guelpa et al. 2019).
For the purpose of the analysis conducted in this work, an ANN was created in Matlab 2019a using the fitnet function. As input data, the authors considered a four-year, hourly time series of meteorological parameters (humidity, wind speed, air temperature and global horizontal irradiation, obtained from the Institute of Meteorology and Water Management – National Research Institute [IMWM-NRI]) and dummy variables in the form of the ‘hour of the day’ and ‘month of the year’. In total, the input consisted of 40 explanatory variables. As output, the authors used the hourly observed heat demand recorded in the town of Mszczonów. The data were normalised. To create the ANN, the data were divided randomly (Matlab dividerand) into training (70% of samples), validation (15% of samples) and testing (15% of samples) subsets. The Levenberg–Marquardt algorithm was selected for training due to its fast convergence. Sigmoid and linear activation functions were selected for the hidden and output layers, respectively. There is no single accepted rule for choosing the number of neurons in the hidden layer (Sheela and Deepa 2020). It was therefore decided to test the performance of ANNs with the number of neurons in the hidden layer ranging from 1 to 20, where the upper bound was established based on the rule proposed by Mukhopadhyay and Mukhopadhyay (Mukhopadhyay and Mukhopadhyay 2018). From the array of developed neural networks, the one with the lowest mean absolute percentage error (MAPE) was selected. In this particular case, an ANN with 10 neurons in the hidden layer was selected, with a MAPE of 9.32%. Its performance is presented in Fig. 4.
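The input encoding and the MAPE-based selection can be sketched as follows. This is an illustration in Python, not the authors' Matlab workflow. It assumes that the 40 explanatory variables consist of the four meteorological parameters plus 24 hour-of-day and 12 month-of-year dummy variables (4 + 24 + 12 = 40), which agrees with the stated total but is not spelled out in the text. The function names and the toy numbers in the usage example are hypothetical.

```python
from typing import Dict, List, Sequence


def encode_inputs(humidity: float, wind_speed: float, temperature: float,
                  irradiation: float, hour: int, month: int) -> List[float]:
    """Build one input vector: 4 meteorological values followed by
    24 hour-of-day and 12 month-of-year dummy (one-hot) variables.

    `hour` is expected in 0..23 and `month` in 1..12 (assumed convention).
    """
    hour_dummies = [1.0 if h == hour else 0.0 for h in range(24)]
    month_dummies = [1.0 if m == month else 0.0 for m in range(1, 13)]
    return ([humidity, wind_speed, temperature, irradiation]
            + hour_dummies + month_dummies)


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent.

    Zero observations are skipped to avoid division by zero
    (an assumption of this sketch).
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    if not pairs:
        raise ValueError("no non-zero observations")
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)


def select_best(observed: Sequence[float],
                predictions_by_size: Dict[int, Sequence[float]]) -> int:
    """Return the hidden-layer size whose predictions give the lowest MAPE."""
    return min(predictions_by_size,
               key=lambda size: mape(observed, predictions_by_size[size]))


# Toy usage with made-up numbers (not results from the study).
x = encode_inputs(80.0, 3.5, -2.0, 120.0, hour=7, month=1)
best = select_best([100.0, 200.0], {5: [120.0, 150.0], 10: [110.0, 180.0]})
```

In the study itself, each candidate network (1 to 20 hidden neurons) would supply one set of predictions, and the size with the lowest MAPE would be kept.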
Once the ANN was ready, it processed the previously prepared datasets for meteorological stations in or near the study area. The results were normalised. The nominal heat demand (Dnominal = 1.0) was set to 3200 kW, which is a rounded value observed at −20 °C and averaged over its immediate vicinity (±0.5 °C). The nominal design external temperature for the entire Lublin trough is −20 °C according to the Polish standard PN-EN 12831, and all heat loads in this region should be calculated for this temperature. Such an approach has previously been presented and widely discussed for the general evaluation of geothermal systems (Ciapała et al. 2019), and is also presented in Appendix 1. Normalised demand/production may also be found elsewhere (Kryzia et al. 2020).
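As an illustration of this normalisation, the sketch below averages the demand observed within ±0.5 °C of the design temperature and then divides a demand series by the nominal value. The function names are hypothetical, the rounding to 3200 kW is not reproduced (it is simply passed in as the default nominal value), and the numbers in the usage example are invented.

```python
from typing import List, Sequence


def nominal_demand(temperatures: Sequence[float], demands: Sequence[float],
                   design_temp: float = -20.0, tolerance: float = 0.5) -> float:
    """Mean demand (kW) for hours whose external temperature lies within
    `tolerance` of the design temperature."""
    selected = [d for t, d in zip(temperatures, demands)
                if abs(t - design_temp) <= tolerance]
    if not selected:
        raise ValueError("no observations near the design temperature")
    return sum(selected) / len(selected)


def normalise(demands: Sequence[float], nominal_kw: float = 3200.0) -> List[float]:
    """Express demand as a fraction of the nominal demand (1.0 = nominal)."""
    return [d / nominal_kw for d in demands]


# Toy usage with made-up values (not data from the study).
q_nom = nominal_demand([-20.2, -19.8, -5.0], [3100.0, 3300.0, 1500.0])
d = normalise([1600.0, 3200.0])
```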
Based on the ANN results, ramp-up and ramp-down rates were calculated separately according to Eq. 1 (Jurasz and Ciapała 2018):
$$ R = D_{(i+1)} - D_{(i)} \tag{1} $$
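Eq. 1 amounts to a first difference of the normalised demand series. A minimal sketch is given below; it assumes that positive differences are treated as ramp-up and negative ones as ramp-down, with zero differences belonging to neither group. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def ramp_rates(demand: Sequence[float]) -> List[float]:
    """Eq. 1: R = D(i+1) - D(i) for consecutive time steps."""
    return [demand[i + 1] - demand[i] for i in range(len(demand) - 1)]


def split_ramps(rates: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Separate ramp-up (R > 0) from ramp-down (R < 0) rates."""
    ramp_up = [r for r in rates if r > 0]
    ramp_down = [r for r in rates if r < 0]
    return ramp_up, ramp_down


# Toy usage with made-up normalised demand values.
rates = ramp_rates([0.5, 0.75, 0.25, 0.25])
up, down = split_ramps(rates)
```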
Since there is no clear technical limit to the heat capacity covered by the geothermal source, for peak source dimensioning it was arbitrarily assumed that the peak heat source should begin to operate at the first higher-than-median value for which the following condition holds (indices refer to the ordered time series). Thus, the maximal geothermal capacity is assessed according to Eq. 2 and the peak source size is set according to Eq. 3. The peak source covers only the part of demand exceeding the capacity of the geothermal well. The value of 0.00025 in the equation was set arbitrarily based on analyses of load curves:
$$ C_{\text{max}}^{\text{geothermal}} = \max D_{(i)} : D_{(i)} + 0.00025 < D_{(i+10)} \tag{2} $$
$$ C^{\text{peak}} = 1 - C_{\text{max}}^{\text{geothermal}} \tag{3} $$
This criterion selects points in the steepest part of the duration curve, representing values reached only in peaks, without similar values occurring in the neighbourhood. For values chosen in this procedure, capacity factors were calculated for the entire period of observation. The use of capacity factors for the assessment of partly loaded energy systems is known and accepted (Kies et al. 2016). Calculations followed Eq. 4, which is a modified version of a formula known from elsewhere (Mines et al. 2015):
$$ \text{CF} = \frac{\sum_{i=1}^{n} \min\left(P_{i};\, P_{\text{max}}^{\text{geothermal}}\right)}{n \cdot P_{\text{max}}^{\text{geothermal}}} \tag{4} $$
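Eqs. 2 to 4 can be illustrated with the sketch below, which rests on several assumptions. The demand series is sorted in ascending order. The geothermal capacity is taken as the first above-median value satisfying the condition of Eq. 2, following the wording of the text. If no value satisfies the condition, the highest demand is returned, which is a choice made only for this sketch. In the capacity factor, P_i is read as the normalised demand in time step i and P_max^geothermal as the capacity obtained from Eq. 2. All function names and example numbers are hypothetical.

```python
import statistics
from typing import Sequence


def geothermal_capacity(demand: Sequence[float],
                        step: float = 0.00025, lag: int = 10) -> float:
    """First above-median value of the ascending ordered series for which
    D(i) + step < D(i + lag), i.e. the condition of Eq. 2."""
    ordered = sorted(demand)
    median = statistics.median(ordered)
    for i in range(len(ordered) - lag):
        if ordered[i] > median and ordered[i] + step < ordered[i + lag]:
            return ordered[i]
    # Assumption of this sketch: if the condition never holds,
    # the geothermal source is sized for the highest demand.
    return ordered[-1]


def peak_capacity(c_geothermal: float) -> float:
    """Eq. 3: share of the nominal demand left to the peak source."""
    return 1.0 - c_geothermal


def capacity_factor(load: Sequence[float], p_max: float) -> float:
    """Eq. 4: energy delivered up to p_max divided by the energy that
    would be delivered at constant p_max over the same n time steps."""
    if not load or p_max <= 0:
        raise ValueError("load must be non-empty and p_max positive")
    return sum(min(p, p_max) for p in load) / (len(load) * p_max)


# Toy usage with made-up values and a shortened lag for readability.
toy = [0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.5, 0.9, 1.0]
c_geo = geothermal_capacity(toy, step=0.05, lag=2)
c_peak = peak_capacity(c_geo)
cf = capacity_factor([0.25, 0.5, 1.0, 0.75], p_max=0.5)
```

With the defaults (`step = 0.00025`, `lag = 10`) the functions take the full normalised hourly series of a station as input.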