
Journal of Energy in Southern Africa

On-line version ISSN 2413-3051
Print version ISSN 1021-447X

J. energy South. Afr. vol.25 n.4 Cape Town Nov. 2014


Application of multiple regression analysis to forecasting South Africa's electricity demand



Renee Koen; Jennifer Holloway

Decision Support and Systems Analysis Research Group, Built Environment Unit, CSIR, Pretoria, South Africa




In a developing country such as South Africa, understanding the expected future demand for electricity is very important in various planning contexts. It is specifically important to understand how expected scenarios regarding population or economic growth can be translated into corresponding future electricity usage patterns. This paper discusses a methodology for forecasting long-term electricity demand that was specifically developed for applying to such scenarios. The methodology uses a series of multiple regression models to quantify historical patterns of electricity usage per sector in relation to patterns observed in certain economic and demographic variables, and uses these relationships to derive expected future electricity usage patterns. The methodology has been used successfully to derive forecasts used for strategic planning within a private company as well as to provide forecasts to aid planning in the public sector. This paper discusses the development of the modelling methodology, provides details regarding the extensive data collection and validation processes followed during the model development, and reports on the relevant model fit statistics. The paper also shows that the forecasting methodology has to some extent been able to match the actual patterns, and therefore concludes that the methodology can be used to support planning by translating changes relating to economic and demographic growth, for a range of scenarios, into a corresponding electricity demand. The methodology therefore fills a particular gap within the South African long-term electricity forecasting domain.

Keywords: long-term forecasting, South African electricity demand



1. Introduction

In a developing country such as South Africa, understanding future patterns of electricity usage is very important in various planning contexts. The future national demand for electricity is an important consideration for electricity providers, who need to plan for a sufficient and secure supply of electricity (Imtiaz et al., 2006; Soontornrangson et al., 2003). Investment in electricity generation capacity, whether using fossil-based or renewable energy sources, is largely motivated by the anticipated long-term need for electricity (Ulutaş, 2005; Dorian, Franssen and Simbeck, 2006). South Africa has a growing population, which creates an increasing need for housing and services, as well as a need to expand economic activity both to accommodate new entrants into the labour market and to address current high unemployment levels. Therefore, virtually all types of national and local planning, public or private, require consideration of the implications for future electricity needs in order to establish whether there is sufficient electricity supply capacity in the country to support future plans.

This paper discusses a methodology for forecasting long-term electricity demand that was initially used for assisting with strategic planning for the South African branch of a multi-national company. The initial modelling objective expressed by the company was to be able to determine potential fluctuations in the future national demand for electricity, i.e. the amount of electricity required from the national grid, in order to assess the impact of that on their own business plans. This required a methodology that could determine the effect of possible changes in various political, demographic or economic patterns on the future national electricity consumption patterns. Therefore, forecasts using extrapolation of past trends would potentially not suffice, and a type of scenario-based forecasting was foreseen to be more appropriate.

A methodology was developed that satisfied these needs expressed by the company and that produced scenario forecasts that could be successfully incorporated into their strategic planning processes. During the development of this methodology, various electricity forecasting studies published locally and internationally were consulted, but it was found that a scenario-based methodology using multiple regression models to forecast electricity demand in various electricity usage sectors had not been applied before. Furthermore, the extensive collection of public domain data and the interrogation process applied in order to create a usable dataset out of the various information sources were not found in any other study. In addition, the South African electricity demand and supply patterns, and the driving forces behind them, are different from those of other countries. Therefore, this methodology could be viewed as unique, both locally and internationally.

The methodology described in this paper was applied successfully within the company it was developed for, and the same basic methodology has subsequently been used to support planning of electricity supply needs within the public sector.

This paper first discusses the objectives of the forecasting and then provides more details with regard to the data collection and validation, as well as the modelling methodologies used. This is followed by descriptions of the models used and the forecasts derived from the models. The paper concludes with a discussion of the chosen methodology as it compares to approaches used in other studies, as well as comments about the usefulness of the methodology.


2. Forecasting objectives

The main objective to be met by the forecasts, as expressed by the initial client, was to estimate the future demand for electricity from certain expected changes in the national economy and demography. This meant that a forecasting model (or set of models) had to be developed that would be able to translate aspects such as economic growth or decline into subsequent growth or decline in electricity usage. The focus was therefore placed on the development of a model(s) that could quantify historical patterns of relationships between electricity usage and the relevant economic and demographic variables, using data available in the public domain. Ultimately, the objective was to use such a model(s) to derive future electricity usage patterns from these quantified relationships once expected future values for the relevant economic or demographic variables had been estimated.

Since these objectives required the quantification of historical patterns as a basis for future forecasts, a statistical modelling and forecasting approach seemed appropriate. However, in a statistical modelling approach it would not only be important to consider the correlation between electricity usage and the variables used to predict electricity usage, but also to consider the correlation of the predictor variables with each other. Such correlation between predictor variables is called multi-collinearity or near-linear dependence (Montgomery, Peck and Vining, 2006). Including variables that are highly correlated with each other as predictors in the same model can be problematic when that model has to be used for forecasting. This is especially true for scenario forecasting, since there are no guarantees that the historical relationships between variables would be maintained when creating future scenario inputs. For instance, one may want to purposefully create a scenario in which variables do not follow the same patterns as in the past. If a model exhibits high multi-collinearity, violating such relationships in the created scenario inputs could then invalidate the model's outputs. Therefore, an important consideration of the methodology development was to ensure that the models would be developed in a way that they were statistically valid, i.e. that multi-collinearity between the predictor variables used to estimate electricity demand would be managed correctly.
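As an illustration of how such multi-collinearity might be quantified, the following minimal sketch computes the largest condition index of a set of predictors. It assumes one common convention (centre each predictor, scale it to unit length, and take the ratio of the largest to the smallest singular value); the paper does not state the exact convention used. The function name and all data are hypothetical and synthetic, not taken from the study.

```python
import numpy as np

def largest_condition_index(predictors):
    """Ratio of largest to smallest singular value of the centred,
    unit-length-scaled predictor matrix (rows = years, columns = predictors)."""
    X = np.asarray(predictors, dtype=float)
    centred = X - X.mean(axis=0)
    # Unit-length scaling removes the effect of measurement units.
    scaled = centred / np.linalg.norm(centred, axis=0)
    singular_values = np.linalg.svd(scaled, compute_uv=False)
    return float(singular_values.max() / singular_values.min())

# Synthetic series for illustration only (assumption, not study data).
rng = np.random.default_rng(0)
years = np.arange(30)
gdp = 100 + 3.0 * years + rng.normal(0, 1.0, 30)
population = 40 + 0.8 * years + rng.normal(0, 0.2, 30)  # shares the GDP trend
mining_output = rng.normal(50, 5, 30)                   # no shared trend

collinear = largest_condition_index(np.column_stack([gdp, population]))
less_collinear = largest_condition_index(np.column_stack([gdp, mining_output]))
print(f"GDP + population:    {collinear:.1f}")
print(f"GDP + mining output: {less_collinear:.1f}")
```

In this synthetic case the pair of predictors that share a trend yields a much larger index than the pair that does not, which is the kind of signal that would argue against placing both predictors in one model.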


3. Forecasting methodology overview

In developing a statistical model(s) to use as a basis for scenario forecasting, an attempt was first made to use predictor variables to forecast future annual demand for electricity at a national level. Data on various external factors, such as Gross Domestic Product (GDP), population, electrification of households, major industrial projects (using start-up and shut-down dates) and climate variables, as well as relevant derivations and transformations of these variables, were collected and analysed. When forecasting the total national demand directly, however, multi-collinearity was consistently measured in any model that contained both GDP and population, or transformations of these two variables. This made it difficult to develop an appropriate model that would support forecasts from scenarios that contained both of these variables.

Instead, the approach was adapted by breaking the total electricity consumption up into sectors of electricity usage, forecasting the consumption per sector and then combining these sector forecasts into a total annual forecasted demand for the country. Losses also had to be estimated and incorporated into the forecasted total. In this manner, each sector could have its own set of drivers (predictors) that were appropriate for the electricity consumption in that sector, and sectoral models could be derived that had acceptable levels of multi-collinearity. Consultation with different experts in the field of electricity consumption forecasting also confirmed that forecasts via different electricity usage sectors give better results than directly forecasting total consumption values. The main challenge was to find reliable historical values for sector consumption to use as a basis for these forecasts, together with historical values of potential predictor variables. The problems encountered during data collection and verification are discussed in more detail in section 4, but they can be summarised by saying that reliable data on electricity consumption per sector is very difficult to find.

Multiple regression modelling was chosen as the forecasting technique for each sector as this has been noted to be the most appropriate statistical technique for long-term forecasting (Makridakis, Wheelwright, and Hyndman, 1998). Previous studies have used multiple regression for long-term forecasting of electricity consumption in other countries, either for total consumption (Bianco, Manca and Nardini, 2009; Mohamed and Bodger, 2005; Egelioglu, Mohamad, and Guven, 2001) or for a specific sector (Al-Ghandoor et al., 2008). These models used different drivers for forecasting, as appropriate within the specific country, but employed the same regression technique as discussed in this paper. Details on the regression models are provided in section 5.
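A minimal sketch of a multiple regression fit of the kind referred to above is given here, using ordinary least squares with an intercept. The driver names, coefficients and data are synthetic assumptions made purely for illustration; they are not the models or data of this study.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.
    Returns (coefficients, R2, adjusted R2); coefficients[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalises the number of predictors p.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2

# Synthetic sector data (assumption): consumption driven by two predictors.
rng = np.random.default_rng(1)
sector_gva = np.linspace(100, 200, 25)
production_volume = rng.uniform(50, 80, 25)
consumption = 5.0 + 0.4 * sector_gva + 0.2 * production_volume + rng.normal(0, 1.0, 25)

beta, r2, adj_r2 = fit_ols(np.column_stack([sector_gva, production_volume]), consumption)
print("coefficients:", np.round(beta, 3))
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

The adjusted measure is never larger than the unadjusted one, so comparing the two gives a rough indication of whether an added predictor earns its place in the model.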


4. Electricity demand and 'driver' data

A very important component of the regression modelling involved the collection of appropriate data for the relevant variables required.

Data on national electricity consumption in South Africa from 1978 to 2010 was obtained from Statistics South Africa (Stats SA), from the series of monthly publications under the Statistical Release series P4141 - Generation and consumption of electricity, available from the website (StatsSAweb). The specific time series used is defined as the 'electricity available for distribution in South Africa'. Although this data is not broken down into electricity usage sectors, it was a valuable dataset to use to reconcile other datasets containing sector data. In order to obtain an adequate data series on electricity consumption per sector, it was necessary to obtain data from various sources as no single source could give data per sector from 1972 until present. When the forecasting methodology was initially developed, and subsequently updated, the following potential sources of sector data were identified:

  • Statistics South Africa (some sectors, 1964-1984) (StatsSAweb)
  • Rand Afrikaans University (RAU) (now University of Johannesburg) (1972-1991) (Cooper and Kotze, 1992)
  • Department of Minerals and Energy (DME), 1998 Digest (now Department of Energy) (1990-1997) (Cooper, 1998)
  • Department of Minerals and Energy (DME), 2000 Digest (now Department of Energy) (1992-2000) (DIGEST2002)
  • Department of Minerals and Energy (DME), 2006 Digest (now Department of Energy) (1992-2004) (DIGEST2006)
  • Department of Minerals and Energy (DME) Energy Balance Spreadsheets (now Department of Energy) (1992-2009) (ENERGYBALANCES)
  • National Energy Regulator of South Africa (NERSA), Electricity Supply Statistics of South Africa (1996-2006) (ESS)
  • South Africa Energy Statistics (Vol. 1 and Vol. 2) (1950-1993) (SAES1, SAES2)
  • Eskom Annual Reports (1997-2010) (Eskom)
  • Eskom Statistical Yearbooks (1957-1996) (Eskom Year Book)

The various sources had to be checked against each other and against the Stats SA national totals. Although the total consumption figures were roughly consistent between the various sources, there were large discrepancies for many of the data points (years) at the individual sector level. This is partly due to the mismatch between the number of sectors used in each source, but also to the inconsistency between sector definitions used in the different sources for some sectors. In order to use and compare data from the different sources, the sectors had to be aligned to one common set of sectors, and for this purpose, the sectors used by NERSA, and also reported by Eskom, were used as the standard. Although the Eskom Annual Reports provide a reflection of the electricity distribution by sectors, these sectors are broken down only for Eskom's direct customers and not for the electricity that is redistributed by the municipalities. Eskom has a large sector for redistributors that is listed in their reports, but this electricity consumption cannot be broken down by Eskom into the typical economic activity sectors. However, for the purposes of our sector forecasting and in the absence of suitable data from other sources, certain sector data was estimated using Eskom's electricity consumption per sector together with the historical estimates of Eskom's percentage share in the sector, as reported by NERSA.
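The exact calculation used for this estimate is not given here. One plausible reading, sketched below under that assumption, is that Eskom's direct sales in a sector are divided by Eskom's share of that sector to obtain a national sector estimate. The function name and all figures are hypothetical.

```python
def estimate_sector_total(eskom_sales_gwh, eskom_share):
    """Scale Eskom's direct sales in a sector up to a national sector estimate.
    eskom_share is Eskom's fraction of the sector (0 < share <= 1)."""
    if not 0.0 < eskom_share <= 1.0:
        raise ValueError("eskom_share must be a fraction in (0, 1]")
    return eskom_sales_gwh / eskom_share

# Hypothetical figures for illustration only (not values from any source).
eskom_mining_gwh = {2004: 30000.0, 2005: 31000.0}
eskom_mining_share = {2004: 0.96, 2005: 0.97}

national_mining_gwh = {
    year: estimate_sector_total(eskom_mining_gwh[year], eskom_mining_share[year])
    for year in eskom_mining_gwh
}
for year, value in national_mining_gwh.items():
    print(year, round(value, 1))
```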

The data discrepancies between the different data sources were considered and potential reasons were sought for these discrepancies. Where the reasons for discrepancies could not be ascertained, representatives from the data sources were contacted in order to obtain clarity on definitions and to gain understanding as to reasons for differences between sources. Examples of data issues that arose from the discussions include:

  • The NER (now NERSA) data was collected from Eskom and municipalities, but Eskom has a different financial year to that of the municipalities, with the result that the data from the two sources are not aggregated over the same periods for the year being reported.
  • Municipalities did not use the NER categories on their own systems, and therefore every year they had to match their data to the categories provided by NER before submitting the data, often leading to the same municipal client being classified differently in different years. This was particularly true of the commerce and manufacturing sectors.
  • The Platinum mining sector was initially not included under mining in all sources, but was classified by some as industrial based on the platinum processing plants found on mining sites.
  • Data from DME (now Department of Energy) was not always consistent across years when certain users were first classified into a 'non-specified other' category and later allocated to industry, commerce and residential. If a data version was published before this re-allocation, the sector data would be incorrectly reflected in the published version.
  • In one or two sources, changes were made to definitions without adjusting the data 'backwards' to match the definition change.

After an extended period of data checks and consultation, the most reliable data series to use for each sector was selected, mostly using a combination of sources. The recommended sector data series, once confirmed, were added together, and distribution and transmission losses were added. When checked against the Statistics SA national consumption figures, the resulting total was found to be within 1% of the total national electricity consumption for the majority of the years, with only a few years differing by an amount close to 2.5% of the total. Note that the NERSA and Eskom data had to be adjusted for those years where the financial year did not coincide with the calendar year, with the former being the time period used for reporting and the latter being the time period used for our analysis. Adjustments were also made to the NERSA data to align it better with data published in Eskom's annual reports.
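The two checks described above can be sketched as follows. The proration rule is an assumption (electricity usage spread evenly over the months of a financial year), since the adjustment method is not specified here, and the function names and numbers are hypothetical.

```python
def to_calendar_year(fy_totals, fy_end_month):
    """Prorate financial-year totals onto calendar years.
    fy_totals maps the year in which a financial year ends to its total.
    Assumes usage is spread evenly over the twelve months (an assumption)."""
    share = fy_end_month / 12.0  # part of calendar year covered by FY ending in it
    calendar = {}
    for year, total in fy_totals.items():
        if fy_end_month == 12:
            calendar[year] = total  # financial year already equals calendar year
        elif year + 1 in fy_totals:
            calendar[year] = share * total + (1.0 - share) * fy_totals[year + 1]
    return calendar

def percent_gap(sector_totals, losses, national_total):
    """Percentage difference between reconstructed and published national totals."""
    reconstructed = sum(sector_totals.values()) + losses
    return 100.0 * (reconstructed - national_total) / national_total

# Hypothetical numbers for illustration only.
calendar = to_calendar_year({2005: 1200.0, 2006: 1320.0}, fy_end_month=3)
gap = percent_gap({"mining": 600.0, "residential": 300.0}, losses=90.0,
                  national_total=1000.0)
print(calendar)
print(f"gap = {gap:.1f}%")
```

A year is only reported when both overlapping financial years are available, so the last financial year in a series does not produce a calendar-year estimate on its own.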

The graphs in Figures 1-5 provide an illustration of the various data sources consulted, and the relative differences between them. The thick black line indicates the data pattern that was considered to be a reliable estimate for the sector and these patterns were therefore used as a basis for the forecasting.

Although most sources provide 'Commerce' and 'Manufacturing' (also referred to as 'Industrial') sectors, definitions differed widely between them, and even between different years of the same source. Consultation with representatives of the data sources also confirmed that differentiating between commerce and manufacturing within municipal customers was problematic and could change from year to year. Furthermore, most sources contain a 'general' category, and the definition of this category was also found to be inconsistent between sources. However, Figure 3 shows that when data on 'commerce', 'manufacturing' and 'general' sectors were combined for each of the various sources, the differences between the sources were reduced.

It can therefore be seen that the collection and selection of appropriate data for electricity consumption per sector from public domain sources was not a trivial task. However, it was considered necessary to develop a consistent and reliable set of historical data on which to apply the chosen methodology.

Data on predictor variables were also collected, but there were fewer sources for these, and sources generally had consistent patterns. Therefore, this data collection process is not discussed in as much detail as the electricity sector consumption data. Predictor data could be sourced from Statistics South Africa and the Reserve Bank of South Africa, through the electronic data download facilities on their websites (StatsSA2, SARB), and also from the Chamber of Mines (CoMines). Data on rail freight ton-kms were previously obtained from Spoornet, but have been difficult to obtain since 2003.


5. Model development

An important step in the model development was to select a range of appropriate economic and demographic variables that could potentially affect electricity usage in the sectors, or could be proxies for the patterns observed in the sectors' electricity usage. The next step was then to collect the historical data for these variables, as discussed in the previous section. Potential variables included population figures, GDP or Gross Value Added (GVA) values per sector, mining production volumes, and so on. By investigating the strength of the statistical relationship between each of the potential predictor variables and the electricity usage per sector, a smaller subset of these variables could be identified for inclusion in the final set of sector models.
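One simple way to screen candidate predictors, sketched here as an assumption about how the strength of a relationship might be measured, is to rank them by the absolute Pearson correlation with the sector's electricity usage. The variable names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_predictors(candidates, usage):
    """Sort candidate predictors by absolute correlation with usage, strongest first."""
    scores = {name: pearson(values, usage) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical annual values for illustration only.
usage = [10.0, 12.0, 13.0, 15.0, 18.0]
candidates = {
    "sector_gva": [100.0, 110.0, 118.0, 130.0, 150.0],
    "rainfall": [500.0, 420.0, 610.0, 480.0, 530.0],
}
ranking = rank_predictors(candidates, usage)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

A ranking of this kind only narrows the field. Predictors that survive it would still need to be checked against each other for multi-collinearity before being combined in one sector model.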

The methodology followed to derive the final forecasts of total electricity consumption consequently involved an aggregation of several regression forecasts, with each sector having its own regression model. Scenarios were used to quantify the future values of various predictor variables that were identified during the regression modelling phase. Each scenario produced its own set of sector forecasts that could be added together and adjusted for estimated losses, on both distribution and transmission, to create forecasts for the total annual demand at a national level. The advantage of being able to visualise the electricity forecasts for each economic sector in a scenario, in addition to the total electricity forecast, is that one can assess the relevance and compatibility of the models and their outputs to the scenario descriptions.
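The aggregation step can be sketched as below. The sector names, coefficients, scenario values and the treatment of losses as a fixed fraction of the summed sector consumption are all assumptions made for illustration; they are not the fitted models or loss estimates of this study.

```python
def sector_forecast(coefficients, drivers):
    """Evaluate a fitted linear sector model: intercept + sum(coefficient * driver)."""
    return coefficients["intercept"] + sum(
        coef * drivers[name]
        for name, coef in coefficients.items()
        if name != "intercept"
    )

def national_forecast(sector_models, scenario, loss_fraction):
    """Sum the sector forecasts for one scenario year and add losses.
    Losses are assumed to be a fixed fraction of summed sector consumption."""
    by_sector = {
        sector: sector_forecast(model, scenario)
        for sector, model in sector_models.items()
    }
    total = sum(by_sector.values()) * (1.0 + loss_fraction)
    return by_sector, total

# Hypothetical coefficients and scenario inputs for a single forecast year.
sector_models = {
    "mining": {"intercept": 5.0, "mining_volume_index": 0.25},
    "residential": {"intercept": 2.0, "population_millions": 0.5, "gdp_index": 0.1},
}
scenarios = {
    "low_growth": {"mining_volume_index": 100.0, "population_millions": 50.0,
                   "gdp_index": 110.0},
    "high_growth": {"mining_volume_index": 120.0, "population_millions": 52.0,
                    "gdp_index": 140.0},
}
totals = {}
for name, scenario in scenarios.items():
    by_sector, totals[name] = national_forecast(sector_models, scenario, 0.10)
    print(name, by_sector, round(totals[name], 2))
```

Keeping the per-sector values alongside the total makes it possible to check each sector's forecast against the scenario description before the totals are used.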

In order to determine the statistical validity of the various regression models used for each electricity usage sector, the following factors were considered:

  • The model had to be a statistically acceptable quantification of the relationships in the historical data, which meant that the included predictor variables had to be as few as possible but had to provide a good overall description of the electricity usage values over the period of historical data available. The goodness of fit was measured with the R² (coefficient of determination) and adjusted R² measures: the higher the R², the better the fit. (Note that relationships were assumed to be linear, and if a non-linear relationship was found, a relevant transformation, such as a logarithmic transformation, was applied to linearise the relationship.)
  • Residual patterns for the various model options were also considered. Residuals are defined as the difference between the values predicted from the model for a particular year and the electricity usage actually measured in that year. Very large residuals or residuals that seem to show a pattern that is not random could be an indication that the model does not fit well or that an important predictor variable was not included in the model.
  • Models had to be selected in which the predictor variables showed low levels of multi-collinearity. This is measured with the condition index value - the lower the condition index, the