Supervised Machine Learning for Predicting SMME Sales: An Evaluation of Three Algorithms

The emergence of machine learning algorithms presents the opportunity for a variety of stakeholders to perform advanced predictive analytics and to make informed decisions. However, to date there have been few studies in developing countries that evaluate the performance of such algorithms—with the result that pertinent stakeholders lack an informed basis for selecting appropriate techniques for modelling tasks. This study aims to address this gap by evaluating the performance of three machine learning techniques: ordinary least squares (OLS), least absolute shrinkage and selection operator (LASSO), and artificial neural networks (ANNs). These techniques are evaluated in respect of their ability to perform predictive modelling of the sales performance of small, medium and micro enterprises (SMMEs) engaged in manufacturing. The evaluation finds that the ANNs algorithm's performance is far superior to that of the other two techniques, OLS and LASSO, in predicting the SMMEs' sales performance.


Introduction
Today's organisations, both small and large, handle increasingly large amounts of data, and the amounts are expected to continue to grow exponentially (Cheriyan et al., 2018; Ndikum, 2020). Ndikum (2020) notes that human beings generate and store in excess of 2.5 quintillion bytes of data daily. Inevitably, the availability of such huge amounts of data has provided an impetus for organisations to harness efficient and flexible methods to conduct predictive analytics and inform data-driven future plans (Bajari et al., 2015; Leo et al., 2019; Obaid et al., 2018).
Machine learning techniques are attracting the interest of numerous stakeholders, including private-sector entities seeking the means to intelligently exploit their data to aid decision-making and enhance their competitive advantage in the market (Dod & Sharma, 2010; Krishna et al., 2017; Tsoumakas, 2019). Kolkman and Van Witteloostuijn (2019) explain that machine learning enables businesses to perform advanced predictive modelling to an extent not possible with traditional statistical techniques (Leo et al., 2019; Van Liebergen, 2017). Machine learning has been widely embraced for a variety of purposes, including financial modelling, health and safety analysis, medical diagnosis, and fraud detection (Crane-Droesch, 2017; Enkono & Suresh, 2020; Gholizadeh et al., 2018; Mohammed et al., 2016). Machine learning techniques have also been embraced for predicting market demand and consumer behaviour (Bajari et al., 2015; Sekban, 2019; Tsoumakas, 2019; Venishetty, 2019). The power of machine learning has attracted significant interest from numerous players, including business owners, data scientists, and econometricians (Bajari et al., 2015; Sekban, 2019; Venishetty, 2019).
Sales prediction is one of the most important elements of business operations, including for small firms seeking to sustainably increase sales in order to enhance their chances of survival (Sekban, 2019; Venishetty, 2019). The rise of advanced data analytics techniques provides SMMEs with opportunities to conduct sales performance predictive modelling (Krishna et al., 2017; Tsoumakas, 2019). However, despite their significant contribution to predictive analytics, machine learning techniques have not yet been fully exploited in small-enterprise research and practice. The existing literature provides very few studies on SMMEs' use of machine learning in developed countries or in developing countries such as South Africa (Bauer, 2020; Haataja, 2016; Kolkman & Van Witteloostuijn, 2019; Te, 2018).
In respect of machine learning algorithms, Ryll and Seidens (2019) note that the extant literature lacks an evaluation of the various algorithms' effectiveness. The result is that stakeholders are likely to arbitrarily select an algorithm, without any scientific basis for their choice. Identification of the best-performing predictive techniques for particular settings and purposes would provide stakeholders with bases for deciding which to use.

To address this gap in the South African context, our study evaluated the performance of three supervised machine learning algorithms that can be used to conduct sales predictive modelling: OLS, LASSO, and ANNs. The algorithms' ability to predict SMME sales performance was evaluated using a panel dataset of manufacturing SMMEs in South Africa's KwaZulu-Natal (KZN) Province.

Machine learning
According to Ryll and Seidens (2019), the concept of machine learning, despite its growing popularity, remains ill-defined in the extant literature. The authors define it as a process through which a system interacts with its environment in such a way that the system's structure changes and, owing to structural alterations, the interaction process changes as well. Shalev-Shwartz and Ben-David (2014) assert that machine learning is the detection of meaningful data patterns by algorithms in an automated way, essentially indicating that machine learning techniques endow programs with the ability to "learn" and adjust accordingly. This conception aligns with that of Goodfellow et al. (2016), who define machine learning as the ability of artificial intelligence (AI) systems to acquire knowledge by gleaning patterns from raw datasets. Lantz (2019) conceives machine learning as being concerned with techniques that process and transform data into actionable intelligence. Mohammed et al. (2016) describe machine learning as the enablement of machines to learn without explicit programming.
A key advantage of various machine learning techniques like ANNs is that they are non-parametric, i.e., they do not require features in the dataset to be normally distributed, as some classical statistical modelling approaches do (Kolkman & Van Witteloostuijn, 2019; Van Liebergen, 2017). This flexibility allows algorithms to learn, adapt, and in the process uncover subtle insights in data (Leo et al., 2019).
Research has shown that organisations which adopt machine learning algorithms for predictive modelling will benefit in many ways, including more effective strategic planning, resource optimisation, risk management, and inevitably enhanced competitive advantage (Cheriyan et al., 2018; Kolkman & Van Witteloostuijn, 2019; Leo et al., 2019). Krishna et al. (2017) have found that algorithms can be used to accelerate business performance and achieve long-term goals. One of the main areas in which machine learning techniques have been used is sales performance predictive modelling (Sekban, 2019; Tsoumakas, 2019; Venishetty, 2019). This is because sales directly impact enterprise survival and long-term growth (Bauer, 2020; Sekban, 2019).

Supervised machine learning
Machine learning techniques can be of either a supervised or unsupervised nature. Unsupervised techniques are used when dealing with unlabelled datasets (Mohammed et al., 2016; Venishetty, 2019). In unsupervised learning, the interest is more in the structure of the dataset as it is analysed, without specifying a response variable to predict (Aziz & Dowling, 2019; Mohammed et al., 2016; Van Liebergen, 2017).
Supervised techniques are used when features in the dataset are labelled and the target variable is known and specified (Ryll & Seidens, 2019; Venishetty, 2019). In this study, the techniques used fall under the supervised paradigm.
Under the supervised machine learning paradigm, tasks are grouped into either classification or regression (Venishetty, 2019). Classification can be used, for instance, to predict (in this case) whether an SMME will grow (1) or not grow (0) in the next year, and this type of task is commonly termed a binary classification. On the other hand, regression tasks involve the prediction of a continuous variable, like (in this case) the prediction of an SMME's sales.
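The modelling in this study was done in R; purely as an illustrative sketch of the classification/regression distinction, the two task types can be contrasted on synthetic data (the Python/scikit-learn choice and all variable names here are ours, not the authors'):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))  # synthetic SMME features

# Binary classification target: 1 = grow, 0 = not grow
grew = (X[:, 0] + rng.normal(size=100) > 0).astype(int)
# Regression target: a continuous sales figure
sales = 50_000 + 10_000 * X[:, 0] + rng.normal(scale=1_000, size=100)

# Classification predicts the 0/1 label...
clf_pred = LogisticRegression().fit(X, grew).predict(X[:5])
# ...while regression predicts the continuous value
reg_pred = LinearRegression().fit(X, sales).predict(X[:5])
```

The same feature matrix feeds both tasks; only the nature of the target variable changes.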
To ensure enhanced model performance, the common practice is to conduct data partitioning, i.e., dividing the data into two separate parts, commonly known as the training and test datasets (Bauer, 2020). Training data, which is labelled and thus "seen", is used for model-building, and the test data, which is unlabelled and thus "unseen", is used for model validation or testing (Mohammed et al., 2016; Te, 2018). This partitioning allows algorithms fitting well on training data to be checked to make sure they are not "overfitting" when applied to the test data (Mohammed et al., 2016). (Some algorithms might fare well on the training (seen) data but poorly on the test (unseen) data, and this is known as overfitting.) The training dataset is made up of input vector X and output vector Y, both of which have labelled features. In the training phase, algorithms learn to approximate a function f that produces the prediction Ŷ, also denoted f̂(X). Thus, through using different algorithms, as per Equation (1) below, a mapping function from X to Y is learned:

Y = f(X) + ε (1)
Based on Equation (1), ε is the error term, which is independent of the explanatory variables; however well the mapping function performs, this error cannot be reduced.
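The partition-then-validate logic described above can be sketched as follows. This is an illustrative Python/scikit-learn example on synthetic data (the study itself used R), where a large gap between training and test error would signal overfitting:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                          # input vector X
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)   # Y = f(X) + error

# Partition into "seen" training data and "unseen" test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # learn the mapping from X to Y

# Compare error on seen vs unseen data: a large gap would indicate overfitting
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
```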

Supervised machine learning tools
Choosing an appropriate algorithm for any given task is not a trifling decision but an important one, because the results from the selected technique will influence and guide decision-making. As argued by Venishetty (2019), there is no "one-size-fits-all" machine learning technique for every problem and thus there is a need to evaluate and identify an appropriate algorithm for a given task. Various machine learning techniques have been used to solve regression problems such as sales modelling. OLS, LASSO, and ANNs are among the most extensively used algorithms for such learning tasks (Casella et al., 2017; Lantz, 2019; Melkumova & Shatskikh, 2017; Shalev-Shwartz & Ben-David, 2014).

Ordinary least squares (OLS)
The OLS technique, which is also generally referred to as the linear regression technique, is valued mainly for its ability to learn efficiently. It has been found to provide linear predictors that are not only intuitive and easily interpretable, but also perform reasonably well in fitting data in different natural learning problems (Casella et al., 2017; Shalev-Shwartz & Ben-David, 2014). This form of predictive technique is normally used in traditional statistical modelling when ascertaining causal relationships between response variables and predictor variables (Aziz & Dowling, 2019). In essence, this technique attempts to choose the slope and the intercept that minimise the sum of the squared errors, or, as described by Lantz (2019), to minimise the distance between the predicted and the actual target variable.
Expressed in mathematical terms, the goal of OLS regression modelling is to minimise the error e, also known as the sum of squared residuals, which is the difference between the predicted value ŷ and the actual value y, as per Equation (2):

e = Σᵢ (yᵢ − ŷᵢ)² (2)

As can be noted in Equation (2), in order to eliminate negative values, the error values are squared and summed across all data points.
Key shortcomings of OLS are its assumption of linearity between the response and predictor variables and its inability to deal with collinearity (Kolkman & Van Witteloostuijn, 2019; Van Liebergen, 2017). Nonetheless, OLS is one of the most popular techniques in academic research. Kolkman and Van Witteloostuijn (2019) describe OLS as the empirical "workhorse" in academia. The algorithm was included in this study as the traditional benchmark, so as to enable cross-method comparisons with the two other algorithms evaluated.
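To make Equation (2) concrete, the sketch below fits a slope and intercept by minimising the sum of squared residuals on synthetic data (an illustrative Python/NumPy example of the least-squares idea, not the study's R code):

```python
import numpy as np

# Synthetic data: y depends linearly on x, plus noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

# OLS chooses the slope and intercept minimising the sum of squared errors
A = np.column_stack([x, np.ones_like(x)])          # design matrix [x, 1]
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Equation (2): residuals are squared and summed across all data points
residuals = y - (slope * x + intercept)
sse = np.sum(residuals ** 2)
```

With the true relationship y = 2x + 1, the estimated slope and intercept land close to 2 and 1.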

Least absolute shrinkage and selection operator (LASSO)
The LASSO method is mainly used to achieve simultaneous parameter estimation and model selection in regression analysis (Muthukrishnan & Rohini, 2016). The algorithm assigns zero weights to covariates with low explanatory power and allows one to work with an interpretable, parsimonious model (Aziz & Dowling, 2019; Leo et al., 2019; Melkumova & Shatskikh, 2017). Casella et al. (2017) find that the LASSO technique performs better than OLS, and a related technique called ridge regression, in predictive analytics. The LASSO technique shares similarities with OLS, save that, unlike the latter, LASSO employs a penalty function. In essence, LASSO is a simple OLS technique with feature selection and regularisation embedded in it. Following Muthukrishnan and Rohini (2016), we defined our LASSO estimates as per Equation (3) below:

β̂(LASSO) = argmin over β of { Σᵢ (yᵢ − β₀ − Σⱼ βⱼxᵢⱼ)² + λ Σⱼ |βⱼ| } (3)

Based on Equation (3), λ ≥ 0 is a tuning parameter, and when λ = 0 the penalty has no effect and LASSO will produce estimates similar to those of least squares. However, as λ → ∞, the penalty forces some of the coefficient estimates to zero, thereby performing variable selection. LASSO effectively deals with the problem of collinearity among predictors by selecting only one and shrinking the other variables to zero, thereby producing stable and accurate predictions (Casella et al., 2017; Muthukrishnan & Rohini, 2016).
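The shrinkage behaviour around Equation (3) (weak covariates zeroed, one of a collinear pair retained) can be seen in a small sketch; Python's scikit-learn is our illustrative choice here, with its alpha parameter playing the role of λ, and is not the glmnet code the study used:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # almost perfectly collinear with x1
x3 = rng.normal(size=n)                    # no real explanatory power
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)

# alpha corresponds to the tuning parameter lambda in Equation (3):
# a positive penalty shrinks weak coefficients exactly to zero
model = Lasso(alpha=0.5).fit(X, y)
coef = model.coef_  # the uninformative x3 is zeroed out entirely
```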

Artificial neural networks (ANNs)
ANN algorithms are inspired by the structure and internal functioning of the human brain and nervous system (Shalev-Shwartz & Ben-David, 2014). The technique aims to solve problems by mimicking the human brain, through learning from past experiences and then making use of those learnings as a basis for making future decisions. This technique differs from traditional statistical techniques in that it is non-parametric, i.e., it makes no presumptions about the data distribution (Youn & Gu, 2010). ANN algorithms have become popular for implementing machine learning (Krishna et al., 2017) owing to their ability to yield an effective learning paradigm that produces excellent performance on various learning tasks (Shalev-Shwartz & Ben-David, 2014). A neural network is a network of connected nodes, and for each node the weighted inputs are summed before being transformed by an activation function f. Equation (4) presents an ANN node mathematically:

y = f( Σᵢ wᵢxᵢ + b ) (4)

where the wᵢ are the connection weights, the xᵢ the inputs, and b the bias term. The motivations for the adoption of this technique include its flexibility, in increasingly complex data structures, in addressing outliers, missing data, multicollinearity, and nonlinearities (Gepp & Kumar, 2012; Merkel et al., 2018). The advantage of ANN algorithms lies in their versatility, as they can be applied to virtually any learning task, be it regression, classification, or even unsupervised learning tasks (Leo et al., 2019; Youn & Gu, 2010). The class of ANN we used in this study is the multilayer perceptron (MLP), which is also referred to as a multilayer feedforward network (Lantz, 2019).
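Equation (4)'s node computation, and the multilayer perceptron built from such nodes, can be sketched in a few lines. This is an illustrative NumPy implementation with a logistic sigmoid activation (our choice of activation; the study itself used R's neuralnet):

```python
import numpy as np

def neuron(x, w, b):
    """One ANN node, as in Equation (4): weighted inputs are summed
    and then passed through an activation function (here, the sigmoid)."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

def mlp(x, W_hidden, b_hidden, w_out, b_out):
    """A tiny multilayer perceptron: one hidden layer of neurons
    feeding a linear output node."""
    hidden = np.array([neuron(x, W_hidden[i], b_hidden[i])
                       for i in range(len(b_hidden))])
    return np.dot(w_out, hidden) + b_out
```

With zero weights and bias, a sigmoid neuron outputs 0.5, its midpoint; training consists of adjusting the weights and biases so the network's outputs approach the targets.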

Existing comparative findings on the three tools
Findings reported in the existing literature show that, generally, in terms of predictive performance across different fields, ANN algorithms perform better than OLS. Nghiep and Al (2001) find that, compared to the OLS technique, ANNs performed better in predicting residential property value. This finding is in line with the Farahani et al. (2016) study, which evaluates the performance of ANNs and OLS techniques in predicting car sales and finds ANNs superior. Ahangar et al. (2010) establish the superior performance of ANNs compared to OLS in predicting the stock price of listed companies. Croda et al. (2019) establish that ANNs have a very high predictive accuracy compared to traditional statistical techniques in sales forecasting, even when presented with a small dataset. Accordingly, alternative methods aiming to improve on OLS, such as the LASSO technique as per Equation (3), have been established (Casella et al., 2017; Tibshirani, 2011). Ratnasena et al. (2021) find that, compared to ANNs, the LASSO technique more accurately predicts the condition of tapes in sampled US cultural heritage institutions. Das et al. (2018) find that LASSO performs better than both ANNs and (as expected) OLS in predicting rice yields in India. Castelli et al. (2020) find that LASSO is more accurate than ANNs in predicting online property trends in Bulgaria. Utilising European Environmental Agency air pollution data, Chen et al. (2019) find that both LASSO and OLS have superior predictive performance compared to ANNs in predicting the annual average concentrations of fine particulate matter and nitrogen dioxide across Europe. Strandberg and Låås (2019), using data on Swedish companies, find that ANNs perform significantly better than LASSO in predicting sales performance.
Droomer and Bekker (2020), utilising a large database of US online grocery stores, find that ANNs outperform other modern and complex algorithms like XGBoost in predicting customers' purchasing behaviour. Croda et al. (2019), using a small Mexican chemicals wholesaler dataset, establish that ANNs produce highly accurate sales predictions. Wang et al. (2019) demonstrate the high accuracy of ANNs in predicting the annual sales of Taiwanese manufacturing enterprises. Penpece and Elma (2014) show that ANNs produce sales predictions that are close to the actual data of Turkish retail stores.

Study design and methodology
The study used R version 3.6.3, open-source software developed by the R Development Core Team (2019).

Dataset preparation
The three-year longitudinal dataset, containing information on 191 manufacturing SMMEs in KwaZulu-Natal Province for the years 2015 to 2017, was accessed from McFah Consultancy, a Durban-based company focusing on business and tax advisory services for SMMEs. The majority of the SMMEs (61%) in the dataset were from eThekwini Metropolitan Municipality (greater Durban), followed by King Cetshwayo District (11%), uThukela District (10%), uMgungundlovu District (7%), iLembe District (3%), Amajuba District (3%), Ugu District (2%), Zululand District (2%), uMzinyathi District (1%) and uMkhanyakude District (1%). There were no SMMEs from Harry Gwala District. The data had the following features: sales, owner's gender, enterprise location, owner's year of birth, total assets value, permanent employees, temporary employees, digital marketing medium use, website use, enterprise registration type, and registration year. Three macroeconomic variables were also included in the dataset: gross domestic product (GDP) and unemployment statistics from Statistics South Africa (2018), and the purchasing managers' index (PMI) from the Bureau for Economic Research (n.d.).

Target variable
Since the interest, for this study, was in evaluating the predictive potency of OLS, LASSO, and ANNs with respect to enterprise performance, it was important to define the target variable based on the dataset. In line with previous studies, enterprise performance was proxied by sales (Buyinza, 2011; Panda, 2015; Phillipson et al., 2019), which we coded as LogSales.

Independent variables
The independent variables were coded as follows:
• LogTA: total assets.
• Pemp: total number of permanent workers.
• Temp: number of temporary workers.
• Prod: labour productivity, measured as sales per employee.
• Gen: the SMME owner's gender, proxied by 1 for male and 0 for female.
• EntAge: the SMME owner's age, measured as the difference between the panel dataset period (2015 to 2017) and the year of birth.
• Web: website use, proxied by 1 for enterprises with a website and 0 for those without.
• CoAge: the company's age, measured as the difference between the panel data period and the year of registration.
• Reg: the SMME's registration type, i.e., the legal structure of the participating enterprises, defined by 1 for limited liability registered enterprises and 0 for other.
• DigMkt: digital marketing, with the dummy variable 1 used for those using one or more of three digital marketing platforms (Facebook, Twitter, and Instagram) and 0 for those not using any of these platforms.
• Loc: SMME location, proxied by 1 for those based in eThekwini Metropolitan Municipality and 0 for those located in district municipalities.
Three additional polynomial features were constructed to assess the nonlinear effects of these variables on enterprise performance. These were the owner's age squared (EntAge2), the SMME's age squared (CoAge2), and temporary workers squared (Temp2).
Finally, three external variables were used: the national annual economic growth rate, coded as GDP; the national unemployment rate, coded as Unemp; and the purchasing managers' index, coded as PMI, calculated as the average annual PMI rate for each of the three years between 2015 and 2017. Exploratory analysis showed that the dataset was not stationary; to address this, we followed Curran-Everett (2018) by log-transforming all continuous variables (i.e., sales, total assets, permanent workers, temporary workers, productivity, owners' ages, and SMMEs' ages). Consequently, the transformation stabilised the variance of all continuous variables.
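The variance-stabilising effect of such a log transformation can be illustrated on synthetic sales-like data (an illustrative Python/NumPy example, not the study's data or R code):

```python
import numpy as np

rng = np.random.default_rng(3)
# Sales-like data whose spread grows with its level (multiplicative noise)
base = rng.uniform(1e4, 1e6, size=500)
sales = base * rng.lognormal(mean=0.0, sigma=0.4, size=500)

log_sales = np.log(sales)  # analogous to the LogSales coding used in the study

# The log transform compresses the heavy right tail:
# the relative spread (std/mean) shrinks substantially
raw_cv = sales.std() / sales.mean()
log_cv = log_sales.std() / log_sales.mean()
```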

Hypothesis-testing
Model-building was done after conducting hypothesis tests to establish which variables have an impact on sales performance. Hypothesis testing is an important step in model-building, as it enables the identification of key factors which impact the target variable (Punam et al., 2018). The benefit of this step is that the data features selected for training the algorithm are those that best explain sales performance, while irrelevant features, which tend to adversely impact model accuracy due to data redundancy, are removed. Furthermore, a model built using important variables tends to minimise the challenge of overfitting, the model training time is significantly reduced, and, overall, the model performs better when applied to real-world problems (Venishetty, 2019). An in-depth literature review was thus conducted, and Table 1 provides the 13 hypotheses that were derived for empirical investigation, to identify features with a significant effect on SMMEs' sales performance that were then used for model-building.
Table 1: Hypotheses and supporting literature
H1: Entrepreneur's gender has a significant effect on enterprise performance (Amran, 2011; Bardasi et al., 2011; Essel et al., 2019)
H2: Entrepreneur's age has a significant nonlinear effect on enterprise performance (Amran, 2011; De Kok et al., 2010; Kaunda, 2013)
H3: Labour productivity has a significant effect on enterprise performance (Bellone et al., 2008; Bigsten & Gebreeyesus, 2007; Esteve-Pérez & Mañez-Castillejo, 2006)
H4: Permanent employees have a significant effect on enterprise performance (Clinebell & Clinebell, 2007; Pauka, 2015; Thorsteinson, 2003)
H5: Temporary employees have a significant nonlinear effect on enterprise performance (Chadwick & Flinchbaugh, 2016; Pauka, 2015; Roca-Puig et al., 2012)
H6: Total assets have a significant effect on enterprise performance (Al-Ani, 2013; Gupta et al., 2013; Maggina & Tsaklanganos, 2012)
H7: SMME's age has a significant nonlinear effect on enterprise performance (Coad et al., 2018; Loderer & Waelchli, 2010; Rijkers et al., 2010)
H8: Limited liability registration type has a significant effect on enterprise performance (Adegbite et al., 2007; Muriithi, 2017; Small Business Project, 2014)
H9: Usage of digital media platforms significantly impacts enterprise performance (Camilleri, 2018; Jobs & Gilfoil, 2014; Parsons, 2013)
H10: Website usage has a significant effect on enterprise performance (Jobs & Gilfoil, 2014; Meroño-Cerdan & Soto-Acosta, 2005; Parsons, 2013)
H11: National unemployment rate has a significant effect on enterprise performance (Halicioglu & Yolac, 2015; Huggins et al., 2017)
H12: GDP growth rate has a significant effect on enterprise performance (Egbunike & Okerekeoti, 2018; Klapper & Richmond, 2011; Motoki & Gutierrez, 2015)
H13: PMI has a significant effect on enterprise performance (Harris, 1991; Koenig, 2002)
Note: Based on the consulted literature, three of the variables (entrepreneur's age, temporary workers, and SMME's age) were each expected to have a nonlinear effect on performance.
This suggests that over time, unlike the other 10 variables, each of these three was expected to have a turning point, i.e., a point when the variable switches from having a negative effect to having a positive effect (or vice versa) on firm performance.
To empirically test the above hypotheses, the random-effects within-between (REWB) panel data modelling technique (Bell et al., 2019) was used. The distinct advantage of REWB over other techniques, such as fixed effects or random effects, is that it simultaneously captures both the micro and macro associations between the independent variables and the target variable (Bell & Jones, 2015; Bell et al., 2019).
The hypotheses-testing step was important as it enabled us to identify drivers with a significant impact on the target variable, and to drop those without any material effect (Punam et al., 2018; Cheriyan et al., 2018). Eventually, a total of 11 variables (including three polynomial features) were found to have a significant impact on enterprise sales performance:
• Prod, Pemp, Temp, Temp2, LogTA, CoAge, and Unemp at the 1% significance level;
• CoAge2 and DigMkt at the 5% significance level; and
• EntAge and EntAge2 at the 10% significance level.
These identified variables were then used in building the machine learning models for OLS, LASSO, and ANNs, which were then evaluated to establish which one has the superior sales predictive accuracy.

Data partitioning, sales performance modelling, evaluation
The next step was dataset partitioning, which, as discussed above, is one of the critical elements in machine learning. For this study, following a related study (Delen et al., 2013), a 70:30 split ratio was used to generate the training and test datasets. Using these two datasets, three sales performance predictive models (one each for OLS, LASSO, and ANNs) were built and evaluated. Figure 1 provides graphical representations of the predictive performance of each of the three algorithms (OLS, LASSO, and ANNs) on the test dataset. (The OLS algorithm was fit using the plm function in R. The LASSO algorithm was fit using the glmnet function in R, and 10-fold cross-validation was performed to identify the optimal tuning parameter λ. The neuralnet function in R was used to fit the ANN algorithm, and the model with 2 neurons provided the best output and thus was used for further computations on the test dataset.)
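The study implemented this pipeline in R with plm, glmnet, and neuralnet; an analogous sketch in Python with scikit-learn, on synthetic data, might look like the following (library, data, and parameter choices here are ours, intended only to show the shape of the workflow):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
# A target with nonlinear structure, as real sales data may have
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# 70:30 train/test split, as used in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "OLS": LinearRegression(),
    "LASSO": LassoCV(cv=10),                      # 10-fold CV selects lambda
    "ANN": MLPRegressor(hidden_layer_sizes=(2,),  # 2 hidden neurons, as in the study
                        max_iter=5000, random_state=0),
}

# Fit each model on the training data and score it on the unseen test data
mse = {name: np.mean((m.fit(X_tr, y_tr).predict(X_te) - y_te) ** 2)
       for name, m in models.items()}
```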

Findings and analysis
The comparison shows that the OLS and LASSO algorithms' predictive performances are highly similar, and that neither fits the data nearly as well as the ANN algorithm, which performs extremely well. Thus, the visualisations indicate that the ANN algorithm provides more accurate sales predictions than do the other two algorithms. We also more formally evaluated each technique's predictive performance using five established model evaluation metrics: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute scaled error (MASE), and median absolute error (MDAE) (Casella et al., 2017; Hyndman & Koehler, 2006; Kolkman & Van Witteloostuijn, 2019; Muthukrishnan & Rohini, 2016; Punam et al., 2018; Tsoumakas, 2019). For each of the assessment metrics, the lower the value, the better the algorithm's performance in predicting SMMEs' sales. The formal evaluation of the predictive models is presented in Table 2. The assessment, as per Table 2, shows that the ANN algorithm clearly outperforms the other two machine learning algorithms, as evidenced by its very low MSE, RMSE, MAE, MASE, and MDAE values. The worst-performing, as expected, was OLS, with LASSO showing some improvement over it on all the assessment metrics. Based on this assessment and the graphical analysis as per Figure 2, the ANN algorithm was thus selected as the best-performing machine learning algorithm for sales predictive modelling.
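The five evaluation metrics can be computed directly from the prediction errors; a minimal Python sketch follows (the MASE denominator uses the in-sample naive-forecast error scale per Hyndman and Koehler (2006), which is our implementation choice):

```python
import numpy as np

def evaluation_metrics(y_true, y_pred, y_train):
    """Compute the five metrics used in the study.
    Lower values indicate better predictive performance."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    mse = np.mean(err ** 2)
    # Scale for MASE: mean absolute error of a naive one-step forecast
    # over the training series
    scale = np.mean(np.abs(np.diff(y_train)))
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MASE": np.mean(np.abs(err)) / scale,
        "MDAE": np.median(np.abs(err)),
    }
```

For example, predictions [1, 2, 3, 5] against actuals [1, 2, 3, 4] have one error of 1, giving an MSE of 0.25 and an MDAE of 0.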
Further to the evaluation of the predictive models, we also computed variable importance for each algorithm, as per Figure 3. From the graphical presentations it was clear across the three models that, generally, productivity and permanent workers are the two most important variables that positively influence SMMEs' sales performance. However, in respect of those variables which negatively influence sales performance, the results were mixed.

The ANN technique identified the excessive utilisation of temporary workers (Temp2) as negatively impacting sales, while OLS indicated the opposite and LASSO showed no impact. Another feature which generated conflicting importance ratings was digital marketing, with OLS and LASSO highlighting it as having a negative impact on sales performance, while the ANN algorithm indicated a positive effect.
The mixed findings show the importance of selecting the proper technique based on an objective criterion such as predictive accuracy. In this case, SMME owners would benefit most from exploiting ANN algorithms.

Conclusion and recommendations
The assessment found that the ANN approach was far superior to the other two machine learning approaches across all the assessment metrics, with the LASSO technique coming a distant second. The superior performance of the ANN algorithm, despite the inclusion of non-linear factors, shows the algorithm's versatility in identifying and incorporating functional relationships among variables in its predictive modelling process. This is in line with the assertion by Youn and Gu (2010) that, owing to their less restrictive assumptions when engaging with the dataset, ANNs tend to provide more accurate and reliable predictions than other algorithms.
More specifically, and in line with other existing literature to date, this study provides stakeholders in the SMME sector with a basis for selecting ANNs to conduct sales predictive modelling and to inform strategic decision-making that can drive sustainable SMME growth. It is recommended that governments and other pertinent stakeholders develop and make available sales predictive applications powered by ANNs to manufacturing SMMEs, in order to assist them in conducting predictions and developing data-driven plans. It is also recommended that future studies utilise larger datasets, covering periods longer than three years, to evaluate ANNs, and to compare ANNs' predictive performance with that of other complex techniques such as deep learning and support vector machines.