
South African Computer Journal

Online version ISSN 2313-7835
Print version ISSN 1015-7999

SACJ vol. 32 no. 1, Grahamstown, July 2020

http://dx.doi.org/10.18489/sacj.v32i1.777 

RESEARCH ARTICLE

 

A survey of automated financial statement fraud detection with relevance to the South African context

 

 

Wilson T. Mongwe; Katherine M. Malan

Department of Decision Sciences, University of South Africa; Wilson T. Mongwe wilsonmongwe@gmail.com (corresponding), Katherine M. Malan malankm@unisa.ac.za

 

 


ABSTRACT

Financial statement fraud has been on the increase in the past two decades and includes prominent scandals such as Enron, WorldCom and more recently in South Africa, Steinhoff. These scandals have led to billions of dollars being lost in the form of market capitalisation from different stock exchanges across the world. During this time, there has been an increase in the literature on applying automated methods to detecting financial statement fraud using publicly available data. This paper provides a survey of the literature on automated financial statement fraud detection and identifies current gaps in the literature. The paper highlights a number of important considerations in the implementation of financial statement fraud detection decision support systems, including 1) the definition of fraud, 2) features used for detecting fraud, 3) region of the case study, dataset size and imbalance, 4) algorithms used for detection, 5) approach to feature selection / feature engineering, 6) treatment of missing data, and 7) performance measure used. The current study discusses how these and other implementation factors could be approached within the South African context.
CATEGORIES:
Computing methodologies ~ Machine learning; Applied computing ~ Economics

Keywords: financial statement fraud, automated fraud detection, machine learning, corporate auditing


 

 

1 INTRODUCTION

A 2018 study by audit firm PricewaterhouseCoopers (2018) states that 77% of South African companies surveyed have experienced some form of economic crime. According to the report, South Africa had the highest percentage of economic crime in the world in 2018, with Kenya and France coming in at positions two and three, respectively. One of the economic crimes reported to have been experienced by the companies is accounting fraud, a subset of which is management fraud (or financial statement fraud). In South Africa, accounting fraud experienced by the companies surveyed increased from 20% in 2016 to 22% in 2018 (PricewaterhouseCoopers, 2018). This suggests that accounting fraud is becoming more common in South Africa, which poses a risk to the stability of the country's capital markets.

Accounting fraud cases prominent over the last two decades include Enron, an American natural gas company that used creative accounting to make it appear as if the firm was growing, but lost over $60 billion in market capitalisation from January 2001 to January 2002 when the allegations of fraud emerged (Healy & Palepu, 2003; Macey, 2004), and Steinhoff, a South African retailer that lost over R200 billion in market capitalisation in the space of two weeks after it emerged that accounting fraud had allegedly been perpetrated by the management of the firm (Cronje, 2017). Furthermore, the Public Investment Corporation, which manages the pension fund assets of South African government employees, lost at least R19 billion through its direct and indirect investments in Steinhoff (Donnelly, 2018; Presence, 2018).

Financial statements are important for various stakeholders such as regulators, tax authorities, investors and creditors, who use these statements to make decisions. For example, investors use financial statements as an input into their investment decisions, creditors use the financial statements to decide if they can grant the company a loan and tax authorities use financial statements to determine how much tax the company should pay to the government. Thus, the information contained in the financial statements has to be accurate and reflect the true financial standing of the company. Any inaccuracies in the financial statements can result in large losses to various stakeholders as highlighted in the Enron and Steinhoff cases mentioned above.

Understanding the nature of financial statement fraud is not only important for South Africa, but the world economy at large. Due to globalisation and the free flow of capital across countries, a majority of companies now have a presence across multiple jurisdictions across the globe. Thus, the effect of financial statement fraud occurring in one country can easily spread to multiple countries around the world. This can have a disastrous impact on the world economy. An example of this is how the collapse of Steinhoff led to Mattress Firm, the largest mattress retailer in the USA, almost filing for bankruptcy (Crotty, 2018b).

The different kinds of fraud that can be present in a financial statement include, but are not limited to: the manipulation of the company's earnings and cash flows (J. L. Perols & Lougee, 2011; Schilit, 2010), the intentional omission of significant information (e.g. large expenditure) and the misapplication of accounting principles and policies in the preparation of the financial statements (Zhou & Kapoor, 2011). Typically, such fraud is perpetrated by or with the knowledge of the management of the firm (Kirkos et al., 2007; Macey, 2004; PricewaterhouseCoopers, 2018; C. Spathis et al., 2002). The management of the firm might engage in fraudulent activities to increase their personal rewards such as job security, salaries and bonuses (C. Spathis et al., 2002). In some instances, the management fraud is perpetrated with the knowledge of the firm's external auditor, as was the case with Enron and its external auditor Arthur Andersen (Macey, 2004).

In practice, the detection of fraud in financial statements is usually left to the external auditors of the firm (Moepya et al., 2016). However, according to the International Standard on Auditing 240 (International Federation of Accountants (IFAC), 2009), auditors are tasked with ensuring that the financial statements do not contain any material intentional or unintentional misstatements. It remains the responsibility of the management of the company, and not that of the external auditors, to ensure that the financial statements are free from any fraud (Chong, 2012; International Federation of Accountants (IFAC), 2009; Kassem & Higson, 2012; Kirkos et al., 2007). Since the management of the firm know about the limitations of a normal external audit, they may act in a manner that deceives the external auditors (C. Spathis et al., 2002). Thus, automated analytical procedures can be useful to stakeholders in the detection of financial statement fraud.

Over the past decade, data mining techniques have been successfully applied in detecting fraud in credit card transactions, telecommunications, computer intrusion, health care insurance claims and automobile insurance claims (Abdallah et al., 2016; Kou et al., 2004). Data mining based fraud detection methods have also proven to be useful as a means to detect fraud present in financial statements (Fanning & Cogger, 1998; Glancy & Yadav, 2011; Gupta & Gill, 2012; Kirkos et al., 2007; Lin et al., 2003; Moepya et al., 2016; J. L. Perols & Lougee, 2011; Ravisankar et al., 2011; C. T. Spathis, 2002; Yao et al., 2018). Data mining based approaches are preferable to statistical approaches as one does not need to make assumptions about the statistical distribution of the data under investigation (Kirkos et al., 2007). Data mining methods can be useful decision support tools for auditors to flag companies which may have perpetrated fraud in their financial statements. This is particularly important as auditors may need to audit many companies at the same time. Using an automated approach can reduce the turnaround time of the audits and improve their quality.

The current study contributes to knowledge by organising the literature found in the automated financial statement fraud (FSF) detection domain and by identifying gaps in the research. This work differs from other surveys (Sharma & Panigrahi, n.d.; Wang, 2010) in the FSF detection literature in that it focuses on all aspects of the FSF detection problem, from how fraud is defined (e.g. based on audit opinions) to how the performance of the different FSF detection methods is assessed (e.g. using accuracy). This work thus extends the existing surveys by looking at other dimensions of the problem, presenting a more complete picture of its important aspects. This should aid practitioners when they deploy automated FSF detection models in practice.

This paper is organised as follows: Section 2 presents a background to the FSF domain, Section 3 provides a thorough literature survey and Section 4 presents the findings of the survey. Finally, Section 5 provides conclusions and possible future research directions.

 

2 BACKGROUND TO FINANCIAL STATEMENT FRAUD

This section provides the background into financial statement fraud. An overview of the contents of annual reports is presented, a discussion on the audit process (which is followed by the firm's external auditors) is provided and then the nature of financial statement fraud is presented using a real world example.

2.1 Overview of annual reports

It is a common requirement across the world for companies to publish their annual reports. The role of the annual reports is to present and discuss the financial health of the company. This allows the various stakeholders to make informed decisions about the company. The typical annual report for a South African company consists of the following sections: 1) statement of responsibility by the board of directors, 2) comments from the company secretary, 3) the executive report, 4) audit and risk committee report, 5) the independent auditor's report on the financial statements and 6) the annual financial statements of the company.

The financial statements of the company consist, at a high level, of the following statements:

Statement of comprehensive income (also known as the income statement) - this statement details the revenue earned and the expenses incurred by the firm over the past financial year. The higher the revenue and the lower the expenses, the better the financial health of the company.

Statement of financial position (also known as the balance sheet) - this statement lists the assets and the liabilities (or debt) of the firm. The company is in good financial standing if its assets are much larger than its liabilities.

Statement of changes in equity - this statement details how the shareholders' equity has changed over the past year. It includes, amongst others, details of how much has been paid out as dividends and how many new shares have been issued by the company over the past financial year.

Statement of cash flows - this statement contains the actual cash received and the actual cash spent during the past financial year. The higher the cash inflows and the lower the cash outflows, the better the financial health of the company.

Notes to the annual financial statements - this section provides a more detailed breakdown of the line items contained in the statements listed above. It helps the users of the financial statements to fully understand how the numbers contained in the statements came about, so that the users can independently reproduce them.

The statements above are usually summarised and analysed using financial ratios, which stakeholders use as inputs to their decision-making processes. As an example, the debt-to-assets ratio (calculated as total liabilities divided by total assets) indicates the proportion of the company's assets that is financed by debt. A ratio of less than one means that the assets exceed the liabilities, indicating that the company is in good financial standing.
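To make the calculation concrete, the sketch below computes the debt-to-assets ratio and two other commonly used ratios from a handful of statement line items. The figures and field names are invented for illustration and do not come from any company in the surveyed literature.

```python
# Minimal sketch: computing common financial ratios from statement line items.
# The dictionary keys are hypothetical field names, not a standard schema.
statement = {
    "total_assets": 5_000_000.0,
    "total_liabilities": 3_200_000.0,
    "revenue": 2_400_000.0,
    "net_income": 310_000.0,
}

# Debt-to-assets: proportion of assets financed by debt; a value below 1.0
# means assets exceed liabilities.
debt_to_assets = statement["total_liabilities"] / statement["total_assets"]

# Two further ratios often used as model features.
return_on_assets = statement["net_income"] / statement["total_assets"]
net_profit_margin = statement["net_income"] / statement["revenue"]

print(f"Debt to assets:    {debt_to_assets:.2f}")
print(f"Return on assets:  {return_on_assets:.2f}")
print(f"Net profit margin: {net_profit_margin:.2f}")
```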

2.2 The financial statement audit process

In South Africa, as in many other countries across the world, the published financial statements of public companies are audited by independent external auditors. External auditors are appointed by the company's board of directors on a contract basis. The audit process is overseen by the company's audit committee, which is composed of members of the company's board.

The role of the external auditors is to ensure that the firm's financial statements are free from material misstatements, whether due to error or fraud. Misstatements relate to financial statements that are either incomplete or incorrect in some form. As mentioned in Section 1, it is not the responsibility of the auditors to detect fraud in the financial statements. However, should the external auditors suspect that fraud has been committed by the firm, the matter is referred to forensic auditors, who can then establish whether fraud has been perpetrated or not.

At a high level, the process followed by the external auditors in auditing the company's financial statements involves 1) obtaining all the financial statements from the firm, 2) asking the firm's management to provide supporting documents where the auditors deem such documentation necessary and 3) producing an audit report detailing the external auditors' audit opinion.

The audit opinion expressed by the external auditors on the financial statements of South African companies falls broadly into the following categories:

Clean audit opinion - this is when the external auditors did not find any material misstatements in the financial statements. Note that a company can receive a clean audit, but could still have committed fraud in its financial statements.

Qualified audit opinion - this is when the auditor has found material misstatements in the financial statements that are limited to specific amounts, or the management did not provide sufficient evidence for the auditors to assess whether specific amounts included in the financial statements are materially misstated. This, however, does not necessarily mean the company has committed fraud.

Adverse audit opinion - this is when the material misstatements are not limited to specific amounts, or the misstatements affect the majority of the financial statements. As with a qualified audit opinion, this does not necessarily imply that fraud has been committed.

2.3 Financial statement fraud - the Enron example

Enron was founded in 1985 as an American natural gas company and later expanded into an energy trading business. Enron's share price grew by over 311% from the early 1990s to 1998 (Healy & Palepu, 2003). A year before its collapse, Enron was rated the most innovative company in America in Fortune magazine's survey (Bratton, 2002; Healy & Palepu, 2003).

The Enron fraud, perpetrated by its management, can be summarised as follows (Bai et al., 2008; Healy & Palepu, 2003):

It manipulated its earnings through mark-to-market accounting. This allowed Enron to recognise income from long-term energy contracts before it actually materialised. When the income did not materialise, it was moved to special purpose entities (companies that are separate from Enron). In this way, Enron was manipulating its income and cash flow statements.

It entered into debt through special purpose entities, with the result that the debt did not show on its balance sheet. This is what is known as off-balance-sheet transacting. In this way, Enron was manipulating its statement of financial position to hide its debt levels from investors.

The Enron example illustrates how financial statement fraud can occur within a company. It highlights that the nature of financial statement fraud involves the intentional manipulation of the company's accounts in order to deceive the various stakeholders, and likely with the direct involvement of the company's management.

 

3 AUTOMATED FINANCIAL STATEMENT FRAUD DETECTION

This section presents a thorough literature survey of automated FSF detection. Section 3.1 discusses the key issues that should be considered when implementing automated FSF detection decision support systems and then Section 3.2 provides the literature survey, which focuses on the identified implementation issues.

3.1 Implementation issues

This section outlines the issues one should consider when implementing automated financial statement fraud detection. The issues highlighted below were chosen as these were the recurring themes in the FSF literature.

3.1.1 Fraud definition and data features used

The definition of FSF used is important because the broader the definition, the more fraud instances will be present in the sample, and vice versa. For example, one could define a fraudulent financial statement as one that has received a qualified or adverse audit opinion from an external auditor. This is the easiest definition to use in the South African context as all firms listed on the Johannesburg Stock Exchange (JSE) are required to have their financial statements audited by an external auditor (Johannesburg Stock Exchange, 2015). Another definition of FSF is to define firms as fraudulent if they were investigated and found guilty by authorities. These investigations include those undertaken by, amongst others, the Securities and Exchange Commission (SEC) in the USA, the Capital Markets Board of Turkey (CMBT), the China Securities Regulatory Commission (CSRC) and the Financial Sector Conduct Authority (FSCA) in South Africa. Investigations by the authorities tend to take a long time (with conclusions often reached many years after the original fraud was perpetrated), which may not be ideal for investors attempting to minimise losses by not investing in firms that commit FSF. Using this definition could result in fewer fraud instances in the resultant data set, but it is more rigorous than using a qualified or adverse audit opinion as the FSF definition. The definition of FSF could also extend beyond the legal definition of fraud and incorporate the ethical practices of the firm. For example, firms that engage in earnings management, which involves manipulating the earnings of the company within the accounting policies (and is thus legal), could be considered fraudulent if the ethical aspect is included in the FSF definition.

Another important consideration in the implementation of automated FSF detection is the data features that are used in building the FSF detection models. The data features can be structured numerical data (which would be extracted from the financial statements) or unstructured text data, or even a combination of both these data features. The data used could be limited to the financial statement data only, or it could also include other parts of the annual report such as the comments by the management of the firm, or even financial news published in the media about the company. Furthermore, one has to decide if data from listed or private companies, or a combination of both, should be used in building the decision support tool. Data for listed companies is easier to retrieve compared to data for private companies. The chosen data features will have an impact on the type of models that can be used to build the decision support tool.

3.1.2 Data issues

In the FSF domain, the number of fraud instances is orders of magnitude smaller than the number of non-fraudulent cases. This is because FSF is a rare event, and it is very difficult to detect because its nature changes through time (known as concept drift). The rarity of FSF leads to a data or class imbalance problem. General approaches to dealing with the data imbalance problem include over-sampling the minority class or under-sampling the majority class (Chawla et al., 2002). Another approach is cost sensitive learning, where different weights are placed on false negatives (classifying a firm as non-fraudulent when it is fraudulent) compared to false positives (classifying a company as fraudulent when it is not) when training the FSF models (Moepya, Akhoury, & Nelwamondo, 2014).
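As a minimal illustration of these two options, the sketch below (using scikit-learn, with synthetic data standing in for real financial statements) shows random under-sampling of the non-fraud class and cost sensitive weighting of the fraud class. The 10:1 weighting and the 5% fraud rate are arbitrary illustrative choices, not values taken from the surveyed studies.

```python
# Sketch of two ways to handle class imbalance on a synthetic fraud data set:
# (1) under-sampling the majority (non-fraud) class, and (2) cost sensitive
# learning via class weights. All data here is synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # 1 000 firm-years, 10 ratio features
y = (rng.random(1000) < 0.05).astype(int)  # roughly 5% fraudulent (class 1)

# (1) Under-sample non-fraud cases to match the number of fraud cases.
fraud_idx = np.flatnonzero(y == 1)
nonfraud_idx = rng.choice(np.flatnonzero(y == 0), size=fraud_idx.size, replace=False)
balanced = np.concatenate([fraud_idx, nonfraud_idx])
model_balanced = LogisticRegression(max_iter=1000).fit(X[balanced], y[balanced])

# (2) Keep all the data, but penalise false negatives more heavily than
# false positives by weighting the fraud class (weight of 10 is arbitrary).
model_weighted = LogisticRegression(class_weight={0: 1, 1: 10},
                                    max_iter=1000).fit(X, y)
```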

The data set typically consists of financial statements from different firms over a specific time period. The size of the financial statement data sample used to build the decision support tool is important because the more data one has to build the models, the more likely that the models will be generalisable and thus perform well on unseen data instances. Although using a lot of data to build models is the ideal situation, in practice one may not have access to all the data that is needed.

Data retrieved from data vendors (e.g. Bloomberg and Reuters) often includes missing values for one or more of the attributes (Kiehl et al., 2005; Moepya et al., 2016). The data may be missing for various reasons, such as technical glitches in the system, or because the data does not actually exist. How one deals with missing data in FSF detection is important: simply deleting records that have missing data throws away possibly useful information, but, equally importantly, one does not want to add data that does not exist in reality. For example, one could impute (that is, replace a missing value with an estimate) a dividend figure for a company when it is missing from the database, even though the company did not declare any dividend for the financial year under consideration (Moepya et al., 2016).
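The sketch below illustrates a simple median imputation with scikit-learn on a toy ratio matrix. The columns and values are hypothetical, and the caveat about fields that are legitimately absent (such as an undeclared dividend) is noted in the comments.

```python
# Sketch: median imputation of missing ratio values with scikit-learn.
# Columns are hypothetical; in practice a field such as "dividend" may be
# legitimately absent (no dividend was declared) and should not be imputed.
import numpy as np
from sklearn.impute import SimpleImputer

# Rows: firm-years; columns: [debt_to_assets, return_on_assets, current_ratio]
X = np.array([
    [0.64, 0.06, 1.8],
    [0.71, np.nan, 1.2],   # missing return_on_assets
    [np.nan, 0.02, 0.9],   # missing debt_to_assets
])

imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```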

3.1.3 Methods used

The methods used to build FSF decision support systems are mostly either statistical or machine learning (data mining) based approaches. The statistical approaches make assumptions about the distribution of the data, while the machine learning approaches do not. FSF detection approaches can be either supervised (where the data used is labelled) or unsupervised (where the data is not labelled). In the supervised learning case, the FSF problem is treated as a classification problem, while in the unsupervised learning case it is treated as a clustering problem. The approach taken will depend on, amongst other factors, the availability of labelled data, the performance of the models and the computational complexity of the methods.

Some of the common supervised classification methods used in the FSF detection literature are listed in Table 1. The unsupervised learning approaches used in the FSF literature are: the self-organising map (SOM), an unsupervised neural network; k-means clustering, which groups objects that have similar characteristics; the growing hierarchical self-organising map (GHSOM), an extension of the SOM; and latent Dirichlet allocation, a topic model which uses a Bayesian approach to extract topics from text.
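As an illustration of the clustering view of the problem, the following sketch applies k-means to synthetic ratio data and flags the firm-years that lie furthest from their cluster centres for manual review. The number of clusters and the 5% threshold are arbitrary illustrative choices, not recommendations from the surveyed literature.

```python
# Sketch: unsupervised grouping of firm-years with k-means; points far from
# their cluster centre could be flagged for closer inspection by an auditor.
# The data is synthetic and the flagging threshold is arbitrary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))              # 300 firm-years, 6 ratio features

kmeans = KMeans(n_clusters=5, n_init=10, random_state=1).fit(X)
distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Flag the 5% of firm-years furthest from their cluster centre.
threshold = np.quantile(distances, 0.95)
suspicious = np.flatnonzero(distances > threshold)
print(f"{suspicious.size} firm-years flagged for closer inspection")
```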

In addition, one could create the FSF detection decision support system by building an ensemble of models. The ensemble may be created by stacking, where the outputs of the base models become the inputs of a meta-model, or by training the models on the same inputs and then combining their outputs using some criterion (for example, majority voting). An ensemble will typically perform better than the individual models if the errors of the individual models are largely independent of each other.
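A minimal sketch of both ensembling strategies, assuming scikit-learn and synthetic data, is given below; the choice of base learners is purely illustrative and not drawn from any specific surveyed study.

```python
# Sketch: two ensembling strategies on synthetic data - soft voting (combine
# predicted probabilities) and stacking (a meta-model learns from base-model
# outputs). The base learners here are illustrative choices only.
import numpy as np
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 8))
y = (rng.random(400) < 0.1).astype(int)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
]

# Soft voting averages the base models' class probabilities.
voter = VotingClassifier(estimators=base, voting="soft").fit(X, y)

# Stacking feeds the base models' outputs into a logistic regression meta-model.
stacker = StackingClassifier(estimators=base,
                             final_estimator=LogisticRegression()).fit(X, y)
```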

3.1.4 Feature selection and engineering

The variables that are available to be used as inputs in an FSF detection model are numerous, resulting in a high-dimensional problem. Correctly selecting the variables to use as inputs for the models is important because it can influence the performance of the models. Using a subset of the most important features, as opposed to the set of all features, can improve classifier performance and reduce computational complexity (Chandrashekar & Sahin, 2014). There are broadly two categories of feature selection approaches: filter based feature selection, where the variables are ranked and the significant ones selected, and wrapper based feature selection, where the classifier performance is used in selecting the features (Guyon & Elisseeff, 2003). Examples of feature selection methods used in the FSF literature include statistical techniques such as analysis of variance (ANOVA), t-tests, the Kruskal-Wallis test, the Mann-Whitney test and the chi-squared test.

In addition, instead of selecting the most significant variables, one could reduce the dimensionality of the problem by using feature engineering approaches such as principal component analysis (PCA).
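The sketch below contrasts the two routes on synthetic data: a filter based selection step using the ANOVA F-test, and a PCA projection as the feature engineering alternative. Keeping 10 features or components is an arbitrary illustrative choice.

```python
# Sketch: filter based feature selection (ANOVA F-test) versus dimensionality
# reduction with PCA, on synthetic data standing in for financial variables.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 30))             # 30 candidate financial variables
y = (rng.random(250) < 0.1).astype(int)    # synthetic fraud labels

# Filter approach: keep the 10 variables most associated with the label.
X_selected = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# Feature engineering alternative: project onto 10 principal components.
X_pca = PCA(n_components=10).fit_transform(X)

print(X_selected.shape, X_pca.shape)
```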

3.1.5 Performance measures

Choosing the performance measure of a fraud detection model is important because the performance measure selected should take into account the salient features of the problem domain. The FSF domain has high class imbalance and the cost of incorrectly classifying a company as not fraudulent when it is fraudulent is high compared to the cost of classifying a company as fraudulent when it is not fraudulent. Thus, the performance measure must take these factors into account.

The following definitions are used when classifying data instances:

true positive (TP): where a fraudulent firm is correctly classified as fraudulent,

false negative (FN): where a fraudulent firm is classified as non-fraudulent (this is also known as a Type II error when expressed as a probability),

false positive (FP): where a non-fraudulent firm is classified as fraudulent (this is also known as a Type I error when expressed as a probability), and

true negative (TN): where a non-fraudulent firm is correctly classified as non-fraudulent.

Examples of performance measures used in the FSF detection domain are given in Table 2.

Other examples of performance measures include the receiver operating characteristic (ROC) curve as well as the area under the curve (AUC). The ROC curve plots the model's true positive rate on the y-axis against its false positive rate on the x-axis, across classification thresholds. AUC is the area under the ROC curve and can be interpreted as the probability that a randomly chosen fraudulent firm is ranked above a randomly chosen non-fraudulent firm. AUC is useful as a performance measure when the costs of misclassification are unknown, which is the case for the FSF domain (Bradley, 1997; Gaganis, 2009).
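The worked example below, on an invented set of 20 firms with 2 fraud cases, shows how a high accuracy can mask modest precision and recall on the fraud class, and how AUC is computed from the model's scores rather than its hard predictions. All labels and scores are fabricated for illustration.

```python
# Sketch: comparing accuracy with imbalance-aware measures on invented labels
# and model scores. High accuracy can coexist with poor recall of the rare
# fraud class, which is why accuracy alone is a weak measure here.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([0] * 18 + [1] * 2)                 # 2 fraud cases out of 20
y_score = np.array([0.1] * 16 + [0.6, 0.7, 0.4, 0.8])  # illustrative model scores
y_pred = (y_score >= 0.5).astype(int)                  # hard predictions at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
```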

3.2 Literature survey

In this section a thorough, but not exhaustive, literature survey of the FSF detection domain is provided. The survey has been organised into Table 3 based on the topics discussed in Section 3.1. The table has been ordered by year, with the oldest papers displayed first and the most recently published papers displayed last. Within a given year, the papers are ordered alphabetically by surname of the first author.

In Table 3 the referenced paper is the first entry in the table and is in bold font. The data issues row is expressed as country of the study, size of the financial statement data set used and the percentage of fraudulent instances present in the sample. We also indicate on the data issues row if any matching of fraudulent to non-fraudulent companies was performed. The feature selection row consists of both feature selection and feature engineering approaches. If the feature selection approach was not discussed in the referenced paper, then that entry is left out. Note that the missing data treatment variable is not included as a row in Table 3 as very few papers deal with this issue directly. It is however discussed in the findings presented in Section 4.

 

4 FINDINGS

In this section we present the findings from the FSF detection literature survey that was provided in Section 3.2. The findings are organised around the topics discussed in Section 3.1. The results of the literature survey are first presented, and then the remainder of this section discusses the results.

Tables 4 to 6 provide a summary of some of the aspects of FSF detection as reflected in the studies surveyed in this paper, with a visual representation of changes over time. Table 4 shows three different definitions of FSF and the total number of studies from the survey that have used each definition. In the table, each digit '1' under a time period indicates a study using the given definition of fraud. Similarly, Tables 5 and 6 show the data feature types and the most commonly used detection methods, respectively. In Table 6, the digit '2' indicates two studies using the given method. Note that most studies used multiple methods, so the totals do not correspond to the number of studies for each period. Table 7 summarises the overall findings from the survey, where the percentages in brackets show the proportion of studies in the survey that used the particular approach.

4.1 Fraud definition and data features

The most common definition of FSF used in the literature is investigations by authorities, such as those conducted by the SEC in the USA. This definition is used by 63% of the surveyed literature.

The second most common definition of fraud is receiving a qualified audit opinion from an external auditor. This definition is used by 23% of the papers surveyed. In and of itself, receiving a qualified audit opinion from an auditor is not an indication of fraud; it merely indicates that material misstatements were found in the financial statements. These misstatements could arise from error or fraud. Moreover, this definition is not appropriate within the South African context as there have been numerous cases in South Africa where auditors were found wanting, as was the case with Steinhoff (Cotterill, 2018; Crotty, 2018a). Thus, if one wants to capture instances of fraud, a much more rigorous definition, such as judgements or investigations against fraudulent companies by authorities, would be more appropriate. This definition would also reduce the human subjectivity introduced by using audit opinions, which are determined by the auditors, as the definition of financial statement fraud. However, this more rigorous definition would be difficult to use in South Africa as there are no detailed public data sets of FSF released by the regulators, as is the case with the SEC in the USA. Note, however, that the South African FSCA does publish a report of fines issued against companies on its website, but it is not as detailed as that of the SEC. Thus, an unsupervised learning approach, which does not require a labelled data set, would be more appropriate in the South African context. The unsupervised approach has been used on companies listed on the Taiwanese and Chinese stock exchanges and provided promising results (Deng & Mei, 2009; Huang, Tsaih, & Lin, 2014; Tsaih et al., 2009). As these are emerging markets, using an unsupervised approach could potentially provide good results when applied to a South African data set, as South Africa is also an emerging market.

A majority of the companies used in the surveyed FSF literature are listed companies, with the use of data from private companies being limited. No governmental entities or non-profit organisations were considered in the literature surveyed. The most common data features used are numerical, in the form of financial and non-financial variables. As shown in Table 5, financial ratios are used by 52% of the papers surveyed. Financial ratios are often preferred because they summarise the financial statements of the company. Any manipulation of the financial statements will often translate into either smaller or larger financial ratios than expected.

A combination of financial and non-financial data is used by 31% of the studies. Incorporating the non-financial data, often in the form of corporate governance variables, has often improved the detection of FSF compared with using financial ratios alone (F. H. Chen et al., 2014; Gaganis et al., 2007). The use of text data in detecting FSF is relatively limited at 13% and has been restricted to text from the annual reports. In addition, Dong et al. (2018) used social media data to detect FSF and found that this data can aid detection. Using more of the text in the annual report, including the composition of the board of directors and how these directors and companies link to other companies, could prove useful in detecting FSF. Another approach would be to include financial news reports about the company over the year leading up to the release of the financial statements. This approach has not been explored in the literature. Incorporating external data sources such as news would be particularly useful in South Africa, as instances of fraud are usually reported in the media some time before the authorities announce that they are investigating a particular company for fraud.

As shown in Table 5, combining text and financial variables in the detection of FSF has only been explored since 2017, with good results (Dong et al., 2018; Yao et al., 2018). This approach is still relatively new and is yet to be considered within the South African context.

4.2 Data issues

To deal with the data imbalance problem in their data sets, 71% of the authors in the literature match fraudulent companies with one or more non-fraudulent firms of similar profile. This results in a balanced data set and is a form of under-sampling the majority class. One of the reasons for using the matching principle is that using a non-random sample leads to better information content (Gaganis, 2009). This, however, means that less data is used than is available. In addition, since the sample is balanced, an accuracy of 50% can be achieved simply by making random predictions (Alden et al., 2012). A more direct approach to the data imbalance problem, using cost sensitive learning, is explored in Moepya, Akhoury, and Nelwamondo (2014). In that paper the authors use different weights for false positives and false negatives, and show that this cost sensitive approach results in an increase in the detection of the minority class, albeit at the cost of lowering the overall classification accuracy.

Most of the data used in the literature was of USA companies, followed by companies in Taiwan. In recent times, more of the data sets being used have been of companies from China, Taiwan and other emerging markets. Except for the USA, very few data sets from developed markets have been used in FSF detection research.

The median financial statement data set size used in the surveyed literature is 190, with the largest data set used being 49 039. This shows that the majority of the studies use relatively small data sets. Most of the data sets were small because fraud firms were matched to non-fraud firms; since FSF is a rare event, this results in a small overall data set. This could affect the generality of the models that have been built to detect FSF in previous studies.

Most of the papers (in excess of 90%) do not explicitly mention whether there was missing data, or how they dealt with missing data if present. In the papers that do mention missing data, the record or feature with missing values is simply deleted (Kiehl et al., 2005; Moepya, Akhoury, & Nelwamondo, 2014; J. Perols, 2011; J. L. Perols & Lougee, 2011; Yao et al., 2018). A study that tackles the missing data problem explicitly was conducted by Moepya et al. (2016), in which data imputation methods are explored on financial statements of companies listed on the JSE. That paper shows that missing data imputation has a role to play in the detection of FSF. A comprehensive study of missing data imputation combined with wrapper feature selection and unsupervised learning approaches is yet to be conducted in the FSF literature.

4.3 Methods used

A variety of machine learning and statistical methods, discussed in Section 3.1.3, have been explored in the literature to solve the FSF detection problem. As shown in Table 6, the most common methods used are neural networks (21%), logistic regression (18%) and SVM (13%). The use of SVMs to detect FSF has been increasing, with heavy use in the last few years. Logistic regression and neural networks have consistently been applied to detect FSF over the past two decades. On the other hand, the use of discriminant analysis has been on the decline. From the survey, it appears that the use of statistical methods is decreasing, while the use of machine learning models is increasing. This can be attributed to the fact that machine learning methods do not require distributional assumptions about the data, and authors therefore prefer them over statistical methods (Kirkos et al., 2007).

The main focus in the literature has been on supervised learning approaches, with 97% of the studies applying supervised classification. The different supervised models have also been extensively compared for performance (Gaganis, 2009; Gaganis et al., 2007; Katsis et al., 2012). From the studies surveyed, it was found that there is no overall best method, with different methods outperforming others on different data sets. This result is in line with the No-Free-Lunch theorem for learning algorithms, which states that no completely general-purpose learning algorithm can exist, so one can assume that there is no best machine learning algorithm for all problem instances (Wolpert, 1996). To our knowledge, unsupervised learning approaches have only been explored in Y.-J. Chen (2015), Deng and Mei (2009), Huang, Tsaih, and Lin (2014), Huang, Tsaih, and Yu (2014) and Tsaih et al. (2009). A thorough comparison of unsupervised learning methods has not been conducted in the FSF literature.

There is an additional aspect relating to the interpretability of fraud detectors that is largely ignored in the literature. Artificial neural networks are the most widely used method in the FSF detection literature, but due to their black-box nature are notoriously lacking when it comes to transparency (Ghorbani et al., 2019). The internal structure is often too complex to analyse, so it is usually not possible to connect the input features to the output. This means that it is not clear on what basis the model predicts fraud. Decision tree based approaches are less popular and may give lower model accuracy than neural networks, but have the advantage that the learnt model can easily be interpreted by decision makers (Perner, 2011). An added advantage of this is that the model can lead to a better understanding of the nature of financial statement fraud that could help in the design of interventions to prevent fraud. In addition, the interpretability of a model is an important aspect that can influence the willingness of industry to adopt automated approaches and should be investigated further in the context of FSF detection.
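As a small illustration of this interpretability argument, the sketch below fits a shallow decision tree to synthetic data and prints the learnt rules. The feature names are hypothetical and the data is not drawn from any study in the survey; the point is only that the resulting rules can be read directly by a decision maker.

```python
# Sketch: an interpretable decision tree on synthetic ratio data; export_text
# prints the learnt rules so it is visible which features drive a prediction.
# Feature names are hypothetical and the labels are synthetically generated.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 1.5).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["debt_to_assets",
                                       "receivables_growth",
                                       "accruals_ratio"]))
```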

4.4 Feature selection and performance measures

From the FSF literature surveyed, 69% of the studies used some form of feature selection, while the remaining papers did not apply any feature selection techniques. The most common approach, applied by 97% of the papers that perform feature selection, is filter based feature selection. To our knowledge, a wrapper based approach, using a genetic algorithm, has only been applied in the FSF detection domain by J. Perols (2011) on a USA data set. Moepya et al. (2016) have already found that filter based feature selection, combined with missing data imputation, can improve the performance of the models on a South African data set. Wrapper based selection has not been used in the South African context and could prove useful in detecting FSF in South Africa.

The most common performance measure, used by 35% of the papers surveyed, is classification accuracy. However, classification accuracy is not an appropriate performance measure in the context of FSF detection because, in practice, different costs or weights are placed on incorrectly classifying a company as not being fraudulent when it is fraudulent, compared to incorrectly classifying a company as being fraudulent when it is not. In addition, the FSF domain has a high class imbalance, with fraud cases being very rare compared with non-fraud instances. More appropriate performance measures for the FSF detection domain are ROC and AUC, as used in Moepya, Akhoury, and Nelwamondo (2014) and J. L. Perols and Lougee (2011), amongst others. A comprehensive study to determine the optimal performance measure for the FSF detection domain has yet to be conducted.

4.5 Other factors to consider

A majority of the papers simply ignore the time series element of FSF detection, except possibly during the division of the data set into training and test sets, as done in Gaganis (2009) amongst others. The time series approach is important to avoid forward looking biases (using information from the future to predict fraud that happened in the past) in the FSF models, and would allow the incorporation of temporal patterns in the data. As financial statements are published annually, there are possible trends over time in the data that could be incorporated into the models and could improve the detection of FSF. As an example, none of the papers considered using the audit opinion from previous time periods as input into their decision support system. The papers that take into account and use the temporal patterns of financial statements to detect FSF are Alden et al. (2012), Chai et al. (2006), Hoogs et al. (2007) and Kiehl et al. (2005). These studies show that temporal patterns can be useful in detecting financial statement fraud. Three of these studies use genetic algorithms to build their decision support systems. Other computational intelligence approaches to the time series element in the detection of FSF have not been explored in the literature.

The papers surveyed did not discuss in detail how to optimally present the results from the decision support systems that they design. The types of output that a decision support system could provide are a binary classification, a ranking of the financial statements, or the probability of FSF. The majority of the studies treated the output of the decision support system as a binary result. This is not very useful to stakeholders, as a ranking or probability of fraud would provide more information and would allow stakeholders, e.g. auditors, to allocate their resources based on the ranking or flags provided by the decision support system. A paper that implements the ranking approach, using fuzzy logic, is Chai et al. (2006).
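A minimal sketch of the ranking idea is given below: fraud probabilities (invented here, but in practice obtained from a classifier's probability output) are sorted so that an auditor can work down the list from the highest-risk company. The company names and scores are hypothetical.

```python
# Sketch: turning per-company fraud probabilities into a ranked work list so
# that an auditor can prioritise the highest-risk companies first.
# Names and probabilities are invented for illustration.
companies = ["Firm A", "Firm B", "Firm C", "Firm D"]
fraud_probability = [0.12, 0.81, 0.34, 0.67]   # e.g. from a model's predict_proba

ranked = sorted(zip(companies, fraud_probability),
                key=lambda pair: pair[1], reverse=True)
for rank, (name, prob) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: estimated fraud probability {prob:.2f}")
```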

 

5 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

The current study explored the literature on automated financial statement fraud (FSF) detection and highlighted the factors that are relevant to the South African context. There have been numerous papers in this field over the last three decades, with most of the papers being written in the last decade. From the literature it was found that the following themes are important when implementing automated decision support systems for the detection of FSF:

definition of fraud used,

data features used,

data region, size and imbalance,

methods used,

feature selection,

missing data treatment, and

performance measures.

This paper highlights the fact that the supervised classification methods of neural networks, SVM and logistic regression are the most common approaches to FSF detection found in the literature, with filter based feature selection being the most commonly used feature selection approach. A majority of the papers use financial ratios as the data for model building and either ignore, or simply delete, missing data that may be present in the data sample.

It was found that the majority of the FSF literature used companies investigated by authorities as the definition of a fraudulent company. Most of the data sets used matched fraudulent firms with non-fraud companies of similar profile. Although the majority of the papers used the filter based feature selection methods, only one considered wrapper based feature selection methods. Other papers used feature engineering techniques to reduce the dimensionality of the input space. The most common performance measure used in the literature was found to be classification accuracy, which is not appropriate to the FSF domain as fraud instances carry more weight than non-fraud instances. Thus performance measures that balance both precision and recall, such as the other measures covered in Section 3.1.5, would be more appropriate for the FSF detection domain.

This survey has revealed that there has been minimal research on applying automated methods for detecting FSF in South Africa, with Moepya, Akhoury, and Nelwamondo (2014), Moepya et al. (2016), Moepya, Nelwamondo, et al. (2014) being the only authors to apply machine learning based approaches for FSF detection in the South African context. The definition of fraud used in these studies was qualified audit opinions. Given the number of times that auditors have been found wanting in South Africa (Donnelly, 2018), another approach would be to formulate the FSF problem as an unsupervised learning problem. This would remove the reliance on using labelled data from auditors.

The models that have been applied in South Africa for FSF detection are SVM, logistic regression, naïve Bayes, kNN and random forest. Another model class that may be worth considering is artificial neural networks (ANNs), which have been successfully applied to the FSF detection problem in other countries. However, ANNs have the disadvantage of being less interpretable than the other models that have already been applied in the South African context.

The only data type used in the South African context is financial ratios. The use of text from the annual reports, and perhaps combined with the financial ratios, has not been explored in South Africa. In addition, incorporating external data sources such as news would be particularly useful in South Africa as instances of fraud are usually published in the media sometime before the authorities announce that they are investigating a particular company for fraud.

Overall, the thorough survey of the automated FSF detection literature revealed the following gaps in the literature:

1. The time series element of the financial statement fraud detection has not been comprehensively explored in literature. This approach is important to avoid any forward looking biases in the FSF models, and would allow the incorporation of temporal patterns in the data.

2. The use of alternative data sources such as financial news, social media data and text from the annual reports could improve the detection of FSF.

3. A comprehensive study of missing data imputation combined with wrapper feature selection and unsupervised learning approaches is yet to be explored in the FSF literature.

4. A comprehensive study is yet to be performed to assess what the optimal performance measure is for the FSF detection domain.

5. The interpretability of fraud detectors is an important consideration that is largely ignored in the literature.

6. Meta-learning the appropriate models to use in the detection of FSF has not been considered in the literature.

7. Using multiple data sets from different countries and comparing the performance of the different FSF detection models has not been considered in the literature.

 

References

Abdallah, A., Maarof, M. A., & Zainal, A. (2016). Fraud detection system: A survey. Journal of Network and Computer Applications, 68, 90-113. https://doi.org/10.1016/j.jnca.2016.04.007

Alden, M. E., Bryan, D. M., Lessley, B. J., & Tripathy, A. (2012). Detection of financial statement fraud using evolutionary algorithms. Journal of Emerging Technologies in Accounting, 9(1), 71-94. https://doi.org/10.2308/jeta-50390

Amara, I., Amar, A. B., & Jarboui, A. (2013). Detection of fraud in financial statements: French companies as a case study. International Journal of Academic Research in Accounting, Finance and Management Sciences, 3(3), 40-51. https://doi.org/10.6007/ijarafms/v3-i3/34

Ata, H. A., & Seyrek, I. H. (2009). The use of data mining techniques in detecting fraudulent financial statements: An application on manufacturing firms. Suleyman Demirel University Journal of Faculty of Economics and Administrative Sciences, 14(2), 157-170. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.823.1630&rep=rep1&type=pdf

Bai, B., Yen, J., & Yang, X. (2008). False financial statements: Characteristics of China's listed companies and CART detecting approach. International Journal of Information Technology & Decision Making, 7(2), 339-359. https://doi.org/10.1142/s0219622008002958

Boumediene, S. L. (2014). Detection and prediction of managerial fraud in the financial statements of Tunisian banks. Accounting & Taxation, 6(2), 1-10. http://www.theibfr2.com/RePEc/ibf/acttax/at-v6n2-2014/AT-V6N2-2014-1.pdf

Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159. https://doi.org/10.1016/s0031-3203(96)00142-2

Bratton, W. W. (2002). Enron and the Dark Side of Shareholder Value. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.301475

Chai, W., Hoogs, B., & Verschueren, B. (2006). Fuzzy ranking of financial statements for fraud detection, In 2006 IEEE International Conference on Fuzzy Systems, IEEE. https://doi.org/10.1109/fuzzy.2006.1681708

Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16-28. https://doi.org/10.1016/j.compeleceng.2013.11.024

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953

Chen, F. H., Chi, D.-J., & Zhu, J.-Y. (2014). Application of random forest, rough set theory, decision tree and neural network to detect financial statement fraud - taking corporate governance into consideration, In Intelligent Computing Theory, Springer International Publishing. https://doi.org/10.1007/978-3-319-09333-8_24

Chen, S. (2016). Detection of fraudulent financial statements using the hybrid data mining approach. SpringerPlus, 5(1), 1-16. https://doi.org/10.1186/s40064-016-1707-6

Chen, Y.-J. (2015). On fraud detection method for narrative annual reports, In The Fourth International Conference on Informatics & Applications (ICIA2015), Takamatsu, Japan. https://www.researchgate.net/publication/280222998_Proceedings_of_The_Fourth_International_Conference_on_Informatics_Applications_Takamatsu_Japan_2015

Chong, G. (2012). Detecting fraud: What are auditors' responsibilities? Journal of Corporate Accounting & Finance, 24(2), 47-53. https://doi.org/10.1002/jcaf.21829

Cotterill, J. (2018). Steinhoff shareholders sue Deloitte for damages [Last accessed 15 Nov 2019]. https://www.ft.com/content/4f4d591a-6f0f-11e8-92d3-6c13e5c92914

Cronje, J. (2017). Steinhoff's market cap a mere R20bn as shares drop another 30% [Last accessed 15 Nov 2019]. https://www.fin24.com/Companies/Retail/steinhoff-shares-drop-by-a-fifth-in-early-trade-20171220

Crotty, A. (2018a). Dutch investors gun for Deloitte over Steinhoff [Last accessed 15 Nov 2019]. Business Day. https://www.businesslive.co.za/bd/companies/retail-and-consumer/2018-04-09-dutch-investors-gun-for-deloitte-over-steinhoff/

Crotty, A. (2018b). Mattress Firm faces a lumpy fate [Last accessed 15 Nov 2019]. https://www.businesslive.co.za/bd/companies/retail-and-consumer/2018-10-04-how-steinhoffs-mattress-firm-faces-a-lumpy-fate/

Dalnial, H., Kamaluddin, A., Sanusi, Z. M., & Khairuddin, K. S. (2014). Accountability in financial reporting: Detecting fraudulent firms. Procedia - Social and Behavioral Sciences, 145, 61-69. https://doi.org/10.1016/j.sbspro.2014.06.011

Deng, Q. (2010). Detection of fraudulent financial statements based on Naïve Bayes classifier, In 2010 5th International Conference on Computer Science & Education, IEEE. https://doi.org/10.1109/iccse.2010.5593407

Deng, Q., & Mei, G. (2009). Combining self-organizing map and k-means clustering for detecting fraudulent financial statements, In 2009 IEEE International Conference on Granular Computing, IEEE. https://doi.org/10.1109/grc.2009.5255148

Dong, W., Liao, S., & Zhang, Z. (2018). Leveraging financial social media data for corporate fraud detection. Journal of Management Information Systems, 35(2), 461-487. https://doi.org/10.1080/07421222.2018.1451954

Donnelly, L. (2018). PIC still limping from Steinhoff blow [Last accessed 15 Nov 2019]. Mail & Guardian. https://mg.co.za/article/2018-06-08-00-pic-still-limping-from-steinhoff-blow

Fanning, K. M., & Cogger, K. O. (1998). Neural network detection of management fraud using published financial data. International Journal of Intelligent Systems in Accounting, Finance & Management, 7(1), 21-41. https://doi.org/10.1002/(sici)1099-1174(199803)7:1<21::aid-isaf138>3.0.co;2-k        [ Links ]

Fernández-Gámez, M. A., Garcïa-Lagos, F., & Sánchez-Serrano, J. R. (2015). Integrating corporate governance and financial variables for the identification of qualified audit opinions with neural networks. Neural Computing and Applications, 27(5), 1427-1444. https://doi.org/10.1007/s00521-015-1944-6        [ Links ]

Gaganis, C. (2009). Classification techniques for the identification of falsified financial statements: A comparative analysis. Intelligent Systems in Accounting, Finance & Management, 16(3), 207-229. https://doi.org/10.1002/isaf.303        [ Links ]

Gaganis, C., Pasiouras, F., Spathis, C., & Zopounidis, C. (2007). A comparison of nearest neighbours, discriminant and logit models for auditing decisions. Intelligent Systems in Accounting, Finance and Management, 15(1-2), 23-40. https://doi.org/10.1002/isaf.283        [ Links ]

Ghorbani, A., Abid, A., & Zou, J. (2019). Interpretation of neural networks is fragile, In Proceedings of the AAAI Conference on Artificial Intelligence. doi:10.1609/aaai.v33i01.33013681

Glancy, F. H., & Yadav, S. B. (2011). A computational model for financial reporting fraud detection. Decision Support Systems, 50(3), 595-601. https://doi.org/10.1016/j.dss.2010.08.010        [ Links ]

Goel, S., & Gangolly, J. (2012). Beyond the numbers: Mining the annual reports for hidden cues indicative of financial statement fraud. Intelligent Systems in Accounting, Finance and Management, 19(2), 75-89. https://doi.org/10.1002/isaf.1326        [ Links ]

Green, B. P., & Choi, J. H. (1997). Assessing the risk of management fraud through neural network technology. Auditing, 16, 14-28. https://www.researchgate.net/publication/245508224_Assessing_the_Risk_of_Management_Fraud_Through_Neural_Network_Technology        [ Links ]

Guan, L., Kaminski, K., & Wetzel, T. S. (2007). Can investors detect fraud using financial statements: An exploratory study. Advances in Public Interest Accounting, 13, 17-34. https://doi.org/10.1016/s1041-7060(07)13002-9        [ Links ]

Gupta, R., & Gill, N. S. (2012). Financial statement fraud detection using text mining. International Journal of Advanced Computer Science and Applications, 3(12). https://doi.org/10.14569/ijacsa.2012.031230        [ Links ]

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182. http://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf        [ Links ]

Hajek, P., & Henriques, R. (2017). Mining corporate annual reports for intelligent detection of financial statement fraud - a comparative study of machine learning methods. Knowledge-Based Systems, 128, 139-152. https://doi.org/10.1016Zj.knosys.2017.05.001        [ Links ]

Healy, P. M., & Palepu, K. G. (2003). The Fall of Enron. Journal of Economic Perspectives, 17(2), 3-26. https://doi.org/10.1257/089533003765888403        [ Links ]

Hoberg, G., & Lewis, C. (2017). Do fraudulent firms produce abnormal disclosure? Journal of Corporate Finance, 43, 58-85. https://doi.org/10.1016/jjcorpfin.2016.12.007        [ Links ]

Hoogs, B., Kiehl, T., Lacomb, C., & Senturk, D. (2007). A genetic algorithm approach to detecting temporal patterns indicative of financial statement fraud. Intelligent Systems in Accounting, Finance and Management, 15(1-2), 41-56. https://doi.org/10.1002/isaf.284

Huang, S.-Y., Tsaih, R.-H., & Lin, W.-Y. (2014). Feature extraction of fraudulent financial reporting through unsupervised neural networks. Neural Network World, 24(5), 539-560. https://doi.org/10.14311/nnw.2014.24.031

Huang, S.-Y., Tsaih, R.-H., & Yu, F. (2014). Topological pattern discovery and feature extraction for fraudulent financial reporting. Expert Systems with Applications, 41(9), 4360-4372. https://doi.org/10.1016/j.eswa.2014.01.012

Humpherys, S. L., Moffitt, K. C., Burns, M. B., Burgoon, J. K., & Felix, W. F. (2011). Identification of fraudulent financial statements using linguistic credibility analysis. Decision Support Systems, 50(3), 585-594. https://doi.org/10.1016/j.dss.2010.08.009

International Federation of Accountants (IFAC). (2009). International standard on auditing (ISA) 240, the auditor's responsibilities relating to fraud in an audit of financial statements. https://www.ifac.org/system/files/downloads/a012-2010-iaasb-handbook-isa-240.pdf

Johannesburg Stock Exchange. (2015). JSE Limited Listings Requirements [Last accessed 15 Nov 2019]. https://www.jse.co.za/content/JSERulesPoliciesandRegulationltems/JSE%20Listings%20Requirements.pdf

Kassem, R., & Higson, A. (2012). Financial reporting fraud: Are standards' setters and external auditors doing enough? International Journal of Business and Social Science, 3(19), 283-290.

Katsis, C. D., Goletsis, Y., Boufounou, P. V., Stylios, G., & Koumanakos, E. (2012). Using ants to detect fraudulent financial statements. Journal of Applied Finance and Banking, 2(6), 73-81. http://www.scienpress.com/Upload/JAFB%5C%2fVol%5C%202_6_6.pdf

Kiehl, T. R., Hoogs, B. K., LaComb, C. A., & Senturk, D. (2005). Evolving multi-variate time-series patterns for the discrimination of fraudulent financial filings [Last accessed 15 Nov 2019]. http://nanophobia.com/lab/cv/tom/tom-cv%5C_files/gecco-2005-fraud.pdf

Kirkos, E., Spathis, C., & Manolopoulos, Y. (2007). Data mining techniques for the detection of fraudulent financial statements. Expert Systems with Applications, 32(4), 995-1003. https://doi.org/10.1016/j.eswa.2006.02.016

Kotsiantis, S., Koumanakos, E., Tzelepis, D., & Tampakas, V. (2006). Forecasting fraudulent financial statements using data mining. International Journal of Computational Intelligence, 3(2), 104-110. https://www.researchgate.net/publication/228084523_Forecasting_fraudulent_financial_statements_using_data_mining

Kou, Y., Lu, C.-T., Sirwongwattana, S., & Huang, Y.-P. (2004). Survey of fraud detection techniques, In IEEE International Conference on Networking, Sensing and Control, 2004, IEEE. https://doi.org/10.1109/icnsc.2004.1297040

Küçükkocaoglu, G., Benli, Y. K., & Küçüksozen, C. (1997). Detecting the manipulation of financial information by using artificial neural network models. Istanbul Stock Exchange Review, 9(36), 1-27. https://ideas.repec.org/a/bor/iserev/v9y2007i36p1-26.html

Lin, J. W., Hwang, M. I., & Becker, J. D. (2003). A fuzzy neural network for assessing the risk of fraudulent financial reporting. Managerial Auditing Journal, 18(8), 657-665. https://doi.org/10.1108/02686900310495151

Macey, J. R. (2004). Efficient capital markets, corporate disclosure, and Enron. Cornell Law Review, 89, 394-422. https://scholarship.law.cornell.edu/clr/vol89/iss2/4

Moepya, S. O., Akhoury, S. S., & Nelwamondo, F. V. (2014). Applying cost-sensitive classification for financial fraud detection under high class-imbalance, In 2014 IEEE International Conference on Data Mining Workshop, IEEE. https://doi.org/10.1109/icdmw.2014.141

Moepya, S. O., Akhoury, S. S., Nelwamondo, F. V., & Twala, B. (2016). The role of imputation in detecting fraudulent financial reporting. International Journal of Innovative Computing, Information and Control, 12(1), 333-356. https://doi.org/10.24507/ijicic.12.01.333

Moepya, S. O., Nelwamondo, F. V., & Walt, C. V. D. (2014). A support vector machine approach to detect financial statement fraud in South Africa: A first look, In Intelligent Information and Database Systems. Springer International Publishing. https://doi.org/10.1007/978-3-319-05458-2_5

Ögüt, H., Aktas, R., Alp, A., & Doganay, M. M. (2009). Prediction of financial information manipulation by using support vector machine and probabilistic neural network. Expert Systems with Applications, 36(3), 5419-5423. https://doi.org/10.1016/j.eswa.2008.06.055

Omar, N., Johari, Z. A., & Smith, M. (2017). Predicting fraudulent financial reporting using artificial neural network. Journal of Financial Crime, 24(2), 362-387. https://doi.org/10.1108/jfc-11-2015-0061

Omid, P., pour Hossein, N., & Zeinab, A. (2012). Identifying qualified audit opinions by artificial neural networks. African Journal of Business Management, 6(44), 11077-11087. https://doi.org/10.5897/ajbm12.855

Perner, P. (2011). How to interpret decision trees?, In Industrial Conference on Data Mining. Springer. https://doi.org/10.1007/978-3-642-23184-1_4

Perols, J. (2011). Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory, 30(2), 19-50. https://doi.org/10.2308/ajpt-50009

Perols, J. L., Bowen, R. M., Zimmermann, C., & Samba, B. (2016). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92(2), 221-245. https://doi.org/10.2308/accr-51562

Perols, J. L., & Lougee, B. A. (2011). The relation between earnings management and financial statement fraud. Advances in Accounting, 27(1), 39-53. https://doi.org/10.1016/j.adiac.2010.10.004

Persons, O. S. (1995). Using financial statement data to identify factors associated with fraudulent financial reporting. Journal of Applied Business Research (JABR), 11(3), 38. https://doi.org/10.19030/jabr.v11i3.5858

Presence, C. (2018). SA pensioners unlikely to recoup losses from Steinhoff [Last accessed 15 Nov 2019]. https://www.iol.co.za/business-report/companies/sa-pensioners-unlikely-to-recoup-losses-from-steinhoff-former-cfo-16795506

PricewaterhouseCoopers. (2018). Global Economic Crime and Fraud Survey 2018 - South Africa. https://www.pwc.co.za/en/assets/pdf/gecs-2018.pdf

Purda, L., & Skillicorn, D. (2014). Accounting variables, deception, and a bag of words: Assessing the tools of fraud detection. Contemporary Accounting Research, 32(3), 1193-1223. https://doi.org/10.1111/1911-3846.12089

Ravisankar, P., Ravi, V., Rao, G. R., & Bose, I. (2011). Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems, 50(2), 491-500. https://doi.org/10.1016/j.dss.2010.11.006

Schilit, H. M. (2010). Financial shenanigans: Detecting accounting gimmicks that destroy investments (corrected November 2010). CFA Institute Conference Proceedings Quarterly, 27(4), 67-74. https://doi.org/10.2469/cp.v27.n4.1

Seemakurthi, P., Zhang, S., & Qi, Y. (2015). Detection of fraudulent financial reports with machine learning techniques, In Systems and Information Engineering Design Symposium (SIEDS), 2015, IEEE. https://doi.org/10.1109/sieds.2015.7117005

Sen, I. K., & Terzi, S. (2012). Detecting falsified financial statements using data mining: Empirical research on finance sector in Turkey. Maliye Finans Yazilari, 26(96), 67-82. https://dergipark.org.tr/tr/download/article-file/150723

Sharma, A., & Panigrahi, P. K. (n.d.). A review of financial accounting fraud detection based on data mining techniques. International Journal of Computer Applications. https://doi.org/10.5120/4787-7016

Skillicorn, D. B., & Purda, L. (2012). Detecting fraud in financial reports, In 2012 European Intelligence and Security Informatics Conference. IEEE. https://doi.org/10.1109/eisic.2012.8

Song, X.-P., Hu, Z.-H., Du, J.-G., & Sheng, Z.-H. (2014). Application of machine learning methods to risk assessment of financial statement fraud: Evidence from China. Journal of Forecasting, 33(8), 611-626. https://doi.org/10.1002/for.2294

Spathis, C., Doumpos, M., & Zopounidis, C. (2002). Detecting falsified financial statements: A comparative study using multicriteria analysis and multivariate statistical techniques. European Accounting Review, 11(3), 509-535. https://doi.org/10.1080/0963818022000000966

Spathis, C. T. (2002). Detecting false financial statements using published data: Some evidence from Greece. Managerial Auditing Journal, 17(4), 179-191. https://doi.org/10.1108/02686900210424321

Tsaih, R.-H., Lin, W.-Y., & Huang, S.-Y. (2009). Exploring fraudulent financial reporting with GHSOM, In Intelligence and Security Informatics, Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-01393-5_5

Wang, S. (2010). A comprehensive survey of data mining-based accounting-fraud detection research, In 2010 International Conference on Intelligent Computation Technology and Automation. IEEE. https://doi.org/10.1109/icicta.2010.831

Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), 1341-1390. https://doi.org/10.1162/neco.1996.8.7.1341

Yao, J., Zhang, J., & Wang, L. (2018). A financial statement fraud detection model based on hybrid data mining methods, In 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE. https://doi.org/10.1109/icaibd.2018.8396167

Yaşar, A., Yakut, E., & Gutnu, M. M. (2015). Predicting qualified audit opinions using financial ratios: Evidence from the Istanbul Stock Exchange. International Journal of Business and Social Science, 6(8), 1. https://ijbssnet.com/journal/index/3248

Yeh, C.-C., Chi, D.-J., Lin, T.-Y., & Chiu, S.-H. (2016). A hybrid detecting fraudulent financial statements model using rough set theory and support vector machines. Cybernetics and Systems, 47(4), 261-276. https://doi.org/10.1080/01969722.2016.1158553

Zhou, W., & Kapoor, G. (2011). Detecting evolutionary financial statement fraud. Decision Support Systems, 50(3), 570-575. https://doi.org/10.1016/j.dss.2010.08.007

Zopounidis, C., Doumpos, M., & Spathis, C. T. (2000). Detecting falsified financial statements using multicriteria analysis: The case of Greece. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.250413

 

 

Received: 22 Nov 2019
Accepted: 1 Jun 2020
Available online: 20 Jul 2020

All the content of this journal, except where otherwise noted, is licensed under a Creative Commons License