Date posted: 15/09/2020 12 min read

Beneath the data: 2020 John B. Thurston Award winner

Christopher Kelly FCA and James Hao’s article, reprinted here, was voted this year’s outstanding paper on internal auditing.

In Brief

  • Self-service business intelligence tools expand audit’s toolkit to mining entire data populations.
  • It makes it easier for auditors to data mine for errors such as anomalous transactions and fraudulent data correlations.
  • Payroll data mining can be used to help businesses save money on their largest single cost.

By Dr Christopher Kelly FCA and James Hao

Christopher Kelly FCA and co-author James Hao have been named winners of the Institute of Internal Auditors’ 2020 John B. Thurston Award for their article “Beneath the data: Using self-service business intelligence data analysis tools to generate original knowledge”, which appeared in the February 2019 issue of Internal Auditor. Established in 1952, the annual award goes to the author of the article published in Internal Auditor that is voted the outstanding paper in the field of internal auditing. It is reprinted here in full.

Big data can tell unexpected stories: The chief financial officer who had a conflict of interest with a supplier to whom he had awarded a multimillion-dollar contract. The two managers who provided their company-supplied fuel cards to family members to refuel their personal vehicles. The executive who had an affair with a union official during wage negotiations.

Such wrongdoing could never have been discovered through traditional audit sampling, walk-throughs, or reliance on the representations of management. They were only found by using business intelligence tools to mine data sources that are now routinely available.

Self-service business intelligence tools expand audit’s toolkit from mere questionnaires and statistical sampling to analysing entire data populations.

Christopher Kelly FCA Picture: Christopher Kelly FCA.

Business intelligence for auditors

Audits typically entail inquiries, walk-throughs, and transaction sampling as a basis for statistically inferring the effectiveness of each internal control attribute. To be generalisable within a statistical confidence interval, transaction samples need to be both large and randomised to represent the entire population. In doing so, auditors usually presume that the population conforms to a normal (Gaussian) bell curve. This brings with it the risk that if the sample is too small, the tests are performed with insufficient care, or the population is skewed differently from a normal bell curve, the auditor may form wrong conclusions about the control’s true characteristics. Furthermore, if the population contains any erroneous or fraudulent transactions, the likelihood is slim that they will turn up in a walk-through or random sample.

But now self-service business intelligence tools expand audit’s toolkit from mere questionnaires and sampling to mining entire data populations. Business intelligence tools make it easier for auditors to data mine for errors such as anomalous transactions and fraudulent data correlations. In this way business intelligence tools enable auditors to pinpoint actual error, fraud, and cost savings, thereby generating factual audit findings that demand action.

Besides traditional financial transactions, auditors can use business intelligence tools to access heterogenous data sources such as telecommunications, email, internet usage, road tolls, time sheets, maintenance schedules, security incident logs, clocking on/off, and electronic point-of-sale transactions. Previously, these data sources either were not auditable or were stored as manual records. Some examples follow:

Inventory: For many organisations, inventory is a complex and poorly understood process. Compared to purely financial processes like cash, creditors and debtors, inventory data can more easily get out of step with the daily movement of thousands of non-homogenous physical goods. Inventory is vulnerable to receipting errors, barcode misreads, obsolescence, rot, and shrinkage through staff or third-party theft.

Things often go wrong in inventory and downside errors of 10% are not uncommon. Therefore, audit could focus on ensuring quantity and description data matches physical reality through accurate goods receipting into the accounting system, precise sales capture, and reliable stock-taking. Once inventory data reflects the physical reality on hand, data mining can assist with identifying: 

  • Slow moving and excessive inventory build-up by line item.
  • Book-to-physical adjustments pointing to shrinkage or theft by location.
  • Refundable stock returnable to suppliers.
  • Stock-outs where sales were lost because of insufficient demand analysis.
  • Negative quantities which expose goods receipting or similar process errors.

Supply chain: Organisations need to know supplier appointments are not distorted by undeclared conflicts of interest and suppliers are paid no more than their contractual entitlements. Organisations often process thousands of supplier invoices daily, so human error is likely. Data mining can include:

  • Matching supplier master data such as bank account numbers, addresses, and telephone numbers to employee and next-of-kin master data for unexpected inter-relationships.
  • Isolation of purchase orders or payments just below authorisation thresholds.
  • Erroneous duplicate invoice payments due to OCR (Optical Character Recognition) or human error when entering invoice references such as mis-entry of “I” instead of “1,” or “S” instead of “5,” and “/” instead of “\”.
  • Historic credit notes that have never been offset against subsequent payments and therefore remain convertible into cold hard cash.

Audits using these tests have revealed 0.1% in errors that enabled organisations to recover cash refunds from suppliers. We have frequent success data mining over several prior years resulting in material cash recoveries.

Payroll: For most organisations, payroll is the largest single cost. The board and audit committee need to know underpaying or overpaying employees is minimised. Payroll data mining can include comparing hours worked to hours paid by matching sick leave and holiday to other time- and location-stamped data such as building and site entry/exit data, mobile phone metadata and email data. In doing this, data mining can produce compelling evidence that supports corrective action and savings of 1% of total payroll cost from:

  • fictional hours claimed on time sheets
  • employees who claimed to be working at home, off-site, or were on paid sick leave who were actually at holiday destinations
  • non-attendance at scheduled training
  • repetitive patterns of fictitious sick leave taken on Mondays, Fridays, and the day before or after public holidays.

Company motor fleets: Auditors can mine data gathered from vehicles, including road tolls, refuelling, traffic penalties, and insurance claims. This jigsaw puzzle of data can help auditors assess how vehicles are being used for business purposes, possible abuse of vehicles, and drivers with poor driving histories that result in unnecessary cost. Often, this data will be accessible from external motor fleet providers and insurers. Such data mining can easily save 5% of fleet costs.

Metadata: While the content of company-issued cell phone calls and text messages is confidential under telecommunications regulations, the accompanying non-confidential metadata includes called numbers, durations, date and time stamps, and base station geographical locations. These data sources can be instructive when investigating white-collar conflicts of interest and fraud.

Partnering with the chief information officer (CIO) can help in accessing the organisation’s databases. The CIO will know what data is available in-house, what resides with external providers, and the gateways to gaining access. Once access is granted, auditors can then use business intelligence tools with minimal assistance.

“Beyond financial transactions, auditors can use business intelligence tools to access newly available data sources.”

Getting started

With self-service business intelligence, we are no longer constrained by Microsoft Excel’s 1,048,576 row limit.

Indeed, Excel’s latest versions include built-in business intelligence tools, Power Query and Power Pivot, which are also available as add-ons for earlier versions. Power Query is an extract, transform, and load (ETL) tool that reads source data and makes it available for data modelling in Power Pivot. This source data typically comes from comma- or tab-separated outputs from other systems. Auditors can access Power Query under Excel’s Data ribbon, where it is also known as Get Data and, once opened, Query Editor. In older versions, it is available under the Power Query ribbon.

Power Query and Power Pivot also have formula languages that allow users to create new data columns specific to their own unique needs. Power Query uses M formula language and Power Pivot uses Data Access Expressions (DAX). Both languages differ from Excel formulas and therefore require incremental learning. Whereas Excel formulas are not case sensitive and usually do not distinguish among string, date, and numeric data types, Power Query M and Power Pivot DAX are sensitive to both text case and data type. This distinction is important when manipulating data and performing calculations. For example, while Excel allows flexible concatenation of strings, dates, and numbers, M and DAX require all data to be converted to strings first.

Once auditors have loaded, edited, and trimmed the raw data down to only the needed columns in Power Query, they can add each table to the Power Pivot data model under the “Add to data model” option. Auditors can then access Power Pivot from Excel under “Manage data model.” From there, they can use the intuitive “Diagram view” to link tables such as transaction files keyed to their corresponding master files. The data model can handle multiple external data sources as well as normal Excel tables. This capability allows auditors to create and work with multi-dimensional relational data cubes rather than two-dimensional flat files.

Power Pivot enables auditors to annotate the relational databases retrieved in Power Query with unique columns and measures specific to audit needs, which can be analysed using Excel’s pivot tables. “Applying business intelligence using Benford’s law” (see below) illustrates how Power Query, M, Power Pivot, and Excel can work together to search for irregularities.

Data cleansing

Textbooks and online tutorials often omit informing readers that data files need to be cleansed before analysis. Why is data cleansing needed? Because over time, original source data is input by a variety of users whose training and attention to accuracy may have lacked consistency. Some fields may hold invalid data as a result of being migrated from different systems or different versions of the same system. Moreover, stack overflow and other error types may still lurk in historic data, the text files may have misaligned some fields, and records may be broken across two or more rows.

Comma-separated text files can present extra cleansing problems if users have input commas into individual fields. For example, “Kelly & Yang, Inc” would misalign into two separate fields because of the comma, whereas “Kelly & Yang Inc” would translate correctly into one field.

ETL [extract, transform, load] tools such as Power Query will attempt to read all transactions from the raw data files. But if the tool encounters errors, it may exclude them from the upload, resulting in loss of data that dilutes the objective of analysing the entire population. If time allows, the auditor may cleanse the text files field-by-field in a spreadsheet or word processor by re-joining broken records, recalibrating misaligned fields, trimming stray characters or spaces, replacing known error values such as “#N/A” with blanks or zeros, and converting dates stored as text to real dates using Excel’s DATEVALUE() formula.

Further cleansing may be required if source files are fragmented across different years or subsidiaries and need to be joined into a single table, or if source files are tabulated differently from how audit wishes to use them. In the first case, Power Query can append files into a single data source provided the field headings are identical. In the second case, auditors can un-tabulate inappropriately tabulated source files back into a single column of data using Power Query’s Unpivot command.

Auditors should keep a log of data cleansing actions in case future rework is required. Any updates to source data made in Power Query will need to be refreshed in the Power Pivot data model as well as in dependent pivot tables.

“Internal auditors should keep a record of data cleansing actions in case future rework is required.”

Efficient queries

Business intelligence tools are more powerful than vanilla Excel, but auditors still need to be mindful of formula efficiency. If the auditor tries to add a new calculated field to a data model that requires a row-by-row lookup of each element in a two-million-row database, that could easily result in two million x two million = four trillion separate lookups.

Although each individual lookup only takes a few milliseconds, four trillion lookups could take several hours. Auditors can increase query efficiency by indexing, compartmentalising a large query with efficient calculated fields, and filtering out unwanted columns or transactions that are blank or below a given materiality threshold.

Securing data

To avoid being the source of a leak, or to limit the damage if the unthinkable occurs, auditors should take care with data. Auditors can exclude fields that identify living individuals, home addresses, or bank account numbers from downloads or replace them with codes such as an employee number instead of a name.

They should be cautious when transmitting data to ensure USB drives are secure and electronic data is not emailed to unintended recipients. Auditors should think twice and check recipient email addresses before hitting “send”. Password protection and encryption should be used when practical. As auditors only need to work on copy data – rather than live data – they usually can destroy their version and wipe USB drives after the assignment is completed.

Original insights

With only a modest amount of new learning, business intelligence tools can expand audit’s adventures into new pools of financial and operational data that may lead to original risk and control insights. Moreover, because even the most innocuous transactions leave data trails, imaginative analysis can uncover errors, fraud, and cost savings that transform audit reports into compelling reading for executives and the board.

Mining for errors

The following diagram summarises the steps from raw data to audit findings if Excel’s Power Query and Power Pivot are used as described above.

Mining for errorsClick image to enlarge. Diagram 1: Mining for errors.

Applying business intelligence using Benford’s law

Using Benford’s law to annotate original source data with leading digits, the following steps seek to illuminate in a practical way how Power Query, Power Pivot, M, DAX and standard pivot tables work together to produce audit insights.


Using a data-cleansed table created in Power Query, create additional field columns using “Add column/custom column” to capture the leftmost 1, 2, 3, etc. digits for Benford’s law analysis. < /p>

Data mining learning point: Being able to create custom columns in the data model is key to the ability to generate original insights. Auditors should not create too many new columns or the data model may become unmanageable within their computer’s memory limits.


To avoid picking up the dollar-cent decimal points, multiply the amount field by 100 to convert it into whole cents and then use an M formula to convert the absolute amount to text and pick up the two (or three or more) leftmost digits. For example: = Text.Start(Text.From(Number.Abs([Amount]) * 100), 2)

Data mining learning point: In this M formula, auditors are isolating the two leftmost digits and “[Amount]” is the literal field heading from the source text file. Note that the formula syntax differs from Excel.


Once the desired Benford analysis columns have been created in Power Query, refresh the data model in Power Pivot.

Data mining learning point: Now that the raw data is in the Power Pivot data model, we can access the entire table including the new column we added in Step 1 within Excel’s standard pivot tables.


Show the leftmost digits using the pivot table’s “Show Values As/% of Grand Total” and compare this to the expected logarithmic frequency under Benford’s law. Then, visualise the resulting columns with a chart to highlight spikes between actual and expected frequency. 

Benford’s law first 3 digits actual to expected frequency - invoicesClick image to enlarge. Diagram 2: Benford’s law first 3 digits actual to expected frequency -invoices.

Deviations are most likely to have occurred where a systematic weakness has been exploited repeatedly.

Data mining learning point: </bold itals> Double clicking each deviant spike in the pivot table will display all the individual transactions that caused the spike, which auditors can then scrutinise for irregularities. In this way Excel can instantly find the deviant transactions from a huge data population.

Dr Christopher Kelly, DProf, FCA, MIIA, is a partner at Kelly & Yang in Melbourne, Australia.
James Hao CPA is an associate at Kelly & Yang in Melbourne.

Read more:

Choosing the right business insight tools

Business intelligence and data visualisation tools can help turn financial information into something useful for stakeholders.

Read more

How Power BI reaches far beyond Excel

The take-up of Microsoft’s Power BI is growing in Australia and two Excel experts explain how usage is moving beyond early adopters.

Read more