# Caught by the numbers

### Data analytics winnows out possible fraudster

###### March/April 2014

*By* John Giardino, CFE, CPA

The bank controller used internal statistics to point the finger at
an over-performing loan officer. Learn how you can use powerful data
analytics tools to narrow your list of suspects for fraud examinations.

*The case in this article, a composite of several similar cases involving data analytics and statistical analysis applied to fraud examinations, is designed to be a tutorial for CFEs. — ed.*

Jeff Baker, controller for a large regional bank, entered the room hoping to get an admission of guilt from his suspect. He came prepared. Baker had spent considerable time carefully preparing his questions and planning his interview tactics, and he had the documentation to back him up. Above all, he was proud that he had identified a sizeable straw-purchase and kickback fraud scheme, which probably involved the bank employee in the interview room. Baker credited the identification of this scheme to effective analytical procedures that included some basic statistical methods.

Weeks before, when Baker reported a significant spike in defaults on mortgage loans through the second quarter of 2013, several of the bank’s board members expressed concern over a recently adopted growth strategy. In late 2012, the bank had eased underwriting requirements in an effort to increase market share of residential mortgage lending; a major component of these new underwriting practices was an across-the-board reduction in debt-to-income (DTI) requirements for borrowers.

Board members were worried about impact to the balance sheet. They wanted assurances that toxic, defaulted assets wouldn’t erode shareholders’ equity. Baker, however, wasn’t convinced that changes in underwriting guidelines were the root cause of the uptick in loan nonperformance. He was aware that the stalling economic climate within the bank’s operating footprint had exacerbated the moral hazard for mortgage fraud, so he had been following several high-profile prosecutions of straw-purchase schemes at other financial institutions throughout the region.

Despite his employer’s sterling reputation based on lending history, business practices and community involvement, Baker worried that the organization’s reluctance to break from the traditional risk management model created a blind spot to internal threats, weak controls and susceptibility to fraud schemes.

**CORRELATION ANALYSIS: COMPARING TWO VARIABLES**

Figure 1: The CORREL function

In this case, Baker compared two variables: DTI requirements and default rate. If the assumption is that default rate is driven by DTI, then DTI is the independent variable and default rate is the dependent variable. A correlation coefficient ranges between -1 and 1; values closer to -1 or 1 indicate a stronger negative or positive correlation, respectively. Under a negative correlation, the default rate would decrease as DTI requirements increase. In this particular case, the correlation coefficient was very close to 0, which indicates only a weak relationship between required DTI and default rate. Baker was correct in his first assumption: the new growth initiatives weren't a significant driver of loan nonperformance.

**IDENTIFYING WHAT SHOULD OCCUR: PROBABILITY DISTRIBUTIONS**

In statistical terms, a probability distribution is a graph, table or formula that gives the probability of each value of a random variable, such as household income, IQ or a set of test scores. The normal probability distribution is perhaps the most widely known and is commonly called a "bell curve" because of its appearance; the mean of the distribution sits at the top of the bell curve, as it represents the expected value. The bell curve is symmetric, and a key related concept is variation from the mean, measured by the standard deviation. In a normal distribution, nearly all of the possible values (about 95 percent, in fact) fall within two standard deviations of the mean. The greater the standard deviation, the wider the range of values that could occur naturally.
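The two-standard-deviation rule can be verified analytically with Python's `statistics.NormalDist`. The mean and standard deviation below echo the article's portfolio figures, but the resulting share is the same for any normal distribution.

```python
# Checking the two-standard-deviation rule with statistics.NormalDist
# (Python 3.8+). Mean and SD echo the article's portfolio figures; the
# share within two SDs is identical for every normal distribution.
from statistics import NormalDist

mean, sd = 213_157, 68_195
dist = NormalDist(mu=mean, sigma=sd)

# Probability mass between (mean - 2*SD) and (mean + 2*SD).
within_2sd = dist.cdf(mean + 2 * sd) - dist.cdf(mean - 2 * sd)
print(f"share within two standard deviations: {within_2sd:.4f}")  # ~0.9545
```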

Baker determined the probability distribution of original loan values of the bank’s outstanding mortgage loans based on the mean and standard deviation values obtained from aggregated internal data. The distribution he identified is represented in Figure 2 below.

Figure 2: Probability distribution - original loan amounts

The chart indicates that the bank's original mortgage loan amounts cluster near the mean value of $213,157. Because 95 percent of all original loans issued by the bank fall within two standard deviations of the mean in either direction, almost all of the bank's mortgage loans have an original balance between $76,767 and $349,547. If nonperformance occurred randomly and wasn't tied to any particular underwriting characteristic, Baker would expect to observe a similar distribution for loans currently in default. However, a statistical analysis of the nonperforming loans the bank originated reveals significantly different characteristics, as presented in Figure 3 below.
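The quoted band can be rechecked with simple arithmetic. The standard deviation of roughly $68,195 is implied by the quoted endpoints; the article doesn't state it directly.

```python
# Recomputing the two-standard-deviation band quoted in the article for
# the full portfolio. The SD of $68,195 is implied by the quoted
# endpoints, not stated directly in the article.
mean = 213_157
sd = 68_195

low, high = mean - 2 * sd, mean + 2 * sd
print(low, high)  # 76767 349547, matching the article's band
```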

Figure 3: Probability distribution - original loan amounts (nonperforming)

The population presented in this distribution is those loans within the bank's portfolio that are in default. The mean original loan amount of these nonperforming loans is $95,132, significantly lower than the mean original loan amount of the bank's entire mortgage portfolio. The observed standard deviation of this population is $26,538; based on a normal probability distribution, Baker concluded that virtually all nonperforming loans were originated at amounts between $42,056 and $148,208. The disparity is striking: loans originated by the bank that are in default exhibit a much lower mean original loan amount and far less variability than the mortgage portfolio as a whole.

Baker was aware that many possible fraud and non-fraud scenarios could explain this disparity. Borrowers with original mortgage loan amounts between $42,056 and $148,208 may present a greater credit risk based on volatile employment situations or adverse credit histories. Mortgage loans in this range typically require a small down payment, which increases the borrowers' incentive to "walk away" when situations become dire. Baker also noted that the dispersion of the nonperforming loans is very narrow: the coefficient of variation, *the ratio of standard deviation to the mean*, is less than one-third ($26,538/$95,132 ≈ .28). In other words, the bulk of his employer's defaulted loans fall within a very narrow band of original loan amounts above and below the statistical mean.
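The coefficient of variation and the two-standard-deviation band for the nonperforming population can be reproduced directly from the figures quoted above.

```python
# Reproducing the nonperforming-population statistics quoted in the
# article: coefficient of variation and the two-SD band.
mean_np, sd_np = 95_132, 26_538

cv = sd_np / mean_np  # ratio of standard deviation to mean
band = (mean_np - 2 * sd_np, mean_np + 2 * sd_np)
print(f"coefficient of variation: {cv:.2f}")  # 0.28, well under 1/3
print(f"two-SD band: {band}")  # (42056, 148208)
```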

Baker focused on one particular characteristic based on his knowledge of his organization's internal control structure: the mean original loan amount on nonperforming loans is slightly below $100,000, and the bank requires secondary approval on those mortgages with original loan amounts above that threshold. This secondary approval serves as a check against unauthorized (and potentially fraudulent) loan origination. Baker's analysis of the probability distribution of nonperforming mortgage loan data indicated the secondary-approval control might have been circumvented in a mortgage-fraud scenario.
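A practical follow-up to this kind of finding is to flag defaulted loans that sit just below the approval threshold. The sketch below is hypothetical: the article doesn't publish loan-level records, so the field names, the sample data and the 5 percent proximity cutoff are all assumptions for illustration.

```python
# Hedged sketch: flagging nonperforming loans originated just below the
# bank's $100,000 secondary-approval threshold. The records and the 5%
# proximity cutoff are hypothetical assumptions for illustration.
THRESHOLD = 100_000
PROXIMITY = 0.95  # flag amounts within 5% below the threshold

defaults = [  # (loan_id, original_amount) - illustrative data only
    ("A1", 98_500), ("A2", 96_200), ("A3", 99_900),
    ("A4", 87_000), ("A5", 142_000), ("A6", 99_400),
]

just_below = [loan_id for loan_id, amount in defaults
              if THRESHOLD * PROXIMITY <= amount < THRESHOLD]
print("defaults just below the approval threshold:", just_below)
```

A cluster of defaults concentrated just under the threshold, as in the article's scenario, would be a red flag that originations were sized deliberately to avoid secondary review.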

