The bank controller used internal statistics to point the finger at
an over-performing loan officer. Learn how you can use powerful data
analytics tools to narrow your list of suspects for fraud examinations.
The
case in this article, a composite of several similar cases involving
data analytics and statistical analysis applied to fraud examinations,
is designed to be a tutorial for CFEs. — ed.
Jeff Baker,
controller for a large regional bank, entered the room hoping to get an
admission of guilt from his suspect. He came prepared. Baker had spent
considerable time carefully preparing his questions and planning his
interview tactics, and he had the documentation to back him up. Above
all, he was proud that he had identified a sizeable straw-purchase and
kickback fraud scheme, which probably involved the bank employee in the
interview room. Baker credited the identification of this scheme to
effective analytical procedures that included some basic statistical
methods.
Weeks before, when Baker reported a significant spike in
defaults on mortgage loans through the second quarter of 2013, several
of the bank’s board members expressed concern over a recently adopted
growth strategy. In late 2012, the bank had eased underwriting
requirements in an effort to increase market share of residential
mortgage lending; a major component of these new underwriting practices
was an across-the-board reduction in debt-to-income (DTI) requirements
for borrowers.
Board members were worried about the impact on the
balance sheet. They wanted assurances that toxic, defaulted assets
wouldn’t erode shareholders’ equity. Baker, however, wasn’t convinced
that changes in underwriting guidelines were the root cause of the
uptick in loan nonperformance. He was aware that the stalling economic
climate within the bank’s operating footprint had exacerbated the moral
hazard for mortgage fraud, so he had been following several high-profile
prosecutions of straw-purchase schemes at other financial institutions
throughout the region.
Despite his employer’s sterling reputation
based on lending history, business practices and community involvement,
Baker worried that the organization’s reluctance to break from the
traditional risk management model created a blind spot to internal
threats, weak controls and susceptibility to fraud schemes.
CORRELATION ANALYSIS: COMPARING TWO VARIABLES
Figure 1: The CORREL function
After
buying some time from the board members by voicing his suspicions,
Baker tested his theory that the easing of DTI requirements was
unrelated to the spike in nonperforming loans. He calculated the
correlation coefficient of nonperforming loans to DTI requirements.
Baker obtained aggregated default rate data of loans made at various DTI
requirements from internal management reports and performed his
calculations in Microsoft Excel using the CORREL function. (CORREL is
based on the mathematical formula shown in Figure 1 above.)
In this
case, he compared two variables — DTI requirements and default rate. If
the assumption is that default rate is correlated to DTI, then DTI is
the independent variable and default rate is the dependent variable. A
correlation coefficient ranges between -1 and 1; values closer to -1 or 1
indicate a stronger negative or positive correlation, respectively. In a
negative correlation, the default rate would decrease as DTI requirements
increase. In this case, the correlation coefficient was very close to 0,
indicating little to no linear correlation between required DTI and
default rate. Baker was correct in his first
assumption — the new growth initiatives weren't a significant driver in
loan nonperformance.
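To illustrate the mechanics, the following Python sketch reproduces a
CORREL-style calculation with NumPy. The DTI tiers and default rates below
are hypothetical placeholders, not figures from the case; Excel's CORREL
implements the same Pearson formula shown in Figure 1.

    import numpy as np

    # Hypothetical data: maximum allowable DTI at origination (percent)
    # and the observed default rate (percent) for loans made at each tier.
    dti_requirements = np.array([36.0, 38.0, 40.0, 43.0, 45.0, 50.0])
    default_rates = np.array([2.1, 1.8, 2.3, 1.9, 2.2, 2.0])

    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the Pearson correlation coefficient, equivalent to
    # Excel's CORREL(array1, array2).
    r = np.corrcoef(dti_requirements, default_rates)[0, 1]
    print(f"Correlation coefficient: {r:.3f}")  # near 0: weak linear relationship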
IDENTIFYING WHAT SHOULD OCCUR: PROBABILITY DISTRIBUTIONS
In statistical terms, a probability distribution is a graph, table or
formula that gives the probability of each value of a random variable,
such as household income, IQ or a set of test scores. The normal
probability distribution is perhaps the most widely known; it's commonly
called a "bell curve" because of its shape, with the peak of the curve at
the mean, which is the distribution's expected value. The bell curve is
symmetric, and a key related concept is variation from the mean, measured
by the standard deviation. In a normal distribution, nearly all of the
possible values — about 95 percent, in fact — fall within two standard
deviations of the mean. The greater the standard deviation, the wider the
range of values that can occur naturally.
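The two-standard-deviation figure is easy to verify; the quick SciPy check
below is an illustration, not part of the case data.

    from scipy.stats import norm

    # Probability mass of a normal distribution falling within two
    # standard deviations of the mean.
    coverage = norm.cdf(2) - norm.cdf(-2)
    print(f"{coverage:.4f}")  # 0.9545, i.e., roughly 95 percent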
Baker determined the probability distribution
of original loan values of the bank’s outstanding mortgage loans based
on the mean and standard deviation values obtained from aggregated
internal data. The distribution he identified is represented in Figure 2
below.
Figure 2: Probability distribution - original loan amounts
This chart indicates that the bank's original mortgage loan amounts are
most likely to fall close to the mean value of $213,157. Because 95
percent of all original loans issued by the bank
will be within two standard deviations of the mean in either direction,
almost all of the mortgage loans made by the bank have an original loan
balance between $76,767 and $349,547. If nonperformance of loans
occurred randomly and wasn’t tied to any particular characteristic of
underwriting, Baker would expect to observe a similar distribution for
loans currently in default. However, a statistical analysis of data on
the nonperforming loans that the bank originated reveals significantly
different characteristics, as presented in Figure 3 below.
Figure 3: Probability distribution - original loan amounts (nonperforming)
The population presented in this distribution consists of the loans
within the bank's portfolio that are in default. The statistical mean
original loan
amount on these nonperforming loans is $95,132 — significantly lower
than the mean original loan amount of the bank’s entire mortgage
portfolio. The observed standard deviation of this population is
$26,538; based on a normal probability distribution, Baker concluded
that virtually all nonperforming loans were originated at amounts
between $42,056 and $148,208. There’s statistical significance in this
disparity: loans originated by the bank that are in default exhibit a
much lower mean original loan amount and degree of variability than the
entire mortgage loan portfolio.
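The two-standard-deviation ranges Baker compared can be reproduced
directly from the summary statistics quoted above. In the sketch below,
the portfolio standard deviation of $68,195 is implied by the quoted
range; in practice these statistics would be computed from loan-level
data.

    # Summary statistics from the analysis above.
    portfolio = {"mean": 213_157, "sd": 68_195}      # all mortgage loans
    nonperforming = {"mean": 95_132, "sd": 26_538}   # loans in default

    for label, stats in (("Portfolio", portfolio),
                         ("Nonperforming", nonperforming)):
        low = stats["mean"] - 2 * stats["sd"]
        high = stats["mean"] + 2 * stats["sd"]
        print(f"{label}: ${low:,} to ${high:,}")
    # Portfolio: $76,767 to $349,547
    # Nonperforming: $42,056 to $148,208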
Baker was aware that there
were many possible fraud and non-fraud scenarios that would explain this
disparity. Borrowers with original mortgage loan amounts between
$42,056 and $148,208 may present a greater credit risk based on volatile
employment situations or adverse credit histories. Mortgage loans in
this range typically require a small down payment, which increases the
borrowers’ incentive to “walk away” when situations become dire. Baker
noted that the dispersion of the nonperforming loans is also very
narrow: the coefficient of variation — the ratio of standard deviation to
mean — is less than one-third ($26,538/$95,132 ≈ 0.28). In other words,
the majority of his employer's defaulted loans fall within a very narrow
range of original loan amounts above and below the statistical mean.
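The coefficient of variation itself is a one-line calculation:

    # Coefficient of variation: standard deviation relative to the mean.
    cv = 26_538 / 95_132
    print(f"{cv:.2f}")  # 0.28 -- well under one-third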
Baker
focused on one particular characteristic based on his knowledge of his
organization’s internal control structure: The mean value of the
original loan amount on nonperforming loans is slightly below $100,000,
and the bank requires secondary approval on those mortgages with
original loan amounts above that threshold. This secondary approval
serves as a check against unauthorized (and potentially fraudulent) loan
origination. Baker’s analysis of the probability distribution of
nonperforming mortgage loan data indicated the secondary approval
control might have been circumvented in a mortgage-fraud scenario.
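One natural follow-up test, sketched below with pandas, is to isolate
nonperforming loans originated just under the $100,000 secondary-approval
threshold and look for concentrations by loan officer. The column names,
the $10,000 band and the sample records are hypothetical, not taken from
the case.

    import pandas as pd

    APPROVAL_THRESHOLD = 100_000
    BAND = 10_000  # how far below the threshold counts as "just under"

    # Hypothetical loan-level extract of nonperforming loans.
    loans = pd.DataFrame({
        "loan_id": [101, 102, 103, 104, 105],
        "orig_amount": [98_500, 45_000, 99_900, 142_000, 97_250],
        "officer": ["A", "B", "A", "C", "A"],
    })

    # Loans originated within BAND dollars below the approval threshold.
    just_under = loans[loans["orig_amount"].between(
        APPROVAL_THRESHOLD - BAND, APPROVAL_THRESHOLD - 1)]

    # A concentration under a single officer narrows the suspect list.
    print(just_under.groupby("officer").size().sort_values(ascending=False))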