Embracing complexity via text mining


By Jeremy Clopton, CFE, CPA; Les E. Heitger, Ph.D., Educator Associate; Lanny Morrow, EnCE





Fraud EDge: A forum for fraud-fighting faculty in higher ed

In the last column, we began a discussion of leading-edge technologies that have the potential to contribute significant amounts of useful information to fraud examinations. We also introduced the collaboration between data mining and digital forensics, which is driven by the increasing volume of both structured and unstructured data; unstructured data can account for as much as 80 percent of the total data in an organization. Now we address text mining (sometimes called text analytics) and discuss its characteristics and possible areas of application in fraud cases. We look at the components of text mining and how practitioners might use these methods to analyze large data sets and produce information that serves the fraud examiner's goals.

SOURCES OF TEXT USED IN FRAUD EXAMINATIONS

Some of the more commonly used sources of unstructured data in an examination include:

  • Email communications (corporate and web-based).
  • Documents.
  • Social media.
  • Chat, texting and instant messaging.
  • Websites and blogs.
  • Contents of computer hard drives, mobile devices and cloud storage.

While there are many other potential sources, experience has shown these to be the most common in corporate examinations.

Email, chief among unstructured data sources in fraud examinations, contains not only word-for-word communications but also a date/time element, metadata and even emotional tones expressed through idioms, phrases and adjectives. Fraud examiners can use these components to analyze the personalities of both the communications and the communicators.
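To make these components concrete, here is a minimal Python sketch that pulls the sender, recipients and date/time from a single message and counts a few tone-related phrases in its body. It is an illustration only, not a method described in this column: the sample message, the watch list in TONE_TERMS, the after-hours cutoff and the function name summarize_email are all assumptions invented for the example. Only Python's standard email library is used.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime
import re

# Hypothetical sample message, invented for illustration only.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Date: Fri, 14 Mar 2014 23:47:05 -0500
Subject: Quarter-end numbers

Bob, I am really worried about this. Just make the numbers work
and do not tell anyone. We can fix it next quarter.
"""

# Hypothetical watch list; a real examination would build its own.
TONE_TERMS = ("worried", "do not tell", "make the numbers work")


def summarize_email(raw: str) -> dict:
    """Extract basic metadata and tone-phrase counts from one plain-text email."""
    msg = message_from_string(raw)

    # Date/time element: parse the Date header into a timezone-aware datetime.
    sent = parsedate_to_datetime(msg["Date"])

    # Word-for-word content: lowercase and collapse whitespace so that
    # phrases split across line breaks still match.
    body = re.sub(r"\s+", " ", msg.get_payload().lower())

    return {
        "sender": getaddresses([msg["From"]])[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "sent": sent,
        # Assumed cutoff: treat 8 p.m. to 6 a.m. (sender's local time) as after hours.
        "after_hours": sent.hour >= 20 or sent.hour < 6,
        "tone_hits": {term: body.count(term) for term in TONE_TERMS if term in body},
    }


if __name__ == "__main__":
    for key, value in summarize_email(RAW_EMAIL).items():
        print(f"{key}: {value}")
```

Simple phrase counting is only a crude stand-in for the analysis of emotional tone described above, and the sketch handles single-part, plain-text messages only; real mailboxes with attachments or HTML bodies would need more handling.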

The contents of computer hard drives include not just email but also documents, audio and video, caches of Internet activity, discarded instant messaging and chat sessions, deleted content and overlooked backup and temporary copies of items. Digital forensics technologies can preserve, identify and produce these obscure items.



  

