The answers to the “Five W’s” that every investigator knows from school form the basis of the complete story of an investigation:
• Who is it about?
• What happened?
• When did it take place?
• Where did it take place?
• Why did it happen?
Some investigators add a sixth and a seventh question, ‘how’ and ‘how much’, to the list:
• How did it happen?
• How much or how often?
During an investigation, the answers to these questions have to be found, and the technology that is used needs to organize and prepare the data around these questions.
Manually analyzing the data to answer these questions is very time-consuming and requires a lot of resources. As investigators often do not know exactly what (words) to search for, one has to go beyond simple keyword search. Criminals may use aliases; transfers may be made by unknown offshore companies or via unknown bank accounts, etc. This all complicates and slows down the investigative process. On top of this, the volume of electronic data that needs to be investigated continues to grow, making the problem larger and more complex as time passes.
Computer technology can be of great benefit in this process: if computers are good at one thing, it is analyzing large data sets at tremendous speed for specific patterns. Recent progress in text mining, computational linguistics, statistics, machine learning and even artificial intelligence makes it possible to focus the analysis on finding the data that helps to answer the Golden W questions.
Modern text mining and content analytics technology is able to search on a higher level than just keywords, for instance by looking for linguistic patterns like ‘someone pays someone else’ or ‘someone meets someone else at a certain location and at a certain time’, without the need to identify the exact names or amounts up front. By extracting such patterns and applying simple statistics, one can easily identify unknown persons, companies and bank account numbers, and also spot code names and aliases.
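To make the pattern idea above a bit more concrete, here is a minimal sketch in Python. It is not how any particular product works: the regular expression, the capitalized-word heuristic for names, and the function names `extract_payments` and `count_payees` are all assumptions made for illustration. A real system would use proper linguistic analysis rather than a regex, but the principle is the same: match the shape ‘someone pays someone else’, then count what turns up.

```python
import re
from collections import Counter

# Assumption: a run of capitalized words stands in for a real
# named-entity recognizer. This is a crude heuristic, for illustration only.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical pattern for "<someone> pays <amount> to <someone else>".
PAYMENT = re.compile(
    rf"(?P<payer>{NAME}) (?:pays|paid|transferred|wired) "
    rf"(?P<amount>[$€]?\d[\d,.]*\d|[$€]?\d) to (?P<payee>{NAME})"
)

def extract_payments(texts):
    """Return (payer, amount, payee) tuples for every pattern match."""
    results = []
    for text in texts:
        for match in PAYMENT.finditer(text):
            results.append((match["payer"], match["amount"], match["payee"]))
    return results

def count_payees(payments):
    """Simple statistics: how often does each payee appear?"""
    return Counter(payee for _, _, payee in payments)

if __name__ == "__main__":
    # Made-up sample sentences, not real case data.
    sample = [
        "As agreed, John Smith paid $5,000 to Blue Harbor.",
        "Mr Fox wired 12000 to Blue Harbor last week.",
        "Blue Harbor transferred €750.50 to Nina Petrova.",
    ]
    for payee, count in count_payees(extract_payments(sample)).most_common():
        print(payee, count)
```

Note that no name or amount was specified up front: the previously unknown payee that receives the most payments simply rises to the top of the count.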
In addition, criminals try to cover up illegal activities as well as possible by hiding information in non-searchable file formats or by embedding different types of electronic objects in complex compound files, where the most relevant information is often hidden in the deepest layers. Your technology should help you to identify information even when it is hidden in the deepest layers and even when it seems unsearchable because it is a bitmap, an image, a non-searchable PDF, an audio file or even a video.
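As a rough illustration of what ‘reaching the deepest layers’ means, the sketch below recursively opens ZIP archives nested inside other ZIP archives, using only Python's standard `zipfile` module. It covers just one kind of compound file; the function name `walk_nested_zip` and the `max_depth` limit are assumptions for this example, and making images, audio or video searchable (OCR, transcription) is a separate step that is not shown here.

```python
import io
import zipfile

def walk_nested_zip(data, path="", max_depth=10):
    """Yield (path, content) for every file, descending into nested ZIPs.

    `data` is the raw bytes of a ZIP archive. `max_depth` is an assumed
    safety limit so that maliciously deep nesting cannot recurse forever;
    an archive at the limit is returned unopened.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member_path = f"{path}/{info.filename}" if path else info.filename
            content = archive.read(info)
            # If the member is itself a ZIP container, open it as well.
            if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(content)):
                yield from walk_nested_zip(content, member_path, max_depth - 1)
            else:
                yield member_path, content
```

Each yielded path records the full chain of containers, so an investigator can see exactly where in the compound file a document was found.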
By using this technology, computers are great at finding potentially relevant information many times faster and more efficiently than humans ever could. At the same time, by interactively and efficiently presenting the results to skilled and experienced investigators, the validity of the potentially relevant information can quickly be assessed, which helps to prevent so-called tunnel vision and to identify invalid evidence or investigation directions.
Over the last few years, I have seen many real-life cases where this hybrid man-machine approach has resulted in finding at least twice as much relevant information with at most half the resources in half the time! A great example of how Big Data analytics can lead to Big Savings!
#e-discovery #internalinvestigations #lawenforcement