Anonymous Email Authors Can Be Traced


Researchers with NCFTA Canada and the Concordia Institute for Information Systems Engineering have developed an effective new technique to determine the authorship of anonymous emails. The study has received international coverage. The original story was published in the March 7 edition of the Concordia Journal.

-------------

A team of research scientists from NCFTA Canada has developed an effective new technique to determine the authorship of anonymous emails. Tests showed that their method has a high level of accuracy and that, unlike many other methods of ascertaining authorship, it can provide evidence that is presentable in courts of law. The new technique is profiled in a study published in Digital Investigation and Information Sciences.

"In the past few years, we've seen an alarming increase in the number of cybercrimes involving anonymous emails," say study co-authors Mourad Debbabi and Benjamin Fung, research scientists of NCFTA Canada, professors of Information Systems Engineering at Concordia University, and experts in cyber forensics and data mining. "These emails can transmit threats or child pornography, facilitate communications between criminals or carry viruses."

While police can often use the IP address to locate the house or apartment where an email originated, they may find many people at that address. They need a reliable, effective way to determine which of several suspects has written the emails under investigation.

Debbabi, Fung, and their PhD student Iqbal developed a novel method of authorship attribution to meet this need, based on techniques used in speech recognition and data mining. Their approach relies on the identification of frequent patterns: unique combinations of features that recur in a suspect's emails.

To determine whether a suspect has authored the target email, they first identify the frequent patterns found in emails written by that suspect. Then they filter out any of these patterns that are also found in the emails of other suspects. The remaining frequent patterns are unique to that suspect. They constitute the suspect's 'write-print,' a distinctive identifier like a fingerprint. "Let's say the anonymous email contains typos or grammatical mistakes, or is written entirely in lowercase letters," says Fung. "We use those special characteristics to create a write-print. Using this method, we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author."
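
The two steps described above (find each suspect's frequent patterns, then discard the patterns shared with other suspects) can be illustrated with a small Python sketch. This is not the researchers' implementation: the four toy features, the 0.6 support threshold, the limit of two features per pattern, and all function names are assumptions chosen only to make the idea concrete.

```python
from itertools import combinations


def extract_features(text):
    """Map an email to a set of toy stylometric features (hypothetical feature set)."""
    words = text.split()
    features = set()
    if text == text.lower():
        features.add("all_lowercase")
    if "!!" in text:
        features.add("repeated_exclamation")
    if "  " in text:
        features.add("double_space")
    if words and sum(len(w) for w in words) / len(words) < 4.5:
        features.add("short_words")
    return features


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Return feature combinations found in at least min_support of the emails."""
    feature_sets = [extract_features(e) for e in emails]
    all_features = sorted(set().union(*feature_sets))
    patterns = set()
    for size in range(1, max_size + 1):
        for combo in combinations(all_features, size):
            hits = sum(1 for fs in feature_sets if fs.issuperset(combo))
            if hits / len(feature_sets) >= min_support:
                patterns.add(frozenset(combo))
    return patterns


def write_prints(emails_by_suspect, min_support=0.6):
    """Keep, for each suspect, only the frequent patterns no other suspect shares."""
    frequent = {
        suspect: frequent_patterns(emails, min_support)
        for suspect, emails in emails_by_suspect.items()
    }
    prints = {}
    for suspect, patterns in frequent.items():
        shared = set().union(
            *(p for other, p in frequent.items() if other != suspect)
        )
        prints[suspect] = patterns - shared
    return prints


def attribute(anonymous_email, prints):
    """Score each suspect by how many write-print patterns the email matches."""
    features = extract_features(anonymous_email)
    scores = {
        suspect: sum(1 for pattern in patterns if pattern <= features)
        for suspect, patterns in prints.items()
    }
    return max(scores, key=scores.get), scores


# Made-up sample emails for two hypothetical suspects.
emails_by_suspect = {
    "alice": [
        "send me the file now",
        "call me when you get in",
        "i will be late today",
    ],
    "bob": [
        "Where is the report!! I need it",
        "Fix this now!! It is late",
        "Great work!! Thanks",
    ],
}

prints = write_prints(emails_by_suspect)
best_match, scores = attribute("meet me at the usual place", prints)
print(best_match, scores)
```

In this toy data, short words are frequent for both suspects, so that pattern on its own is filtered out of both write-prints. Lowercase-only writing remains distinctive for the first suspect and repeated exclamation marks for the second. A realistic system would need far richer features and a proper frequent-pattern mining algorithm.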

To test the accuracy of their technique, Debbabi, Fung, and their colleagues examined the Enron Email Dataset, a collection of more than 200,000 real-life emails from 158 employees of the Enron Corporation. Using a sample of 10 emails written by each of 10 subjects (100 emails in all), they were able to identify authorship with an accuracy of 80% to 90%.

"Our technique was designed to provide credible evidence that can be presented in a court of law," says Fung. "For evidence to be admissible, investigators need to explain how they have reached their conclusions. Our method allows them to do this."