Survey of Sensitive Information Detection Techniques: The need and usefulness of Machine Learning Techniques
Contributed by Riya Shah and Manisha Valera
The volume of digital data grows daily, and with it the need to keep sensitive content from being published on the Internet. This paper surveys studies on content-based detection of both personal sensitive information, such as names, birth dates, and medical records, and corporate sensitive information, such as confidential agreements and earnings reports. It describes the previously dominant technologies in data leakage prevention and their shortcomings, which motivate the use of machine learning. Because personal sensitive information usually follows a fixed pattern, regular expressions are the most efficient way to detect it. Corporate sensitive information, however, generally follows no particular pattern and varies from company to company. The paper reviews research on using machine learning to detect sensitive content and concludes that further research is needed on detecting corporate sensitive information.
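As an illustration of the pattern-based approach described above, the following is a minimal Python sketch and not the method of any surveyed system. The pattern set, the labels, and the function name find_sensitive are assumptions made for this example; real deployments would need locale-specific and far stricter rules.

```python
import re

# Hypothetical, deliberately simple patterns chosen for illustration only.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a sorted list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return sorted(hits)


if __name__ == "__main__":
    personal = "Born 04/12/1985, SSN 123-45-6789, mail a.b@example.com"
    corporate = "The quarterly earnings report is confidential."
    print(find_sensitive(personal))
    print(find_sensitive(corporate))
```

The second input contains no fixed pattern, so the sketch finds nothing in it. This is the gap the abstract points to: corporate sensitive content has no regular shape for such rules to match.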