Large data volumes (aka “big data”) coupled with the use of new technologies can greatly increase the amount of Personally Identifiable Information (PII) collected by an organization. Correspondingly, there has been an escalation in security breaches involving PII, which has contributed to the loss of millions of records over the past few years. The recommended mitigation strategy is to assume security postures in accordance with industry best practices, which include adequate training for technology users. However, an organization cannot properly protect PII if it does not know where PII resides on its computers and servers. One solution is to purchase commercial products that scan, extract, and report PII. Such products are often prohibitively expensive, and they tend to suffer from “feature bloat”, which makes them difficult to use and overly complex for simple use cases. A compromise is to develop scripted components that use regular expressions and keyword searches to discover PII instances in textual content. A significant challenge is that most PII content is encoded in common binary file formats (such as PDF), which are not directly searchable the way plain text is.
This webinar will discuss a prototype software tool developed to detect and extract PII from over a thousand binary file formats. The prototype, called “BFAS – Binary File Application Scanner”, seamlessly injects a text extraction facility into the standard (existing) PowerShell pipeline. A graphical user interface (GUI) was developed to facilitate multiprocessing and XML-based reporting and visualization. Ideas for extending the BFAS architecture to leverage machine learning (ML) methods will also be discussed.
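As a rough illustration of the scripted regular-expression approach mentioned in the abstract, the sketch below scans plain text for a few PII-like patterns. It is not BFAS code: the pattern set, the `PII_PATTERNS` name, and the `find_pii` function are hypothetical, the patterns cover only simple US-style formats, and the input is assumed to be text that has already been extracted from any binary format.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
# Real scanning would need broader and validated rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text, offset) tuples, ordered by offset."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Contact Jane at jane.doe@example.com or 555-867-5309. SSN: 123-45-6789."
for hit in find_pii(sample):
    print(hit)
```

A scan like this only works on text, which is why the text extraction step described above matters for binary formats such as PDF.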
Mike,
Can you share the slides or information material that was presented today?
Best Regards,
Vagish Shanmukh
The video and the slides from the webinar are available on this page. You must be registered and logged in to download the slides. Thank you for your participation in the CSIAC community.
Very good solution and nice way to package open source components. I’ve done some of these tasks in isolated prototypes (for work and for home) and this gives me some ideas of where I can go with those. I would like to see more from Mr. Corley. Thanks.
I am still trying to figure out where we can download the BFAS tool and get updates.
Hello. Currently BFAS is still a prototype project and under active development. It is not available for download yet, but we are having discussions about how and when to release it. We will inform everyone when updates are available. We will be developing follow-on podcasts that build on the webinar ideas with more demonstration and discussion. Thanks for your interest!
Mike