New Breach Data Lesson I: Barcode Scanning

This is the first post of a three-part series about the new ways in which breach data can be beneficial for both offense and defense.

I have mentioned on my show that my company collects breach data as part of our investigations and privacy services. Last year, we started ingesting stolen password logger data and leaked ransomware content. Today, we bring in over 2 TB of newly published stolen content every week, which we parse down to an average of 25 GB of useful data. Our entire breach data collection is over 25 TB. The cyber world is an absolute mess.

Yesterday, we took in a large ransomware data set which included scans of the front and back of thousands of customers' driver's licenses. Our systems analyze all images, documents, and PDFs, and extract any text content via Optical Character Recognition (OCR). This allows us to query or monitor our clients' details and be notified when their identification cards or financial data appear online. This process has weaknesses, especially when scanned images are of poor quality. The text must be correctly recognized before any alert about client data can fire.
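To make that weakness concrete, here is a minimal Python sketch of matching OCR output against a client watchlist. It is not our production code. The function names, the watchlist layout, and the sample identifiers are all made up for illustration, and the matching is a plain normalized substring check.

```python
import re


def normalize(text: str) -> str:
    """Uppercase the text and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def find_watchlist_hits(extracted_text: str, watchlist: dict) -> dict:
    """Return {client: [identifiers found]} for identifiers present in the text.

    `watchlist` is a hypothetical structure: client label -> list of
    identifiers (DL numbers, names, and so on) to look for.
    """
    haystack = normalize(extracted_text)
    hits = {}
    for client, identifiers in watchlist.items():
        matched = [i for i in identifiers if normalize(i) and normalize(i) in haystack]
        if matched:
            hits[client] = matched
    return hits


# Fictional watchlist and OCR output, for illustration only.
watchlist = {"client-001": ["D123-4567-8901", "Jane Q Sample"]}
clean_scan = "DL NO: d123 4567 8901\nSAMPLE, JANE Q"
noisy_scan = "DL NO: D123 4567 89O1\nSAMPLE, JANE Q"  # OCR read the digit 0 as the letter O

print(find_watchlist_hits(clean_scan, watchlist))
print(find_watchlist_hits(noisy_scan, watchlist))
```

The clean scan matches on the license number, while the noisy scan returns nothing because a single misread character breaks the match. The name never matches in either case, since the license prints it last-name-first. Both failures are the kind of gap that poor OCR input creates.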

Recently, we started scanning all documents for any type of barcode, including QR codes and license barcodes. This has paid off greatly, and this recent breach turned out to be quite beneficial. Our automated barcode scanning component immediately extracted the text details of all customers, which allows us to query by name, address, date of birth (DOB), driver's license (DL) number, etc. Below is the redacted back of a scanned DL from this recent ransomware breach:

Below is the (redacted) automated text conversion of that barcode, which allows us to ingest text data into our overall breach database for easy access.

The first lesson here is that barcode analysis of entire data sets can reveal much more data about the victims within the breach. In this example, our systems ingested the images, performed OCR on all text, scanned the barcodes, populated all text data within our database, and alerted us that a client's driver's license was within the breach, all without any manual effort from us. I present more on this on Friday's show.
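For readers curious about the barcode-to-database step, here is a rough Python sketch of turning decoded barcode text into a searchable record. It assumes the barcode has already been decoded to text by some scanner, and that the text follows the AAMVA layout used on the back of U.S. driver's licenses, where each data element begins with a three-letter ID. The friendly field names, the function name, and the sample payload are my own placeholders, not real breach data or our actual tooling. Real payloads also carry a header, and the first element can be fused to it, which this sketch does not handle.

```python
# A small subset of AAMVA element IDs, mapped to labels chosen for this example.
FIELD_NAMES = {
    "DAQ": "license_number",
    "DCS": "last_name",
    "DAC": "first_name",
    "DBB": "date_of_birth",
    "DAG": "street",
    "DAI": "city",
    "DAJ": "state",
    "DAK": "postal_code",
}


def parse_barcode_text(decoded: str) -> dict:
    """Map lines that start with a known three-letter element ID to named fields.

    Lines with unknown IDs or empty values are skipped.
    """
    record = {}
    for line in decoded.splitlines():
        line = line.strip()
        code, value = line[:3], line[3:].strip()
        if code in FIELD_NAMES and value:
            record[FIELD_NAMES[code]] = value
    return record


# Entirely fictional payload, one element per line (an assumption of this sketch).
sample = "\n".join([
    "DAQX0000000",
    "DCSSAMPLE",
    "DACJANE",
    "DBB01011990",
    "DAG123 EXAMPLE ST",
    "DAIANYTOWN",
    "DAJZZ",
    "DAK000000000",
])

record = parse_barcode_text(sample)
print(record)
```

Once the fields are in a structure like this, querying by name, address, DOB, or DL number is an ordinary database lookup instead of a guess at what OCR managed to read.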

The second lesson is to never allow companies to scan your identification. It will never be stored securely and will be leaked during the next attack. The barcode on the back of a passport card includes only the passport number, so it may be a much safer option whenever you are required to hand over ID. My driver's license has a vinyl sticker on the back which contains a new barcode. Scanning it reveals a stern text message about consumer rights. I have yet to allow anyone to scan it, though.

Tomorrow, the second part of this series will explain detailed use of stealer logs for both OSINT (offense) and privacy (defense).