Architecture and Security

OS2datascanner is a system that helps organisations comply with the EU's General Data Protection Regulation (GDPR). It finds personally-sensitive information without copying it: only references to its findings are saved. The system's architecture follows the same principle, so that OS2datascanner does not itself expose personal information to unauthorised parties.

The system's architecture

OS2datascanner is built around several security principles, the most important of which is isolation. The system is divided into three main modules, each with a specific responsibility:

  1. The administration system is a web application used by system administrators and system owners to configure search rules and the data sources to be scanned. User accounts for this system must be created manually, and its data is not shared with the report module. All credentials and authorisation details stored by this system are encrypted with AES.

  2. The scanner engine consists of many processes, each of which carries out a single scan task. These processes communicate exclusively through message queues, have no database access and have no means of persisting data. They also run in restricted environments where, for example, the only files they can create are temporary ones. Sensitive data being processed may be written to these temporary files, but the files are deleted as soon as processing is complete.

  3. The report module presents the scanner engine's findings. It is another web application, used to manage and oversee an organisation's use of personally-sensitive information. A match from the scanner engine consists of a redacted reference to the object where the match was found, any metadata extracted from that object, a brief redacted snippet of context around the match, and a summary of the rules that produced it. Matches are automatically assigned to the report module's users based on the extracted metadata. Users typically log into this system through a single sign-on (SSO) provider such as Active Directory.
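The match structure described above can be sketched as a small Python data class. This is a minimal illustration under stated assumptions, not OS2datascanner's actual message format: the class `Match`, the helpers `redact_context` and `assign_user`, and the metadata key `owner-email` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Match:
    """Hypothetical shape of one finding passed to the report module."""
    reference: str            # redacted pointer to the object, never its content
    metadata: Dict[str, str]  # metadata extracted from the object, e.g. its owner
    context: str              # short snippet with the sensitive value masked
    rules: List[str] = field(default_factory=list)  # names of the rules that fired


def redact_context(text: str, start: int, end: int, width: int = 20) -> str:
    """Keep up to `width` characters either side of text[start:end], masking the match."""
    before = text[max(0, start - width):start]
    after = text[end:end + width]
    return before + "X" * (end - start) + after


def assign_user(match: Match, users_by_email: Dict[str, str]) -> Optional[str]:
    """Return the user responsible for a match, based on extracted metadata."""
    owner = match.metadata.get("owner-email")  # hypothetical metadata key
    return users_by_email.get(owner) if owner else None


text = "Note: 010190-1234, ok"
start = text.index("010190-1234")
match = Match(
    reference="doc-0001",
    metadata={"owner-email": "alice@example.org"},
    context=redact_context(text, start, start + len("010190-1234")),
    rules=["cpr-number"],
)
print(match.context)
print(assign_user(match, {"alice@example.org": "alice"}))  # alice
```

Note that the snippet keeps only the surrounding context and replaces the matched value itself, which is what lets the report module show a finding without holding a copy of the sensitive data.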

All communication between the parts of the system is conducted through message queues, which makes the system loosely coupled: every component and subcomponent can run on its own server or in its own container.
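To illustrate a scanner-engine process that talks only to queues and keeps sensitive data in a short-lived temporary file, here is a minimal sketch. It assumes Python's in-process `queue.Queue` as a stand-in for a real message broker, a hypothetical message shape (`reference`, `content`) that carries the content directly for brevity, and a toy e-mail pattern in place of a real scan rule. The function name `scan_worker` is likewise hypothetical.

```python
import os
import queue
import re
import tempfile

# Toy rule standing in for a real scanner rule: a simple e-mail address pattern.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def scan_worker(inbox: queue.Queue, outbox: queue.Queue) -> str:
    """Handle one message: stage content in a temp file, scan it, emit offsets only.

    Returns the temporary file's path purely so callers can check it is gone.
    """
    message = inbox.get()
    # NamedTemporaryFile deletes the file when the block exits, even on error.
    with tempfile.NamedTemporaryFile("w+", encoding="utf-8", suffix=".txt") as tmp:
        tmp.write(message["content"])
        tmp.flush()
        tmp.seek(0)
        spans = [m.span() for m in EMAIL.finditer(tmp.read())]
        path = tmp.name
    # Only the reference and match positions travel onwards, never the text itself.
    outbox.put({"reference": message["reference"], "spans": spans})
    inbox.task_done()
    return path


inbox, outbox = queue.Queue(), queue.Queue()
inbox.put({"reference": "doc-0001", "content": "contact: a.b@example.org"})
tmp_path = scan_worker(inbox, outbox)
print(os.path.exists(tmp_path))  # False
print(outbox.get())              # {'reference': 'doc-0001', 'spans': [(9, 24)]}
```

Because the worker's only inputs and outputs are the two queues, the same function could be moved to another host or container by swapping the in-process queues for a broker client, without changing the scanning logic.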

Recommendation: local network

It is strongly recommended to install OS2datascanner on a private network alongside the data sources to be scanned. That way, sensitive information never leaves the organisation's control.

Security

OS2datascanner must be granted read access to data sources in order to function. When this access is granted through a dedicated service account, the account's details are encrypted before being saved to the database. The system uses symmetric AES encryption in CTR mode. (At the time of writing, no practical attack on the confidentiality provided by AES is known.)

Encryption and decryption each require three values: a master key (32 randomly generated bytes, i.e. a 256-bit AES key), an initialisation vector, and the text to encrypt or decrypt. Without any one of these values, neither operation is possible.

The master key is stored on the server's hard disk, while the initialisation vector and the encrypted account details are stored together in the administration system's database. As a result, access to the disk alone or to the database alone is not enough to recover the account details.
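A minimal sketch of this scheme, assuming the third-party `cryptography` package (its `Cipher`, `algorithms.AES` and `modes.CTR` primitives). The function names `load_or_create_master_key`, `encrypt` and `decrypt`, and the key-file handling, are hypothetical and not taken from OS2datascanner's source.

```python
import os
import tempfile
from pathlib import Path

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY_SIZE = 32    # 32 random bytes -> AES-256
NONCE_SIZE = 16  # CTR mode in this library takes a full 16-byte initial counter block


def load_or_create_master_key(path: Path) -> bytes:
    """Read the master key from disk, generating it on first use."""
    if path.exists():
        return path.read_bytes()
    key = os.urandom(KEY_SIZE)
    # Create the file with owner-only permissions from the start (no chmod race).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key


def encrypt(key: bytes, plaintext: str) -> tuple[bytes, bytes]:
    """Return (iv, ciphertext); both would be stored together in the database."""
    iv = os.urandom(NONCE_SIZE)  # fresh random IV for every encryption
    encryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    return iv, encryptor.update(plaintext.encode()) + encryptor.finalize()


def decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> str:
    """Invert encrypt(); needs the key from disk plus the IV and ciphertext from the DB."""
    decryptor = Cipher(algorithms.AES(key), modes.CTR(iv)).decryptor()
    return (decryptor.update(ciphertext) + decryptor.finalize()).decode()


with tempfile.TemporaryDirectory() as workdir:
    key = load_or_create_master_key(Path(workdir) / "master.key")  # on disk
    iv, ciphertext = encrypt(key, "svc-scanner:example-password")  # in the DB
    print(decrypt(key, iv, ciphertext))  # svc-scanner:example-password
```

Because CTR mode turns AES into a stream cipher, the ciphertext has the same length as the plaintext, and reusing an IV with the same key would reveal the XOR of the two plaintexts; generating a fresh random IV for every encryption avoids this.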

The scanner engine is given the decrypted credentials so that it can perform its tasks, but these credentials are stripped from all messages before they are sent to the report module.
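The stripping step can be sketched as a recursive filter over a message before it is forwarded. This is an illustration under assumptions: the message is taken to be nested dictionaries and lists, and the field names in `SENSITIVE_KEYS` and the function `strip_credentials` are hypothetical.

```python
from typing import Any

# Hypothetical credential field names; the real message format may differ.
SENSITIVE_KEYS = frozenset({"username", "password", "token", "credentials"})


def strip_credentials(message: Any) -> Any:
    """Return a copy of a (nested) message with credential fields removed."""
    if isinstance(message, dict):
        return {k: strip_credentials(v) for k, v in message.items()
                if k not in SENSITIVE_KEYS}
    if isinstance(message, list):
        return [strip_credentials(v) for v in message]
    return message  # scalars pass through unchanged


outgoing = strip_credentials({
    "source": {"type": "smb", "path": "//srv/share",
               "username": "svc", "password": "secret"},
    "matches": [{"rule": "cpr-number", "spans": [[6, 17]]}],
})
print(outgoing)
```

A deny-list keeps the illustration short; an allow-list of fields known to be safe fails more safely, because a newly introduced credential field would be dropped by default rather than forwarded.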