Management Commands

OS2datascanner consists of multiple Django apps, and each Django app can register its own actions with manage.py.

In our project, we have multiple such actions/commands registered.

Custom commands must be created in the correct path (<app>/management/commands) of the Django app they concern.

More information on how to create custom commands can be found in Django's documentation (5.2).

Quick guide to command execution

In general, to execute these commands, do one of the following, depending on whether your container is already running:

Container running: docker-compose exec <app> django-admin <command>

Container not running: docker-compose run <app> django-admin <command>
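
The choice between exec and run can be sketched as a small helper. This is only an illustration and not part of the project: the function name compose_argv and its running flag are hypothetical, and the helper merely builds the argument list without querying or invoking Docker.

```python
def compose_argv(app, command, *args, running=True):
    """Build the docker-compose argument list for a management command.

    `running` is supplied by the caller; this sketch does not ask Docker
    whether the container is actually up.
    """
    # "exec" targets an already running container, "run" starts a new one.
    subcommand = "exec" if running else "run"
    return ["docker-compose", subcommand, app, "django-admin", command, *args]


print(" ".join(compose_argv("admin", "list_scannerjobs")))
print(" ".join(compose_argv("report", "list_problems", "<PK>", running=False)))
```

The resulting list could be handed to subprocess.run, but that step is left out so the sketch stays runnable without Docker.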

Admin application

list_scannerjobs

This command lists all scanner jobs and the following attributes:

  • Primary Key
  • Name
  • Start time
  • Number of objects scanned
  • Scan status (as a bool)
  • Checkup messages (as count())

To execute this command run: docker-compose exec admin python manage.py list_scannerjobs

initial_setup

This command is used to set up an initial client, organization, and superadmin user. The client and organization can be given a name using --client-name and --org-name respectively, and if no name is given, they will take the name from settings.NOTIFICATION_INSTITUTION. The name of the user and corresponding account can be set with --username. "os" is the default value for username.

If a client, organization, or user with the given name already exists, the command will fail, and you will have to call it again with unused names.

The available arguments are:

  • --phone <phone number>, sets client contact phone number
  • --email <email address>, sets client contact email address
  • --username <username>, sets username for created user and corresponding account. Default: "os"
  • --client-name <client name>, sets name of created client
  • --org-name <organization name>, sets name of created organization
  • --password <password>, sets password for the superadmin user
  • --load-cpr-rule, loads the "rules-cpr-da" fixture and adds the loaded system rule to the created organization.
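
The argument handling described above could look roughly like the following argparse sketch. It is an illustration under assumptions, not the project's actual code: build_parser, resolve_names and the fallback_name parameter (standing in for settings.NOTIFICATION_INSTITUTION) are hypothetical names.

```python
import argparse


def build_parser():
    """Parser mirroring the documented initial_setup arguments (sketch only)."""
    parser = argparse.ArgumentParser(prog="initial_setup")
    parser.add_argument("--phone", help="client contact phone number")
    parser.add_argument("--email", help="client contact email address")
    parser.add_argument("--username", default="os",
                        help="name of the created user and account")
    parser.add_argument("--client-name", help="name of the created client")
    parser.add_argument("--org-name", help="name of the created organization")
    parser.add_argument("--password", help="password for the superadmin user")
    parser.add_argument("--load-cpr-rule", action="store_true",
                        help="load the CPR rule fixture for the organization")
    return parser


def resolve_names(options, fallback_name):
    """Use the fallback name for any name that was not given explicitly."""
    client_name = options.client_name or fallback_name
    org_name = options.org_name or fallback_name
    return client_name, org_name


options = build_parser().parse_args(["--org-name", "Example Org", "--load-cpr-rule"])
print(options.username, resolve_names(options, "Example Institution"))
```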

The corresponding command in the Report module needs to be run after this to update the created user for use in the Report module.

quickstart_dev

This command is only intended for getting a developer environment up-and-running quickly. It creates a user named dev with the password dev and registers the Samba share from the docker-compose dev env as a filescan.

There is a corresponding command in the Report module.

pipelinectl

This command is used to interact with a running pipeline by sending it a prioritised "command message" that pipeline components can react to.

Currently, the command can be used to abort running scans, to change the log level of the live system, and to obtain profiling information from live processes.

Abort a scan

To abort a scan, use one of the following flags:

Flag                 Help
--abort-scantag      the tag of a running scan that should be stopped
--abort-scannerjob   the primary key of a scanner job whose most recent scan should be stopped
--abort-scanstatus   the primary key of a ScanStatus object whose scan should be stopped

Change the log level

To change the log level of the running pipeline, use --log-level=LEVEL where LEVEL can be any of critical, error, warn, warning, info or debug.
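
As an illustration of how the accepted LEVEL names relate to Python's standard logging levels, consider the sketch below. How the pipeline itself resolves the names is not documented here, so the LEVELS table and the apply_log_level helper are assumptions; in particular, treating warn and warning as synonyms follows the standard library, where logging.WARN is an alias of logging.WARNING.

```python
import logging

# Accepted names and the standard-library level each one corresponds to.
# "warn" and "warning" are treated as synonyms (an assumption of this sketch).
LEVELS = {
    "critical": logging.CRITICAL,
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def apply_log_level(name, logger=None):
    """Set the level of `logger` (the root logger by default) from a LEVEL name."""
    try:
        level = LEVELS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown log level: {name!r}") from None
    if logger is None:
        logger = logging.getLogger()
    logger.setLevel(level)
    return level


demo_logger = logging.getLogger("pipeline-demo")  # hypothetical logger name
apply_log_level("warn", demo_logger)
print(logging.getLevelName(demo_logger.level))
```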

Switch profiling on and off

To enable runtime profiling for pipeline components, use the --profile parameter. To switch it off again, use --no-profile.

While runtime profiling is enabled, Python's cProfile profiler will record details of what each pipeline process is doing and how long it takes. Disabling runtime profiling will print the active profile to the log (sorted by the total time spent in each function call) before clearing it and switching the profiler off.

Attempting to enable runtime profiling while it's already enabled (that is, calling pipelinectl --profile twice) will print and clear the active profile, effectively resetting the profiler without switching it off.
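
The toggle semantics described above can be sketched with cProfile and pstats from the standard library. The RuntimeProfiler class is hypothetical and returns the report as a string instead of writing it to the log; it only shows one way the enable, reset, and disable behaviour might fit together, not how the pipeline components implement it.

```python
import cProfile
import io
import pstats


class RuntimeProfiler:
    """Toggleable profiler mimicking the --profile / --no-profile behaviour."""

    def __init__(self):
        self._profile = None  # None means profiling is switched off

    @property
    def enabled(self):
        return self._profile is not None

    def _dump(self):
        """Stop the active profiler and return its report, sorted by total time."""
        self._profile.disable()
        out = io.StringIO()
        try:
            stats = pstats.Stats(self._profile, stream=out)
            stats.sort_stats("tottime").print_stats()
        except TypeError:
            # pstats refuses to build a report from a profile with no data.
            out.write("(no profiling data recorded)\n")
        self._profile = None
        return out.getvalue()

    def enable(self):
        """Start profiling; if already enabled, report and reset instead."""
        report = self._dump() if self.enabled else None
        self._profile = cProfile.Profile()
        self._profile.enable()
        return report

    def disable(self):
        """Stop profiling and return the report, or None if it was off."""
        return self._dump() if self.enabled else None


profiler = RuntimeProfiler()
profiler.enable()             # comparable to --profile
sorted(range(1000, 0, -1))    # some work to record
print(profiler.disable())     # comparable to --no-profile
```

Calling enable() a second time returns the accumulated report and starts a fresh profile, which matches the reset behaviour described above.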

cleanup_account_results

This command is used to delete all document reports of a specified account and scanner job. The command is primarily intended for programmatic use, but can be called manually.

The command must include the following options:

  • --accounts followed by space-separated UUIDs or usernames of existing accounts.

  • --scanners followed by space-separated primary keys of existing scanners.
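
Because --accounts accepts both UUIDs and usernames, a caller or a wrapper script has to tell the two apart. The sketch below shows one possible approach using argparse and uuid; split_account_identifiers is a hypothetical helper, and parsing scanner primary keys as integers is an assumption. Note that a username that happens to be a valid UUID string would be classified as a UUID here.

```python
import argparse
import uuid


def split_account_identifiers(values):
    """Separate tokens that parse as UUIDs from those treated as usernames."""
    uuids, usernames = [], []
    for value in values:
        try:
            uuids.append(uuid.UUID(value))
        except ValueError:
            usernames.append(value)
    return uuids, usernames


parser = argparse.ArgumentParser(prog="cleanup_account_results")
# nargs="+" collects one or more space-separated values per option.
parser.add_argument("--accounts", nargs="+", required=True)
parser.add_argument("--scanners", nargs="+", type=int, required=True)

options = parser.parse_args([
    "--accounts", "alice", "123e4567-e89b-12d3-a456-426614174000",
    "--scanners", "1", "2",
])
print(split_account_identifiers(options.accounts), options.scanners)
```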

diagnostics

This command is used to give basic information about some objects in the database. The tool will warn you about certain broken objects or known issues, such as Account objects with an empty string as a username. The intended use is an initial automatic analysis of the database, which hopefully leads to some useful information for further debugging.

The command can be called with the --only option, followed by one or more of the following arguments:

  • "Account"
  • "Alias"
  • "OrganizationalUnit"
  • "Organization"
  • "UserErrorLog"
  • "Rule"

scanstatus

Replicates the useful information normally presented on the "scanner status" page of the UI for a given ScanStatus object. The command takes one argument: the primary key of the ScanStatus object.

keycloakctl

Perform various tasks in Keycloak through its API, with the admin module as entry point.

Usage: keycloakctl <action> --argument

Actions

  • list-realms: Lists ID and name of available realms in Keycloak
  • search-users: Used with --username, searches Keycloak users by their usernames.
  • delete-user: Used with --user-id (Keycloak user ID), deletes that user.
  • recreate-federation: Deletes Keycloak User federation (if present) and creates a new one based on OSdatascanner LDAPConfig

All actions except list-realms can be given a Keycloak realm ID (which looks like a name) with --realm-id. If no realm ID is provided, it is assumed that only one OSdatascanner realm is present, and that realm is used.

Note: Deleting Keycloak users (for example with delete-user) when Keycloak is set up with a User federation may not behave as you expect. The user will be deleted, but will immediately be re-imported by Keycloak with a new ID.

Report application

scannerjob_info

Given the PK of a scanner job, this command finds the associated document reports and lists:

  • Scanner job name & PK
  • Total message count
  • Problem message count
  • Match message count
  • Mimetype info and count on messages

To execute this command run: docker-compose exec report python manage.py scannerjob_info <PK>

leader_overview

Replicates the information found on the leader overview page of the report module UI.

Takes an organization UUID as an argument and shows all accounts related to that organization.

Takes two optional arguments:

  • --unit: An OrganizationalUnit UUID. Restricts shown accounts to only those related to the unit.
  • --leader: An Account username. Restricts shown accounts to only those directly managed by the account.

dpo_overview

Replicates the information found on the DPO overview page of the report module UI.

Takes three optional arguments:

  • --organization: The name or UUID of an organization. If none are given, the tool assumes only one organization exists and uses that one. If multiple organizations exist, this argument is required. Names are case insensitive.
  • --unit: An OrganizationalUnit UUID. Restricts the underlying data to reports for accounts related to the unit or any of its descendant units.
  • --scanner: A scanner primary key (from the admin module). Restricts the underlying data to reports related to ScannerReference objects containing that scanner_pk value.
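
The rules for --organization (optional when exactly one organization exists, otherwise required, with case-insensitive name matching) can be sketched as follows. The resolve_organization function and the dictionary-based organization records are hypothetical stand-ins for the real database lookup.

```python
import uuid


def resolve_organization(organizations, identifier=None):
    """Pick one organization from dicts with "uuid" and "name" keys."""
    if identifier is None:
        # Without an identifier, the choice is only unambiguous if there is
        # exactly one organization.
        if len(organizations) == 1:
            return organizations[0]
        raise ValueError(
            "--organization is required unless exactly one organization exists")
    try:
        wanted = uuid.UUID(identifier)
    except ValueError:
        # Not a UUID: compare names case-insensitively.
        matches = [o for o in organizations
                   if o["name"].casefold() == identifier.casefold()]
    else:
        matches = [o for o in organizations if o["uuid"] == wanted]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match for {identifier!r}")
    return matches[0]


# Illustrative records only.
ORGS = [
    {"uuid": uuid.UUID("11111111-1111-1111-1111-111111111111"),
     "name": "Example Municipality"},
    {"uuid": uuid.UUID("22222222-2222-2222-2222-222222222222"),
     "name": "Example Org"},
]
print(resolve_organization(ORGS, "example org")["name"])
```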

user_overview

Replicates the information found on the user overview page of the report module UI.

Takes an account username as an argument and shows information about that account.

initial_setup

This command should only be used after running the corresponding command in the admin module.

This command updates the user created by running initial_setup in the admin module.

The available arguments are:

  • --username <username>, <username> must be the same username used in the admin module
  • --password <password>, sets the password of the superadmin user

quickstart_dev

This command is only intended for getting a developer environment up-and-running quickly. It creates a user named dev with the password dev and registers the user as a remediator.

There is a corresponding command in the Admin module.

list_problems

List the problem messages for a scanner job.

Optionally takes a --head argument to limit the output.

To execute this command run: docker-compose exec report django-admin list_problems <PK>

makefake

Randomizes and creates new data types to populate document reports.

To execute this command run: docker-compose exec report python manage.py makefake

The available arguments are:

  • --scan-count, sets the number of scans. Default: a random number between 5 and 10
  • --page-count, sets the number of pages where at least 1 match will be found. Default: a random number between 5 and 10

performance_scan

Creates and runs 2 scans (the number is limited by the maximum timeout of the GitLab runners) and then measures the average time a scan takes. It also generates a cProfile report showing how much time each method has used. However, the report only monitors the admin module when run. In the future, the reports generated in the pipeline should also contain information about the engine components.

The output is a .prof file located in /src/datascanner/ of the project directory. The .prof file can be used with snakeviz (pip install snakeviz) to give an icicle visualization of the performance of the scan. To see the visualization: snakeviz {report_location}/performance.prof
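
A .prof file written by cProfile can also be inspected without snakeviz, using pstats from the standard library. The following is a minimal sketch: summarize_profile and workload are hypothetical names, and the sketch profiles a throwaway function so that it runs without a real performance.prof file.

```python
import cProfile
import io
import os
import pstats
import tempfile


def summarize_profile(path, limit=10):
    """Return the `limit` most expensive entries of a .prof file as text."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    # Shorten file paths and sort by cumulative time before printing.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def workload():
    """Stand-in for the code being measured."""
    return sorted(range(1000, 0, -1))


# Create a temporary profile file in place of a real performance.prof.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.prof")
    profiler = cProfile.Profile()
    profiler.runcall(workload)
    profiler.dump_stats(sample)
    print(summarize_profile(sample))
```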

To execute this command run: docker-compose exec -u 0 admin python manage.py performance_measurement

diagnostics

This command is used to give basic information about some objects in the database. The tool will warn you about certain broken objects or known issues, such as Account objects with an empty string as a username. The intended use is an initial automatic analysis of the database, which hopefully leads to some useful information for further debugging.

The command can be called with the --only option, followed by one or more of the following arguments:

  • "Account"
  • "Alias"
  • "OrganizationalUnit"
  • "Organization"
  • "DocumentReport"
  • "Problem"

account_coverage

This command figures out which account has been scanned by which scanner at which time and delivers that information to the checkup_collector. The checkup_collector will then recreate CoveredAccounts for those combinations of accounts and scanners at those scan-times.

In an environment with multiple organizations, the command must be supplied with an organization UUID using the --organization argument.

Additionally, the command can be supplied with the primary key of a scanner in the admin module using the --scanner argument. Then, only CoveredAccounts for that scanner will be created.

The command will look at both the "scan_time" field and the "time" value of the "raw_scan_tag" JSON field of the DocumentReports in the report module to determine scan times for different scanners and accounts.

Limitations

This tool can only recreate CoveredAccounts based on the information available on DocumentReports in the report module. For example, if an account does not have any associated results in the report module, no CoveredAccounts will be recreated for it in the admin module, even if it is covered by the scan.

The MSGraphFileScanner can scan based on both organizational units (OneDrive) and SharePoint. Normally, CoveredAccounts would only be created for the accounts covered by the scanner through the related organizational units when executing the scan. However, the tool cannot distinguish results stemming from OneDrive from those stemming from SharePoint, so it will create CoveredAccounts not only for the accounts covered through the related organizational units, but also for the accounts with results from SharePoint.