Docker Compose
You can (and should) use docker compose to start the OSdatascanner system and its runtime
dependencies.
A docker-compose.yml for development is included in the repository. It
specifies the settings to start and connect all required services.
Admin
admin
Reachable on: http://localhost:8020
Runs the django application that provides the administration interface for defining and managing organisations, rules, scans etc.
admin_checkup_collector
Runs the collector service that saves ScheduledCheckup and UserErrorLog objects
to the admin database.
ScheduledCheckups are objects a Scanner must revisit at its next run.
UserErrorLogs are objects that represent some error during a scan, which can be displayed for
the user.
admin_cron
Runs supercronic pointed at docker/admin/crontab which is responsible for starting scheduled
Scannerjobs, import jobs and checking GraphGrant expiry.
admin_job_runner
Responsible for picking up and executing BackgroundJobs, which are our different import jobs.
admin_status_collector
Runs the collector service that saves ScanStatus, ScanStatusSnapshot
and MIMETypeProcessStat objects.
These are objects that relate to the progress of a Scanner run.
Report
report
Reachable on: http://localhost:8040
Runs the django application that provides the interface for accessing and handling reported matches.
report_cron
Runs supercronic pointed at docker/report/crontab which is responsible for sending out emails
and pushing some metrics.
report_email_tagger
Optional service/required only for labelling of O365 emails.
Reads from RabbitMQ os2ds_email_tags queue and labels O365 emails in Outlook.
report_event_collector
Runs the collector service that saves events sent from the admin module.
An event is an update or deletion of one or more model objects. F.e. Organization, Account
or others.
report_result_collector
Runs the collector service that saves match results to the database of the report module.
Depending on configuration, it can also write to the RabbitMQ queue: os2ds_email_tags,
which is read by the optional email_tagger
Engine
explorer
Runs the explorer stage of the engine.
exporter
Runs the exporter stage of the engine.
worker
Runs the worker stage of the engine, which includes:
processormatchertagger
These can be run separately, but are most commonly used bundled as a worker.
API
api_server
Runs the OSdatascanner API.
swagger_ui
Swagger UI for our api_server
Required infrastructure
db
Runs a postgres database server based on the official postgres docker image.
queue
Runs a RabbitMQ message queue server based on the official RabbitMQ docker image , including a plugin providing a web interface for monitoring (and managing) queues and users.
The web interface can be reached on: http://localhost:8030
Helper / one-shot services
admin_migrate
Responsible for applying database migrations from everything in the "admin" project.
frontend
Builds and bundles different frontend static files.
pg_upgrade
A helper service specific to upgrading PostgreSQL version.
report_migrate
Responsible for applying database migrations from everything in the "report" project.
Test data sources
init_SBSYS_DB
A helper service that pre-populates the SBSYS_DB with relevant test data.
mailhog
A SMTP-server for testing purposes.
Web interface available at http://localhost:8025/.
nginx
A webserver that exposes the same folder as samba.
samba
A Samba service that serves up some test files in a shared folder
called e2test. This can be useful to test the file scanner.
The Samba server is available as samba:139 inside the Docker environment
and localhost:8139 on the host machine. The full UNC of the shared folder
in the Docker environment is //samba/e2test.
The Samba server doesn't require a workgroup name, but it does require a
username (os2) and password (swordfish).
Thus, from the host: smbclient -p 8139 -U os2%swordfish //localhost/e2test
SBSYS_DB
An MSSQL server used to simulate an SBSYS database.
Authentication & SSO (testing)
idp
A simplesamlphp identity provider which can be used to test Single Sign-On.
ldap-server
An openldap server that can be used to test LDAP import.
ldap-server-admin
Graphical interface for ldap-server
localkeycloak.os2datascanner
A Keycloak instance used for LDAP import and Single Sign-On.
postgres-keycloak
PostgreSQL database for Keycloak.
traefik
Traefik instance.
Observability
cadvisor
Collects certain system metrics from containers.
grafana
Dashboards to display metrics.
prometheus
For collecting metrics and a place to scrape them from.
pushgateway
A prometheus helper, to push metrics to.
Machine learning
ai-classifier
Runs a text classifier that can be used in OSdatascanner Rule creation.
Developer tooling
docs
Runs an mkdocs image, for locally viewing our ReadTheDocs.
Profiles
The docker-compose.yml use --profiles which requires version
docker compose > 1.28.
To start the core components(engine, report- and admin interface, db
and queue) of datascanner, use
docker compose up -d
To start a service behind a profiles flag, use
docker compose --profile api up -d
The following profiles are available: ldap, sso, api, metric and
ai.
The development config files are stored in os2datascanner/dev-environment/
--profile ldap
The ldap profile defines a Keycloak instance and connected Postgres database,
as well as a OpenLDAP server and admin interface.
Be sure to enable import and structured org. features on the client in the admin module's django admin page.
--profile sso
Starts Keycloak and its db, and the idp service.
--profile cron
The cron profile starts two instances of supercronic,
one for the admin system and one for the report module, for testing that task
scheduling works properly.
These services are not enabled by default because settings that make sense in production ("start this scanner at midnight", "send emails out every day at 10am", or "synchronise this organisation every day at 3pm") don't normally make sense in the development environment.
--profile ai
The ai profile starts the ai-classifier helper component, which performs
additional classification on the results of certain special CPR and regex
matches.
--profile metric
Starts the observability stack.