Architecture¶
Renovate¶
Dependency updates are managed by Renovate. The configuration lives in renovate.json5 (root) and split files under .renovate/:
| File | Purpose |
|---|---|
renovate.json5 |
Root config — global settings and extends index |
.renovate/autoMerge.json5 |
Auto-merge policy for GitHub Actions |
.renovate/customManagers.json5 |
Regex managers for SOPS version, mise min_version, workflow versions |
.renovate/groups.json5 |
Grouped updates (postgres, mise) |
.renovate/labels.json5 |
PR labels by update type and datasource |
.renovate/packageRules.json5 |
Release age gates, stale-dependency flag, linuxserver versioning |
.renovate/semanticCommits.json5 |
Scoped commit messages with version arrows |
Update timing policy¶
All updates must meet a minimum release age before Renovate opens a PR, giving time for bad releases to be retracted:
| Update type | Manager / datasource | Minimum age |
|---|---|---|
| minor / patch | actions/* GitHub Actions |
3 days, auto-merged |
| digest | All GitHub Actions | 14 days, auto-merged |
| minor / patch | All other GitHub Actions | 14 days, manual merge |
| major | Everything | 14 days, manual merge |
| minor / patch / digest | Docker images | 14 days, manual merge |
| minor / patch / digest | GitHub Releases | 14 days, manual merge |
| minor / patch / digest | mise tools |
14 days, manual merge |
Auto-merges use automergeType: "branch" (direct push, no PR) and require CI to pass. Major updates always require a manual merge regardless of datasource.
Rule precedence note¶
packageRules are applied in the order they appear across all extends entries — last matching rule wins for each property. autoMerge.json5 is loaded before packageRules.json5, so packageRules.json5 must not contain a matchManagers: ["github-actions"] timing rule or it would override the 3-day exception for actions/*.
Commit Message Convention¶
All commits follow the Conventional Commits specification:
Common types: feat, fix, chore, docs, refactor, ci. The scope is typically the service folder name (e.g. feat(immich):, fix(traefik):). Compliance is enforced locally by a lefthook commit-msg hook using cog verify.
Release Process¶
Releases are version-tagged on main and automatically published as GitHub Releases via a CI workflow.
Creating a release¶
# Bump the minor version (updates CHANGELOG.md, commits, tags, and pushes)
cog bump --minor
# Or patch for bug-fix releases
cog bump --patch
# Dry-run to preview the next version without making changes
cog bump --minor --dry-run
cog bump orchestrates the full release:
- Calculates the next semver version from conventional commits since the previous tag
- Runs
git-cliff --tag <version> --output CHANGELOG.mdto regenerate the full changelog - Runs
dprint fmt CHANGELOG.mdto ensure the changelog passes CI formatting checks - Creates a
chore(release): bump version to <version>commit containing the changelog update - Creates the
v<version>git tag - Pushes the commit and tag to
origin
The tag push triggers .github/workflows/release.yml, which runs git-cliff --latest --strip all
to produce release-scoped notes and creates the GitHub Release automatically.
Tools¶
| Tool | Role |
|---|---|
cog |
Version bump, bump commit, git tag, push orchestration |
git-cliff |
Changelog generation (CHANGELOG.md + GitHub Release notes) |
cliff.toml |
Commit grouping, body template, GitHub commit link configuration |
cog.toml |
Bump hooks, tag prefix, merge-commit filtering |
Compose File Standards¶
Every service in this repo follows these conventions:
services:
example:
image: registry/image:tag@sha256:... # Always pin to digest
container_name: example # Explicit name for predictable references
env_file:
- .env # SOPS-decrypted secrets
- ../shared/env/tz.env # Shared timezone
user: "3100:3100" # Hardcoded UID:GID (see § UID/GID Allocation)
deploy:
restart_policy:
condition: on-failure # Restart only on crash (non-zero exit)
max_attempts: 3 # Stop after 3 rapid crashes within the window
window: 120s # Counter resets if the container is up > 2 min
networks:
- <service>-frontend # Traefik-facing network
mem_limit: ${MEM_LIMIT:-<default>} # Prevent runaway memory
pids_limit: 100 # Prevent fork-bomb DoS
security_opt:
- no-new-privileges=true # Block privilege escalation
cap_drop:
- ALL # Drop every capability …
# cap_add: # … re-add only what is provably needed
# - NET_BIND_SERVICE
read_only: true # Immutable root filesystem
tmpfs:
- /tmp # Writable scratch space
healthcheck: # Required for --wait deploys
test: [...]
interval: 15s
timeout: 5s
retries: 3
start_period: 10s
labels:
- "traefik.enable=true" # Opt-in to Traefik discovery
- "traefik.http.routers...middlewares=chain-auth@file"
Key rules:
- Images must always include an explicit registry prefix (e.g.
docker.io/library/busybox,ghcr.io/gethomepage/homepage). Bare image names likebusyboxoruser/imageare not allowed — Docker's implicitdocker.iodefault is not reliable across runtimes and Renovate cannot enforce the correct registry without it - Images are digest-pinned (
@sha256:...) — Renovate manages updates via PRs read_only: truewithtmpfsmounts for writable pathsno-new-privilegeson every container, no exceptionscap_drop: ALLon every container — this is a hard security requirement. If a container needs a specific capability, declarecap_addwith only the minimum required capability and add a comment on the container in the compose file explaining why the exception is necessary- Memory limits with env-var overrides for per-environment tuning
pids_limiton every container to prevent fork-bomb DoS- Health checks are mandatory —
dccd.shusesdocker compose up --wait - Volumes mounted
:rowherever the container only reads ./configvolumes must always be mounted:ro— config files are git-tracked and must never be modified by a container at runtime. If a service needs to write config at runtime, copy the file from./configto./datain an init container and mount the./datacopy read-write (see the gatus pattern). Any exception requires explicit approval and a comment in the compose file explaining why
Volume Permissions: Init Container Pattern¶
Named Docker volumes and bind-mounted directories are created as root:root by Docker. A container with a hardcoded non-root user: and cap_drop: ALL has no CAP_CHOWN and cannot fix this at runtime — it will fail to write on first deploy.
Rule: Any service with both of the following requires a <app>-init container:
user: "<UID>:<GID>"(explicit non-root)- At least one writable volume (named volume or bind mount)
The init container runs as root, chowns the volume paths to the service's UID:GID, and exits before the main container starts. The main service declares depends_on: <app>-init: condition: service_completed_successfully.
Bind-mount directories (./data, ./backups) that are runtime-only (gitignored) must be included in the init container's chown command, even when the main container mounts a path inside them as :ro. A host-level chown (e.g. a TrueNAS dataset permission reset) can make those directories unreadable or untraversable. The init container is the single recovery point that restores ownership on every deploy.
Git-tracked ./config directories must NEVER be chowned or chmod'd by an init container. Doing so changes file ownership away from the deploy user and causes git pull to fail with error: unable to unlink old '...': Permission denied. Config files checked out by git are already world-readable (644 files, 755 directories), so any container user can read them without ownership changes. If a service needs to write config at runtime, copy the file from ./config to ./data in the init container and mount the ./data copy into the main container (see the gatus pattern).
# Pattern — copy this block, adjust container_name, UID:GID, command paths, and volumes
# IMPORTANT: only chown ./data (runtime) paths — NEVER chown ./config (git-tracked)
<app>-init:
image: docker.io/library/busybox:1.37.0@sha256:1487d0af5f52b4ba31c7e465126ee2123fe3f2305d638e7827681e7cf6c83d5e
container_name: <app>-init
env_file:
- path: .env # Decrypted from secret.sops.env
required: false
restart: "no"
network_mode: none
mem_limit: 64m
pids_limit: 50
security_opt:
- no-new-privileges=true
cap_drop:
- ALL
cap_add:
- CHOWN # Required to chown volume paths
read_only: true
command:
- "sh"
- "-c"
- |-
chown -Rv <UID>:<GID> /data
volumes:
- ./data:/data
For services that only chown runtime-only paths (named Docker volumes, ./data/), the chmod 775/664 step and FOWNER/DAC_OVERRIDE capabilities can be omitted — only CHOWN is needed. Docker creates named volumes and ./data/ directories as root:root 755, so UID 0 is always the owner and can traverse them without DAC_OVERRIDE.
Exception — external bind-mount paths with non-root ownership: If the bind-mount source is a host directory owned by a non-root user (e.g., a TrueNAS dataset with truenas_admin:truenas_admin 770), UID 0 inside the container matches neither owner nor group and has no permissions. Busybox chown -Rv opens the directory before chowning it, which fails without DAC_OVERRIDE. Add DAC_OVERRIDE to any init container that chowns such a path.
Exceptions — images that manage their own permissions:
- s6-overlay images (LinuxServer, tiredofit/db-backup) start as root and chown their own directories during their own init phase. They do not need an external init container.
- Database images (postgres, MongoDB) initialise their own data directories. They do not need an external init container.
Services using this pattern:
| Service | Init container | Volumes chown'd |
|---|---|---|
| _bootstrap | content-init |
/mnt/archive-pool/content (full tree: mkdir + chown :3200 + setgid 2775) |
| adguard | adguard-init |
./data/work, ./data/conf |
| dozzle | dozzle-init |
./data |
| homepage | (removed) | None — config is git-tracked and read-only; no init needed |
| metube | metube-init |
./data/state |
| traefik | traefik-init |
./data/acme |
| traefik-forward-auth | traefik-forward-auth-init |
./data |
| immich | immich-init |
/mnt/archive-pool/private/photos/immich (+ DAC_OVERRIDE), ./data/model-cache |
| spottarr | spottarr-chown |
./data |
| gatus | gatus-init |
Copies ./config/config.yaml → ./data/sidecar-config/ (config mounted :ro) |
| home-assistant | home-assistant-init |
Seeds ./config/configuration.yaml → ./data/config/ on first deploy (cp -n) |
| outline | outline-init |
./data/data (chown to UID 1000 — image-internal node user) |
| hadiscover | hadiscover-init |
./data |
Exceptions — s6-overlay and root-start containers:
Some images cannot use read_only: true or user: because their init system (s6-overlay) requires a writable root filesystem and starts as root before dropping privileges internally. cap_drop: ALL is still required for these images — only the specific capabilities that s6-overlay needs are re-added via cap_add. Each such container must include a comment block in the compose file explaining the deviation. This applies to:
- LinuxServer images (e.g.,
unifi-network-application,plex) — usePUID/PGIDenvironment variables for internal privilege dropping; omituser:andread_only. Add backCHOWN,SETUID,SETGID, andSETPCAPviacap_add. - LinuxServer socket-proxy — runs as root by design to proxy the Docker socket. Does not support custom users, mods, or scripts. Omit
cap_drop: ALL;no-new-privilegesandread_onlyare still applied. - tiredofit/db-backup — uses
USER_DBBACKUP/GROUP_DBBACKUPfor internal privilege dropping; omituser:andread_only. - mvance/unbound — starts as root and drops privileges to the
_unbounduser internally; its startup script generatesunbound.confand creates subdirectories at runtime, so omituser:andread_only. - meeb/tubesync — uses its own
start.shinit script to create thePUID:PGIDuser, chown/config, and launch supervisord; omituser:andread_only:. Add backCHOWN,SETUID,SETGID, andSETPCAPviacap_add. - ghcr.io/home-assistant/home-assistant — uses s6-overlay (confirmed by
s6-rclog lines). Omituser:andread_only:. Add backCHOWN,SETUID,SETGID,SETPCAPviacap_add(standard s6-overlay set). Also addNET_RAW— required by HA's built-in DHCP watcher integration, which opens rawAF_PACKETsockets to track devices; without it HA logs[Errno 1] Operation not permittedat startup and the DHCP integration stops working. No TrueNAS service account or init container is required — s6-overlay manages/configownership internally.
Each exception is documented with a comment block in the compose file explaining why the deviation is necessary.
Minimum cap_add for LinuxServer/s6-overlay images:
| Capability | Why it is needed |
|---|---|
CHOWN |
s6-overlay chowns mounted volumes (e.g., /config) to PUID:PGID at startup |
SETUID |
s6-overlay calls setuid() to drop from root to PUID |
SETGID |
s6-overlay calls setgid() to drop from root to PGID |
SETPCAP |
s6-overlay clears the bounding capability set before exec-ing the application daemon |
All other default Docker capabilities (NET_RAW, NET_BIND_SERVICE, MKNOD, AUDIT_WRITE, SYS_CHROOT, FSETID, FOWNER, DAC_OVERRIDE, KILL) are dropped and not needed.
Pitfalls specific to s6-overlay images:
-
read_only: truesilently breaksPUID/PGID. s6-overlay writes the UID/GID entries to/etc/passwdand/etc/groupduring startup before dropping privileges. Withread_only: truethose writes fail silently and the container continues running as the image default (UID 911 for LinuxServer Plex), ignoringPUID/PGIDentirely. Always omitread_onlyfor s6-overlay images when working with subprcesses in the container such as Plex that need access to volumes. -
group_adddoes not grant supplementary groups to the application process.group_addadds GIDs to the credentials of PID 1 (s6-overlay, which runs as root). When s6-overlay drops privileges to run the application it re-initialises the process's supplementary groups from/etc/groupinside the container — where the host-only GID does not exist. The result is that the application process has no membership in the added group. To grant an s6-overlay image membership in a host GID, set that GID asPGID(primary group) or ensure the image's own group-setup mechanism adds it. The correct approach for LinuxServer images is to set the desired GID via thePGIDenv var; s6-overlay will then create the/etc/groupentry and the application process will run with that GID.
Config Template Substitution: Envsubst Init Containers¶
Some services need secrets or environment-specific values (domain names, API keys) injected into their configuration files at deploy time. Since these config files are committed to Git as templates with ${VAR} placeholders, a separate init container processes them before the main service starts.
Pattern: An <app>-init container mounts ./config as /templates:ro, runs envsubst.sh to replace ${VAR} placeholders with values from secret.sops.env, and writes the processed output to ./data/. The main container then mounts the processed file from data/ as :ro.
This keeps secrets out of Git (the template only contains placeholder names) while the processed config with real values lives in data/ which is gitignored.
Services using this pattern:
| Service | Init container | Template → Output |
|---|---|---|
| adguard (unbound) | adguard-unbound-init |
config/unbound/*.conf → data/unbound/*.conf |
| traefik-forward-auth | traefik-forward-auth-init |
config/config.yaml → data/config.yaml |
Networking: Per-Service Isolation¶
Each service gets its own frontend network (e.g., echo-server-frontend, homepage-frontend). Traefik joins each frontend network individually.
Why not a single shared traefik-public network?
Network-level isolation. With per-service networks, containers cannot communicate with each other — only with Traefik. A shared network would let any compromised container reach every other service. The trade-off is that adding a new service requires adding its network to Traefik's compose file.
Services that need Docker API access get a dedicated internal backend network with a socket proxy (e.g., homepage-backend). The same pattern applies to databases and other backing services — they sit on an internal backend network with internal: true, preventing external routing and ensuring only the application container can reach them.
Exception: arr-stack-backend¶
The arr stack (Radarr, Sonarr, Bazarr, Lidarr, Prowlarr, qBittorrent, SABnzbd, Spottarr) shares a single arr-stack-backend internal bridge network so the apps can communicate directly for API calls (e.g., Prowlarr pushing indexer results to Sonarr). This network is created by the _bootstrap service and referenced as external: true by each arr app. All internet traffic still exits through each app's dedicated VLAN 70 macvlan network — the backend bridge is internal: true and carries no internet route.
Gatus Internal Monitoring Entrypoint¶
Auth-protected services (those using chain-auth@file) redirect Gatus health checks to the OAuth login page, causing false-negative alerts. To monitor these services without bypassing network isolation, Traefik exposes a dedicated monitoring entrypoint on port 8444.
How it works:
- Each auth-protected service declares a secondary Traefik router (
<app>-monitor) that listens on themonitoringentrypoint and useschain-no-auth@fileinstead ofchain-auth@file. - Gatus sends HTTP requests to
http://172.30.100.6:8444with the appropriateHostheader. The IP is Traefik's static address ongatus-frontend(set viaipv4_addressintraefik/compose.yaml) to avoid Docker DNS falling through to the host's external resolver. - Traefik routes the request to the target container over that service's dedicated frontend network.
- Gatus endpoints also set
client.ignore-redirect: trueas defense-in-depth — if the monitoring router were misconfigured, Gatus would still detect a redirect rather than silently passing.
Security model — three independent layers:
- Unpublished port. Port 8444 does not appear in Traefik's
ports:mapping — the internet cannot reach it regardless of middleware misconfiguration. - Docker network isolation. Only containers that share a Docker network with Traefik can open a TCP connection to it. However, Traefik joins every service's frontend network, so this alone is insufficient — any container could reach
:8444. - Entrypoint-level
ipAllowList. Themonitoringentrypoint intraefik.ymlappliesmonitoring-ipallowlist@file(defined inmiddlewares.yml) which restricts source IPs to thegatus-frontendsubnet (172.30.100.0/29). This runs before any router-level middleware, so it blocks all traffic from other frontend subnets. Thegatus-frontendnetwork uses a fixed IPAM subnet ingatus/compose.yamlto make this deterministic.
All three layers must be defeated for a container on another frontend network to bypass auth via the monitoring entrypoint.
Why not a shared secret header instead of an IP allowlist? Any container with a socket proxy (Dozzle, Homepage) could read the secret from docker inspect labels, so the secret is only as strong as the weakest socket-proxy consumer. The IP allowlist is infrastructure-derived, not stored anywhere a container can read it, and cannot be brute-forced.
Traffic flow comparison:
Browser request (authenticated):
Browser → Host:443 → Traefik :443 → chain-auth → (sonarr-frontend) → Sonarr
Gatus health check (internal monitoring):
Gatus → (gatus-frontend 172.30.100.0/29) → Traefik :8444 [172.30.100.6]
→ ipAllowList ✓ → chain-no-auth → (sonarr-frontend) → Sonarr
Other container attempting to use monitoring entrypoint:
Sonarr → (sonarr-frontend 172.x.x.x) → Traefik :8444
→ ipAllowList ✗ (403 Forbidden)
Services with monitoring routers: All services monitored by Gatus have a -monitor router on the monitoring entrypoint. This includes both auth-protected services (which need it to bypass forward-auth) and no-auth services (which need it to avoid TLS/SNI issues when checking by IP address). Only the Gatus service itself is excluded (gatus.enabled=false).
Configuration locations (keep in sync when changing the subnet):
| File | What to update |
|---|---|
services/gatus/compose.yaml |
gatus-frontend network ipam.config[0].subnet |
services/traefik/compose.yaml |
Traefik gatus-frontend ipv4_address |
services/traefik/config/rules/middlewares.yml |
monitoring-ipallowlist.ipAllowList.sourceRange |
services/traefik/config/traefik.yml |
Comment documenting the subnet (for reference) |
Docker Socket Proxy¶
Services never mount /var/run/docker.sock directly. Instead, each gets its own LinuxServer socket-proxy instance with minimal permissions:
CONTAINERS=1— read container metadata onlyPOST=0— read-only, no mutations (Homepage)- Separate proxy per service to prevent lateral movement
Why one proxy per service instead of sharing?
If Traefik and Homepage shared one proxy, compromising either would grant the attacker the union of both permission sets. Separate proxies enforce least privilege per consumer.
Directory Conventions¶
Each service follows a consistent layout:
services/<service>/
compose.yaml # Service definition — committed to Git
secret.sops.env # SOPS-encrypted secrets — committed to Git
config/ # Static configuration — committed to Git
data/ # Runtime data — NOT committed to Git
backups/ # Backup output — NOT committed to Git
compose.yaml defines the service: images, networks, volumes, labels, and resource limits. It is the source of truth for how the service runs and is always committed to Git.
secret.sops.env stores secrets (API keys, passwords, tokens) encrypted with SOPS + Age. Because the values are encrypted, the file is safe to commit to Git. At deploy time, the CD script decrypts it to .env which is excluded from Git.
config/ holds files that you author and version-control: configuration files, rule sets, and any other inputs the container reads at startup. For example, Traefik's config/ contains traefik.yml and the dynamic rules under rules/. These are mounted :ro into the container because the container should only read them, never write to them.
data/ holds files that are produced or mutated by the running container: databases, certificates, caches, state files, and other dynamic output. This directory lives only on the host machine and is excluded from Git via .gitignore. It is mounted read-write so the container can persist its runtime state across restarts.
Named Docker volumes are not used in this repo. All persistent container data uses bind mounts to ./data/ (or a subdirectory of it). This ensures that TrueNAS ZFS snapshots — taken at the dataset level — capture all container state without needing to snapshot opaque Docker-managed volumes. It also makes data locations explicit and auditable from the host filesystem.
ZFS datasets do not need to be created manually for individual services. The full dataset hierarchy is established once during initial setup (see README.md § Setup). Each services/<app>/ directory lives on the vm-pool/apps dataset (or a child dataset created at setup time). TrueNAS handles snapshots and replication of these datasets automatically — no per-service backup containers are needed for file-level data (only for databases, which require consistent pg_dump / mongodump exports).
backups/ holds database backup files produced by the backup sidecar container (e.g., tiredofit/db-backup). Like data/, this directory is excluded from Git and mounted read-write. Each backup type gets its own subdirectory (e.g., backups/db-backup/).
Secret Management¶
Secrets are encrypted with SOPS + Age and stored in git as secret.sops.env. The CD script decrypts them to .env at deploy time using an Age key stored on the TrueNAS host.
PUID and PGID are hardcoded directly in each service's compose.yaml so they are visible, auditable, and self-documenting. See § UID/GID Allocation.
UID/GID Allocation¶
Every service runs under a dedicated non-root user with a unique UID. Each user has an auto-created primary group with the same GID (UID = GID). This ensures file ownership is unambiguous in ls -la and allows fine-grained access control via TrueNAS group membership.
Naming Convention¶
TrueNAS service accounts follow the pattern svc-app-<name> (e.g., svc-app-traefik). This distinguishes them from human users and makes their purpose immediately clear in ls -la output.
ID Ranges¶
| Range | Purpose |
|---|---|
| 911 | Reserved for Plex (LinuxServer image default) |
| 3100–3199 | Per-app service accounts (UID = GID) |
| 3200+ | Shared purpose groups (no matching user account) |
Each service account has a matching svc-app-<name> group created at the same GID as its UID. These groups are GID reservations only — they exist to prevent TrueNAS from assigning the GID to an unrelated group in the future. The app's functional primary group is typically a shared purpose group (e.g., media at GID 3200), not the svc-app-* placeholder. There is no need to add truenas_admin or other users to the svc-app-* groups.
App Service Accounts¶
| UID/GID | TrueNAS user | Service(s) | Git-tracked config? |
|---|---|---|---|
| 3100 | svc-app-traefik |
traefik, traefik-init | Yes (./config) |
| 3101 | svc-app-adguard |
adguard, adguard-init, adguard-unbound-init | No (./data/conf) |
| 3102 | svc-app-homepage |
homepage, homepage-init | Yes (./config) |
| 3103 | svc-app-gatus |
gatus, gatus-db-backup | No |
| 3104 | svc-app-echo |
echo-server | No |
| 3105 | svc-app-tfa |
traefik-forward-auth, init | No (./data) |
| 3106 | svc-app-immich |
immich-server, immich-ml, immich-init | No |
| 3107 | svc-app-metube |
metube, metube-init | No |
| 3108 | svc-app-unifi |
unifi, unifi-db-backup | No |
| 3109 | svc-app-dozzle |
dozzle, dozzle-init | No |
| 3110 | svc-app-radarr |
radarr | No |
| 3118 | svc-app-tubesync |
tubesync | No |
| 3119 | svc-app-drawio |
drawio | No |
| 3120 | svc-app-outline |
outline-db-backup† | No |
| 3121 | svc-app-hadiscover |
hadiscover-api, hadiscover-init | No |
† The outlinewiki/outline image does not support PUID/PGID — it runs as the
image-internal node user (UID/GID 1000). UID 3120 is used only for the
db-backup sidecar. The Outline server itself runs without a user: directive;
an outline-init container pre-chowns ./data/data to UID 1000 so the node
process can write to the bind-mount path. See:
https://github.com/outline/outline/discussions/9452
Shared Purpose Groups¶
These groups have no matching user account. They grant cross-service access to shared datasets.
| GID | Group | Purpose | Used as primary group by |
|---|---|---|---|
| 3200 | media |
Read/write access to media datasets | Plex (UID 911), MeTube (UID 3107), Radarr (UID 3110), Bazarr (UID 3111), Lidarr (UID 3112), qBittorrent (UID 3114), SABnzbd (UID 3115), Sonarr (UID 3116), TubeSync (UID 3118) |
| 3202 | private-photos |
Access to private photos (Immich upload dir) | Immich (UID 3106) |
| 3203 | private-documents |
Access to private documents (reserved) | — |
Plex Exception¶
Plex stays at UID 911 (LinuxServer image default) with PGID 3200 (media). The s6-overlay init system manages permissions internally. UID 911 is reserved exclusively for Plex — no other service may use it. For naming consistency, create a svc-app-plex user on TrueNAS with UID 911 and primary group media (GID 3200).
TrueNAS Host Setup¶
Important: When creating service accounts in TrueNAS, always create the group first, then the user. If you rely on TrueNAS's "auto-create primary group" checkbox when creating a user, TrueNAS assigns the earliest available GID — which may not match the desired UID. By pre-creating the group with the correct GID, the auto-created primary group step is skipped and UID = GID is guaranteed.
Creation order for each app service account:
- Create group
svc-app-<name>with GID matching the UID (e.g., GID 3100) — this is a GID reservation to prevent conflicts - Create user
svc-app-<name>with UID matching the GID (e.g., UID 3100), primary group set to the app's functional group (e.g.,mediafor media apps, or thesvc-app-*placeholder for apps that don't need shared access) - For apps with git-tracked config (
./config): addtruenas_adminto the app's functional primary group — this grants group-write access to chown'd config files, allowinggit pullwithout permission conflicts
For shared purpose groups (media, private-photos, private-documents):
- Create the groups with the designated GIDs (3200, 3202, 3203).
- Configure the relevant service accounts' group memberships:
svc-app-plex(911): primary groupmedia(3200)svc-app-metube(3107): primary groupmedia(3200)svc-app-immich(3106): primary groupprivate-photos(3202)- Add
truenas_adminas an auxiliary group member of each group if admin access to those datasets is needed
apps Dataset ACLs¶
The git repo lives on the vm-pool/apps dataset. Because dccd.sh decrypts secret.sops.env → .env files into this tree, access must be restricted to prevent other users from reading secrets.
Owner: truenas_admin — allows git pull without sudo. Root does not need ownership because it bypasses all permission checks on Linux/ZFS.
Owning group: truenas_admin.
Configure the following Unix permissions on the vm-pool/apps dataset using the TrueNAS Unix Permissions Editor:
| Setting | Value |
|---|---|
| User | truenas_admin |
| Group | truenas_admin |
| User | Read ✓ Write ✓ Execute ✓ |
| Group | Read ✓ Write ✓ Execute ✓ |
| Other | No permissions |
Enable both Apply permissions recursively and Apply permissions to child datasets. Child datasets are created as root:root regardless of the parent's permissions, so this must be done after all child datasets exist.
This gives truenas_admin full access while blocking all other users from reading decrypted .env files containing secrets. Root does not need explicit permissions — it bypasses all permission checks.
Per-app config directories are handled separately by init containers, not by dataset-level permissions:
- Init containers chown
./configsubdirectories to the app's UID:GID with group-write (775/664) truenas_admin(a member of each app's primary group) gets group-write access via POSIX group permissions- Next deploy, the init container re-chowns everything (idempotent)
Shared Environment Files¶
Reusable env files live in services/shared/env/ and are referenced via relative paths in env_file blocks. They are committed to Git because they contain no secrets.
| File | Purpose | When to include |
|---|---|---|
tz.env |
Sets TZ=Europe/Amsterdam |
Every container |
UID and GID values are not stored in shared env files or in secret.sops.env. They are hardcoded directly in each service's compose.yaml (in the user: directive and init container commands) so they are visible, auditable, and not treated as secrets. See § UID/GID Allocation for the full allocation table.
Media Access¶
Troubleshooting: If a container cannot read or write media files, see TROUBLESHOOTING.md § Permissions.
All services that interact with media datasets share a single media group (GID 3200). Every media-touching service account on TrueNAS has media as its primary group. Unix permissions replace NFSv4 ACLs on these datasets.
Why not separate reader/writer groups? Consumer services (e.g., Plex) are already restricted to read-only at the kernel level via :ro Docker volume mounts — a filesystem-level write restriction would only be a secondary layer for a modest risk. The same media group for all services keeps the model simple, debuggable with plain ls -la, and trivially extensible to SMB (add a user to the group, done).
Each media service (e.g., MeTube) runs under its own dedicated UID so file ownership is auditable — ls -la shows which service wrote a file.
Dataset Layout¶
All media and download data lives under a single ZFS dataset archive-pool/content, mounted at /mnt/archive-pool/content/. No child datasets are created beneath it — everything is plain directories.
Why one dataset? Hardlinks only work within the same filesystem. When an arr app (Radarr, Sonarr) imports a finished download, it can create a hardlink from downloads/ to media/ instead of copying the file — but only if both paths are on the same ZFS dataset. Child datasets would act as separate filesystems and break this.
/mnt/archive-pool/content/
├── downloads/ ← download clients (arr stack)
│ ├── isos/
│ ├── torrents/ ← torrent client (qBittorrent, Deluge, etc.)
│ │ ├── movies/
│ │ ├── music/
│ │ └── tv/
│ └── usenet/ ← Usenet client (SABnzbd, NZBGet, etc.)
│ ├── incomplete/
│ └── complete/
│ ├── movies/
│ ├── music/
│ └── tv/
└── media/ ← final library; Plex reads this
├── audiobooks/
├── movies/
├── music/
├── study/
├── tv/
└── youtube/
└── metube/ ← MeTube writes here
All folder names are lowercase — Linux is case-sensitive and lowercase avoids ambiguity.
TrueNAS Scale Setup¶
On the TrueNAS host, create or confirm:
- A
mediagroup (GID 3200) — for all media-touching services - A
svc-app-plexuser (UID 911) with primary groupmedia(GID 3200) - UID 911 is fixed by the LinuxServer image; it cannot be changed via
PUID - UID 911 is reserved exclusively for Plex. No other service may use this UID unless strictly necessary, and any exception must be documented with a comment in the relevant compose file.
- A dedicated user per media service (e.g.,
svc-app-metubeat UID 3107) - Primary group:
media(GID 3200) - Use a distinct UID per tool so file ownership is unambiguous in
ls -la - To add a new media service: create its user with primary group
media— no dataset permission changes needed - Add
truenas_adminas an auxiliary group member ofmediafor admin access
Create the archive-pool/content dataset via the TrueNAS GUI as a Dataset (not a zvol) with the following settings:
| Setting | Value | Why |
|---|---|---|
| ACL Type | Off | Plain Unix permissions; NFSv4 adds complexity with no benefit |
| ACL Mode | Discard | Ensures chmod works cleanly without ACL interference |
| Compression | zstd |
Free compression on metadata and small files; video/audio files are already compressed |
| Enable Atime | Off | Prevents a write on every read; useless for media workloads |
| Exec | Off | No binaries run from this path; init containers use their own image layer, not this mount |
Do not create child datasets beneath it — everything under content/ must be plain directories on the same filesystem for hardlinks and atomic moves to work.
The content-init container (in the _bootstrap service) creates the full directory tree, sets group ownership to media (GID 3200), and applies the setgid bit (2775) on all directories on every deploy. The _bootstrap service deploys first because its directory name sorts before all other services alphabetically. No manual shell setup is needed after the dataset exists.
The setgid bit (2775) on every directory causes new files and subdirectories to inherit the media group automatically. UMASK=002 in writing services ensures new files are created as 664 (group-readable).
Container Configuration¶
All media-touching services hardcode GID 3200 (media). Consumer services mount paths :ro:
environment:
- PUID=911
- PGID=3200 # media group — all media-touching services use this GID
volumes:
- /mnt/archive-pool/content/media/movies:/media/movies:ro
Media-writing services omit :ro and set UMASK=002 so created files are group-readable (664) and directories group-traversable (775):
user: "3107:3200" # svc-app-metube:media
environment:
- UMASK=002
volumes:
- /mnt/archive-pool/content/media/youtube/metube:/downloads # read-write; no :ro
Future arr apps (Radarr, Sonarr) must mount the entire /mnt/archive-pool/content/ root so that downloads/ and media/ are on the same filesystem inside the container — this is what enables hardlinks and atomic moves:
Plex exception: The LinuxServer Plex image starts as root and drops to
PUID:PGIDvia s6-overlay —read_only: truebreaks this silently, so it is omitted.user:is also omitted for the same reason. Despite this, Plex ends up running as 911:3200 matching the dataset group ownership. UID 911 is reserved for Plex — no other service may use it unless strictly necessary, and any exception must be documented with a comment in the relevant compose file.
Service Summary¶
| Service | UID | Primary group | Media mount | UMASK |
|---|---|---|---|---|
| Plex | 911 (image-fixed) | 3200 (media) |
:ro |
— |
| MeTube | 3107 | 3200 (media) |
read-write | 002 |
| Radarr | 3110 | 3200 (media) |
read-write | 002 |
| Bazarr | 3111 | 3200 (media) |
read-write | 002 |
| Lidarr | 3112 | 3200 (media) |
read-write | 002 |
| qBittorrent | 3114 | 3200 (media) |
read-write | 002 |
| SABnzbd | 3115 | 3200 (media) |
read-write | 002 |
| Sonarr | 3116 | 3200 (media) |
read-write | 002 |
| TubeSync | 3118 | 3200 (media) |
read-write | — |
Private Storage: Access Model¶
Private data (photos, documents) is intentionally separated from the shared media group hierarchy. Each category of private data gets its own dedicated group, ensuring services can only access the specific subdirectory they need — Immich cannot read a future documents directory, and a future documents service cannot read photos.
Isolation Model¶
Access isolation is enforced at two layers:
-
Parent dataset (
/mnt/archive-pool/private): Owned bytruenas_admin:truenas_adminwith Unix permissions 770 (no access for others). Same model as theappsdataset. This prevents any service account from traversing the parent path unless Docker mounts it directly — and Docker bind-mounts are resolved by the root daemon, so the container does not need host-path traversal rights. -
Subdirectory ownership via init containers: Each service's init container chowns its specific subdirectory to the service's UID:GID. Because the parent dataset is root-inaccessible to service accounts, a service that doesn't have its path bind-mounted cannot reach sibling directories even if it somehow escapes its container.
Per-Category Group Allocation¶
Each private data category has its own group. Services only receive the group for their specific category:
| GID | Group | Subdirectory | Service |
|---|---|---|---|
| 3202 | private-photos |
/mnt/archive-pool/private/photos/immich |
Immich (UID 3106) |
| 3203 | private-documents |
/mnt/archive-pool/private/documents/... |
Reserved |
truenas_admin is added as an auxiliary group member of each group, granting admin access to each category's subdirectory after the init container sets ownership.
TrueNAS Host Setup¶
On the TrueNAS host, create or confirm:
- A
private-photosgroup (GID 3202), withtruenas_adminas an auxiliary member - A
svc-app-immichuser (UID 3106) with primary groupprivate-photos(GID 3202)
On the parent private dataset (/mnt/archive-pool/private), using the TrueNAS Unix Permissions Editor:
| Setting | Value |
|---|---|
| User | truenas_admin |
| Group | truenas_admin |
| User | Read ✓ Write ✓ Execute ✓ |
| Group | Read ✓ Write ✓ Execute ✓ |
| Other | No permissions |
No NFSv4 ACLs are needed on the parent dataset. Subdirectory permissions are managed entirely by init containers.
The init container chowns the service-specific subdirectory to the service UID:GID on every deploy. This is the single recovery point that restores access after any host-level permission reset.
Container Configuration¶
Private-data containers hardcode the category-specific GID in user: directives and the init container:
Adding a New Private-Data Service¶
- Allocate the next GID from the
private-documentsrow (3203+) in the Shared Purpose Groups table - Create the group on TrueNAS with that GID, add
truenas_adminas auxiliary member - Create the service account user with its UID and the new group as primary
- Add an init container that chowns the service's specific subdirectory under
/mnt/archive-pool/private/ - Bind-mount only that subdirectory into the container — never the parent
private/path
Multi-Server Deployment¶
This repository supports deploying apps to multiple servers beyond the primary TrueNAS host. Server-app mappings are defined in servers.yaml at the repo root.
servers.yaml¶
The servers.yaml file maps servers to the apps they should deploy. Schema is validated by servers.schema.json.
servers:
svlazext:
description: "Azure VM — DNS (AdGuard + Unbound) and VPN"
age_public_key: "age1..."
apps:
- adguard
svlazextpub:
description: "Azure VM — Public web server"
age_public_key: "age1..."
apps:
- traefik
TrueNAS (svlnas) uses TrueNAS mode (-t) which has its own app discovery, but is listed in servers.yaml for SOPS key scoping.
Deploying to a Server¶
Use the -S <server> flag with dccd.sh:
# Deploy only apps assigned to svlazext
bash scripts/dccd.sh -d /opt/apps -S svlazext -k /opt/apps/age.key -x shared -f
# Cron job example (runs every 5 minutes)
*/5 * * * * bash /opt/apps/scripts/dccd.sh -d /opt/apps -S svlazext -k /opt/apps/age.key -x shared
The -S flag:
- Reads
servers.yamland resolves the app list for the named server - Only decrypts
secret.sops.envfiles for those apps (not all apps) - Only deploys compose stacks for those apps
- Auto-detects server-specific compose overrides (
compose.<server>.yaml) - Is mutually exclusive with
-a(single app filter) and-t(TrueNAS mode) - Requires
yqonPATH
Compose Overrides¶
Some apps (notably Traefik) need different configurations per server. Server-specific compose override files use the naming convention:
When dccd.sh -S <server> detects a matching override file, it automatically applies it using Docker Compose's multi-file syntax (-f compose.yaml -f compose.<server>.yaml). Docker Compose's list-replacement semantics mean the override cleanly replaces sections like the network list.
Example: Traefik on svlnas joins 25+ app frontend networks, but Traefik on svlazext only needs adguard-frontend. The override at services/traefik/compose.svlazext.yaml replaces the network list and adjusts labels.
Shared config (traefik.yml, rules/, TLS options) is reused via the same volume mounts — no config duplication.
Per-Server Age Keys¶
Each server has its own Age keypair for SOPS decryption. The .sops.yaml creation_rules scope which servers can decrypt which app secrets:
creation_rules:
# adguard runs on svlnas + svlazext
- path_regex: services/adguard/secret\.sops\.env$
age: "deploy_key,svlnas_key,svlazext_key"
# traefik runs on svlnas + svlazextpub
- path_regex: services/traefik/secret\.sops\.env$
age: "deploy_key,svlnas_key,svlazextpub_key"
# fallback: new apps default to deploy + svlnas
- path_regex: secret\.sops\.env$
age: "deploy_key,svlnas_key"
Key roles:
- Deploy key: Lives on your dev machine. Can decrypt everything. Used for
sops -e/sops -dduring development. - Server keys: Each server stores only its own private key at
age.key. It can only decrypt secrets for apps assigned to it.
Generate rules from servers.yaml using:
The script reads the deploy key from age.key (the # public key: comment) and all server keys from servers.yaml. Servers without an apps list are treated as all-access.
After updating rules, re-encrypt all files: sops updatekeys services/<app>/secret.sops.env for each app.
Ansible Integration¶
Each remote server (Azure VMs) is managed by Ansible-pull which:
- Clones this repository to the configured directory (e.g.,
/opt/apps) - Installs
yq(required for server mode) - Places the server's Age private key at
<base_dir>/age.key - Sets up a cron job running
dccd.sh -S <server>at the desired interval
Retiring an App¶
Retirement is the inverse of adding an app. Use the prompt at .github/prompts/retire-docker-app.prompt.md for the full checklist — it covers removing compose files, Traefik networks/middleware, DNS records, documentation entries, and post-merge host cleanup.
Key mechanisms:
dccd-down <app>(ordccd.sh -R <app>): Server-aware teardown that applies compose overrides and uses the correct project name for each deployment mode.- Auto-cleanup: When
dccd.shpulls new commits that remove a service directory, it automatically detects the orphaned compose project and tears it down — no manual intervention needed on any server. - Retired services log: Add an entry to
docs/RETIRED-SERVICES.mdwith the retirement date, reason, and the last active commit hash so the old configuration is easy to find.