
High availability (Pacemaker, DRBD & a floating IP)

This guide describes a reference architecture for running grommunio in a highly available (HA) two-node active/standby cluster with a third witness node. Storage is replicated at the block level with DRBD, the cluster is managed by Pacemaker/Corosync, and clients reach the active node through a floating virtual IP (VIP). On failover the whole stack — storage, IP and services — moves to the surviving node together.

  • Two data nodes (node1, node2) replicate a block device to each other with DRBD. At any time one is primary (active), the other secondary.
  • The active node promotes DRBD, mounts the replicated volume as XFS at /grodata, bind-mounts the service data directories from /grodata/* onto the standard /var/lib/* paths, takes the floating VIP, and starts all grommunio/Gromox services.
  • A witness node (node3) participates only in quorum voting. It holds no data and runs no grommunio services.
  • Clients resolve grommunio.example.com via DNS to the VIP, which always lives on the active node.
Highly available grommunio cluster: clients reach a floating VIP on the active node, which promotes DRBD, mounts /grodata, bind-mounts data directories and runs the service stack; a second node holds the DRBD secondary and a third node provides quorum.
Role               Example host    Purpose
Data node 1        node1           DRBD peer; eligible to run the active stack
Data node 2        node2           DRBD peer; eligible to run the active stack
Witness / quorum   node3           Quorum vote only: no data, no services
Floating VIP       10.0.0.10/24    The service address clients connect to

  • Three nodes on the same OS and grommunio package versions (two data nodes plus one witness).
  • A dedicated block device on each data node for DRBD (the same size on both).
  • A spare IP address for the VIP on the cluster network interface (eth0 in the examples — use your actual NIC name, e.g. ens192).
  • Reliable name resolution (/etc/hosts entries for all nodes), time synchronization (chrony), and SSH connectivity between nodes.
  • Cluster packages installed on all nodes: pacemaker, corosync, a CRM shell (crmsh), drbd-utils and resource-agents.

Corosync (/etc/corosync/corosync.conf) defines the cluster name and the three member nodes; Pacemaker manages the resources on top.
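
As a rough illustration, the snippet below writes a minimal corosync.conf for this layout to a scratch file instead of /etc/corosync/. The cluster name and node IDs follow the example configuration in this guide. The knet transport, the use of hostnames as ring0_addr and the logging section are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Emit an example corosync.conf for the three-node layout (node1, node2, node3).
# Assumptions: corosync 3.x with the knet transport; hostnames resolve via /etc/hosts.
write_corosync_example() {
    cat > "$1" <<'EOF'
totem {
    version: 2
    cluster_name: grommuniocluster
    transport: knet
}

nodelist {
    node {
        name: node1
        nodeid: 1
        ring0_addr: node1
    }
    node {
        name: node2
        nodeid: 2
        ring0_addr: node2
    }
    node {
        name: node3
        nodeid: 3
        ring0_addr: node3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}
EOF
}

example=$(mktemp)
write_corosync_example "$example"
echo "example written to $example"
```

Review the generated file before copying it to /etc/corosync/corosync.conf; the same file must be present on all three nodes.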

A few cluster-wide properties matter for this design:

  • Quorum with three nodes tolerates the loss of any one node (including the witness) without losing quorum.
  • The witness is kept service-free with location constraints that score the VIP, the /grodata mount and the service group at -inf on node3.
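
The quorum arithmetic behind the first point can be sketched in a few lines. This assumes one vote per node and a plain majority rule (floor(N/2) + 1), ignoring special corosync modes such as two_node; the function names are illustrative only.

```shell
#!/bin/sh
# Votes needed for a majority among N equally weighted nodes: floor(N/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the remainder still holds a majority.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3; do
    echo "nodes=$n quorum=$(quorum_needed "$n") tolerated=$(failures_tolerated "$n")"
done
# prints:
# nodes=2 quorum=2 tolerated=0
# nodes=3 quorum=2 tolerated=1
```

With only the two data nodes, a single failure would already cost the majority, which is why the third witness vote is part of this design.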

A single DRBD resource (grodata, device /dev/drbd0) replicates the backing disk between the two data nodes. After the initial full synchronization, DRBD is handed to Pacemaker, which manages it as a promotable clone (clone-max=2, promoted-max=1) so exactly one node is primary at a time.

Terminal window
# Inspect replication state before and after any change
drbdadm status
cat /proc/drbd
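
For scripted checks, the status output can be reduced to a yes/no answer. The helper below is a sketch that reads drbdadm status output on stdin and succeeds only if every disk and peer-disk state is UpToDate. It assumes the DRBD 9 key:value output layout, and the two sample texts are illustrative, not captured from a live system.

```shell
#!/bin/sh
# Return 0 when every disk state in `drbdadm status` output (read from stdin)
# is UpToDate and at least one peer-disk entry is present; return 1 otherwise.
drbd_in_sync() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(peer-)?disk:/) {
                    split($i, kv, ":")
                    if (kv[2] != "UpToDate") bad = 1
                    if ($i ~ /^peer-disk:/) peers++
                }
            }
        }
        END { exit (bad || peers == 0) ? 1 : 0 }
    '
}

# Illustrative samples (assumed DRBD 9 layout)
sample_ok='grodata role:Primary
  disk:UpToDate
  node2 role:Secondary
    peer-disk:UpToDate'

sample_syncing='grodata role:Primary
  disk:UpToDate
  node2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent done:42.10'

printf '%s\n' "$sample_ok" | drbd_in_sync && echo "sample_ok: in sync"
printf '%s\n' "$sample_syncing" | drbd_in_sync || echo "sample_syncing: not in sync"
```

On a cluster node the intended use would be along the lines of drbdadm status grodata | drbd_in_sync before any planned change.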

On the active node /dev/drbd0 is XFS-mounted at /grodata. Each service's data directory is then bind-mounted from /grodata onto its standard /var/lib/* path, so the persistent data follows the DRBD volume on failover.

Source under /grodata           Bind-mount target               Purpose
/grodata/mysql                  /var/lib/mysql                  MariaDB data directory
/grodata/redis                  /var/lib/redis                  Redis persistence
/grodata/gromox                 /var/lib/gromox                 Gromox mail store / store data
/grodata/grommunio-web          /var/lib/grommunio-web          Web data, sessions, index
/grodata/grommunio-antispam     /var/lib/grommunio-antispam     Antispam data
/grodata/grommunio-dav          /var/lib/grommunio-dav          DAV data
/grodata/grommunio-admin-api    /var/lib/grommunio-admin-api    Admin-API data

In the cluster the bind-mounts are modeled as Filesystem primitives with fstype=none and options=bind, collected in a group (grodata_binds) so they all follow the DRBD mount.
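
Because the seven primitives differ only in the directory name, they can be generated rather than typed. The loop below is a sketch that prints crmsh primitive definitions following the naming pattern used in the example configuration of this guide (grodata_mount_bind_<dir>, with "-" mapped to "_"). It only prints text and changes nothing in the cluster.

```shell
#!/bin/sh
# Print one crmsh Filesystem primitive per service data directory.
bind_primitives() {
    for dir in mysql redis gromox grommunio-web grommunio-antispam \
               grommunio-dav grommunio-admin-api; do
        # Resource IDs use underscores where the directory name has hyphens
        name="grodata_mount_bind_$(printf '%s' "$dir" | tr '-' '_')"
        printf 'primitive %s Filesystem \\\n' "$name"
        printf '    params device="/grodata/%s" directory="/var/lib/%s" fstype=none options=bind \\\n' "$dir" "$dir"
        printf '    op monitor interval=20s timeout=40s\n'
    done
}

bind_primitives
```

The output can be saved to a file, reviewed, and then loaded with crmsh (for example via crm configure load update <file>).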

Resource         Agent                           Role
groCluster       ocf:heartbeat:IPaddr2           The floating VIP
ms_grodata       ocf:linbit:drbd (promotable)    DRBD primary/secondary
grodata_mount    ocf:heartbeat:Filesystem        XFS mount of /dev/drbd0 at /grodata
grodata_binds    group of Filesystem (bind)      The seven bind-mounts
grommunio_svc    group of systemd:*              The ordered grommunio/Gromox service stack

The service group starts (and stops, in reverse) in a fixed order so dependencies come up first:

# Resource Unit
1 mariadb systemd:mariadb
2 redis-grommunio systemd:redis@grommunio
3 php-fpm systemd:php-fpm
4 gromox-http systemd:gromox-http
5 gromox-midb systemd:gromox-midb
6 gromox-zcore systemd:gromox-zcore
7 gromox-event systemd:gromox-event
8 gromox-timer systemd:gromox-timer
9 gromox-imap systemd:gromox-imap
10 gromox-pop3 systemd:gromox-pop3
11 gromox-delivery-queue systemd:gromox-delivery-queue
12 gromox-delivery systemd:gromox-delivery
13 grommunio-antispam systemd:grommunio-antispam
14 grommunio-admin-api systemd:grommunio-admin-api
15 nginx systemd:nginx
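
A Pacemaker group starts its members in the order they are listed and stops them in reverse, so the table maps directly onto a group definition. The sketch below generates the service primitives and the group line from one ordered list. The op timeouts are taken from the mariadb primitive in the example configuration; the unit name redis@grommunio for the redis-grommunio resource is an assumption to verify on your hosts (for example with systemctl list-units 'redis*').

```shell
#!/bin/sh
# resource=unit pairs, in start order (stop order is the reverse).
# Assumption: redis-grommunio maps to the templated unit redis@grommunio.
SERVICES="mariadb=mariadb
redis-grommunio=redis@grommunio
php-fpm=php-fpm
gromox-http=gromox-http
gromox-midb=gromox-midb
gromox-zcore=gromox-zcore
gromox-event=gromox-event
gromox-timer=gromox-timer
gromox-imap=gromox-imap
gromox-pop3=gromox-pop3
gromox-delivery-queue=gromox-delivery-queue
gromox-delivery=gromox-delivery
grommunio-antispam=grommunio-antispam
grommunio-admin-api=grommunio-admin-api
nginx=nginx"

# Print one systemd primitive per service.
service_primitives() {
    for pair in $SERVICES; do
        name=${pair%%=*}
        unit=${pair#*=}
        printf 'primitive %s systemd:%s \\\n' "$name" "$unit"
        printf '    op monitor interval=30s timeout=30s \\\n'
        printf '    op start interval=0s timeout=60s \\\n'
        printf '    op stop interval=0s timeout=60s\n'
    done
}

# Print the group line; member order defines the start order.
service_group() {
    printf 'group grommunio_svc'
    for pair in $SERVICES; do
        printf ' %s' "${pair%%=*}"
    done
    printf '\n'
}

service_primitives
service_group
```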

The constraints enforce one logical sequence: promote DRBD, mount /grodata, then provide the bind-mounts and bring up the VIP (both are ordered after the mount, but not relative to each other), and finally start the services. Colocation constraints keep everything on the DRBD-primary node:

Constraint                                     Effect
drbd_before_grodata                            ms_grodata must be promoted before grodata_mount starts
grodata_before_binds                           /grodata mounts before the bind-mounts
binds_before_grommunio_svc                     Bind-mounts start before the services
grodata_before_ip / ip_before_grommunio_svc    The VIP is tied to the /grodata stack and starts before grommunio_svc
grodata_on_drbd (colocation)                   /grodata runs on the DRBD-promoted node
grommunio_svc_on_* (colocation)                Services run together with the VIP, /grodata and the bind-mounts
no_*_on_witness (location, -inf)               The VIP, mount and services are excluded from the witness
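
A quick way to confirm that none of these constraints went missing after an edit is to compare the live configuration against the expected IDs. The helper below is a sketch that reads crm configure show output on stdin and prints every expected constraint ID it cannot find. The ID list mirrors the example configuration in this guide; the sample input is a deliberately incomplete, illustrative fragment.

```shell
#!/bin/sh
# Constraint IDs expected in this reference design.
EXPECTED="drbd_before_grodata grodata_before_binds grodata_before_ip
binds_before_grommunio_svc ip_before_grommunio_svc
grodata_on_drbd grodata_binds_on_grodata clusterip_on_grodata
grommunio_svc_on_grodata grommunio_svc_on_binds grommunio_svc_on_ip
no_grodata_on_witness no_mount_on_witness no_grommunio_on_witness"

# Read a crmsh configuration on stdin; print each expected ID that is absent.
missing_constraints() {
    cfg=$(cat)
    for id in $EXPECTED; do
        printf '%s\n' "$cfg" |
            grep -Eq "^(order|colocation|location) $id( |\$)" || echo "$id"
    done
}

# Illustrative, incomplete configuration fragment
sample='order drbd_before_grodata Mandatory: ms_grodata:promote grodata_mount:start
colocation grodata_on_drbd inf: grodata_mount ms_grodata:Promoted'

printf '%s\n' "$sample" | missing_constraints
```

On a cluster node the intended use would be crm configure show | missing_constraints; empty output means all expected constraints are present.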

The full configuration is in the example below.

In this reference, Postfix runs under systemd and is not a cluster resource — it is configured separately on each node. Postfix integrates with grommunio via MySQL lookup maps and a milter:

Parameter Example value / path
myhostname grommunio.example.com
virtual_mailbox_domains mysql:/etc/postfix/grommunio-virtual-mailbox-domains.cf
virtual_mailbox_maps mysql:/etc/postfix/grommunio-virtual-mailbox-maps.cf
virtual_alias_maps mysql:/etc/postfix/grommunio-virtual-mailbox-alias-maps.cf
recipient_bcc_maps mysql:/etc/postfix/grommunio-bcc-forwards.cf
virtual_transport smtp:[::1]:24
smtpd_milters inet:localhost:11332 (when grommunio-antispam is active)

You can run Postfix in any of three supported ways; choose per your operating model:

  • systemd-only on each node (as above). A standby node can still send local system mail even though it holds no active Postfix role.
  • As a Pacemaker resource added to the HA group, so it fails over with the rest of the stack.
  • As a clone, running on both nodes simultaneously.

After any change, reconcile the documentation with postconf -n from the live host.
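
That reconciliation can be partly automated. The sketch below compares postconf -n output (read from stdin) with the values from the parameter table and reports any line that is absent or different. smtpd_milters is left out because it only applies when grommunio-antispam is active; the sample input is illustrative.

```shell
#!/bin/sh
# Expected settings, written the way `postconf -n` prints them ("name = value").
expected_postfix='myhostname = grommunio.example.com
virtual_mailbox_domains = mysql:/etc/postfix/grommunio-virtual-mailbox-domains.cf
virtual_mailbox_maps = mysql:/etc/postfix/grommunio-virtual-mailbox-maps.cf
virtual_alias_maps = mysql:/etc/postfix/grommunio-virtual-mailbox-alias-maps.cf
recipient_bcc_maps = mysql:/etc/postfix/grommunio-bcc-forwards.cf
virtual_transport = smtp:[::1]:24'

# Read `postconf -n` output on stdin; print each expected line that is not present verbatim.
postfix_drift() {
    live=$(cat)
    printf '%s\n' "$expected_postfix" | while IFS= read -r line; do
        printf '%s\n' "$live" | grep -Fxq -- "$line" || echo "DRIFT: $line"
    done
}

# Illustrative live output in which only myhostname differs
sample_live='myhostname = mail.example.net
virtual_mailbox_domains = mysql:/etc/postfix/grommunio-virtual-mailbox-domains.cf
virtual_mailbox_maps = mysql:/etc/postfix/grommunio-virtual-mailbox-maps.cf
virtual_alias_maps = mysql:/etc/postfix/grommunio-virtual-mailbox-alias-maps.cf
recipient_bcc_maps = mysql:/etc/postfix/grommunio-bcc-forwards.cf
virtual_transport = smtp:[::1]:24'

printf '%s\n' "$sample_live" | postfix_drift
# prints: DRIFT: myhostname = grommunio.example.com
```

On a live host the intended use would be postconf -n | postfix_drift, run on each node since Postfix is configured per node in this reference.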

Terminal window
# Cluster
crm_mon -1
crm status
crm configure show
# Floating IP
ip a | grep -F 'inet 10.0.0.10/'
crm resource status groCluster
# DRBD and the mount
drbdadm status
cat /proc/drbd
mount | grep /grodata
df -h /grodata
# Services and logs
systemctl --failed
journalctl -u corosync -u pacemaker
journalctl -fu gromox-http
journalctl -fu grommunio-admin-api
Terminal window
# Clear failure state (all resources, or a single one)
crm resource cleanup
crm resource cleanup <RESOURCE>
# Restart an individual service
crm resource restart gromox-http
crm resource restart grommunio-admin-api
crm resource restart nginx
Terminal window
crm node standby node1 # take a node out of resource placement
crm node online node1 # make it eligible again
crm node online node2 # keep the secondary ready for failover
  1. Check cluster health: crm_mon -1 — quorum present, no failed resources.
  2. Ensure the target node is online and not in standby (crm node online node2).
  3. Verify the DRBD sync state: drbdadm status and cat /proc/drbd.
  4. Put the active node into standby, or move the resources to the target node in a controlled way.
  5. Confirm on the target node: DRBD promoted, /grodata mounted, bind-mounts active, VIP up, services started.
  6. Validate application-side: web login, Admin API, IMAP/SMTP, mail queue.
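
The checklist above can be turned into a reviewable command sequence. The function below is a sketch that only prints the commands for a switchover from one node to another; it executes nothing and does not replace reading the output of each step. All commands are the ones already used in this guide; the function name is illustrative.

```shell
#!/bin/sh
# Print the command sequence for a planned switchover from ACTIVE to TARGET.
# Nothing is executed; run the steps by hand and check each result.
failover_runbook() {
    active="$1"
    target="$2"
    cat <<EOF
crm_mon -1                       # 1. quorum present, no failed resources
crm node online $target          # 2. target online and not in standby
drbdadm status                   # 3. DRBD in sync on both nodes
crm node standby $active         # 4. move the stack off the active node
crm_mon -1                       # 5. resources started on $target
mount | grep /grodata            # 5. /grodata and bind-mounts active
crm resource status groCluster   # 5. VIP up
EOF
}

failover_runbook node1 node2
```

Step 6 (web login, Admin API, IMAP/SMTP, mail queue) stays a manual validation. Once it passes, the former active node can be made eligible again with crm node online.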
Terminal window
postconf -n
systemctl status postfix
journalctl -fu postfix
mailq
postqueue -f
postsuper -d ALL # DELETES every queued message (postqueue -f is the flush); only with operational sign-off

Configuration to back up:

  • /etc/corosync/*
  • the Pacemaker CIB (crm configure show > cib.txt, optionally cibadmin --query > cib.xml)
  • /etc/drbd.d/*
  • /etc/fstab and the /grodata mount layout
  • /etc/gromox/*
  • /etc/grommunio-common/* (including TLS material)
  • /etc/grommunio-admin-api/*
  • /etc/nginx/*
  • PHP/php-fpm configuration
  • /etc/postfix/*
  • host/network files (/etc/hosts, /etc/hostname, NetworkManager connections, sshd_config)
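
A file-level archive of these paths could be produced along the following lines. This is a sketch: the function name and archive location are illustrative. PHP/php-fpm, NetworkManager and sshd paths are omitted because they vary by distribution, and the Pacemaker CIB still has to be exported separately with crm configure show. The demo at the end runs against a throwaway directory tree rather than the real /etc.

```shell
#!/bin/sh
# Archive whichever of the listed configuration paths exist under ROOT into DEST.
backup_config() {
    root="$1"
    dest="$2"
    list=""
    for p in etc/corosync etc/drbd.d etc/fstab etc/gromox etc/grommunio-common \
             etc/grommunio-admin-api etc/nginx etc/postfix etc/hosts etc/hostname; do
        if [ -e "$root/$p" ]; then
            list="$list $p"
        fi
    done
    if [ -z "$list" ]; then
        echo "nothing to back up under $root" >&2
        return 1
    fi
    # Word splitting of $list is intended; none of the paths contain spaces.
    tar -C "$root" -czf "$dest" $list
}

# Demo against a scratch tree with placeholder files
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc/corosync" "$demo_root/etc/drbd.d"
echo "# placeholder" > "$demo_root/etc/corosync/corosync.conf"
echo "# placeholder" > "$demo_root/etc/drbd.d/grodata.res"
backup_config "$demo_root" "$demo_root/ha-config.tar.gz"
tar -tzf "$demo_root/ha-config.tar.gz"
```

On a real node the call would look like backup_config / /root/ha-config.tar.gz. The resulting archive contains TLS private keys and should be stored with correspondingly restrictive access.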

Data to back up:

  • consistent MariaDB dumps and/or physical backups of /grodata/mysql
  • a file-level backup of /grodata/gromox
  • the remaining /grodata/* subdirectories (web, redis, dav, admin-api, antispam)
  • TLS certificates and private keys (with separate access control)
  • the secrets from your password/secret manager

Restore principle:

  1. Provision a node with an identical OS/package base.
  2. Restore (or re-initialize) the DRBD configuration and backing disk.
  3. Replay the configuration from backup.
  4. Restore the data under /grodata and verify the bind-mounts.
  5. Start MariaDB consistently and check the grommunio/Postfix maps.
  6. Bring the cluster resources up in a controlled order and run acceptance tests.

  • Configure STONITH/fencing before production use; the example configuration below sets stonith-enabled=false.
  • Keep TLS private keys (/etc/grommunio-common/ssl/server.key) and the certificate bundle with restrictive file permissions.
  • Hold secrets (database credentials, etc.) in a secret manager — never in the documentation or a repository.
  • Harden SSH and restrict cluster/replication traffic to a trusted network.

A genericized crm configure show for the architecture above. Adapt node names, the VIP, the NIC and your DRBD/fencing specifics; this is an example to work from, not a drop-in.

Terminal window
node 1: node1 attributes standby=off
node 2: node2 attributes standby=off
node 3: node3 attributes standby=off
primitive groCluster IPaddr2 \
params ip=10.0.0.10 cidr_netmask=24 nic=eth0 \
op monitor interval=15s
primitive grodata ocf:linbit:drbd \
params drbd_resource=grodata \
op monitor interval=15s role=Promoted \
op monitor interval=30s role=Unpromoted
primitive grodata_mount Filesystem \
params device="/dev/drbd0" directory="/grodata" fstype=xfs \
op monitor interval=20s
# One bind-mount primitive per service data directory (fstype=none, options=bind)
primitive grodata_mount_bind_mysql Filesystem \
params device="/grodata/mysql" directory="/var/lib/mysql" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_redis Filesystem \
params device="/grodata/redis" directory="/var/lib/redis" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_gromox Filesystem \
params device="/grodata/gromox" directory="/var/lib/gromox" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_grommunio_web Filesystem \
params device="/grodata/grommunio-web" directory="/var/lib/grommunio-web" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_grommunio_antispam Filesystem \
params device="/grodata/grommunio-antispam" directory="/var/lib/grommunio-antispam" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_grommunio_dav Filesystem \
params device="/grodata/grommunio-dav" directory="/var/lib/grommunio-dav" fstype=none options=bind \
op monitor interval=20s timeout=40s
primitive grodata_mount_bind_grommunio_admin_api Filesystem \
params device="/grodata/grommunio-admin-api" directory="/var/lib/grommunio-admin-api" fstype=none options=bind \
op monitor interval=20s timeout=40s
# Service primitives (one per unit; same op timeouts) — see the start-order table
primitive mariadb systemd:mariadb \
op monitor interval=30s timeout=30s \
op start interval=0s timeout=60s \
op stop interval=0s timeout=60s
# … redis-grommunio, php-fpm, gromox-http, gromox-midb, gromox-zcore, gromox-event,
# gromox-timer, gromox-imap, gromox-pop3, gromox-delivery-queue, gromox-delivery,
# grommunio-antispam, grommunio-admin-api, nginx (identical pattern)
group grodata_binds \
grodata_mount_bind_mysql grodata_mount_bind_redis grodata_mount_bind_gromox \
grodata_mount_bind_grommunio_web grodata_mount_bind_grommunio_antispam \
grodata_mount_bind_grommunio_dav grodata_mount_bind_grommunio_admin_api
group grommunio_svc \
mariadb redis-grommunio php-fpm gromox-http gromox-midb gromox-zcore \
gromox-event gromox-timer gromox-imap gromox-pop3 gromox-delivery-queue \
gromox-delivery grommunio-antispam grommunio-admin-api nginx
clone ms_grodata grodata \
meta promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 \
notify=true promotable=true interleave=true
# Ordering: promote DRBD → mount → binds → VIP → services
order drbd_before_grodata Mandatory: ms_grodata:promote grodata_mount:start
order grodata_before_binds Mandatory: grodata_mount:start grodata_binds:start
order grodata_before_ip Mandatory: grodata_mount groCluster
order binds_before_grommunio_svc Mandatory: grodata_binds:start grommunio_svc:start
order ip_before_grommunio_svc Mandatory: groCluster:start grommunio_svc:start
# Colocation: keep the whole stack on the DRBD-primary node
colocation grodata_on_drbd inf: grodata_mount ms_grodata:Promoted
colocation grodata_binds_on_grodata inf: grodata_binds grodata_mount
colocation clusterip_on_grodata inf: groCluster grodata_mount
colocation grommunio_svc_on_grodata inf: grommunio_svc grodata_mount
colocation grommunio_svc_on_binds inf: grommunio_svc grodata_binds
colocation grommunio_svc_on_ip inf: grommunio_svc groCluster
# Keep data and services off the witness node
location no_grodata_on_witness groCluster -inf: node3
location no_mount_on_witness grodata_mount -inf: node3
location no_grommunio_on_witness grommunio_svc -inf: node3
property cib-bootstrap-options: \
have-watchdog=false \
cluster-infrastructure=corosync \
cluster-name=grommuniocluster \
stonith-enabled=false
rsc_defaults build-resource-defaults: \
resource-stickiness=1