High availability (Pacemaker, DRBD & a floating IP)
This guide describes a reference architecture for running grommunio in a highly available (HA) two-node active/standby cluster with a third witness node. Storage is replicated at the block level with DRBD, the cluster is managed by Pacemaker/Corosync, and clients reach the active node through a floating virtual IP (VIP). On failover the whole stack — storage, IP and services — moves to the surviving node together.
Architecture overview
- Two data nodes (node1, node2) replicate a block device to each other with DRBD. At any time one is primary (active), the other secondary.
- The active node promotes DRBD, mounts the replicated volume as XFS at /grodata, bind-mounts the service data directories from /grodata/* onto the standard /var/lib/* paths, takes the floating VIP, and starts all grommunio/Gromox services.
- A witness node (node3) participates only in quorum voting. It holds no data and runs no grommunio services.
- Clients resolve grommunio.example.com via DNS to the VIP, which always lives on the active node.
Nodes and roles
| Role | Example host | Purpose |
|---|---|---|
| Data node 1 | node1 | DRBD peer; eligible to run the active stack |
| Data node 2 | node2 | DRBD peer; eligible to run the active stack |
| Witness / quorum | node3 | Quorum vote only: no data, no services |
| Floating VIP | 10.0.0.10/24 | The service address clients connect to |
Prerequisites
- Three nodes on the same OS and grommunio package versions (two data nodes plus one witness).
- A dedicated block device on each data node for DRBD (the same size on both).
- A spare IP address for the VIP on the cluster network interface (eth0 in the examples; use your actual NIC name, e.g. ens192).
- Reliable name resolution (/etc/hosts entries for all nodes), time synchronization (chrony), and SSH connectivity between nodes.
- Cluster packages installed on all nodes: pacemaker, corosync, a CRM shell (crmsh), drbd-utils and resource-agents.
Cluster stack: Corosync & Pacemaker
Corosync (/etc/corosync/corosync.conf) defines the cluster name and the three
member nodes; Pacemaker manages the resources on top.
A few cluster-wide properties matter for this design:
- Quorum with three nodes tolerates the loss of any one node (including the witness) without losing quorum.
- The witness is kept service-free with location constraints that score the VIP, the /grodata mount and the service group at -inf on node3.
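
The quorum claim above follows from simple majority arithmetic. The following is a minimal sketch, assuming one vote per node and no special two-node handling; `votes_needed` and `tolerated_failures` are hypothetical helper names, not part of Corosync or Pacemaker.

```shell
#!/bin/sh
# Hypothetical helpers illustrating majority quorum; assumes one vote per node.

# Votes required for quorum: a strict majority of all expected votes.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the remaining nodes still hold a majority.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

votes_needed 3         # three nodes: majority is 2
tolerated_failures 3   # three nodes: one node may fail
tolerated_failures 2   # two nodes: no node may fail, hence the witness
```

On a live cluster, `corosync-quorumtool -s` reports the actual expected and total votes.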
Storage: DRBD and /grodata
DRBD resource
A single DRBD resource (grodata, device /dev/drbd0) replicates the backing
disk between the two data nodes. After the initial full synchronization, DRBD is
handed to Pacemaker, which manages it as a promotable clone (clone-max=2,
promoted-max=1) so exactly one node is primary at a time.

    # Inspect replication state before and after any change
    drbdadm status
    cat /proc/drbd

Mount and bind-mount concept
On the active node /dev/drbd0 is XFS-mounted at /grodata. Each service's data
directory is then bind-mounted from /grodata onto its standard
/var/lib/* path, so the persistent data follows the DRBD volume on failover.
| Source under /grodata | Bind-mount target | Purpose |
|---|---|---|
| /grodata/mysql | /var/lib/mysql | MariaDB data directory |
| /grodata/redis | /var/lib/redis | Redis persistence |
| /grodata/gromox | /var/lib/gromox | Gromox mail store / store data |
| /grodata/grommunio-web | /var/lib/grommunio-web | Web data, sessions, index |
| /grodata/grommunio-antispam | /var/lib/grommunio-antispam | Antispam data |
| /grodata/grommunio-dav | /var/lib/grommunio-dav | DAV data |
| /grodata/grommunio-admin-api | /var/lib/grommunio-admin-api | Admin-API data |
In the cluster the bind-mounts are modeled as Filesystem primitives with
fstype=none and options=bind, collected in a group (grodata_binds) so they
all follow the DRBD mount.
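
Because the seven bind-mount primitives differ only in the directory name, they can be generated rather than typed by hand. The sketch below prints one Filesystem primitive per directory from the table above, using the `grodata_mount_bind_*` naming and monitor settings of this guide's example configuration; `print_bind_primitives` is a hypothetical helper, and the output is meant to be reviewed before it is pasted into a `crm configure` session.

```shell
#!/bin/sh
# Sketch: print one crm Filesystem primitive per bind-mounted data directory.
# print_bind_primitives is a hypothetical helper; the directory list comes
# from the bind-mount table in this guide.

print_bind_primitives() {
    for dir in mysql redis gromox grommunio-web grommunio-antispam \
               grommunio-dav grommunio-admin-api; do
        # Resource IDs in this guide use underscores instead of hyphens.
        id="grodata_mount_bind_$(printf '%s' "$dir" | tr '-' '_')"
        printf 'primitive %s Filesystem \\\n' "$id"
        printf '    params device="/grodata/%s" directory="/var/lib/%s" fstype=none options=bind \\\n' "$dir" "$dir"
        printf '    op monitor interval=20s timeout=40s\n'
    done
}

print_bind_primitives
```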
Pacemaker resources
Resource summary
| Resource | Agent | Role |
|---|---|---|
| groCluster | ocf:heartbeat:IPaddr2 | The floating VIP |
| ms_grodata | ocf:linbit:drbd (promotable) | DRBD primary/secondary |
| grodata_mount | ocf:heartbeat:Filesystem | XFS mount of /dev/drbd0 at /grodata |
| grodata_binds | group of Filesystem (bind) | The seven bind-mounts |
| grommunio_svc | group of systemd:* | The ordered grommunio/Gromox service stack |
Service start order (grommunio_svc)
The service group starts (and stops, in reverse) in a fixed order so dependencies come up first:
| # | Resource | Unit |
|---|---|---|
| 1 | mariadb | systemd:mariadb |
| 2 | redis-grommunio | systemd:redis@grommunio.service |
| 3 | php-fpm | systemd:php-fpm |
| 4 | gromox-http | systemd:gromox-http |
| 5 | gromox-midb | systemd:gromox-midb |
| 6 | gromox-zcore | systemd:gromox-zcore |
| 7 | gromox-event | systemd:gromox-event |
| 8 | gromox-timer | systemd:gromox-timer |
| 9 | gromox-imap | systemd:gromox-imap |
| 10 | gromox-pop3 | systemd:gromox-pop3 |
| 11 | gromox-delivery-queue | systemd:gromox-delivery-queue |
| 12 | gromox-delivery | systemd:gromox-delivery |
| 13 | grommunio-antispam | systemd:grommunio-antispam |
| 14 | grommunio-admin-api | systemd:grommunio-admin-api |
| 15 | nginx | systemd:nginx |
Ordering and colocation
The constraints enforce one logical chain (promote DRBD → mount /grodata →
provide the bind-mounts → bring up the VIP → start the services) and keep
everything colocated on the DRBD-primary node:
| Constraint | Effect |
|---|---|
| drbd_before_grodata | ms_grodata must be promoted before grodata_mount starts |
| grodata_before_binds | /grodata mounts before the bind-mounts |
| binds_before_grommunio_svc | Bind-mounts start before the services |
| grodata_before_ip / ip_before_grommunio_svc | The VIP is tied to the /grodata stack and starts before grommunio_svc |
| grodata_on_drbd (colocation) | /grodata runs on the DRBD-promoted node |
| grommunio_svc_on_* (colocation) | Services run together with the VIP, /grodata and the bind-mounts |
| no_*_on_witness (location, -inf) | The VIP, mount and services are excluded from the witness |
The full configuration is in the example below.
Postfix (MTA)
In this reference, Postfix runs under systemd and is not a cluster resource; it is configured separately on each node. Postfix integrates with grommunio via MySQL lookup maps and a milter:
| Parameter | Example value / path |
|---|---|
| myhostname | grommunio.example.com |
| virtual_mailbox_domains | mysql:/etc/postfix/grommunio-virtual-mailbox-domains.cf |
| virtual_mailbox_maps | mysql:/etc/postfix/grommunio-virtual-mailbox-maps.cf |
| virtual_alias_maps | mysql:/etc/postfix/grommunio-virtual-mailbox-alias-maps.cf |
| recipient_bcc_maps | mysql:/etc/postfix/grommunio-bcc-forwards.cf |
| virtual_transport | smtp:[::1]:24 |
| smtpd_milters | inet:localhost:11332 (when grommunio-antispam is active) |
You can run Postfix in any of three supported ways; choose per your operating model:
- systemd-only on each node (as above). A standby node can still send local system mail even though it holds no active Postfix role.
- As a Pacemaker resource added to the HA group, so it fails over with the rest of the stack.
- As a clone, running on both nodes simultaneously.
After any change, reconcile the documentation with postconf -n from the live
host.
Operations runbook
Status and health checks

    # Cluster
    crm_mon -1
    crm status
    crm configure show

    # Floating IP
    ip a | grep 10.0.0.10
    crm resource status groCluster

    # DRBD and the mount
    drbdadm status
    cat /proc/drbd
    mount | grep /grodata
    df -h /grodata

    # Services and logs
    systemctl --failed
    journalctl -u corosync -u pacemaker
    journalctl -fu gromox-http
    journalctl -fu grommunio-admin-api

Clean up and restart resources

    # Clear failure state (all resources, or a single one)
    crm resource cleanup
    crm resource cleanup <RESOURCE>

    # Restart an individual service
    crm resource restart gromox-http
    crm resource restart grommunio-admin-api
    crm resource restart nginx

Node standby / online

    crm node standby node1   # take a node out of resource placement
    crm node online node1    # make it eligible again
    crm node online node2    # keep the secondary ready for failover

Planned failover
- Check cluster health: crm_mon -1 (quorum present, no failed resources).
- Ensure the target node is online and not in standby (crm node online node2).
- Verify the DRBD sync state: drbdadm status and cat /proc/drbd.
- Put the active node into standby, or move the resources to the target node in a controlled way.
- Confirm on the target node: DRBD promoted, /grodata mounted, bind-mounts active, VIP up, services started.
- Validate application-side: web login, Admin API, IMAP/SMTP, mail queue.
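
The DRBD sync check in the list above can be scripted as a gate before the active node is put into standby. This is a minimal sketch, assuming DRBD 9 style `drbdadm status` output with `disk:` and `peer-disk:` fields; `drbd_in_sync` is a hypothetical helper that reads the status text on stdin, and it does not parse /proc/drbd.

```shell
#!/bin/sh
# Sketch: decide from `drbdadm status` text whether a planned failover is safe.
# drbd_in_sync is a hypothetical helper; it assumes DRBD 9 style output.

drbd_in_sync() {
    status=$(cat)
    # The local disk must be UpToDate ...
    printf '%s\n' "$status" | grep -Eq '(^|[[:space:]])disk:UpToDate' || return 1
    # ... at least one peer disk must be reported ...
    printf '%s\n' "$status" | grep -q 'peer-disk:' || return 1
    # ... and no peer disk may be in any state other than UpToDate.
    if printf '%s\n' "$status" | grep -o 'peer-disk:[A-Za-z]*' | grep -qv 'peer-disk:UpToDate$'; then
        return 1
    fi
    return 0
}

# Usage on the active data node:
#   drbdadm status grodata | drbd_in_sync && crm node standby node1
```

Gating the standby command this way keeps a failover from starting while the peer is still resynchronizing.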
Postfix operations

    postconf -n
    systemctl status postfix
    journalctl -fu postfix
    mailq
    postqueue -f       # flush the queue (attempt delivery of all queued mail)
    postsuper -d ALL   # delete ALL queued mail; only with operational sign-off

Backup and restore
Configuration to back up: /etc/corosync/*; the Pacemaker CIB
(crm configure show > cib.txt, optionally cibadmin --query > cib.xml);
/etc/drbd.d/*; /etc/fstab and the /grodata mount layout; /etc/gromox/*;
/etc/grommunio-common/* (including TLS material); /etc/grommunio-admin-api/*;
/etc/nginx/*; PHP/php-fpm configuration; /etc/postfix/*; and host/network
files (/etc/hosts, /etc/hostname, NetworkManager connections, sshd_config).
Data to back up: consistent MariaDB dumps and/or physical backups of
/grodata/mysql; a file-level backup of /grodata/gromox; the remaining
/grodata/* subdirectories (web, redis, dav, admin-api, antispam); TLS
certificates and private keys (with separate access control); and the secrets
from your password/secret manager.
Restore principle: provision a node with an identical OS/package base,
restore (or re-initialize) the DRBD configuration and backing disk, replay the
configuration from backup, restore the data under /grodata and verify the
bind-mounts, start MariaDB consistently and check the grommunio/Postfix maps,
then bring the cluster resources up in a controlled order and run acceptance
tests.
Security & hardening
- Configure STONITH/fencing before production use; the example configuration below disables it (stonith-enabled=false).
- Keep TLS private keys (/etc/grommunio-common/ssl/server.key) and the certificate bundle with restrictive file permissions.
- Hold secrets (database credentials, etc.) in a secret manager, never in the documentation or a repository.
- Harden SSH and restrict cluster/replication traffic to a trusted network.
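
The file-permission point above can be checked mechanically. The sketch below treats a key as acceptable when "others" have no access at all; that threshold is an assumption, so tighten it if your services do not need group read. `key_mode_ok` is a hypothetical helper and relies on GNU `stat -c`.

```shell
#!/bin/sh
# Sketch: verify that a private key grants no access to "others".
# key_mode_ok is a hypothetical helper; it relies on GNU stat (`stat -c`).

key_mode_ok() {
    # Octal permission bits, e.g. 600 or 640; return 2 if the file is unreadable.
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        *0) return 0 ;;
        *)  echo "$1: mode $mode grants access to others" >&2
            return 1 ;;
    esac
}

# Usage:
#   key_mode_ok /etc/grommunio-common/ssl/server.key
```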
Example configuration
A genericized crm configure show for the architecture above. Adapt node names,
the VIP, the NIC and your DRBD/fencing specifics; this is an example to work
from, not a drop-in.

    node 1: node1 attributes standby=off
    node 2: node2 attributes standby=off
    node 3: node3 attributes standby=off

    primitive groCluster IPaddr2 \
        params ip=10.0.0.10 cidr_netmask=24 nic=eth0 \
        op monitor interval=15s

    primitive grodata ocf:linbit:drbd \
        params drbd_resource=grodata \
        op monitor interval=15s role=Promoted \
        op monitor interval=30s role=Unpromoted

    primitive grodata_mount Filesystem \
        params device="/dev/drbd0" directory="/grodata" fstype=xfs \
        op monitor interval=20s

    # One bind-mount primitive per service data directory (fstype=none, options=bind)
    primitive grodata_mount_bind_mysql Filesystem \
        params device="/grodata/mysql" directory="/var/lib/mysql" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_redis Filesystem \
        params device="/grodata/redis" directory="/var/lib/redis" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_gromox Filesystem \
        params device="/grodata/gromox" directory="/var/lib/gromox" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_grommunio_web Filesystem \
        params device="/grodata/grommunio-web" directory="/var/lib/grommunio-web" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_grommunio_antispam Filesystem \
        params device="/grodata/grommunio-antispam" directory="/var/lib/grommunio-antispam" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_grommunio_dav Filesystem \
        params device="/grodata/grommunio-dav" directory="/var/lib/grommunio-dav" fstype=none options=bind \
        op monitor interval=20s timeout=40s
    primitive grodata_mount_bind_grommunio_admin_api Filesystem \
        params device="/grodata/grommunio-admin-api" directory="/var/lib/grommunio-admin-api" fstype=none options=bind \
        op monitor interval=20s timeout=40s

    # Service primitives (one per unit; same op timeouts); see the start-order table
    primitive mariadb systemd:mariadb \
        op monitor interval=30s timeout=30s \
        op start interval=0s timeout=60s \
        op stop interval=0s timeout=60s
    # … redis-grommunio, php-fpm, gromox-http, gromox-midb, gromox-zcore, gromox-event,
    # gromox-timer, gromox-imap, gromox-pop3, gromox-delivery-queue, gromox-delivery,
    # grommunio-antispam, grommunio-admin-api, nginx (identical pattern)

    group grodata_binds \
        grodata_mount_bind_mysql grodata_mount_bind_redis grodata_mount_bind_gromox \
        grodata_mount_bind_grommunio_web grodata_mount_bind_grommunio_antispam \
        grodata_mount_bind_grommunio_dav grodata_mount_bind_grommunio_admin_api

    group grommunio_svc \
        mariadb redis-grommunio php-fpm gromox-http gromox-midb gromox-zcore \
        gromox-event gromox-timer gromox-imap gromox-pop3 gromox-delivery-queue \
        gromox-delivery grommunio-antispam grommunio-admin-api nginx

    clone ms_grodata grodata \
        meta promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 \
        notify=true promotable=true interleave=true

    # Ordering: promote DRBD → mount → binds → VIP → services
    order drbd_before_grodata Mandatory: ms_grodata:promote grodata_mount:start
    order grodata_before_binds Mandatory: grodata_mount:start grodata_binds:start
    order grodata_before_ip Mandatory: grodata_mount groCluster
    order binds_before_grommunio_svc Mandatory: grodata_binds:start grommunio_svc:start
    order ip_before_grommunio_svc Mandatory: groCluster:start grommunio_svc:start

    # Colocation: keep the whole stack on the DRBD-primary node
    colocation grodata_on_drbd inf: grodata_mount ms_grodata:Promoted
    colocation grodata_binds_on_grodata inf: grodata_binds grodata_mount
    colocation clusterip_on_grodata inf: groCluster grodata_mount
    colocation grommunio_svc_on_grodata inf: grommunio_svc grodata_mount
    colocation grommunio_svc_on_binds inf: grommunio_svc grodata_binds
    colocation grommunio_svc_on_ip inf: grommunio_svc groCluster

    # Keep data and services off the witness node
    location no_grodata_on_witness groCluster -inf: node3
    location no_mount_on_witness grodata_mount -inf: node3
    location no_grommunio_on_witness grommunio_svc -inf: node3

    property cib-bootstrap-options: \
        have-watchdog=false \
        cluster-infrastructure=corosync \
        cluster-name=grommuniocluster \
        stonith-enabled=false
    rsc_defaults build-resource-defaults: \
        resource-stickiness=1