hhmx.de

Federation EN Fri 28.03.2025 14:47:35

A few days ago, we started converting our single-instance MariaDB database to a three-node Galera cluster. However, we had difficulties synchronizing the setup to the other nodes, and our experiments resulted in several short downtimes of our main service.

While our tests on various test clusters always worked, we never managed to reproduce that success on the production cluster. It kinda drove us crazy.

Now we've finally figured it out and can proceed with the next steps towards load balancing.

Federation EN Fri 28.03.2025 14:48:55

It looks like the name of our cluster (`wsrep_cluster_name`) was too long. We renamed it from
"Codeberg Galera Production Cluster"
to
"Codeberg Galera Prod Cluster"
and now everything works. The names of our "staging" clusters were apparently always short enough.
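A pre-flight check like the following minimal Python sketch would catch an over-long name before any node gets restarted. It assumes the 32-character limit quoted from the Percona docs further down in this thread (Codeberg found no limit documented for MariaDB itself), and, as a cautious extra assumption, it counts UTF-8 bytes rather than characters. The function name `check_cluster_name` is hypothetical.

```python
# Assumption: 32 is the limit quoted from the Percona docs; no limit is
# documented for MariaDB itself, as far as Codeberg could find.
MAX_CLUSTER_NAME_LEN = 32


def check_cluster_name(name: str) -> int:
    """Return the name's length in UTF-8 bytes, or raise ValueError if it
    exceeds the assumed limit. Counting bytes instead of characters is a
    cautious assumption; for ASCII names both counts are the same."""
    length = len(name.encode("utf-8"))
    if length > MAX_CLUSTER_NAME_LEN:
        raise ValueError(
            f"wsrep_cluster_name {name!r} is {length} bytes, "
            f"over the assumed limit of {MAX_CLUSTER_NAME_LEN}"
        )
    return length


if __name__ == "__main__":
    for candidate in ("Codeberg Galera Production Cluster",
                      "Codeberg Galera Prod Cluster"):
        try:
            print(f"ok       ({check_cluster_name(candidate)} bytes): {candidate}")
        except ValueError as err:
            print(f"rejected: {err}")
```

Run as a script, it rejects the old name (34 characters) and accepts the new one (28 characters).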

Federation EN Fri 28.03.2025 14:50:20

@Codeberg 27 characters seems like a *super* weird limit!!

Federation EN Fri 28.03.2025 14:53:52

@dstndstn The limit is likely 32 characters.

Federation EN Fri 28.03.2025 15:07:17

@Codeberg ahh, I missed that you shortened "Production" to "Prod" as well as "Cluster" to "Cluste" :)

Federation EN Fri 28.03.2025 15:12:31

@dstndstn Oh, the latter was a copy-paste issue. We only shortened "Production" to "Prod".

Federation EN Fri 28.03.2025 14:59:36

@Codeberg Interesting. Took me quite some digging to find this rather vague notice:

"Note
It should not exceed 32 characters. A node cannot join the cluster if the cluster names do not match. You must re-bootstrap the cluster after a name change."

docs.percona.com/percona-xtrad

Google's "AI" hallucinates that the maximum is 255 characters.

Federation EN Fri 28.03.2025 15:07:10

@jwildeboer Oh, interesting. After figuring this out, I searched the MariaDB docs and couldn't find any mention of such a limit.
~f

Federation DE Fri 28.03.2025 15:00:24

@Codeberg thanks for sharing!

Federation EN Fri 28.03.2025 15:23:21

@Codeberg The more you know! Thanks for sharing :)

Federation EN Sat 29.03.2025 19:06:30

@Codeberg wow, who figured that one out 🥲

Federation EN Fri 28.03.2025 14:56:23

@Codeberg

> we had difficulties synchronizing the setup to the other nodes, and our experiments resulted in several short downtimes of our main service.

As a former Galera admin I have to say this is not at all an unusual experience 🙈

Federation EN Fri 28.03.2025 18:46:06

@smlx @Codeberg isn't this a read-only operation on main? Why does it lead to main instance downtime?

Federation EN Fri 28.03.2025 21:02:42

@sadmin
The main node (the donor) has to be started in bootstrap mode and restarted whenever the config changes, so testing required several restarts.
@smlx

Federation EN Sat 29.03.2025 21:25:27

@Codeberg @sadmin @smlx @johanneskastl

well, you could make one of the replicas the primary node when you reboot the bootstrap node :)

but yeah, it's a PITA. luckily, HAProxy switches over for us automatically: a small service on each node queries the Galera sync state and answers HAProxy's HTTP check.
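The thread doesn't show that service, so here is a minimal Python sketch of the idea under stated assumptions: a node counts as healthy only when `wsrep_local_state` is `4` (Synced) and `wsrep_ready` is `ON`, and everything else (joining, donor/desynced, unreachable) gets a 503 so HAProxy takes the node out of rotation. Treating a donor as down is a design choice of this sketch. `fetch_wsrep_status`, `make_handler` and `serve` are hypothetical names, and actually querying MariaDB is left as a stub.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

# wsrep_local_state value Galera reports for a fully synced node
# (wsrep_local_state_comment = "Synced").
SYNCED = "4"


def health_status(status: Dict[str, str]) -> int:
    """Map Galera status variables to an HTTP status code for HAProxy."""
    if status.get("wsrep_local_state") == SYNCED and status.get("wsrep_ready") == "ON":
        return 200
    return 503  # joining, donor/desynced, or unknown


def fetch_wsrep_status() -> Dict[str, str]:
    """Hypothetical stub: in practice, run SHOW GLOBAL STATUS LIKE 'wsrep_%'
    against the local node with a MariaDB client library and return
    name -> value pairs."""
    raise NotImplementedError


def make_handler(fetch: Callable[[], Dict[str, str]]):
    class GaleraCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            try:
                code = health_status(fetch())
            except Exception:
                # If the database cannot be queried, report the node as down.
                code = 503
            body = b"synced\n" if code == 200 else b"not synced\n"
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GaleraCheckHandler


def serve(port: int, fetch: Callable[[], Dict[str, str]]) -> None:
    """Block forever, answering health checks on the given port."""
    HTTPServer(("0.0.0.0", port), make_handler(fetch)).serve_forever()
```

On each node one would call something like `serve(9200, fetch_wsrep_status)` (the port is arbitrary) and point HAProxy's `option httpchk` at it, for example with `check port 9200` on each `server` line.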

Federation EN Sat 29.03.2025 21:27:34

@darix
The point is that you need to restart the single-instance MariaDB node in order to create a cluster (and our attempts failed multiple times). You cannot use cluster features before the cluster exists.
@sadmin @smlx @johanneskastl

Federation EN Sat 29.03.2025 21:28:11

@Codeberg @sadmin @smlx @johanneskastl oh, I know the pain :) someone's got to do it for openSUSE :)

Federation EN Sat 29.03.2025 12:30:27

@Codeberg we use mariabackup to synchronize the nodes in another open source project's infrastructure
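In MariaDB Galera, mariabackup is typically selected as the state snapshot transfer (SST) method through `wsrep_sst_method`. The thread doesn't say how that project configures it, so the Python sketch below just renders a plausible `[galera]` option-file section for one node. The function name `render_galera_section`, the provider library path, and the node names, addresses and credentials are all assumptions or placeholders.

```python
import configparser
import io
from typing import Sequence


def render_galera_section(cluster_name: str, node_name: str, node_address: str,
                          peers: Sequence[str], sst_user: str,
                          sst_password: str) -> str:
    """Render a hypothetical [galera] option-file section for one node."""
    # interpolation=None so passwords containing '%' are written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["galera"] = {
        "wsrep_on": "ON",
        # Provider library path differs between distributions (assumption).
        "wsrep_provider": "/usr/lib/galera/libgalera_smm.so",
        # Quotes are optional in option files but make the spaces explicit.
        "wsrep_cluster_name": f'"{cluster_name}"',
        "wsrep_cluster_address": "gcomm://" + ",".join(peers),
        "wsrep_node_name": node_name,
        "wsrep_node_address": node_address,
        # Joining nodes receive a full copy of the data via mariabackup.
        "wsrep_sst_method": "mariabackup",
        "wsrep_sst_auth": f"{sst_user}:{sst_password}",
        # Server settings Galera expects.
        "binlog_format": "ROW",
        "default_storage_engine": "InnoDB",
        "innodb_autoinc_lock_mode": "2",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Node names, addresses and credentials below are placeholders.
    print(render_galera_section(
        "Codeberg Galera Prod Cluster", "db1", "10.0.0.1",
        ["10.0.0.1", "10.0.0.2", "10.0.0.3"], "sst_user", "change-me"))
```

Generating the file per node keeps the cluster name and peer list identical everywhere, which matters because a node cannot join if the cluster names do not match.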