Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 14:47:35

A few days ago, we started converting our single-instance MariaDB database to a three-node Galera cluster. However, we had difficulties synchronizing the setup to the other nodes, and our experiments resulted in several short downtimes of our main service.

While the setup always worked on our various test clusters, we never managed to reproduce that success on the production one. It kinda drove us crazy.

Now we finally figured it out, and we can proceed with the next steps towards load-balancing.

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 14:48:55

It looks like the name of our cluster (`wsrep_cluster_name`) was too long. We renamed it from
"Codeberg Galera Production Cluster"
to
"Codeberg Galera Prod Cluster"
and now everything works. Our "staging" clusters were apparently always short enough for it to work.

Dustin Lang
@dstndstn@hachyderm.io

Federation EN Fri 28.03.2025 14:50:20

@Codeberg 27 characters seems like a *super* weird limit!!

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 14:53:52

@dstndstn The limit is likely 32 characters.

Dustin Lang
@dstndstn@hachyderm.io

Federation EN Fri 28.03.2025 15:07:17

@Codeberg ahh, I missed that you shortened "Production" to "Prod" as well as "Cluster" to "Cluste" :)

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 15:12:31

@dstndstn Oh, the latter was a copy-paste issue. We only shortened "Production" to "Prod".

Jan Wildeboer 😷 :krulorange:
@jwildeboer@social.wildeboer.net

Federation EN Fri 28.03.2025 14:59:36

@Codeberg Interesting. Took me quite some digging to find this rather vague notice:

"Note
It should not exceed 32 characters. A node cannot join the cluster if the cluster names do not match. You must re-bootstrap the cluster after a name change."

https://docs.percona.com/percona-xtradb-cluster/8.0/wsrep-system-index.html#wsrep_cluster_name

The Google "AI" hallucinates that it is max 255 characters.
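
Since the limit discussed here is easy to miss, a small pre-flight check might help catch it before a node restart. The following is only a sketch: the 32-character figure is taken from the Percona note quoted above (the thread could not find it in the MariaDB docs), and the function name `cluster_name_fits` is made up for illustration. It counts characters, which is an assumption; the server may well count bytes.

```python
# Assumed limit, taken from the Percona XtraDB Cluster note quoted above.
# Whether MariaDB Galera enforces exactly the same limit is not confirmed here.
MAX_CLUSTER_NAME_LEN = 32


def cluster_name_fits(name: str, limit: int = MAX_CLUSTER_NAME_LEN) -> bool:
    """Return True if the name is within the assumed character limit."""
    return len(name) <= limit


# The two names mentioned in the thread.
for name in ("Codeberg Galera Production Cluster", "Codeberg Galera Prod Cluster"):
    verdict = "ok" if cluster_name_fits(name) else "too long"
    print(f"{len(name)} chars, {verdict}: {name}")
```

With the two names from the thread, the first comes out at 34 characters and the second at 28, which matches the observation that only the shortened name worked.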

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 15:07:10

@jwildeboer Oh, interesting. After figuring this out, I searched the MariaDB docs and couldn't find any mention of such a limit.
~f

david
@david@vnecke.social

Federation DE Fri 28.03.2025 15:00:24

@Codeberg thanks for sharing!

Livia Weigel
@livia@sciences.social

Federation EN Fri 28.03.2025 15:23:21

@Codeberg The more you know! Thanks for sharing :)

Bart
@bartvdbraak@mstdn.social

Federation EN Sat 29.03.2025 19:06:30

@Codeberg wow who figured that one out🥲

Scott Leggett :fedi: :golang:
@smlx@fosstodon.org

Federation EN Fri 28.03.2025 14:56:23

@Codeberg

> we had difficulties synchronizing the setup to the other nodes, and our experiments resulted in several short downtimes of our main service.

As a former Galera admin I have to say this is not at all an unusual experience 🙈

sadmin
@sadmin@social.tchncs.de

Federation EN Fri 28.03.2025 18:46:06

@smlx @Codeberg isn't this a read-only operation on main? Why does it lead to main instance downtime?

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Fri 28.03.2025 21:02:42

@sadmin
The main node (donor node) needs to be started in bootstrap mode, and restarted whenever the config changes, so testing required several restarts.
@smlx

darix
@darix@mastodon.social

Federation EN Sat 29.03.2025 21:25:27

@Codeberg @sadmin @smlx @johanneskastl

well, you could make one of the replicas the primary node when you reboot the bootstrap node :)

but yeah, it is a PITA. luckily HAProxy switches for us automatically, with a small service on each node that queries the Galera sync state for the HTTP check
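
A per-node check service of the kind described here could look roughly like the sketch below. It is not the service darix runs; it only illustrates the idea. It assumes the node's `wsrep_local_state_comment` status value is `Synced` when the node is safe to route to, and it leaves the database lookup to a caller-supplied function (`get_state`, a hypothetical hook that would run something like `SHOW STATUS LIKE 'wsrep_local_state_comment'` with whatever client library is in use). The port number in the usage comment is arbitrary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def http_status_for(state_comment: str) -> int:
    """Map a Galera state comment to the HTTP status the load balancer sees."""
    # Only a fully synced node is reported healthy; states such as
    # "Joining", "Joined", or "Donor/Desynced" are reported as unavailable.
    return 200 if state_comment == "Synced" else 503


def make_handler(get_state: Callable[[], str]):
    """Build a request handler around a caller-supplied state lookup."""

    class GaleraCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                state = get_state()
            except Exception:
                # Treat any lookup failure as "not healthy".
                state = "unknown"
            body = f"wsrep_local_state_comment={state}\n".encode()
            self.send_response(http_status_for(state))
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        # Answer OPTIONS the same way, in case the load balancer
        # probes with that method instead of GET.
        do_OPTIONS = do_GET

    return GaleraCheckHandler


def serve(get_state: Callable[[], str], port: int) -> None:
    """Run the check service until interrupted."""
    HTTPServer(("0.0.0.0", port), make_handler(get_state)).serve_forever()


# Usage (hypothetical lookup function and port):
#   serve(query_wsrep_state_from_local_mariadb, 9200)
```

Keeping the state-to-status mapping in its own function makes the routing policy easy to change, for example to also accept a donor node when no other node is available.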

Codeberg.org
@Codeberg@social.anoxinon.de

Federation EN Sat 29.03.2025 21:27:34

@darix
The point is that you need to restart the single-instance MariaDB node in order to create a cluster (and our attempts failed multiple times). You cannot have cluster features before you create one.
@sadmin @smlx @johanneskastl

darix
@darix@mastodon.social

Federation EN Sat 29.03.2025 21:28:11

@Codeberg @sadmin @smlx @johanneskastl oh I know the pain :) someone got to do it for opensuse :)

darix
@darix@mastodon.social

Federation EN Sat 29.03.2025 12:30:27

@Codeberg we use mariabackup to synchronize our nodes in another open-source project's infrastructure
