DRBD & Heartbeat – takeover/failback verification

DRBD 上の ext3 filesystem を Heartbeat 配下に設定し、 Primary を落としたときと、落としてからあげたときの syslog。

何が起きているのかさっぱりですが、時間を作って追っかけたいと思います。

とりあえず、 Primary で mount されてた /var/lib/mysql という directory は、 Primary の shutdown 後、ちゃんと Secondary で mount されました。そして落とした Primary の復旧後、ちゃんと Secondary から Primary に戻りました。

以下の document が参考になるかも。

以下は設定。

FILE: /etc/drbd.conf
--
resource r0 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error   detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on debiancluster1 {
device     /dev/drbd0;
disk       /dev/sdb1;
address    192.168.2.16:7788;
meta-disk  internal;
}
on debiancluster2 {
device    /dev/drbd0;
disk      /dev/sdb1;
address   192.168.2.17:7788;
meta-disk internal;
}
}
FILE: /etc/ha.d/ha.cf
--
logfacility     local0
ucast eth1 192.168.2.17
#ucast eth1 192.168.2.16
auto_failback on
node debiancluster1
node debiancluster2
FILE: /etc/ha.d/haresources
--
debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3

以下は Primary 側を強制 shutdown させたときの Secondary 側の syslog。

Sep 13 00:08:38 debiancluster2 kernel: drbd0: PingAck did not arrive in time.
Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_asender [6652]: cstate Connected --> NetworkFailure
Sep 13 00:08:38 debiancluster2 kernel: drbd0: asender terminated
Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate NetworkFailure --> BrokenPipe
Sep 13 00:08:38 debiancluster2 kernel: drbd0: short read expecting header on sock: r=-512
Sep 13 00:08:38 debiancluster2 kernel: drbd0: worker terminated
Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate BrokenPipe --> Unconnected
Sep 13 00:08:38 debiancluster2 kernel: drbd0: Connection lost.
Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate Unconnected --> WFConnection
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: node debiancluster1: is dead
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: No STONITH device configured.
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: Shared disks are not protected.
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: info: Resources being acquired from debiancluster1.
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: info: Link debiancluster1:eth1 dead.
Sep 13 00:08:56 debiancluster2 heartbeat: [6925]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Sep 13 00:08:56 debiancluster2 harc[6925]: info: Running /etc/ha.d/rc.d/status status
Sep 13 00:08:56 debiancluster2 heartbeat: [6926]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys debiancluster2] to acquire.
Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: debug: StartNextRemoteRscReq(): child count 1
Sep 13 00:08:56 debiancluster2 mach_down[6945]: info: Taking over resource group drbddisk::r0
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Acquiring resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 start
Sep 13 00:08:57 debiancluster2 kernel: drbd0: Secondary/Unknown --> Primary/Unknown
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=0
Sep 13 00:08:57 debiancluster2 Filesystem[7101]: INFO: Running status for /dev/drbd0 on /var/lib/mysql
Sep 13 00:08:57 debiancluster2 Filesystem[7101]: INFO: /var/lib/mysql is unmounted (stopped)
Sep 13 00:08:57 debiancluster2 Filesystem[7037]: INFO: Filesystem Resource is stopped
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start
Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start
Sep 13 00:08:57 debiancluster2 Filesystem[7210]: INFO: Running start for /dev/drbd0 on /var/lib/mysql
Sep 13 00:08:57 debiancluster2 kernel: kjournald starting.  Commit interval 5 seconds
Sep 13 00:08:57 debiancluster2 kernel: EXT3 FS on drbd0, internal journal
Sep 13 00:08:57 debiancluster2 kernel: EXT3-fs: recovery complete.
Sep 13 00:08:57 debiancluster2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 13 00:08:57 debiancluster2 Filesystem[7146]: INFO: Filesystem Success
Sep 13 00:08:58 debiancluster2 ResourceManager[6965]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start done. RC=0
Sep 13 00:08:58 debiancluster2 mach_down[6945]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Sep 13 00:08:58 debiancluster2 mach_down[6945]: info: mach_down takeover complete for node debiancluster1.
Sep 13 00:08:58 debiancluster2 heartbeat: [6088]: info: mach_down takeover complete.

以下は強制 shutdown した Primary を再起動したときの Primary 側の syslog。

Sep 13 09:11:24 debiancluster1 kernel: drbd: initialised. Version: 0.7.21 (api:79/proto:74)
Sep 13 09:11:24 debiancluster1 kernel: drbd: SVN Revision: 2326 build by root@debiancluster1, 2007-09-13 07:55:36
Sep 13 09:11:24 debiancluster1 kernel: drbd: registered as block device major 147
Sep 13 09:11:24 debiancluster1 kernel: drbd0: resync bitmap: bits=5208430 words=81382
Sep 13 09:11:24 debiancluster1 kernel: drbd0: size = 19 GB (20833720 KB)
Sep 13 09:11:24 debiancluster1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Sep 13 09:11:24 debiancluster1 kernel: drbd0: Found 6 transactions (267 active extents) in activity log.
Sep 13 09:11:24 debiancluster1 kernel: drbd0: Marked additional 809 MB as out-of-sync based on AL.
Sep 13 09:11:24 debiancluster1 kernel: drbd0: drbdsetup [2198]: cstate Unconfigured --> StandAlone
Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbdsetup [2211]: cstate StandAlone --> Unconnected
Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate Unconnected --> WFConnection
Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFConnection --> WFReportParams
Sep 13 09:11:25 debiancluster1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 13 09:11:25 debiancluster1 kernel: drbd0: Connection established.
Sep 13 09:11:25 debiancluster1 kernel: drbd0: I am(S): 1:00000002:00000001:00000003:00000002:10
Sep 13 09:11:25 debiancluster1 kernel: drbd0: Peer(P): 1:00000002:00000001:00000003:00000003:10
Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFReportParams --> WFBitMapT
Sep 13 09:11:25 debiancluster1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFBitMapT --> SyncTarget
Sep 13 09:11:25 debiancluster1 kernel: drbd0: Resync started as SyncTarget (need to sync 828856 KB [207214 bits set]).
Sep 13 09:11:30 debiancluster1 heartbeat: [2278]: WARN: Core dumps could be lost if multiple dumps occur
Sep 13 09:11:30 debiancluster1 heartbeat: [2278]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: info: **************************
Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: info: Configuration validated. Starting heartbeat 2.0.7
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: heartbeat: version 2.0.7
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: Heartbeat generation: 4
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_TriggerHandler: Added signal manual handler
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: bound send socket to device: eth1
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: bound receive socket to device: eth1
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: started on port 694 interface eth1 to 192.168.2.17
Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Local status now set to: 'up'
Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Link debiancluster2:eth1 up.
Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Status update for node debiancluster2: status active
Sep 13 00:11:42 debiancluster1 heartbeat: [2343]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Sep 13 00:11:42 debiancluster1 harc[2343]: info: Running /etc/ha.d/rc.d/status status
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Comm_now_up(): updating status to active
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Local status now set to: 'active'
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 480 ms (> 50 ms) (GSource: 0x5dfeb8)
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: remote resource transition completed.
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: remote resource transition completed.
Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Local Resource acquisition completed. (none)
Sep 13 00:11:44 debiancluster1 heartbeat: [2279]: info: debiancluster2 wants to go standby [foreign]
Sep 13 00:11:46 debiancluster1 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
Sep 13 00:11:46 debiancluster1 heartbeat: [2279]: info: standby: acquire [foreign] resources from debiancluster2
Sep 13 00:11:46 debiancluster1 heartbeat: [2354]: info: acquire local HA resources (standby).
Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: info: Acquiring resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3
Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 start
Sep 13 00:11:46 debiancluster1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary
Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=0
Sep 13 00:11:47 debiancluster1 Filesystem[2500]: INFO: Running status for /dev/drbd0 on /var/lib/mysql
Sep 13 00:11:47 debiancluster1 Filesystem[2500]: INFO: /var/lib/mysql is unmounted (stopped)
Sep 13 00:11:47 debiancluster1 Filesystem[2436]: INFO: Filesystem Resource is stopped
Sep 13 00:11:47 debiancluster1 ResourceManager[2364]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start
Sep 13 00:11:47 debiancluster1 ResourceManager[2364]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start
Sep 13 00:11:47 debiancluster1 Filesystem[2609]: INFO: Running start for /dev/drbd0 on /var/lib/mysql
Sep 13 00:11:49 debiancluster1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 13 00:11:49 debiancluster1 kernel: EXT3 FS on drbd0, internal journal
Sep 13 00:11:49 debiancluster1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 13 00:11:49 debiancluster1 Filesystem[2545]: INFO: Filesystem Success
Sep 13 00:11:49 debiancluster1 ResourceManager[2364]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start done. RC=0
Sep 13 00:11:49 debiancluster1 heartbeat: [2354]: info: local HA resource acquisition completed (standby).
Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: Standby resource acquisition done [foreign].
Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: Initial resource acquisition complete (auto_failback)
Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: remote resource transition completed.
Sep 13 00:13:40 debiancluster1 kernel: drbd0: Resync done (total 127 sec; paused 0 sec; 6524 K/sec)
Sep 13 00:13:40 debiancluster1 kernel: drbd0: drbd0_worker [2199]: cstate SyncTarget --> Connected

以下は強制 shutdown した Primary を再起動したときの Secondary 側の syslog。

Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFConnection --> WFReportParams
Sep 13 00:11:32 debiancluster2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 13 00:11:32 debiancluster2 kernel: drbd0: Connection established.
Sep 13 00:11:32 debiancluster2 kernel: drbd0: I am(P): 1:00000002:00000001:00000003:00000003:10
Sep 13 00:11:32 debiancluster2 kernel: drbd0: Peer(S): 1:00000002:00000001:00000003:00000002:10
Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFReportParams --> WFBitMapS
Sep 13 00:11:32 debiancluster2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFBitMapS --> SyncSource
Sep 13 00:11:32 debiancluster2 kernel: drbd0: Resync started as SyncSource (need to sync 828856 KB [207214 bits set]).
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Heartbeat restart on node debiancluster1
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Link debiancluster1:eth1 up.
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status init
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status up
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: debug: StartNextRemoteRscReq(): child count 1
Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: debug: get_delnodelist: delnodelist=
Sep 13 00:11:42 debiancluster2 heartbeat: [7260]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Sep 13 00:11:42 debiancluster2 harc[7260]: info: Running /etc/ha.d/rc.d/status status
Sep 13 00:11:42 debiancluster2 heartbeat: [7270]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Sep 13 00:11:42 debiancluster2 harc[7270]: info: Running /etc/ha.d/rc.d/status status
Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status active
Sep 13 00:11:43 debiancluster2 heartbeat: [7280]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Sep 13 00:11:43 debiancluster2 harc[7280]: info: Running /etc/ha.d/rc.d/status status
Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: remote resource transition completed.
Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: debiancluster2 wants to go standby [foreign]
Sep 13 00:11:44 debiancluster2 heartbeat: [6088]: info: standby: debiancluster1 can take our foreign resources
Sep 13 00:11:44 debiancluster2 heartbeat: [7290]: info: give up foreign HA resources (standby).
Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: info: Releasing resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3
Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop
Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop
Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: Running stop for /dev/drbd0 on /var/lib/mysql
Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: Trying to unmount /var/lib/mysql
Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: unmounted /var/lib/mysql successfully
Sep 13 00:11:45 debiancluster2 Filesystem[7335]: INFO: Filesystem Success
Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop done. RC=0
Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop
Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 stop
Sep 13 00:11:45 debiancluster2 kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Sep 13 00:11:46 debiancluster2 ResourceManager[7300]: debug: /etc/ha.d/resource.d/drbddisk r0 stop done. RC=0
Sep 13 00:11:46 debiancluster2 heartbeat: [7290]: info: foreign HA resource release completed (standby).
Sep 13 00:11:46 debiancluster2 heartbeat: [6088]: info: Local standby process completed [foreign].
Sep 13 00:11:46 debiancluster2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: WARN: 1 lost packet(s) for [debiancluster1] [17:19]
Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: remote resource transition completed.
Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: No pkts missing from debiancluster1!
Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: Other node completed standby takeover of foreign resources.
Sep 13 00:13:32 debiancluster2 kernel: drbd0: [drbd0_worker/6924] sock_sendmsg time expired, ko = 4294967295
Sep 13 00:13:40 debiancluster2 kernel: drbd0: Resync done (total 127 sec; paused 0 sec; 6524 K/sec)
Sep 13 00:13:40 debiancluster2 kernel: drbd0: drbd0_worker [6924]: cstate SyncSource --> Connected

Comments are closed.