DRBD 上の ext3 filesystem を Heartbeat 配下に設定し、 Primary を落としたときと、落としてからあげたときの syslog。
何が起きているのかさっぱりですが、時間を作って追っかけたいと思います。
とりあえず、 Primary で mount されてた /var/lib/mysql という directory は、 Primary の shutdown 後、ちゃんと Secondary で mount されました。そして落とした Primary の復旧後、ちゃんと Secondary から Primary に戻りました。
以下の document が参考になるかも。
以下は設定。
FILE: /etc/drbd.conf
--
resource r0 {
protocol C;
incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
startup {
degr-wfc-timeout 120;
}
disk {
on-io-error detach;
}
net {
}
syncer {
rate 10M;
group 1;
al-extents 257;
}
on debiancluster1 {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.2.16:7788;
meta-disk internal;
}
on debiancluster2 {
device /dev/drbd0;
disk /dev/sdb1;
address 192.168.2.17:7788;
meta-disk internal;
}
}
FILE: /etc/ha.d/ha.cf -- logfacility local0 ucast eth1 192.168.2.17 #ucast eth1 192.168.2.16 auto_failback on node debiancluster1 node debiancluster2
FILE: /etc/ha.d/haresources -- debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3
以下は Primary 側を強制 shutdown させたときの Secondary 側の syslog。
Sep 13 00:08:38 debiancluster2 kernel: drbd0: PingAck did not arrive in time. Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_asender [6652]: cstate Connected --> NetworkFailure Sep 13 00:08:38 debiancluster2 kernel: drbd0: asender terminated Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate NetworkFailure --> BrokenPipe Sep 13 00:08:38 debiancluster2 kernel: drbd0: short read expecting header on sock: r=-512 Sep 13 00:08:38 debiancluster2 kernel: drbd0: worker terminated Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate BrokenPipe --> Unconnected Sep 13 00:08:38 debiancluster2 kernel: drbd0: Connection lost. Sep 13 00:08:38 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate Unconnected --> WFConnection Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: node debiancluster1: is dead Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: No STONITH device configured. Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: WARN: Shared disks are not protected. Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: info: Resources being acquired from debiancluster1. Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: info: Link debiancluster1:eth1 dead. Sep 13 00:08:56 debiancluster2 heartbeat: [6925]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Sep 13 00:08:56 debiancluster2 harc[6925]: info: Running /etc/ha.d/rc.d/status status Sep 13 00:08:56 debiancluster2 heartbeat: [6926]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys debiancluster2] to acquire. Sep 13 00:08:56 debiancluster2 heartbeat: [6088]: debug: StartNextRemoteRscReq(): child count 1 Sep 13 00:08:56 debiancluster2 mach_down[6945]: info: Taking over resource group drbddisk::r0 Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Acquiring resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3 Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Running /etc/ha.d/resource.d/drbddisk r0 start Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 start Sep 13 00:08:57 debiancluster2 kernel: drbd0: Secondary/Unknown --> Primary/Unknown Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=0 Sep 13 00:08:57 debiancluster2 Filesystem[7101]: INFO: Running status for /dev/drbd0 on /var/lib/mysql Sep 13 00:08:57 debiancluster2 Filesystem[7101]: INFO: /var/lib/mysql is unmounted (stopped) Sep 13 00:08:57 debiancluster2 Filesystem[7037]: INFO: Filesystem Resource is stopped Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start Sep 13 00:08:57 debiancluster2 ResourceManager[6965]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start Sep 13 00:08:57 debiancluster2 Filesystem[7210]: INFO: Running start for /dev/drbd0 on /var/lib/mysql Sep 13 00:08:57 debiancluster2 kernel: kjournald starting. Commit interval 5 seconds Sep 13 00:08:57 debiancluster2 kernel: EXT3 FS on drbd0, internal journal Sep 13 00:08:57 debiancluster2 kernel: EXT3-fs: recovery complete. Sep 13 00:08:57 debiancluster2 kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 13 00:08:57 debiancluster2 Filesystem[7146]: INFO: Filesystem Success Sep 13 00:08:58 debiancluster2 ResourceManager[6965]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start done. RC=0 Sep 13 00:08:58 debiancluster2 mach_down[6945]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired Sep 13 00:08:58 debiancluster2 mach_down[6945]: info: mach_down takeover complete for node debiancluster1. Sep 13 00:08:58 debiancluster2 heartbeat: [6088]: info: mach_down takeover complete.
以下は強制 shutdown した Primary を再起動したときの Primary 側の syslog。
Sep 13 09:11:24 debiancluster1 kernel: drbd: initialised. Version: 0.7.21 (api:79/proto:74) Sep 13 09:11:24 debiancluster1 kernel: drbd: SVN Revision: 2326 build by root@debiancluster1, 2007-09-13 07:55:36 Sep 13 09:11:24 debiancluster1 kernel: drbd: registered as block device major 147 Sep 13 09:11:24 debiancluster1 kernel: drbd0: resync bitmap: bits=5208430 words=81382 Sep 13 09:11:24 debiancluster1 kernel: drbd0: size = 19 GB (20833720 KB) Sep 13 09:11:24 debiancluster1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map. Sep 13 09:11:24 debiancluster1 kernel: drbd0: Found 6 transactions (267 active extents) in activity log. Sep 13 09:11:24 debiancluster1 kernel: drbd0: Marked additional 809 MB as out-of-sync based on AL. Sep 13 09:11:24 debiancluster1 kernel: drbd0: drbdsetup [2198]: cstate Unconfigured --> StandAlone Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbdsetup [2211]: cstate StandAlone --> Unconnected Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate Unconnected --> WFConnection Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFConnection --> WFReportParams Sep 13 09:11:25 debiancluster1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74 Sep 13 09:11:25 debiancluster1 kernel: drbd0: Connection established. Sep 13 09:11:25 debiancluster1 kernel: drbd0: I am(S): 1:00000002:00000001:00000003:00000002:10 Sep 13 09:11:25 debiancluster1 kernel: drbd0: Peer(P): 1:00000002:00000001:00000003:00000003:10 Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFReportParams --> WFBitMapT Sep 13 09:11:25 debiancluster1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary Sep 13 09:11:25 debiancluster1 kernel: drbd0: drbd0_receiver [2212]: cstate WFBitMapT --> SyncTarget Sep 13 09:11:25 debiancluster1 kernel: drbd0: Resync started as SyncTarget (need to sync 828856 KB [207214 bits set]). Sep 13 09:11:30 debiancluster1 heartbeat: [2278]: WARN: Core dumps could be lost if multiple dumps occur Sep 13 09:11:30 debiancluster1 heartbeat: [2278]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: WARN: Logging daemon is disabled --enabling logging daemon is recommended Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: info: ************************** Sep 13 09:11:31 debiancluster1 heartbeat: [2278]: info: Configuration validated. Starting heartbeat 2.0.7 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: heartbeat: version 2.0.7 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: Heartbeat generation: 4 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_TriggerHandler: Added signal manual handler Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_TriggerHandler: Added signal manual handler Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: Removing /var/run/heartbeat/rsctmp failed, recreating. Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: bound send socket to device: eth1 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: bound receive socket to device: eth1 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: glib: ucast: started on port 694 interface eth1 to 192.168.2.17 Sep 13 09:11:31 debiancluster1 heartbeat: [2279]: info: G_main_add_SignalHandler: Added signal handler for signal 17 Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Local status now set to: 'up' Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Link debiancluster2:eth1 up. Sep 13 00:11:42 debiancluster1 heartbeat: [2279]: info: Status update for node debiancluster2: status active Sep 13 00:11:42 debiancluster1 heartbeat: [2343]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Sep 13 00:11:42 debiancluster1 harc[2343]: info: Running /etc/ha.d/rc.d/status status Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Comm_now_up(): updating status to active Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Local status now set to: 'active' Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 480 ms (> 50 ms) (GSource: 0x5dfeb8) Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: remote resource transition completed. Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: remote resource transition completed. Sep 13 00:11:43 debiancluster1 heartbeat: [2279]: info: Local Resource acquisition completed. (none) Sep 13 00:11:44 debiancluster1 heartbeat: [2279]: info: debiancluster2 wants to go standby [foreign] Sep 13 00:11:46 debiancluster1 kernel: drbd0: Secondary/Primary --> Secondary/Secondary Sep 13 00:11:46 debiancluster1 heartbeat: [2279]: info: standby: acquire [foreign] resources from debiancluster2 Sep 13 00:11:46 debiancluster1 heartbeat: [2354]: info: acquire local HA resources (standby). Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: info: Acquiring resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3 Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: info: Running /etc/ha.d/resource.d/drbddisk r0 start Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 start Sep 13 00:11:46 debiancluster1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary Sep 13 00:11:46 debiancluster1 ResourceManager[2364]: debug: /etc/ha.d/resource.d/drbddisk r0 start done. RC=0 Sep 13 00:11:47 debiancluster1 Filesystem[2500]: INFO: Running status for /dev/drbd0 on /var/lib/mysql Sep 13 00:11:47 debiancluster1 Filesystem[2500]: INFO: /var/lib/mysql is unmounted (stopped) Sep 13 00:11:47 debiancluster1 Filesystem[2436]: INFO: Filesystem Resource is stopped Sep 13 00:11:47 debiancluster1 ResourceManager[2364]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start Sep 13 00:11:47 debiancluster1 ResourceManager[2364]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start Sep 13 00:11:47 debiancluster1 Filesystem[2609]: INFO: Running start for /dev/drbd0 on /var/lib/mysql Sep 13 00:11:49 debiancluster1 kernel: kjournald starting. Commit interval 5 seconds Sep 13 00:11:49 debiancluster1 kernel: EXT3 FS on drbd0, internal journal Sep 13 00:11:49 debiancluster1 kernel: EXT3-fs: mounted filesystem with ordered data mode. Sep 13 00:11:49 debiancluster1 Filesystem[2545]: INFO: Filesystem Success Sep 13 00:11:49 debiancluster1 ResourceManager[2364]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 start done. RC=0 Sep 13 00:11:49 debiancluster1 heartbeat: [2354]: info: local HA resource acquisition completed (standby). Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: Standby resource acquisition done [foreign]. Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: Initial resource acquisition complete (auto_failback) Sep 13 00:11:49 debiancluster1 heartbeat: [2279]: info: remote resource transition completed. Sep 13 00:13:40 debiancluster1 kernel: drbd0: Resync done (total 127 sec; paused 0 sec; 6524 K/sec) Sep 13 00:13:40 debiancluster1 kernel: drbd0: drbd0_worker [2199]: cstate SyncTarget --> Connected
以下は強制 shutdown した Primary を再起動したときの Secondary 側の syslog。
Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFConnection --> WFReportParams Sep 13 00:11:32 debiancluster2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74 Sep 13 00:11:32 debiancluster2 kernel: drbd0: Connection established. Sep 13 00:11:32 debiancluster2 kernel: drbd0: I am(P): 1:00000002:00000001:00000003:00000003:10 Sep 13 00:11:32 debiancluster2 kernel: drbd0: Peer(S): 1:00000002:00000001:00000003:00000002:10 Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFReportParams --> WFBitMapS Sep 13 00:11:32 debiancluster2 kernel: drbd0: Primary/Unknown --> Primary/Secondary Sep 13 00:11:32 debiancluster2 kernel: drbd0: drbd0_receiver [3462]: cstate WFBitMapS --> SyncSource Sep 13 00:11:32 debiancluster2 kernel: drbd0: Resync started as SyncSource (need to sync 828856 KB [207214 bits set]). Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Heartbeat restart on node debiancluster1 Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Link debiancluster1:eth1 up. Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status init Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status up Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: debug: StartNextRemoteRscReq(): child count 1 Sep 13 00:11:42 debiancluster2 heartbeat: [6088]: debug: get_delnodelist: delnodelist= Sep 13 00:11:42 debiancluster2 heartbeat: [7260]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Sep 13 00:11:42 debiancluster2 harc[7260]: info: Running /etc/ha.d/rc.d/status status Sep 13 00:11:42 debiancluster2 heartbeat: [7270]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Sep 13 00:11:42 debiancluster2 harc[7270]: info: Running /etc/ha.d/rc.d/status status Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: Status update for node debiancluster1: status active Sep 13 00:11:43 debiancluster2 heartbeat: [7280]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Sep 13 00:11:43 debiancluster2 harc[7280]: info: Running /etc/ha.d/rc.d/status status Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: remote resource transition completed. Sep 13 00:11:43 debiancluster2 heartbeat: [6088]: info: debiancluster2 wants to go standby [foreign] Sep 13 00:11:44 debiancluster2 heartbeat: [6088]: info: standby: debiancluster1 can take our foreign resources Sep 13 00:11:44 debiancluster2 heartbeat: [7290]: info: give up foreign HA resources (standby). Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: info: Releasing resource group: debiancluster1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/mysql::ext3 Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop Sep 13 00:11:44 debiancluster2 ResourceManager[7300]: debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: Running stop for /dev/drbd0 on /var/lib/mysql Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: Trying to unmount /var/lib/mysql Sep 13 00:11:45 debiancluster2 Filesystem[7399]: INFO: unmounted /var/lib/mysql successfully Sep 13 00:11:45 debiancluster2 Filesystem[7335]: INFO: Filesystem Success Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/mysql ext3 stop done. RC=0 Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: info: Running /etc/ha.d/resource.d/drbddisk r0 stop Sep 13 00:11:45 debiancluster2 ResourceManager[7300]: debug: Starting /etc/ha.d/resource.d/drbddisk r0 stop Sep 13 00:11:45 debiancluster2 kernel: drbd0: Primary/Secondary --> Secondary/Secondary Sep 13 00:11:46 debiancluster2 ResourceManager[7300]: debug: /etc/ha.d/resource.d/drbddisk r0 stop done. RC=0 Sep 13 00:11:46 debiancluster2 heartbeat: [7290]: info: foreign HA resource release completed (standby). Sep 13 00:11:46 debiancluster2 heartbeat: [6088]: info: Local standby process completed [foreign]. Sep 13 00:11:46 debiancluster2 kernel: drbd0: Secondary/Secondary --> Secondary/Primary Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: WARN: 1 lost packet(s) for [debiancluster1] [17:19] Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: remote resource transition completed. Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: No pkts missing from debiancluster1! Sep 13 00:11:49 debiancluster2 heartbeat: [6088]: info: Other node completed standby takeover of foreign resources. Sep 13 00:13:32 debiancluster2 kernel: drbd0: [drbd0_worker/6924] sock_sendmsg time expired, ko = 4294967295 Sep 13 00:13:40 debiancluster2 kernel: drbd0: Resync done (total 127 sec; paused 0 sec; 6524 K/sec) Sep 13 00:13:40 debiancluster2 kernel: drbd0: drbd0_worker [6924]: cstate SyncSource --> Connected
