Message ID | 1306325431.2290.30.camel@lapoo.opensvc.com (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
> Well, forget it ... it was just a matter of 'make clean && make' as some > structures changed length. > > The merge is over and pushed to korg. > I catched only to problems. > You might want to review the fixes. > Hmm, there might be other problems. on a 'multipathd reconfigure' command, the uxclient gets stuck and the multipathd daemon strace shows: $ sudo strace -f -p 17721 Process 17721 attached with 7 threads - interrupt to quit [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...> [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...> [pid 17755] ioctl(3, DM_DEV_WAIT <unfinished ...> [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...> [pid 17723] recvmsg(6, <unfinished ...> [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL <unfinished ...> [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
On 05/25/2011 02:22 PM, Christophe Varoqui wrote: >> Well, forget it ... it was just a matter of 'make clean&& make' as some >> structures changed length. >> >> The merge is over and pushed to korg. >> I catched only to problems. >> You might want to review the fixes. >> > Hmm, there might be other problems. > > on a 'multipathd reconfigure' command, the uxclient gets stuck and the > multipathd daemon strace shows: > > $ sudo strace -f -p 17721 > Process 17721 attached with 7 threads - interrupt to quit > [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL > <unfinished ...> > [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL > <unfinished ...> > [pid 17755] ioctl(3, DM_DEV_WAIT<unfinished ...> > [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL > <unfinished ...> > [pid 17723] recvmsg(6,<unfinished ...> > [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL > <unfinished ...> > [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL > I've just send some more fixes which have been uncovered during recent tests. - Use refcounting for sysfs devices: We do use sysdev_get() and sysdev_put(), but forgot to refcount them. So occasionally uevents would try to call sysdev_put() on already freed devices. - Race condition during shutdown with stop_waiter_thread(): Trying to access anything inside the waiter structure is suicidal, as the waiter thread might be gone at any time. - Do not handle external renames from multipathd thread: Fallout from the above patch; and what's more, the external rename should be handled by the waiter thread already. So no point in doing so from the daemon itself. Patches should appear on the mailing list pretty soon. Cheers, Hannes
> on a 'multipathd reconfigure' command, the uxclient gets stuck and the > multipathd daemon strace shows: > > $ sudo strace -f -p 17721 > Process 17721 attached with 7 threads - interrupt to quit > [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL > <unfinished ...> > [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL > <unfinished ...> > [pid 17755] ioctl(3, DM_DEV_WAIT <unfinished ...> > [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL > <unfinished ...> > [pid 17723] recvmsg(6, <unfinished ...> > [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL > <unfinished ...> > [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL > ok, I dug it to 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296 Here you add acquire/release the vector lock inside multipathd/main.c:reconfigure(), but as seen in the following LCKDBG trace, the lock is already acquired in multipathd/main.c:uxsock_trigger() Hence the lock -> lock = hang. I commited and pushed a partial revert of 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296 But maybe you'd rather see us stop acquiring the lock from uxsock_trigger() to acquire more selectively in the functions called from there ... Please comment. Regards,
On 05/25/2011 11:29 PM, Christophe Varoqui wrote: > >> on a 'multipathd reconfigure' command, the uxclient gets stuck and the >> multipathd daemon strace shows: >> >> $ sudo strace -f -p 17721 >> Process 17721 attached with 7 threads - interrupt to quit >> [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL >> <unfinished ...> >> [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL >> <unfinished ...> >> [pid 17755] ioctl(3, DM_DEV_WAIT<unfinished ...> >> [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL >> <unfinished ...> >> [pid 17723] recvmsg(6,<unfinished ...> >> [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL >> <unfinished ...> >> [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL >> > ok, I dug it to 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296 > > Here you add acquire/release the vector lock inside > multipathd/main.c:reconfigure(), but as seen in the following LCKDBG > trace, the lock is already acquired in > multipathd/main.c:uxsock_trigger() > > Hence the lock -> lock = hang. > > I commited and pushed a partial revert of > 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296 > > But maybe you'd rather see us stop acquiring the lock from > uxsock_trigger() to acquire more selectively in the functions called > from there ... Please comment. > Hmm. Yes, your fix appears to be correct. I had several locking issues during startup (calling cli commands while the daemon is still starting up is a nice way of testing it), and several (unsuccessful) attempts in fixing it. Real cause was a missing locking during initial configuration, so it looks as if 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296 was in fact a left-over from the earlier attempts. So yeah, your patch seems to be fine. Will be doing more testing here. Cheers, Hannes
diff --git a/multipathd/main.c b/multipathd/main.c index 4497609..cc75921 100644 --- a/multipathd/main.c +++ b/multipathd/main.c @@ -272,7 +272,6 @@ ev_add_map (char * dev, char * alias, struct vectors * vecs) alias); dm_reassign(alias); } - FREE(alias); return 0; } 2/ commit 7b541d1e2cee70ad5e61edf12333e7ff49615c8c Author: Christophe Varoqui <christophe.varoqui@opensvc.com> Date: Wed May 25 13:59:53 2011 +0200 Set an internal default for feature Fix segfault when no /etc/multipath.conf is present. diff --git a/libmultipath/config.c b/libmultipath/config.c index 87039f0..4236088 100644 --- a/libmultipath/config.c +++ b/libmultipath/config.c @@ -493,6 +493,7 @@ load_config (char * file) conf->bindings_file = set_default(DEFAULT_BINDINGS_FILE); conf->bindings_read_only = 0; conf->multipath_dir = set_default(DEFAULT_MULTIPATHDIR); + conf->features = set_default(DEFAULT_FEATURES); conf->flush_on_last_del = 0; conf->attribute_flags = 0; conf->reassign_maps = DEFAULT_REASSIGN_MAPS;