diff mbox

SUSE multipath-tools patch resync

Message ID 1306325431.2290.30.camel@lapoo.opensvc.com (mailing list archive)
State Deferred, archived
Headers show

Commit Message

Christophe Varoqui May 25, 2011, 12:10 p.m. UTC
On mer., 2011-05-25 at 13:14 +0200, Christophe Varoqui wrote:
> On mer., 2011-05-25 at 11:08 +0200, Hannes Reinecke wrote:
> > On 05/25/2011 09:51 AM, Christophe Varoqui wrote:
> > > On mer., 2011-05-18 at 17:03 +0200, Hannes Reinecke wrote:
> > >> git.kernel.org:/pub/scm/linux/kernel/git/hare/multipath-tools.git
> > >
> > > I started the merge. I'll post comments along the reading. It seems
> > > there won't be much : this patchset is clearly a must-have.
> > >
> > :-)
> > 
> > With some things so bloody obvious that I keep wondering if I'm the
> > only one seeing these ...
> > 
> The merge is done. No change wrt your branch. Still not pushed to korg.
> I struggle with a hwtable corruption with no /etc/multipath.conf ... can
> you reproduce that ?
> 
Well, forget it ... it was just a matter of 'make clean && make' as some
structures changed length.

The merge is over and pushed to korg.
I catched only to problems.
You might want to review the fixes.

1/
commit e2ae02287aaec0b56ac832841130e3c855a5a471
Author: Christophe Varoqui <christophe.varoqui@opensvc.com>
Date:   Wed May 25 14:00:52 2011 +0200

    Fix segfault in dm reassign code path
    
    alias is allocated and freed in multipathd/main.c:uev_add_map().
    
    Don't free it in ev_add_map() called from uev_add_map() to avoid
    double free.

Comments

Christophe Varoqui May 25, 2011, 12:22 p.m. UTC | #1
> Well, forget it ... it was just a matter of 'make clean && make' as some
> structures changed length.
> 
> The merge is over and pushed to korg.
> I catched only to problems.
> You might want to review the fixes.
> 
Hmm, there might be other problems.

on a 'multipathd reconfigure' command, the uxclient gets stuck and the
multipathd daemon strace shows:

$ sudo strace -f -p 17721
Process 17721 attached with 7 threads - interrupt to quit
[pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL
<unfinished ...>
[pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
<unfinished ...>
[pid 17755] ioctl(3, DM_DEV_WAIT <unfinished ...>
[pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
<unfinished ...>
[pid 17723] recvmsg(6,  <unfinished ...>
[pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL
<unfinished ...>
[pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
Hannes Reinecke May 25, 2011, 12:45 p.m. UTC | #2
On 05/25/2011 02:22 PM, Christophe Varoqui wrote:
>> Well, forget it ... it was just a matter of 'make clean&&  make' as some
>> structures changed length.
>>
>> The merge is over and pushed to korg.
>> I catched only to problems.
>> You might want to review the fixes.
>>
> Hmm, there might be other problems.
>
> on a 'multipathd reconfigure' command, the uxclient gets stuck and the
> multipathd daemon strace shows:
>
> $ sudo strace -f -p 17721
> Process 17721 attached with 7 threads - interrupt to quit
> [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL
> <unfinished ...>
> [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
> <unfinished ...>
> [pid 17755] ioctl(3, DM_DEV_WAIT<unfinished ...>
> [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
> <unfinished ...>
> [pid 17723] recvmsg(6,<unfinished ...>
> [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL
> <unfinished ...>
> [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
>
I've just send some more fixes which have been uncovered during
recent tests.

- Use refcounting for sysfs devices: We do use sysdev_get() and
   sysdev_put(), but forgot to refcount them. So occasionally
   uevents would try to call sysdev_put() on already freed devices.
- Race condition during shutdown with stop_waiter_thread():
   Trying to access anything inside the waiter structure is
   suicidal, as the waiter thread might be gone at any time.
- Do not handle external renames from multipathd thread:
   Fallout from the above patch; and what's more, the external
   rename should be handled by the waiter thread already.
   So no point in doing so from the daemon itself.

Patches should appear on the mailing list pretty soon.

Cheers,

Hannes
Christophe Varoqui May 25, 2011, 9:29 p.m. UTC | #3
> on a 'multipathd reconfigure' command, the uxclient gets stuck and the
> multipathd daemon strace shows:
> 
> $ sudo strace -f -p 17721
> Process 17721 attached with 7 threads - interrupt to quit
> [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL
> <unfinished ...>
> [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
> <unfinished ...>
> [pid 17755] ioctl(3, DM_DEV_WAIT <unfinished ...>
> [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
> <unfinished ...>
> [pid 17723] recvmsg(6,  <unfinished ...>
> [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL
> <unfinished ...>
> [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
> 
ok, I dug it to 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296

Here you add acquire/release the vector lock inside
multipathd/main.c:reconfigure(), but as seen in the following LCKDBG
trace, the lock is already acquired in
multipathd/main.c:uxsock_trigger()

Hence the lock -> lock = hang.

I commited and pushed a partial revert of
9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296

But maybe you'd rather see us stop acquiring the lock from
uxsock_trigger() to acquire more selectively in the functions called
from there ... Please comment.

Regards,
Hannes Reinecke May 26, 2011, 6:28 a.m. UTC | #4
On 05/25/2011 11:29 PM, Christophe Varoqui wrote:
>
>> on a 'multipathd reconfigure' command, the uxclient gets stuck and the
>> multipathd daemon strace shows:
>>
>> $ sudo strace -f -p 17721
>> Process 17721 attached with 7 threads - interrupt to quit
>> [pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL
>> <unfinished ...>
>> [pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
>> <unfinished ...>
>> [pid 17755] ioctl(3, DM_DEV_WAIT<unfinished ...>
>> [pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL
>> <unfinished ...>
>> [pid 17723] recvmsg(6,<unfinished ...>
>> [pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL
>> <unfinished ...>
>> [pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
>>
> ok, I dug it to 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296
>
> Here you add acquire/release the vector lock inside
> multipathd/main.c:reconfigure(), but as seen in the following LCKDBG
> trace, the lock is already acquired in
> multipathd/main.c:uxsock_trigger()
>
> Hence the lock ->  lock = hang.
>
> I commited and pushed a partial revert of
> 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296
>
> But maybe you'd rather see us stop acquiring the lock from
> uxsock_trigger() to acquire more selectively in the functions called
> from there ... Please comment.
>
Hmm. Yes, your fix appears to be correct.
I had several locking issues during startup (calling cli commands
while the daemon is still starting up is a nice way of testing it),
and several (unsuccessful) attempts in fixing it.
Real cause was a missing locking during initial configuration,
so it looks as if 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296
was in fact a left-over from the earlier attempts.

So yeah, your patch seems to be fine.

Will be doing more testing here.

Cheers,

Hannes
diff mbox

Patch

diff --git a/multipathd/main.c b/multipathd/main.c
index 4497609..cc75921 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -272,7 +272,6 @@  ev_add_map (char * dev, char * alias, struct vectors
* vecs)
                                alias);
                        dm_reassign(alias);
                }
-               FREE(alias);
                return 0;
        }

2/
commit 7b541d1e2cee70ad5e61edf12333e7ff49615c8c
Author: Christophe Varoqui <christophe.varoqui@opensvc.com>
Date:   Wed May 25 13:59:53 2011 +0200

    Set an internal default for feature
    
    Fix segfault when no /etc/multipath.conf is present.

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 87039f0..4236088 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -493,6 +493,7 @@  load_config (char * file)
        conf->bindings_file = set_default(DEFAULT_BINDINGS_FILE);
        conf->bindings_read_only = 0;
        conf->multipath_dir = set_default(DEFAULT_MULTIPATHDIR);
+       conf->features = set_default(DEFAULT_FEATURES);
        conf->flush_on_last_del = 0;
        conf->attribute_flags = 0;
        conf->reassign_maps = DEFAULT_REASSIGN_MAPS;