diff mbox

Fix race-condition when reading sysfs attributes.

Message ID 1237908301-2339-2-git-send-email-konrad@virtualiron.com (mailing list archive)
State Not Applicable, archived
Delegated to: christophe varoqui
Headers show

Commit Message

Konrad Rzeszutek March 24, 2009, 3:24 p.m. UTC
This fix re-introduces a wait loop when reading sysfs attributes.
The reason a wait is necessary is due to the way the kernel
sends the event. When a SCSI device is added the SCSI subsystem
pursues this path:

_sysfs_add_sdev:
    calls device_add ...
    [ '/devices/platform/host16/session6/target16:0:0/16:0:0:17'] uevent
        bus_attach_device
            bus_for_each_drv
                driver_probe_device
                    sd_probe
                    ['/class/scsi_disk/16:0:0:17' ] uevent
                        add_disk
                        ['/block/sdai'] [ Here multipath starts its job ]

    calls class_device_add ...
        [ '/class/scsi_device/16:0:0:17' ] uevent
        sg_add:
            [ '/class/scsi_generic/sg35' ] uevent

    done with device_add, and now the kernel adds the  attributes:
    scsi_sysfs_sdev_attrs[i].vendor, model, rev

There are four uevents sent from the kernel in the creation of
a SysFS representation of the device. After the last event, the
SysFs entries for vendor,model, rev are populated. The window
time between the uevent with /block/sdX sent and the sysfs
attributes being populated is small. But it is possible (with
a slow serial link, slow SCSI target), to get us in a race
condition where multipath investigates the new block device and
finds it can't read the vendor, and the block device ends up
being discarded and never added to the device mapper multipath map.
---
 libmultipath/discovery.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

Comments

Kay Sievers March 24, 2009, 3:34 p.m. UTC | #1
On Tue, Mar 24, 2009 at 16:24, Konrad Rzeszutek <konrad@virtualiron.com> wrote:
> This fix re-introduces a wait loop when reading sysfs attributes.
> The reason a wait is necessary is due to the way the kernel
> sends the event.

> There are four uevents sent from the kernel in the creation of
> a SysFS representation of the device. After the last event, the
> SysFs entries for vendor,model, rev are populated. The window
> time between the uevent with /block/sdX sent and the sysfs
> attributes being populated is small.

What kernel version are you using? I thought we fixed all that long ago.

Thanks,
Kay

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Konrad Rzeszutek March 24, 2009, 5:52 p.m. UTC | #2
On Tue, Mar 24, 2009 at 04:34:38PM +0100, Kay Sievers wrote:
> On Tue, Mar 24, 2009 at 16:24, Konrad Rzeszutek <konrad@virtualiron.com> wrote:
> > This fix re-introduces a wait loop when reading sysfs attributes.
> > The reason a wait is necessary is due to the way the kernel
> > sends the event.
> 
> > There are four uevents sent from the kernel in the creation of
> > a SysFS representation of the device. After the last event, the
> > SysFs entries for vendor,model, rev are populated. The window
> > time between the uevent with /block/sdX sent and the sysfs
> > attributes being populated is small.
> 
> What kernel version are you using? I thought we fixed all that long ago.

RHEL5.3. Which I thought had many of the fixes from upstream. Let me build
the most recent version and see if I can trigger these issues on that
version.

> 
> Thanks,
> Kay
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Konrad Rzeszutek March 24, 2009, 9:07 p.m. UTC | #3
> > > There are four uevents sent from the kernel in the creation of
> > > a SysFS representation of the device. After the last event, the
> > > SysFs entries for vendor,model, rev are populated. The window
> > > time between the uevent with /block/sdX sent and the sysfs
> > > attributes being populated is small.
> > 
> > What kernel version are you using? I thought we fixed all that long ago.
> 
> RHEL5.3. Which I thought had many of the fixes from upstream. Let me build
> the most recent version and see if I can trigger these issues on that
> version.

Kay,
You were right. I can't reproduce the failure on the more recent versions
of kernels. Thanks for spotting that.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 9ae2b8f..1d2c396 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -124,11 +124,49 @@  path_discovery (vector pathvec, struct config * conf, int flag)
 	return r;
 }
 
+
+#define WAIT_MAX_SECONDS 5
+#define WAIT_LOOP_PER_SECOND 5
+
+static int
+wait_for_file (char * filename)
+{
+	int loop;
+	struct stat stats;
+	/*
+ 	* the daemon can race udev upon path add,
+ 	* not multipath(8), if ran by udev
+ 	*/
+
+	if (!conf->daemon)
+		return 0;
+
+	loop = WAIT_MAX_SECONDS * WAIT_LOOP_PER_SECOND;
+
+	while (--loop) {
+		if (stat(filename, &stats) == 0)
+			return 0;
+
+		if (errno != ENOENT)
+			return 1;
+
+		usleep(1000 * 1000 / WAIT_LOOP_PER_SECOND);
+	}
+	return 1;
+}
+
 #define declare_sysfs_get_str(fname) \
 extern int \
 sysfs_get_##fname (struct sysfs_device * dev, char * buff, size_t len) \
 { \
 	char *attr; \
+	char attr_path[SYSFS_PATH_SIZE]; \
+\
+	if (safe_sprintf(attr_path, "%s/%s/%s", sysfs_path, dev->devpath, #fname)) \
+		return 1; \
+\
+	if (wait_for_file(attr_path)) \
+		return 1; \
 \
 	attr = sysfs_attr_get_value(dev->devpath, #fname); \
 	if (!attr) \