Message ID | 150309002518.8999.15900049133748830764.stgit@brunhilda (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote: > prevent systemd-udevd from changing a device's sysfs entry > max_sectors_kb back to the default value. > - max_sectors_kb can be tweaked for better performance. > - udev can be triggered by sg_logs -t or scsi_temperature, ... > - sd_revalidate_disk is called from udev by ioctl BLKRRPART Hello Don, Which udev rule changes max_sectors_kb back to the default? Why do you want to change the kernel code instead of modifying that udev rule? What software changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps something else? Thanks, Bart.
> -----Original Message----- > From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com] > Sent: Friday, August 18, 2017 4:06 PM > To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry > Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara > <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott > Benesh <scott.benesh@microsemi.com>; Don Brace > <don.brace@microsemi.com>; Bader Ali - Saleh > <bader.alisaleh@microsemi.com>; Kevin Barnett > <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel > <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley > <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com> > Cc: linux-scsi@vger.kernel.org > Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb > > EXTERNAL EMAIL > > > On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote: > > prevent systemd-udevd from changing a device's sysfs entry > > max_sectors_kb back to the default value. > > - max_sectors_kb can be tweaked for better performance. > > - udev can be triggered by sg_logs -t or scsi_temperature, ... > > - sd_revalidate_disk is called from udev by ioctl BLKRRPART > > Hello Don, > > Which udev rule changes max_sectors_kb back to the default? Why do you > want > to change the kernel code instead of modifying that udev rule? What > software > changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps > something else? > > Thanks, > > Bart. As far as I can see, udev looks for file access in sysfs. I am not exactly sure which rule changes this. It was added in more recent distros. Can someone help me out? I wanted to change the kernel code because it looks to me like anytime sd_revalidate_disk is called max_sectors is reset to its maximum value. Anyone tweaking max_sectors_kb for performance reasons will find that it is not persistent. If this distills down to a simpler rule change, then all the better. From my testing: I set /sys/block/sdd/queue/max_sectors_kb to some value. echo 64 > /sys/block/sdd/queue/max_sectors_kb I run sg_logs -t /dev/sdd and the value is reset back to its original value. Other utilities can also trigger udev to run. udevadm monitor monitor will print the received events for: UDEV - the event which udev sends out after rule processing KERNEL - the kernel uevent KERNEL[8537223.347520] change /devices/pci0000:00/0000:00:03.0/0000:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block) UDEV [8537223.399243] change /devices/pci0000:00/0000:00:03.0/0000:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block) ... manager->fd_inotify = udev_watch_init(manager->udev); sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager); on_inotify (systemd source code: src/udev/udevd.c) synthesize_change ioctl --> BLKRRPART ---------------------- Start of kernel code. ---------------------- blkdev_ioctl (block/ioctl.c) CASE:BLKRRPART: blkdev_reread_part (block/ioctl.c) _blkdev_reread_part (block/ioctl.c) rescan_partitions (block/partition-generic.c) if (disk->fops->revalidate_disk) disk->fops->revalidate_disk(disk); ---------------------------------- sd driver (drivers/scsi/sd.c sd_revalidate_disk Thanks for your input, Don Brace ESC - Smart Storage Microsemi Corporation
On Fri, 2017-08-18 at 21:29 +0000, Don Brace wrote: > As far as I can see, udev looks for file access in sysfs. > I am not exactly sure which rule changes this. It was added in more recent > distros. Can someone help me out? Hello Don, Can you check on your test system which udev rule changes max_sectors_kb? I have checked two recent Linux distro's but haven't been able to find such a udev rule: $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l 0 Thanks, Bart.
> -----Original Message----- > From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com] > Sent: Friday, August 18, 2017 4:47 PM > To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry > Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara > <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott > Benesh <scott.benesh@microsemi.com>; Don Brace > <don.brace@microsemi.com>; Bader Ali - Saleh > <bader.alisaleh@microsemi.com>; Kevin Barnett > <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel > <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley > <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com> > Cc: linux-scsi@vger.kernel.org > Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb > > EXTERNAL EMAIL > > > On Fri, 2017-08-18 at 21:29 +0000, Don Brace wrote: > > As far as I can see, udev looks for file access in sysfs. > > I am not exactly sure which rule changes this. It was added in more recent > > distros. Can someone help me out? > > Hello Don, > > Can you check on your test system which udev rule changes > max_sectors_kb? I > have checked two recent Linux distro's but haven't been able to find such a > udev rule: > $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l > 0 > > Thanks, > > Bart. On my system it is 60-block.rules, and it is the last rule in that rule file. -- # do not edit this file, it will be overwritten on update # enable in-kernel media-presence polling ACTION=="add", SUBSYSTEM=="module", KERNEL=="block", ATTR{parameters/events_dfl_poll_msecs}=="0", \ ATTR{parameters/events_dfl_poll_msecs}="2000" # forward scsi device event to corresponding block device ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", TEST=="block", ATTR{block/*/uevent}="change" # watch metadata changes, caused by tools closing the device node which was opened for writing ACTION!="remove", SUBSYSTEM=="block", KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch"
On Mon, 2017-08-21 at 19:12 +0000, Don Brace wrote: > On Friday Bart Van Assche wrote: > > Can you check on your test system which udev rule changes > > max_sectors_kb? I > > have checked two recent Linux distro's but haven't been able to find > > such a udev rule: > > $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc > > -l > > 0 > > > > Thanks, > > > > Bart. > > On my system it is 60-block.rules, and it is the last rule in that rule > file. > -- > # do not edit this file, it will be overwritten on update > > # enable in-kernel media-presence polling > ACTION=="add", SUBSYSTEM=="module", KERNEL=="block", > ATTR{parameters/events_dfl_poll_msecs}=="0", \ > ATTR{parameters/events_dfl_poll_msecs}="2000" > > # forward scsi device event to corresponding block device > ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", > TEST=="block", ATTR{block/*/uevent}="change" > > # watch metadata changes, caused by tools closing the device node which > was opened for writing > ACTION!="remove", SUBSYSTEM=="block", > KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch" Hello Don, Can you have another look at the udev rules on your test system? The last rule in 60-block.rules looks like a watch rule to me. The same holds for the upstream version of that file (https://github.com/systemd/systemd/blob/maste r/rules/60-block.rules). Bart.
> -----Original Message----- > From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com] > Sent: Monday, August 21, 2017 2:53 PM > To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry > Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara > <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott > Benesh <scott.benesh@microsemi.com>; Don Brace > <don.brace@microsemi.com>; Bader Ali - Saleh > <bader.alisaleh@microsemi.com>; Kevin Barnett > <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel > <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley > <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com> > Cc: linux-scsi@vger.kernel.org > Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb > > EXTERNAL EMAIL > > > On Mon, 2017-08-21 at 19:12 +0000, Don Brace wrote: > > On Friday Bart Van Assche wrote: > > > Can you check on your test system which udev rule changes > > > max_sectors_kb? I > > > have checked two recent Linux distro's but haven't been able to find > > > such a udev rule: > > > $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc > > > -l > > > 0 > > > > > > Thanks, > > > > > > Bart. > > > > On my system it is 60-block.rules, and it is the last rule in that rule > > file. > > -- > > # do not edit this file, it will be overwritten on update > > > > # enable in-kernel media-presence polling > > ACTION=="add", SUBSYSTEM=="module", KERNEL=="block", > > ATTR{parameters/events_dfl_poll_msecs}=="0", \ > > ATTR{parameters/events_dfl_poll_msecs}="2000" > > > > # forward scsi device event to corresponding block device > > ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", > > TEST=="block", ATTR{block/*/uevent}="change" > > > > # watch metadata changes, caused by tools closing the device node which > > was opened for writing > > ACTION!="remove", SUBSYSTEM=="block", > > KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch" > > Hello Don, > > Can you have another look at the udev rules on your test system? The last > rule in 60-block.rules looks like a watch rule to me. The same holds for the > upstream version of that file > (https://github.com/systemd/systemd/blob/maste > r/rules/60-block.rules). > > Bart. It is a watch rule. systemd/src/udev/udevd.c manager_new manager->fd_inotify = udev_watch_init(manager->udev); sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager); on_inotify (systemd source code: src/udev/udevd.c) synthesize_change ioctl --> BLKRRPART This rule ends up calling BLKRRPART. Thanks, Don Brace ESC - Smart Storage Microsemi Corporation
On Tue, 2017-08-29 at 17:42 +0000, Don Brace wrote: > From: Don Brace > > [ ... ] > > Hello Don, > > > > Can you have another look at the udev rules on your test system? The last > > rule in 60-block.rules looks like a watch rule to me. The same holds for the > > upstream version of that file > > (https://github.com/systemd/systemd/blob/maste > > r/rules/60-block.rules). > > > > Bart. > > It is a watch rule. > > systemd/src/udev/udevd.c > manager_new > manager->fd_inotify = udev_watch_init(manager->udev); > sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager); > on_inotify (systemd source code: src/udev/udevd.c) > synthesize_change > ioctl --> BLKRRPART > > This rule ends up calling BLKRRPART. Hello Don, Sorry if I'm slow today, but it's not clear to me how the BLKRRPART ioctl triggers a change of max_sectors_kb? And even if it really is this ioctl that triggers a change of max_sectors_kb, should the kernel code that handles max_sectors_kb writes be modified or should rather a udev rule be added that sets max_sectors_kb to the desired value after each partition rescan? Thanks, Bart.
On Tue, 2017-08-29 at 19:41 +0000, Don Brace wrote: > BLKRRPART ends up in the sd driver function sd_revalidate_disk, which resets > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); > back to the value from VPD page 0xb0. > > When you cat out max_sectors_kb, it obtains it's valuse from q->limits.max_sectors > > Adding a udev rule... Will there be a delay between the watch rule and the new rule? Hello Don, In your original e-mail you wrote that max_sectors_kb was tuned because of performance reasons. So does it really matter that for a very short time max_sectors_kb does not have the optimal value? > One more point, what will I change the value back to? The watch rule > changed it and I do not have the original value. To the value you want to set max_sectors_kb to. Bart.
On Tue, 2017-08-29 at 22:32 +0000, Don Brace wrote: > Users do set max_sectors_kb for performance reasons. They can increase > performance by setting max_sectors_kb to different values depending on > their I/O needs. > > And, if they set the value, and sd_revalidate_disk changes this value, > there is no history of what they set the value to. Hello Don, How about asking these users to create a udev rule instead of directly modifying max_sectors_kb in sysfs? Bart.
Bart, > How about asking these users to create a udev rule instead of directly > modifying max_sectors_kb in sysfs? I looked at this for a bit last week to see if I could come up with an elegant way to accommodate values overridden in sysfs and at the same time honor hardware limits changing. However, it quickly gets messy since some parameters are under the request_queue and some are scsi_disk specific. So that involves having override flags several places. Plus there also the whole re-stacking debacle. So I think I prefer udev for this.
Hello Martin, On Tue, 2017-08-29 at 21:24 -0400, Martin K. Petersen wrote: > I looked at this for a bit last week to see if I could come up with > an > elegant way to accommodate values overridden in sysfs and at the same > time honor hardware limits changing. However, it quickly gets messy > since some parameters are under the request_queue and some are > scsi_disk > specific. So that involves having override flags several places. Plus > there also the whole re-stacking debacle. > > So I think I prefer udev for this. Could you please explain why you think Don's patch is wrong? User settings being discarded because of a BLKRRPART ioctl violates the principle of least surprise. With Don's patch, that won't happen any more. If hardware limits change, whether they increase or decrease, the patch will also do the Right Thing, AFAICS. Increasing hw limits will not automatically increase the sw limit, but IMO that's actually expected. Best Regards, Martin
Martin, > Could you please explain why you think Don's patch is wrong? User > settings being discarded because of a BLKRRPART ioctl violates the > principle of least surprise. With Don's patch, that won't happen any > more. If hardware limits change, whether they increase or decrease, the > patch will also do the Right Thing, AFAICS. Increasing hw limits will > not automatically increase the sw limit, but IMO that's actually > expected. I have been mulling over this for a while. I suggest the following...
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index bea36ad..457dc7c 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3055,6 +3055,7 @@ static int sd_revalidate_disk(struct gendisk *disk) sector_t old_capacity = sdkp->capacity; unsigned char *buffer; unsigned int dev_max, rw_max; + unsigned int max_sectors; SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_revalidate_disk\n")); @@ -3128,9 +3129,14 @@ static int sd_revalidate_disk(struct gendisk *disk) rw_max = min_not_zero(logical_to_sectors(sdp, dev_max), (sector_t)BLK_DEF_MAX_SECTORS); - /* Combine with controller limits */ - q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); + /* Check for max_sectors_kb update through sysfs */ + if (q->limits.max_sectors < min(rw_max, queue_max_hw_sectors(q))) + max_sectors = q->limits.max_sectors; + else + max_sectors = min(rw_max, queue_max_hw_sectors(q)); + /* Combine with controller limits */ + q->limits.max_sectors = max_sectors; set_capacity(disk, logical_to_sectors(sdp, sdkp->capacity)); sd_config_write_same(sdkp); kfree(buffer);