diff mbox

sd: preserve sysfs updates to max_sectors_kb

Message ID 150309002518.8999.15900049133748830764.stgit@brunhilda (mailing list archive)
State Superseded
Headers show

Commit Message

Don Brace Aug. 18, 2017, 9 p.m. UTC
prevent systemd-udevd from changing a device's sysfs entry
max_sectors_kb back to the default value.
 - max_sectors_kb can be tweaked for better performance.
 - udev can be triggered by sg_logs -t or scsi_temperature, ...
 - sd_revalidate_disk is called from udev by ioctl BLKRRPART

Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
---
 drivers/scsi/sd.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Aug. 18, 2017, 9:05 p.m. UTC | #1
On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote:
> prevent systemd-udevd from changing a device's sysfs entry

> max_sectors_kb back to the default value.

>  - max_sectors_kb can be tweaked for better performance.

>  - udev can be triggered by sg_logs -t or scsi_temperature, ...

>  - sd_revalidate_disk is called from udev by ioctl BLKRRPART


Hello Don,

Which udev rule changes max_sectors_kb back to the default? Why do you want
to change the kernel code instead of modifying that udev rule? What software
changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps
something else?

Thanks,

Bart.
Don Brace Aug. 18, 2017, 9:29 p.m. UTC | #2
> -----Original Message-----

> From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com]

> Sent: Friday, August 18, 2017 4:06 PM

> To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry

> Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara

> <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott

> Benesh <scott.benesh@microsemi.com>; Don Brace

> <don.brace@microsemi.com>; Bader Ali - Saleh

> <bader.alisaleh@microsemi.com>; Kevin Barnett

> <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel

> <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley

> <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com>

> Cc: linux-scsi@vger.kernel.org

> Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb

> 

> EXTERNAL EMAIL

> 

> 

> On Fri, 2017-08-18 at 16:00 -0500, Don Brace wrote:

> > prevent systemd-udevd from changing a device's sysfs entry

> > max_sectors_kb back to the default value.

> >  - max_sectors_kb can be tweaked for better performance.

> >  - udev can be triggered by sg_logs -t or scsi_temperature, ...

> >  - sd_revalidate_disk is called from udev by ioctl BLKRRPART

> 

> Hello Don,

> 

> Which udev rule changes max_sectors_kb back to the default? Why do you

> want

> to change the kernel code instead of modifying that udev rule? What

> software

> changes max_sectors_kb to a smaller value? Is it a udev rule or perhaps

> something else?

> 

> Thanks,

> 

> Bart.


As far as I can see, udev looks for file access in sysfs. 
I am not exactly sure which rule changes this. It was added in more recent
distros. Can someone help me out?

I wanted to change the kernel code because it looks to me like anytime
sd_revalidate_disk is called max_sectors is reset to its maximum value. Anyone
tweaking max_sectors_kb for performance reasons will find that it is not
persistent.

If this distills down to a simpler rule change, then all the better.

From my testing:

I set /sys/block/sdd/queue/max_sectors_kb to some value.
    echo 64 > /sys/block/sdd/queue/max_sectors_kb
I run sg_logs -t /dev/sdd and the value is reset back to its original value.
Other utilities can also trigger udev to run. 

udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[8537223.347520] change   /devices/pci0000:00/0000:00:03.0/0000:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
UDEV  [8537223.399243] change   /devices/pci0000:00/0000:00:03.0/0000:08:00.0/host4/port-4:4/end_device-4:4/target4:0:3/4:0:3:0/block/sdd (block)
...
manager->fd_inotify = udev_watch_init(manager->udev);
       sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager);
           on_inotify (systemd source code: src/udev/udevd.c)
              synthesize_change
                    ioctl --> BLKRRPART
                              ----------------------
                              Start of kernel code.
                              ----------------------
                              blkdev_ioctl (block/ioctl.c)
                                   CASE:BLKRRPART: blkdev_reread_part (block/ioctl.c)
                                         _blkdev_reread_part (block/ioctl.c)
                                                rescan_partitions (block/partition-generic.c)
                                                     if (disk->fops->revalidate_disk)
                                                         disk->fops->revalidate_disk(disk);
                                                           ----------------------------------
                                                           sd driver (drivers/scsi/sd.c
                                                           sd_revalidate_disk


Thanks for your input,
Don Brace
ESC - Smart Storage
Microsemi Corporation
Bart Van Assche Aug. 18, 2017, 9:46 p.m. UTC | #3
On Fri, 2017-08-18 at 21:29 +0000, Don Brace wrote:
> As far as I can see, udev looks for file access in sysfs. 

> I am not exactly sure which rule changes this. It was added in more recent

> distros. Can someone help me out?


Hello Don,

Can you check on your test system which udev rule changes max_sectors_kb? I
have checked two recent Linux distro's but haven't been able to find such a
udev rule:
$ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l
0

Thanks,

Bart.
Don Brace Aug. 21, 2017, 7:12 p.m. UTC | #4
> -----Original Message-----

> From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com]

> Sent: Friday, August 18, 2017 4:47 PM

> To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry

> Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara

> <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott

> Benesh <scott.benesh@microsemi.com>; Don Brace

> <don.brace@microsemi.com>; Bader Ali - Saleh

> <bader.alisaleh@microsemi.com>; Kevin Barnett

> <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel

> <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley

> <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com>

> Cc: linux-scsi@vger.kernel.org

> Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb

> 

> EXTERNAL EMAIL

> 

> 

> On Fri, 2017-08-18 at 21:29 +0000, Don Brace wrote:

> > As far as I can see, udev looks for file access in sysfs.

> > I am not exactly sure which rule changes this. It was added in more recent

> > distros. Can someone help me out?

> 

> Hello Don,

> 

> Can you check on your test system which udev rule changes

> max_sectors_kb? I

> have checked two recent Linux distro's but haven't been able to find such a

> udev rule:

> $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc -l

> 0

> 

> Thanks,

> 

> Bart.


On my system it is 60-block.rules, and it is the last rule in that rule file.
--
# do not edit this file, it will be overwritten on update

# enable in-kernel media-presence polling
ACTION=="add", SUBSYSTEM=="module", KERNEL=="block", ATTR{parameters/events_dfl_poll_msecs}=="0", \
  ATTR{parameters/events_dfl_poll_msecs}="2000"

# forward scsi device event to corresponding block device
ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", TEST=="block", ATTR{block/*/uevent}="change"

# watch metadata changes, caused by tools closing the device node which was opened for writing
ACTION!="remove", SUBSYSTEM=="block", KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch"
Bart Van Assche Aug. 21, 2017, 7:53 p.m. UTC | #5
On Mon, 2017-08-21 at 19:12 +0000, Don Brace wrote:
> On Friday Bart Van Assche wrote:

> > Can you check on your test system which udev rule changes

> > max_sectors_kb? I

> > have checked two recent Linux distro's but haven't been able to find

> > such a udev rule:

> > $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc

> > -l

> > 0

> > 

> > Thanks,

> > 

> > Bart.

> 

> On my system it is 60-block.rules, and it is the last rule in that rule

> file.

> --

> # do not edit this file, it will be overwritten on update

> 

> # enable in-kernel media-presence polling

> ACTION=="add", SUBSYSTEM=="module", KERNEL=="block",

> ATTR{parameters/events_dfl_poll_msecs}=="0", \

>   ATTR{parameters/events_dfl_poll_msecs}="2000"

> 

> # forward scsi device event to corresponding block device

> ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device",

> TEST=="block", ATTR{block/*/uevent}="change"

> 

> # watch metadata changes, caused by tools closing the device node which

> was opened for writing

> ACTION!="remove", SUBSYSTEM=="block",

> KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch"


Hello Don,

Can you have another look at the udev rules on your test system? The last
rule in 60-block.rules looks like a watch rule to me. The same holds for the
upstream version of that file (https://github.com/systemd/systemd/blob/maste
r/rules/60-block.rules).

Bart.
Don Brace Aug. 21, 2017, 8:14 p.m. UTC | #6
> -----Original Message-----

> From: Bart Van Assche [mailto:Bart.VanAssche@wdc.com]

> Sent: Monday, August 21, 2017 2:53 PM

> To: hch@infradead.org; Viswas G <viswas.g@microsemi.com>; Gerry

> Morong <gerry.morong@microsemi.com>; Mahesh Rajashekhara

> <mahesh.rajashekhara@microsemi.com>; POSWALD@suse.com; Scott

> Benesh <scott.benesh@microsemi.com>; Don Brace

> <don.brace@microsemi.com>; Bader Ali - Saleh

> <bader.alisaleh@microsemi.com>; Kevin Barnett

> <kevin.barnett@microsemi.com>; joseph.szczypek@hpe.com; Scott Teel

> <scott.teel@microsemi.com>; jejb@linux.vnet.ibm.com; Justin Lindley

> <justin.lindley@microsemi.com>; John Hall <John.Hall@microsemi.com>

> Cc: linux-scsi@vger.kernel.org

> Subject: Re: [PATCH] sd: preserve sysfs updates to max_sectors_kb

> 

> EXTERNAL EMAIL

> 

> 

> On Mon, 2017-08-21 at 19:12 +0000, Don Brace wrote:

> > On Friday Bart Van Assche wrote:

> > > Can you check on your test system which udev rule changes

> > > max_sectors_kb? I

> > > have checked two recent Linux distro's but haven't been able to find

> > > such a udev rule:

> > > $ grep -rw max_sectors_kb /usr/lib/udev/rules.d /etc/udev/rules.d | wc

> > > -l

> > > 0

> > >

> > > Thanks,

> > >

> > > Bart.

> >

> > On my system it is 60-block.rules, and it is the last rule in that rule

> > file.

> > --

> > # do not edit this file, it will be overwritten on update

> >

> > # enable in-kernel media-presence polling

> > ACTION=="add", SUBSYSTEM=="module", KERNEL=="block",

> > ATTR{parameters/events_dfl_poll_msecs}=="0", \

> >   ATTR{parameters/events_dfl_poll_msecs}="2000"

> >

> > # forward scsi device event to corresponding block device

> > ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device",

> > TEST=="block", ATTR{block/*/uevent}="change"

> >

> > # watch metadata changes, caused by tools closing the device node which

> > was opened for writing

> > ACTION!="remove", SUBSYSTEM=="block",

> > KERNEL=="loop*|nvme*|sd*|vd*|xvd*|pmem*", OPTIONS+="watch"

> 

> Hello Don,

> 

> Can you have another look at the udev rules on your test system? The last

> rule in 60-block.rules looks like a watch rule to me. The same holds for the

> upstream version of that file

> (https://github.com/systemd/systemd/blob/maste

> r/rules/60-block.rules).

> 

> Bart.


It is a watch rule.

systemd/src/udev/udevd.c
   manager_new
       manager->fd_inotify = udev_watch_init(manager->udev);
       sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager);
           on_inotify (systemd source code: src/udev/udevd.c)
              synthesize_change
                    ioctl --> BLKRRPART

This rule ends up calling BLKRRPART.

Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation
Don Brace Aug. 29, 2017, 5:41 p.m. UTC | #7

Don Brace Aug. 29, 2017, 5:42 p.m. UTC | #8

Bart Van Assche Aug. 29, 2017, 6:13 p.m. UTC | #9
On Tue, 2017-08-29 at 17:42 +0000, Don Brace wrote:
> From: Don Brace

> > [ ... ]

> > Hello Don,

> > 

> > Can you have another look at the udev rules on your test system? The last

> > rule in 60-block.rules looks like a watch rule to me. The same holds for the

> > upstream version of that file

> > (https://github.com/systemd/systemd/blob/maste

> > r/rules/60-block.rules).

> > 

> > Bart.

> 

> It is a watch rule.

> 

> systemd/src/udev/udevd.c

>    manager_new

>        manager->fd_inotify = udev_watch_init(manager->udev);

>        sd_event_add_io(manager->event, &manager->inotify_event, manager->fd_inotify, EPOLLIN, on_inotify, manager);

>            on_inotify (systemd source code: src/udev/udevd.c)

>               synthesize_change

>                     ioctl --> BLKRRPART

> 

> This rule ends up calling BLKRRPART.


Hello Don,

Sorry if I'm slow today, but it's not clear to me how the BLKRRPART ioctl
triggers a change of max_sectors_kb? And even if it really is this ioctl that
triggers a change of max_sectors_kb, should the kernel code that handles
max_sectors_kb writes be modified or should rather a udev rule be added that
sets max_sectors_kb to the desired value after each partition rescan?

Thanks,

Bart.
Don Brace Aug. 29, 2017, 6:36 p.m. UTC | #10

Don Brace Aug. 29, 2017, 7:41 p.m. UTC | #11

Bart Van Assche Aug. 29, 2017, 10:25 p.m. UTC | #12
On Tue, 2017-08-29 at 19:41 +0000, Don Brace wrote:
> BLKRRPART ends up in the sd driver function sd_revalidate_disk, which resets

>         q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));

> back to the value from VPD page 0xb0.

> 

> When you cat out max_sectors_kb, it obtains it's valuse from q->limits.max_sectors

> 

> Adding a udev rule... Will there be a delay between the watch rule and the new rule?


Hello Don,

In your original e-mail you wrote that max_sectors_kb was tuned because
of performance reasons. So does it really matter that for a very short
time max_sectors_kb does not have the optimal value?

> One more point, what will I change the value back to? The watch rule 

> changed it and I do not have the original value.


To the value you want to set max_sectors_kb to.

Bart.
Don Brace Aug. 29, 2017, 10:32 p.m. UTC | #13

Bart Van Assche Aug. 29, 2017, 11 p.m. UTC | #14
On Tue, 2017-08-29 at 22:32 +0000, Don Brace wrote:
> Users do set max_sectors_kb for performance reasons. They can increase

> performance by setting max_sectors_kb to different values depending on

> their I/O needs.

> 

> And, if they set the value, and sd_revalidate_disk changes this value, 

> there is no history of what they set the value to. 


Hello Don,

How about asking these users to create a udev rule instead of directly
modifying max_sectors_kb in sysfs?

Bart.
Martin K. Petersen Aug. 30, 2017, 1:24 a.m. UTC | #15
Bart,

> How about asking these users to create a udev rule instead of directly
> modifying max_sectors_kb in sysfs?

I looked at this for a bit last week to see if I could come up with an
elegant way to accommodate values overridden in sysfs and at the same
time honor hardware limits changing. However, it quickly gets messy
since some parameters are under the request_queue and some are scsi_disk
specific. So that involves having override flags several places. Plus
there also the whole re-stacking debacle.

So I think I prefer udev for this.
Martin Wilck Sept. 9, 2017, 10:32 a.m. UTC | #16
Hello Martin,

On Tue, 2017-08-29 at 21:24 -0400, Martin K. Petersen wrote:

> I looked at this for a bit last week to see if I could come up with
> an
> elegant way to accommodate values overridden in sysfs and at the same
> time honor hardware limits changing. However, it quickly gets messy
> since some parameters are under the request_queue and some are
> scsi_disk
> specific. So that involves having override flags several places. Plus
> there also the whole re-stacking debacle.
> 
> So I think I prefer udev for this.

Could you please explain why you think Don's patch is wrong? User
settings being discarded because of a BLKRRPART ioctl violates the
principle of least surprise. With Don's patch, that won't happen any
more. If hardware limits change, whether they increase or decrease, the
patch will also do the Right Thing, AFAICS. Increasing hw limits will
not automatically increase the sw limit, but IMO that's actually
expected.

Best Regards,
Martin
Martin K. Petersen Sept. 28, 2017, 1:37 a.m. UTC | #17
Martin,

> Could you please explain why you think Don's patch is wrong? User
> settings being discarded because of a BLKRRPART ioctl violates the
> principle of least surprise. With Don's patch, that won't happen any
> more. If hardware limits change, whether they increase or decrease, the
> patch will also do the Right Thing, AFAICS. Increasing hw limits will
> not automatically increase the sw limit, but IMO that's actually
> expected.

I have been mulling over this for a while.

I suggest the following...
diff mbox

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bea36ad..457dc7c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3055,6 +3055,7 @@  static int sd_revalidate_disk(struct gendisk *disk)
 	sector_t old_capacity = sdkp->capacity;
 	unsigned char *buffer;
 	unsigned int dev_max, rw_max;
+	unsigned int max_sectors;
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
 				      "sd_revalidate_disk\n"));
@@ -3128,9 +3129,14 @@  static int sd_revalidate_disk(struct gendisk *disk)
 		rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
 				      (sector_t)BLK_DEF_MAX_SECTORS);
 
-	/* Combine with controller limits */
-	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
+	/* Check for max_sectors_kb update through sysfs */
+	if (q->limits.max_sectors < min(rw_max, queue_max_hw_sectors(q)))
+		max_sectors = q->limits.max_sectors;
+	else
+		max_sectors = min(rw_max, queue_max_hw_sectors(q));
 
+	/* Combine with controller limits */
+	q->limits.max_sectors = max_sectors;
 	set_capacity(disk, logical_to_sectors(sdp, sdkp->capacity));
 	sd_config_write_same(sdkp);
 	kfree(buffer);