
[0/9] Improve I/O priority support

Message ID: 20210527010134.32448-1-bvanassche@acm.org

Message

Bart Van Assche May 27, 2021, 1:01 a.m. UTC
Hi Jens,

A feature that is missing from the Linux kernel for storage devices that
support I/O priorities is to set the I/O priority in requests involving page
cache writeback. Since the identity of the process that triggers page cache
writeback is not known in the writeback code, the priority set by ioprio_set()
is ignored. However, an I/O cgroup is associated with writeback requests
by certain filesystems. Hence this patch series that implements the following
changes for the mq-deadline scheduler:
* Make the I/O priority configurable per I/O cgroup.
* Change the I/O priority of requests to the lower of (request I/O priority,
  cgroup I/O priority).
* Introduce one queue per I/O priority in the mq-deadline scheduler.

Please consider this patch series for kernel v5.14.

Thanks,

Bart.

Bart Van Assche (9):
  block/mq-deadline: Add several comments
  block/mq-deadline: Add two lockdep_assert_held() statements
  block/mq-deadline: Remove two local variables
  block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()
  block/mq-deadline: Improve compile-time argument checking
  block/mq-deadline: Reduce the read expiry time for non-rotational
    media
  block/mq-deadline: Reserve 25% of tags for synchronous requests
  block/mq-deadline: Add I/O priority support
  block/mq-deadline: Add cgroup support

 block/Kconfig.iosched      |   6 +
 block/Makefile             |   1 +
 block/mq-deadline-cgroup.c | 206 +++++++++++++++
 block/mq-deadline-cgroup.h |  73 ++++++
 block/mq-deadline.c        | 524 +++++++++++++++++++++++++++++--------
 5 files changed, 695 insertions(+), 115 deletions(-)
 create mode 100644 block/mq-deadline-cgroup.c
 create mode 100644 block/mq-deadline-cgroup.h

Comments

Wang Jianchao May 27, 2021, 6:25 a.m. UTC | #1
On 2021/5/27 9:01 AM, Bart Van Assche wrote:
> Hi Jens,
> 
> A feature that is missing from the Linux kernel for storage devices that
> support I/O priorities is to set the I/O priority in requests involving page
> cache writeback. Since the identity of the process that triggers page cache
> writeback is not known in the writeback code, the priority set by ioprio_set()
> is ignored. However, an I/O cgroup is associated with writeback requests
> by certain filesystems. Hence this patch series that implements the following
> changes for the mq-deadline scheduler:
Hi Bart

How about implementing this feature based on the rq-qos framework?
Maybe it would be better to keep mq-deadline a pure I/O scheduler.

Best regards
Jianchao

Wang Jianchao May 27, 2021, 8:05 a.m. UTC | #2
On 2021/5/27 2:25 PM, Wang Jianchao wrote:
> 
> 
> On 2021/5/27 9:01 AM, Bart Van Assche wrote:
>> Hi Jens,
>>
>> A feature that is missing from the Linux kernel for storage devices that
>> support I/O priorities is to set the I/O priority in requests involving page
>> cache writeback. Since the identity of the process that triggers page cache
>> writeback is not known in the writeback code, the priority set by ioprio_set()
>> is ignored. However, an I/O cgroup is associated with writeback requests
>> by certain filesystems. Hence this patch series that implements the following
>> changes for the mq-deadline scheduler:
> Hi Bart
> 
> How about implementing this feature based on the rq-qos framework?
> Maybe it would be better to keep mq-deadline a pure I/O scheduler.

In addition, it would be more flexible to combine different I/O schedulers and
rq-qos policies if we treated I/O priority as a new policy ;)

Johannes Thumshirn May 27, 2021, 8:56 a.m. UTC | #3
On 27/05/2021 03:01, Bart Van Assche wrote:
> [...]
> Bart Van Assche (9):
>   block/mq-deadline: Add several comments
>   block/mq-deadline: Add two lockdep_assert_held() statements
>   block/mq-deadline: Remove two local variables
>   block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()
>   block/mq-deadline: Improve compile-time argument checking
>

I think the above 5 patches can go in independently as cleanups.
Chaitanya Kulkarni May 27, 2021, 5:23 p.m. UTC | #4
On 5/27/21 01:56, Johannes Thumshirn wrote:
> On 27/05/2021 03:01, Bart Van Assche wrote:
>> [...]
>> Bart Van Assche (9):
>>   block/mq-deadline: Add several comments
>>   block/mq-deadline: Add two lockdep_assert_held() statements
>>   block/mq-deadline: Remove two local variables
>>   block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()
>>   block/mq-deadline: Improve compile-time argument checking
>>
> I think the above 5 patches can go in independently as cleanups.
>

Yes, they seem like pure cleanups from a review point of view.
Adam Manzanares May 27, 2021, 5:58 p.m. UTC | #5
On Wed, 2021-05-26 at 18:01 -0700, Bart Van Assche wrote:
> Hi Jens,
> 
> A feature that is missing from the Linux kernel for storage devices that
> support I/O priorities is to set the I/O priority in requests involving
> page cache writeback.

Hello Bart,

When I worked in this area the goal was to control tail latencies for a
prioritized app. Once the page cache is involved, control over the I/O is
handed off by the application, suggesting that the priorities passed down
to the device are no longer as meaningful.

Does passing the priority down to the device have a measurable performance
impact in your test case? If not, could BFQ be seen as a viable alternative?

Take care,
Adam

Bart Van Assche May 27, 2021, 6:40 p.m. UTC | #6
On 5/27/21 1:05 AM, Wang Jianchao wrote:
> On 2021/5/27 2:25 PM, Wang Jianchao wrote:
>> On 2021/5/27 9:01 AM, Bart Van Assche wrote:
>>> [...]
>>
>> How about implementing this feature based on the rq-qos framework?
>> Maybe it would be better to keep mq-deadline a pure I/O scheduler.
> 
> In addition, it would be more flexible to combine different I/O schedulers and
> rq-qos policies if we treated I/O priority as a new policy ;)

Hi Jianchao,

That's an interesting question. I'm in favor of flexibility. However,
I'm not sure whether it would be possible to combine an rq-qos policy
that submits high priority requests first with an I/O scheduler that
ignores I/O priorities. Since a typical I/O scheduler reorders requests,
such a combination would lead to requests being submitted in the wrong
order to storage devices. In other words, when using an I/O scheduler,
proper support for I/O priority in the I/O scheduler is essential. The
BFQ I/O scheduler is still maturing. The purpose of the Kyber I/O
scheduler is to control latency by reducing the queue depth without
differentiating between requests of the same type. The mq-deadline
scheduler is already being used in combination with storage devices that
support I/O priorities in their controller. Hence the choice for the
mq-deadline scheduler.

Thanks,

Bart.
Bart Van Assche May 27, 2021, 6:53 p.m. UTC | #7
On 5/27/21 10:58 AM, Adam Manzanares wrote:
> On Wed, 2021-05-26 at 18:01 -0700, Bart Van Assche wrote:
>> A feature that is missing from the Linux kernel for storage
>> devices that support I/O priorities is to set the I/O priority in
>> requests involving page cache writeback.
> 
> When I worked in this area the goal was to control tail latencies for
> a prioritized app. Once the page cache is involved, control over the
> I/O is handed off by the application, suggesting that the priorities
> passed down to the device are no longer as meaningful.
> 
> Does passing the priority down to the device have a measurable
> performance impact in your test case? If not, could BFQ be seen as a
> viable alternative?

Hi Adam,

As we all know, complexity is the enemy of reliability. BFQ is
complicated, so I am hesitant to use it in a context where reliability
is important. Additionally, the BFQ scheduler uses heuristics to detect
the application type. As became clear recently, these heuristics can be
misled easily, and fixing this is nontrivial (see also
https://lore.kernel.org/linux-block/20210521131034.GL18952@quack2.suse.cz/).
I want the application launcher to create one cgroup for foreground
applications and another cgroup for background applications. Configuring
different I/O priorities per cgroup is an easy and reliable approach to
communicate information about the application type and latency
expectations to the block layer.

Some database applications use buffered I/O and flush that data by
calling fsync(). We want to support such applications. So it seems like
we have a different opinion about whether or not an I/O priority should
be assigned to I/O that is the result of page cache writeback?

Thanks,

Bart.
Bart Van Assche May 27, 2021, 7 p.m. UTC | #8
On 5/27/21 10:23 AM, Chaitanya Kulkarni wrote:
> On 5/27/21 01:56, Johannes Thumshirn wrote:
>> On 27/05/2021 03:01, Bart Van Assche wrote:
>>> Bart Van Assche (9):
>>>   block/mq-deadline: Add several comments
>>>   block/mq-deadline: Add two lockdep_assert_held() statements
>>>   block/mq-deadline: Remove two local variables
>>>   block/mq-deadline: Rename dd_init_queue() and dd_exit_queue()
>>>   block/mq-deadline: Improve compile-time argument checking
>>>
>> I think the above 5 patches can go in independently as cleanups.
> 
> Yes they seems purely cleanup from review point of view.

That's right. This patch series has been organized such that the cleanup
patches occur first.

I have posted all 9 patches as a series to make it easier to review and
apply all these patches.

Thanks,

Bart.
Adam Manzanares May 27, 2021, 10:45 p.m. UTC | #9
On Thu, 2021-05-27 at 11:53 -0700, Bart Van Assche wrote:
> On 5/27/21 10:58 AM, Adam Manzanares wrote:
> > On Wed, 2021-05-26 at 18:01 -0700, Bart Van Assche wrote:
> > > A feature that is missing from the Linux kernel for storage
> > > devices that support I/O priorities is to set the I/O priority in
> > > requests involving page cache writeback.
> > 
> > When I worked in this area the goal was to control tail latencies for
> > a prioritized app. Once the page cache is involved, control over the
> > I/O is handed off by the application, suggesting that the priorities
> > passed down to the device are no longer as meaningful.
> > 
> > Does passing the priority down to the device have a measurable
> > performance impact in your test case? If not, could BFQ be seen as a
> > viable alternative?
> 
> Hi Adam,
> 
> As we all know, complexity is the enemy of reliability. BFQ is
> complicated, so I am hesitant to use it in a context where reliability
> is important. Additionally, the BFQ scheduler uses heuristics to detect
> the application type. As became clear recently, these heuristics can be
> misled easily, and fixing this is nontrivial (see also
> https://lore.kernel.org/linux-block/20210521131034.GL18952@quack2.suse.cz/).
> I want the application launcher to create one cgroup for foreground
> applications and another cgroup for background applications. Configuring
> different I/O priorities per cgroup is an easy and reliable approach to
> communicate information about the application type and latency
> expectations to the block layer.

Ok, now I know why you are staying away from BFQ. Thanks for sharing
this information.

> 
> Some database applications use buffered I/O and flush that data by
> calling fsync(). We want to support such applications. So it seems
> like
> we have a different opinion about whether or not an I/O priority
> should
> be assigned to I/O that is the result of page cache writeback?

Not necessarily; I am just a bit cautious because, in my experience,
prioritized I/O has benefits as well as limitations. You just have to
make sure that page cache writeback does not push the hardware to a
point where I/O prioritization is no longer beneficial.

Wang Jianchao May 28, 2021, 2:05 a.m. UTC | #10
On 2021/5/28 2:40 AM, Bart Van Assche wrote:
> On 5/27/21 1:05 AM, Wang Jianchao wrote:
>> On 2021/5/27 2:25 PM, Wang Jianchao wrote:
>>> On 2021/5/27 9:01 AM, Bart Van Assche wrote:
>>>> [...]
>>>
>>> How about implementing this feature based on the rq-qos framework?
>>> Maybe it would be better to keep mq-deadline a pure I/O scheduler.
>>
>> In addition, it would be more flexible to combine different I/O schedulers and
>> rq-qos policies if we treated I/O priority as a new policy ;)
> 
> Hi Jianchao,
> 
> That's an interesting question. I'm in favor of flexibility. However,
> I'm not sure whether it would be possible to combine an rq-qos policy
> that submits high priority requests first with an I/O scheduler that
> ignores I/O priorities. Since a typical I/O scheduler reorders requests,
> such a combination would lead to requests being submitted in the wrong
> order to storage devices. In other words, when using an I/O scheduler,
> proper support for I/O priority in the I/O scheduler is essential. The

Hi Bart,

Does it really matter to issue I/O requests in the order of I/O priority?

Given a device with a queue depth of 32 that doesn't support I/O priority,
even if we issue the requests in priority order, will the device firmware
handle them in the same order? In other words, we always issue up to 32
I/O requests to the device at a time, and then the device firmware decides
how to handle them. The order in which requests are dispatched from the
queue may only matter when the device's queue is full and there is a
deeper queue in the I/O scheduler. At that point, the user may need to
check why their application saturates the block device.

As for a QoS policy for I/O priority, it seems similar to wbt, in which
reads, sync writes and background writes have different priorities. Since
we always want higher-priority I/O to be served more by the device,
adapting the queue depth per I/O priority may work ;)

Best regards
Jianchao

> BFQ I/O scheduler is still maturing. The purpose of the Kyber I/O
> scheduler is to control latency by reducing the queue depth without
> differentiating between requests of the same type. The mq-deadline
> scheduler is already being used in combination with storage devices that
> support I/O priorities in their controller. Hence the choice for the
> mq-deadline scheduler.
> 
> Thanks,
> 
> Bart.
>
Paolo Valente May 28, 2021, 8:43 a.m. UTC | #11
> Il giorno 28 mag 2021, alle ore 04:05, Wang Jianchao <jianchao.wan9@gmail.com> ha scritto:
> 
> [...]
> Hi Bart,
> 
> Does it really matter to issue I/O requests in the order of I/O priority?
> 
> Given a device with a queue depth of 32 that doesn't support I/O priority,
> even if we issue the requests in priority order, will the device firmware
> handle them in the same order? In other words, we always issue up to 32
> I/O requests to the device at a time, and then the device firmware decides
> how to handle them. The order in which requests are dispatched from the
> queue may only matter when the device's queue is full and there is a
> deeper queue in the I/O scheduler. At that point, the user may need to
> check why their application saturates the block device.
> 
> As for a QoS policy for I/O priority, it seems similar to wbt, in which
> reads, sync writes and background writes have different priorities. Since
> we always want higher-priority I/O to be served more by the device,
> adapting the queue depth per I/O priority may work ;)
> 

That's exactly what BFQ does, as this is the only option to truly
control I/O in a device with internal queueing :)

Thanks,
Paolo

Bart Van Assche May 28, 2021, 4:28 p.m. UTC | #12
On 5/27/21 7:05 PM, Wang Jianchao wrote:
> Does it really matter to issue I/O requests in the order of I/O priority?
> 
> Given a device with a queue depth of 32 that doesn't support I/O priority,
> even if we issue the requests in priority order, will the device firmware
> handle them in the same order? In other words, we always issue up to 32
> I/O requests to the device at a time, and then the device firmware decides
> how to handle them. The order in which requests are dispatched from the
> queue may only matter when the device's queue is full and there is a
> deeper queue in the I/O scheduler. At that point, the user may need to
> check why their application saturates the block device.
> 
> As for a QoS policy for I/O priority, it seems similar to wbt, in which
> reads, sync writes and background writes have different priorities. Since
> we always want higher-priority I/O to be served more by the device,
> adapting the queue depth per I/O priority may work ;)

Hi Jianchao,

Our conclusion from the extensive measurements we ran is that we cannot
reach our latency goals for high-priority I/O without I/O priority
support in the storage device. This is why we are working with device
vendors on implementing I/O priority support in the storage devices that
matter to us.

Thanks,

Bart.