mbox series

[for-6.1,0/2] iopoll bio pcpu cache fix

Message ID cover.1666886282.git.asml.silence@gmail.com (mailing list archive)
Headers show
Series iopoll bio pcpu cache fix | expand

Message

Pavel Begunkov Oct. 27, 2022, 10:14 p.m. UTC
There are ways to deprive bioset mempool of requests using pcpu caches
and never return them back, which breaks forward progress guarantees bioset
tried to provide. Fix it.

Pavel Begunkov (2):
  mempool: introduce mempool_is_saturated
  bio: don't rob bios from starving bioset

 block/bio.c             | 2 ++
 include/linux/mempool.h | 5 +++++
 2 files changed, 7 insertions(+)

Comments

Jens Axboe Oct. 27, 2022, 11:27 p.m. UTC | #1
On 10/27/22 4:14 PM, Pavel Begunkov wrote:
> There are ways to deprive bioset mempool of requests using pcpu caches
> and never return them back, which breaks forward progress guarantees bioset
> tried to provide. Fix it.
> 
> Pavel Begunkov (2):
>   mempool: introduce mempool_is_saturated
>   bio: don't rob bios from starving bioset
> 
>  block/bio.c             | 2 ++
>  include/linux/mempool.h | 5 +++++
>  2 files changed, 7 insertions(+)
> 

This isn't really a concern for 6.1 and earlier, because the caching is
just for polled IO. Polled IO will not be grabbing any of the reserved
inflight units on the mempool side, which is what guarantees the forward
progress.

It'll be a concern for the 6.2 irq based general caching, so it would
need to be handled there. So perhaps this can be a pre-series for a
reposting of that patchset.
Jens Axboe Oct. 27, 2022, 11:55 p.m. UTC | #2
On 10/27/22 5:27 PM, Jens Axboe wrote:
> On 10/27/22 4:14 PM, Pavel Begunkov wrote:
>> There are ways to deprive bioset mempool of requests using pcpu caches
>> and never return them back, which breaks forward progress guarantees bioset
>> tried to provide. Fix it.
>>
>> Pavel Begunkov (2):
>>   mempool: introduce mempool_is_saturated
>>   bio: don't rob bios from starving bioset
>>
>>  block/bio.c             | 2 ++
>>  include/linux/mempool.h | 5 +++++
>>  2 files changed, 7 insertions(+)
>>
> 
> This isn't really a concern for 6.1 and earlier, because the caching is
> just for polled IO. Polled IO will not be grabbing any of the reserved
> inflight units on the mempool side, which is what guarantees the forward
> progress.
> 
> It'll be a concern for the 6.2 irq based general caching, so it would
> need to be handled there. So perhaps this can be a pre-series for a
> reposting of that patchset.

Just a followup, since we had some out-of-band discussion. This is
a potential concern on the bio side, though not on the request side.
But this approach is racy, we'll figure something else out.
Pavel Begunkov Oct. 28, 2022, 12:13 a.m. UTC | #3
On 10/28/22 00:55, Jens Axboe wrote:
> On 10/27/22 5:27 PM, Jens Axboe wrote:
>> On 10/27/22 4:14 PM, Pavel Begunkov wrote:
>>> There are ways to deprive bioset mempool of requests using pcpu caches
>>> and never return them back, which breaks forward progress guarantees bioset
>>> tried to provide. Fix it.
>>>
>>> Pavel Begunkov (2):
>>>    mempool: introduce mempool_is_saturated
>>>    bio: don't rob bios from starving bioset
>>>
>>>   block/bio.c             | 2 ++
>>>   include/linux/mempool.h | 5 +++++
>>>   2 files changed, 7 insertions(+)
>>>
>>
>> This isn't really a concern for 6.1 and earlier, because the caching is
>> just for polled IO. Polled IO will not be grabbing any of the reserved
>> inflight units on the mempool side, which is what guarantees the forward
>> progress.

And after looking it up, apparently it can allocate from reserves.


>> It'll be a concern for the 6.2 irq based general caching, so it would
>> need to be handled there. So perhaps this can be a pre-series for a
>> reposting of that patchset.
> 
> Just a followup, since we had some out-of-band discussion. This is
> a potential concern on the bio side, though not on the request side.
> But this approach is racy, we'll figure something else out.

I agree that it _may_ be, but I can't think of an counter example
for arches w/ strong ordering and would need to think more if there
is an issue in the looser ephemeral world.
Pavel Begunkov Oct. 28, 2022, 1:16 p.m. UTC | #4
On 10/28/22 01:13, Pavel Begunkov wrote:
> On 10/28/22 00:55, Jens Axboe wrote:
>> On 10/27/22 5:27 PM, Jens Axboe wrote:
>>> On 10/27/22 4:14 PM, Pavel Begunkov wrote:
>>>> There are ways to deprive bioset mempool of requests using pcpu caches
>>>> and never return them back, which breaks forward progress guarantees bioset
>>>> tried to provide. Fix it.
>>>>
>>>> Pavel Begunkov (2):
>>>>    mempool: introduce mempool_is_saturated
>>>>    bio: don't rob bios from starving bioset
>>>>
>>>>   block/bio.c             | 2 ++
>>>>   include/linux/mempool.h | 5 +++++
>>>>   2 files changed, 7 insertions(+)
>>>>
>>>
>>> This isn't really a concern for 6.1 and earlier, because the caching is
>>> just for polled IO. Polled IO will not be grabbing any of the reserved
>>> inflight units on the mempool side, which is what guarantees the forward
>>> progress.
> 
> And after looking it up, apparently it can allocate from reserves.
> 
> 
>>> It'll be a concern for the 6.2 irq based general caching, so it would
>>> need to be handled there. So perhaps this can be a pre-series for a
>>> reposting of that patchset.
>>
>> Just a followup, since we had some out-of-band discussion. This is
>> a potential concern on the bio side, though not on the request side.
>> But this approach is racy, we'll figure something else out.
> 
> I agree that it _may_ be, but I can't think of an counter example
> for arches w/ strong ordering and would need to think more if there
> is an issue in the looser ephemeral world.

Let me try to prove it's not a problem:

It's not interesting if didn't allocate from reserves as it won't affect
mempool. Neither we care about falsely avoiding pcpu cache, i.e. trying
to return into the mempool more than needed. So, the only potential race
problem that requires explanation is when the bio is from reserves but
we see that the mempool is full and it's not. Because of spinlocks
around ->curr_nr modifications and the fact that we won't see random
invented values, it might have only happened if there were both bio
allocations and bio puts strictly after our allocation, after
spin_unlock to be more specific. But then those puts and allocs are
serialised by the mutex, we can take a bio alloc that happened right
after a put that filled the mempool full and recurse all the reasoning.

I'm rusty on formal proves and it can be more formal, but would be
interesting to see if it's flawed and/or get a counter example.
In any case, it only returns more bios into mempool, which won't hurt
the issue.