[0/8] blk-mq: fix request UAF related with iterating over tagset requests

Message ID	20210425085753.2617424-1-ming.lei@redhat.com (mailing list archive)
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Ming Lei <ming.lei@redhat.com> To: linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org, Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org, "Martin K . Petersen" <martin.petersen@oracle.com>, Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org>, Khazhy Kumykov <khazhy@google.com>, Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>, Hannes Reinecke <hare@suse.de>, John Garry <john.garry@huawei.com>, David Jeffery <djeffery@redhat.com>, Ming Lei <ming.lei@redhat.com> Subject: [PATCH 0/8] blk-mq: fix request UAF related with iterating over tagset requests Date: Sun, 25 Apr 2021 16:57:45 +0800 Message-Id: <20210425085753.2617424-1-ming.lei@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	blk-mq: fix request UAF related with iterating over tagset requests \| expand [0/8] blk-mq: fix request UAF related with iterating over tagset requests [1/8] Revert "blk-mq: Fix races between blk_mq_update_nr_hw_queues() and iterating over tags" [2/8] Revert "blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list" [3/8] Revert "blk-mq: Fix races between iterating over requests and freeing requests" [4/8] Revert "blk-mq: Introduce atomic variants of blk_mq_(all_tag\|tagset_busy)_iter" [5/8] blk-mq: blk_mq_complete_request_locally [6/8] block: drivers: complete request locally from blk_mq_tagset_busy_iter [7/8] blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter [8/8] blk-mq: clear stale request in tags->rq[] before freeing one request pool

Ming Lei April 25, 2021, 8:57 a.m. UTC

Hi Guys,

Revert 4 patches from Bart which try to fix request UAF issue related
with iterating over tagset wide requests, because:

1) request UAF caused by normal completion vs. async completion during
iterating can't be covered[1]

2) clearing ->rqs[] is added in fast path, which causes performance loss
by 1% according to Bart's test

3) Bart's approach is too complicated, and some changes aren't needed,
such as adding two versions of tagset iteration

This patchset fixes the request UAF issue by one simpler approach,
without any change in fast path.

1) always complete request synchronously when the completing is run
via blk_mq_tagset_busy_iter(), done in 1st two patches

2) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
and release it after calling ->fn, so ->fn won't be called for one
request if its queue is frozen, done in 3rd patch

3) clearing any stale request referred in ->rqs[] before freeing the
request pool, one per-tags spinlock is added for protecting
grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
in bt_tags_iter() is avoided, done in 4th patch.


[1] https://lore.kernel.org/linux-block/YISzLal7Ur7jyuiy@T590/T/#u

Ming Lei (8):
  Revert "blk-mq: Fix races between blk_mq_update_nr_hw_queues() and
    iterating over tags"
  Revert "blk-mq: Make it safe to use RCU to iterate over
    blk_mq_tag_set.tag_list"
  Revert "blk-mq: Fix races between iterating over requests and freeing
    requests"
  Revert "blk-mq: Introduce atomic variants of
    blk_mq_(all_tag|tagset_busy)_iter"
  blk-mq: blk_mq_complete_request_locally
  block: drivers: complete request locally from blk_mq_tagset_busy_iter
  blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter
  blk-mq: clear stale request in tags->rq[] before freeing one request
    pool

 block/blk-core.c                  |  34 +------
 block/blk-mq-tag.c                | 147 ++++++------------------------
 block/blk-mq-tag.h                |   7 +-
 block/blk-mq.c                    | 100 +++++++++++++-------
 block/blk-mq.h                    |   2 +-
 block/blk.h                       |   2 -
 block/elevator.c                  |   1 -
 drivers/block/mtip32xx/mtip32xx.c |   2 +-
 drivers/block/nbd.c               |   2 +-
 drivers/nvme/host/core.c          |   2 +-
 drivers/scsi/hosts.c              |  16 ++--
 drivers/scsi/scsi_lib.c           |   6 +-
 drivers/scsi/ufs/ufshcd.c         |   4 +-
 include/linux/blk-mq.h            |   3 +-
 14 files changed, 119 insertions(+), 209 deletions(-)

Ming Lei April 25, 2021, 9:27 a.m. UTC | #1

On Sun, Apr 25, 2021 at 04:57:45PM +0800, Ming Lei wrote:
> Hi Guys,
> 
> Revert 4 patches from Bart which try to fix request UAF issue related
> with iterating over tagset wide requests, because:
> 
> 1) request UAF caused by normal completion vs. async completion during
> iterating can't be covered[1]
> 
> 2) clearing ->rqs[] is added in fast path, which causes performance loss
> by 1% according to Bart's test
> 
> 3) Bart's approach is too complicated, and some changes aren't needed,
> such as adding two versions of tagset iteration

4) synchronize_rcu() is added before shutting down one request queue,
which may slow down reboot/poweroff very much on big systems with lots of
HBAs in which lots of LUNs are attached.

5) freeing request pool in updating nr_requests isn't covered.

Thanks,
Ming

Jens Axboe April 25, 2021, 4:17 p.m. UTC | #2

On 4/25/21 2:57 AM, Ming Lei wrote:
> Hi Guys,
> 
> Revert 4 patches from Bart which try to fix request UAF issue related
> with iterating over tagset wide requests, because:
> 
> 1) request UAF caused by normal completion vs. async completion during
> iterating can't be covered[1]
> 
> 2) clearing ->rqs[] is added in fast path, which causes performance loss
> by 1% according to Bart's test
> 
> 3) Bart's approach is too complicated, and some changes aren't needed,
> such as adding two versions of tagset iteration
> 
> This patchset fixes the request UAF issue by one simpler approach,
> without any change in fast path.
> 
> 1) always complete request synchronously when the completing is run
> via blk_mq_tagset_busy_iter(), done in 1st two patches
> 
> 2) grab request's ref before calling ->fn in blk_mq_tagset_busy_iter,
> and release it after calling ->fn, so ->fn won't be called for one
> request if its queue is frozen, done in 3rd patch
> 
> 3) clearing any stale request referred in ->rqs[] before freeing the
> request pool, one per-tags spinlock is added for protecting
> grabbing request ref vs. clearing ->rqs[tag], so UAF by refcount_inc_not_zero
> in bt_tags_iter() is avoided, done in 4th patch.

I'm going to pull the UAF series for now so we don't need to do a series
of reverts if we deem this a better approach. I'll take a further look
at it tomorrow.

Bart Van Assche April 25, 2021, 6:39 p.m. UTC | #3

On 4/25/21 9:17 AM, Jens Axboe wrote:
> I'm going to pull the UAF series for now so we don't need to do a series
> of reverts if we deem this a better approach. I'll take a further look
> at it tomorrow.

Please give me the time to post review comments. My opinion about this
series is that (a) the analysis on which some patches are based is wrong
and also (b) that some patches introduce new bugs that are worse than
the current state of the for-next branch and also make it worse than the
v5.11 kernel.

Bart.

Jens Axboe April 25, 2021, 8:18 p.m. UTC | #4

On 4/25/21 12:39 PM, Bart Van Assche wrote:
> On 4/25/21 9:17 AM, Jens Axboe wrote:
>> I'm going to pull the UAF series for now so we don't need to do a series
>> of reverts if we deem this a better approach. I'll take a further look
>> at it tomorrow.
> 
> Please give me the time to post review comments. My opinion about this
> series is that (a) the analysis on which some patches are based is wrong
> and also (b) that some patches introduce new bugs that are worse than
> the current state of the for-next branch and also make it worse than the
> v5.11 kernel.

Just to be clear, I did not apply this series, "pull the UAF series" just
meant un-applying your series so we have a sane baseline for something
that everybody likes. There's plenty of time to figure this out for
5.13.

Bart Van Assche April 25, 2021, 8:53 p.m. UTC | #5

On 4/25/21 2:27 AM, Ming Lei wrote:
> On Sun, Apr 25, 2021 at 04:57:45PM +0800, Ming Lei wrote:
>> Revert 4 patches from Bart which try to fix request UAF issue related
>> with iterating over tagset wide requests, because:

Where were you during the four weeks that my patch series was out for
review? I haven't seen any feedback from you on my patch series.

>> 1) request UAF caused by normal completion vs. async completion during
>> iterating can't be covered[1]

I do not agree with the above. Patches 5/8 and 6/8 from this series can
be applied without reverting any of my patches.

> 4) synchronize_rcu() is added before shutting down one request queue,
> which may slow down reboot/poweroff very much on big systems with lots of
> HBAs in which lots of LUNs are attached.

The synchronize_rcu() can be removed by using a semaphore
(<linux/semaphore.h>) instead of an RCU reader lock inside bt_tags_iter().

> 5) freeing request pool in updating nr_requests isn't covered.

This can be addressed easily on top of my patch series.

Bart.

Ming Lei April 26, 2021, 1:19 a.m. UTC | #6

On Sun, Apr 25, 2021 at 01:53:16PM -0700, Bart Van Assche wrote:
> On 4/25/21 2:27 AM, Ming Lei wrote:
> > On Sun, Apr 25, 2021 at 04:57:45PM +0800, Ming Lei wrote:
> >> Revert 4 patches from Bart which try to fix request UAF issue related
> >> with iterating over tagset wide requests, because:
> 
> Where were you during the four weeks that my patch series was out for
> review? I haven't seen any feedback from you on my patch series.

To be honest, it is just two days ago I have to take a close look
at your patchset because we may have to backport your patches for
addressing one RH report with high priority.

David is in CC list, and Laurence/David is looking the report too.

> 
> >> 1) request UAF caused by normal completion vs. async completion during
> >> iterating can't be covered[1]
> 
> I do not agree with the above. Patches 5/8 and 6/8 from this series can
> be applied without reverting any of my patches.

The thing is that 5 ~ 8 can fix the issue in a simpler way without
adding extra cost in fast path, and the idea is easier to be proved.

BTW, as a downstream kernel developer, I really hope all fix are simple and
easy to backport. More importantly, I do prefer to approaches in patch which
can be proved/verified easily, so further regression can be avoided.

> 
> > 4) synchronize_rcu() is added before shutting down one request queue,
> > which may slow down reboot/poweroff very much on big systems with lots of
> > HBAs in which lots of LUNs are attached.
> 
> The synchronize_rcu() can be removed by using a semaphore
> (<linux/semaphore.h>) instead of an RCU reader lock inside bt_tags_iter().

I am not sure you can, because some iteration is done in atomic context.

Thanks,
Ming

Bart Van Assche April 26, 2021, 1:57 a.m. UTC | #7

On 4/25/21 6:19 PM, Ming Lei wrote:
> On Sun, Apr 25, 2021 at 01:53:16PM -0700, Bart Van Assche wrote:
>> On 4/25/21 2:27 AM, Ming Lei wrote:
>>> 4) synchronize_rcu() is added before shutting down one request queue,
>>> which may slow down reboot/poweroff very much on big systems with lots of
>>> HBAs in which lots of LUNs are attached.
>>
>> The synchronize_rcu() can be removed by using a semaphore
>> (<linux/semaphore.h>) instead of an RCU reader lock inside bt_tags_iter().
> 
> I am not sure you can, because some iteration is done in atomic context.

I meant <linux/rwlock.h>. The functions like read_lock_irq() that are
declared in that header file are appropriate for atomic context.

Bart.

[0/8] blk-mq: fix request UAF related with iterating over tagset requests

Message

Comments