[v2] virtio_blk: implement init_hctx MQ operation

Message ID	20240807224129.34237-1-mgurtovoy@nvidia.com (mailing list archive)
State	New, archived
From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: <stefanha@redhat.com>, <virtualization@lists.linux.dev>, <mst@redhat.com>, <axboe@kernel.dk>
CC: <kvm@vger.kernel.org>, <linux-block@vger.kernel.org>, <oren@nvidia.com>, Max Gurtovoy <mgurtovoy@nvidia.com>
Subject: [PATCH v2] virtio_blk: implement init_hctx MQ operation
Date: Thu, 8 Aug 2024 01:41:29 +0300

Max Gurtovoy Aug. 7, 2024, 10:41 p.m. UTC

Set the driver data of the hardware context (hctx) to point directly to
the virtio block queue. This cleanup improves code readability and
reduces the number of dereferences in the fast path.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 drivers/block/virtio_blk.c | 42 ++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 20 deletions(-)
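
For readers unfamiliar with the pattern, the change described above can be modelled in user space. This is only a sketch: every type and function name ending in _model, as well as lookups_agree(), is a hypothetical stand-in and not a kernel definition. It shows the two lookup paths (walking from the hardware context through the queue to the per-queue array, versus reading a pointer cached once at init time) and checks that they yield the same element as long as the array is never replaced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; only the fields needed to show the
 * two lookup paths are modelled. None of these are kernel definitions. */
struct vq_model { int id; };

struct vblk_model {
	struct vq_model *vqs;        /* per-queue array */
	unsigned int num_vqs;
};

struct queue_model { void *queuedata; };

struct hctx_model {
	struct queue_model *queue;
	unsigned int queue_num;
	void *driver_data;
};

/* Lookup through the queue: hctx -> queue -> queuedata -> vqs[queue_num]. */
static struct vq_model *lookup_via_queue(struct hctx_model *hctx)
{
	struct vblk_model *vblk = hctx->queue->queuedata;

	return &vblk->vqs[hctx->queue_num];
}

/* Analogue of an init_hctx callback: cache the per-queue pointer once. */
static int init_hctx_model(struct hctx_model *hctx, void *data,
			   unsigned int hctx_idx)
{
	struct vblk_model *vblk = data;

	hctx->driver_data = &vblk->vqs[hctx_idx];
	return 0;
}

/* Lookup through the cached pointer: a single load. */
static struct vq_model *lookup_cached(struct hctx_model *hctx)
{
	return hctx->driver_data;
}

/* Returns 1 if both lookups agree for every queue, 0 otherwise. */
static int lookups_agree(void)
{
	struct vq_model vqs[4] = { {0}, {1}, {2}, {3} };
	struct vblk_model vblk = { vqs, 4 };
	struct queue_model q = { &vblk };
	unsigned int i;

	for (i = 0; i < vblk.num_vqs; i++) {
		struct hctx_model hctx = { &q, i, NULL };

		assert(init_hctx_model(&hctx, &vblk, i) == 0);
		if (lookup_cached(&hctx) != lookup_via_queue(&hctx))
			return 0;
	}
	return 1;
}
```

The equivalence only holds while the array that was indexed at init time stays in place; the cached pointer is not refreshed if the array is later reallocated.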

John Garry Aug. 8, 2024, 6:53 a.m. UTC | #1

On 07/08/2024 23:41, Max Gurtovoy wrote:

Feel free to add:
Reviewed-by: John Garry <john.g.garry@oracle.com>

Jens Axboe Aug. 8, 2024, 1:37 p.m. UTC | #2

On 8/7/24 4:41 PM, Max Gurtovoy wrote:
> Set the driver data of the hardware context (hctx) to point directly to
> the virtio block queue. This cleanup improves code readability and
> reduces the number of dereferences in the fast path.

Looks good, and that is the idiomatic way to do this.

Reviewed-by: Jens Axboe <axboe@kernel.dk>

Christoph Hellwig Aug. 12, 2024, 11:15 a.m. UTC | #3

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Marek Szyprowski Sept. 12, 2024, 6:46 a.m. UTC | #4

Dear All,

On 08.08.2024 00:41, Max Gurtovoy wrote:
> Set the driver data of the hardware context (hctx) to point directly to
> the virtio block queue. This cleanup improves code readability and
> reduces the number of dereferences in the fast path.
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>   drivers/block/virtio_blk.c | 42 ++++++++++++++++++++------------------
>   1 file changed, 22 insertions(+), 20 deletions(-)

This patch landed in recent linux-next as commit 8d04556131c1
("virtio_blk: implement init_hctx MQ operation"). In my tests I found
that it introduces a regression in system suspend/resume: from time to
time the system crashes during a suspend/resume cycle. Reverting this
patch on top of next-20240911 fixes the problem.

I've even managed to catch a kernel panic log of this problem on QEMU's 
ARM64 'virt' machine:

root@target:~# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Sep 12 07:11:52 2024
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000090
Mem abort info:
   ESR = 0x0000000096000046
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x06: level 2 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046bbb000
...
Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP
Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0H Not tainted 6.11.0-rc6+ #9024
Hardware name: linux,dummy-virt (DT)
Workqueue: kblockd blk_mq_requeue_work
pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : virtqueue_add_split+0x458/0x63c
lr : virtqueue_add_split+0x1d0/0x63c
...
Call trace:
  virtqueue_add_split+0x458/0x63c
  virtqueue_add_sgs+0xc4/0xec
  virtblk_add_req+0x8c/0xf4
  virtio_queue_rq+0x6c/0x1bc
  blk_mq_dispatch_rq_list+0x21c/0x714
  __blk_mq_sched_dispatch_requests+0xb4/0x58c
  blk_mq_sched_dispatch_requests+0x30/0x6c
  blk_mq_run_hw_queue+0x14c/0x40c
  blk_mq_run_hw_queues+0x64/0x124
  blk_mq_requeue_work+0x188/0x1bc
  process_one_work+0x20c/0x608
  worker_thread+0x238/0x370
  kthread+0x124/0x128
  ret_from_fork+0x10/0x20
Code: f9404282 79401c21 b9004a81 f94047e1 (f8206841)
---[ end trace 0000000000000000 ]---
note: kworker/0:0H[9] exited with irqs disabled
note: kworker/0:0H[9] exited with preempt_count 1
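
A note on reading the fault address in the log above: a small non-zero address such as 0x90 in a "NULL pointer dereference" report generally means the access went through a NULL (or near-NULL) base plus a small offset, for example a structure member or an indexed slot. The sketch below illustrates only that arithmetic. struct ring_model and its layout are hypothetical, chosen so that a member lands at 0x90; they do not describe any real virtio structure, and the log alone does not say which pointer was NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so that 'slot' sits at offset 0x90. */
struct ring_model {
	unsigned char pad[0x90];
	uint64_t slot;
};

/* Address that a write to base->slot would touch, computed without
 * dereferencing anything. */
static uintptr_t fault_address_for(uintptr_t base)
{
	return base + offsetof(struct ring_model, slot);
}

/* With a NULL base the access lands on the member offset itself. */
static int offset_demo(void)
{
	assert(fault_address_for(0) == 0x90);
	return 0;
}
```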


> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2351f411fa46..35a7a586f6f5 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -129,14 +129,6 @@ static inline blk_status_t virtblk_result(u8 status)
>   	}
>   }
>   
> -static inline struct virtio_blk_vq *get_virtio_blk_vq(struct blk_mq_hw_ctx *hctx)
> -{
> -	struct virtio_blk *vblk = hctx->queue->queuedata;
> -	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> -
> -	return vq;
> -}
> -
>   static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
>   {
>   	struct scatterlist out_hdr, in_hdr, *sgs[3];
> @@ -377,8 +369,7 @@ static void virtblk_done(struct virtqueue *vq)
>   
>   static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>   {
> -	struct virtio_blk *vblk = hctx->queue->queuedata;
> -	struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
> +	struct virtio_blk_vq *vq = hctx->driver_data;
>   	bool kick;
>   
>   	spin_lock_irq(&vq->lock);
> @@ -428,10 +419,10 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>   			   const struct blk_mq_queue_data *bd)
>   {
>   	struct virtio_blk *vblk = hctx->queue->queuedata;
> +	struct virtio_blk_vq *vq = hctx->driver_data;
>   	struct request *req = bd->rq;
>   	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>   	unsigned long flags;
> -	int qid = hctx->queue_num;
>   	bool notify = false;
>   	blk_status_t status;
>   	int err;
> @@ -440,26 +431,26 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	if (unlikely(status))
>   		return status;
>   
> -	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
> -	err = virtblk_add_req(vblk->vqs[qid].vq, vbr);
> +	spin_lock_irqsave(&vq->lock, flags);
> +	err = virtblk_add_req(vq->vq, vbr);
>   	if (err) {
> -		virtqueue_kick(vblk->vqs[qid].vq);
> +		virtqueue_kick(vq->vq);
>   		/* Don't stop the queue if -ENOMEM: we may have failed to
>   		 * bounce the buffer due to global resource outage.
>   		 */
>   		if (err == -ENOSPC)
>   			blk_mq_stop_hw_queue(hctx);
> -		spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> +		spin_unlock_irqrestore(&vq->lock, flags);
>   		virtblk_unmap_data(req, vbr);
>   		return virtblk_fail_to_queue(req, err);
>   	}
>   
> -	if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
> +	if (bd->last && virtqueue_kick_prepare(vq->vq))
>   		notify = true;
> -	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
> +	spin_unlock_irqrestore(&vq->lock, flags);
>   
>   	if (notify)
> -		virtqueue_notify(vblk->vqs[qid].vq);
> +		virtqueue_notify(vq->vq);
>   	return BLK_STS_OK;
>   }
>   
> @@ -504,7 +495,7 @@ static void virtio_queue_rqs(struct request **rqlist)
>   	struct request *requeue_list = NULL;
>   
>   	rq_list_for_each_safe(rqlist, req, next) {
> -		struct virtio_blk_vq *vq = get_virtio_blk_vq(req->mq_hctx);
> +		struct virtio_blk_vq *vq = req->mq_hctx->driver_data;
>   		bool kick;
>   
>   		if (!virtblk_prep_rq_batch(req)) {
> @@ -1164,6 +1155,16 @@ static const struct attribute_group *virtblk_attr_groups[] = {
>   	NULL,
>   };
>   
> +static int virtblk_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> +		unsigned int hctx_idx)
> +{
> +	struct virtio_blk *vblk = data;
> +	struct virtio_blk_vq *vq = &vblk->vqs[hctx_idx];
> +
> +	hctx->driver_data = vq;
> +	return 0;
> +}
> +
>   static void virtblk_map_queues(struct blk_mq_tag_set *set)
>   {
>   	struct virtio_blk *vblk = set->driver_data;
> @@ -1205,7 +1206,7 @@ static void virtblk_complete_batch(struct io_comp_batch *iob)
>   static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
>   {
>   	struct virtio_blk *vblk = hctx->queue->queuedata;
> -	struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
> +	struct virtio_blk_vq *vq = hctx->driver_data;
>   	struct virtblk_req *vbr;
>   	unsigned long flags;
>   	unsigned int len;
> @@ -1236,6 +1237,7 @@ static const struct blk_mq_ops virtio_mq_ops = {
>   	.queue_rqs	= virtio_queue_rqs,
>   	.commit_rqs	= virtio_commit_rqs,
>   	.complete	= virtblk_request_done,
> +	.init_hctx	= virtblk_init_hctx,
>   	.map_queues	= virtblk_map_queues,
>   	.poll		= virtblk_poll,
>   };

Best regards

Michael S. Tsirkin Sept. 12, 2024, 6:57 a.m. UTC | #5

On Thu, Sep 12, 2024 at 08:46:15AM +0200, Marek Szyprowski wrote:
> Dear All,
> 
> On 08.08.2024 00:41, Max Gurtovoy wrote:
> > Set the driver data of the hardware context (hctx) to point directly to
> > the virtio block queue. This cleanup improves code readability and
> > reduces the number of dereferences in the fast path.
> >
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > ---
> >   drivers/block/virtio_blk.c | 42 ++++++++++++++++++++------------------
> >   1 file changed, 22 insertions(+), 20 deletions(-)
> 
> This patch landed in recent linux-next as commit 8d04556131c1 
> ("virtio_blk: implement init_hctx MQ operation"). In my tests I found 
> that it introduces a regression in system suspend/resume operation. From 
> time to time system crashes during suspend/resume cycle. Reverting this 
> patch on top of next-20240911 fixes this problem.

OK, I'll drop it from next for now. Please try to debug
and repost.



Max Gurtovoy Sept. 16, 2024, 10:06 p.m. UTC | #6

Hi Marek,

On 12/09/2024 9:46, Marek Szyprowski wrote:
> Dear All,
>
> On 08.08.2024 00:41, Max Gurtovoy wrote:
>> Set the driver data of the hardware context (hctx) to point directly to
>> the virtio block queue. This cleanup improves code readability and
>> reduces the number of dereferences in the fast path.
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>    drivers/block/virtio_blk.c | 42 ++++++++++++++++++++------------------
>>    1 file changed, 22 insertions(+), 20 deletions(-)
> This patch landed in recent linux-next as commit 8d04556131c1
> ("virtio_blk: implement init_hctx MQ operation"). In my tests I found
> that it introduces a regression in system suspend/resume operation. From
> time to time system crashes during suspend/resume cycle. Reverting this
> patch on top of next-20240911 fixes this problem.

Could you please provide a detailed explanation of the system 
suspend/resume operation and the specific testing methodology employed?

The occurrence of a kernel panic from this commit is unexpected, given 
that it primarily involves pointer reassignment without altering the 
lifecycle of vblk/vqs.

In the virtqueue_add_split function, which pointer is becoming null and 
causing the issue? A detailed analysis would be helpful.

The report indicates that the crash occurs sporadically rather than 
consistently.

Is it possible that this is a race condition introduced by a different 
commit? How can we rule out this possibility?

Prior to applying this commit, what were the test results? Specifically, 
out of 100 test runs, how many passed successfully?

After applying this commit, what are the updated test results? Again, 
out of 100 test runs, how many passed successfully?
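
One generic way a cached pointer can start to misbehave even though the caching code itself looks harmless is sketched below. It rests on an assumption that this thread has not established at this point: that the per-queue array is freed and allocated again at some point in the device's life, for instance across a suspend/resume cycle. All names (vq_model, dev_model, hctx_model, dev_reinit_vqs, cached_pointer_goes_stale) are hypothetical user-space stand-ins. An index-based lookup follows the new array, while a pointer cached before the reallocation still refers to the old one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, not kernel definitions. */
struct vq_model { int id; };

struct dev_model {
	struct vq_model *vqs;        /* per-queue array, may be reallocated */
	unsigned int num_vqs;
};

struct hctx_model { void *driver_data; };

/* Assumed re-initialisation step: install a fresh array and hand the old
 * one back to the caller. Returns NULL (device unchanged) on failure. */
static struct vq_model *dev_reinit_vqs(struct dev_model *dev)
{
	struct vq_model *old = dev->vqs;
	struct vq_model *fresh = calloc(dev->num_vqs, sizeof(*fresh));

	if (!fresh)
		return NULL;
	dev->vqs = fresh;
	return old;
}

/* Returns 1 if the cached pointer no longer matches the index-based
 * lookup after re-initialisation, 0 if it still matches, -1 on OOM. */
static int cached_pointer_goes_stale(void)
{
	struct dev_model dev = { calloc(2, sizeof(struct vq_model)), 2 };
	struct hctx_model hctx;
	struct vq_model *old;
	int stale;

	if (!dev.vqs)
		return -1;

	/* Cache once, as an init-time callback would. */
	hctx.driver_data = &dev.vqs[0];
	assert(hctx.driver_data == (void *)&dev.vqs[0]);

	old = dev_reinit_vqs(&dev);
	if (!old) {
		free(dev.vqs);
		return -1;
	}

	/* Both arrays are still allocated here, so the comparison is
	 * well defined: the cache still points into the old array. */
	stale = (hctx.driver_data != (void *)&dev.vqs[0]);

	free(old);
	free(dev.vqs);
	return stale;
}
```

Whether virtio_blk actually follows this pattern is exactly what the questions above are meant to find out. If it does, the cached pointer would need to be refreshed whenever the array is replaced, or the array would need to stay allocated for the lifetime of the device.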



Marek Szyprowski Sept. 17, 2024, 2:09 p.m. UTC | #7

Hi Max,

On 17.09.2024 00:06, Max Gurtovoy wrote:
>
> On 12/09/2024 9:46, Marek Szyprowski wrote:
>> Dear All,
>>
>> On 08.08.2024 00:41, Max Gurtovoy wrote:
>>> Set the driver data of the hardware context (hctx) to point directly to
>>> the virtio block queue. This cleanup improves code readability and
>>> reduces the number of dereferences in the fast path.
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> ---
>>>    drivers/block/virtio_blk.c | 42 ++++++++++++++++++++------------------
>>>    1 file changed, 22 insertions(+), 20 deletions(-)
>> This patch landed in recent linux-next as commit 8d04556131c1
>> ("virtio_blk: implement init_hctx MQ operation"). In my tests I found
>> that it introduces a regression in system suspend/resume operation. From
>> time to time system crashes during suspend/resume cycle. Reverting this
>> patch on top of next-20240911 fixes this problem.
>
> Could you please provide a detailed explanation of the system 
> suspend/resume operation and the specific testing methodology employed?

In my tests I just call the 'rtcwake -s10 -mmem' command many times in a
loop. I use a standard Debian image under QEMU/ARM64. Nothing really special.

>
> The occurrence of a kernel panic from this commit is unexpected, given 
> that it primarily involves pointer reassignment without altering the 
> lifecycle of vblk/vqs.
>
> In the virtqueue_add_split function, which pointer is becoming null 
> and causing the issue? A detailed analysis would be helpful.
>
> The report indicates that the crash occurs sporadically rather than 
> consistently.
>
> is it possible that this is a race condition introduced by a different 
> commit? How can we rule out this possibility?

This is the commit that bisecting between v6.11-rc1 and
next-20240911 points to. The problem is reproducible; it just needs a few
calls to the rtcwake command.

>
> Prior to applying this commit, what were the test results? 
> Specifically, out of 100 test runs, how many passed successfully?

All 100 were successful, see https://pastebin.com/3yETvXK9 (kernel is 
compiled from 6d17035a7402, which is a parent of $subject in linux-next).

>
> After applying this commit, what are the updated test results? Again, 
> out of 100 test runs, how many passed successfully?

Usually it freezes or panics after the second try, see 
https://pastebin.com/u5n9K1Dz (kernel compiled from 8d04556131c1, which 
is $subject in linux-next).

Best regards

Max Gurtovoy Sept. 22, 2024, 10:47 p.m. UTC | #8

On 17/09/2024 17:09, Marek Szyprowski wrote:
> Hi Max,
>
> On 17.09.2024 00:06, Max Gurtovoy wrote:
>> On 12/09/2024 9:46, Marek Szyprowski wrote:
>>> Dear All,
>>>
>>> On 08.08.2024 00:41, Max Gurtovoy wrote:
>>>> Set the driver data of the hardware context (hctx) to point directly to
>>>> the virtio block queue. This cleanup improves code readability and
>>>> reduces the number of dereferences in the fast path.
>>>>
>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>     drivers/block/virtio_blk.c | 42
>>>> ++++++++++++++++++++------------------
>>>>     1 file changed, 22 insertions(+), 20 deletions(-)
>>> This patch landed in recent linux-next as commit 8d04556131c1
>>> ("virtio_blk: implement init_hctx MQ operation"). In my tests I found
>>> that it introduces a regression in system suspend/resume operation. From
>>> time to time system crashes during suspend/resume cycle. Reverting this
>>> patch on top of next-20240911 fixes this problem.
>> Could you please provide a detailed explanation of the system
>> suspend/resume operation and the specific testing methodology employed?
> In my tests I just call the 'rtcwake -s10 -mmem' command many times in a
> loop. I use standard Debian image under QEMU/ARM64. Nothing really special.

I ran this test on my bare-metal x86 server in a loop, with fio running in
the background.

The test passed.

Can you please re-test on the linux/master branch with this patch applied
on top?

>
>> The occurrence of a kernel panic from this commit is unexpected, given
>> that it primarily involves pointer reassignment without altering the
>> lifecycle of vblk/vqs.
>>
>> In the virtqueue_add_split function, which pointer is becoming null
>> and causing the issue? A detailed analysis would be helpful.
>>
>> The report indicates that the crash occurs sporadically rather than
>> consistently.
>>
>> is it possible that this is a race condition introduced by a different
>> commit? How can we rule out this possibility?
> This is the commit pointed by bisecting between v6.11-rc1 and
> next-20240911. The problem is reproducible, it just need a few calls to
> the rtcwake command.
>> Prior to applying this commit, what were the test results?
>> Specifically, out of 100 test runs, how many passed successfully?
> All 100 were successful, see https://pastebin.com/3yETvXK9 (kernel is
> compiled from 6d17035a7402, which is a parent of $subject in linux-next).
>
>> After applying this commit, what are the updated test results? Again,
>> out of 100 test runs, how many passed successfully?
> Usually it freezes or panics after the second try, see
> https://pastebin.com/u5n9K1Dz (kernel compiled from 8d04556131c1, which
> is $subject in linux-next).
>
>>> I've even managed to catch a kernel panic log of this problem on QEMU's
>>> ARM64 'virt' machine:
>>>
>>> root@target:~# time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Sep 12 07:11:52 2024
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 0000000000000090
>>> Mem abort info:
>>>      ESR = 0x0000000096000046
>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>      SET = 0, FnV = 0
>>>      EA = 0, S1PTW = 0
>>>      FSC = 0x06: level 2 translation fault
>>> Data abort info:
>>>      ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
>>>      CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>>>      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046bbb000
>>> ...
>>> Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP
>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6
>>> CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0H Not tainted 6.11.0-rc6+ #9024
>>> Hardware name: linux,dummy-virt (DT)
>>> Workqueue: kblockd blk_mq_requeue_work
>>> pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : virtqueue_add_split+0x458/0x63c
>>> lr : virtqueue_add_split+0x1d0/0x63c
>>> ...
>>> Call trace:
>>>     virtqueue_add_split+0x458/0x63c
>>>     virtqueue_add_sgs+0xc4/0xec
>>>     virtblk_add_req+0x8c/0xf4
>>>     virtio_queue_rq+0x6c/0x1bc
>>>     blk_mq_dispatch_rq_list+0x21c/0x714
>>>     __blk_mq_sched_dispatch_requests+0xb4/0x58c
>>>     blk_mq_sched_dispatch_requests+0x30/0x6c
>>>     blk_mq_run_hw_queue+0x14c/0x40c
>>>     blk_mq_run_hw_queues+0x64/0x124
>>>     blk_mq_requeue_work+0x188/0x1bc
>>>     process_one_work+0x20c/0x608
>>>     worker_thread+0x238/0x370
>>>     kthread+0x124/0x128
>>>     ret_from_fork+0x10/0x20
>>> Code: f9404282 79401c21 b9004a81 f94047e1 (f8206841)
>>> ---[ end trace 0000000000000000 ]---
>>> note: kworker/0:0H[9] exited with irqs disabled
>>> note: kworker/0:0H[9] exited with preempt_count 1
>>>
>>>
>>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>>> index 2351f411fa46..35a7a586f6f5 100644
>>>> --- a/drivers/block/virtio_blk.c
>>>> +++ b/drivers/block/virtio_blk.c
>>>> @@ -129,14 +129,6 @@ static inline blk_status_t virtblk_result(u8
>>>> status)
>>>>         }
>>>>     }
>>>>     -static inline struct virtio_blk_vq *get_virtio_blk_vq(struct
>>>> blk_mq_hw_ctx *hctx)
>>>> -{
>>>> -    struct virtio_blk *vblk = hctx->queue->queuedata;
>>>> -    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>> -
>>>> -    return vq;
>>>> -}
>>>> -
>>>>     static int virtblk_add_req(struct virtqueue *vq, struct
>>>> virtblk_req *vbr)
>>>>     {
>>>>         struct scatterlist out_hdr, in_hdr, *sgs[3];
>>>> @@ -377,8 +369,7 @@ static void virtblk_done(struct virtqueue *vq)
>>>>        static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>>>     {
>>>> -    struct virtio_blk *vblk = hctx->queue->queuedata;
>>>> -    struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num];
>>>> +    struct virtio_blk_vq *vq = hctx->driver_data;
>>>>         bool kick;
>>>>            spin_lock_irq(&vq->lock);
>>>> @@ -428,10 +419,10 @@ static blk_status_t virtio_queue_rq(struct
>>>> blk_mq_hw_ctx *hctx,
>>>>                    const struct blk_mq_queue_data *bd)
>>>>     {
>>>>         struct virtio_blk *vblk = hctx->queue->queuedata;
>>>> +    struct virtio_blk_vq *vq = hctx->driver_data;
>>>>         struct request *req = bd->rq;
>>>>         struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>>>>         unsigned long flags;
>>>> -    int qid = hctx->queue_num;
>>>>         bool notify = false;
>>>>         blk_status_t status;
>>>>         int err;
>>>> @@ -440,26 +431,26 @@ static blk_status_t virtio_queue_rq(struct
>>>> blk_mq_hw_ctx *hctx,
>>>>         if (unlikely(status))
>>>>             return status;
>>>>     -    spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
>>>> -    err = virtblk_add_req(vblk->vqs[qid].vq, vbr);
>>>> +    spin_lock_irqsave(&vq->lock, flags);
>>>> +    err = virtblk_add_req(vq->vq, vbr);
>>>>         if (err) {
>>>> -        virtqueue_kick(vblk->vqs[qid].vq);
>>>> +        virtqueue_kick(vq->vq);
>>>>             /* Don't stop the queue if -ENOMEM: we may have failed to
>>>>              * bounce the buffer due to global resource outage.
>>>>              */
>>>>             if (err == -ENOSPC)
>>>>                 blk_mq_stop_hw_queue(hctx);
>>>> -        spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>> +        spin_unlock_irqrestore(&vq->lock, flags);
>>>>             virtblk_unmap_data(req, vbr);
>>>>             return virtblk_fail_to_queue(req, err);
>>>>         }
>>>>     -    if (bd->last && virtqueue_kick_prepare(vblk->vqs[qid].vq))
>>>> +    if (bd->last && virtqueue_kick_prepare(vq->vq))
>>>>             notify = true;
>>>> -    spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
>>>> +    spin_unlock_irqrestore(&vq->lock, flags);
>>>>            if (notify)
>>>> -        virtqueue_notify(vblk->vqs[qid].vq);
>>>> +        virtqueue_notify(vq->vq);
>>>>         return BLK_STS_OK;
>>>>     }
>>>>     @@ -504,7 +495,7 @@ static void virtio_queue_rqs(struct request
>>>> **rqlist)
>>>>         struct request *requeue_list = NULL;
>>>>            rq_list_for_each_safe(rqlist, req, next) {
>>>> -        struct virtio_blk_vq *vq = get_virtio_blk_vq(req->mq_hctx);
>>>> +        struct virtio_blk_vq *vq = req->mq_hctx->driver_data;
>>>>             bool kick;
>>>>                if (!virtblk_prep_rq_batch(req)) {
>>>> @@ -1164,6 +1155,16 @@ static const struct attribute_group
>>>> *virtblk_attr_groups[] = {
>>>>         NULL,
>>>>     };
>>>>     +static int virtblk_init_hctx(struct blk_mq_hw_ctx *hctx, void
>>>> *data,
>>>> +        unsigned int hctx_idx)
>>>> +{
>>>> +    struct virtio_blk *vblk = data;
>>>> +    struct virtio_blk_vq *vq = &vblk->vqs[hctx_idx];
>>>> +
>>>> +    hctx->driver_data = vq;
>>>> +    return 0;
>>>> +}
>>>> +
>>>>     static void virtblk_map_queues(struct blk_mq_tag_set *set)
>>>>     {
>>>>         struct virtio_blk *vblk = set->driver_data;
>>>> @@ -1205,7 +1206,7 @@ static void virtblk_complete_batch(struct
>>>> io_comp_batch *iob)
>>>>     static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct
>>>> io_comp_batch *iob)
>>>>     {
>>>>         struct virtio_blk *vblk = hctx->queue->queuedata;
>>>> -    struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
>>>> +    struct virtio_blk_vq *vq = hctx->driver_data;
>>>>         struct virtblk_req *vbr;
>>>>         unsigned long flags;
>>>>         unsigned int len;
>>>> @@ -1236,6 +1237,7 @@ static const struct blk_mq_ops virtio_mq_ops = {
>>>>         .queue_rqs    = virtio_queue_rqs,
>>>>         .commit_rqs    = virtio_commit_rqs,
>>>>         .complete    = virtblk_request_done,
>>>> +    .init_hctx    = virtblk_init_hctx,
>>>>         .map_queues    = virtblk_map_queues,
>>>>         .poll        = virtblk_poll,
>>>>     };
>>> Best regards
> Best regards

Francesco Lavra Sept. 24, 2024, 8:06 a.m. UTC | #9

On Mon, 2024-09-23 at 01:47 +0300, Max Gurtovoy wrote:
> 
> On 17/09/2024 17:09, Marek Szyprowski wrote:
> > Hi Max,
> > 
> > On 17.09.2024 00:06, Max Gurtovoy wrote:
> > > On 12/09/2024 9:46, Marek Szyprowski wrote:
> > > > Dear All,
> > > > 
> > > > On 08.08.2024 00:41, Max Gurtovoy wrote:
> > > > > Set the driver data of the hardware context (hctx) to point
> > > > > directly to
> > > > > the virtio block queue. This cleanup improves code
> > > > > readability and
> > > > > reduces the number of dereferences in the fast path.
> > > > > 
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > ---
> > > > >     drivers/block/virtio_blk.c | 42
> > > > > ++++++++++++++++++++------------------
> > > > >     1 file changed, 22 insertions(+), 20 deletions(-)
> > > > This patch landed in recent linux-next as commit 8d04556131c1
> > > > ("virtio_blk: implement init_hctx MQ operation"). In my tests I
> > > > found
> > > > that it introduces a regression in system suspend/resume
> > > > operation. From
> > > > time to time system crashes during suspend/resume cycle.
> > > > Reverting this
> > > > patch on top of next-20240911 fixes this problem.
> > > Could you please provide a detailed explanation of the system
> > > suspend/resume operation and the specific testing methodology
> > > employed?
> > In my tests I just call the 'rtcwake -s10 -mmem' command many times
> > in a
> > loop. I use standard Debian image under QEMU/ARM64. Nothing really
> > special.
> 
> I run this test on my bare metal x86 server in a loop with fio in the
> background.
> 
> The test passed.

If your kernel is running on bare metal, it's not using the virtio_blk
driver, is it?

Max Gurtovoy Sept. 24, 2024, 8:11 a.m. UTC | #10

On 24/09/2024 11:06, Francesco Lavra wrote:
> On Mon, 2024-09-23 at 01:47 +0300, Max Gurtovoy wrote:
>> On 17/09/2024 17:09, Marek Szyprowski wrote:
>>> Hi Max,
>>>
>>> On 17.09.2024 00:06, Max Gurtovoy wrote:
>>>> On 12/09/2024 9:46, Marek Szyprowski wrote:
>>>>> Dear All,
>>>>>
>>>>> On 08.08.2024 00:41, Max Gurtovoy wrote:
>>>>>> Set the driver data of the hardware context (hctx) to point
>>>>>> directly to
>>>>>> the virtio block queue. This cleanup improves code
>>>>>> readability and
>>>>>> reduces the number of dereferences in the fast path.
>>>>>>
>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>>> ---
>>>>>>      drivers/block/virtio_blk.c | 42
>>>>>> ++++++++++++++++++++------------------
>>>>>>      1 file changed, 22 insertions(+), 20 deletions(-)
>>>>> This patch landed in recent linux-next as commit 8d04556131c1
>>>>> ("virtio_blk: implement init_hctx MQ operation"). In my tests I
>>>>> found
>>>>> that it introduces a regression in system suspend/resume
>>>>> operation. From
>>>>> time to time system crashes during suspend/resume cycle.
>>>>> Reverting this
>>>>> patch on top of next-20240911 fixes this problem.
>>>> Could you please provide a detailed explanation of the system
>>>> suspend/resume operation and the specific testing methodology
>>>> employed?
>>> In my tests I just call the 'rtcwake -s10 -mmem' command many times
>>> in a
>>> loop. I use standard Debian image under QEMU/ARM64. Nothing really
>>> special.
>> I run this test on my bare metal x86 server in a loop with fio in the
>> background.
>>
>> The test passed.
> If your kernel is running on bare metal, it's not using the virtio_blk
> driver, is it?

It is using the virtio_blk driver.
I'm using an NVIDIA BlueField-3 virtio-blk device.

Marek Szyprowski Sept. 24, 2024, 10:26 a.m. UTC | #11

Hi Max,

On 23.09.2024 00:47, Max Gurtovoy wrote:
>
> On 17/09/2024 17:09, Marek Szyprowski wrote:
>> On 17.09.2024 00:06, Max Gurtovoy wrote:
>>> On 12/09/2024 9:46, Marek Szyprowski wrote:
>>>> Dear All,
>>>>
>>>> On 08.08.2024 00:41, Max Gurtovoy wrote:
>>>>> Set the driver data of the hardware context (hctx) to point 
>>>>> directly to
>>>>> the virtio block queue. This cleanup improves code readability and
>>>>> reduces the number of dereferences in the fast path.
>>>>>
>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>>> ---
>>>>>     drivers/block/virtio_blk.c | 42
>>>>> ++++++++++++++++++++------------------
>>>>>     1 file changed, 22 insertions(+), 20 deletions(-)
>>>> This patch landed in recent linux-next as commit 8d04556131c1
>>>> ("virtio_blk: implement init_hctx MQ operation"). In my tests I found
>>>> that it introduces a regression in system suspend/resume operation. 
>>>> From
>>>> time to time system crashes during suspend/resume cycle. Reverting 
>>>> this
>>>> patch on top of next-20240911 fixes this problem.
>>> Could you please provide a detailed explanation of the system
>>> suspend/resume operation and the specific testing methodology employed?
>> In my tests I just call the 'rtcwake -s10 -mmem' command many times in a
>> loop. I use standard Debian image under QEMU/ARM64. Nothing really 
>> special.
>
> I run this test on my bare metal x86 server in a loop with fio in the 
> background.
>
> The test passed.


Maybe QEMU is a bit slower and exposes some kind of race caused by this 
change.


>
> Can you please re-test with the linux/master branch with applying this 
> patch on top ?


This issue is fully reproducible with vanilla v6.11 and v6.10 from Linus 
with the $subject patch applied on top.


I've even checked it with x86_64 QEMU and the first random preinstalled 
Debian image I found.


Here is a detailed setup if you would like to check it yourself (tested on 
an Ubuntu 22.04 LTS x86_64 host):


1. download x86_64 preinstalled Debian image:

# wget https://dietpi.com/downloads/images/DietPi_NativePC-BIOS-x86_64-Bookworm.img.xz
# xz -d DietPi_NativePC-BIOS-x86_64-Bookworm.img.xz


2. build kernel:

# make x86_64_defconfig
# make -j12


3. run QEMU:

# sudo qemu-system-x86_64 -enable-kvm \
     -kernel PATH_TO_YOUR_KERNEL_DIR/arch/x86/boot/bzImage \
     -append "console=ttyS0 root=/dev/vda1 rootwait noapic tsc=unstable init=/bin/sh" \
     -smp 2 -m 2048 \
     -drive file=DietPi_NativePC-BIOS-x86_64-Bookworm.img,format=raw,if=virtio \
     -netdev user,id=net0 -device virtio-net,netdev=net0 \
     -serial mon:stdio -nographic


4. let it boot, then type (copy&paste line by line) in the init shell:

# mount proc /proc -t proc
# mount sys /sys -t sysfs
# n=10; for i in `seq 1 $n`; do echo Test $i of $n; rtcwake -s10 -mmem; date; echo Test $i done; done


5. Use 'Ctrl-a' then 'x' to exit QEMU console.
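
For unattended runs, the loop from step 4 could also be wrapped in a small
helper that stops at the first failing cycle. This is only a sketch: the
function name run_cycles is made up for illustration, and it assumes a POSIX
shell with date available. A hang or panic of the guest will of course still
show up as the output simply stopping.

```shell
#!/bin/sh
# run_cycles N CMD [ARGS...]
# Hypothetical helper: run CMD N times, printing the same progress lines
# as the one-liner in step 4, and stop at the first non-zero exit status.
run_cycles() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        echo "Test $i of $n"
        # "$@" is the command under test, e.g. rtcwake -s10 -mmem
        "$@" || { echo "Test $i failed"; return 1; }
        date
        echo "Test $i done"
        i=$((i + 1))
    done
    echo "All $n cycles passed"
}
```

In the init shell this would be called as: run_cycles 10 rtcwake -s10 -mmem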


 > ...


Best regards
