block: retry blk_rq_map_user_iov() on failure

Message ID c8718f10-0442-de4f-edfc-ecc6fe28b768@kernel.dk (mailing list archive)
State New, archived
Series block: retry blk_rq_map_user_iov() on failure

Commit Message

Jens Axboe Oct. 31, 2019, 7:26 p.m. UTC
We ran into an issue in production where reading an NVMe drive log
randomly fails. The request is a big one, 1056K, and the application
(nvme-cli) nicely aligns the buffer. This means the kernel will attempt
to map the pages in for zero-copy DMA, but we ultimately fail to add
enough pages to the request to satisfy the size requirement, because we
run out of segments supported by the hardware.

If we fail a user map that attempts to map pages in directly, retry with
copy == true set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

---
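
For reference, the failing scenario described above can be approximated with the
hypothetical userspace sketch below: a large Get Log Page read issued through the
NVMe admin passthrough ioctl with a page-aligned buffer. This is not the actual
reproducer from the report; the device path, NSID, and log id are illustrative.

/*
 * Hypothetical reproducer sketch: a 1056K log read through the NVMe admin
 * passthrough ioctl, with the buffer aligned the way nvme-cli aligns it,
 * so the kernel tries the zero-copy map path first.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	size_t len = 1056 * 1024;		/* request size from the report */
	unsigned int numd = len / 4 - 1;	/* 0-based dword count */
	unsigned int numdl = numd & 0xffff;
	unsigned int numdu = numd >> 16;
	unsigned int lid = 0x07;		/* log id chosen only for illustration */
	void *buf;
	int fd, ret;

	/* aligned buffer, so the kernel attempts to map the pages in directly */
	if (posix_memalign(&buf, sysconf(_SC_PAGESIZE), len))
		return 1;

	fd = open("/dev/nvme0", O_RDWR);	/* device node assumed */
	if (fd < 0)
		return 1;

	struct nvme_admin_cmd cmd = {
		.opcode   = 0x02,			/* Get Log Page */
		.nsid     = 0xffffffff,
		.addr     = (__u64)(uintptr_t)buf,
		.data_len = (__u32)len,
		.cdw10    = (numdl << 16) | lid,
		.cdw11    = numdu,
	};

	ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
	printf("ioctl returned %d\n", ret);

	close(fd);
	free(buf);
	return ret ? 1 : 0;
}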

Comments

Jens Axboe Oct. 31, 2019, 7:30 p.m. UTC | #1
On 10/31/19 1:26 PM, Jens Axboe wrote:
> We ran into an issue in production where reading an NVMe drive log
> randomly fails. The request is a big one, 1056K, and the application
> (nvme-cli) nicely aligns the buffer. This means the kernel will attempt
> to map the pages in for zero-copy DMA, but we ultimately fail to add
> enough pages to the request to satisfy the size requirement, because we
> run out of segments supported by the hardware.
> 
> If we fail a user map that attempts to map pages in directly, retry with
> copy == true set.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

I did actually add comments to this, but apparently sent out the version
without them... Current one below; the only difference is the addition
of the comment explaining why we retry.


diff --git a/block/blk-map.c b/block/blk-map.c
index 3a62e471d81b..4ca7c5db7d78 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -137,6 +137,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
+retry:
 	i = *iter;
 	do {
 		ret =__blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
@@ -154,6 +155,18 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	__blk_rq_unmap_user(bio);
 fail:
 	rq->bio = NULL;
+	/*
+	 * If we fail and we're in direct map mode, retry once with copying
+	 * enabled instead. This can fix the case where userspace wants to
+	 * do a large IO, but fails because the application memory is
+	 * fragmented and we end up exceeding the number of segments for the
+	 * request. With a copy == true retry, we aren't at the mercy of
+	 * application memory layout.
+	 */
+	if (ret && !copy) {
+		copy = true;
+		goto retry;
+	}
 	return ret;
 }
 EXPORT_SYMBOL(blk_rq_map_user_iov);
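
To put rough numbers on the comment above: if the application memory is fully
fragmented, every 4K page becomes its own segment, so a 1056K request needs more
segments than typical hardware exposes. A standalone back-of-the-envelope sketch
(the segment limit value is illustrative, not taken from this patch):

#include <stdio.h>

int main(void)
{
	unsigned long req_bytes = 1056UL * 1024;	/* request size from the report */
	unsigned long page_size = 4096;			/* assuming 4K pages */
	unsigned long max_segments = 127;		/* illustrative hardware segment limit */

	/* worst case: no two user pages are physically contiguous, one segment per page */
	unsigned long needed = (req_bytes + page_size - 1) / page_size;

	printf("segments needed (worst case): %lu, hardware limit: %lu\n",
	       needed, max_segments);
	/*
	 * 264 > 127: the direct map fails. With the retry, copy == true bounces
	 * the data through kernel pages that can be laid out to fit the limits.
	 */
	return 0;
}
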
Jens Axboe Nov. 1, 2019, 1:35 a.m. UTC | #2
On 10/31/19 1:30 PM, Jens Axboe wrote:
> On 10/31/19 1:26 PM, Jens Axboe wrote:
>> We ran into an issue in production where reading an NVMe drive log
>> randomly fails. The request is a big one, 1056K, and the application
>> (nvme-cli) nicely aligns the buffer. This means the kernel will attempt
>> to map the pages in for zero-copy DMA, but we ultimately fail to add
>> enough pages to the request to satisfy the size requirement, because we
>> run out of segments supported by the hardware.
>>
>> If we fail a user map that attempts to map pages in directly, retry with
>> copy == true set.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> I did actually add comments to this, but apparently sent out the version
> without them... Current one below; the only difference is the addition
> of the comment explaining why we retry.

Disregard this - I had a patch to make bio_copy_user_iov() handle strings
of bios, which this change would also need; it's not enough by itself.

Patch

diff --git a/block/blk-map.c b/block/blk-map.c
index db9373bd31ac..2dc5286baec6 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -131,6 +131,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
+retry:
 	i = *iter;
 	do {
 		ret =__blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
@@ -148,6 +149,10 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	__blk_rq_unmap_user(bio);
 fail:
 	rq->bio = NULL;
+	if (ret && !copy) {
+		copy = true;
+		goto retry;
+	}
 	return ret;
 }
 EXPORT_SYMBOL(blk_rq_map_user_iov);
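
For context on where this helper sits: drivers typically reach it through
blk_rq_map_user(), which wraps blk_rq_map_user_iov(). The sketch below shows
that caller pattern, loosely modeled on an NVMe-style passthrough submission
and assuming the v5.4-era block API; names and error handling are simplified
and it is not taken from this patch.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/err.h>

/* Illustrative caller: map a user buffer, execute the request, unmap. */
static int submit_user_buffer(struct request_queue *q, void __user *ubuf,
			      unsigned long len)
{
	struct request *rq;
	struct bio *bio;
	int ret;

	rq = blk_get_request(q, REQ_OP_DRV_IN, 0);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* this is where blk_rq_map_user_iov() can now fall back to copy == true */
	ret = blk_rq_map_user(q, rq, NULL, ubuf, len, GFP_KERNEL);
	if (ret)
		goto out;
	bio = rq->bio;

	blk_execute_rq(q, NULL, rq, 0);

	ret = blk_rq_unmap_user(bio);
out:
	blk_put_request(rq);
	return ret;
}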