Message ID | c8718f10-0442-de4f-edfc-ecc6fe28b768@kernel.dk (mailing list archive)
---|---
State | New, archived
Series | block: retry blk_rq_map_user_iov() on failure
On 10/31/19 1:26 PM, Jens Axboe wrote:
> We ran into an issue in production where reading an NVMe drive log
> randomly fails. The request is a big one, 1056K, and the application
> (nvme-cli) nicely aligns the buffer. This means the kernel will attempt
> to map the pages in for zero-copy DMA, but we ultimately fail adding
> enough pages to the request to satisfy the size requirement as we ran
> out of segments supported by the hardware.
>
> If we fail a user map that attempts to map pages in directly, retry with
> copy == true set.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

I did actually add comments to this, but apparently sent out the version
without comments... Current one below, only difference is the addition
of the comment explaining why we retry.

diff --git a/block/blk-map.c b/block/blk-map.c
index 3a62e471d81b..4ca7c5db7d78 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -137,6 +137,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
+retry:
 	i = *iter;
 	do {
 		ret = __blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
@@ -154,6 +155,18 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	__blk_rq_unmap_user(bio);
 fail:
 	rq->bio = NULL;
+	/*
+	 * If we fail and we're in direct map mode, retry once with copying
+	 * enabled instead. This can fix the case where userspace wants to
+	 * do a large IO, but fails because the application memory is
+	 * fragmented and we end up exceeding the number of segments for the
+	 * request. With a copy == true retry, we aren't at the mercy of
+	 * application memory layout.
+	 */
+	if (ret && !copy) {
+		copy = true;
+		goto retry;
+	}
 	return ret;
 }
 EXPORT_SYMBOL(blk_rq_map_user_iov);
On 10/31/19 1:30 PM, Jens Axboe wrote:
> On 10/31/19 1:26 PM, Jens Axboe wrote:
>> We ran into an issue in production where reading an NVMe drive log
>> randomly fails. The request is a big one, 1056K, and the application
>> (nvme-cli) nicely aligns the buffer. This means the kernel will attempt
>> to map the pages in for zero-copy DMA, but we ultimately fail adding
>> enough pages to the request to satisfy the size requirement as we ran
>> out of segments supported by the hardware.
>>
>> If we fail a user map that attempts to map pages in directly, retry with
>> copy == true set.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> I did actually add comments to this, but apparently sent out the version
> without comments... Current one below, only difference is the addition
> of the comment explaining why we retry.

Disregard this - I had a patch to make bio_copy_user_iov() handle
strings of bios that this would also need, it's not enough by itself.
diff --git a/block/blk-map.c b/block/blk-map.c
index db9373bd31ac..2dc5286baec6 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -131,6 +131,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	else if (queue_virt_boundary(q))
 		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
+retry:
 	i = *iter;
 	do {
 		ret = __blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
@@ -148,6 +149,10 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 	__blk_rq_unmap_user(bio);
 fail:
 	rq->bio = NULL;
+	if (ret && !copy) {
+		copy = true;
+		goto retry;
+	}
 	return ret;
 }
 EXPORT_SYMBOL(blk_rq_map_user_iov);
We ran into an issue in production where reading an NVMe drive log
randomly fails. The request is a big one, 1056K, and the application
(nvme-cli) nicely aligns the buffer. This means the kernel will attempt
to map the pages in for zero-copy DMA, but we ultimately fail adding
enough pages to the request to satisfy the size requirement as we ran
out of segments supported by the hardware.

If we fail a user map that attempts to map pages in directly, retry with
copy == true set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---