Message ID | 20241112170050.1612998-3-hch@lst.de (mailing list archive)
---|---
State | New
Series | [1/2] block: remove the write_hint field from struct request
On 12/11/2024 17:00, Christoph Hellwig wrote:
> 		if (rq->bio->bi_write_hint != bio->bi_write_hint)
> 			return false;
> +		if (rq->bio->bi_ioprio != bio->bi_ioprio)

At this point bio_prio() looks to only be used in md code. Is it worth
deleting that wrapper? I doubt its value.

And at many points bio->bi_ioprio is written without using bio_set_prio(),
so I doubt the use of that one as well..

I do realize that all this is outside the scope of this series.

> +			return false;
> 	}
>
> -	if (rq->ioprio != bio_prio(bio))
> -		return false;
> -
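For readers skimming the diff context above: the semantics under discussion can be modeled in isolation. A request inherits its priority from its first attached bio, so the merge gate compares the two bio fields directly and refuses to merge on mismatch. A minimal standalone sketch, using hypothetical `model_*` types rather than the kernel structs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical miniature of the bio/request relationship: only the
 * fields compared by the merge check discussed in this thread.
 */
struct model_bio {
	unsigned short bi_ioprio;
	unsigned char  bi_write_hint;
};

struct model_rq {
	struct model_bio *bio; /* first attached bio; NULL for bio-less requests */
};

static bool model_rq_merge_ok(const struct model_rq *rq,
			      const struct model_bio *bio)
{
	if (rq->bio) {
		/* Don't merge requests with different write hints. */
		if (rq->bio->bi_write_hint != bio->bi_write_hint)
			return false;
		/* Don't merge requests with different I/O priorities. */
		if (rq->bio->bi_ioprio != bio->bi_ioprio)
			return false;
	}
	return true;
}
```

The point of moving the comparison inside the `if (rq->bio)` block is that a request without a bio has no priority of its own to compare.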
On 11/12/24 9:00 AM, Christoph Hellwig wrote:
> The request ioprio is only initialized from the first attached bio,
> so requests without a bio already never set it. Directly use the
> bio field instead.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On Tue, Nov 12, 2024 at 06:22:24PM +0000, John Garry wrote:
> And at many points bio->bi_ioprio is written without using bio_set_prio(),
> so I doubt the use of that one as well..
>
> I do realize that all this is outside the scope of this series.

Agreed to all of the above. If anyone is in for an easy cleanup, go
for it!
Hi Christoph,

This patch causes a regression on E850-96 board. Specifically, there are
two noticeable time lags when booting Debian rootfs:

1. When systemd reports this stage:

       "Reached target getty.target - Login Prompts."

   it hangs for 5 seconds or so, before going further.

2. When I'm logging in, the system hangs for 60 seconds or so before the
   shell prompt appears.

It only seems to reproduce the first time (during the boot). My attempt
to re-start the mentioned systemd target or run "login" command again
worked fine.

Reverting commit 6975c1a486a4 ("block: remove the ioprio field from
struct request") fixes the issue for me.

Do you have any ideas by chance what might be the reason? Or maybe you
have any pointers on debugging it?

Thanks!
On Thu, Nov 21, 2024 at 11:04:19PM -0600, Sam Protsenko wrote:
> Hi Christoph,
>
> This patch causes a regression on E850-96 board. Specifically, there are
> two noticeable time lags when booting Debian rootfs:

What storage driver does this board use? Anything else interesting
about the setup?
On Fri, Nov 22, 2024 at 6:04 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Nov 21, 2024 at 11:04:19PM -0600, Sam Protsenko wrote:
> > Hi Christoph,
> >
> > This patch causes a regression on E850-96 board. Specifically, there are
> > two noticeable time lags when booting Debian rootfs:
>
> What storage driver does this board use? Anything else interesting
> about the setup?
>

It's an Exynos based board with eMMC, so it uses DW MMC driver, with
Exynos glue layer on top of it, so:

    drivers/mmc/host/dw_mmc.c
    drivers/mmc/host/dw_mmc-exynos.c

I'm using the regular ARM64 defconfig. Nothing fancy about this setup
neither, the device tree with eMMC definition (mmc_0) is here:

    arch/arm64/boot/dts/exynos/exynos850-e850-96.dts

FWIW, I was able to narrow down the issue to dd_insert_request()
function. With this hack the freeze is gone:

8<-------------------------------------------------------------------->8
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index acdc28756d9d..83d272b66e71 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const enum dd_data_dir data_dir = rq_data_dir(rq);
-	u16 ioprio = req_get_ioprio(rq);
+	u16 ioprio = 0; /* the same as old req->ioprio */
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
8<-------------------------------------------------------------------->8

Does it tell you anything about where the possible issue can be?

Thanks!
On 11/22/24 1:55 PM, Sam Protsenko wrote:
> On Fri, Nov 22, 2024 at 6:04 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Thu, Nov 21, 2024 at 11:04:19PM -0600, Sam Protsenko wrote:
>>> Hi Christoph,
>>>
>>> This patch causes a regression on E850-96 board. Specifically, there are
>>> two noticeable time lags when booting Debian rootfs:
>>
>> What storage driver does this board use? Anything else interesting
>> about the setup?
>>
>
> It's an Exynos based board with eMMC, so it uses DW MMC driver, with
> Exynos glue layer on top of it, so:
>
>     drivers/mmc/host/dw_mmc.c
>     drivers/mmc/host/dw_mmc-exynos.c
>
> I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> neither, the device tree with eMMC definition (mmc_0) is here:
>
>     arch/arm64/boot/dts/exynos/exynos850-e850-96.dts
>
> FWIW, I was able to narrow down the issue to dd_insert_request()
> function. With this hack the freeze is gone:
>
> 8<-------------------------------------------------------------------->8
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index acdc28756d9d..83d272b66e71 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	struct request_queue *q = hctx->queue;
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	const enum dd_data_dir data_dir = rq_data_dir(rq);
> -	u16 ioprio = req_get_ioprio(rq);
> +	u16 ioprio = 0; /* the same as old req->ioprio */
>  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
> 8<-------------------------------------------------------------------->8
>
> Does it tell you anything about where the possible issue can be?

It seems like eMMC devices do not tolerate I/O prioritization. How about
disabling I/O prioritization for eMMC setups? Is the ioprio cgroup
controller perhaps activated by the user space software that is running
on this setup?

Thanks,

Bart.
On Fri, Nov 22, 2024 at 03:55:23PM -0600, Sam Protsenko wrote:
> It's an Exynos based board with eMMC, so it uses DW MMC driver, with
> Exynos glue layer on top of it, so:
>
>     drivers/mmc/host/dw_mmc.c
>     drivers/mmc/host/dw_mmc-exynos.c
>
> I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> neither, the device tree with eMMC definition (mmc_0) is here:
>
>     arch/arm64/boot/dts/exynos/exynos850-e850-96.dts

Thanks. eMMC itself never looks at the ioprio field.

> FWIW, I was able to narrow down the issue to dd_insert_request()
> function. With this hack the freeze is gone:

Sounds like it isn't the driver that matters here, but the scheduler.

> 8<-------------------------------------------------------------------->8
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index acdc28756d9d..83d272b66e71 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  	struct request_queue *q = hctx->queue;
>  	struct deadline_data *dd = q->elevator->elevator_data;
>  	const enum dd_data_dir data_dir = rq_data_dir(rq);
> -	u16 ioprio = req_get_ioprio(rq);
> +	u16 ioprio = 0; /* the same as old req->ioprio */
>  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
>  	struct dd_per_prio *per_prio;
>  	enum dd_prio prio;
> 8<-------------------------------------------------------------------->8
>
> Does it tell you anything about where the possible issue can be?

Can you dump the I/O priorities you see here with and without the
reverted patch?
Hi Bart,

On Fri, Nov 22, 2024 at 4:18 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 11/22/24 1:55 PM, Sam Protsenko wrote:
> > On Fri, Nov 22, 2024 at 6:04 AM Christoph Hellwig <hch@lst.de> wrote:
> >>
> >> On Thu, Nov 21, 2024 at 11:04:19PM -0600, Sam Protsenko wrote:
> >>> Hi Christoph,
> >>>
> >>> This patch causes a regression on E850-96 board. Specifically, there are
> >>> two noticeable time lags when booting Debian rootfs:
> >>
> >> What storage driver does this board use? Anything else interesting
> >> about the setup?
> >>
> >
> > It's an Exynos based board with eMMC, so it uses DW MMC driver, with
> > Exynos glue layer on top of it, so:
> >
> >     drivers/mmc/host/dw_mmc.c
> >     drivers/mmc/host/dw_mmc-exynos.c
> >
> > I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> > neither, the device tree with eMMC definition (mmc_0) is here:
> >
> >     arch/arm64/boot/dts/exynos/exynos850-e850-96.dts
> >
> > FWIW, I was able to narrow down the issue to dd_insert_request()
> > function. With this hack the freeze is gone:
> >
> > 8<-------------------------------------------------------------------->8
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index acdc28756d9d..83d272b66e71 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >  	struct request_queue *q = hctx->queue;
> >  	struct deadline_data *dd = q->elevator->elevator_data;
> >  	const enum dd_data_dir data_dir = rq_data_dir(rq);
> > -	u16 ioprio = req_get_ioprio(rq);
> > +	u16 ioprio = 0; /* the same as old req->ioprio */
> >  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >  	struct dd_per_prio *per_prio;
> >  	enum dd_prio prio;
> > 8<-------------------------------------------------------------------->8
> >
> > Does it tell you anything about where the possible issue can be?
>
> It seems like eMMC devices do not tolerate I/O prioritization. How about
> disabling I/O prioritization for eMMC setups? Is the ioprio cgroup
> controller perhaps activated by the user space software that is running
> on this setup?
>

Can you please elaborate on why eMMC devices might not play well with
prios? Do they lack some particular hardware support, like required
commands? Also, I've noticed (probably) the same issue reported recently
[1] for USB SSD drives, so it'd be nice to have some references showing
it's actually the case specifically for eMMC.

Do you know if we have any config options to disable I/O prioritization?
Not sure how exactly we can do that specifically for eMMC devices too.
I'm not an expert in block layer, would appreciate some help with this
one.

For ioprio cgroup: CONFIG_BLK_CGROUP_IOPRIO option is disabled in my
case (ARM64 defconfig), so I don't think it has to do with ioprio
cgroup?

[1] https://lore.kernel.org/all/CAP-bSRbCo7=wfUBZ8H7c3Q-7XSG+SB=R4MHHNNGPvBoinsVSZg@mail.gmail.com/

Thanks!

> Thanks,
>
> Bart.
Hi Christoph,

On Mon, Nov 25, 2024 at 1:37 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Nov 22, 2024 at 03:55:23PM -0600, Sam Protsenko wrote:
> > It's an Exynos based board with eMMC, so it uses DW MMC driver, with
> > Exynos glue layer on top of it, so:
> >
> >     drivers/mmc/host/dw_mmc.c
> >     drivers/mmc/host/dw_mmc-exynos.c
> >
> > I'm using the regular ARM64 defconfig. Nothing fancy about this setup
> > neither, the device tree with eMMC definition (mmc_0) is here:
> >
> >     arch/arm64/boot/dts/exynos/exynos850-e850-96.dts
>
> Thanks. eMMC itself never looks at the ioprio field.
>
> > FWIW, I was able to narrow down the issue to dd_insert_request()
> > function. With this hack the freeze is gone:
>
> Sounds like it isn't the driver that matters here, but the scheduler.
>
> > 8<-------------------------------------------------------------------->8
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index acdc28756d9d..83d272b66e71 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -676,7 +676,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >  	struct request_queue *q = hctx->queue;
> >  	struct deadline_data *dd = q->elevator->elevator_data;
> >  	const enum dd_data_dir data_dir = rq_data_dir(rq);
> > -	u16 ioprio = req_get_ioprio(rq);
> > +	u16 ioprio = 0; /* the same as old req->ioprio */
> >  	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
> >  	struct dd_per_prio *per_prio;
> >  	enum dd_prio prio;
> > 8<-------------------------------------------------------------------->8
> >
> > Does it tell you anything about where the possible issue can be?
>
> Can you dump the I/O priorities you see here with and without the
> reverted patch?

Collected the logs for you:

  - with patch reverted (ioprio is always 0): [1]
  - with patch present: [2]

This code was added for printing the traces:

8<---------------------------------------------------------------------->8
 static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			      blk_insert_t flags, struct list_head *free)
 {
+#define IOPRIO_N_LIMIT 100
+	static int ioprio_prev = 0, ioprio_n = 0;
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const enum dd_data_dir data_dir = rq_data_dir(rq);
 	u16 ioprio = req_get_ioprio(rq);
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;

+	ioprio_n++;
+	if (ioprio != ioprio_prev || ioprio_n == IOPRIO_N_LIMIT) {
+		pr_err("### %-20d : %d times\n", ioprio_prev, ioprio_n);
+		ioprio_n = 0;
+	}
+	ioprio_prev = ioprio;
+
 	lockdep_assert_held(&dd->lock);
8<---------------------------------------------------------------------->8

Specifically I'd pay attention to the next two places in [2], where the
delays were introduced:

1. Starting getty service (5 second delay):

8<---------------------------------------------------------------------->8
[   14.875199] ### 24580                : 1 times
...
[  OK  ] Started getty@tty1.service - Getty on tty1.
[  OK  ] Started serial-getty@ttySA…ice - Serial Getty on ttySAC0.
[  OK  ] Reached target getty.target - Login Prompts.
[   19.425354] ### 0                    : 100 times
...
8<---------------------------------------------------------------------->8

2. Login (more than 60 seconds delay):

8<---------------------------------------------------------------------->8
runner-vwmj3eza-project-40964107-concurrent-0 login: root
...
[   22.827432] ### 0                    : 100 times
...
[  100.100402] ### 24580                : 1 times
#
8<---------------------------------------------------------------------->8

I guess the results look similar to the logs from the neighboring
thread [3].

Not sure if those tools (getty service and login tool) are running
ioprio_set() internally, or it's just the I/O scheduler acting up, but
the freeze is happening consistently in those two places.

Please let me know if I can help you debug that further. Not a block
layer expert by any means, but that looks like a regression, at least
on E850-96 board. Wonder if it's possible to reproduce that on other
platforms.

Thanks!

[1] https://termbin.com/aus7
[2] https://termbin.com/za3t
[3] https://lore.kernel.org/all/CAP-bSRab1C-_aaATfrgWjt9w0fcYUCQCG7u+TCb1FSPSd6CEaA@mail.gmail.com/
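For reference, the `24580` seen in the traces decodes under the kernel's ioprio packing (class in the top three bits of the 16-bit value, `IOPRIO_CLASS_SHIFT` being 13) to class 3, i.e. IOPRIO_CLASS_IDLE, with data value 4. A small sketch of the unpacking, using `MODEL_*` names that mirror (but are not) the definitions in include/uapi/linux/ioprio.h:

```c
#include <assert.h>

/* Assumed packing, mirroring include/uapi/linux/ioprio.h:
 * the top three bits of the 16-bit value hold the class, the rest the data.
 */
#define MODEL_IOPRIO_CLASS_SHIFT 13
#define MODEL_IOPRIO_CLASS_MASK  0x07
#define MODEL_IOPRIO_PRIO_MASK   ((1 << MODEL_IOPRIO_CLASS_SHIFT) - 1)

enum model_ioprio_class {
	MODEL_IOPRIO_CLASS_NONE = 0,
	MODEL_IOPRIO_CLASS_RT   = 1,
	MODEL_IOPRIO_CLASS_BE   = 2,
	MODEL_IOPRIO_CLASS_IDLE = 3,
};

/* Extract the class bits from a packed ioprio value. */
static unsigned int model_ioprio_class(unsigned short ioprio)
{
	return (ioprio >> MODEL_IOPRIO_CLASS_SHIFT) & MODEL_IOPRIO_CLASS_MASK;
}

/* Extract the data (level/hint) bits from a packed ioprio value. */
static unsigned int model_ioprio_data(unsigned short ioprio)
{
	return ioprio & MODEL_IOPRIO_PRIO_MASK;
}
```

So the requests that later show up as priority 0 in the traces were inserted as IDLE-class requests, which is consistent with the stall being tied to per-priority handling in the scheduler.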
Hi Sam,

please try the patch below:

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index acdc28756d9d..91b3789f710e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -685,10 +685,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	per_prio = &dd->per_prio[prio];
-	if (!rq->elv.priv[0]) {
+	if (!rq->elv.priv[0])
 		per_prio->stats.inserted++;
-		rq->elv.priv[0] = (void *)(uintptr_t)1;
-	}
+	rq->elv.priv[0] = per_prio;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, free))
 		return;
@@ -753,18 +752,14 @@ static void dd_prepare_request(struct request *rq)
  */
 static void dd_finish_request(struct request *rq)
 {
-	struct request_queue *q = rq->q;
-	struct deadline_data *dd = q->elevator->elevator_data;
-	const u8 ioprio_class = dd_rq_ioclass(rq);
-	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
-	struct dd_per_prio *per_prio = &dd->per_prio[prio];
+	struct dd_per_prio *per_prio = rq->elv.priv[0];
 
 	/*
 	 * The block layer core may call dd_finish_request() without having
 	 * called dd_insert_requests(). Skip requests that bypassed I/O
 	 * scheduling. See also blk_mq_request_bypass_insert().
 	 */
-	if (rq->elv.priv[0])
+	if (per_prio)
 		atomic_inc(&per_prio->stats.completed);
 }
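A plausible reading of why this patch helps: mq-deadline's per-priority statistics must balance. `inserted` is bumped at insert time, while the request still has its bio and `req_get_ioprio()` returns the real priority; `dd_finish_request()` used to recompute the priority at completion time, after the bios are gone, when `req_get_ioprio()` falls back to 0. The completion would then be credited to a different per-priority bucket than the insertion, skewing the scheduler's queued accounting. The patch instead caches the bucket pointer in `rq->elv.priv[0]`. A rough standalone model of the mismatch and of the fix, with hypothetical `model_*` names rather than the kernel types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of mq-deadline's per-priority accounting. */
enum model_prio { MODEL_RT, MODEL_BE, MODEL_IDLE, MODEL_PRIO_COUNT };

struct model_stats { int inserted; int completed; };

struct model_rq {
	unsigned short bio_ioprio;    /* priority carried by the first bio */
	int has_bio;                  /* cleared once all bios complete */
	struct model_stats *elv_priv; /* models rq->elv.priv[0] */
};

/* Priority comes from the bio; without one it falls back to 0. */
static unsigned short model_req_get_ioprio(const struct model_rq *rq)
{
	return rq->has_bio ? rq->bio_ioprio : 0;
}

static enum model_prio model_class_to_prio(unsigned short ioprio)
{
	switch (ioprio >> 13) {        /* class lives in the top 3 bits */
	case 1:  return MODEL_RT;      /* IOPRIO_CLASS_RT */
	case 3:  return MODEL_IDLE;    /* IOPRIO_CLASS_IDLE */
	default: return MODEL_BE;      /* NONE and BE share a bucket */
	}
}

/* Insert: the bio is still attached, so the right bucket is found.
 * The fix caches that bucket in elv_priv for later. */
static void model_insert(struct model_stats stats[MODEL_PRIO_COUNT],
			 struct model_rq *rq)
{
	struct model_stats *pp =
		&stats[model_class_to_prio(model_req_get_ioprio(rq))];
	pp->inserted++;
	rq->elv_priv = pp;
}

/* Buggy completion: recomputes the bucket after the bio is gone,
 * so everything lands in the priority-0 bucket. */
static void model_finish_recompute(struct model_stats stats[MODEL_PRIO_COUNT],
				   struct model_rq *rq)
{
	stats[model_class_to_prio(model_req_get_ioprio(rq))].completed++;
}

/* Fixed completion: credits the bucket cached at insert time. */
static void model_finish_cached(struct model_rq *rq)
{
	if (rq->elv_priv)
		rq->elv_priv->completed++;
}
```

With an IDLE-class value like the 24580 from the logs, the recompute path leaves `inserted != completed` in the IDLE bucket forever, while the cached-pointer path keeps the counters balanced regardless of when the bios are freed.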
On Tue, Nov 26, 2024 at 12:52 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Sam,
>
> please try the patch below:
>

I can confirm the patch fixes the issue on my side. Please add me to Cc:
if you're going to send the fix. And I'm always available to run more
tests on E850-96 board if you need it.

Thanks!

> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index acdc28756d9d..91b3789f710e 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -685,10 +685,9 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	per_prio = &dd->per_prio[prio];
> -	if (!rq->elv.priv[0]) {
> +	if (!rq->elv.priv[0])
>  		per_prio->stats.inserted++;
> -		rq->elv.priv[0] = (void *)(uintptr_t)1;
> -	}
> +	rq->elv.priv[0] = per_prio;
>
>  	if (blk_mq_sched_try_insert_merge(q, rq, free))
>  		return;
> @@ -753,18 +752,14 @@ static void dd_prepare_request(struct request *rq)
>   */
>  static void dd_finish_request(struct request *rq)
>  {
> -	struct request_queue *q = rq->q;
> -	struct deadline_data *dd = q->elevator->elevator_data;
> -	const u8 ioprio_class = dd_rq_ioclass(rq);
> -	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
> -	struct dd_per_prio *per_prio = &dd->per_prio[prio];
> +	struct dd_per_prio *per_prio = rq->elv.priv[0];
>
>  	/*
>  	 * The block layer core may call dd_finish_request() without having
>  	 * called dd_insert_requests(). Skip requests that bypassed I/O
>  	 * scheduling. See also blk_mq_request_bypass_insert().
>  	 */
> -	if (rq->elv.priv[0])
> +	if (per_prio)
>  		atomic_inc(&per_prio->stats.completed);
>  }
>
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2306014c108d..df36f83f3738 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -871,11 +871,10 @@ static struct request *attempt_merge(struct request_queue *q,
 		/* Don't merge requests with different write hints. */
 		if (req->bio->bi_write_hint != next->bio->bi_write_hint)
 			return NULL;
+		if (req->bio->bi_ioprio != next->bio->bi_ioprio)
+			return NULL;
 	}
 
-	if (req->ioprio != next->ioprio)
-		return NULL;
-
 	if (!blk_atomic_write_mergeable_rqs(req, next))
 		return NULL;
 
@@ -1007,11 +1006,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 		/* Don't merge requests with different write hints. */
 		if (rq->bio->bi_write_hint != bio->bi_write_hint)
 			return false;
+		if (rq->bio->bi_ioprio != bio->bi_ioprio)
+			return false;
 	}
 
-	if (rq->ioprio != bio_prio(bio))
-		return false;
-
 	if (blk_atomic_write_mergeable_rq_bio(rq, bio) == false)
 		return false;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 65e6b86d341c..3c6cadba75e3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -842,7 +842,7 @@ static void blk_print_req_error(struct request *req, blk_status_t status)
 		blk_op_str(req_op(req)),
 		(__force u32)(req->cmd_flags & ~REQ_OP_MASK),
 		req->nr_phys_segments,
-		IOPRIO_PRIO_CLASS(req->ioprio));
+		IOPRIO_PRIO_CLASS(req_get_ioprio(req)));
 }
 
 /*
@@ -3306,7 +3306,6 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 		rq->special_vec = rq_src->special_vec;
 	}
 	rq->nr_phys_segments = rq_src->nr_phys_segments;
-	rq->ioprio = rq_src->ioprio;
 
 	if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
 		goto free_and_out;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2804fe181d9d..a28264442948 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -156,8 +156,6 @@ struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
-	unsigned short ioprio;
-
 	enum mq_rq_state state;
 	atomic_t ref;
 
@@ -221,7 +219,9 @@ static inline bool blk_rq_is_passthrough(struct request *rq)
 
 static inline unsigned short req_get_ioprio(struct request *req)
 {
-	return req->ioprio;
+	if (req->bio)
+		return req->bio->bi_ioprio;
+	return 0;
 }
 
 #define rq_data_dir(rq)		(op_is_write(req_op(rq)) ? WRITE : READ)
 
@@ -984,7 +984,6 @@ static inline void blk_rq_bio_prep(struct request *rq, struct bio *bio,
 	rq->nr_phys_segments = nr_segs;
 	rq->__data_len = bio->bi_iter.bi_size;
 	rq->bio = rq->biotail = bio;
-	rq->ioprio = bio_prio(bio);
 }
 
 void blk_mq_hctx_set_fq_lock_class(struct blk_mq_hw_ctx *hctx,
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 1527d5d45e01..bd0ea07338eb 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -99,7 +99,7 @@ TRACE_EVENT(block_rq_requeue,
 		__entry->dev	   = rq->q->disk ? disk_devt(rq->q->disk) : 0;
 		__entry->sector    = blk_rq_trace_sector(rq);
 		__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
-		__entry->ioprio    = rq->ioprio;
+		__entry->ioprio    = req_get_ioprio(rq);
 
 		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
 		__get_str(cmd)[0] = '\0';
@@ -136,7 +136,7 @@ DECLARE_EVENT_CLASS(block_rq_completion,
 		__entry->sector    = blk_rq_pos(rq);
 		__entry->nr_sector = nr_bytes >> 9;
 		__entry->error     = blk_status_to_errno(error);
-		__entry->ioprio    = rq->ioprio;
+		__entry->ioprio    = req_get_ioprio(rq);
 
 		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
 		__get_str(cmd)[0] = '\0';
@@ -209,7 +209,7 @@ DECLARE_EVENT_CLASS(block_rq,
 		__entry->sector    = blk_rq_trace_sector(rq);
 		__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
 		__entry->bytes     = blk_rq_bytes(rq);
-		__entry->ioprio    = rq->ioprio;
+		__entry->ioprio    = req_get_ioprio(rq);
 
 		blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
 		__get_str(cmd)[0] = '\0';
The request ioprio is only initialized from the first attached bio,
so requests without a bio already never set it. Directly use the
bio field instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c            | 10 ++++------
 block/blk-mq.c               |  3 +--
 include/linux/blk-mq.h       |  7 +++----
 include/trace/events/block.h |  6 +++---
 4 files changed, 11 insertions(+), 15 deletions(-)