Message ID | 1638356341-17014-1-git-send-email-huangzhaoyang@gmail.com
---|---
State | New
Series | [RFC] mm: count zram read/write into PSI_IO_WAIT
There is no chance for zram reading/writing to be counted in PSI_IO_WAIT
so far, as zram deals with the request directly in the current context
without invoking submit_bio and io_schedule.

On Wed, Dec 1, 2021 at 6:59 PM Huangzhaoyang <huangzhaoyang@gmail.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Have zram reading/writing be counted in PSI_IO_WAIT.
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  drivers/block/zram/zram_drv.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index fcaf275..b0e4766 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -34,6 +34,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/cpuhotplug.h>
>  #include <linux/part_stat.h>
> +#include <linux/psi.h>
>
>  #include "zram_drv.h"
>
> @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
>  					zram_get_element(zram, index),
>  					bio, partial_io);
>  	}
> -
> +#ifdef CONFIG_PSI
> +	psi_task_change(current, 0, TSK_IOWAIT);
> +#endif
>  	handle = zram_get_handle(zram, index);
>  	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
>  		unsigned long value;
> @@ -1257,6 +1260,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
>  		zram_fill_page(mem, PAGE_SIZE, value);
>  		kunmap_atomic(mem);
>  		zram_slot_unlock(zram, index);
> +#ifdef CONFIG_PSI
> +		psi_task_change(current, TSK_IOWAIT, 0);
> +#endif
>  		return 0;
>  	}
>
> @@ -1284,6 +1290,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
>  	if (WARN_ON(ret))
>  		pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
>
> +#ifdef CONFIG_PSI
> +	psi_task_change(current, TSK_IOWAIT, 0);
> +#endif
>  	return ret;
>  }
>
> @@ -1471,7 +1480,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
>  		vec.bv_offset = 0;
>  	}
>
> +#ifdef CONFIG_PSI
> +	psi_task_change(current, 0, TSK_IOWAIT);
> +#endif
>  	ret = __zram_bvec_write(zram, &vec, index, bio);
> +#ifdef CONFIG_PSI
> +	psi_task_change(current, TSK_IOWAIT, 0);
> +#endif
>  out:
>  	if (is_partial_io(bvec))
>  		__free_page(page);
> @@ -1607,7 +1622,6 @@ static blk_qc_t zram_submit_bio(struct bio *bio)
>  		atomic64_inc(&zram->stats.invalid_io);
>  		goto error;
>  	}
> -
>  	__zram_make_request(zram, bio);
>  	return BLK_QC_T_NONE;
>
> --
> 1.9.1
>
On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> There is no chance for zram reading/writing to be counted in
> PSI_IO_WAIT so far, as zram deals with the request directly in the
> current context without invoking submit_bio and io_schedule.

Hm, but you're also not waiting for a real io device - during which
the CPU could be doing something else. You're waiting for
decompression. The thread also isn't in D-state during that time. What
scenario would benefit from this accounting? How is IO pressure from
comp/decomp paths actionable to you?

What about when you use zram with disk writeback enabled, and you see
a mix of decompression and actual disk IO? Wouldn't you want to be
able to tell the two apart, to see if you're short on CPU or short on
IO bandwidth in this setup? Your patch would make that impossible.

This needs a much more comprehensive changelog.

> > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> >  					zram_get_element(zram, index),
> >  					bio, partial_io);
> >  	}
> > -
> > +#ifdef CONFIG_PSI
> > +	psi_task_change(current, 0, TSK_IOWAIT);
> > +#endif

Add psi_iostall_enter() and psi_iostall_leave() helpers that encapsulate
the ifdefs.
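A minimal sketch of what such helpers could look like, assuming the
psi_iostall_enter()/psi_iostall_leave() names from the comment above
(they are not an existing kernel API, and the placement is illustrative
only):

/*
 * Hypothetical helpers, e.g. in include/linux/psi.h.  Names come from
 * the review comment; this is a sketch, not existing kernel code.
 */
#ifdef CONFIG_PSI
static inline void psi_iostall_enter(void)
{
	/* mark current as waiting on IO so the time is charged to PSI_IO */
	psi_task_change(current, 0, TSK_IOWAIT);
}

static inline void psi_iostall_leave(void)
{
	/* clear the IO-wait state set by psi_iostall_enter() */
	psi_task_change(current, TSK_IOWAIT, 0);
}
#else
static inline void psi_iostall_enter(void) { }
static inline void psi_iostall_leave(void) { }
#endif

With CONFIG_PSI=n both helpers compile away, so callers such as
__zram_bvec_read() and zram_bvec_write() would not need any #ifdef
blocks of their own.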
On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > There is no chance for zram reading/writing to be counted in
> > PSI_IO_WAIT so far, as zram deals with the request directly in the
> > current context without invoking submit_bio and io_schedule.
>
> Hm, but you're also not waiting for a real io device - during which
> the CPU could be doing something else. You're waiting for
> decompression. The thread also isn't in D-state during that time. What
> scenario would benefit from this accounting? How is IO pressure from
> comp/decomp paths actionable to you?
No. Block-device-related D-state is already counted via
psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
non-productive time caused by huge numbers of in-context swap-ins and
swap-outs (zram-like). This can help make IO pressure more accurate and
consistent with the PSWPIN/PSWPOUT counters. It is like counting the IO
time within filemap_fault->wait_on_page_bit_common as a memory stall,
which raises memory pressure because of IO.
>
> What about when you use zram with disk writeback enabled, and you see
> a mix of decompression and actual disk IO? Wouldn't you want to be
> able to tell the two apart, to see if you're short on CPU or short on
> IO bandwidth in this setup? Your patch would make that impossible.
OK. Would it be better to start the IO counting from pageout()? Both
bdev-backed and ram-backed swap would benefit from it.
>
> This needs a much more comprehensive changelog.
>
> > > @@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
> > >  					zram_get_element(zram, index),
> > >  					bio, partial_io);
> > >  	}
> > > -
> > > +#ifdef CONFIG_PSI
> > > +	psi_task_change(current, 0, TSK_IOWAIT);
> > > +#endif
>
> Add psi_iostall_enter() and psi_iostall_leave() helpers that encapsulate
> the ifdefs.
OK.
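For illustration only, a rough sketch of where that alternative
annotation might sit, using the hypothetical psi_iostall_enter()/
psi_iostall_leave() helpers sketched earlier in the thread; pageout()
in mm/vmscan.c is heavily simplified here, and whether accounting
belongs at this level at all is exactly what is being debated:

/* Heavily simplified sketch of pageout() with the proposed annotation */
static pageout_t pageout(struct page *page, struct address_space *mapping)
{
	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_NONE,
		.nr_to_write	= SWAP_CLUSTER_MAX,
	};
	int res;

	/* ... mapping / dirty-page checks elided ... */

	SetPageReclaim(page);
	psi_iostall_enter();	/* would cover bdev- and ram-backed swap alike */
	res = mapping->a_ops->writepage(page, &wbc);
	psi_iostall_leave();
	if (res < 0)
		handle_write_error(mapping, page, res);

	/* ... AOP_WRITEPAGE_ACTIVATE and PG_reclaim handling elided ... */
	return PAGE_SUCCESS;
}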
On Fri, Dec 03, 2021 at 05:16:47PM +0800, Zhaoyang Huang wrote:
> On Fri, Dec 3, 2021 at 12:28 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Dec 01, 2021 at 07:12:30PM +0800, Zhaoyang Huang wrote:
> > > There is no chance for zram reading/writing to be counted in
> > > PSI_IO_WAIT so far, as zram deals with the request directly in the
> > > current context without invoking submit_bio and io_schedule.
> >
> > Hm, but you're also not waiting for a real io device - during which
> > the CPU could be doing something else. You're waiting for
> > decompression. The thread also isn't in D-state during that time. What
> > scenario would benefit from this accounting? How is IO pressure from
> > comp/decomp paths actionable to you?
> No. Block-device-related D-state is already counted via
> psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
> non-productive time caused by huge numbers of in-context swap-ins and
> swap-outs (zram-like). This can help make IO pressure more accurate and
> consistent with the PSWPIN/PSWPOUT counters. It is like counting the IO
> time within filemap_fault->wait_on_page_bit_common as a memory stall,
> which raises memory pressure because of IO.

It's not ignored; it shows up as memory pressure, because those delays
occur due to a lack of memory. On the other hand, having a faster IO
device would make no difference to the time spent on compression and
decompression. Counting this time as IO pressure makes no sense to me.

I'm against merging this patch.
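For context, the existing convention being referred to here: delays
caused by a lack of memory are bracketed with psi_memstall_enter()/
psi_memstall_leave() from <linux/psi.h>, so the stall time is charged
to memory pressure rather than IO pressure. The wrapper below is a
schematic illustration with a hypothetical function name, not actual
upstream code:

#include <linux/psi.h>
#include <linux/pagemap.h>

/* Hypothetical helper: charge a refault/swap-in wait to memory pressure */
static void wait_for_refaulting_page(struct page *page)
{
	unsigned long pflags;

	psi_memstall_enter(&pflags);	/* time from here counts as memstall */
	wait_on_page_locked(page);	/* e.g. waiting for the read IO to finish */
	psi_memstall_leave(&pflags);
}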
Zhaoyang Huang writes:
>No. Block-device-related D-state is already counted via
>psi_dequeue(io_wait). What I am proposing here is to NOT ignore the
>non-productive time caused by huge numbers of in-context swap-ins and
>swap-outs (zram-like). This can help make IO pressure more accurate and
>consistent with the PSWPIN/PSWPOUT counters. It is like counting the IO
>time within filemap_fault->wait_on_page_bit_common as a memory stall,
>which raises memory pressure because of IO.

I think part of the confusion here is that the name "io" doesn't really
just mean "io", it means "disk I/O". As in, we are targeting real,
physical or network disk I/O.

Of course, we can only do what's reasonable if the device we're
accounting for is layers upon layers eventually leading to a
memory-backed device, but _intentionally_ polluting that with more
memory-bound accesses doesn't make any sense when we already have
separate accounting for memory. Why would anyone want that?

I'm with Johannes here, I think this would actively make memory
pressure monitoring less useful. This is a NAK from my perspective as
someone who actually uses these things in production.
Chris Down writes:
>I'm with Johannes here, I think this would actively make memory
>pressure monitoring less useful. This is a NAK from my perspective as
>someone who actually uses these things in production.

s/memory/io/
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fcaf275..b0e4766 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -34,6 +34,7 @@
 #include <linux/debugfs.h>
 #include <linux/cpuhotplug.h>
 #include <linux/part_stat.h>
+#include <linux/psi.h>
 
 #include "zram_drv.h"
 
@@ -1246,7 +1247,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 					zram_get_element(zram, index),
 					bio, partial_io);
 	}
-
+#ifdef CONFIG_PSI
+	psi_task_change(current, 0, TSK_IOWAIT);
+#endif
 	handle = zram_get_handle(zram, index);
 	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
 		unsigned long value;
@@ -1257,6 +1260,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 		zram_fill_page(mem, PAGE_SIZE, value);
 		kunmap_atomic(mem);
 		zram_slot_unlock(zram, index);
+#ifdef CONFIG_PSI
+		psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 		return 0;
 	}
 
@@ -1284,6 +1290,9 @@ static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
 	if (WARN_ON(ret))
 		pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
 
+#ifdef CONFIG_PSI
+	psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 	return ret;
 }
 
@@ -1471,7 +1480,13 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 		vec.bv_offset = 0;
 	}
 
+#ifdef CONFIG_PSI
+	psi_task_change(current, 0, TSK_IOWAIT);
+#endif
 	ret = __zram_bvec_write(zram, &vec, index, bio);
+#ifdef CONFIG_PSI
+	psi_task_change(current, TSK_IOWAIT, 0);
+#endif
 out:
 	if (is_partial_io(bvec))
 		__free_page(page);
@@ -1607,7 +1622,6 @@ static blk_qc_t zram_submit_bio(struct bio *bio)
 		atomic64_inc(&zram->stats.invalid_io);
 		goto error;
 	}
-
 	__zram_make_request(zram, bio);
 	return BLK_QC_T_NONE;
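If the helper suggestion from the review were adopted, the write-side
hunk above could collapse to something like the following sketch
(psi_iostall_enter()/psi_iostall_leave() are the hypothetical helpers
discussed in the thread, not an existing kernel API; hunk offsets
omitted on purpose):

@@ zram_bvec_write @@
 		vec.bv_offset = 0;
 	}
 
+	psi_iostall_enter();	/* charge the in-context compression as IO wait */
 	ret = __zram_bvec_write(zram, &vec, index, bio);
+	psi_iostall_leave();
 out:
 	if (is_partial_io(bvec))
 		__free_page(page);

Functionally this would match the open-coded version when CONFIG_PSI is
enabled, while the annotation becomes a no-op automatically when it is
not, so the #ifdef blocks disappear from the zram fast path.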