| Message ID | 1631015968-9779-1-git-send-email-huangzhaoyang@gmail.com (mailing list archive) |
| --- | --- |
| State | New |
| Series | mm: bail out from psi memstall after submit_bio in swap_readpage |
On 9/7/21 13:59, Huangzhaoyang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> It doesn't make sense to count IO time into psi memstall. Bail out after
> bio submitted.

Isn't that the point of psi, to observe real stalls, which include IO?
Anyway, CCing Johannes.

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  mm/page_io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9..1d131fc 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -423,6 +423,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	count_vm_event(PSWPIN);
>  	bio_get(bio);
>  	qc = submit_bio(bio);
> +	psi_memstall_leave(&pflags);
>  	while (synchronous) {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		if (!READ_ONCE(bio->bi_private))
> @@ -433,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	}
>  	__set_current_state(TASK_RUNNING);
>  	bio_put(bio);
> -
> +	return ret;
>  out:
>  	psi_memstall_leave(&pflags);
>  	return ret;
On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/7/21 13:59, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > It doesn't make sense to count IO time into psi memstall. Bail out after
> > bio submitted.
>
> Isn't that the point of psi, to observe real stalls, which include IO?
> Anyway, CCing Johannes.

IO stalls can be observed within blk_io_schedule; what is counted here is the
time it takes for the data to travel from the block device to RAM. The
original purpose of this patch is to deal with ZRAM-like devices, which
handle the bio locally instead of submitting it to a request queue.

> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >  mm/page_io.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index c493ce9..1d131fc 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -423,6 +423,7 @@ int swap_readpage(struct page *page, bool synchronous)
> >  	count_vm_event(PSWPIN);
> >  	bio_get(bio);
> >  	qc = submit_bio(bio);
> > +	psi_memstall_leave(&pflags);
> >  	while (synchronous) {
> >  		set_current_state(TASK_UNINTERRUPTIBLE);
> >  		if (!READ_ONCE(bio->bi_private))
> > @@ -433,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> >  	bio_put(bio);
> > -
> > +	return ret;
> >  out:
> >  	psi_memstall_leave(&pflags);
> >  	return ret;
On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > bio submitted.
> >
> > Isn't that the point of psi, to observe real stalls, which include IO?

Yes, correct.

> IO stalls could be observed within blk_io_schedule. The time cost of
> the data from block device to RAM is counted here.

Yes, that is on purpose. The time a thread waits for swap read IO is
time in which the thread is not productive due to a lack of memory.

For async-submitted IO, this happens in lock_page() called from
do_swap_page(). If the submitting thread directly waits after the
submit_bio(), then that should be accounted too.

This patch doesn't make sense to me.
On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > >
> > > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > > bio submitted.
> > >
> > > Isn't that the point of psi, to observe real stalls, which include IO?
>
> Yes, correct.
>
> > IO stalls could be observed within blk_io_schedule. The time cost of
> > the data from block device to RAM is counted here.
>
> Yes, that is on purpose. The time a thread waits for swap read IO is
> time in which the thread is not productive due to a lack of memory.
>
> For async-submitted IO, this happens in lock_page() called from
> do_swap_page(). If the submitting thread directly waits after the
> submit_bio(), then that should be accounted too.

IMO, memstall accounting should end once the bio is submitted. The blk
driver fetching the request and the operation on the real device
shouldn't be counted in. It especially doesn't make sense in a
virtualization system such as Xen, where the blk driver is implemented
in a backend-frontend way that introduces memory-irrelevant latency.

> This patch doesn't make sense to me.
On Wed, Sep 08, 2021 at 11:35:40AM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > >
> > > > > It doesn't make sense to count IO time into psi memstall. Bail out after
> > > > > bio submitted.
> > > >
> > > > Isn't that the point of psi, to observe real stalls, which include IO?
> >
> > Yes, correct.
> >
> > > IO stalls could be observed within blk_io_schedule. The time cost of
> > > the data from block device to RAM is counted here.
> >
> > Yes, that is on purpose. The time a thread waits for swap read IO is
> > time in which the thread is not productive due to a lack of memory.
> >
> > For async-submitted IO, this happens in lock_page() called from
> > do_swap_page(). If the submitting thread directly waits after the
> > submit_bio(), then that should be accounted too.
>
> IMO, memstall counting should be terminated by bio submitted. blk
> driver fetching request and the operation on the real device shouldn't
> be counted in. It especially doesn't make sense in a virtualization
> system like XEN etc, where the blk driver is implemented via
> backend-frontend way that introduce memory irrelevant latency

Yes, but the entire IO operation and all the associated latency only
happens due to a shortage of memory in the first place. The thread is
incurring these delays due to a lack of memory.

What is a memstall if not the latencies and wait times incurred in the
process of reloading pages that were evicted prematurely?
diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9..1d131fc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -423,6 +423,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
+	psi_memstall_leave(&pflags);
 	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
@@ -433,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 	__set_current_state(TASK_RUNNING);
 	bio_put(bio);
-
+	return ret;
 out:
 	psi_memstall_leave(&pflags);
 	return ret;