
mm : bail out from psi memstall after submit_bio in swap_readpage

Message ID 1631015968-9779-1-git-send-email-huangzhaoyang@gmail.com (mailing list archive)
State New
Series mm : bail out from psi memstall after submit_bio in swap_readpage

Commit Message

Zhaoyang Huang Sept. 7, 2021, 11:59 a.m. UTC
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

It doesn't make sense to count IO time as psi memstall. Bail out once
the bio has been submitted.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/page_io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Vlastimil Babka Sept. 7, 2021, 12:02 p.m. UTC | #1
On 9/7/21 13:59, Huangzhaoyang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> It doesn't make sense to count IO time as psi memstall. Bail out once
> the bio has been submitted.

Isn't that the point of psi, to observe real stalls, which include IO?
Anyway, CCing Johannes.

Zhaoyang Huang Sept. 7, 2021, 12:15 p.m. UTC | #2
On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/7/21 13:59, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > It doesn't make sense to count IO time as psi memstall. Bail out once
> > the bio has been submitted.
>
> Isn't that the point of psi, to observe real stalls, which include IO?
> Anyway, CCing Johannes.
IO stalls can be observed within blk_io_schedule(). The time spent
transferring data from the block device to RAM is counted here. The
original purpose is to handle ZRAM-like devices, which process the bio
locally instead of submitting it to a request queue.
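The distinction drawn above between ZRAM-like devices and devices with a request queue can be illustrated with a small userspace model (not kernel code). It assumes that a local device finishes the read inside submit_bio() and clears the completion flag (the role bio->bi_private plays in the wait loop) before returning, while a queued device completes only after several blk_io_schedule() rounds. All names (struct fake_bio, fake_submit_bio(), fake_io_schedule(), read_stall()) and all costs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical costs, in arbitrary time units. */
#define LOCAL_WORK  5 /* local device: read done inside submit_bio() */
#define QUEUE_COST  1 /* queued device: time to queue the request */
#define WAIT_ROUNDS 3 /* queued device: sleeps before completion */
#define ROUND_COST  4 /* time per blk_io_schedule() round */

enum dev_kind { DEV_LOCAL, DEV_QUEUED };

/* Stand-in for a bio; 'pending' plays the role of bio->bi_private. */
struct fake_bio {
	bool pending;
	int rounds_left;
};

static void fake_submit_bio(struct fake_bio *bio, enum dev_kind kind, long *clock)
{
	bio->pending = true;
	if (kind == DEV_LOCAL) {
		*clock += LOCAL_WORK;   /* work done in the caller's context */
		bio->pending = false;   /* completion runs before returning */
	} else {
		*clock += QUEUE_COST;   /* request handed to the queue */
		bio->rounds_left = WAIT_ROUNDS;
	}
}

/* One sleep in the wait loop; the device completes after WAIT_ROUNDS. */
static void fake_io_schedule(struct fake_bio *bio, long *clock)
{
	assert(bio->pending && bio->rounds_left > 0);
	*clock += ROUND_COST;
	if (--bio->rounds_left == 0)
		bio->pending = false;
}

/* Time accounted as memstall for one synchronous swap read. */
static long read_stall(enum dev_kind kind, bool patched)
{
	struct fake_bio bio = { 0 };
	long clock = 0;   /* memstall section entered at time 0 */
	long stalled = 0;

	fake_submit_bio(&bio, kind, &clock);
	if (patched)
		stalled = clock;   /* patched: leave right after submission */
	while (bio.pending)    /* like: if (!READ_ONCE(bio->bi_private)) break; */
		fake_io_schedule(&bio, &clock);
	if (!patched)
		stalled = clock;   /* unpatched: leave at the out: label */
	return stalled;
}
```

Under these assumptions, a local device accounts 5 units with or without the patch, because the wait loop never runs; a queued device accounts 13 units unpatched and only 1 unit with the patch.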
Johannes Weiner Sept. 7, 2021, 1:26 p.m. UTC | #3
On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > It doesn't make sense to count IO time as psi memstall. Bail out once
> > > the bio has been submitted.
> >
> > Isn't that the point of psi, to observe real stalls, which include IO?

Yes, correct.

> IO stalls can be observed within blk_io_schedule(). The time spent
> transferring data from the block device to RAM is counted here.

Yes, that is on purpose. The time a thread waits for swap read IO is
time in which the thread is not productive due to a lack of memory.

For async-submitted IO, this happens in lock_page() called from
do_swap_page(). If the submitting thread directly waits after the
submit_bio(), then that should be accounted too.

This patch doesn't make sense to me.
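The point about the two swap-in paths reduces to simple arithmetic. The sketch below assumes a swap-in spends `submit` time units inside submit_bio() and `io` further units until the read completes, that the asynchronous caller sleeps in lock_page() immediately after swap_readpage() returns, and that this lock_page() wait is accounted as memstall as described above. The function names are hypothetical; this is a model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of memstall accounting for one swap-in.
 * submit: time spent inside submit_bio()
 * io:     further time until the read completes
 */

/* Synchronous path: swap_readpage(page, true) waits for the bio itself. */
static long sync_path_stall(long submit, long io, bool patched)
{
	assert(submit >= 0 && io >= 0);
	/* Unpatched, the wait loop runs inside the memstall section. */
	return patched ? submit : submit + io;
}

/*
 * Asynchronous path: swap_readpage(page, false) leaves the memstall
 * section after submission; the fault handler then sleeps in
 * lock_page(), which this model assumes is also accounted as memstall.
 * The patch does not change this path.
 */
static long async_path_stall(long submit, long io, bool patched)
{
	assert(submit >= 0 && io >= 0);
	(void)patched; /* unaffected by the patch */
	return submit + io;
}
```

In this model, with submit = 2 and io = 10, both paths account 12 units today. With the patch applied, the synchronous path drops to 2 while the asynchronous path stays at 12, so the same device wait would be reported differently depending on which path performed the read.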
Zhaoyang Huang Sept. 8, 2021, 3:35 a.m. UTC | #4
On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > >
> > > > It doesn't make sense to count IO time as psi memstall. Bail out once
> > > > the bio has been submitted.
> > >
> > > Isn't that the point of psi, to observe real stalls, which include IO?
>
> Yes, correct.
>
> > IO stalls can be observed within blk_io_schedule(). The time spent
> > transferring data from the block device to RAM is counted here.
>
> Yes, that is on purpose. The time a thread waits for swap read IO is
> time in which the thread is not productive due to a lack of memory.
>
> For async-submitted IO, this happens in lock_page() called from
> do_swap_page(). If the submitting thread directly waits after the
> submit_bio(), then that should be accounted too.
IMO, memstall accounting should end once the bio has been submitted.
The block driver fetching the request and the operation on the real
device shouldn't be counted. This especially doesn't make sense on a
virtualized system such as Xen, where the block driver uses a
frontend/backend model that introduces memory-irrelevant latency.

>
> This patch doesn't make sense to me.
Johannes Weiner Sept. 8, 2021, 11:36 a.m. UTC | #5
On Wed, Sep 08, 2021 at 11:35:40AM +0800, Zhaoyang Huang wrote:
> On Tue, Sep 7, 2021 at 9:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Tue, Sep 07, 2021 at 08:15:30PM +0800, Zhaoyang Huang wrote:
> > > On Tue, Sep 7, 2021 at 8:03 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 9/7/21 13:59, Huangzhaoyang wrote:
> > > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > > >
> > > > > It doesn't make sense to count IO time as psi memstall. Bail out once
> > > > > the bio has been submitted.
> > > >
> > > > Isn't that the point of psi, to observe real stalls, which include IO?
> >
> > Yes, correct.
> >
> > IO stalls can be observed within blk_io_schedule(). The time spent
> > transferring data from the block device to RAM is counted here.
> >
> > Yes, that is on purpose. The time a thread waits for swap read IO is
> > time in which the thread is not productive due to a lack of memory.
> >
> > For async-submitted IO, this happens in lock_page() called from
> > do_swap_page(). If the submitting thread directly waits after the
> > submit_bio(), then that should be accounted too.
> IMO, memstall accounting should end once the bio has been submitted.
> The block driver fetching the request and the operation on the real
> device shouldn't be counted. This especially doesn't make sense on a
> virtualized system such as Xen, where the block driver uses a
> frontend/backend model that introduces memory-irrelevant latency.

Yes but the entire IO operation and all the associated latency only
happens due to a shortage of memory in the first place. The thread is
incurring these delays due to a lack of memory.

What is a memstall if not the latencies and wait times incurred in the
process of reloading pages that were evicted prematurely?

Patch

diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9..1d131fc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -423,6 +423,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
+	psi_memstall_leave(&pflags);
 	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
@@ -433,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 	__set_current_state(TASK_RUNNING);
 	bio_put(bio);
-
+	return ret;
 out:
 	psi_memstall_leave(&pflags);
 	return ret;
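To make the effect of the patch concrete, here is a minimal userspace sketch (not kernel code) of the memstall window in the synchronous branch of swap_readpage(). The helpers model_memstall_enter() and model_memstall_leave(), the simulated clock, and the cost parameters are hypothetical stand-ins for psi_memstall_enter()/psi_memstall_leave() and real elapsed time. The model assumes submission and the wait loop are the only two phases.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the memstall window in the synchronous branch of
 * swap_readpage(). model_memstall_enter()/model_memstall_leave() are
 * hypothetical stand-ins for psi_memstall_enter()/psi_memstall_leave();
 * time is a simulated clock in arbitrary units.
 */
struct model {
	long clock;       /* simulated current time */
	long stall_start; /* clock value when the section was entered */
	long stalled;     /* total time accounted as memstall */
	bool in_stall;    /* inside an enter/leave pair? */
};

static void model_memstall_enter(struct model *m)
{
	assert(!m->in_stall);
	m->stall_start = m->clock;
	m->in_stall = true;
}

static void model_memstall_leave(struct model *m)
{
	assert(m->in_stall); /* calls must stay balanced, as in the patch */
	m->stalled += m->clock - m->stall_start;
	m->in_stall = false;
}

/*
 * submit_cost: time inside submit_bio()
 * io_wait:     time spent in the while (synchronous) poll/sleep loop
 * Returns the time accounted as memstall.
 */
static long sync_swap_read(long submit_cost, long io_wait, bool patched)
{
	struct model m = { 0 };

	model_memstall_enter(&m);
	m.clock += submit_cost;         /* qc = submit_bio(bio); */
	if (patched) {
		model_memstall_leave(&m);   /* line added by the patch */
		m.clock += io_wait;         /* wait loop now outside the section */
		return m.stalled;           /* "return ret;" skips the out: label */
	}
	m.clock += io_wait;             /* wait loop inside the section */
	model_memstall_leave(&m);       /* out: psi_memstall_leave(&pflags); */
	return m.stalled;
}
```

With 2 units spent in submit_bio() and 10 in the wait loop, sync_swap_read(2, 10, false) accounts 12 units and sync_swap_read(2, 10, true) accounts only 2: the patch removes exactly the device wait that the review argues belongs to the stall.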