[RFC] delayacct: add memory reclaim delay in get_page_from_freelist

Message ID SH0PR01MB058780B26AB21BEB8E76B6BBC11CA@SH0PR01MB0587.CHNPR01.prod.partner.outlook.cn (mailing list archive)
State New

Commit Message

liwenyu01@bilibili.com Aug. 23, 2023, 9:54 a.m. UTC
The current memory reclaim delay statistics only count the direct memory
reclaim of the task in do_try_to_free_pages(). On systems with NUMA
enabled, some tasks occasionally experience slower response times while
the total reclaim count does not increase; ftrace shows that
node_reclaim has occurred.

The memory reclaim occurring in get_page_from_freelist() is also due to
heavy memory load. To capture the impact of this reclaim on tasks, this
patch adds memory reclaim delay accounting to
__node_reclaim().

Signed-off-by: Wen Yu Li <liwenyu01@bilibili.com>
---
mm/vmscan.c | 2 ++
1 file changed, 2 insertions(+)

--
2.30.2


This message may contain confidential information, and is intended only for the use of the addressee(s) named above. If you have received this message in error, please contact the sender immediately and delete all copies from your system. You are hereby notified that any dissemination, distribution, preservation or copying of this message and/or attachments is strictly prohibited. Thank you for your understanding and cooperation.

Comments

Andrew Morton Sept. 2, 2023, 11:44 p.m. UTC | #1
On Thu, 31 Aug 2023 07:26:20 +0000 "liwenyu01@bilibili.com" <liwenyu01@bilibili.com> wrote:

> The current memory reclaim delay statistics only count the direct memory
> reclaim of the task in do_try_to_free_pages(). In systems with NUMA
> open, some tasks occasionally experience slower response times, but the
> total count of reclaim does not increase, using ftrace can show that
> node_reclaim has occurred.
> 
> The memory reclaim occurring in get_page_from_freelist() is also due to
> heavy memory load. To get the impact of tasks in memory reclaim, this
> patch adds the statistics of the memory reclaim delay statistics for
> __node_reclaim().
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -8010,6 +8010,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> 
>         cond_resched();
>         psi_memstall_enter(&pflags);
> +       delayacct_freepages_start();
>         fs_reclaim_acquire(sc.gfp_mask);
>         /*
>          * We need to be able to allocate from the reserves for RECLAIM_UNMAP
> @@ -8032,6 +8033,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>         memalloc_noreclaim_restore(noreclaim_flag);
>         fs_reclaim_release(sc.gfp_mask);
>         psi_memstall_leave(&pflags);
> +       delayacct_freepages_end();
> 
>         trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);

__node_reclaim() calls shrink_node() which at some point will call
do_try_to_free_pages() (yes?), which calls delayacct_freepages_start().

So we're effectively nesting calls to delayacct_freepages_start(),
which isn't designed for that?
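
The nesting concern can be illustrated with a small userspace model. This is only a sketch: it assumes the accounting keeps a single start timestamp per task and that each end call adds (now - start) to a total and bumps a counter. The names delay_model, model_start(), model_end() and run_nested() are hypothetical and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of one task's reclaim delay counters. */
struct delay_model {
	uint64_t start;	/* timestamp written by the most recent start call */
	uint64_t total;	/* accumulated delay */
	uint32_t count;	/* number of completed intervals */
};

/* Open an interval. A second call overwrites the earlier timestamp. */
static void model_start(struct delay_model *d, uint64_t now)
{
	d->start = now;
}

/* Close an interval: add (now - start) and bump the counter. */
static void model_end(struct delay_model *d, uint64_t now)
{
	assert(now >= d->start);
	d->total += now - d->start;
	d->count++;
}

/* Outer section [0, 100] with a nested inner section [40, 60]. */
static struct delay_model run_nested(void)
{
	struct delay_model d = {0};

	model_start(&d, 0);	/* outer start */
	model_start(&d, 40);	/* inner start overwrites the outer timestamp */
	model_end(&d, 60);	/* inner end adds 60 - 40 = 20 */
	model_end(&d, 100);	/* outer end adds 100 - 40 = 60 */
	return d;
}
```

In this model run_nested() reports a total of 80 over two intervals, although the task was stalled for 100 units in a single episode: the span [0, 40] is dropped and [40, 60] is counted twice. Whether the real call paths can nest like this is the question raised above.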
liwenyu01@bilibili.com Sept. 5, 2023, 2:56 a.m. UTC | #2
> The current memory reclaim delay statistics only count the direct memory
> reclaim of the task in do_try_to_free_pages(). In systems with NUMA
> open, some tasks occasionally experience slower response times, but the
> total count of reclaim does not increase, using ftrace can show that
> node_reclaim has occurred.
>
> The memory reclaim occurring in get_page_from_freelist() is also due to
> heavy memory load. To get the impact of tasks in memory reclaim, this
> patch adds the statistics of the memory reclaim delay statistics for
> __node_reclaim().




>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -8010,6 +8010,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>
>         cond_resched();
>         psi_memstall_enter(&pflags);
> +       delayacct_freepages_start();
>         fs_reclaim_acquire(sc.gfp_mask);
>         /*
>          * We need to be able to allocate from the reserves for RECLAIM_UNMAP
> @@ -8032,6 +8033,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>         memalloc_noreclaim_restore(noreclaim_flag);
>         fs_reclaim_release(sc.gfp_mask);
>         psi_memstall_leave(&pflags);
> +       delayacct_freepages_end();
>
>         trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);

__node_reclaim() calls shrink_node() which at some point will call
do_try_to_free_pages() (yes?), which calls delayacct_freepages_start().

So we're effectively nesting calls to delayacct_freepages_start(),
which isn't designed for that?


liwenyu01@bilibili.com Sept. 5, 2023, 5:32 a.m. UTC | #3
>> The current memory reclaim delay statistics only count the direct memory
>> reclaim of the task in do_try_to_free_pages(). In systems with NUMA
>> open, some tasks occasionally experience slower response times, but the
>> total count of reclaim does not increase, using ftrace can show that
>> node_reclaim has occurred.
>>
>> The memory reclaim occurring in get_page_from_freelist() is also due to
>> heavy memory load. To get the impact of tasks in memory reclaim, this
>> patch adds the statistics of the memory reclaim delay statistics for
>> __node_reclaim().
>>
>> ...
>>
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -8010,6 +8010,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>>
>>         cond_resched();
>>         psi_memstall_enter(&pflags);
>> +       delayacct_freepages_start();
>>         fs_reclaim_acquire(sc.gfp_mask);
>>         /*
>>          * We need to be able to allocate from the reserves for RECLAIM_UNMAP
>> @@ -8032,6 +8033,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>>         memalloc_noreclaim_restore(noreclaim_flag);
>>         fs_reclaim_release(sc.gfp_mask);
>>         psi_memstall_leave(&pflags);
>> +       delayacct_freepages_end();
>>
>>         trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);
>
> __node_reclaim() calls shrink_node() which at some point will call
> do_try_to_free_pages() (yes?), which calls delayacct_freepages_start().
>
> So we're effectively nesting calls to delayacct_freepages_start(),
> which isn't designed for that?
>
Sorry, the last reply was sent by mistake.

As far as I can see, shrink_node() never calls do_try_to_free_pages().
It is do_try_to_free_pages() that calls shrink_node(), through
shrink_zones(). If shrink_node() did call do_try_to_free_pages() at some
point, wouldn't delayacct_freepages_start() already be nested today?

Best wishes.


Education Directorate Sept. 6, 2023, 12:53 a.m. UTC | #4
On Tue, Sep 05, 2023 at 05:32:15AM +0000, liwenyu01@bilibili.com wrote:
> >> The current memory reclaim delay statistics only count the direct memory
> >> reclaim of the task in do_try_to_free_pages(). In systems with NUMA
> >> open, some tasks occasionally experience slower response times, but the
> >> total count of reclaim does not increase, using ftrace can show that
> >> node_reclaim has occurred.
> >>
> >> The memory reclaim occurring in get_page_from_freelist() is also due to
> >> heavy memory load. To get the impact of tasks in memory reclaim, this
> >> patch adds the statistics of the memory reclaim delay statistics for
> >> __node_reclaim().
> >>
> >> ...
> >>
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -8010,6 +8010,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >>
> >>         cond_resched();
> >>         psi_memstall_enter(&pflags);
> >> +       delayacct_freepages_start();
> >>         fs_reclaim_acquire(sc.gfp_mask);
> >>         /*
> >>          * We need to be able to allocate from the reserves for RECLAIM_UNMAP
> >> @@ -8032,6 +8033,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >>         memalloc_noreclaim_restore(noreclaim_flag);
> >>         fs_reclaim_release(sc.gfp_mask);
> >>         psi_memstall_leave(&pflags);
> >> +       delayacct_freepages_end();
> >>
> >>         trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);
> >
> > __node_reclaim() calls shrink_node() which at some point will call
> > do_try_to_free_pages() (yes?), which calls delayacct_freepages_start().
> >
> > So we're effectively nesting calls to delayacct_freepages_start(),
> > which isn't designed for that?
> >
> sorry, the last reply was a mistake.
> 
> It seems that no point in shrink_node() will call do_try_to_free_pages().
> And do_try_to_free_pages() will call shrink_node() through shrink_zones(),
> if shrink_node() also has some point will call do_try_to_free_pages,then
> delayacct_freepages_start() is nested now?

That's because shrink_node() goes through shrink_list() via
shrink_lruvec()? do_try_to_free_pages() will call shrink_node(). Ideally
we should have some counters around __node_reclaim() and balance_pgdat()
like psi_memstall_* does. Do you want to mimic what psi_memstall_* does?
This would change the definition of delayacct free pages, but I don't think
it will make it worse.

Balbir Singh
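
For illustration, a nesting-safe variant in the style of psi_memstall_enter(&pflags) and psi_memstall_leave(&pflags) could look roughly like the userspace sketch below. It assumes that the caller-provided flag records whether a section was already open, so that only the outermost enter/leave pair does any accounting. The names task_model, guarded_enter(), guarded_leave() and run_guarded_nested() are hypothetical, and this is not a proposed kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of per-task reclaim delay state. */
struct task_model {
	bool in_reclaim;	/* true while an outermost section is open */
	uint64_t start;		/* timestamp of the outermost enter */
	uint64_t total;		/* accumulated delay */
	uint32_t count;		/* number of outermost sections completed */
};

/* Enter a section; *was_active remembers whether one was already open. */
static void guarded_enter(struct task_model *t, bool *was_active, uint64_t now)
{
	*was_active = t->in_reclaim;
	if (*was_active)
		return;		/* nested call: leave the timestamp alone */
	t->in_reclaim = true;
	t->start = now;
}

/* Leave a section; only the outermost leave does the accounting. */
static void guarded_leave(struct task_model *t, const bool *was_active,
			  uint64_t now)
{
	if (*was_active)
		return;		/* nested call: nothing to account */
	assert(now >= t->start);
	t->total += now - t->start;
	t->count++;
	t->in_reclaim = false;
}

/* Outer section [0, 100] with a nested inner section [40, 60]. */
static struct task_model run_guarded_nested(void)
{
	struct task_model t = {0};
	bool outer, inner;

	guarded_enter(&t, &outer, 0);
	guarded_enter(&t, &inner, 40);
	guarded_leave(&t, &inner, 60);
	guarded_leave(&t, &outer, 100);
	return t;
}
```

With the guard, run_guarded_nested() accounts the whole [0, 100] span exactly once, regardless of how many instrumented functions sit on the call path. The cost is one extra flag per task plus a local variable in each caller.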
liwenyu01@bilibili.com Sept. 7, 2023, 11:52 a.m. UTC | #5
>> >> The current memory reclaim delay statistics only count the direct memory
>> >> reclaim of the task in do_try_to_free_pages(). In systems with NUMA
>> >> open, some tasks occasionally experience slower response times, but the
>> >> total count of reclaim does not increase, using ftrace can show that
>> >> node_reclaim has occurred.
>> >>
>> >> The memory reclaim occurring in get_page_from_freelist() is also due to
>> >> heavy memory load. To get the impact of tasks in memory reclaim, this
>> >> patch adds the statistics of the memory reclaim delay statistics for
>> >> __node_reclaim().
>> >>
>> >> ...
>> >>
>> >> --- a/mm/vmscan.c
>> >> +++ b/mm/vmscan.c
>> >> @@ -8010,6 +8010,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>> >>
>> >>         cond_resched();
>> >>         psi_memstall_enter(&pflags);
>> >> +       delayacct_freepages_start();
>> >>         fs_reclaim_acquire(sc.gfp_mask);
>> >>         /*
>> >>          * We need to be able to allocate from the reserves for RECLAIM_UNMAP
>> >> @@ -8032,6 +8033,7 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>> >>         memalloc_noreclaim_restore(noreclaim_flag);
>> >>         fs_reclaim_release(sc.gfp_mask);
>> >>         psi_memstall_leave(&pflags);
>> >> +       delayacct_freepages_end();
>> >>
>> >>         trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);
>> >
>> > __node_reclaim() calls shrink_node() which at some point will call
>> > do_try_to_free_pages() (yes?), which calls delayacct_freepages_start().
>> >
>> > So we're effectively nesting calls to delayacct_freepages_start(),
>> > which isn't designed for that?
>> >
>> sorry, the last reply was a mistake.
>>
>> It seems that no point in shrink_node() will call do_try_to_free_pages().
>> And do_try_to_free_pages() will call shrink_node() through shrink_zones(),
>> if shrink_node() also has some point will call do_try_to_free_pages,then
>> delayacct_freepages_start() is nested now?
>
> That's because shrink_node() goes through shrink_list() via
> shrink_lruvec()? do_try_to_free_pages() will call shrink_node(). Ideally
> we should have some counters around __node_reclaim() and balance_pgdat()
> like psi_memstall_* does. Do you want to mimic what psi_memstall_* does?
> This would change the definition of delayacct free pages, but I don't think
> it will make it worse.
>
> Balbir Singh


The focus of delayacct should be the memory reclaim delay statistics for
each task, so it should have few direct connections with shrink_node()?

At least the current use of delayacct_freepages_start() does not seem to
be wrong, so it is unnecessary to implement a new counting method?

Compared with adding delay statistics to balance_pgdat() for kswapd,
isn't it more meaningful to keep the definition of delayacct free pages
and only account for applications?

Keeping the definition of delayacct free pages and going back to this
simple patch: it does one very simple thing, counting the memory reclaim
delay caused by memory pressure on the memory allocation path of an
application. Currently the memory reclaim delay is only measured in
do_try_to_free_pages(); this patch adds an accounting point in
__node_reclaim(). Both do_try_to_free_pages() and __node_reclaim()
call shrink_node().

WenYu


Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1080209a568b..d2471abce1ae 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -8010,6 +8010,7 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in

        cond_resched();
        psi_memstall_enter(&pflags);
+       delayacct_freepages_start();
        fs_reclaim_acquire(sc.gfp_mask);
        /*
         * We need to be able to allocate from the reserves for RECLAIM_UNMAP
@@ -8032,6 +8033,7 @@  static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
        memalloc_noreclaim_restore(noreclaim_flag);
        fs_reclaim_release(sc.gfp_mask);
        psi_memstall_leave(&pflags);
+       delayacct_freepages_end();

        trace_mm_vmscan_node_reclaim_end(sc.nr_reclaimed);