Message ID | 20201206101451.14706-1-songmuchun@bytedance.com |
---|---|
Series | Convert all vmstat counters to pages or bytes |

On Sun 06-12-20 18:14:39, Muchun Song wrote:
> Hi,
>
> This patch series aims to convert all THP vmstat counters to pages
> and some KiB vmstat counters to bytes.
>
> The units of some vmstat counters are pages, some are bytes, some are
> HPAGE_PMD_NR, and some are KiB. When we want to expose these vmstat
> counters to userspace, we have to know which unit each counter uses.
> That makes the code complex: with so many choices, the probability of
> making a mistake is greater.
>
> For example, the following are bug fixes of this kind:
>   - 7de2e9f195b9 ("mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg")
>   - not committed (it is the first commit in this series) ("mm: memcontrol: fix NR_ANON_THPS account")
>
> This patch series makes the code simpler (161 insertions(+), 187 deletions(-))
> and makes the unit of every vmstat counter either pages or bytes. Fewer
> choices mean a lower probability of making mistakes :).
>
> This was inspired by Johannes and Roman. Thanks to them.

It would be really great if you could summarize the current and after
the patch state so that exceptions are clear and easier to review. The
existing situation is rather convoluted but we have at least units part
of the name so it is not too hard to notice that. Reducing exceptions
sounds nice but I am not really sure it is such an improvement it is
worth a lot of code churn. Especially when it comes to KB vs B. Counting
THPs as regular pages sounds like a good plan to me because we can
expect that THP will be of a different size in the future - especially
for file THPs.

> Changes in v1 -> v2:
>   - Change the series subject from "Convert all THP vmstat counters to pages"
>     to "Convert all vmstat counters to pages or bytes".
>   - Convert NR_KERNEL_SCS_KB account to bytes.
>   - Convert vmstat slab counters to bytes.
>   - Remove {global_}node_page_state_pages.
>
> Muchun Song (12):
>   mm: memcontrol: fix NR_ANON_THPS account
>   mm: memcontrol: convert NR_ANON_THPS account to pages
>   mm: memcontrol: convert NR_FILE_THPS account to pages
>   mm: memcontrol: convert NR_SHMEM_THPS account to pages
>   mm: memcontrol: convert NR_SHMEM_PMDMAPPED account to pages
>   mm: memcontrol: convert NR_FILE_PMDMAPPED account to pages
>   mm: memcontrol: convert kernel stack account to bytes
>   mm: memcontrol: convert NR_KERNEL_SCS_KB account to bytes
>   mm: memcontrol: convert vmstat slab counters to bytes
>   mm: memcontrol: scale stat_threshold for byted-sized vmstat
>   mm: memcontrol: make the slab calculation consistent
>   mm: memcontrol: remove {global_}node_page_state_pages
>
>  drivers/base/node.c     |  25 ++++-----
>  fs/proc/meminfo.c       |  22 ++++----
>  include/linux/mmzone.h  |  21 +++-----
>  include/linux/vmstat.h  |  21 ++------
>  kernel/fork.c           |   8 +--
>  kernel/power/snapshot.c |   2 +-
>  kernel/scs.c            |   4 +-
>  mm/filemap.c            |   4 +-
>  mm/huge_memory.c        |   9 ++--
>  mm/khugepaged.c         |   4 +-
>  mm/memcontrol.c         | 131 ++++++++++++++++++++++++------------------------
>  mm/oom_kill.c           |   2 +-
>  mm/page_alloc.c         |  17 +++----
>  mm/rmap.c               |  19 ++++---
>  mm/shmem.c              |   3 +-
>  mm/vmscan.c             |   2 +-
>  mm/vmstat.c             |  54 ++++++++------------
>  17 files changed, 161 insertions(+), 187 deletions(-)
>
> --
> 2.11.0

On Mon, Dec 7, 2020 at 9:00 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sun 06-12-20 18:14:39, Muchun Song wrote:
> > [...]
>
> It would be really great if you could summarize the current and after
> the patch state so that exceptions are clear and easier to review. The

Agree. Will do in the next version. Thanks.

> existing situation is rather convoluted but we have at least units part
> of the name so it is not too hard to notice that. Reducing exceptions
> sounds nice but I am not really sure it is such an improvement it is
> worth a lot of code churn. Especially when it comes to KB vs B. Counting

There are two vmstat counters (NR_KERNEL_STACK_KB and
NR_KERNEL_SCS_KB) whose units are KB. If we do this, all
vmstat counter units are either pages or bytes in the end, which
makes exposing those counters to userspace easy. You can
refer to:

  [RESEND PATCH v2 11/12] mm: memcontrol: make the slab calculation consistent

From this point of view, I think that it is worth doing this. Right?

> THPs as regular pages sounds like a good plan to me because we can
> expect that THP will be of a different size in the future - especially
> for file THPs.

It is easy to convert.

> > [...]
>
> --
> Michal Hocko
> SUSE Labs

--
Yours,
Muchun

On Mon 07-12-20 22:52:30, Muchun Song wrote:
> On Mon, Dec 7, 2020 at 9:00 PM Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> > existing situation is rather convoluted but we have at least units part
> > of the name so it is not too hard to notice that. Reducing exceptions
> > sounds nice but I am not really sure it is such an improvement it is
> > worth a lot of code churn. Especially when it comes to KB vs B. Counting
>
> There are two vmstat counters (NR_KERNEL_STACK_KB and
> NR_KERNEL_SCS_KB) whose units are KB. If we do this, all
> vmstat counter units are either pages or bytes in the end, which
> makes exposing those counters to userspace easy. You can
> refer to:
>
>   [RESEND PATCH v2 11/12] mm: memcontrol: make the slab calculation consistent
>
> From this point of view, I think that it is worth doing this. Right?

Well, unless I am missing something, we have two counters in bytes, two
in kB, both clearly distinguishable by the B/KB suffix. Changing KB to B
will certainly reduce the different classes of units, no question about
that, but I am not really sure this is worth all the code churn. Maybe
others will think otherwise.

As I've said the THP accounting change makes more sense to me because it
allows future changes which are already undergoing so there is more
merit in those.

On 12/7/20 7:02 AM, Michal Hocko wrote:
> On Mon 07-12-20 22:52:30, Muchun Song wrote:
>> [...]
>
> As I've said the THP accounting change makes more sense to me because it
> allows future changes which are already undergoing so there is more
> merit in those.
>

Hi,

Are there any documentation changes that go with these patches?
Or are none needed?

If the patches change the output in /proc/* or /sys/* then I expect
there would need to be some doc changes.

And is there any chance of confusing userspace s/w (binary or scripts)
with these changes?

thanks.

On Mon, Dec 07, 2020 at 04:02:54PM +0100, Michal Hocko wrote:
> On Mon 07-12-20 22:52:30, Muchun Song wrote:
> > [...]
>
> Well, unless I am missing something, we have two counters in bytes, two
> in kB, both clearly distinguishable by the B/KB suffix. Changing KB to B
> will certainly reduce the different classes of units, no question about
> that, but I am not really sure this is worth all the code churn. Maybe
> others will think otherwise.

Even though it was me who suggested it, I agree. It's nice to have a smaller
number of units, but if it creates a lot of hassle, then it doesn't make much
sense.

I think we need to look at the final version of the patches and decide whether
it's worth it.

>
> As I've said the THP accounting change makes more sense to me because it
> allows future changes which are already undergoing so there is more
> merit in those.

+1
And this part is absolutely trivial.

On Mon, 7 Dec 2020, Roman Gushchin wrote:
> On Mon, Dec 07, 2020 at 04:02:54PM +0100, Michal Hocko wrote:
> >
> > As I've said the THP accounting change makes more sense to me because it
> > allows future changes which are already undergoing so there is more
> > merit in those.
>
> +1
> And this part is absolutely trivial.

It does need to be recognized that, with these changes, every THP stats
update overflows the per-cpu counter, resorting to atomic global updates.
And I'd like to see that mentioned in the commit message.

But this change is consistent with 4.7's 8f182270dfec ("mm/swap.c: flush
lru pvecs on compound page arrival"): we accepted greater overhead for
greater accuracy back then, so I think it's okay to do so for THP stats.

Hugh

On Tue, Dec 8, 2020 at 2:51 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 12/7/20 7:02 AM, Michal Hocko wrote:
> > [...]
>
> Hi,
>
> Are there any documentation changes that go with these patches?
> Or are none needed?
>
> If the patches change the output in /proc/* or /sys/* then I expect
> there would need to be some doc changes.

Oh, we do not change the output. It is transparent to userspace. Thanks.

>
> And is there any chance of confusing userspace s/w (binary or scripts)
> with these changes?
>
> thanks.
> --
> ~Randy
>

On Mon, Dec 7, 2020 at 11:02 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 07-12-20 22:52:30, Muchun Song wrote:
> > [...]
>
> Well, unless I am missing something, we have two counters in bytes, two
> in kB, both clearly distinguishable by the B/KB suffix. Changing KB to B
> will certainly reduce the different classes of units, no question about
> that, but I am not really sure this is worth all the code churn. Maybe
> others will think otherwise.
>
> As I've said the THP accounting change makes more sense to me because it
> allows future changes which are already undergoing so there is more
> merit in those.

OK, I will drop the KB-to-B conversion. Thanks.

> --
> Michal Hocko
> SUSE Labs

On Tue, Dec 8, 2020 at 4:33 AM Hugh Dickins <hughd@google.com> wrote:
>
> On Mon, 7 Dec 2020, Roman Gushchin wrote:
> > On Mon, Dec 07, 2020 at 04:02:54PM +0100, Michal Hocko wrote:
> > >
> > > As I've said the THP accounting change makes more sense to me because it
> > > allows future changes which are already undergoing so there is more
> > > merit in those.
> >
> > +1
> > And this part is absolutely trivial.
>
> It does need to be recognized that, with these changes, every THP stats
> update overflows the per-cpu counter, resorting to atomic global updates.
> And I'd like to see that mentioned in the commit message.

Thanks for reminding me. Will add.

>
> But this change is consistent with 4.7's 8f182270dfec ("mm/swap.c: flush
> lru pvecs on compound page arrival"): we accepted greater overhead for
> greater accuracy back then, so I think it's okay to do so for THP stats.

Agree. Thanks.

>
> Hugh