Message ID | 20200203232248.104733-6-almasrymina@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v11,1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter |
On 2/3/20 3:22 PM, Mina Almasry wrote:
> Support MAP_NORESERVE accounting as part of the new counter.
>
> For each hugepage allocation, at allocation time we check if there is
> a reservation for this allocation or not. If there is a reservation for
> this allocation, then this allocation was charged at reservation time,
> and we don't re-account it. If there is no reservation for this
> allocation, we charge the appropriate hugetlb_cgroup.
>
> The hugetlb_cgroup to uncharge for this allocation is stored in
> page[3].private. We use new APIs added in an earlier patch to set this
> pointer.

Ah! That reminded me to look at the migration code. Turns out that none
of the existing cgroup information (page[2]) is being migrated today. That
is a bug. :( I'll confirm and fix in a patch separate from this series.
We will need to make sure that new information added by this series in
page[3] is also migrated. That would be in an earlier patch where the use
of the field is introduced.

> Signed-off-by: Mina Almasry <almasrymina@google.com>
>
> ---
>
> Changes in v10:
> - Refactored deferred_reserve check.
>
> ---
>  mm/hugetlb.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 33818ccaf7e89..ec0b55ea1506e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1339,6 +1339,9 @@ static void __free_huge_page(struct page *page)
>  	clear_page_huge_active(page);
>  	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
>  				     page, false);
> +	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
> +				     page, true);
> +

When looking at the code without change markings, the two above lines
look so similar my first thought is there must be a mistake.

A suggestion for better code readability:
- hugetlb_cgroup_uncharge_page could just take "struct hstate *h" and
  get both hstate_index(h) and pages_per_huge_page(h).
- Perhaps make hugetlb_cgroup_uncharge_page and
  hugetlb_cgroup_uncharge_page_rsvd be wrappers around a common routine.
  Then the above would look like:

	hugetlb_cgroup_uncharge_page(h, page);
	hugetlb_cgroup_uncharge_page_rsvd(h, page);

>  	if (restore_reserve)
>  		h->resv_huge_pages++;
>
> @@ -2172,6 +2175,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  	long gbl_chg;
>  	int ret, idx;
>  	struct hugetlb_cgroup *h_cg;
> +	bool deferred_reserve;
>
>  	idx = hstate_index(h);
>  	/*
> @@ -2209,10 +2213,20 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		gbl_chg = 1;
>  	}
>
> +	/* If this allocation is not consuming a reservation, charge it now.
> +	 */
> +	deferred_reserve = map_chg || avoid_reserve || !vma_resv_map(vma);
> +	if (deferred_reserve) {
> +		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h),
> +						   &h_cg, true);
> +		if (ret)
> +			goto out_subpool_put;
> +	}
> +
>  	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg,
>  					   false);

Hmmm? I'm starting to like the wrapper idea more as a way to help with
readability of the bool rsvd argument.

	hugetlb_cgroup_charge_cgroup_rsvd()
	hugetlb_cgroup_charge_cgroup()

At least to me it makes it easier to read.
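Concretely, the wrapper pattern suggested above might look like the
following sketch. Only the two wrapper names come from the review; the
__hugetlb_cgroup_uncharge_page() helper name, its signature, and its
elided body are assumptions made for illustration, not code from this
series.

	/* Hypothetical common helper; name, signature, and body are
	 * assumptions for illustration only.
	 */
	static void __hugetlb_cgroup_uncharge_page(struct hstate *h,
						   struct page *page,
						   bool rsvd)
	{
		/* Derive hstate_index(h) and pages_per_huge_page(h) here,
		 * then uncharge whichever counter rsvd selects: the
		 * existing page-fault counter or the new reservation
		 * counter. Body elided; see the series for the real logic.
		 */
	}

	/* Thin wrappers so call sites never pass a bare true/false. */
	void hugetlb_cgroup_uncharge_page(struct hstate *h, struct page *page)
	{
		__hugetlb_cgroup_uncharge_page(h, page, false);
	}

	void hugetlb_cgroup_uncharge_page_rsvd(struct hstate *h,
					       struct page *page)
	{
		__hugetlb_cgroup_uncharge_page(h, page, true);
	}

With this shape, __free_huge_page() reduces to the two self-documenting
calls shown in the review above, and the easy-to-miss bool argument
disappears from call sites entirely.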
On 2/6/20 2:31 PM, Mike Kravetz wrote:
> On 2/3/20 3:22 PM, Mina Almasry wrote:
>> Support MAP_NORESERVE accounting as part of the new counter.
>>
>> For each hugepage allocation, at allocation time we check if there is
>> a reservation for this allocation or not. If there is a reservation for
>> this allocation, then this allocation was charged at reservation time,
>> and we don't re-account it. If there is no reservation for this
>> allocation, we charge the appropriate hugetlb_cgroup.
>>
>> The hugetlb_cgroup to uncharge for this allocation is stored in
>> page[3].private. We use new APIs added in an earlier patch to set this
>> pointer.
>
> Ah! That reminded me to look at the migration code. Turns out that none
> of the existing cgroup information (page[2]) is being migrated today. That
> is a bug. :( I'll confirm and fix in a patch separate from this series.
> We will need to make sure that new information added by this series in
> page[3] is also migrated. That would be in an earlier patch where the use
> of the field is introduced.

My apologies! cgroup information is migrated, and you took care of it for
the new reservation information in patch 2. Please disregard that statement.
On Thu, Feb 6, 2020 at 2:31 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 2/3/20 3:22 PM, Mina Almasry wrote:
> > Support MAP_NORESERVE accounting as part of the new counter.
> >
> > For each hugepage allocation, at allocation time we check if there is
> > a reservation for this allocation or not. If there is a reservation for
> > this allocation, then this allocation was charged at reservation time,
> > and we don't re-account it. If there is no reservation for this
> > allocation, we charge the appropriate hugetlb_cgroup.
> >
> > The hugetlb_cgroup to uncharge for this allocation is stored in
> > page[3].private. We use new APIs added in an earlier patch to set this
> > pointer.
>
> Ah! That reminded me to look at the migration code. Turns out that none
> of the existing cgroup information (page[2]) is being migrated today. That
> is a bug. :( I'll confirm and fix in a patch separate from this series.
> We will need to make sure that new information added by this series in
> page[3] is also migrated. That would be in an earlier patch where the use
> of the field is introduced.
>
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> >
> > ---
> >
> > Changes in v10:
> > - Refactored deferred_reserve check.
> >
> > ---
> >  mm/hugetlb.c | 28 +++++++++++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 33818ccaf7e89..ec0b55ea1506e 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1339,6 +1339,9 @@ static void __free_huge_page(struct page *page)
> >  	clear_page_huge_active(page);
> >  	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
> >  				     page, false);
> > +	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
> > +				     page, true);
> > +
>
> When looking at the code without change markings, the two above lines
> look so similar my first thought is there must be a mistake.
>
> A suggestion for better code readability:
> - hugetlb_cgroup_uncharge_page could just take "struct hstate *h" and
>   get both hstate_index(h) and pages_per_huge_page(h).
> - Perhaps make hugetlb_cgroup_uncharge_page and
>   hugetlb_cgroup_uncharge_page_rsvd be wrappers around a common routine.
>   Then the above would look like:
>
> 	hugetlb_cgroup_uncharge_page(h, page);
> 	hugetlb_cgroup_uncharge_page_rsvd(h, page);
>

I did modify the interfaces to this, as it's much better for
readability indeed. Unfortunately the patch that adds the interfaces
probably needs a re-review now, as it's changed quite a bit; I did not
carry your or David's Reviewed-by.

> >  	if (restore_reserve)
> >  		h->resv_huge_pages++;
> >
> > @@ -2172,6 +2175,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> >  	long gbl_chg;
> >  	int ret, idx;
> >  	struct hugetlb_cgroup *h_cg;
> > +	bool deferred_reserve;
> >
> >  	idx = hstate_index(h);
> >  	/*
> > @@ -2209,10 +2213,20 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> >  		gbl_chg = 1;
> >  	}
> >
> > +	/* If this allocation is not consuming a reservation, charge it now.
> > +	 */
> > +	deferred_reserve = map_chg || avoid_reserve || !vma_resv_map(vma);
> > +	if (deferred_reserve) {
> > +		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h),
> > +						   &h_cg, true);
> > +		if (ret)
> > +			goto out_subpool_put;
> > +	}
> > +
> >  	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg,
> >  					   false);
>
> Hmmm? I'm starting to like the wrapper idea more as a way to help with
> readability of the bool rsvd argument.
>
> hugetlb_cgroup_charge_cgroup_rsvd()
> hugetlb_cgroup_charge_cgroup()
>
> At least to me it makes it easier to read.
> --
> Mike Kravetz

> >  	if (ret)
> > -		goto out_subpool_put;
> > +		goto out_uncharge_cgroup_reservation;
> >
> >  	spin_lock(&hugetlb_lock);
> >  	/*
> > @@ -2236,6 +2250,14 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> >  	}
> >  	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page,
> >  				     false);
> > +	/* If allocation is not consuming a reservation, also store the
> > +	 * hugetlb_cgroup pointer on the page.
> > +	 */
> > +	if (deferred_reserve) {
> > +		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg,
> > +					     page, true);
> > +	}
> > +
> >  	spin_unlock(&hugetlb_lock);
> >
> >  	set_page_private(page, (unsigned long)spool);
> > @@ -2261,6 +2283,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> >  out_uncharge_cgroup:
> >  	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg,
> >  				       false);
> > +out_uncharge_cgroup_reservation:
> > +	if (deferred_reserve)
> > +		hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h),
> > +					       h_cg, true);
> >  out_subpool_put:
> >  	if (map_chg || avoid_reserve)
> >  		hugepage_subpool_put_pages(spool, 1);
> > --
> > 2.25.0.341.g760bfbb309-goog
On 2/11/20 1:35 PM, Mina Almasry wrote:
> On Thu, Feb 6, 2020 at 2:31 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> On 2/3/20 3:22 PM, Mina Almasry wrote:
>>> +++ b/mm/hugetlb.c
>>> @@ -1339,6 +1339,9 @@ static void __free_huge_page(struct page *page)
>>>  	clear_page_huge_active(page);
>>>  	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
>>>  				     page, false);
>>> +	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
>>> +				     page, true);
>>> +
>>
>> When looking at the code without change markings, the two above lines
>> look so similar my first thought is there must be a mistake.
>>
>> A suggestion for better code readability:
>> - hugetlb_cgroup_uncharge_page could just take "struct hstate *h" and
>>   get both hstate_index(h) and pages_per_huge_page(h).
>> - Perhaps make hugetlb_cgroup_uncharge_page and
>>   hugetlb_cgroup_uncharge_page_rsvd be wrappers around a common routine.
>>   Then the above would look like:
>>
>> 	hugetlb_cgroup_uncharge_page(h, page);
>> 	hugetlb_cgroup_uncharge_page_rsvd(h, page);
>>
>
> I did modify the interfaces to this, as it's much better for
> readability indeed. Unfortunately the patch that adds the interfaces
> probably needs a re-review now, as it's changed quite a bit; I did not
> carry your or David's Reviewed-by.

No worries. Happy to review again.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 33818ccaf7e89..ec0b55ea1506e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1339,6 +1339,9 @@ static void __free_huge_page(struct page *page)
 	clear_page_huge_active(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
 				     page, false);
+	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
+				     page, true);
+
 	if (restore_reserve)
 		h->resv_huge_pages++;

@@ -2172,6 +2175,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	long gbl_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg;
+	bool deferred_reserve;

 	idx = hstate_index(h);
 	/*
@@ -2209,10 +2213,20 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		gbl_chg = 1;
 	}

+	/* If this allocation is not consuming a reservation, charge it now.
+	 */
+	deferred_reserve = map_chg || avoid_reserve || !vma_resv_map(vma);
+	if (deferred_reserve) {
+		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h),
+						   &h_cg, true);
+		if (ret)
+			goto out_subpool_put;
+	}
+
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg,
 					   false);
 	if (ret)
-		goto out_subpool_put;
+		goto out_uncharge_cgroup_reservation;

 	spin_lock(&hugetlb_lock);
 	/*
@@ -2236,6 +2250,14 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page,
 				     false);
+	/* If allocation is not consuming a reservation, also store the
+	 * hugetlb_cgroup pointer on the page.
+	 */
+	if (deferred_reserve) {
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg,
+					     page, true);
+	}
+
 	spin_unlock(&hugetlb_lock);

 	set_page_private(page, (unsigned long)spool);
@@ -2261,6 +2283,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg,
 				       false);
+out_uncharge_cgroup_reservation:
+	if (deferred_reserve)
+		hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h),
+					       h_cg, true);
 out_subpool_put:
 	if (map_chg || avoid_reserve)
 		hugepage_subpool_put_pages(spool, 1);
Support MAP_NORESERVE accounting as part of the new counter.

For each hugepage allocation, at allocation time we check if there is
a reservation for this allocation or not. If there is a reservation for
this allocation, then this allocation was charged at reservation time,
and we don't re-account it. If there is no reservation for this
allocation, we charge the appropriate hugetlb_cgroup.

The hugetlb_cgroup to uncharge for this allocation is stored in
page[3].private. We use new APIs added in an earlier patch to set this
pointer.

Signed-off-by: Mina Almasry <almasrymina@google.com>

---

Changes in v10:
- Refactored deferred_reserve check.

---
 mm/hugetlb.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

--
2.25.0.341.g760bfbb309-goog
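For context, the page[3].private scheme described above could be set
with a helper along these lines. This is a hedged sketch only: the
__set_hugetlb_cgroup name and its exact shape are assumptions for
illustration; the real setter is the API added in the earlier patch of
this series, and it would carry additional guards (e.g. a minimum
compound order) omitted here.

	/* Illustrative sketch, assuming kernel context (mm/hugetlb_cgroup):
	 * page[2].private holds the cgroup charged to the page-fault
	 * counter; page[3].private holds the cgroup charged to the new
	 * reservation counter.
	 */
	static inline void __set_hugetlb_cgroup(struct page *page,
						struct hugetlb_cgroup *h_cg,
						bool rsvd)
	{
		if (rsvd)
			/* uncharged against the reservation counter at free */
			set_page_private(&page[3], (unsigned long)h_cg);
		else
			/* uncharged against the page-fault counter at free */
			set_page_private(&page[2], (unsigned long)h_cg);
	}

At free time, __free_huge_page() then reads both fields back and
uncharges each counter, which is why the patch above adds the second,
rsvd=true uncharge call next to the existing one.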