[v4,5/5] uprobe: collapse THP pmd after removing all uprobes

Message ID 20190613175747.1964753-6-songliubraving@fb.com (mailing list archive)
State New, archived
Series THP aware uprobe

Commit Message

Song Liu June 13, 2019, 5:57 p.m. UTC
After all uprobes are removed from the huge page (with PTE pgtable), it
is possible to collapse the pmd and benefit from THP again. This patch
does the collapse.

An issue in an earlier version was discovered by the kbuild test robot.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/huge_mm.h |  7 +++++
 kernel/events/uprobes.c |  5 ++-
 mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)

Comments

Kirill A. Shutemov June 21, 2019, 12:48 p.m. UTC | #1
On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
> After all uprobes are removed from the huge page (with PTE pgtable), it
> is possible to collapse the pmd and benefit from THP again. This patch
> does the collapse.
> 
> An issue in an earlier version was discovered by the kbuild test robot.
> 
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  include/linux/huge_mm.h |  7 +++++
>  kernel/events/uprobes.c |  5 ++-
>  mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++

I still think it's duplication of khugepaged functionality. We need to fix
khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
that collapse can be requested for a particular range when all the locks
are already taken (as they are in the uprobe case).
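(For illustration, a rough sketch of the kind of entry point such a refactoring could expose. The function name, signature and error convention below are made up for this sketch and are not existing kernel API; mm_find_pmd(), from mm/internal.h, is the same helper the patch below uses.)

#include <linux/mm.h>
#include <linux/huge_mm.h>
#include "internal.h"	/* mm_find_pmd() */

/*
 * Hypothetical helper that khugepaged could export: collapse the pmd
 * covering @haddr for a caller that already holds mmap_sem and the
 * lock on the compound page (as the uprobe unregister path does).
 */
int collapse_pte_mapped_range(struct vm_area_struct *vma, unsigned long haddr)
{
	pmd_t *pmd = mm_find_pmd(vma->vm_mm, haddr);

	if (!pmd)
		return -ENOENT;

	/*
	 * The body would do the same work as try_collapse_huge_pmd() in
	 * the patch below: check that every populated PTE maps the
	 * expected subpage, drop the per-subpage rmap, then
	 * pmdp_collapse_flush() the range and free the PTE page table.
	 * Keeping it in khugepaged.c lets the async scan share the code.
	 */
	return 0;
}
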
Song Liu June 21, 2019, 1:17 p.m. UTC | #2
> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
>> After all uprobes are removed from the huge page (with PTE pgtable), it
>> is possible to collapse the pmd and benefit from THP again. This patch
>> does the collapse.
>> 
>> An issue in an earlier version was discovered by the kbuild test robot.
>> 
>> Reported-by: kbuild test robot <lkp@intel.com>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> include/linux/huge_mm.h |  7 +++++
>> kernel/events/uprobes.c |  5 ++-
>> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
> 
> I still think it's duplication of khugepaged functionality. We need to fix
> khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
> that collapse can be requested for a particular range when all the locks
> are already taken (as they are in the uprobe case).
> 

I see the point now. I misunderstood it for a while. 

If we add this to khugepaged, it will have some conflicts with my other 
patchset. How about we move the functionality to khugepaged after these
two sets get in? 

Thanks,
Song
Kirill A. Shutemov June 21, 2019, 1:36 p.m. UTC | #3
On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
> 
> 
> > On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > 
> > On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
> >> After all uprobes are removed from the huge page (with PTE pgtable), it
> >> is possible to collapse the pmd and benefit from THP again. This patch
> >> does the collapse.
> >> 
> >> An issue in an earlier version was discovered by the kbuild test robot.
> >> 
> >> Reported-by: kbuild test robot <lkp@intel.com>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >> include/linux/huge_mm.h |  7 +++++
> >> kernel/events/uprobes.c |  5 ++-
> >> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
> > 
> > I still think it's duplication of khugepaged functionality. We need to fix
> > khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
> > that collapse can be requested for a particular range when all the locks
> > are already taken (as they are in the uprobe case).
> > 
> 
> I see the point now. I misunderstood it for a while. 
> 
> If we add this to khugepaged, it will have some conflicts with my other 
> patchset. How about we move the functionality to khugepaged after these
> two sets get in? 

Is the last patch of the patchset essential? I think this part can be done
a bit later in a proper way, no?
Song Liu June 21, 2019, 1:45 p.m. UTC | #4
> On Jun 21, 2019, at 6:36 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
>> 
>> 
>>> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>> 
>>> On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
>>>> After all uprobes are removed from the huge page (with PTE pgtable), it
>>>> is possible to collapse the pmd and benefit from THP again. This patch
>>>> does the collapse.
>>>> 
>>>> An issue in an earlier version was discovered by the kbuild test robot.
>>>> 
>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>>> ---
>>>> include/linux/huge_mm.h |  7 +++++
>>>> kernel/events/uprobes.c |  5 ++-
>>>> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
>>> 
>>> I still think it's duplication of khugepaged functionality. We need to fix
>>> khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
>>> that collapse can be requested for a particular range when all the locks
>>> are already taken (as they are in the uprobe case).
>>> 
>> 
>> I see the point now. I misunderstood it for a while. 
>> 
>> If we add this to khugepaged, it will have some conflicts with my other 
>> patchset. How about we move the functionality to khugepaged after these
>> two sets get in? 
> 
> Is the last patch of the patchset essential? I think this part can be done
> a bit later in a proper way, no?

Technically, we need this patch to regroup the pmd-mapped page, and thus get
the performance benefit back after the uprobe is detached.

On the other hand, if the first 4 patches of this set and the other set get
in soonish, I will work on improving this patch right after that.

Thanks,
Song
Song Liu June 21, 2019, 4:30 p.m. UTC | #5
> On Jun 21, 2019, at 6:45 AM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Jun 21, 2019, at 6:36 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> 
>> On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
>>> 
>>> 
>>>> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>>> 
>>>> On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
>>>>> After all uprobes are removed from the huge page (with PTE pgtable), it
>>>>> is possible to collapse the pmd and benefit from THP again. This patch
>>>>> does the collapse.
>>>>> 
>>>>> An issue in an earlier version was discovered by the kbuild test robot.
>>>>> 
>>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>>>> ---
>>>>> include/linux/huge_mm.h |  7 +++++
>>>>> kernel/events/uprobes.c |  5 ++-
>>>>> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
>>>> 
>>>> I still think it's duplication of khugepaged functionality. We need to fix
>>>> khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
>>>> that collapse can be requested for a particular range when all the locks
>>>> are already taken (as they are in the uprobe case).
>>>> 
>>> 
>>> I see the point now. I misunderstood it for a while. 
>>> 
>>> If we add this to khugepaged, it will have some conflicts with my other 
>>> patchset. How about we move the functionality to khugepaged after these
>>> two sets get in? 
>> 
>> Is the last patch of the patchset essential? I think this part can be done
>> a bit later in a proper way, no?
> 
> Technically, we need this patch to regroup the pmd-mapped page, and thus get
> the performance benefit back after the uprobe is detached.
>
> On the other hand, if the first 4 patches of this set and the other set get
> in soonish, I will work on improving this patch right after that.

Actually, it might be pretty easy. We can just call try_collapse_huge_pmd() 
in khugepaged.c (in khugepaged_scan_shmem() or khugepaged_scan_file() after 
my other set). 

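(A rough sketch of the idea, for illustration only. The signature of khugepaged_scan_file() and the surrounding flow are assumed here, since that function only exists after the other patchset; try_collapse_huge_pmd() is the helper added by this patch.)

#include <linux/mm.h>
#include <linux/huge_mm.h>	/* try_collapse_huge_pmd() */

/*
 * Sketch only, signature assumed: when the file-backed scan finds a
 * page that is already compound but still mapped with PTEs
 * (SCAN_PAGE_COMPOUND), hand it to try_collapse_huge_pmd() instead of
 * skipping the range.
 */
static void khugepaged_scan_file(struct mm_struct *mm,
				 struct vm_area_struct *vma,
				 unsigned long addr, struct page *page)
{
	if (PageTransCompound(page)) {
		/* already huge, just PTE-mapped: re-collapse the pmd */
		try_collapse_huge_pmd(vma, page);
		return;
	}

	/* ... otherwise continue with the normal file collapse path ... */
}
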
Let me fold that in and send v5. 

Thanks,
Song
Song Liu June 21, 2019, 6:04 p.m. UTC | #6
> On Jun 21, 2019, at 9:30 AM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Jun 21, 2019, at 6:45 AM, Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Jun 21, 2019, at 6:36 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>> 
>>> On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
>>>> 
>>>> 
>>>>> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>>>> 
>>>>> On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
>>>>>> After all uprobes are removed from the huge page (with PTE pgtable), it
>>>>>> is possible to collapse the pmd and benefit from THP again. This patch
>>>>>> does the collapse.
>>>>>> 
>>>>>> An issue in an earlier version was discovered by the kbuild test robot.
>>>>>> 
>>>>>> Reported-by: kbuild test robot <lkp@intel.com>
>>>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>>>>> ---
>>>>>> include/linux/huge_mm.h |  7 +++++
>>>>>> kernel/events/uprobes.c |  5 ++-
>>>>>> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
>>>>> 
>>>>> I still think it's duplication of khugepaged functionality. We need to fix
>>>>> khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
>>>>> that collapse can be requested for a particular range when all the locks
>>>>> are already taken (as they are in the uprobe case).
>>>>> 
>>>> 
>>>> I see the point now. I misunderstood it for a while. 
>>>> 
>>>> If we add this to khugepaged, it will have some conflicts with my other 
>>>> patchset. How about we move the functionality to khugepaged after these
>>>> two sets get in? 
>>> 
>>> Is the last patch of the patchset essential? I think this part can be done
>>> a bit later in a proper way, no?
>> 
>> Technically, we need this patch to regroup the pmd-mapped page, and thus get
>> the performance benefit back after the uprobe is detached.
>>
>> On the other hand, if the first 4 patches of this set and the other set get
>> in soonish, I will work on improving this patch right after that.
> 
> Actually, it might be pretty easy. We can just call try_collapse_huge_pmd() 
> in khugepaged.c (in khugepaged_scan_shmem() or khugepaged_scan_file() after 
> my other set). 
> 
> Let me fold that in and send v5. 

On second thought, if we want khugepaged to do the collapse, we need a
dedicated bit to tell khugepaged which pmd to collapse. Otherwise, it may
accidentally collapse pmds that were split by other split_huge_pmd() callers.

I could not think of a good solution here. In this case, we really need a
flag for this specific pmd. A flag on the compound page or on the vma
would not work, as those could be shared by multiple pmds.

If the analysis above is correct, we probably need uprobes to call
try_collapse_huge_pmd() directly for now.

Please share your suggestions. 

Thanks!
Song
Kirill A. Shutemov June 24, 2019, 12:34 p.m. UTC | #7
On Fri, Jun 21, 2019 at 06:04:14PM +0000, Song Liu wrote:
> 
> 
> > On Jun 21, 2019, at 9:30 AM, Song Liu <songliubraving@fb.com> wrote:
> > 
> > 
> > 
> >> On Jun 21, 2019, at 6:45 AM, Song Liu <songliubraving@fb.com> wrote:
> >> 
> >> 
> >> 
> >>> On Jun 21, 2019, at 6:36 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>> 
> >>> On Fri, Jun 21, 2019 at 01:17:05PM +0000, Song Liu wrote:
> >>>> 
> >>>> 
> >>>>> On Jun 21, 2019, at 5:48 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>>>> 
> >>>>> On Thu, Jun 13, 2019 at 10:57:47AM -0700, Song Liu wrote:
> >>>>>> After all uprobes are removed from the huge page (with PTE pgtable), it
> >>>>>> is possible to collapse the pmd and benefit from THP again. This patch
> >>>>>> does the collapse.
> >>>>>> 
> >>>>>> An issue in an earlier version was discovered by the kbuild test robot.
> >>>>>> 
> >>>>>> Reported-by: kbuild test robot <lkp@intel.com>
> >>>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
> >>>>>> ---
> >>>>>> include/linux/huge_mm.h |  7 +++++
> >>>>>> kernel/events/uprobes.c |  5 ++-
> >>>>>> mm/huge_memory.c        | 69 +++++++++++++++++++++++++++++++++++++++++
> >>>>> 
> >>>>> I still think it's duplication of khugepaged functionality. We need to fix
> >>>>> khugepaged to handle SCAN_PAGE_COMPOUND and probably refactor the code so
> >>>>> that collapse can be requested for a particular range when all the locks
> >>>>> are already taken (as they are in the uprobe case).
> >>>>> 
> >>>> 
> >>>> I see the point now. I misunderstood it for a while. 
> >>>> 
> >>>> If we add this to khugepaged, it will have some conflicts with my other 
> >>>> patchset. How about we move the functionality to khugepaged after these
> >>>> two sets get in? 
> >>> 
> >>> Is the last patch of the patchset essential? I think this part can be done
> >>> a bit later in a proper way, no?
> >> 
> >> Technically, we need this patch to regroup the pmd-mapped page, and thus get
> >> the performance benefit back after the uprobe is detached.
> >>
> >> On the other hand, if the first 4 patches of this set and the other set get
> >> in soonish, I will work on improving this patch right after that.
> > 
> > Actually, it might be pretty easy. We can just call try_collapse_huge_pmd() 
> > in khugepaged.c (in khugepaged_scan_shmem() or khugepaged_scan_file() after 
> > my other set). 
> > 
> > Let me fold that in and send v5. 
> 
> On second thought, if we want khugepaged to do the collapse, we need a
> dedicated bit to tell khugepaged which pmd to collapse. Otherwise, it may
> accidentally collapse pmds that were split by other split_huge_pmd() callers.

Why is it a problem? Do you know a situation where such a collapse is
possible and would break a split_huge_pmd() user's expectations? If there is
such a user, it is already broken: normal locking should prevent that
situation.

Patch

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..30669e9a9340 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -250,6 +250,9 @@  static inline bool thp_migration_supported(void)
 	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
 }
 
+extern void try_collapse_huge_pmd(struct vm_area_struct *vma,
+				  struct page *page);
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -368,6 +371,10 @@  static inline bool thp_migration_supported(void)
 {
 	return false;
 }
+
+static inline void try_collapse_huge_pmd(struct vm_area_struct *vma,
+					 struct page *page) {}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a20d7b43a056..9bec602bf79e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -474,6 +474,7 @@  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
 	int ret, is_register, ref_ctr_updated = 0;
+	struct page *orig_page = NULL;
 
 	is_register = is_swbp_insn(&opcode);
 	uprobe = container_of(auprobe, struct uprobe, arch);
@@ -512,7 +513,6 @@  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
 	if (!is_register) {
-		struct page *orig_page;
 		pgoff_t index;
 
 		index = vaddr_to_offset(vma, vaddr & PAGE_MASK) >> PAGE_SHIFT;
@@ -540,6 +540,9 @@  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret && is_register && ref_ctr_updated)
 		update_ref_ctr(uprobe, mm, -1);
 
+	if (!ret && orig_page && PageTransCompound(orig_page))
+		try_collapse_huge_pmd(vma, orig_page);
+
 	return ret;
 }
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..cc8464650b72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2886,6 +2886,75 @@  static struct shrinker deferred_split_shrinker = {
 	.flags = SHRINKER_NUMA_AWARE,
 };
 
+/**
+ * try_collapse_huge_pmd - try collapse pmd for a pte mapped huge page
+ * @vma: vma containing the huge page
+ * @page: any sub page of the huge page
+ */
+void try_collapse_huge_pmd(struct vm_area_struct *vma,
+			   struct page *page)
+{
+	struct page *hpage = compound_head(page);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long haddr;
+	unsigned long addr;
+	pmd_t *pmd, _pmd;
+	spinlock_t *ptl;
+	int i, count = 0;
+
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
+
+	haddr = page_address_in_vma(hpage, vma);
+	pmd = mm_find_pmd(mm, haddr);
+	if (!pmd)
+		return;
+
+	lock_page(hpage);
+	ptl = pmd_lock(mm, pmd);
+
+	/* step 1: check all mapped PTEs */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+
+		if (pte_none(*pte))
+			continue;
+		if (hpage + i != vm_normal_page(vma, addr, *pte)) {
+			spin_unlock(ptl);
+			unlock_page(hpage);
+			return;
+		}
+		count++;
+	}
+
+	/* step 2: adjust rmap */
+	for (i = 0, addr = haddr; i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t *pte = pte_offset_map(pmd, addr);
+		struct page *p;
+
+		if (pte_none(*pte))
+			continue;
+		p = vm_normal_page(vma, addr, *pte);
+		page_remove_rmap(p, false);
+	}
+
+	/* step 3: flip page table */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	/* step 4: free pgtable, set refcount, mm_counters, etc. */
+	page_ref_sub(page, count);
+	unlock_page(hpage);
+	mm_dec_nr_ptes(mm);
+	pte_free(mm, pmd_pgtable(_pmd));
+	add_mm_counter(mm, mm_counter_file(page), -count);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int split_huge_pages_set(void *data, u64 val)
 {