Message ID | 20181206084604.17167-1-peterx@redhat.com |
---|---|
State | New, archived |
Series | mm: thp: fix soft dirty for migration when split |
On Thu, Dec 06, 2018 at 04:46:04PM +0800, Peter Xu wrote:
> When splitting a huge migrating PMD, we'll transfer the soft dirty bit
> from the huge page to the small pages. However, we're possibly using
> wrong data, since when fetching the bit we're using pmd_soft_dirty()
> upon a migration entry. Fix it up.

Note that, if my understanding of the problem is correct, without the
patch there is a chance of losing some of the dirty bits in the
migrating PMD pages (on x86_64 we're fetching bit 11, which is part of
the swap offset, instead of bit 2), and that could potentially corrupt
the memory of a userspace program that depends on the dirty bit.

> [...]

Regards,
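The misread described in the note can be reproduced in a small userspace toy model. This is a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, and the bit layout is invented for illustration. Present entries keep soft-dirty in bit 11 (the x86_64 bit the note mentions), a migration entry stores its offset from bit 9 upward so that bit 11 belongs to the offset, and the migration soft-dirty bit is placed at bit 1 purely for the model. toy_split_soft_dirty() mirrors the shape of the patch: it picks a pmd_swp_soft_dirty()-style or pmd_soft_dirty()-style read depending on the entry type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy PMD layout, for illustration only:
 *   present entry:   bit 0 set, soft-dirty in bit 11
 *   migration entry: bit 0 clear, soft-dirty in bit 1, offset from bit 9 up
 * In a migration entry, bit 11 is therefore just part of the offset.
 */
#define TOY_PRESENT          (1ULL << 0)
#define TOY_SOFT_DIRTY       (1ULL << 11)
#define TOY_SWP_SOFT_DIRTY   (1ULL << 1)
#define TOY_SWP_OFFSET_SHIFT 9

typedef uint64_t toy_pmd_t;

/* Meaningful only for present entries (analogue of pmd_soft_dirty()). */
static bool toy_pmd_soft_dirty(toy_pmd_t pmd)
{
	return (pmd & TOY_SOFT_DIRTY) != 0;
}

/* Meaningful only for migration entries (analogue of pmd_swp_soft_dirty()). */
static bool toy_pmd_swp_soft_dirty(toy_pmd_t pmd)
{
	return (pmd & TOY_SWP_SOFT_DIRTY) != 0;
}

/* Build a non-present, migration-style entry for the given offset. */
static toy_pmd_t toy_make_migration_pmd(uint64_t offset, bool soft_dirty)
{
	/* The offset must fit in the bits above TOY_SWP_OFFSET_SHIFT. */
	assert(offset < (1ULL << (64 - TOY_SWP_OFFSET_SHIFT)));
	toy_pmd_t pmd = offset << TOY_SWP_OFFSET_SHIFT;
	if (soft_dirty)
		pmd |= TOY_SWP_SOFT_DIRTY;
	return pmd;
}

/* Same shape as the patch: choose the accessor by entry type. */
static bool toy_split_soft_dirty(toy_pmd_t old_pmd, bool pmd_migration)
{
	if (pmd_migration)
		return toy_pmd_swp_soft_dirty(old_pmd);
	return toy_pmd_soft_dirty(old_pmd);
}
```

In this model a migration entry with soft-dirty set and offset 0 reads as clean through toy_pmd_soft_dirty(), while a clean entry with offset 4 (whose offset bits cover bit 11) reads as soft-dirty; toy_split_soft_dirty() gets both cases right.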
On Fri, Dec 7, 2018 at 6:34 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Dec 06, 2018 at 04:46:04PM +0800, Peter Xu wrote:
> > When splitting a huge migrating PMD, we'll transfer the soft dirty bit
> > from the huge page to the small pages. However, we're possibly using
> > wrong data, since when fetching the bit we're using pmd_soft_dirty()
> > upon a migration entry. Fix it up.
>
> Note that, if my understanding of the problem is correct, without the
> patch there is a chance of losing some of the dirty bits in the
> migrating PMD pages (on x86_64 we're fetching bit 11, which is part of
> the swap offset, instead of bit 2), and that could potentially corrupt
> the memory of a userspace program that depends on the dirty bit.

It seems this code is broken in the pmd_migration case:

	old_pmd = pmdp_invalidate(vma, haddr, pmd);

#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
	pmd_migration = is_pmd_migration_entry(old_pmd);
	if (pmd_migration) {
		swp_entry_t entry;

		entry = pmd_to_swp_entry(old_pmd);
		page = pfn_to_page(swp_offset(entry));
	} else
#endif
		page = pmd_page(old_pmd);
	VM_BUG_ON_PAGE(!page_count(page), page);
	page_ref_add(page, HPAGE_PMD_NR - 1);
	if (pmd_dirty(old_pmd))
		SetPageDirty(page);
	write = pmd_write(old_pmd);
	young = pmd_young(old_pmd);
	soft_dirty = pmd_soft_dirty(old_pmd);

Not just soft_dirty: all the bits (dirty, write, young) have a different
encoding, or are not present at all, for a migration entry.

> [...]
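Konstantin's point extends the soft-dirty problem to every flag decoded in that block. The sketch below illustrates it in a hypothetical userspace toy model; it is not the kernel's fix. All toy_* names and bit positions are assumptions, and bit 1 is deliberately given different meanings in the two formats ("write" when present, "soft-dirty" in a migration entry) so the misread is visible. Read and write migration are modeled as two swap types, loosely after the kernel's read/write migration entries. The type-aware decoder takes write from the entry type and treats dirty and young as absent, as the reply describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy PMD layout, for illustration only.
 * Present entry:   bit 0 present, bit 1 write, bit 5 young,
 *                  bit 6 dirty, bit 11 soft-dirty.
 * Migration entry: bit 0 clear, bit 1 soft-dirty, bits 2-7 swap type
 *                  (read or write migration), bits 9 and up: pfn.
 */
#define TOY_PRESENT          (1ULL << 0)
#define TOY_WRITE            (1ULL << 1)
#define TOY_YOUNG            (1ULL << 5)
#define TOY_DIRTY            (1ULL << 6)
#define TOY_SOFT_DIRTY       (1ULL << 11)
#define TOY_SWP_SOFT_DIRTY   (1ULL << 1)
#define TOY_SWP_TYPE_SHIFT   2
#define TOY_SWP_TYPE_MASK    0x3fULL
#define TOY_SWP_OFFSET_SHIFT 9

enum toy_swp_type { TOY_MIGRATION_READ = 1, TOY_MIGRATION_WRITE = 2 };

typedef uint64_t toy_pmd_t;

struct toy_split_flags {
	bool dirty;       /* whether the head page gets marked dirty */
	bool write;
	bool young;
	bool soft_dirty;
};

static uint64_t toy_swp_type(toy_pmd_t pmd)
{
	return (pmd >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK;
}

static bool toy_is_migration_entry(toy_pmd_t pmd)
{
	uint64_t type = toy_swp_type(pmd);

	return !(pmd & TOY_PRESENT) &&
	       (type == TOY_MIGRATION_READ || type == TOY_MIGRATION_WRITE);
}

/* Pre-patch shape: present-entry accessors applied to any entry. */
static struct toy_split_flags toy_flags_unconditional(toy_pmd_t pmd)
{
	struct toy_split_flags f = {
		.dirty      = (pmd & TOY_DIRTY) != 0,
		.write      = (pmd & TOY_WRITE) != 0,
		.young      = (pmd & TOY_YOUNG) != 0,
		.soft_dirty = (pmd & TOY_SOFT_DIRTY) != 0,
	};
	return f;
}

/* Decode every flag with the accessor that matches the entry type. */
static struct toy_split_flags toy_flags_by_type(toy_pmd_t pmd)
{
	struct toy_split_flags f;

	/* In this model only present or migration entries reach the split. */
	assert((pmd & TOY_PRESENT) || toy_is_migration_entry(pmd));

	if (!toy_is_migration_entry(pmd))
		return toy_flags_unconditional(pmd);

	f.dirty = false;  /* the toy migration format has no dirty bit */
	f.young = false;  /* ...and no young (accessed) bit */
	f.write = toy_swp_type(pmd) == TOY_MIGRATION_WRITE;
	f.soft_dirty = (pmd & TOY_SWP_SOFT_DIRTY) != 0;
	return f;
}
```

For example, a read migration entry with pfn 1 and the soft-dirty bit set decodes through toy_flags_unconditional() as write=true, soft_dirty=false, while toy_flags_by_type() reports write=false, soft_dirty=true.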
On Mon, Dec 10, 2018 at 07:50:52PM +0300, Konstantin Khlebnikov wrote:
> On Fri, Dec 7, 2018 at 6:34 AM Peter Xu <peterx@redhat.com> wrote:
> [...]
> It seems this code is broken in the pmd_migration case:
>
> [...]
>
> Not just soft_dirty: all the bits (dirty, write, young) have a different
> encoding, or are not present at all, for a migration entry.

Hi, Konstantin,

Actually I noticed it, but I thought it didn't hurt, since neither the
write nor the young flag is used at all when applying them to the small
pages when pmd_migration==true. But indeed there's at least one
unexpected side effect that I missed: an extra call to SetPageDirty().

I'll repost soon. Thanks!
On Tue, Dec 11, 2018 at 7:48 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Dec 10, 2018 at 07:50:52PM +0300, Konstantin Khlebnikov wrote:
> > [...]
> > Not just soft_dirty: all the bits (dirty, write, young) have a different
> > encoding, or are not present at all, for a migration entry.
>
> Hi, Konstantin,
>
> Actually I noticed it, but I thought it didn't hurt, since neither the
> write nor the young flag is used at all when applying them to the small
> pages when pmd_migration==true. But indeed there's at least one
> unexpected side effect that I missed: an extra call to SetPageDirty().

"write" is used for making the smaller migration entries:

	swp_entry = make_migration_entry(page + i, write);

>
> I'll repost soon. Thanks!
>
> --
> Peter Xu
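Konstantin's example shows why write matters: the split builds one migration entry per small page from the decoded write flag, so a wrong value is copied into every small entry. The toy sketch below makes that propagation explicit. It is a userspace illustration only: the toy_* names and the bit layout are hypothetical, toy_make_migration_pte() merely imitates the role of make_migration_entry(page + i, write), and carrying soft-dirty into each small entry is part of the model, not a claim about the exact kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy layout for a non-present migration PTE:
 *   bit 1: soft-dirty, bits 2-7: swap type, bits 9 and up: pfn.
 */
#define TOY_SWP_SOFT_DIRTY   (1ULL << 1)
#define TOY_SWP_TYPE_SHIFT   2
#define TOY_SWP_TYPE_MASK    0x3fULL
#define TOY_SWP_OFFSET_SHIFT 9

enum toy_swp_type { TOY_MIGRATION_READ = 1, TOY_MIGRATION_WRITE = 2 };

typedef uint64_t toy_pte_t;

/* Loose imitation of make_migration_entry(page + i, write). */
static toy_pte_t toy_make_migration_pte(uint64_t pfn, bool write, bool soft_dirty)
{
	uint64_t type = write ? TOY_MIGRATION_WRITE : TOY_MIGRATION_READ;
	toy_pte_t pte;

	assert(pfn < (1ULL << (64 - TOY_SWP_OFFSET_SHIFT)));
	pte = (type << TOY_SWP_TYPE_SHIFT) | (pfn << TOY_SWP_OFFSET_SHIFT);
	if (soft_dirty)
		pte |= TOY_SWP_SOFT_DIRTY;
	return pte;
}

static bool toy_pte_is_write_migration(toy_pte_t pte)
{
	return ((pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK) == TOY_MIGRATION_WRITE;
}

/*
 * Split one huge migration entry into nr small ones covering
 * head_pfn .. head_pfn + nr - 1. Every small entry inherits the same
 * write and soft-dirty flags, so a wrongly decoded flag lands in all nr.
 */
static void toy_split_migration(toy_pte_t *ptes, size_t nr, uint64_t head_pfn,
				bool write, bool soft_dirty)
{
	for (size_t i = 0; i < nr; i++)
		ptes[i] = toy_make_migration_pte(head_pfn + i, write, soft_dirty);
}
```

Splitting with write=false yields read migration entries for all nr small pages; a misdecoded write=true would turn every one of them into a write migration entry.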
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f2d19e4fe854..fb0787c3dd3b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			SetPageDirty(page);
 	write = pmd_write(old_pmd);
 	young = pmd_young(old_pmd);
-	soft_dirty = pmd_soft_dirty(old_pmd);
+	if (unlikely(pmd_migration))
+		soft_dirty = pmd_swp_soft_dirty(old_pmd);
+	else
+		soft_dirty = pmd_soft_dirty(old_pmd);
 
 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.
When splitting a huge migrating PMD, we'll transfer the soft dirty bit
from the huge page to the small pages. However, we're possibly using
wrong data, since when fetching the bit we're using pmd_soft_dirty()
upon a migration entry. Fix it up.

CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Michal Hocko <mhocko@suse.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
CC: Souptick Joarder <jrdr.linux@gmail.com>
CC: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---

I noticed this during code reading. Only compile tested. I'm sending
the patch directly for review comments since it's relatively
straightforward and not easy to test. Please have a look, thanks.
---
 mm/huge_memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)