Patchwork [1/6] mm: khugepaged: fix radix tree node leak in shmem collapse error path

Submitter Johannes Weiner
Date Nov. 8, 2016, 4:12 p.m.
Message ID <20161108161245.GA4020@cmpxchg.org>
Permalink /patch/9417701/
State New

Comments

Johannes Weiner - Nov. 8, 2016, 4:12 p.m.
On Tue, Nov 08, 2016 at 10:53:52AM +0100, Jan Kara wrote:
> On Mon 07-11-16 14:07:36, Johannes Weiner wrote:
> > The radix tree counts valid entries in each tree node. Entries stored
> > in the tree cannot be removed by simply storing NULL in the slot or
> > the internal counters will be off and the node never gets freed again.
> > 
> > When collapsing a shmem page fails, restore the holes that were filled
> > with radix_tree_insert() with a proper radix tree deletion.
> > 
> > Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> > Reported-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/khugepaged.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 728d7790dc2d..eac6f0580e26 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1520,7 +1520,8 @@ static void collapse_shmem(struct mm_struct *mm,
> >  				if (!nr_none)
> >  					break;
> >  				/* Put holes back where they were */
> > -				radix_tree_replace_slot(slot, NULL);
> > +				radix_tree_delete(&mapping->page_tree,
> > +						  iter.index);
> 
> Hum, but this is inside radix_tree_for_each_slot() iteration. And
> radix_tree_delete() may end up freeing nodes resulting in invalidating
> current slot pointer and the iteration code will do use-after-free.

Good point, we need to do another tree lookup after the deletion.

But there are other instances in the code, where we drop the lock
temporarily and somebody else could delete the node from under us.

In the main collapse path, I *think* this is prevented by the fact
that when we drop the tree lock we still hold the page lock of the
regular page that's in the tree while we isolate and unmap it, thus
pin the node. Even so, it would seem a little hairy to rely on that.

Kirill?

I'll update this patch and prepend another fix to the series that
addresses the other two lock dropping issues.

Thanks Jan.
Jan Kara - Nov. 9, 2016, 7:41 a.m.
On Tue 08-11-16 11:12:45, Johannes Weiner wrote:
> In the main collapse path, I *think* this is prevented by the fact
> that when we drop the tree lock we still hold the page lock of the
> regular page that's in the tree while we isolate and unmap it, thus
> pin the node. Even so, it would seem a little hairy to rely on that.

Yeah, I think that is mostly right, but I'm not sure whether shrinking of
the radix tree into a direct pointer cannot bite us here as well. Generally
that relies on the internal implementation of the radix tree and its
iterator, so what you did makes sense to me.

								Honza
Kirill A. Shutemov - Nov. 11, 2016, 10:59 a.m.
On Tue, Nov 08, 2016 at 11:12:45AM -0500, Johannes Weiner wrote:
> In the main collapse path, I *think* this is prevented by the fact
> that when we drop the tree lock we still hold the page lock of the
> regular page that's in the tree while we isolate and unmap it, thus
> pin the node. Even so, it would seem a little hairy to rely on that.
> 
> Kirill?

[ sorry for delay ]

Yes, we make sure that the locked page still belongs to the radix tree and
bail out if it does not. A locked page cannot be removed from the radix tree,
so we should be fine.

> I'll update this patch and prepend another fix to the series that
> addresses the other two lock dropping issues.

Feel free to add my Acked-by.
Jan Kara - Nov. 11, 2016, 12:22 p.m.
On Fri 11-11-16 13:59:21, Kirill A. Shutemov wrote:
> Yes, we make sure that the locked page still belongs to the radix tree and
> bail out if it does not. A locked page cannot be removed from the radix tree,
> so we should be fine.

Well, it cannot be removed from the radix tree, but the radix tree code is
still free to collapse / expand the tree nodes as it sees fit (currently the
only real case is when changing a direct page pointer in the tree root to a
node pointer or vice versa, but still...). So code should not really assume
that the node the page is referenced from does not change once tree_lock is
dropped. It leads to subtle bugs...

								Honza
Kirill A. Shutemov - Nov. 11, 2016, 4:37 p.m.
On Fri, Nov 11, 2016 at 01:22:24PM +0100, Jan Kara wrote:
> Well, it cannot be removed from the radix tree, but the radix tree code is
> still free to collapse / expand the tree nodes as it sees fit (currently the
> only real case is when changing a direct page pointer in the tree root to a
> node pointer or vice versa, but still...). So code should not really assume
> that the node the page is referenced from does not change once tree_lock is
> dropped. It leads to subtle bugs...

Hm. Okay.

What is the right way to re-validate that the slot is still valid? Do I need
a full lookup again? Can I pin the node explicitly?
Jan Kara - Nov. 14, 2016, 8:07 a.m.
On Fri 11-11-16 19:37:53, Kirill A. Shutemov wrote:
> What is the right way to re-validate that the slot is still valid? Do I need
> a full lookup again? Can I pin the node explicitly?

Full lookup is the only way to re-validate the slot. There is no way to pin
a radix tree node.

									Honza

Patch

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fed8d5e96978..1e43e77a98da 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1424,6 +1424,7 @@  static void collapse_shmem(struct mm_struct *mm,
 		radix_tree_replace_slot(&mapping->page_tree, slot,
 				new_page + (index % HPAGE_PMD_NR));
 
+		slot = radix_tree_iter_next(&iter);
 		index++;
 		continue;
 out_lru:
@@ -1522,6 +1523,7 @@  static void collapse_shmem(struct mm_struct *mm,
 				/* Put holes back where they were */
 				radix_tree_delete(&mapping->page_tree,
 						  iter.index);
+				slot = radix_tree_iter_next(&iter);
 				nr_none--;
 				continue;
 			}
@@ -1537,6 +1539,7 @@  static void collapse_shmem(struct mm_struct *mm,
 			putback_lru_page(page);
 			unlock_page(page);
 			spin_lock_irq(&mapping->tree_lock);
+			slot = radix_tree_iter_next(&iter);
 		}
 		VM_BUG_ON(nr_none);
 		spin_unlock_irq(&mapping->tree_lock);