Message ID | 20190606184438.31646-1-jgg@ziepe.ca (mailing list archive) |
---|---|
Headers | show |
Series | Various revisions from a locking/code review | expand |
On Thu, Jun 06, 2019 at 03:44:27PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe <jgg@mellanox.com> > > For hmm.git: > > This patch series arised out of discussions with Jerome when looking at the > ODP changes, particularly informed by use after free races we have already > found and fixed in the ODP code (thanks to syzkaller) working with mmu > notifiers, and the discussion with Ralph on how to resolve the lifetime model. > > Overall this brings in a simplified locking scheme and easy to explain > lifetime model: > > If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm > is allocated memory. > > If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc) > then the mmget must be obtained via mmget_not_zero(). > > Locking of mm->hmm is shifted to use the mmap_sem consistently for all > read/write and unlocked accesses are removed. > > The use unlocked reads on 'hmm->dead' are also eliminated in favour of using > standard mmget() locking to prevent the mm from being released. Many of the > debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison - > which is much clearer as to the lifetime intent. > > The trailing patches are just some random cleanups I noticed when reviewing > this code. > > This v2 incorporates alot of the good off list changes & feedback Jerome had, > and all the on-list comments too. However, now that we have the shared git I > have kept the one line change to nouveau_svm.c rather than the compat > funtions. > > I believe we can resolve this merge in the DRM tree now and keep the core > mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong. > > It is on top of hmm.git, and I have a git tree of this series to ease testing > here: > > https://github.com/jgunthorpe/linux/tree/hmm > > There are still some open locking issues, as I think this remains unaddressed: > > https://lore.kernel.org/linux-mm/20190527195829.GB18019@mellanox.com/ > > I'm looking for some more acks, reviews and tests so this can move ahead to > hmm.git. AMD Folks, this is looking pretty good now, can you please give at least a Tested-by for the new driver code using this that I see in linux-next? Thanks, Jason
[+Philip] Hi Jason, I'm out of the office this week. Hi Philip, can you give this a go? Not sure how much you've been following this patch series review. Message or call me on Skype to discuss any questions. Thanks, Felix On 2019-06-11 12:48, Jason Gunthorpe wrote: > On Thu, Jun 06, 2019 at 03:44:27PM -0300, Jason Gunthorpe wrote: >> From: Jason Gunthorpe <jgg@mellanox.com> >> >> For hmm.git: >> >> This patch series arised out of discussions with Jerome when looking at the >> ODP changes, particularly informed by use after free races we have already >> found and fixed in the ODP code (thanks to syzkaller) working with mmu >> notifiers, and the discussion with Ralph on how to resolve the lifetime model. >> >> Overall this brings in a simplified locking scheme and easy to explain >> lifetime model: >> >> If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm >> is allocated memory. >> >> If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc) >> then the mmget must be obtained via mmget_not_zero(). >> >> Locking of mm->hmm is shifted to use the mmap_sem consistently for all >> read/write and unlocked accesses are removed. >> >> The use unlocked reads on 'hmm->dead' are also eliminated in favour of using >> standard mmget() locking to prevent the mm from being released. Many of the >> debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison - >> which is much clearer as to the lifetime intent. >> >> The trailing patches are just some random cleanups I noticed when reviewing >> this code. >> >> This v2 incorporates alot of the good off list changes & feedback Jerome had, >> and all the on-list comments too. However, now that we have the shared git I >> have kept the one line change to nouveau_svm.c rather than the compat >> funtions. >> >> I believe we can resolve this merge in the DRM tree now and keep the core >> mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong. >> >> It is on top of hmm.git, and I have a git tree of this series to ease testing >> here: >> >> https://github.com/jgunthorpe/linux/tree/hmm >> >> There are still some open locking issues, as I think this remains unaddressed: >> >> https://lore.kernel.org/linux-mm/20190527195829.GB18019@mellanox.com/ >> >> I'm looking for some more acks, reviews and tests so this can move ahead to >> hmm.git. > AMD Folks, this is looking pretty good now, can you please give at > least a Tested-by for the new driver code using this that I see in > linux-next? > > Thanks, > Jason
Rebase to https://github.com/jgunthorpe/linux.git hmm branch, need some changes because of interface hmm_range_register change. Then run a quick amdgpu_test. Test is finished, result is ok. But there is below kernel BUG message, seems hmm_free_rcu calls down_write..... [ 1171.919921] BUG: sleeping function called from invalid context at /home/yangp/git/compute_staging/kernel/kernel/locking/rwsem.c:65 [ 1171.919933] in_atomic(): 1, irqs_disabled(): 0, pid: 53, name: kworker/1:1 [ 1171.919938] 2 locks held by kworker/1:1/53: [ 1171.919940] #0: 000000001c7c19d4 ((wq_completion)rcu_gp){+.+.}, at: process_one_work+0x20e/0x630 [ 1171.919951] #1: 00000000923f2cfa ((work_completion)(&sdp->work)){+.+.}, at: process_one_work+0x20e/0x630 [ 1171.919959] CPU: 1 PID: 53 Comm: kworker/1:1 Tainted: G W 5.2.0-rc1-kfd-yangp #196 [ 1171.919961] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1, BIOS 9001 03/07/2016 [ 1171.919965] Workqueue: rcu_gp srcu_invoke_callbacks [ 1171.919968] Call Trace: [ 1171.919974] dump_stack+0x67/0x9b [ 1171.919980] ___might_sleep+0x149/0x230 [ 1171.919985] down_write+0x1c/0x70 [ 1171.919989] hmm_free_rcu+0x24/0x80 [ 1171.919993] srcu_invoke_callbacks+0xc9/0x150 [ 1171.920000] process_one_work+0x28e/0x630 [ 1171.920008] worker_thread+0x39/0x3f0 [ 1171.920014] ? process_one_work+0x630/0x630 [ 1171.920017] kthread+0x11c/0x140 [ 1171.920021] ? kthread_park+0x90/0x90 [ 1171.920026] ret_from_fork+0x24/0x30 Philip On 2019-06-12 1:54 p.m., Kuehling, Felix wrote: > [+Philip] > > Hi Jason, > > I'm out of the office this week. > > Hi Philip, can you give this a go? Not sure how much you've been > following this patch series review. Message or call me on Skype to > discuss any questions. > > Thanks, > Felix > > On 2019-06-11 12:48, Jason Gunthorpe wrote: >> On Thu, Jun 06, 2019 at 03:44:27PM -0300, Jason Gunthorpe wrote: >>> From: Jason Gunthorpe <jgg@mellanox.com> >>> >>> For hmm.git: >>> >>> This patch series arised out of discussions with Jerome when looking at the >>> ODP changes, particularly informed by use after free races we have already >>> found and fixed in the ODP code (thanks to syzkaller) working with mmu >>> notifiers, and the discussion with Ralph on how to resolve the lifetime model. >>> >>> Overall this brings in a simplified locking scheme and easy to explain >>> lifetime model: >>> >>> If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm >>> is allocated memory. >>> >>> If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc) >>> then the mmget must be obtained via mmget_not_zero(). >>> >>> Locking of mm->hmm is shifted to use the mmap_sem consistently for all >>> read/write and unlocked accesses are removed. >>> >>> The use unlocked reads on 'hmm->dead' are also eliminated in favour of using >>> standard mmget() locking to prevent the mm from being released. Many of the >>> debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison - >>> which is much clearer as to the lifetime intent. >>> >>> The trailing patches are just some random cleanups I noticed when reviewing >>> this code. >>> >>> This v2 incorporates alot of the good off list changes & feedback Jerome had, >>> and all the on-list comments too. However, now that we have the shared git I >>> have kept the one line change to nouveau_svm.c rather than the compat >>> funtions. >>> >>> I believe we can resolve this merge in the DRM tree now and keep the core >>> mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong. >>> >>> It is on top of hmm.git, and I have a git tree of this series to ease testing >>> here: >>> >>> https://github.com/jgunthorpe/linux/tree/hmm >>> >>> There are still some open locking issues, as I think this remains unaddressed: >>> >>> https://lore.kernel.org/linux-mm/20190527195829.GB18019@mellanox.com/ >>> >>> I'm looking for some more acks, reviews and tests so this can move ahead to >>> hmm.git. >> AMD Folks, this is looking pretty good now, can you please give at >> least a Tested-by for the new driver code using this that I see in >> linux-next? >> >> Thanks, >> Jason
On Wed, Jun 12, 2019 at 09:49:12PM +0000, Yang, Philip wrote: > Rebase to https://github.com/jgunthorpe/linux.git hmm branch, need some > changes because of interface hmm_range_register change. Then run a quick > amdgpu_test. Test is finished, result is ok. Great! Thanks I'll add your Tested-by to the series > But there is below kernel BUG message, seems hmm_free_rcu calls > down_write..... > > [ 1171.919921] BUG: sleeping function called from invalid context at > /home/yangp/git/compute_staging/kernel/kernel/locking/rwsem.c:65 > [ 1171.919933] in_atomic(): 1, irqs_disabled(): 0, pid: 53, name: > kworker/1:1 > [ 1171.919938] 2 locks held by kworker/1:1/53: > [ 1171.919940] #0: 000000001c7c19d4 ((wq_completion)rcu_gp){+.+.}, at: > process_one_work+0x20e/0x630 > [ 1171.919951] #1: 00000000923f2cfa > ((work_completion)(&sdp->work)){+.+.}, at: process_one_work+0x20e/0x630 > [ 1171.919959] CPU: 1 PID: 53 Comm: kworker/1:1 Tainted: G W > 5.2.0-rc1-kfd-yangp #196 > [ 1171.919961] Hardware name: ASUS All Series/Z97-PRO(Wi-Fi ac)/USB 3.1, > BIOS 9001 03/07/2016 > [ 1171.919965] Workqueue: rcu_gp srcu_invoke_callbacks > [ 1171.919968] Call Trace: > [ 1171.919974] dump_stack+0x67/0x9b > [ 1171.919980] ___might_sleep+0x149/0x230 > [ 1171.919985] down_write+0x1c/0x70 > [ 1171.919989] hmm_free_rcu+0x24/0x80 > [ 1171.919993] srcu_invoke_callbacks+0xc9/0x150 > [ 1171.920000] process_one_work+0x28e/0x630 > [ 1171.920008] worker_thread+0x39/0x3f0 > [ 1171.920014] ? process_one_work+0x630/0x630 > [ 1171.920017] kthread+0x11c/0x140 > [ 1171.920021] ? kthread_park+0x90/0x90 > [ 1171.920026] ret_from_fork+0x24/0x30 Thank you Phillip, it seems the prior tests were not done with lockdep.. Sigh, I will keep this with the gross pagetable_lock then. I updated the patches on the git with this modification. I think we have covered all the bases so I will send another V of the series to the list and if no more comments then it will move ahead to hmm.git. Thanks to all. diff --git a/mm/hmm.c b/mm/hmm.c index 136c812faa2790..4c64d4c32f4825 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -49,16 +49,15 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm) lockdep_assert_held_exclusive(&mm->mmap_sem); + /* Abuse the page_table_lock to also protect mm->hmm. */ + spin_lock(&mm->page_table_lock); if (mm->hmm) { - if (kref_get_unless_zero(&mm->hmm->kref)) + if (kref_get_unless_zero(&mm->hmm->kref)) { + spin_unlock(&mm->page_table_lock); return mm->hmm; - /* - * The hmm is being freed by some other CPU and is pending a - * RCU grace period, but this CPU can NULL now it since we - * have the mmap_sem. - */ - mm->hmm = NULL; + } } + spin_unlock(&mm->page_table_lock); hmm = kmalloc(sizeof(*hmm), GFP_KERNEL); if (!hmm) @@ -81,7 +80,14 @@ static struct hmm *hmm_get_or_create(struct mm_struct *mm) } mmgrab(hmm->mm); + + /* + * We hold the exclusive mmap_sem here so we know that mm->hmm is + * still NULL or 0 kref, and is safe to update. + */ + spin_lock(&mm->page_table_lock); mm->hmm = hmm; + spin_unlock(&mm->page_table_lock); return hmm; } @@ -89,10 +95,14 @@ static void hmm_free_rcu(struct rcu_head *rcu) { struct hmm *hmm = container_of(rcu, struct hmm, rcu); - down_write(&hmm->mm->mmap_sem); + /* + * The mm->hmm pointer is kept valid while notifier ops can be running + * so they don't have to deal with a NULL mm->hmm value + */ + spin_lock(&hmm->mm->page_table_lock); if (hmm->mm->hmm == hmm) hmm->mm->hmm = NULL; - up_write(&hmm->mm->mmap_sem); + spin_unlock(&hmm->mm->page_table_lock); mmdrop(hmm->mm); kfree(hmm);
From: Jason Gunthorpe <jgg@mellanox.com> For hmm.git: This patch series arised out of discussions with Jerome when looking at the ODP changes, particularly informed by use after free races we have already found and fixed in the ODP code (thanks to syzkaller) working with mmu notifiers, and the discussion with Ralph on how to resolve the lifetime model. Overall this brings in a simplified locking scheme and easy to explain lifetime model: If a hmm_range is valid, then the hmm is valid, if a hmm is valid then the mm is allocated memory. If the mm needs to still be alive (ie to lock the mmap_sem, find a vma, etc) then the mmget must be obtained via mmget_not_zero(). Locking of mm->hmm is shifted to use the mmap_sem consistently for all read/write and unlocked accesses are removed. The use unlocked reads on 'hmm->dead' are also eliminated in favour of using standard mmget() locking to prevent the mm from being released. Many of the debugging checks of !range->hmm and !hmm->mm are dropped in favour of poison - which is much clearer as to the lifetime intent. The trailing patches are just some random cleanups I noticed when reviewing this code. This v2 incorporates alot of the good off list changes & feedback Jerome had, and all the on-list comments too. However, now that we have the shared git I have kept the one line change to nouveau_svm.c rather than the compat funtions. I believe we can resolve this merge in the DRM tree now and keep the core mm/hmm.c clean. DRM maintainers, please correct me if I'm wrong. It is on top of hmm.git, and I have a git tree of this series to ease testing here: https://github.com/jgunthorpe/linux/tree/hmm There are still some open locking issues, as I think this remains unaddressed: https://lore.kernel.org/linux-mm/20190527195829.GB18019@mellanox.com/ I'm looking for some more acks, reviews and tests so this can move ahead to hmm.git. Detailed notes on the v2 changes are in each patch. The big changes: - mmget is held so long as the range is registered - the last patch 'Remove confusing comment and logic from hmm_release' is new Thanks everyone, Jason Jason Gunthorpe (11): mm/hmm: fix use after free with struct hmm in the mmu notifiers mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register mm/hmm: Hold a mmgrab from hmm to mm mm/hmm: Simplify hmm_get_or_create and make it reliable mm/hmm: Remove duplicate condition test before wait_event_timeout mm/hmm: Hold on to the mmget for the lifetime of the range mm/hmm: Use lockdep instead of comments mm/hmm: Remove racy protection against double-unregistration mm/hmm: Poison hmm_range during unregister mm/hmm: Do not use list*_rcu() for hmm->ranges mm/hmm: Remove confusing comment and logic from hmm_release drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- include/linux/hmm.h | 49 +------ kernel/fork.c | 1 - mm/hmm.c | 204 ++++++++++---------------- 4 files changed, 87 insertions(+), 169 deletions(-)