Message ID | ecea6835baca75b945bd8ecfaa636ff01dabcc1d.1639157090.git.robin.murphy@arm.com
---|---
State | New
Series | iommu: refactor flush queues into iommu-dma

On 10/12/2021 17:54, Robin Murphy wrote:
> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>
> It turns out to be possible for hotplugging out a device to reach the
> stage of tearing down the device's group and default domain before the
> domain's flush queue has drained naturally. At this point, it is then
> possible for the timeout to expire just*before* the del_timer() call

super nit: "just*before* the" - needs a whitespace before "before" :)

> from free_iova_flush_queue(), such that we then proceed to free the FQ
> resources while fq_flush_timeout() is still accessing them on another
> CPU. Crashes due to this have been observed in the wild while removing
> NVMe devices.
>
> Close the race window by using del_timer_sync() to safely wait for any
> active timeout handler to finish before we start to free things. We
> already avoid any locking in free_iova_flush_queue() since the FQ is
> supposed to be inactive anyway, so the potential deadlock scenario does
> not apply.
>
> Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer")
> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
> [ rm: rewrite commit message ]
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

FWIW,

Reviewed-by: John Garry <john.garry@huawei.com>

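To make the race window concrete, here is a rough interleaving of the two CPUs, reconstructed from the commit message above (the exact call paths inside iova.c are simplified):

```
CPU 0 (hotplug teardown)              CPU 1 (timer softirq)
------------------------              ---------------------
                                      fq_timer expires,
                                      fq_flush_timeout() starts
free_iova_flush_queue()
  timer_pending() -> false
  (del_timer() skipped / no-op)
  fq_destroy_all_entries()
  frees the FQ resources              ...still accessing the FQ
                                      => use-after-free
```
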
On 2021-12-10 18:04, John Garry via iommu wrote:
> On 10/12/2021 17:54, Robin Murphy wrote:
>> From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>>
>> It turns out to be possible for hotplugging out a device to reach the
>> stage of tearing down the device's group and default domain before the
>> domain's flush queue has drained naturally. At this point, it is then
>> possible for the timeout to expire just*before* the del_timer() call
>
> super nit: "just*before* the" - needs a whitespace before "before" :)

Weird... the original patch file here and the copy received by lore via
linux-iommu look fine, gremlins in your MUA or delivery path perhaps?

>> from free_iova_flush_queue(), such that we then proceed to free the FQ
>> resources while fq_flush_timeout() is still accessing them on another
>> CPU. Crashes due to this have been observed in the wild while removing
>> NVMe devices.
>>
>> Close the race window by using del_timer_sync() to safely wait for any
>> active timeout handler to finish before we start to free things. We
>> already avoid any locking in free_iova_flush_queue() since the FQ is
>> supposed to be inactive anyway, so the potential deadlock scenario does
>> not apply.
>>
>> Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer")
>> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
>> [ rm: rewrite commit message ]
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>
> FWIW,
>
> Reviewed-by: John Garry <john.garry@huawei.com>

Thanks John!

Robin.

On 10/12/2021 18:13, Robin Murphy wrote:
>>> possible for the timeout to expire just*before* the del_timer() call
>>
>> super nit: "just*before* the" - needs a whitespace before "before" :)
>
> Weird... the original patch file here and the copy received by lore via
> linux-iommu look fine, gremlins in your MUA or delivery path perhaps?

Right, apologies for that. I did receive it ok, but my replying seemed
to mangle it. Not sure why..

Thanks,
John

```diff
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 9e8bc802ac05..920fcc27c9a1 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -83,8 +83,7 @@ static void free_iova_flush_queue(struct iova_domain *iovad)
 	if (!has_iova_flush_queue(iovad))
 		return;
 
-	if (timer_pending(&iovad->fq_timer))
-		del_timer(&iovad->fq_timer);
+	del_timer_sync(&iovad->fq_timer);
 
 	fq_destroy_all_entries(iovad);
```
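The key distinction is that del_timer() only deactivates a timer that has not yet fired, while del_timer_sync() also waits for a handler already running on another CPU to return. Below is a minimal sketch of the general pattern the patch adopts — it is not code from the iova driver, and fq_ctx, fq_create, fq_timeout_handler, process_entries and fq_teardown are illustrative names:

```c
#include <linux/jiffies.h>
#include <linux/slab.h>
#include <linux/timer.h>

/* Illustrative context: a timer plus the data its handler touches. */
struct fq_ctx {
	struct timer_list timer;
	void *resources;	/* freed at teardown, used by the handler */
};

/* Stand-in for the real work (e.g. draining per-CPU queues). */
static void process_entries(void *resources) { }

static void fq_timeout_handler(struct timer_list *t)
{
	struct fq_ctx *ctx = from_timer(ctx, t, timer);

	/*
	 * Runs in softirq context, possibly on a different CPU from the
	 * teardown path: any dereference of ctx here races with the
	 * kfree() below unless teardown waits for us to finish.
	 */
	process_entries(ctx->resources);
}

static struct fq_ctx *fq_create(void)
{
	struct fq_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

	if (!ctx)
		return NULL;
	timer_setup(&ctx->timer, fq_timeout_handler, 0);
	mod_timer(&ctx->timer, jiffies + msecs_to_jiffies(10));
	return ctx;
}

static void fq_teardown(struct fq_ctx *ctx)
{
	/*
	 * del_timer() would only remove a not-yet-fired timer; a handler
	 * already executing could still be touching ctx. del_timer_sync()
	 * also waits for such a handler to return before we free.
	 */
	del_timer_sync(&ctx->timer);

	kfree(ctx->resources);
	kfree(ctx);
}
```

The usual caveat, which the commit message addresses, is that del_timer_sync() must not be called while holding a lock the handler itself takes, or the wait can deadlock; free_iova_flush_queue() holds no such lock, so the conversion is safe.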