Message ID | 20170518233353.14370-1-qing.huang@oracle.com (mailing list archive) |
---|---|
State | Accepted |
On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
> This change will optimize kernel memory deregistration operations.
> __ib_umem_release() used to call set_page_dirty_lock() against every
> writable page in its memory region. Its purpose is to keep data
> synced between CPU and DMA device when swapping happens after mem
> deregistration ops. Now we choose not to set page dirty bit if it's
> already set by kernel prior to calling __ib_umem_release(). This
> reduces memory deregistration time by half or even more when we ran
> application simulation test program.

As far as I can tell this code doesn't even need set_page_dirty_lock
and could just use set_page_dirty

>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> ---
>  drivers/infiniband/core/umem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 3dbf811..21e60b1 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
>
>  		page = sg_page(sg);
> -		if (umem->writable && dirty)
> +		if (!PageDirty(page) && umem->writable && dirty)
>  			set_page_dirty_lock(page);
>  		put_page(page);
>  	}
> --
> 2.9.3
>
---end quoted text---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
> On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
>> This change will optimize kernel memory deregistration operations.
>> __ib_umem_release() used to call set_page_dirty_lock() against every
>> writable page in its memory region. Its purpose is to keep data
>> synced between CPU and DMA device when swapping happens after mem
>> deregistration ops. Now we choose not to set page dirty bit if it's
>> already set by kernel prior to calling __ib_umem_release(). This
>> reduces memory deregistration time by half or even more when we ran
>> application simulation test program.
> As far as I can tell this code doesn't even need set_page_dirty_lock
> and could just use set_page_dirty

It seems that set_page_dirty_lock has been used here for more than 10 years.
Don't know the original purpose. Maybe it was used to prevent races between
setting dirty bits and swapping out pages?

Perhaps we can call set_page_dirty before calling ib_dma_unmap_sg?

>> Signed-off-by: Qing Huang<qing.huang@oracle.com>
>> ---
>>  drivers/infiniband/core/umem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 3dbf811..21e60b1 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>  	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
>>
>>  		page = sg_page(sg);
>> -		if (umem->writable && dirty)
>> +		if (!PageDirty(page) && umem->writable && dirty)
>>  			set_page_dirty_lock(page);
>>  		put_page(page);
>>  	}
>> --
>> 2.9.3
>>
On Mon, May 22, 2017 at 04:43:57PM -0700, Qing Huang wrote:
>
> On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
> > On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
> > > This change will optimize kernel memory deregistration operations.
> > > __ib_umem_release() used to call set_page_dirty_lock() against every
> > > writable page in its memory region. Its purpose is to keep data
> > > synced between CPU and DMA device when swapping happens after mem
> > > deregistration ops. Now we choose not to set page dirty bit if it's
> > > already set by kernel prior to calling __ib_umem_release(). This
> > > reduces memory deregistration time by half or even more when we ran
> > > application simulation test program.
> > As far as I can tell this code doesn't even need set_page_dirty_lock
> > and could just use set_page_dirty
>
> It seems that set_page_dirty_lock has been used here for more than 10 years.
> Don't know the original purpose. Maybe it was used to prevent races between
> setting dirty bits and swapping out pages?

I suspect copy & paste. Or maybe I don't actually understand the
explanation of set_page_dirty vs set_page_dirty_lock enough. But
I'd rather not hack around the problem.
On 5/23/2017 12:42 AM, Christoph Hellwig wrote:
> On Mon, May 22, 2017 at 04:43:57PM -0700, Qing Huang wrote:
>> On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
>>> On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
>>>> This change will optimize kernel memory deregistration operations.
>>>> __ib_umem_release() used to call set_page_dirty_lock() against every
>>>> writable page in its memory region. Its purpose is to keep data
>>>> synced between CPU and DMA device when swapping happens after mem
>>>> deregistration ops. Now we choose not to set page dirty bit if it's
>>>> already set by kernel prior to calling __ib_umem_release(). This
>>>> reduces memory deregistration time by half or even more when we ran
>>>> application simulation test program.
>>> As far as I can tell this code doesn't even need set_page_dirty_lock
>>> and could just use set_page_dirty
>> It seems that set_page_dirty_lock has been used here for more than 10 years.
>> Don't know the original purpose. Maybe it was used to prevent races between
>> setting dirty bits and swapping out pages?
> I suspect copy & paste. Or maybe I don't actually understand the
> explanation of set_page_dirty vs set_page_dirty_lock enough. But
> I'd rather not hack around the problem.

I think there are two parts here. First part is that we don't need to set
the dirty bit if it's already set. Second part is whether we use
set_page_dirty or set_page_dirty_lock to set dirty bits.
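The two parts described above can be kept apart in a small userspace model. This is only a sketch, not kernel code: `struct model_page`, `lock_acquisitions`, and every `model_*` function are hypothetical names invented for illustration. It assumes, as far as I understand the kernel source, that set_page_dirty_lock() amounts to taking the page lock around a call to set_page_dirty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: a lock flag and a dirty flag only. */
struct model_page {
	bool locked;
	bool dirty;
};

/* Counts lock round trips so the cost of the locked variant is visible. */
static size_t lock_acquisitions;

/* Models set_page_dirty(): returns true if the page was newly dirtied. */
static bool model_set_page_dirty(struct model_page *p)
{
	bool was_dirty = p->dirty;

	p->dirty = true;
	return !was_dirty;
}

/* Models set_page_dirty_lock() under the assumption stated in the lead-in. */
static bool model_set_page_dirty_lock(struct model_page *p)
{
	bool ret;

	assert(!p->locked);	/* single-threaded model: lock must be free */
	p->locked = true;
	lock_acquisitions++;
	ret = model_set_page_dirty(p);
	p->locked = false;
	return ret;
}

/*
 * Part one: skip pages that are already dirty.
 * Part two: choose which setter to use for the remaining pages.
 */
static bool model_mark_dirty(struct model_page *p, bool use_lock)
{
	if (p->dirty)
		return false;
	return use_lock ? model_set_page_dirty_lock(p)
			: model_set_page_dirty(p);
}
```

In this model the first part (the `p->dirty` check) decides how often a setter runs at all, while the second part (`use_lock`) only decides how expensive each remaining call is. The patch under discussion changes the first part and leaves the second untouched.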
On Thu, 2017-05-18 at 16:33 -0700, Qing Huang wrote:
> This change will optimize kernel memory deregistration operations.
> __ib_umem_release() used to call set_page_dirty_lock() against every
> writable page in its memory region. Its purpose is to keep data
> synced between CPU and DMA device when swapping happens after mem
> deregistration ops. Now we choose not to set page dirty bit if it's
> already set by kernel prior to calling __ib_umem_release(). This
> reduces memory deregistration time by half or even more when we ran
> application simulation test program.
>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>

Thanks, applied.
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 3dbf811..21e60b1 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {

 		page = sg_page(sg);
-		if (umem->writable && dirty)
+		if (!PageDirty(page) && umem->writable && dirty)
 			set_page_dirty_lock(page);
 		put_page(page);
 	}
This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
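The effect described in the commit message can be illustrated with a userspace model of the patched loop. This is a minimal sketch rather than the kernel implementation: `struct model_page`, `slow_path_calls`, `model_set_page_dirty_lock()`, and `model_umem_release()` are hypothetical names, and the counter merely stands in for the cost of calling set_page_dirty_lock() in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the dirty bit is modelled. */
struct model_page {
	bool dirty;
};

/* Number of times the expensive path ran (stands in for set_page_dirty_lock()). */
static size_t slow_path_calls;

static void model_set_page_dirty_lock(struct model_page *page)
{
	assert(page != NULL);
	slow_path_calls++;	/* the real function would take the page lock here */
	page->dirty = true;
}

/*
 * Mirrors the condition introduced by the patch: only pages that are not
 * already dirty take the slow path. Returns how many slow-path calls this
 * invocation made.
 */
static size_t model_umem_release(struct model_page *pages, size_t npages,
				 bool writable, bool dirty)
{
	size_t before = slow_path_calls;

	for (size_t i = 0; i < npages; i++) {
		if (!pages[i].dirty && writable && dirty)
			model_set_page_dirty_lock(&pages[i]);
	}
	return slow_path_calls - before;
}
```

With four pages of which two are already dirty, the model takes the slow path twice instead of four times, and a second pass over the same pages takes it zero times. How much time that saves on real hardware depends on the workload; the model says nothing about the "half or even more" figure reported above.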