
ib/core: not to set page dirty bit if it's already set.

Message ID 20170518233353.14370-1-qing.huang@oracle.com (mailing list archive)
State Accepted

Commit Message

Qing Huang May 18, 2017, 11:33 p.m. UTC
This change optimizes kernel memory deregistration. __ib_umem_release()
used to call set_page_dirty_lock() on every writable page in its memory
region; the call keeps data synced between the CPU and the DMA device
when swapping happens after a memory deregistration. Now we skip setting
the page dirty bit if the kernel has already set it before
__ib_umem_release() is called. This reduces memory deregistration time
by half or more in our application simulation test program.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/infiniband/core/umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig May 19, 2017, 1:05 p.m. UTC | #1
On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
> This change optimizes kernel memory deregistration. __ib_umem_release()
> used to call set_page_dirty_lock() on every writable page in its memory
> region; the call keeps data synced between the CPU and the DMA device
> when swapping happens after a memory deregistration. Now we skip setting
> the page dirty bit if the kernel has already set it before
> __ib_umem_release() is called. This reduces memory deregistration time
> by half or more in our application simulation test program.

As far as I can tell this code doesn't even need set_page_dirty_lock
and could just use set_page_dirty.

> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> ---
>  drivers/infiniband/core/umem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 3dbf811..21e60b1 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
>  
>  		page = sg_page(sg);
> -		if (umem->writable && dirty)
> +		if (!PageDirty(page) && umem->writable && dirty)
>  			set_page_dirty_lock(page);
>  		put_page(page);
>  	}
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---
Qing Huang May 22, 2017, 11:43 p.m. UTC | #2
On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
> On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
>> This change optimizes kernel memory deregistration. __ib_umem_release()
>> used to call set_page_dirty_lock() on every writable page in its memory
>> region; the call keeps data synced between the CPU and the DMA device
>> when swapping happens after a memory deregistration. Now we skip setting
>> the page dirty bit if the kernel has already set it before
>> __ib_umem_release() is called. This reduces memory deregistration time
>> by half or more in our application simulation test program.
> As far as I can tell this code doesn't even need set_page_dirty_lock
> and could just use set_page_dirty

It seems that set_page_dirty_lock has been used here for more than 10 
years. Don't know the original purpose. Maybe it was used to prevent 
races between setting dirty bits and swapping out pages?

Perhaps we can call set_page_dirty before calling ib_dma_unmap_sg?

>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> ---
>>   drivers/infiniband/core/umem.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 3dbf811..21e60b1 100644
>> --- a/drivers/infiniband/core/umem.c
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>   	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
>>   
>>   		page = sg_page(sg);
>> -		if (umem->writable && dirty)
>> +		if (!PageDirty(page) && umem->writable && dirty)
>>   			set_page_dirty_lock(page);
>>   		put_page(page);
>>   	}
>> -- 
>> 2.9.3
>>

Christoph Hellwig May 23, 2017, 7:42 a.m. UTC | #3
On Mon, May 22, 2017 at 04:43:57PM -0700, Qing Huang wrote:
> 
> On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
> > On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
> > > This change optimizes kernel memory deregistration. __ib_umem_release()
> > > used to call set_page_dirty_lock() on every writable page in its memory
> > > region; the call keeps data synced between the CPU and the DMA device
> > > when swapping happens after a memory deregistration. Now we skip setting
> > > the page dirty bit if the kernel has already set it before
> > > __ib_umem_release() is called. This reduces memory deregistration time
> > > by half or more in our application simulation test program.
> > As far as I can tell this code doesn't even need set_page_dirty_lock
> > and could just use set_page_dirty
> 
> It seems that set_page_dirty_lock has been used here for more than 10 years.
> Don't know the original purpose. Maybe it was used to prevent races between
> setting dirty bits and swapping out pages?

I suspect copy & paste.  Or maybe I don't actually understand the
explanation of set_page_dirty vs set_page_dirty_lock enough.  But
I'd rather not hack around the problem.
Qing Huang May 23, 2017, 9:39 p.m. UTC | #4
On 5/23/2017 12:42 AM, Christoph Hellwig wrote:
> On Mon, May 22, 2017 at 04:43:57PM -0700, Qing Huang wrote:
>> On 5/19/2017 6:05 AM, Christoph Hellwig wrote:
>>> On Thu, May 18, 2017 at 04:33:53PM -0700, Qing Huang wrote:
>>>> This change optimizes kernel memory deregistration. __ib_umem_release()
>>>> used to call set_page_dirty_lock() on every writable page in its memory
>>>> region; the call keeps data synced between the CPU and the DMA device
>>>> when swapping happens after a memory deregistration. Now we skip setting
>>>> the page dirty bit if the kernel has already set it before
>>>> __ib_umem_release() is called. This reduces memory deregistration time
>>>> by half or more in our application simulation test program.
>>> As far as I can tell this code doesn't even need set_page_dirty_lock
>>> and could just use set_page_dirty
>> It seems that set_page_dirty_lock has been used here for more than 10 years.
>> Don't know the original purpose. Maybe it was used to prevent races between
>> setting dirty bits and swapping out pages?
> I suspect copy & paste.  Or maybe I don't actually understand the
> explanation of set_page_dirty vs set_page_dirty_lock enough.  But
> I'd rather not hack around the problem.
> --
I think there are two independent parts here. The first is that we don't
need to set the dirty bit if it's already set. The second is whether we
use set_page_dirty or set_page_dirty_lock to set the dirty bits.


Doug Ledford June 1, 2017, 10:33 p.m. UTC | #5
On Thu, 2017-05-18 at 16:33 -0700, Qing Huang wrote:
> This change optimizes kernel memory deregistration. __ib_umem_release()
> used to call set_page_dirty_lock() on every writable page in its memory
> region; the call keeps data synced between the CPU and the DMA device
> when swapping happens after a memory deregistration. Now we skip setting
> the page dirty bit if the kernel has already set it before
> __ib_umem_release() is called. This reduces memory deregistration time
> by half or more in our application simulation test program.
> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>

Thanks, applied.

Patch

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 3dbf811..21e60b1 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -58,7 +58,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
 
 		page = sg_page(sg);
-		if (umem->writable && dirty)
+		if (!PageDirty(page) && umem->writable && dirty)
 			set_page_dirty_lock(page);
 		put_page(page);
 	}