diff mbox

[v2,2/5] dax: increase granularity of dax_clear_blocks() operations

Message ID 20151022171026.38343.6371.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State Superseded
Headers show

Commit Message

Dan Williams Oct. 22, 2015, 5:10 p.m. UTC
dax_clear_blocks is currently performing a cond_resched() after every
PAGE_SIZE memset.  We need not check so frequently, for example md-raid
only calls cond_resched() at stripe granularity.  Also, in preparation
for introducing a dax_map_atomic() operation that temporarily pins a dax
mapping move the call to cond_resched() to the outer loop.

Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

Comments

Jeff Moyer Oct. 22, 2015, 9:04 p.m. UTC | #1
Dan Williams <dan.j.williams@intel.com> writes:

> dax_clear_blocks is currently performing a cond_resched() after every
> PAGE_SIZE memset.  We need not check so frequently, for example md-raid
> only calls cond_resched() at stripe granularity.  Also, in preparation
> for introducing a dax_map_atomic() operation that temporarily pins a dax
> mapping move the call to cond_resched() to the outer loop.

There's nothing wrong with the mechanics here, but why bother?  I only
see 1 caller in the kernel, and that caller passes in
1<<inode->i_blkbits for the size (so 1 page or less).  Did you plan to
add other callers?  I don't see them in this particular patch set.

Again, I'm not taking issue with the patch, I'm just wondering what
motivated the change.

Thanks!
Jeff

>
> Reviewed-by: Jan Kara <jack@suse.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/dax.c |   27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 5dc33d788d50..f8e543839e5c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -28,6 +28,7 @@
>  #include <linux/sched.h>
>  #include <linux/uio.h>
>  #include <linux/vmstat.h>
> +#include <linux/sizes.h>
>  
>  int dax_clear_blocks(struct inode *inode, sector_t block, long size)
>  {
> @@ -38,24 +39,20 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
>  	do {
>  		void __pmem *addr;
>  		unsigned long pfn;
> -		long count;
> +		long count, sz;
>  
> -		count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
> +		sz = min_t(long, size, SZ_1M);
> +		count = bdev_direct_access(bdev, sector, &addr, &pfn, sz);
>  		if (count < 0)
>  			return count;
> -		BUG_ON(size < count);
> -		while (count > 0) {
> -			unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
> -			if (pgsz > count)
> -				pgsz = count;
> -			clear_pmem(addr, pgsz);
> -			addr += pgsz;
> -			size -= pgsz;
> -			count -= pgsz;
> -			BUG_ON(pgsz & 511);
> -			sector += pgsz / 512;
> -			cond_resched();
> -		}
> +		if (count < sz)
> +			sz = count;
> +		clear_pmem(addr, sz);
> +		addr += sz;
> +		size -= sz;
> +		BUG_ON(sz & 511);
> +		sector += sz / 512;
> +		cond_resched();
>  	} while (size);
>  
>  	wmb_pmem();
>
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
Dan Williams Oct. 22, 2015, 10:57 p.m. UTC | #2
On Thu, 2015-10-22 at 17:04 -0400, Jeff Moyer wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
> > dax_clear_blocks is currently performing a cond_resched() after every
> > PAGE_SIZE memset.  We need not check so frequently, for example md-raid
> > only calls cond_resched() at stripe granularity.  Also, in preparation
> > for introducing a dax_map_atomic() operation that temporarily pins a dax
> > mapping move the call to cond_resched() to the outer loop.
> 
> There's nothing wrong with the mechanics here, but why bother?  I only
> see 1 caller in the kernel, and that caller passes in
> 1<<inode->i_blkbits for the size (so 1 page or less).  Did you plan to
> add other callers?  I don't see them in this particular patch set.
> 
> Again, I'm not taking issue with the patch, I'm just wondering what
> motivated the change.

The motivation is the subsequent patch to wrap all touches of pmem
within a dax_map_atomic() / dax_unmap_atomic() pairing.  If I just do
the straightforward conversion of this function to dax_map_atomic() it
looks something like this:


> diff --git a/fs/dax.c b/fs/dax.c
> index 5dc33d788d50..fa2a2a255d3a 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -40,9 +40,9 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
>               unsigned long pfn;
>               long count;
>  
> -             count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
> -             if (count < 0)
> -                     return count;
> +             addr = __dax_map_atomic(bdev, sector, size, &pfn, &count);
> +             if (IS_ERR(addr))
> +                     return PTR_ERR(addr);
>               BUG_ON(size < count);
>               while (count > 0) {
>                       unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
> @@ -56,6 +56,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
>                       sector += pgsz / 512;
>                       cond_resched();
>               }
> +             dax_unmap_atomic(bdev, addr);
>       } while (size);
>  
>       wmb_pmem();

The problem is that intervening call to cond_resched().  I later want to
inject an rcu_read_lock()/unlock() pair to allow flushing active
dax_map_atomic() usages at driver teardown time [1].  But, I think the
patch stands alone as a cleanup outside of that admittedly hidden
motivation.

[1]: "mm, pmem: devm_memunmap_pages(), truncate and unmap ZONE_DEVICE
pages"
https://lists.01.org/pipermail/linux-nvdimm/2015-October/002406.html
Jeff Moyer Oct. 28, 2015, 9:02 p.m. UTC | #3
"Williams, Dan J" <dan.j.williams@intel.com> writes:

> The problem is that intervening call to cond_resched().  I later want to
> inject an rcu_read_lock()/unlock() pair to allow flushing active
> dax_map_atomic() usages at driver teardown time [1].  But, I think the
> patch stands alone as a cleanup outside of that admittedly hidden
> motivation.

I'm not going to split hairs.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
diff mbox

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 5dc33d788d50..f8e543839e5c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -28,6 +28,7 @@ 
 #include <linux/sched.h>
 #include <linux/uio.h>
 #include <linux/vmstat.h>
+#include <linux/sizes.h>
 
 int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 {
@@ -38,24 +39,20 @@  int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 	do {
 		void __pmem *addr;
 		unsigned long pfn;
-		long count;
+		long count, sz;
 
-		count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
+		sz = min_t(long, size, SZ_1M);
+		count = bdev_direct_access(bdev, sector, &addr, &pfn, sz);
 		if (count < 0)
 			return count;
-		BUG_ON(size < count);
-		while (count > 0) {
-			unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
-			if (pgsz > count)
-				pgsz = count;
-			clear_pmem(addr, pgsz);
-			addr += pgsz;
-			size -= pgsz;
-			count -= pgsz;
-			BUG_ON(pgsz & 511);
-			sector += pgsz / 512;
-			cond_resched();
-		}
+		if (count < sz)
+			sz = count;
+		clear_pmem(addr, sz);
+		addr += sz;
+		size -= sz;
+		BUG_ON(sz & 511);
+		sector += sz / 512;
+		cond_resched();
 	} while (size);
 
 	wmb_pmem();