xfs: fix bmv_count confusion w/ shared extents

Message ID 20170126031114.GS9134@birch.djwong.org (mailing list archive)
State Superseded, archived
Commit Message

Darrick J. Wong Jan. 26, 2017, 3:11 a.m. UTC
In a bmapx call, bmv_count is the total size of the array, including the
zeroth element that userspace uses to supply the search key.  The output
array starts at offset 1 so that we can set up the user for the next
invocation.  Since we now can split an extent into multiple bmap records
due to shared/unshared status, we have to be careful that we don't
overflow the output array.
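
As an illustration of the array layout described above, here is a minimal
userspace-style sketch, not the kernel code.  The names struct rec and
fill_records are hypothetical stand-ins; only the convention (slot 0 is the
header/search key, output records occupy slots 1 through count - 1) comes
from the description above.

```c
#include <assert.h>

/* Hypothetical stand-in for one bmap record; not the kernel's struct getbmapx. */
struct rec {
	long offset;	/* start of the mapping */
	long length;	/* length of the mapping */
	int  count;	/* slot 0 only: total array size, like bmv_count */
	int  entries;	/* slot 0 only: records returned, like bmv_entries */
};

/*
 * Copy source records into arr[1..count-1].  arr[0] is the header and is
 * never used as an output slot.  Returns the number of records written.
 */
static int fill_records(struct rec *arr, const struct rec *src, int nsrc)
{
	int count = arr[0].count;
	int cur = 1;		/* first output slot */
	int i;

	assert(count >= 1);	/* the header slot must exist */
	arr[0].entries = 0;

	/* Stop at the end of the array even if more source records remain. */
	for (i = 0; i < nsrc && cur < count; i++) {
		arr[cur].offset = src[i].offset;
		arr[cur].length = src[i].length;
		cur++;
		arr[0].entries++;
	}
	return arr[0].entries;
}
```

With a four-element array, at most three records fit, no matter how many
records the source extents expand into.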

In the original patch f86f403794b ("xfs: teach get_bmapx about shared
extents and the CoW fork") I used cur_ext (the output index) to check
for overflows, albeit with an off-by-one error.  Since nexleft describes
the number of unfilled slots in the output, we can rip all that out and
use nexleft for the check directly.

Failure to do this causes heap corruption in bmapx callers such as
xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.

Suggested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: simplify the loop accounting to use nexleft for the output checks
---
 fs/xfs/xfs_bmap_util.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen Jan. 26, 2017, 5:33 p.m. UTC | #1
On 1/25/17 9:11 PM, Darrick J. Wong wrote:
> In a bmapx call, bmv_count is the total size of the array, including the
> zeroth element that userspace uses to supply the search key.  The output
> array starts at offset 1 so that we can set up the user for the next
> invocation.  Since we now can split an extent into multiple bmap records
> due to shared/unshared status, we have to be careful that we don't
> overflow the output array.
> 
> In the original patch f86f403794b ("xfs: teach get_bmapx about shared
> extents and the CoW fork") I used cur_ext (the output index) to check
> for overflows, albeit with an off-by-one error.  Since nexleft describes
> the number of unfilled slots in the output, we can rip all that out and
> use nexleft for the check directly.
> 
> Failure to do this causes heap corruption in bmapx callers such as
> xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.
> 
> Suggested-by: Eric Sandeen <sandeen@sandeen.net>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Yup, I think this is better, thanks.  Comments around the
whole inject_map business would be nice, but *shrug* doesn't
have to be in this patch.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
> v2: simplify the loop accounting to use nexleft for the output checks
> ---
>  fs/xfs/xfs_bmap_util.c |   10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index b9abce5..fc6bdaf 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -697,8 +697,7 @@ xfs_getbmap(
>  			goto out_free_map;
>  		ASSERT(nmap <= subnex);
>  
> -		for (i = 0; i < nmap && nexleft && bmv->bmv_length &&
> -				cur_ext < bmv->bmv_count; i++) {
> +		for (i = 0; i < nmap && nexleft && bmv->bmv_length; i++) {
>  			out[cur_ext].bmv_oflags = 0;
>  			if (map[i].br_state == XFS_EXT_UNWRITTEN)
>  				out[cur_ext].bmv_oflags |= BMV_OF_PREALLOC;
> @@ -760,16 +759,15 @@ xfs_getbmap(
>  				continue;
>  			}
>  
> +			nexleft--;
>  			if (inject_map.br_startblock != NULLFSBLOCK) {
>  				map[i] = inject_map;
>  				i--;
> -			} else
> -				nexleft--;
> +			}
>  			bmv->bmv_entries++;
>  			cur_ext++;
>  		}
> -	} while (nmap && nexleft && bmv->bmv_length &&
> -		 cur_ext < bmv->bmv_count);
> +	} while (nmap && nexleft && bmv->bmv_length);
>  
>   out_free_map:
>  	kmem_free(map);
Darrick J. Wong Jan. 26, 2017, 9:54 p.m. UTC | #2
On Thu, Jan 26, 2017 at 11:33:03AM -0600, Eric Sandeen wrote:
> On 1/25/17 9:11 PM, Darrick J. Wong wrote:
> > In a bmapx call, bmv_count is the total size of the array, including the
> > zeroth element that userspace uses to supply the search key.  The output
> > array starts at offset 1 so that we can set up the user for the next
> > invocation.  Since we now can split an extent into multiple bmap records
> > due to shared/unshared status, we have to be careful that we don't
> > overflow the output array.
> > 
> > In the original patch f86f403794b ("xfs: teach get_bmapx about shared
> > extents and the CoW fork") I used cur_ext (the output index) to check
> > for overflows, albeit with an off-by-one error.  Since nexleft describes
> > the number of unfilled slots in the output, we can rip all that out and
> > use nexleft for the check directly.
> > 
> > Failure to do this causes heap corruption in bmapx callers such as
> > xfs_io and xfs_scrub.  xfs/328 can reproduce this problem.
> > 
> > Suggested-by: Eric Sandeen <sandeen@sandeen.net>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Yup, I think this is better, thanks.  Comments around the
> whole inject_map business would be nice, but *shrug* doesn't
> have to be in this patch.
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> > ---
> > v2: simplify the loop accounting to use nexleft for the output checks
> > ---
> >  fs/xfs/xfs_bmap_util.c |   10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index b9abce5..fc6bdaf 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -697,8 +697,7 @@ xfs_getbmap(
> >  			goto out_free_map;
> >  		ASSERT(nmap <= subnex);
> >  
> > -		for (i = 0; i < nmap && nexleft && bmv->bmv_length &&
> > -				cur_ext < bmv->bmv_count; i++) {
> > +		for (i = 0; i < nmap && nexleft && bmv->bmv_length; i++) {

NAK.  I forgot that nexleft is min(bmv_count-1, di_nextents), which
means that if we have one partially shared bmbt extent and
bmv_count = 1000, we only return the first part of that bmbt extent to
the user.  Worse yet, we also return with bmv_entries < bmv_count-1,
which leads xfs_io to stop calling bmapx prematurely.

That causes xfs/280 to regress, so I'm going to resubmit v1 of this
patch, but with better comments so that nobody else misses this
again.
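
To make the arithmetic concrete, here is a simplified model of the two
accounting schemes.  It is a sketch under stated assumptions, not the
xfs_getbmap() loop: the function names are hypothetical, nextents stands in
for di_nextents, and every on-disk extent is assumed to split into the same
number of output records.

```c
#include <assert.h>

/* Hypothetical helper: the smaller of two ints. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Model of the v2 accounting: every emitted record consumes one unit of
 * nexleft, which starts at min(bmv_count - 1, nextents).
 * Returns the number of records emitted.
 */
static int emitted_with_nexleft(int bmv_count, int nextents, int pieces)
{
	int nexleft = min_int(bmv_count - 1, nextents);
	int total = nextents * pieces;	/* records a full listing needs */
	int emitted = 0;

	while (emitted < total && nexleft > 0) {
		emitted++;
		nexleft--;
	}
	return emitted;
}

/*
 * Model of bounding by output capacity instead: the array has
 * bmv_count - 1 output slots after the header.
 */
static int emitted_with_capacity(int bmv_count, int nextents, int pieces)
{
	return min_int(nextents * pieces, bmv_count - 1);
}
```

With bmv_count = 1000 and one extent that splits into three records, the
nexleft model emits one record and the capacity model emits all three.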

--D

> >  			out[cur_ext].bmv_oflags = 0;
> >  			if (map[i].br_state == XFS_EXT_UNWRITTEN)
> >  				out[cur_ext].bmv_oflags |= BMV_OF_PREALLOC;
> > @@ -760,16 +759,15 @@ xfs_getbmap(
> >  				continue;
> >  			}
> >  
> > +			nexleft--;
> >  			if (inject_map.br_startblock != NULLFSBLOCK) {
> >  				map[i] = inject_map;
> >  				i--;
> > -			} else
> > -				nexleft--;
> > +			}
> >  			bmv->bmv_entries++;
> >  			cur_ext++;
> >  		}
> > -	} while (nmap && nexleft && bmv->bmv_length &&
> > -		 cur_ext < bmv->bmv_count);
> > +	} while (nmap && nexleft && bmv->bmv_length);
> >  
> >   out_free_map:
> >  	kmem_free(map);
Patch

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b9abce5..fc6bdaf 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -697,8 +697,7 @@  xfs_getbmap(
 			goto out_free_map;
 		ASSERT(nmap <= subnex);
 
-		for (i = 0; i < nmap && nexleft && bmv->bmv_length &&
-				cur_ext < bmv->bmv_count; i++) {
+		for (i = 0; i < nmap && nexleft && bmv->bmv_length; i++) {
 			out[cur_ext].bmv_oflags = 0;
 			if (map[i].br_state == XFS_EXT_UNWRITTEN)
 				out[cur_ext].bmv_oflags |= BMV_OF_PREALLOC;
@@ -760,16 +759,15 @@  xfs_getbmap(
 				continue;
 			}
 
+			nexleft--;
 			if (inject_map.br_startblock != NULLFSBLOCK) {
 				map[i] = inject_map;
 				i--;
-			} else
-				nexleft--;
+			}
 			bmv->bmv_entries++;
 			cur_ext++;
 		}
-	} while (nmap && nexleft && bmv->bmv_length &&
-		 cur_ext < bmv->bmv_count);
+	} while (nmap && nexleft && bmv->bmv_length);
 
  out_free_map:
 	kmem_free(map);