diff mbox

tests/xfs: filestream allocator inode use-after-free test

Message ID 20180406141815.6472-1-bfoster@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Brian Foster April 6, 2018, 2:18 p.m. UTC
The XFS filestreams allocator caches dir inode -> agno mappings in
an MRU mechanism that holds elements in memory for an amount of time
and then cleans up expired elements in the background. The elements
typically held inode pointers without holding a reference to the
associated inode. This means that if the inode is reclaimed before
an expired entry is cleaned up, the MRU reaper can access freed
memory and cause a panic.

Test for this problem by performing continuous filestreams
allocations under short-lived parent directory inodes. This will
produce KASAN use-after-free splats if enabled during the test.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This test reproduces the problem described in this[1] thread. It's
XFS-specific because of the filestream option and specific geometry used
to format the scratch device.

Brian

[1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2

 tests/xfs/445     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/445.out |   2 +
 tests/xfs/group   |   1 +
 3 files changed, 114 insertions(+)
 create mode 100755 tests/xfs/445
 create mode 100644 tests/xfs/445.out

Comments

Eryu Guan April 25, 2018, 3:22 a.m. UTC | #1
On Fri, Apr 06, 2018 at 10:18:15AM -0400, Brian Foster wrote:
> The XFS filestreams allocator caches dir inode -> agno mappings in
> an MRU mechanism that holds elements in memory for an amount of time
> and then cleans up expired elements in the background. The elements
> typically held inode pointers without holding a reference to the
> associated inode. This means that if the inode is reclaimed before
> an expired entry is cleaned up, the MRU reaper can access freed
> memory and cause a panic.
> 
> Test for this problem by performing continuous filestreams
> allocations under short-lived parent directory inodes. This will
> produce KASAN use-after-free splats if enabled during the test.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This test reproduces the problem described in this[1] thread. It's
> XFS-specific because of the filestream option and specific geometry used
> to format the scratch device.
> 
> Brian
> 
> [1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2

From above thread, it seems that we don't need the kernel change
anymore, and the test itself would trigger dmesg check failure when
testing on kernel with kasan enabled, right?

But I've looped the test for 200 times and it all passed without
triggering any KASAN warnings, kernel is v4.17-rc2.

> 
>  tests/xfs/445     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/445.out |   2 +
>  tests/xfs/group   |   1 +
>  3 files changed, 114 insertions(+)
>  create mode 100755 tests/xfs/445
>  create mode 100644 tests/xfs/445.out
> 
> diff --git a/tests/xfs/445 b/tests/xfs/445
> new file mode 100755
> index 00000000..331a7dd4
> --- /dev/null
> +++ b/tests/xfs/445
> @@ -0,0 +1,111 @@
> +#! /bin/bash
> +# FS QA Test 445
> +#
> +# Test the XFS filestreams allocator for use-after-free inode access. The
> +# filestreams allocator uses the MRU and historically kept around unreferenced
> +# inode pointers in each element. These pointers could outlive the inodes they
> +# referred to and thus lead to access of freed or reused memory when the MRU
> +# element was reaped. Test for this problem by performing filestream allocations
> +# against short-lived parent directory inodes.
> +#
> +# Note that some form of kernel debug mechanism for use-after-free detection
> +# (i.e., KASAN) is required for this test to reproduce the original problem.
> +# This is because XFS uses a kmem cache for xfs_inode objects which means that
> +# the backing pages for freed inodes may still reside in the cache with the
> +# freed inodes in a partially initialized state.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/filestreams
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +drop_caches()
> +{
> +	while [ true ]; do
> +		echo 2 > /proc/sys/vm/drop_caches
> +		sleep 1
> +	done
> +}
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +
> +# check for filestreams and min device size
> +_check_filestreams_support || _notrun "filestreams not available"
> +devsize=`blockdev --getsize64 $SCRATCH_DEV`

There're a few other tests follow the same pattern in getting device
size, seems it's time to make it a helper function :)

> +[ $devsize -gt $((2 * 1024 * 1024 * 1024)) ] || _notrun "scratch dev too small"
> +
> +# use small AGs for frequent stream switching
> +_scratch_mkfs_xfs -d agsize=20m,size=2g >> $seqres.full 2>&1 ||
> +	_fail "mkfs failed"
> +_scratch_mount "-o filestreams" || _fail "mount failed"

No need to _fail on _scratch_mount failure, it's the default behavior
now, since commit 69eb6281a9d3 ("fstests: _fail test by default when
_scratch_mount fails").

Thanks,
Eryu

> +
> +# start background inode reclaim
> +drop_caches &
> +pid=$!
> +
> +# Stress the filestreams allocator via continuous allocation to a file under
> +# different parent dirs. Remove the old dirs as the file is moved so the MRU
> +# references point to an unlinked inode by the time they are removed. If the
> +# old dir inodes are reclaimed and associated memory reused, MRU cleanup can
> +# access the inode after it's been freed.
> +dir=$SCRATCH_MNT
> +for i in $(seq 0 90); do
> +	mkdir -p $dir/$i
> +	$XFS_IO_PROG -fc "falloc $(($i * 20))m 20m" $dir/$i/file
> +
> +	mkdir -p $dir/$((i + 1))
> +	mv $dir/$i/file $dir/$((i + 1))/file
> +	rmdir $dir/$i
> +
> +	# throttle to ensure this loop sees several cache reclaims
> +	sleep 0.1
> +done
> +
> +kill $pid 2> /dev/null
> +wait $pid 2> /dev/null
> +
> +echo Silence is golden
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/445.out b/tests/xfs/445.out
> new file mode 100644
> index 00000000..44e55d20
> --- /dev/null
> +++ b/tests/xfs/445.out
> @@ -0,0 +1,2 @@
> +QA output created by 445
> +Silence is golden
> diff --git a/tests/xfs/group b/tests/xfs/group
> index 831f2cfa..2a7dec6f 100644
> --- a/tests/xfs/group
> +++ b/tests/xfs/group
> @@ -442,3 +442,4 @@
>  442 auto stress clone quota
>  443 auto quick ioctl fsr
>  444 auto quick
> +445 auto filestreams
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Foster April 25, 2018, 11:53 a.m. UTC | #2
On Wed, Apr 25, 2018 at 11:22:21AM +0800, Eryu Guan wrote:
> On Fri, Apr 06, 2018 at 10:18:15AM -0400, Brian Foster wrote:
> > The XFS filestreams allocator caches dir inode -> agno mappings in
> > an MRU mechanism that holds elements in memory for an amount of time
> > and then cleans up expired elements in the background. The elements
> > typically held inode pointers without holding a reference to the
> > associated inode. This means that if the inode is reclaimed before
> > an expired entry is cleaned up, the MRU reaper can access freed
> > memory and cause a panic.
> > 
> > Test for this problem by performing continuous filestreams
> > allocations under short-lived parent directory inodes. This will
> > produce KASAN use-after-free splats if enabled during the test.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > This test reproduces the problem described in this[1] thread. It's
> > XFS-specific because of the filestream option and specific geometry used
> > to format the scratch device.
> > 
> > Brian
> > 
> > [1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2
> 
> From above thread, it seems that we don't need the kernel change
> anymore, and the test itself would trigger dmesg check failure when
> testing on kernel with kasan enabled, right?
> 

Yep...

> But I've looped the test for 200 times and it all passed without
> triggering any KASAN warnings, kernel is v4.17-rc2.
> 

The kernel fix ended up being a patch from Christoph. It looks like it
made it into v4.17-rc1 as commit 7fcd3efa1e ("xfs: remove filestream
item xfs_inode reference"). Could you perhaps try an older kernel or one
with that patch reverted?

> > 
> >  tests/xfs/445     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/445.out |   2 +
> >  tests/xfs/group   |   1 +
> >  3 files changed, 114 insertions(+)
> >  create mode 100755 tests/xfs/445
> >  create mode 100644 tests/xfs/445.out
> > 
> > diff --git a/tests/xfs/445 b/tests/xfs/445
> > new file mode 100755
> > index 00000000..331a7dd4
> > --- /dev/null
> > +++ b/tests/xfs/445
> > @@ -0,0 +1,111 @@
> > +#! /bin/bash
> > +# FS QA Test 445
> > +#
> > +# Test the XFS filestreams allocator for use-after-free inode access. The
> > +# filestreams allocator uses the MRU and historically kept around unreferenced
> > +# inode pointers in each element. These pointers could outlive the inodes they
> > +# referred to and thus lead to access of freed or reused memory when the MRU
> > +# element was reaped. Test for this problem by performing filestream allocations
> > +# against short-lived parent directory inodes.
> > +#
> > +# Note that some form of kernel debug mechanism for use-after-free detection
> > +# (i.e., KASAN) is required for this test to reproduce the original problem.
> > +# This is because XFS uses a kmem cache for xfs_inode objects which means that
> > +# the backing pages for freed inodes may still reside in the cache with the
> > +# freed inodes in a partially initialized state.
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#-----------------------------------------------------------------------
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +. ./common/filestreams
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +drop_caches()
> > +{
> > +	while [ true ]; do
> > +		echo 2 > /proc/sys/vm/drop_caches
> > +		sleep 1
> > +	done
> > +}
> > +
> > +# Modify as appropriate.
> > +_supported_fs generic
> > +_supported_os Linux
> > +_require_scratch
> > +
> > +# check for filestreams and min device size
> > +_check_filestreams_support || _notrun "filestreams not available"
> > +devsize=`blockdev --getsize64 $SCRATCH_DEV`
> 
> There're a few other tests follow the same pattern in getting device
> size, seems it's time to make it a helper function :)
> 

TBH I wasn't quite sure this was necessary with a 2GB requirement, but I
figured it couldn't hurt just in case. A helper seems fine.. are you
thinking something like _require_scratch_size() perhaps?

> > +[ $devsize -gt $((2 * 1024 * 1024 * 1024)) ] || _notrun "scratch dev too small"
> > +
> > +# use small AGs for frequent stream switching
> > +_scratch_mkfs_xfs -d agsize=20m,size=2g >> $seqres.full 2>&1 ||
> > +	_fail "mkfs failed"
> > +_scratch_mount "-o filestreams" || _fail "mount failed"
> 
> No need to _fail on _scratch_mount failure, it's the default behavior
> now, since commit 69eb6281a9d3 ("fstests: _fail test by default when
> _scratch_mount fails").
> 

Ah, right. I'll drop that, thanks!

Brian

> Thanks,
> Eryu
> 
> > +
> > +# start background inode reclaim
> > +drop_caches &
> > +pid=$!
> > +
> > +# Stress the filestreams allocator via continuous allocation to a file under
> > +# different parent dirs. Remove the old dirs as the file is moved so the MRU
> > +# references point to an unlinked inode by the time they are removed. If the
> > +# old dir inodes are reclaimed and associated memory reused, MRU cleanup can
> > +# access the inode after it's been freed.
> > +dir=$SCRATCH_MNT
> > +for i in $(seq 0 90); do
> > +	mkdir -p $dir/$i
> > +	$XFS_IO_PROG -fc "falloc $(($i * 20))m 20m" $dir/$i/file
> > +
> > +	mkdir -p $dir/$((i + 1))
> > +	mv $dir/$i/file $dir/$((i + 1))/file
> > +	rmdir $dir/$i
> > +
> > +	# throttle to ensure this loop sees several cache reclaims
> > +	sleep 0.1
> > +done
> > +
> > +kill $pid 2> /dev/null
> > +wait $pid 2> /dev/null
> > +
> > +echo Silence is golden
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/445.out b/tests/xfs/445.out
> > new file mode 100644
> > index 00000000..44e55d20
> > --- /dev/null
> > +++ b/tests/xfs/445.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 445
> > +Silence is golden
> > diff --git a/tests/xfs/group b/tests/xfs/group
> > index 831f2cfa..2a7dec6f 100644
> > --- a/tests/xfs/group
> > +++ b/tests/xfs/group
> > @@ -442,3 +442,4 @@
> >  442 auto stress clone quota
> >  443 auto quick ioctl fsr
> >  444 auto quick
> > +445 auto filestreams
> > -- 
> > 2.13.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan April 26, 2018, 1:54 a.m. UTC | #3
On Wed, Apr 25, 2018 at 07:53:41AM -0400, Brian Foster wrote:
> On Wed, Apr 25, 2018 at 11:22:21AM +0800, Eryu Guan wrote:
> > On Fri, Apr 06, 2018 at 10:18:15AM -0400, Brian Foster wrote:
> > > The XFS filestreams allocator caches dir inode -> agno mappings in
> > > an MRU mechanism that holds elements in memory for an amount of time
> > > and then cleans up expired elements in the background. The elements
> > > typically held inode pointers without holding a reference to the
> > > associated inode. This means that if the inode is reclaimed before
> > > an expired entry is cleaned up, the MRU reaper can access freed
> > > memory and cause a panic.
> > > 
> > > Test for this problem by performing continuous filestreams
> > > allocations under short-lived parent directory inodes. This will
> > > produce KASAN use-after-free splats if enabled during the test.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > This test reproduces the problem described in this[1] thread. It's
> > > XFS-specific because of the filestream option and specific geometry used
> > > to format the scratch device.
> > > 
> > > Brian
> > > 
> > > [1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2
> > 
> > From above thread, it seems that we don't need the kernel change
> > anymore, and the test itself would trigger dmesg check failure when
> > testing on kernel with kasan enabled, right?
> > 
> 
> Yep...
> 
> > But I've looped the test for 200 times and it all passed without
> > triggering any KASAN warnings, kernel is v4.17-rc2.
> > 
> 
> The kernel fix ended up being a patch from Christoph. It looks like it
> made it into v4.17-rc1 as commit 7fcd3efa1e ("xfs: remove filestream
> item xfs_inode reference"). Could you perhaps try an older kernel or one
> with that patch reverted?

Sure, I'll try reverting that patch.

> 
> > > 
> > >  tests/xfs/445     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/xfs/445.out |   2 +
> > >  tests/xfs/group   |   1 +
> > >  3 files changed, 114 insertions(+)
> > >  create mode 100755 tests/xfs/445
> > >  create mode 100644 tests/xfs/445.out
> > > 
> > > diff --git a/tests/xfs/445 b/tests/xfs/445
> > > new file mode 100755
> > > index 00000000..331a7dd4
> > > --- /dev/null
> > > +++ b/tests/xfs/445
> > > @@ -0,0 +1,111 @@
> > > +#! /bin/bash
> > > +# FS QA Test 445
> > > +#
> > > +# Test the XFS filestreams allocator for use-after-free inode access. The
> > > +# filestreams allocator uses the MRU and historically kept around unreferenced
> > > +# inode pointers in each element. These pointers could outlive the inodes they
> > > +# referred to and thus lead to access of freed or reused memory when the MRU
> > > +# element was reaped. Test for this problem by performing filestream allocations
> > > +# against short-lived parent directory inodes.
> > > +#
> > > +# Note that some form of kernel debug mechanism for use-after-free detection
> > > +# (i.e., KASAN) is required for this test to reproduce the original problem.
> > > +# This is because XFS uses a kmem cache for xfs_inode objects which means that
> > > +# the backing pages for freed inodes may still reside in the cache with the
> > > +# freed inodes in a partially initialized state.
> > > +#
> > > +#-----------------------------------------------------------------------
> > > +# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
> > > +#
> > > +# This program is free software; you can redistribute it and/or
> > > +# modify it under the terms of the GNU General Public License as
> > > +# published by the Free Software Foundation.
> > > +#
> > > +# This program is distributed in the hope that it would be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public License
> > > +# along with this program; if not, write the Free Software Foundation,
> > > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > > +#-----------------------------------------------------------------------
> > > +#
> > > +
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +
> > > +here=`pwd`
> > > +tmp=/tmp/$$
> > > +status=1	# failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/filestreams
> > > +
> > > +# remove previous $seqres.full before test
> > > +rm -f $seqres.full
> > > +
> > > +# real QA test starts here
> > > +drop_caches()
> > > +{
> > > +	while [ true ]; do
> > > +		echo 2 > /proc/sys/vm/drop_caches
> > > +		sleep 1
> > > +	done
> > > +}
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch
> > > +
> > > +# check for filestreams and min device size
> > > +_check_filestreams_support || _notrun "filestreams not available"
> > > +devsize=`blockdev --getsize64 $SCRATCH_DEV`
> > 
> > There're a few other tests follow the same pattern in getting device
> > size, seems it's time to make it a helper function :)
> > 
> 
> TBH I wasn't quite sure this was necessary with a 2GB requirement, but I
> figured it couldn't hurt just in case. A helper seems fine.. are you

Agreed.

> thinking something like _require_scratch_size() perhaps?

Yeah, that sounds good to me. BTW, I just noticed that there's already a
_get_device_size() helper, which was introduced by me almost 4 years
ago.., but it queries /proc/partitions, looks like we can reuse/rewrite
it somehow.

Thanks,
Eryu

> 
> > > +[ $devsize -gt $((2 * 1024 * 1024 * 1024)) ] || _notrun "scratch dev too small"
> > > +
> > > +# use small AGs for frequent stream switching
> > > +_scratch_mkfs_xfs -d agsize=20m,size=2g >> $seqres.full 2>&1 ||
> > > +	_fail "mkfs failed"
> > > +_scratch_mount "-o filestreams" || _fail "mount failed"
> > 
> > No need to _fail on _scratch_mount failure, it's the default behavior
> > now, since commit 69eb6281a9d3 ("fstests: _fail test by default when
> > _scratch_mount fails").
> > 
> 
> Ah, right. I'll drop that, thanks!
> 
> Brian
> 
> > Thanks,
> > Eryu
> > 
> > > +
> > > +# start background inode reclaim
> > > +drop_caches &
> > > +pid=$!
> > > +
> > > +# Stress the filestreams allocator via continuous allocation to a file under
> > > +# different parent dirs. Remove the old dirs as the file is moved so the MRU
> > > +# references point to an unlinked inode by the time they are removed. If the
> > > +# old dir inodes are reclaimed and associated memory reused, MRU cleanup can
> > > +# access the inode after it's been freed.
> > > +dir=$SCRATCH_MNT
> > > +for i in $(seq 0 90); do
> > > +	mkdir -p $dir/$i
> > > +	$XFS_IO_PROG -fc "falloc $(($i * 20))m 20m" $dir/$i/file
> > > +
> > > +	mkdir -p $dir/$((i + 1))
> > > +	mv $dir/$i/file $dir/$((i + 1))/file
> > > +	rmdir $dir/$i
> > > +
> > > +	# throttle to ensure this loop sees several cache reclaims
> > > +	sleep 0.1
> > > +done
> > > +
> > > +kill $pid 2> /dev/null
> > > +wait $pid 2> /dev/null
> > > +
> > > +echo Silence is golden
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/xfs/445.out b/tests/xfs/445.out
> > > new file mode 100644
> > > index 00000000..44e55d20
> > > --- /dev/null
> > > +++ b/tests/xfs/445.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 445
> > > +Silence is golden
> > > diff --git a/tests/xfs/group b/tests/xfs/group
> > > index 831f2cfa..2a7dec6f 100644
> > > --- a/tests/xfs/group
> > > +++ b/tests/xfs/group
> > > @@ -442,3 +442,4 @@
> > >  442 auto stress clone quota
> > >  443 auto quick ioctl fsr
> > >  444 auto quick
> > > +445 auto filestreams
> > > -- 
> > > 2.13.6
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eryu Guan April 26, 2018, 2:01 a.m. UTC | #4
On Thu, Apr 26, 2018 at 09:54:56AM +0800, Eryu Guan wrote:
> On Wed, Apr 25, 2018 at 07:53:41AM -0400, Brian Foster wrote:
> > On Wed, Apr 25, 2018 at 11:22:21AM +0800, Eryu Guan wrote:
> > > On Fri, Apr 06, 2018 at 10:18:15AM -0400, Brian Foster wrote:
> > > > The XFS filestreams allocator caches dir inode -> agno mappings in
> > > > an MRU mechanism that holds elements in memory for an amount of time
> > > > and then cleans up expired elements in the background. The elements
> > > > typically held inode pointers without holding a reference to the
> > > > associated inode. This means that if the inode is reclaimed before
> > > > an expired entry is cleaned up, the MRU reaper can access freed
> > > > memory and cause a panic.
> > > > 
> > > > Test for this problem by performing continuous filestreams
> > > > allocations under short-lived parent directory inodes. This will
> > > > produce KASAN use-after-free splats if enabled during the test.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > 
> > > > This test reproduces the problem described in this[1] thread. It's
> > > > XFS-specific because of the filestream option and specific geometry used
> > > > to format the scratch device.
> > > > 
> > > > Brian
> > > > 
> > > > [1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2
> > > 
> > > From above thread, it seems that we don't need the kernel change
> > > anymore, and the test itself would trigger dmesg check failure when
> > > testing on kernel with kasan enabled, right?
> > > 
> > 
> > Yep...
> > 
> > > But I've looped the test for 200 times and it all passed without
> > > triggering any KASAN warnings, kernel is v4.17-rc2.
> > > 
> > 
> > The kernel fix ended up being a patch from Christoph. It looks like it
> > made it into v4.17-rc1 as commit 7fcd3efa1e ("xfs: remove filestream
> > item xfs_inode reference"). Could you perhaps try an older kernel or one
> > with that patch reverted?
> 
> Sure, I'll try reverting that patch.

Yeah, I hit KASAN warning quite quickly after reverting that patch.
Thanks!

Eryu
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/tests/xfs/445 b/tests/xfs/445
new file mode 100755
index 00000000..331a7dd4
--- /dev/null
+++ b/tests/xfs/445
@@ -0,0 +1,111 @@ 
+#! /bin/bash
+# FS QA Test 445
+#
+# Test the XFS filestreams allocator for use-after-free inode access. The
+# filestreams allocator uses the MRU and historically kept around unreferenced
+# inode pointers in each element. These pointers could outlive the inodes they
+# referred to and thus lead to access of freed or reused memory when the MRU
+# element was reaped. Test for this problem by performing filestream allocations
+# against short-lived parent directory inodes.
+#
+# Note that some form of kernel debug mechanism for use-after-free detection
+# (i.e., KASAN) is required for this test to reproduce the original problem.
+# This is because XFS uses a kmem cache for xfs_inode objects which means that
+# the backing pages for freed inodes may still reside in the cache with the
+# freed inodes in a partially initialized state.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/filestreams
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+drop_caches()
+{
+	while [ true ]; do
+		echo 2 > /proc/sys/vm/drop_caches
+		sleep 1
+	done
+}
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+# check for filestreams and min device size
+_check_filestreams_support || _notrun "filestreams not available"
+devsize=`blockdev --getsize64 $SCRATCH_DEV`
+[ $devsize -gt $((2 * 1024 * 1024 * 1024)) ] || _notrun "scratch dev too small"
+
+# use small AGs for frequent stream switching
+_scratch_mkfs_xfs -d agsize=20m,size=2g >> $seqres.full 2>&1 ||
+	_fail "mkfs failed"
+_scratch_mount "-o filestreams" || _fail "mount failed"
+
+# start background inode reclaim
+drop_caches &
+pid=$!
+
+# Stress the filestreams allocator via continuous allocation to a file under
+# different parent dirs. Remove the old dirs as the file is moved so the MRU
+# references point to an unlinked inode by the time they are removed. If the
+# old dir inodes are reclaimed and associated memory reused, MRU cleanup can
+# access the inode after it's been freed.
+dir=$SCRATCH_MNT
+for i in $(seq 0 90); do
+	mkdir -p $dir/$i
+	$XFS_IO_PROG -fc "falloc $(($i * 20))m 20m" $dir/$i/file
+
+	mkdir -p $dir/$((i + 1))
+	mv $dir/$i/file $dir/$((i + 1))/file
+	rmdir $dir/$i
+
+	# throttle to ensure this loop sees several cache reclaims
+	sleep 0.1
+done
+
+kill $pid 2> /dev/null
+wait $pid 2> /dev/null
+
+echo Silence is golden
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/445.out b/tests/xfs/445.out
new file mode 100644
index 00000000..44e55d20
--- /dev/null
+++ b/tests/xfs/445.out
@@ -0,0 +1,2 @@ 
+QA output created by 445
+Silence is golden
diff --git a/tests/xfs/group b/tests/xfs/group
index 831f2cfa..2a7dec6f 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -442,3 +442,4 @@ 
 442 auto stress clone quota
 443 auto quick ioctl fsr
 444 auto quick
+445 auto filestreams