diff mbox

[v3] btrfs: add missing memset while reading compressed inline extents

Message ID 1489025563-8071-1-git-send-email-ce3g8jdj@umail.furryterror.org (mailing list archive)
State Superseded
Headers show

Commit Message

Zygo Blaxell March 9, 2017, 2:12 a.m. UTC
This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

        item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                inode generation 50 transid 50 size 47424 nbytes 49141
                block group 0 mode 100644 links 1 uid 0 gid 0
                rdev 0 flags 0x0(none)
        item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                inode ref index 3 namelen 10 name: DB_File.so
        item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                inline extent data size 1341 ram 4085 compress(zlib)
        item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                extent data disk byte 5367308288 nr 20480
                extent data offset 0 nr 45056 ram 45056
                extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating an inline extent and hole pattern leading to the same data
corruption/infoleak on read:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

	# touch foo
	# chattr +c foo
	# xfs_io -f -c "pwrite -W 0 1000" foo
	# xfs_io -f -c "falloc 4 8188" foo
	# od -x foo
	# echo 3 >/proc/sys/vm/drop_caches
	# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed by zeros
to the end of the file:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
	0001760 0000 0000 0000 0000 0000 0000 0000 0000
	*
	0020000

Actual output:  the data from byte 1000 to the end of the first
4096 byte page will be corrupt/infoleak:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
	0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
	0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
	(...)

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
---

v3: Clarify that there are two distinct methods to create the hole,
but both lead to the same corruption/infoleak when the hole is read.
No code change.

v2: I'm not able to contrive a test case where pg_offset != 0, but we                                                             
might as well handle it anyway.                                                                                                   

 fs/btrfs/inode.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Chris Mason March 9, 2017, 3:39 p.m. UTC | #1
On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> This is a story about 4 distinct (and very old) btrfs bugs.
>

Really great write up.

[ ... ]

>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 25ac2cf..4d41a31 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>  	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>  	ret = btrfs_decompress(compress_type, tmp, page,
>  			       extent_offset, inline_size, max_size);
> +	WARN_ON(max_size + pg_offset > PAGE_SIZE);

Can you please drop this WARN_ON and make the math reflect any possible 
pg_offset?  I do agree it shouldn't be happening, but its easy to 
correct for and the WARN is likely to get lost.

> +	if (max_size + pg_offset < PAGE_SIZE) {
> +		char *map = kmap(page);
> +		memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);
> +		kunmap(page);
> +	}

Both lzo and zlib have a memset to cover the gap between what they 
actually decompress and the max_size that we pass here.  That's 
important because ram_bytes may not be 100% accurate.

Can you also please toss in a comment about how the decompression code 
is responsible for the memset up to max_bytes?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zygo Blaxell March 10, 2017, 4:41 a.m. UTC | #2
On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> 
> 
> On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >This is a story about 4 distinct (and very old) btrfs bugs.
> >
> 
> Really great write up.
> 
> [ ... ]
> 
> >
> >diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >index 25ac2cf..4d41a31 100644
> >--- a/fs/btrfs/inode.c
> >+++ b/fs/btrfs/inode.c
> >@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> > 	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > 	ret = btrfs_decompress(compress_type, tmp, page,
> > 			       extent_offset, inline_size, max_size);
> >+	WARN_ON(max_size + pg_offset > PAGE_SIZE);
> 
> Can you please drop this WARN_ON and make the math reflect any possible
> pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> for and the WARN is likely to get lost.

I'm not sure how to do that.  It looks like I'd have to pass pg_offset
through btrfs_decompress to the decompress functions?

	ret = btrfs_decompress(compress_type, tmp, page,
		extent_offset, inline_size, max_size, pg_offset);

and in the compression functions get pg_offset from the argument list
instead of hardcoding zero.

But how does pg_offset become non-zero for an inline extent?  A micro-hole
before the first byte?  If the offset was >= 4096, the data wouldn't
be in the first block so there would never be an inline extent in the
first place.

> >+	if (max_size + pg_offset < PAGE_SIZE) {
> >+		char *map = kmap(page);
> >+		memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);
> >+		kunmap(page);
> >+	}
> 
> Both lzo and zlib have a memset to cover the gap between what they actually
> decompress and the max_size that we pass here.  That's important because
> ram_bytes may not be 100% accurate.
> 
> Can you also please toss in a comment about how the decompression code is
> responsible for the memset up to max_bytes?
> 
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason March 10, 2017, 4:19 p.m. UTC | #3
On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
> On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
>>
>>
>> On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
>>> This is a story about 4 distinct (and very old) btrfs bugs.
>>>
>>
>> Really great write up.
>>
>> [ ... ]
>>
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 25ac2cf..4d41a31 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>>> 	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>>> 	ret = btrfs_decompress(compress_type, tmp, page,
>>> 			       extent_offset, inline_size, max_size);
>>> +	WARN_ON(max_size + pg_offset > PAGE_SIZE);
>>
>> Can you please drop this WARN_ON and make the math reflect any possible
>> pg_offset?  I do agree it shouldn't be happening, but its easy to correct
>> for and the WARN is likely to get lost.
>
> I'm not sure how to do that.  It looks like I'd have to pass pg_offset
> through btrfs_decompress to the decompress functions?
>
> 	ret = btrfs_decompress(compress_type, tmp, page,
> 		extent_offset, inline_size, max_size, pg_offset);
>
> and in the compression functions get pg_offset from the argument list
> instead of hardcoding zero.

Yeah, it's a good point.  Both zlib and lzo are assuming a zero 
pg_offset right now, but just like there are wacky corners allowing 
inline extents followed by more data, there are a few wacky corners 
allowing inline extents at the end of the file.

Lets not mix that change in with this one though.  For now, just get the 
memset right and we can pass pg_offset down in a later patch.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zygo Blaxell March 10, 2017, 6:56 p.m. UTC | #4
On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:
> On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
> >On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> >>
> >>
> >>On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >>>This is a story about 4 distinct (and very old) btrfs bugs.
> >>>
> >>
> >>Really great write up.
> >>
> >>[ ... ]
> >>
> >>>
> >>>diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >>>index 25ac2cf..4d41a31 100644
> >>>--- a/fs/btrfs/inode.c
> >>>+++ b/fs/btrfs/inode.c
> >>>@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> >>>	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> >>>	ret = btrfs_decompress(compress_type, tmp, page,
> >>>			       extent_offset, inline_size, max_size);
> >>>+	WARN_ON(max_size + pg_offset > PAGE_SIZE);
> >>
> >>Can you please drop this WARN_ON and make the math reflect any possible
> >>pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> >>for and the WARN is likely to get lost.
> >
> >I'm not sure how to do that.  It looks like I'd have to pass pg_offset
> >through btrfs_decompress to the decompress functions?
> >
> >	ret = btrfs_decompress(compress_type, tmp, page,
> >		extent_offset, inline_size, max_size, pg_offset);
> >
> >and in the compression functions get pg_offset from the argument list
> >instead of hardcoding zero.
> 
> Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
> right now, but just like there are wacky corners allowing inline extents
> followed by more data, there are a few wacky corners allowing inline extents
> at the end of the file.
> 
> Lets not mix that change in with this one though.  For now, just get the
> memset right and we can pass pg_offset down in a later patch.

Are you saying "fix the memset in the patch" (and if so, what's wrong
with it?), or are you saying "let's take the patch with its memset as is,
and fix the pg_offset > 0 issues later"?

> -chris
>
Chris Mason March 10, 2017, 7:12 p.m. UTC | #5
On 03/10/2017 01:56 PM, Zygo Blaxell wrote:
> On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:
>> On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
>>> On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
>>>>
>>>>
>>>> On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
>>>>> This is a story about 4 distinct (and very old) btrfs bugs.
>>>>>
>>>>
>>>> Really great write up.
>>>>
>>>> [ ... ]
>>>>
>>>>>
>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>>> index 25ac2cf..4d41a31 100644
>>>>> --- a/fs/btrfs/inode.c
>>>>> +++ b/fs/btrfs/inode.c
>>>>> @@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
>>>>> 	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>>>>> 	ret = btrfs_decompress(compress_type, tmp, page,
>>>>> 			       extent_offset, inline_size, max_size);
>>>>> +	WARN_ON(max_size + pg_offset > PAGE_SIZE);
>>>>
>>>> Can you please drop this WARN_ON and make the math reflect any possible
>>>> pg_offset?  I do agree it shouldn't be happening, but its easy to correct
>>>> for and the WARN is likely to get lost.
>>>
>>> I'm not sure how to do that.  It looks like I'd have to pass pg_offset
>>> through btrfs_decompress to the decompress functions?
>>>
>>> 	ret = btrfs_decompress(compress_type, tmp, page,
>>> 		extent_offset, inline_size, max_size, pg_offset);
>>>
>>> and in the compression functions get pg_offset from the argument list
>>> instead of hardcoding zero.
>>
>> Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
>> right now, but just like there are wacky corners allowing inline extents
>> followed by more data, there are a few wacky corners allowing inline extents
>> at the end of the file.
>>
>> Lets not mix that change in with this one though.  For now, just get the
>> memset right and we can pass pg_offset down in a later patch.
>
> Are you saying "fix the memset in the patch" (and if so, what's wrong
> with it?), or are you saying "let's take the patch with its memset as is,
> and fix the pg_offset > 0 issues later"?

Your WARN_ON() would fire when this math is bad:

memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);

Instead or warning, just don't memset if pg_offset + max_size >= PAGE_SIZE

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zygo Blaxell March 10, 2017, 9:45 p.m. UTC | #6
On Fri, Mar 10, 2017 at 02:12:54PM -0500, Chris Mason wrote:
> 
> 
> On 03/10/2017 01:56 PM, Zygo Blaxell wrote:
> >On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:
> >>On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
> >>>On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> >>>>
> >>>>
> >>>>On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >>>>>This is a story about 4 distinct (and very old) btrfs bugs.
> >>>>>
> >>>>
> >>>>Really great write up.
> >>>>
> >>>>[ ... ]
> >>>>
> >>>>>
> >>>>>diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >>>>>index 25ac2cf..4d41a31 100644
> >>>>>--- a/fs/btrfs/inode.c
> >>>>>+++ b/fs/btrfs/inode.c
> >>>>>@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> >>>>>	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> >>>>>	ret = btrfs_decompress(compress_type, tmp, page,
> >>>>>			       extent_offset, inline_size, max_size);
> >>>>>+	WARN_ON(max_size + pg_offset > PAGE_SIZE);
> >>>>
> >>>>Can you please drop this WARN_ON and make the math reflect any possible
> >>>>pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> >>>>for and the WARN is likely to get lost.
> >>>
> >>>I'm not sure how to do that.  It looks like I'd have to pass pg_offset
> >>>through btrfs_decompress to the decompress functions?
> >>>
> >>>	ret = btrfs_decompress(compress_type, tmp, page,
> >>>		extent_offset, inline_size, max_size, pg_offset);
> >>>
> >>>and in the compression functions get pg_offset from the argument list
> >>>instead of hardcoding zero.
> >>
> >>Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
> >>right now, but just like there are wacky corners allowing inline extents
> >>followed by more data, there are a few wacky corners allowing inline extents
> >>at the end of the file.
> >>
> >>Lets not mix that change in with this one though.  For now, just get the
> >>memset right and we can pass pg_offset down in a later patch.
> >
> >Are you saying "fix the memset in the patch" (and if so, what's wrong
> >with it?), or are you saying "let's take the patch with its memset as is,
> >and fix the pg_offset > 0 issues later"?
> 
> Your WARN_ON() would fire when this math is bad:
> 
> memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);
> 
> Instead or warning, just don't memset if pg_offset + max_size >= PAGE_SIZE

OK.

While I was looking at this function I noticed that there doesn't seem to be
a sanity check on the data in the extent ref.  e.g. ram_bytes could be 2GB
and nothing would notice.  I'm pretty sure that's only possible by fuzzing,
but it seemed worthwhile to log it if it ever happened.

I'll take the WARN_ON out, and also put in the comment you asked for in the
other branch of this thread.

> -chris
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25ac2cf..4d41a31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6805,6 +6805,12 @@  static noinline int uncompress_inline(struct btrfs_path *path,
 	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
 	ret = btrfs_decompress(compress_type, tmp, page,
 			       extent_offset, inline_size, max_size);
+	WARN_ON(max_size + pg_offset > PAGE_SIZE);
+	if (max_size + pg_offset < PAGE_SIZE) {
+		char *map = kmap(page);
+		memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);
+		kunmap(page);
+	}
 	kfree(tmp);
 	return ret;
 }