Message ID | 1480309392-4786-1-git-send-email-ce3g8jdj@umail.furryterror.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 28 Nov 2016 00:03:12 -0500 Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8e3a5a2..b1314d6 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > ret = btrfs_decompress(compress_type, tmp, page, > extent_offset, inline_size, max_size); > + WARN_ON(max_size > PAGE_SIZE); > + if (max_size < PAGE_SIZE) { > + char *map = kmap(page); > + memset(map + max_size, 0, PAGE_SIZE - max_size); > + kunmap(page); > + } > kfree(tmp); > return ret; > } Wasn't this already posted as: btrfs: fix silent data corruption while reading compressed inline extents https://patchwork.kernel.org/patch/9371971/ but you don't indicate that's a V2 or something, and in fact the patch seems exactly the same, just the subject and commit message are entirely different. Quite confusing.
On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote: > On Mon, 28 Nov 2016 00:03:12 -0500 > Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 8e3a5a2..b1314d6 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, > > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > > ret = btrfs_decompress(compress_type, tmp, page, > > extent_offset, inline_size, max_size); > > + WARN_ON(max_size > PAGE_SIZE); > > + if (max_size < PAGE_SIZE) { > > + char *map = kmap(page); > > + memset(map + max_size, 0, PAGE_SIZE - max_size); > > + kunmap(page); > > + } > > kfree(tmp); > > return ret; > > } > > Wasn't this already posted as: > > btrfs: fix silent data corruption while reading compressed inline extents > https://patchwork.kernel.org/patch/9371971/ > > but you don't indicate that's a V2 or something, and in fact the patch seems > exactly the same, just the subject and commit message are entirely different. > Quite confusing. The previous commit message discussed the related hole-creation bug, including a reproducer; however, this patch does not fix the hole-creation bug and was never intended to. Despite my follow-up clarification, reviewers got distracted by the hole-creation bug discussion and didn't recover, so the patch didn't go anywhere. This patch only fixes _reading_ the holes after they are created, and the new commit message and subject line state that much more clearly. The patch didn't change, so I didn't add 'v2'. There's no 'v1' with the same title, so I thought a 'v2' tag would be more confusing than just starting over. The hole-creation bug is a very old, low-urgency issue. btrfs filesystems in the field have the buggy holes already, and have been creating new ones from 2009(*) to the present. I had to ask a few people before I found one who know whether it was even a bug, or intentional behavior from the beginning. (*) 2009 is the oldest commit date I can find that introduces a change which would only be necessary in the presence of the hole-creation bug. I have not been able to test kernels before 2012 because they crash while running my reproducer. > -- > With respect, > Roman >
Ping? I know at least two people have read this patch, but it hasn't appeared in the usual integration branches yet, and I've seen no actionable suggestion to improve it. I've provided two non-overlapping rationales for it. Is there something else you are looking for? This patch is a fix for a simple data corruption bug. It (or some equivalent fix for the same bug) should be on its way to all stable kernels starting from 2.6.32. Thanks On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote: > On Mon, 28 Nov 2016 00:03:12 -0500 > Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 8e3a5a2..b1314d6 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, > > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > > ret = btrfs_decompress(compress_type, tmp, page, > > extent_offset, inline_size, max_size); > > + WARN_ON(max_size > PAGE_SIZE); > > + if (max_size < PAGE_SIZE) { > > + char *map = kmap(page); > > + memset(map + max_size, 0, PAGE_SIZE - max_size); > > + kunmap(page); > > + } > > kfree(tmp); > > return ret; > > } > > Wasn't this already posted as: > > btrfs: fix silent data corruption while reading compressed inline extents > https://patchwork.kernel.org/patch/9371971/ > > but you don't indicate that's a V2 or something, and in fact the patch seems > exactly the same, just the subject and commit message are entirely different. > Quite confusing. > > -- > With respect, > Roman > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Zygo, Since the corruption happens after I/O and checksum, could it be possible to add some bug catcher code in code path for debug build, to help narrowing down the issue? Thanks, Xin Sent: Saturday, December 10, 2016 at 9:16 PM From: "Zygo Blaxell" <ce3g8jdj@umail.furryterror.org> To: "Roman Mamedov" <rm@romanrm.net>, "Filipe Manana" <fdmanana@gmail.com> Cc: linux-btrfs@vger.kernel.org Subject: Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents Ping? I know at least two people have read this patch, but it hasn't appeared in the usual integration branches yet, and I've seen no actionable suggestion to improve it. I've provided two non-overlapping rationales for it. Is there something else you are looking for? This patch is a fix for a simple data corruption bug. It (or some equivalent fix for the same bug) should be on its way to all stable kernels starting from 2.6.32. Thanks On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote: > On Mon, 28 Nov 2016 00:03:12 -0500 > Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 8e3a5a2..b1314d6 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, > > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > > ret = btrfs_decompress(compress_type, tmp, page, > > extent_offset, inline_size, max_size); > > + WARN_ON(max_size > PAGE_SIZE); > > + if (max_size < PAGE_SIZE) { > > + char *map = kmap(page); > > + memset(map + max_size, 0, PAGE_SIZE - max_size); > > + kunmap(page); > > + } > > kfree(tmp); > > return ret; > > } > > Wasn't this already posted as: > > btrfs: fix silent data corruption while reading compressed inline extents > https://patchwork.kernel.org/patch/9371971/ > > but you don't indicate that's a V2 or something, and in fact the patch seems > exactly the same, just the subject and commit message are entirely different. > Quite confusing. > > -- > With respect, > Roman > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html[http://vger.kernel.org/majordomo-info.html] -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Dec 11, 2016 at 10:56:59PM +0100, Xin Zhou wrote: > Hi Zygo, > > Since the corruption happens after I/O and checksum, > could it be possible to add some bug catcher code in debug build, to help > narrowing down the issue? The corruption happens only during reads, after IO and checksum. It happens at the point that I have patched, after decompression when the decompressed data doesn't fill the extent and there is a hole after the extent. No other code exists in btrfs to fill the hole that forms in such cases. There is nothing left to narrow down. Chris Mason discovered the same bug in 2009 and fixed half of it in commit 93c82d5750. His fix only fixed one of two cases where corruption occurs (i.e. after uncompressed extents). The compressed extent case remained unfixed until the present. To be fair to everyone who missed this bug for seven years: it was quite a hard bug to spot. Over a period of 28 months I observed backup verification failures and eliminated competing theories until only this bug remained. There were at least two other bugs fixed between 2009 and 2014 which would have also produced corrupted data in compressed files under slightly different circumstances, so anyone looking for data corruption in compressed extents would have thought they'd solved the problem at least twice before now. There was a memset in *almost* the right spot in the code to fix the corruption bug until early 2014. Ironically, that memset (which was introduced in 2009) was itself the cause of another corruption bug. The memset was removed in 166ae5a418 ("btrfs: fix inline compressed read err corruption") but that commit removed the memset entirely instead of fixing it. It's hard to detect this bug in the wild because the uninitialized kernel data is often zero by accident (zero is typically the most common byte value in kernel memory) and the corrupted data is always holes. The programs that create files with the specific extent structure that triggers the corruption don't write any data to the corrupted regions of the files (e.g. the corrupted bytes are unused bytes in ELF headers, or EXIF thumbnail data in JPEG files). Such programs typically won't read the corrupted data bytes, so the corruption often has no visible impact on users even when it occurs. The file on disk is not corrupted--the kernel corrupts the file while reading it--so repeatedly copying and verifying the same files over and over (e.g. daily backups of the same origin data) will *eventually* succeed. That makes the problem look like a hardware issue. For several months I thought it *was* a hardware issue, but I kept finding similar corruption on too many dissimilar machines that were showing no other evidence of hardware problems. All that seems to be needed to produce corruption is a filesystem mounted with the compress option and max_inline >= 2048. Any version of btrfs from 2012 to the present behaves the same way (older kernels probably also behave the same way, but they crash too early to finish running my tests). So far I've discovered corrupted files in these places in the wild: - in build directories (e.g. Linux kernel or anything using GCC) - in /etc/lvm/{archive,backup}/* - in /var/log/ntpstats - in backups made with rsync -avxzHSP (but not with rsync -avxzHP) Here is a one-line script that will find potentially corrupted files on an existing filesystem. There is no special setup or repro script required, other than to set 'compress' in the mount options. This is just a developer's work box chosen at random: # cd /usr/src/linux # find -type f -size +4097c -exec sh -c 'for x; do if filefrag -v "$x" | sed -n "4p" | grep -q inline; then echo "$x"; fi; done' -- {} + ./arch/x86/kernel/.module.o.cmd ./block/.scsi_ioctl.o.cmd ./drivers/ata/.pata_sis.o.cmd # filefrag -v ./drivers/ata/.pata_sis.o.cmd Filesystem type is: 9123683e File size of ./drivers/ata/.pata_sis.o.cmd is 45677 (12 blocks of 4096 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 4095: 0.. 4095: 4096: not_aligned,inline 1: 1.. 1: 35322749.. 35322749: 1: 1: shared 2: 2.. 2: 35371776.. 35371776: 1: 35322750: shared 3: 3.. 3: 35532566.. 35532566: 1: 35371777: shared 4: 4.. 4: 35315682.. 35315682: 1: 35532567: shared 5: 5.. 5: 36010227.. 36010227: 1: 35315683: shared 6: 6.. 9: 35396173.. 35396176: 4: 36010228: encoded,shared 7: 10.. 10: 35913574.. 35913574: 1: 35396177: shared 8: 11.. 11: 35305969.. 35305969: 1: 35913575: last,eof ./drivers/ata/.pata_sis.o.cmd: 9 extents found Note that corruption can only occur if the first extent (the inline extent at offset 0) is compressed (encoded). Uncompressed inline extents (like the one above) will not be corrupted due to the fix in commit 93c82d5750. If commit 93c82d5750 is reverted, you can get corruption on uncompressed files too. > Thanks, > Xin > > Sent: Saturday, December 10, 2016 at 9:16 PM > From: "Zygo Blaxell" <ce3g8jdj@umail.furryterror.org> > To: "Roman Mamedov" <rm@romanrm.net>, "Filipe Manana" <fdmanana@gmail.com> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH] btrfs: fix hole read corruption for compressed inline > extents > Ping? > > I know at least two people have read this patch, but it hasn't appeared in > the usual integration branches yet, and I've seen no actionable suggestion > to improve it. I've provided two non-overlapping rationales for it. > Is there something else you are looking for? > > This patch is a fix for a simple data corruption bug. It (or some > equivalent fix for the same bug) should be on its way to all stable > kernels starting from 2.6.32. > > Thanks > > On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote: > > On Mon, 28 Nov 2016 00:03:12 -0500 > > Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote: > > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > > index 8e3a5a2..b1314d6 100644 > > > --- a/fs/btrfs/inode.c > > > +++ b/fs/btrfs/inode.c > > > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct > btrfs_path *path, > > > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > > > ret = btrfs_decompress(compress_type, tmp, page, > > > extent_offset, inline_size, max_size); > > > + WARN_ON(max_size > PAGE_SIZE); > > > + if (max_size < PAGE_SIZE) { > > > + char *map = kmap(page); > > > + memset(map + max_size, 0, PAGE_SIZE - max_size); > > > + kunmap(page); > > > + } > > > kfree(tmp); > > > return ret; > > > } > > > > Wasn't this already posted as: > > > > btrfs: fix silent data corruption while reading compressed inline > extents > > [1]https://patchwork.kernel.org/patch/9371971/ > > > > but you don't indicate that's a V2 or something, and in fact the patch > seems > > exactly the same, just the subject and commit message are entirely > different. > > Quite confusing. > > > > -- > > With respect, > > Roman > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" > in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at [2]http://vger.kernel.org/majordomo-info.html > > References > > Visible links > 1. https://patchwork.kernel.org/patch/9371971/ > 2. http://vger.kernel.org/majordomo-info.html
Ping? This is still reproducible on 4.9.8. On Mon, Nov 28, 2016 at 12:03:12AM -0500, Zygo Blaxell wrote: > Commit c8b978188c ("Btrfs: Add zlib compression support") produces > data corruption when reading a file with a hole positioned after an > inline extent. btrfs_get_extent will return uninitialized kernel memory > instead of zero bytes in the hole. > > Commit 93c82d5750 ("Btrfs: zero page past end of inline file items") > fills the hole by memset to zero after *uncompressed* inline extents. > > This patch provides the missing memset for holes after *compressed* > inline extents. > > The offending holes appear in the wild and will appear during routine > data integrity audits (e.g. comparing backups against their originals). > They can also be created intentionally by fuzzing or crafting a filesystem > image. > > Holes like these are not intended to occur in btrfs; however, I tested > tagged kernels between v3.5 and the present, and found that all of them > can create such holes. Whether we like them or not, this kind of hole > is now part of the btrfs de-facto on-disk format, and we need to be able > to read such holes without an infoleak or wrong data. > > An example of a hole leading to data corruption: > > item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160 > inode generation 50 transid 50 size 47424 nbytes 49141 > block group 0 mode 100644 links 1 uid 0 gid 0 > rdev 0 flags 0x0(none) > item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20 > inode ref index 3 namelen 10 name: DB_File.so > item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362 > inline extent data size 1341 ram 4085 compress(zlib) > item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53 > extent data disk byte 5367308288 nr 20480 > extent data offset 0 nr 45056 ram 45056 > extent compression(zlib) > > Different data appears in userspace during each uncached read of the 10 > bytes between offset 4085 and 4095. The extent in item 63 is not long > enough to fill the first page of the file, so a memset is required to > fill the space between item 63 (ending at 4085) and item 64 (beginning > at 4096) with zero. > > Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > > --- > fs/btrfs/inode.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8e3a5a2..b1314d6 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, > max_size = min_t(unsigned long, PAGE_SIZE, max_size); > ret = btrfs_decompress(compress_type, tmp, page, > extent_offset, inline_size, max_size); > + WARN_ON(max_size > PAGE_SIZE); > + if (max_size < PAGE_SIZE) { > + char *map = kmap(page); > + memset(map + max_size, 0, PAGE_SIZE - max_size); > + kunmap(page); > + } > kfree(tmp); > return ret; > } > -- > 2.1.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Zygo, On Tue, Dec 13, 2016 at 01:40:13AM -0500, Zygo Blaxell wrote: > On Sun, Dec 11, 2016 at 10:56:59PM +0100, Xin Zhou wrote: > > Hi Zygo, > > > > Since the corruption happens after I/O and checksum, > > could it be possible to add some bug catcher code in debug build, to help > > narrowing down the issue? > > The corruption happens only during reads, after IO and checksum. > It happens at the point that I have patched, after decompression when > the decompressed data doesn't fill the extent and there is a hole after > the extent. No other code exists in btrfs to fill the hole that forms > in such cases. There is nothing left to narrow down. > > Chris Mason discovered the same bug in 2009 and fixed half of it in commit > 93c82d5750. His fix only fixed one of two cases where corruption occurs > (i.e. after uncompressed extents). The compressed extent case remained > unfixed until the present. > > To be fair to everyone who missed this bug for seven years: it was > quite a hard bug to spot. Over a period of 28 months I observed backup > verification failures and eliminated competing theories until only this > bug remained. > > There were at least two other bugs fixed between 2009 and 2014 which > would have also produced corrupted data in compressed files under > slightly different circumstances, so anyone looking for data corruption > in compressed extents would have thought they'd solved the problem at > least twice before now. > > There was a memset in *almost* the right spot in the code to fix the > corruption bug until early 2014. Ironically, that memset (which was > introduced in 2009) was itself the cause of another corruption bug. > The memset was removed in 166ae5a418 ("btrfs: fix inline compressed read > err corruption") but that commit removed the memset entirely instead of > fixing it. > Thanks for the nice analysis here, so I think btrfs does have this problem, I could reproduce this with fallocate if using 'page_poison=on' kernel command line (or enable CONFIG_PAGE_POISONING). Here are the steps, # touch foo # chattr +c foo # xfs_io -f -c "pwrite -W 0 1000" foo # xfs_io -f -c "falloc 4 8188" foo # od -x foo # echo 3 >/proc/sys/vm/drop_caches # od -x foo This produce the following on my box: 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000 0001760 0000 0000 0000 0000 0000 0000 0000 0000 * 0020000 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd * 0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d 0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400 0002000 435f 0056 5f74 6164 7400 645f 0062 5f74 (...) Regarding to this patch, - Although pg_offset is assumed to be zero, using (PAGE_SIZE - pg_offset) makes the code look saner, could you please add pg_offset to the check? - Could you please wrap my reproducer script into the commit log so that a later test could catch the point quickly? With those added, you can add Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8e3a5a2..b1314d6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path, max_size = min_t(unsigned long, PAGE_SIZE, max_size); ret = btrfs_decompress(compress_type, tmp, page, extent_offset, inline_size, max_size); + WARN_ON(max_size > PAGE_SIZE); + if (max_size < PAGE_SIZE) { + char *map = kmap(page); + memset(map + max_size, 0, PAGE_SIZE - max_size); + kunmap(page); + } kfree(tmp); return ret; }
Commit c8b978188c ("Btrfs: Add zlib compression support") produces data corruption when reading a file with a hole positioned after an inline extent. btrfs_get_extent will return uninitialized kernel memory instead of zero bytes in the hole. Commit 93c82d5750 ("Btrfs: zero page past end of inline file items") fills the hole by memset to zero after *uncompressed* inline extents. This patch provides the missing memset for holes after *compressed* inline extents. The offending holes appear in the wild and will appear during routine data integrity audits (e.g. comparing backups against their originals). They can also be created intentionally by fuzzing or crafting a filesystem image. Holes like these are not intended to occur in btrfs; however, I tested tagged kernels between v3.5 and the present, and found that all of them can create such holes. Whether we like them or not, this kind of hole is now part of the btrfs de-facto on-disk format, and we need to be able to read such holes without an infoleak or wrong data. An example of a hole leading to data corruption: item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160 inode generation 50 transid 50 size 47424 nbytes 49141 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 flags 0x0(none) item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20 inode ref index 3 namelen 10 name: DB_File.so item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362 inline extent data size 1341 ram 4085 compress(zlib) item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53 extent data disk byte 5367308288 nr 20480 extent data offset 0 nr 45056 ram 45056 extent compression(zlib) Different data appears in userspace during each uncached read of the 10 bytes between offset 4085 and 4095. The extent in item 63 is not long enough to fill the first page of the file, so a memset is required to fill the space between item 63 (ending at 4085) and item 64 (beginning at 4096) with zero. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> --- fs/btrfs/inode.c | 6 ++++++ 1 file changed, 6 insertions(+)