Message ID | cover.1635773880.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] Btrfs updates for 5.16 | expand |
On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@suse.com> wrote: > > There's a merge conflict due to the last minute 5.15 changes (kmap > reverts) and the conflict is not trivial. You don't say. I ended up just re-doing that resolution entirely, and as I did so, I think I found a bug in the original revert that caused the conflict in the first place. And since that revert made it into 5.15, I felt like I had to fix that bug first - and separately - so that the fix can be backported to stable. I then re-did my merge on top of that hopefully fixed state, and maybe it's correct. Or maybe I messed up entirely. I did end up comparing it to your other branch too, but that was equally as messy, apart from the "ok, I can mindlessly just take your side". And it was fairly different from what I had done in my merge resolution, so who knows. ANYWAY. What I'm trying to say is that you should look very very carefully at commits 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage") 037c50bfbeb3 ("Merge tag 'for-5.16-tag' of git://git.../linux") because I marked that first one for stable, and the second is obviously my entirely untested merge. It makes sense to me, but apart from "it builds", I've not actually tested any of it. This is all purely from looking at the code and trying to figure out what the RightThing(tm) is. Obviously the kmap thing tends to only be noticeable on 32-bit platforms, and that lzo_decompress_bio() bug also needs just the proper filesystem settings to trigger in the first place. Again - please take a careful look. Both at my merge and at that alleged kmap fix. Linus
The pull request you sent on Mon, 1 Nov 2021 17:45:56 +0100:
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.16-tag
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/037c50bfbeb33b4c74e120eef5b8b99d8f025418
Thank you!
On 2021/11/2 04:03, Linus Torvalds wrote: > On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@suse.com> wrote: >> >> There's a merge conflict due to the last minute 5.15 changes (kmap >> reverts) and the conflict is not trivial. > > You don't say. > > I ended up just re-doing that resolution entirely, and as I did so, I > think I found a bug in the original revert that caused the conflict in > the first place. > > And since that revert made it into 5.15, I felt like I had to fix that > bug first - and separately - so that the fix can be backported to > stable. > > I then re-did my merge on top of that hopefully fixed state, and maybe > it's correct. > > Or maybe I messed up entirely. > > I did end up comparing it to your other branch too, but that was > equally as messy, apart from the "ok, I can mindlessly just take your > side". > > And it was fairly different from what I had done in my merge > resolution, so who knows. > > ANYWAY. What I'm trying to say is that you should look very very > carefully at commits > > 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage") Since I'm doing the revert manually for lzo part, I double checked the code. It turns out, your fix is the same as the original version I sent to David (although not through the mail list). Full patch attached. @@ -345,8 +358,9 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb) (cur_in + LZO_LEN - 1) / sectorsize); cur_page = cb->compressed_pages[cur_in / PAGE_SIZE]; ASSERT(cur_page); - seg_len = read_compress_length(page_address(cur_page) + - offset_in_page(cur_in)); + kaddr = kmap(cur_page); + seg_len = read_compress_length(kaddr + offset_in_page(cur_in)); + kunmap(cur_page); cur_in += LZO_LEN; Thus it looks like by somehow my version is not applied? Thanks, Qu > 037c50bfbeb3 ("Merge tag 'for-5.16-tag' of git://git.../linux") > > because I marked that first one for stable, and the second is > obviously my entirely untested merge. > > It makes sense to me, but apart from "it builds", I've not actually > tested any of it. This is all purely from looking at the code and > trying to figure out what the RightThing(tm) is. > > Obviously the kmap thing tends to only be noticeable on 32-bit > platforms, and that lzo_decompress_bio() bug also needs just the > proper filesystem settings to trigger in the first place. > > Again - please take a careful look. Both at my merge and at that > alleged kmap fix. > > Linus >
On Tue, Nov 02, 2021 at 08:08:22AM +0800, Qu Wenruo wrote: > On 2021/11/2 04:03, Linus Torvalds wrote: > > On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@suse.com> wrote: > > it's correct. > > > > Or maybe I messed up entirely. > > > > I did end up comparing it to your other branch too, but that was > > equally as messy, apart from the "ok, I can mindlessly just take your > > side". > > > > And it was fairly different from what I had done in my merge > > resolution, so who knows. > > > > ANYWAY. What I'm trying to say is that you should look very very > > carefully at commits > > > > 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage") > > Since I'm doing the revert manually for lzo part, I double checked the code. > > It turns out, your fix is the same as the original version I sent to > David (although not through the mail list). > Full patch attached. > > @@ -345,8 +358,9 @@ int lzo_decompress_bio(struct list_head *ws, struct > compressed_bio *cb) > (cur_in + LZO_LEN - 1) / sectorsize); > cur_page = cb->compressed_pages[cur_in / PAGE_SIZE]; > ASSERT(cur_page); > - seg_len = read_compress_length(page_address(cur_page) + > - offset_in_page(cur_in)); > + kaddr = kmap(cur_page); > + seg_len = read_compress_length(kaddr + offset_in_page(cur_in)); > + kunmap(cur_page); > cur_in += LZO_LEN; > > Thus it looks like by somehow my version is not applied? Yeah, I had a look what you sent me, that version was correct. The mistake was on my side, a copy&paste error, sorry.
On Mon, Nov 01, 2021 at 01:03:25PM -0700, Linus Torvalds wrote: > On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@suse.com> wrote: > > > > There's a merge conflict due to the last minute 5.15 changes (kmap > > reverts) and the conflict is not trivial. > > You don't say. > > I ended up just re-doing that resolution entirely, and as I did so, I > think I found a bug in the original revert that caused the conflict in > the first place. > > And since that revert made it into 5.15, I felt like I had to fix that > bug first - and separately - so that the fix can be backported to > stable. > > I then re-did my merge on top of that hopefully fixed state, and maybe > it's correct. > > Or maybe I messed up entirely. > > I did end up comparing it to your other branch too, but that was > equally as messy, apart from the "ok, I can mindlessly just take your > side". > > And it was fairly different from what I had done in my merge > resolution, so who knows. Looks good to me, thanks for catching the bug. > ANYWAY. What I'm trying to say is that you should look very very > carefully at commits > > 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage") > 037c50bfbeb3 ("Merge tag 'for-5.16-tag' of git://git.../linux") > > because I marked that first one for stable, and the second is > obviously my entirely untested merge. > > It makes sense to me, but apart from "it builds", I've not actually > tested any of it. This is all purely from looking at the code and > trying to figure out what the RightThing(tm) is. > > Obviously the kmap thing tends to only be noticeable on 32-bit > platforms, and that lzo_decompress_bio() bug also needs just the > proper filesystem settings to trigger in the first place. > > Again - please take a careful look. Both at my merge and at that > alleged kmap fix. I checked the commits individually and in the source files, all the kmaps seem to be correctly paired with kunmaps. We'll do more tests too. I'm sorry that it turned out to be such mess.
On 2021/11/2 22:50, David Sterba wrote: > On Mon, Nov 01, 2021 at 01:03:25PM -0700, Linus Torvalds wrote: >> On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@suse.com> wrote: >>> >>> There's a merge conflict due to the last minute 5.15 changes (kmap >>> reverts) and the conflict is not trivial. >> >> You don't say. >> >> I ended up just re-doing that resolution entirely, and as I did so, I >> think I found a bug in the original revert that caused the conflict in >> the first place. >> >> And since that revert made it into 5.15, I felt like I had to fix that >> bug first - and separately - so that the fix can be backported to >> stable. >> >> I then re-did my merge on top of that hopefully fixed state, and maybe >> it's correct. >> >> Or maybe I messed up entirely. >> >> I did end up comparing it to your other branch too, but that was >> equally as messy, apart from the "ok, I can mindlessly just take your >> side". >> >> And it was fairly different from what I had done in my merge >> resolution, so who knows. > > Looks good to me, thanks for catching the bug. > >> ANYWAY. What I'm trying to say is that you should look very very >> carefully at commits >> >> 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage") >> 037c50bfbeb3 ("Merge tag 'for-5.16-tag' of git://git.../linux") >> >> because I marked that first one for stable, and the second is >> obviously my entirely untested merge. >> >> It makes sense to me, but apart from "it builds", I've not actually >> tested any of it. This is all purely from looking at the code and >> trying to figure out what the RightThing(tm) is. >> >> Obviously the kmap thing tends to only be noticeable on 32-bit >> platforms, and that lzo_decompress_bio() bug also needs just the >> proper filesystem settings to trigger in the first place. >> >> Again - please take a careful look. Both at my merge and at that >> alleged kmap fix. > > I checked the commits individually and in the source files, all the > kmaps seem to be correctly paired with kunmaps. We'll do more tests too. > I'm sorry that it turned out to be such mess. > OK, something is going weird now. As an extra step to make sure there is no leakage, I ran fstests with "compress=lzo" mount option, but the result can't be more ugly. For the master branch (which includes the fix), it has a very strange "leakage" behavior, at least in my x86 32bit VM. The used memory reported from free would just suddenly sky rocket during generic/027. (From regular 100~200M to 800M and more). Easily causing OOM for my 2G RAM VM. I originally think it's btrfs LZO, but nope. On v5.15 tag with the fix from Linus, generic/027 runs fine as expected with lzo compression. BTW, zlib/zstd compression runs fine. I guess there is something changed in MM layer causing the problem. Btrfs LZO has tons of quick kmap()/kunmap() pairs, unlike what we did in zlib/zstd, thus I guess that may be a triggering factor. Any clue? Thanks, Qu