Message ID | 20191016073711.4141-1-riteshh@linux.ibm.com (mailing list archive) |
---|---|
Headers | show |
Series | Ext4: Add support for blocksize < pagesize for dioread_nolock | expand |
Hi Ritesh, I haven't had a chance to dig into the test failures yet, but FYI.... when I ran the auto test group in xfstests, I saw failures for generic/219, generic 273, and generic/476 --- these errors did not show up when running using a standard 4k blocksize on x86, and they also did not show up when running dioread_nolock using a 4k blocksize. So I tried running "generic/219 generic/273 generic/476" 30 times, using in a Google Compute Engine VM, using gce-xfstests, and while I wasn't able to get generic/219 to fail when run in isolation, generic/273 seems to fail quite reliably, and generic/476 about a third of the time. How much testing have you done with these patches? Thanks, - Ted TESTRUNID: tytso-20191023144956 KERNEL: kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64 CMDLINE: --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476 CPUS: 2 MEM: 7680 ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds Failures: generic/273 generic/273 generic/273 generic/273 generic/476 generic/273 generic/476 generic/273 generic/273 generic/273 generic/476 generic/273 generic/476 generic/273 generic/476 generic/273 generic/476 generic/273 generic/273 generic/273 generic/273 generic/273 generic/476 generic/273 generic/273 generic/273 generic/273 generic/476 generic/273 generic/476 generic/273 generic/476 generic/273 generic/273 generic/273 generic/273 generic/273 generic/273 generic/273 generic/476 generic/273 generic/476 Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s
Hello Ted, On 10/24/19 4:56 AM, Theodore Y. Ts'o wrote: > Hi Ritesh, > > I haven't had a chance to dig into the test failures yet, but FYI.... > when I ran the auto test group in xfstests, I saw failures for > generic/219, generic 273, and generic/476 --- these errors did not > show up when running using a standard 4k blocksize on x86, and they > also did not show up when running dioread_nolock using a 4k blocksize. > Sorry about that. Were these 3 the only tests you saw to be failing, or there were more? > So I tried running "generic/219 generic/273 generic/476" 30 times, > using in a Google Compute Engine VM, using gce-xfstests, and while I > wasn't able to get generic/219 to fail when run in isolation, > generic/273 seems to fail quite reliably, and generic/476 about a > third of the time. > > How much testing have you done with these patches? I did test "quick" group of xfstests & ltp/fsx tests with the posted version. And had tested a full suite of xfstests with one of my previous version. I guess I was comparing against 1K blocksize without dioread_nolock. But as I think more about it, I may need to compare against vanilla kernel with 1K blocksize even without dioread_nolock. Since there may be some changes in blocksize < pagesize path with this patch. Also I see these tests(generic/273 & generic/476) are not part of quick group. Let me check more about these failing tests at my end. Thanks for your inputs. -ritesh > > Thanks, > > - Ted > > TESTRUNID: tytso-20191023144956 > KERNEL: kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64 > CMDLINE: --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476 > CPUS: 2 > MEM: 7680 > > ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds > Failures: generic/273 generic/273 generic/273 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/476 generic/273 generic/476 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/273 generic/273 generic/476 generic/273 > generic/273 generic/273 generic/273 generic/476 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/273 generic/273 generic/273 generic/273 > generic/476 generic/273 generic/476 > Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s >
Hello Ted, I tried reproducing issue generic/273 & generic/476 on vanilla 5.4-rc4. I could see that both of these could be very well reproduced with same symptoms on vanilla 5.4-rc4 as well with 1K blocksize on x86 (i.e. without this patch series). Although when I tried reproducing generic/273 & generic/476 on ppc64le (64K pagesize & 4K blocksize) both with & without these patches, I could not reproduce this issue on this platform. As of now, I haven't gone deep into analyzing failure for generic/476, but generic/273 seems to be failing with 1K blocksize because of "No space left in device". This is happening, since the free inode count is exhausted (mostly only with 1K blocksize). I will have to look into this on whether it needs any xfstest fixing or if there is something else going on. So it looks like these failed tests does not seem to be because of this patch series. But these are broken in general for at least 1K blocksize. Also as an FYI, it seems generic/388 is also broken for blocksize < pagesize for both platforms. So I will be planning to look into these failures (generic/273 generic/388 generic/476) in parallel. Do you think we can work on these failing tests separately, since it does not look to be failing because of these patches and go ahead in reviewing this current patch series? -ritesh On 10/24/19 4:56 AM, Theodore Y. Ts'o wrote: > Hi Ritesh, > > I haven't had a chance to dig into the test failures yet, but FYI.... > when I ran the auto test group in xfstests, I saw failures for > generic/219, generic 273, and generic/476 --- these errors did not > show up when running using a standard 4k blocksize on x86, and they > also did not show up when running dioread_nolock using a 4k blocksize. > > So I tried running "generic/219 generic/273 generic/476" 30 time > using in a Google Compute Engine VM, using gce-xfstests, and while I > wasn't able to get generic/219 to fail when run in isolation, > generic/273 seems to fail quite reliably, and generic/476 about a > third of the time. > > How much testing have you done with these patches? > > Thanks, > > - Ted > > TESTRUNID: tytso-20191023144956 > KERNEL: kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64 > CMDLINE: --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476 > CPUS: 2 > MEM: 7680 > > ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds > Failures: generic/273 generic/273 generic/273 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/476 generic/273 generic/476 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/273 generic/273 generic/476 generic/273 > generic/273 generic/273 generic/273 generic/476 generic/273 > generic/476 generic/273 generic/476 generic/273 generic/273 > generic/273 generic/273 generic/273 generic/273 generic/273 > generic/476 generic/273 generic/476 > Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s >
On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: > > So it looks like these failed tests does not seem to be because of this > patch series. But these are broken in general for at least 1K blocksize. Agreed, I failed to add them to the exclude list for diread_nolock_1k. Thanks for pointing that out! After looking through these patches, it looks good. So, I've landed this series on the ext4 git tree. There are some potential conflicts with Matthew's DIO using imap patch set. I tried resolving them in the obvious way (see the tt/mb-dio branch[1] on ext4.git), and unfortunately, there is a flaky test failure with generic/270 --- 2 times out 30 runs of generic/270, the file system is left inconsistent, with problems found in the block allocation bitmap. [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio I've verified that generic/270 isn't a problem on -rc3, and it's not a problem with just your patch series. So, it's almost certain it's because I screwed up the merge. I applied each of Matthew's patch one at a time, and conflict was in changes in ext4_end_io_dio, which is dropped in Matthew's patch. It wasn't obvious though where the dioread-nolock-1k change should be applied in Matthew's patch series. Could you take a look? Thanks!! > Also as an FYI, it seems generic/388 is also broken for blocksize < > pagesize for both platforms. So I will be planning to look into these > failures (generic/273 generic/388 generic/476) in parallel. generic/388 is broken in a flaky fashion on all of the tests. That's a shutdown test, and it's a known issue, having to do with how we forcibly shut down the journalling machinery not being quite right. Since unclean power off and/or dropped fibre channel/iSCSI case is handled correctly, this hasn't been high on the priority list to track down and fix. > Do you think we can work on these failing tests separately, since it does > not look to be failing because of these patches and go ahead in > reviewing this current patch series? Oh, I agree, those failures are pre-existing test failures, and so it shouldn't and doens't block this series. Thanks for your work on this feature! - Ted
On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote: > On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: > > > > So it looks like these failed tests does not seem to be because of this > > patch series. But these are broken in general for at least 1K blocksize. > > Agreed, I failed to add them to the exclude list for diread_nolock_1k. > Thanks for pointing that out! > > After looking through these patches, it looks good. So, I've landed > this series on the ext4 git tree. > > There are some potential conflicts with Matthew's DIO using imap patch > set. I tried resolving them in the obvious way (see the tt/mb-dio > branch[1] on ext4.git), and unfortunately, there is a flaky test > failure with generic/270 --- 2 times out 30 runs of generic/270, the > file system is left inconsistent, with problems found in the block > allocation bitmap. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio > > I've verified that generic/270 isn't a problem on -rc3, and it's not a > problem with just your patch series. So, it's almost certain it's > because I screwed up the merge. I applied each of Matthew's patch one > at a time, and conflict was in changes in ext4_end_io_dio, which is > dropped in Matthew's patch. It wasn't obvious though where the > dioread-nolock-1k change should be applied in Matthew's patch series. > Could you take a look? Thanks!! Hang on a second. Are we not prematurely merging this series in with master? I thought that this is something that should've come after the iomap direct I/O port, no? The use of io_end's within the new direct I/O implementation are effectively redundant... /M
On 11/4/19 3:46 PM, Matthew Bobrowski wrote: > On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote: >> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: >>> >>> So it looks like these failed tests does not seem to be because of this >>> patch series. But these are broken in general for at least 1K blocksize. >> >> Agreed, I failed to add them to the exclude list for diread_nolock_1k. >> Thanks for pointing that out! >> >> After looking through these patches, it looks good. So, I've landed >> this series on the ext4 git tree. >> >> There are some potential conflicts with Matthew's DIO using imap patch >> set. I tried resolving them in the obvious way (see the tt/mb-dio >> branch[1] on ext4.git), and unfortunately, there is a flaky test >> failure with generic/270 --- 2 times out 30 runs of generic/270, the >> file system is left inconsistent, with problems found in the block >> allocation bitmap. >> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio >> >> I've verified that generic/270 isn't a problem on -rc3, and it's not a >> problem with just your patch series. So, it's almost certain it's >> because I screwed up the merge. I applied each of Matthew's patch one >> at a time, and conflict was in changes in ext4_end_io_dio, which is >> dropped in Matthew's patch. It wasn't obvious though where the >> dioread-nolock-1k change should be applied in Matthew's patch series. >> Could you take a look? Thanks!! > > Hang on a second. > > Are we not prematurely merging this series in with master? I thought > that this is something that should've come after the iomap direct I/O > port, no? The use of io_end's within the new direct I/O implementation > are effectively redundant... It sure may be giving a merge conflict (due to io_end structure). But this dioread_nolock series was not dependent over iomap series. -ritesh
On 11/4/19 12:46 AM, Theodore Y. Ts'o wrote: > On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: >> >> So it looks like these failed tests does not seem to be because of this >> patch series. But these are broken in general for at least 1K blocksize. > > Agreed, I failed to add them to the exclude list for diread_nolock_1k. > Thanks for pointing that out! > > After looking through these patches, it looks good. So, I've landed > this series on the ext4 git tree. > > There are some potential conflicts with Matthew's DIO using imap patch > set. I tried resolving them in the obvious way (see the tt/mb-dio > branch[1] on ext4.git), and unfortunately, there is a flaky test > failure with generic/270 --- 2 times out 30 runs of generic/270, the > file system is left inconsistent, with problems found in the block > allocation bitmap. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio > > I've verified that generic/270 isn't a problem on -rc3, and it's not a > problem with just your patch series. So, it's almost certain it's > because I screwed up the merge. I applied each of Matthew's patch one > at a time, and conflict was in changes in ext4_end_io_dio, which is > dropped in Matthew's patch. It wasn't obvious though where the > dioread-nolock-1k change should be applied in Matthew's patch series. > Could you take a look? Thanks!! Sure. Let me take a look at this and get back. Meanwhile, if possible could you please help with what xfstest config is failing and the failure details, if possible. Just curious to know about it. -ritesh
On Mon, Nov 04, 2019 at 04:07:56PM +0530, Ritesh Harjani wrote: > On 11/4/19 3:46 PM, Matthew Bobrowski wrote: > > On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote: > > > On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: > > > > > > > > So it looks like these failed tests does not seem to be because of this > > > > patch series. But these are broken in general for at least 1K blocksize. > > > > > > Agreed, I failed to add them to the exclude list for diread_nolock_1k. > > > Thanks for pointing that out! > > > > > > After looking through these patches, it looks good. So, I've landed > > > this series on the ext4 git tree. > > > > > > There are some potential conflicts with Matthew's DIO using imap patch > > > set. I tried resolving them in the obvious way (see the tt/mb-dio > > > branch[1] on ext4.git), and unfortunately, there is a flaky test > > > failure with generic/270 --- 2 times out 30 runs of generic/270, the > > > file system is left inconsistent, with problems found in the block > > > allocation bitmap. > > > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio > > > > > > I've verified that generic/270 isn't a problem on -rc3, and it's not a > > > problem with just your patch series. So, it's almost certain it's > > > because I screwed up the merge. I applied each of Matthew's patch one > > > at a time, and conflict was in changes in ext4_end_io_dio, which is > > > dropped in Matthew's patch. It wasn't obvious though where the > > > dioread-nolock-1k change should be applied in Matthew's patch series. > > > Could you take a look? Thanks!! > > > > Hang on a second. > > > > Are we not prematurely merging this series in with master? I thought > > that this is something that should've come after the iomap direct I/O > > port, no? The use of io_end's within the new direct I/O implementation > > are effectively redundant... > > It sure may be giving a merge conflict (due to io_end structure). > But this dioread_nolock series was not dependent over iomap series. Uh ha. Well, there's been a chunk of code injected into ext4_end_io_dio() here and by me removing it, I'm not entirely sure what the downstream effects will be for this specific change... /M
On 11/4/19 4:13 PM, Ritesh Harjani wrote: > > > On 11/4/19 12:46 AM, Theodore Y. Ts'o wrote: >> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote: >>> >>> So it looks like these failed tests does not seem to be because of this >>> patch series. But these are broken in general for at least 1K blocksize. >> >> Agreed, I failed to add them to the exclude list for diread_nolock_1k. >> Thanks for pointing that out! >> >> After looking through these patches, it looks good. So, I've landed >> this series on the ext4 git tree. >> >> There are some potential conflicts with Matthew's DIO using imap patch >> set. I tried resolving them in the obvious way (see the tt/mb-dio >> branch[1] on ext4.git), and unfortunately, there is a flaky test >> failure with generic/270 --- 2 times out 30 runs of generic/270, the >> file system is left inconsistent, with problems found in the block >> allocation bitmap. >> >> [1] >> https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio >> >> >> I've verified that generic/270 isn't a problem on -rc3, and it's not a >> problem with just your patch series. So, it's almost certain it's >> because I screwed up the merge. I applied each of Matthew's patch one >> at a time, and conflict was in changes in ext4_end_io_dio, which is >> dropped in Matthew's patch. It wasn't obvious though where the >> dioread-nolock-1k change should be applied in Matthew's patch series. >> Could you take a look? Thanks!! Looked into the above mentioned branch. I see the patches of iomap patch series were properly applied. It is ok to completely remove function "ext4_end_io_dio" and directly call for "ext4_convert_unwritten_extents" from "ext4_dio_write_end_io" (which is also done by default). Actually io_end_vec in dioread_nolock series was used for AIO DIO, since AIO+DIO was already using io_end structure and all io_end usage was moved to use io_end_vec. But since in iomap, we don't use io_end structure so it is ok to directly call for ext4_convert_unwritten_extents with offset & size argument. hmm, I will try and run this generic/270 at my end to see what is going wrong with this. It would be good if you could share the xfstest config under which this is failing & if possible the failed logs/report. Meanwhile, I will test generic/270 on your given branch with dioread_nolock & 1K bs configuration. -ritesh
On Mon, Nov 04, 2019 at 09:49:14PM +1100, Matthew Bobrowski wrote: > > It sure may be giving a merge conflict (due to io_end structure). > > But this dioread_nolock series was not dependent over iomap series. > > Uh ha. Well, there's been a chunk of code injected into > ext4_end_io_dio() here and by me removing it, I'm not entirely sure > what the downstream effects will be for this specific change... Yeah, that was probably my failure to do the merge correctly; I'm hoping that Ritesh will be able to fix that up. If not we can throw an "experimental" config to enable dioread_nolock on subpage blocksizes, just to warn people that under some extreme workloads, they might end up corrupting their allocation bitmap, which then might lead to data loss. I suspect it would actually work fine for most users; but out of paranoia, if we can't figure out the generic/270 failure before the merge window, we can just make dioread_nolock_1k experimental for now. - Ted
On Wed 16-10-19 13:07:06, Ritesh Harjani wrote: > This patch series adds the support for blocksize < pagesize for > dioread_nolock feature. > > Since in case of blocksize < pagesize, we can have multiple > small buffers of page as unwritten extents, we need to > maintain a vector of these unwritten extents which needs > the conversion after the IO is complete. Thus, we maintain > a list of tuple <offset, size> pair (io_end_vec) for this & > traverse this list to do the unwritten to written conversion. > > Appreciate any reviews/comments on this patches. I know Ted has merged the patches already so this is just informational but I've read the patches and they look fine to me. Thanks for the work! I was just thinking that we could actually make the vector tracking more efficient because the io_end always looks like: one-big-extent-to-fully-write + whatever it takes to fully write out the last page So your vectors could be also expressed as "extent to write" + bitmap of blocks written in the last page. And 64-bits are enough for the bitmap for anything ext4 supports so we could easily save allocation of ioend_vec etc. Just a suggestion. Honza
On 11/6/19 10:53 PM, Jan Kara wrote: > On Wed 16-10-19 13:07:06, Ritesh Harjani wrote: >> This patch series adds the support for blocksize < pagesize for >> dioread_nolock feature. >> >> Since in case of blocksize < pagesize, we can have multiple >> small buffers of page as unwritten extents, we need to >> maintain a vector of these unwritten extents which needs >> the conversion after the IO is complete. Thus, we maintain >> a list of tuple <offset, size> pair (io_end_vec) for this & >> traverse this list to do the unwritten to written conversion. >> >> Appreciate any reviews/comments on this patches. > > I know Ted has merged the patches already so this is just informational but > I've read the patches and they look fine to me. Thanks for the work! I was Appreciate your help too for valuable feedback & pointers at various places. > just thinking that we could actually make the vector tracking more > efficient because the io_end always looks like: > > one-big-extent-to-fully-write + whatever it takes to fully write out the > last page > > So your vectors could be also expressed as "extent to write" + bitmap of > blocks written in the last page. And 64-bits are enough for the bitmap for > anything ext4 supports so we could easily save allocation of ioend_vec etc. > Just a suggestion. Yes, sounds good to me. Although slab allocations are also fast. However I agree this should be more efficient and will also avoid the list management and or list pointer traversal. Sure, will work over this optimization once I get some closure on some ongoing open items. Thanks & appreciate your feedback. ritesh