[RFC,0/5] Ext4: Add support for blocksize < pagesize for dioread_nolock

Message ID	20191016073711.4141-1-riteshh@linux.ibm.com (mailing list archive)
Headers	show Return-Path: <SRS0=CNkK=YJ=vger.kernel.org=linux-fsdevel-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <linux-fsdevel@vger.kernel.org> from <riteshh@linux.ibm.com>; Wed, 16 Oct 2019 08:37:20 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 16 Oct 2019 08:37:17 +0100 From: Ritesh Harjani <riteshh@linux.ibm.com> To: tytso@mit.edu, jack@suse.cz, linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, mbobrowski@mbobrowski.org, riteshh@linux.ibm.com Subject: [RFC 0/5] Ext4: Add support for blocksize < pagesize for dioread_nolock Date: Wed, 16 Oct 2019 13:07:06 +0530 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <20191016073711.4141-1-riteshh@linux.ibm.com> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk
Series	Ext4: Add support for blocksize < pagesize for dioread_nolock \| expand [RFC,0/5] Ext4: Add support for blocksize < pagesize for dioread_nolock [RFC,1/5] ext4: keep uniform naming convention for io & io_end variables [RFC,2/5] ext4: Add API to bring in support for unwritten io_end_vec conversion [RFC,3/5] ext4: Refactor mpage_map_and_submit_buffers function [RFC,4/5] ext4: Add support for blocksize < pagesize in dioread_nolock [RFC,5/5] ext4: Enable blocksize < pagesize for dioread_nolock

Ritesh Harjani Oct. 16, 2019, 7:37 a.m. UTC

This patch series adds the support for blocksize < pagesize for
dioread_nolock feature.

Since in case of blocksize < pagesize, we can have multiple
small buffers of page as unwritten extents, we need to
maintain a vector of these unwritten extents which needs
the conversion after the IO is complete. Thus, we maintain
a list of tuple <offset, size> pair (io_end_vec) for this &
traverse this list to do the unwritten to written conversion.

Appreciate any reviews/comments on this patches.

Tests completed
===============

All (which also passes in default config) "quick" group xfstests
are passing. Tested xfstests with below configurations.
	dioread_nolock with blocksize < pagesize
	dioread_nolock with blocksize == pagesize
	without dioread_nolock with blocksize < pagesize
	without dioread_nolock with blocksize == pagesize
ltp/fsx test with multiple iterations of 1 million ops
did not show any error.


About patches
=============

Patch 1 - 3: These are some cleanup and refactoring patches.
Patch 4: This patch adds the required support.
Patch 5: This patch removes the checks which was not allowing to mount
with dioread_nolock when blocksize != pagesize was true.

_Patches can be cleanly applied on today's linus tree master branch_

Ritesh Harjani (5):
  ext4: keep uniform naming convention for io & io_end variables
  ext4: Add API to bring in support for unwritten io_end_vec conversion
  ext4: Refactor mpage_map_and_submit_buffers function
  ext4: Add support for blocksize < pagesize in dioread_nolock
  ext4: Enable blocksize < pagesize for dioread_nolock

 fs/ext4/ext4.h    |  13 ++++-
 fs/ext4/extents.c |  49 ++++++++++++-----
 fs/ext4/inode.c   | 136 ++++++++++++++++++++++++++++++----------------
 fs/ext4/page-io.c | 110 ++++++++++++++++++++++++-------------
 fs/ext4/super.c   |  10 ----
 5 files changed, 208 insertions(+), 110 deletions(-)

Theodore Ts'o Oct. 23, 2019, 11:26 p.m. UTC | #1

Hi Ritesh,

I haven't had a chance to dig into the test failures yet, but FYI....
when I ran the auto test group in xfstests, I saw failures for
generic/219, generic 273, and generic/476 --- these errors did not
show up when running using a standard 4k blocksize on x86, and they
also did not show up when running dioread_nolock using a 4k blocksize.

So I tried running "generic/219 generic/273 generic/476" 30 times,
using in a Google Compute Engine VM, using gce-xfstests, and while I
wasn't able to get generic/219 to fail when run in isolation,
generic/273 seems to fail quite reliably, and generic/476 about a
third of the time.

How much testing have you done with these patches?

Thanks,

							- Ted

TESTRUNID: tytso-20191023144956
KERNEL:    kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64
CMDLINE:   --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476
CPUS:      2
MEM:       7680

ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds
  Failures: generic/273 generic/273 generic/273 generic/273
    generic/476 generic/273 generic/476 generic/273 generic/273
    generic/273 generic/476 generic/273 generic/476 generic/273
    generic/476 generic/273 generic/476 generic/273 generic/273
    generic/273 generic/273 generic/273 generic/476 generic/273
    generic/273 generic/273 generic/273 generic/476 generic/273
    generic/476 generic/273 generic/476 generic/273 generic/273
    generic/273 generic/273 generic/273 generic/273 generic/273
    generic/476 generic/273 generic/476
Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s

Ritesh Harjani Oct. 24, 2019, 1:12 a.m. UTC | #2

Hello Ted,

On 10/24/19 4:56 AM, Theodore Y. Ts'o wrote:
> Hi Ritesh,
> 
> I haven't had a chance to dig into the test failures yet, but FYI....
> when I ran the auto test group in xfstests, I saw failures for
> generic/219, generic 273, and generic/476 --- these errors did not
> show up when running using a standard 4k blocksize on x86, and they
> also did not show up when running dioread_nolock using a 4k blocksize.
> 

Sorry about that. Were these 3 the only tests you saw to be failing,
or there were more?


> So I tried running "generic/219 generic/273 generic/476" 30 times,
> using in a Google Compute Engine VM, using gce-xfstests, and while I
> wasn't able to get generic/219 to fail when run in isolation,
> generic/273 seems to fail quite reliably, and generic/476 about a
> third of the time.
> 
> How much testing have you done with these patches?

I did test "quick" group of xfstests & ltp/fsx tests with the posted
version. And had tested a full suite of xfstests with one of my previous 
version.
I guess I was comparing against 1K blocksize without dioread_nolock.
But as I think more about it, I may need to compare against vanilla
kernel with 1K blocksize even without dioread_nolock. Since there may
be some changes in blocksize < pagesize path with this patch.


Also I see these tests(generic/273 & generic/476) are not part
of quick group. Let me check more about these failing tests at my end.

Thanks for your inputs.

-ritesh


> 
> Thanks,
> 
> 							- Ted
> 
> TESTRUNID: tytso-20191023144956
> KERNEL:    kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64
> CMDLINE:   --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476
> CPUS:      2
> MEM:       7680
> 
> ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds
>    Failures: generic/273 generic/273 generic/273 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/476 generic/273 generic/476 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/273 generic/273 generic/476 generic/273
>      generic/273 generic/273 generic/273 generic/476 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/273 generic/273 generic/273 generic/273
>      generic/476 generic/273 generic/476
> Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s
>

Ritesh Harjani Oct. 29, 2019, 7:19 a.m. UTC | #3

Hello Ted,

I tried reproducing issue generic/273 & generic/476 on vanilla 5.4-rc4.
I could see that both of these could be very well reproduced with same
symptoms on vanilla 5.4-rc4 as well with 1K blocksize on x86
(i.e. without this patch series).

Although when I tried reproducing generic/273 & generic/476 on ppc64le
(64K pagesize & 4K blocksize) both with & without these patches,
I could not reproduce this issue on this platform.

As of now, I haven't gone deep into analyzing failure for generic/476,
but generic/273 seems to be failing with 1K blocksize because of
"No space left in device". This is happening, since the free inode count
is exhausted (mostly only with 1K blocksize).
I will have to look into this on whether it needs any xfstest fixing or 
if there is something else going on.

So it looks like these failed tests does not seem to be because of this
patch series. But these are broken in general for at least 1K blocksize.

Also as an FYI, it seems generic/388 is also broken for blocksize <
pagesize for both platforms. So I will be planning to look into these 
failures (generic/273 generic/388 generic/476) in parallel.

Do you think we can work on these failing tests separately, since it 
does not look to be failing because of these patches and go ahead in
reviewing this current patch series?

-ritesh

On 10/24/19 4:56 AM, Theodore Y. Ts'o wrote:
> Hi Ritesh,
> 
> I haven't had a chance to dig into the test failures yet, but FYI....
> when I ran the auto test group in xfstests, I saw failures for
> generic/219, generic 273, and generic/476 --- these errors did not
> show up when running using a standard 4k blocksize on x86, and they
> also did not show up when running dioread_nolock using a 4k blocksize.
> 
> So I tried running "generic/219 generic/273 generic/476" 30 time
> using in a Google Compute Engine VM, using gce-xfstests, and while I
> wasn't able to get generic/219 to fail when run in isolation,
> generic/273 seems to fail quite reliably, and generic/476 about a
> third of the time.
> 
> How much testing have you done with these patches?
> 
> Thanks,
> 
> 							- Ted
> 
> TESTRUNID: tytso-20191023144956
> KERNEL:    kernel 5.4.0-rc3-xfstests-00005-g39b811602906 #1244 SMP Wed Oct 23 11:30:25 EDT 2019 x86_64
> CMDLINE:   --update-files -C 30 -c dioread_nolock_1k generic/219 generic/273 generic/476
> CPUS:      2
> MEM:       7680
> 
> ext4/dioread_nolock_1k: 90 tests, 42 failures, 10434 seconds
>    Failures: generic/273 generic/273 generic/273 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/476 generic/273 generic/476 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/273 generic/273 generic/476 generic/273
>      generic/273 generic/273 generic/273 generic/476 generic/273
>      generic/476 generic/273 generic/476 generic/273 generic/273
>      generic/273 generic/273 generic/273 generic/273 generic/273
>      generic/476 generic/273 generic/476
> Totals: 90 tests, 0 skipped, 42 failures, 0 errors, 10434s
>

Theodore Ts'o Nov. 3, 2019, 7:16 p.m. UTC | #4

On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
> 
> So it looks like these failed tests does not seem to be because of this
> patch series. But these are broken in general for at least 1K blocksize.

Agreed, I failed to add them to the exclude list for diread_nolock_1k.  
Thanks for pointing that out!   

After looking through these patches, it looks good.  So, I've landed
this series on the ext4 git tree.

There are some potential conflicts with Matthew's DIO using imap patch
set.  I tried resolving them in the obvious way (see the tt/mb-dio
branch[1] on ext4.git), and unfortunately, there is a flaky test
failure with generic/270 --- 2 times out 30 runs of generic/270, the
file system is left inconsistent, with problems found in the block
allocation bitmap.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio

I've verified that generic/270 isn't a problem on -rc3, and it's not a
problem with just your patch series.  So, it's almost certain it's
because I screwed up the merge.  I applied each of Matthew's patch one
at a time, and conflict was in changes in ext4_end_io_dio, which is
dropped in Matthew's patch.  It wasn't obvious though where the
dioread-nolock-1k change should be applied in Matthew's patch series.
Could you take a look?  Thanks!!

> Also as an FYI, it seems generic/388 is also broken for blocksize <
> pagesize for both platforms. So I will be planning to look into these
> failures (generic/273 generic/388 generic/476) in parallel.

generic/388 is broken in a flaky fashion on all of the tests.  That's
a shutdown test, and it's a known issue, having to do with how we
forcibly shut down the journalling machinery not being quite right.
Since unclean power off and/or dropped fibre channel/iSCSI case is
handled correctly, this hasn't been high on the priority list to track
down and fix.

> Do you think we can work on these failing tests separately, since it does
> not look to be failing because of these patches and go ahead in
> reviewing this current patch series?

Oh, I agree, those failures are pre-existing test failures, and so it
shouldn't and doens't block this series.

Thanks for your work on this feature!

					- Ted

Matthew Bobrowski Nov. 4, 2019, 10:16 a.m. UTC | #5

On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote:
> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
> > 
> > So it looks like these failed tests does not seem to be because of this
> > patch series. But these are broken in general for at least 1K blocksize.
> 
> Agreed, I failed to add them to the exclude list for diread_nolock_1k.  
> Thanks for pointing that out!   
> 
> After looking through these patches, it looks good.  So, I've landed
> this series on the ext4 git tree.
> 
> There are some potential conflicts with Matthew's DIO using imap patch
> set.  I tried resolving them in the obvious way (see the tt/mb-dio
> branch[1] on ext4.git), and unfortunately, there is a flaky test
> failure with generic/270 --- 2 times out 30 runs of generic/270, the
> file system is left inconsistent, with problems found in the block
> allocation bitmap.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio
> 
> I've verified that generic/270 isn't a problem on -rc3, and it's not a
> problem with just your patch series.  So, it's almost certain it's
> because I screwed up the merge.  I applied each of Matthew's patch one
> at a time, and conflict was in changes in ext4_end_io_dio, which is
> dropped in Matthew's patch.  It wasn't obvious though where the
> dioread-nolock-1k change should be applied in Matthew's patch series.
> Could you take a look?  Thanks!!

Hang on a second.

Are we not prematurely merging this series in with master? I thought
that this is something that should've come after the iomap direct I/O
port, no? The use of io_end's within the new direct I/O implementation
are effectively redundant...

/M

Ritesh Harjani Nov. 4, 2019, 10:37 a.m. UTC | #6

On 11/4/19 3:46 PM, Matthew Bobrowski wrote:
> On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote:
>> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
>>>
>>> So it looks like these failed tests does not seem to be because of this
>>> patch series. But these are broken in general for at least 1K blocksize.
>>
>> Agreed, I failed to add them to the exclude list for diread_nolock_1k.
>> Thanks for pointing that out!
>>
>> After looking through these patches, it looks good.  So, I've landed
>> this series on the ext4 git tree.
>>
>> There are some potential conflicts with Matthew's DIO using imap patch
>> set.  I tried resolving them in the obvious way (see the tt/mb-dio
>> branch[1] on ext4.git), and unfortunately, there is a flaky test
>> failure with generic/270 --- 2 times out 30 runs of generic/270, the
>> file system is left inconsistent, with problems found in the block
>> allocation bitmap.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio
>>
>> I've verified that generic/270 isn't a problem on -rc3, and it's not a
>> problem with just your patch series.  So, it's almost certain it's
>> because I screwed up the merge.  I applied each of Matthew's patch one
>> at a time, and conflict was in changes in ext4_end_io_dio, which is
>> dropped in Matthew's patch.  It wasn't obvious though where the
>> dioread-nolock-1k change should be applied in Matthew's patch series.
>> Could you take a look?  Thanks!!
> 
> Hang on a second.
> 
> Are we not prematurely merging this series in with master? I thought
> that this is something that should've come after the iomap direct I/O
> port, no? The use of io_end's within the new direct I/O implementation
> are effectively redundant...

It sure may be giving a merge conflict (due to io_end structure).
But this dioread_nolock series was not dependent over iomap series.

-ritesh

Ritesh Harjani Nov. 4, 2019, 10:43 a.m. UTC | #7

On 11/4/19 12:46 AM, Theodore Y. Ts'o wrote:
> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
>>
>> So it looks like these failed tests does not seem to be because of this
>> patch series. But these are broken in general for at least 1K blocksize.
> 
> Agreed, I failed to add them to the exclude list for diread_nolock_1k.
> Thanks for pointing that out!
> 
> After looking through these patches, it looks good.  So, I've landed
> this series on the ext4 git tree.
> 
> There are some potential conflicts with Matthew's DIO using imap patch
> set.  I tried resolving them in the obvious way (see the tt/mb-dio
> branch[1] on ext4.git), and unfortunately, there is a flaky test
> failure with generic/270 --- 2 times out 30 runs of generic/270, the
> file system is left inconsistent, with problems found in the block
> allocation bitmap.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio
> 
> I've verified that generic/270 isn't a problem on -rc3, and it's not a
> problem with just your patch series.  So, it's almost certain it's
> because I screwed up the merge.  I applied each of Matthew's patch one
> at a time, and conflict was in changes in ext4_end_io_dio, which is
> dropped in Matthew's patch.  It wasn't obvious though where the
> dioread-nolock-1k change should be applied in Matthew's patch series.
> Could you take a look?  Thanks!!

Sure. Let me take a look at this and get back.
Meanwhile, if possible could you please help with what xfstest config is
failing and the failure details, if possible. Just curious to know
about it.

-ritesh

Matthew Bobrowski Nov. 4, 2019, 10:49 a.m. UTC | #8

On Mon, Nov 04, 2019 at 04:07:56PM +0530, Ritesh Harjani wrote:
> On 11/4/19 3:46 PM, Matthew Bobrowski wrote:
> > On Sun, Nov 03, 2019 at 02:16:06PM -0500, Theodore Y. Ts'o wrote:
> > > On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
> > > > 
> > > > So it looks like these failed tests does not seem to be because of this
> > > > patch series. But these are broken in general for at least 1K blocksize.
> > > 
> > > Agreed, I failed to add them to the exclude list for diread_nolock_1k.
> > > Thanks for pointing that out!
> > > 
> > > After looking through these patches, it looks good.  So, I've landed
> > > this series on the ext4 git tree.
> > > 
> > > There are some potential conflicts with Matthew's DIO using imap patch
> > > set.  I tried resolving them in the obvious way (see the tt/mb-dio
> > > branch[1] on ext4.git), and unfortunately, there is a flaky test
> > > failure with generic/270 --- 2 times out 30 runs of generic/270, the
> > > file system is left inconsistent, with problems found in the block
> > > allocation bitmap.
> > > 
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio
> > > 
> > > I've verified that generic/270 isn't a problem on -rc3, and it's not a
> > > problem with just your patch series.  So, it's almost certain it's
> > > because I screwed up the merge.  I applied each of Matthew's patch one
> > > at a time, and conflict was in changes in ext4_end_io_dio, which is
> > > dropped in Matthew's patch.  It wasn't obvious though where the
> > > dioread-nolock-1k change should be applied in Matthew's patch series.
> > > Could you take a look?  Thanks!!
> > 
> > Hang on a second.
> > 
> > Are we not prematurely merging this series in with master? I thought
> > that this is something that should've come after the iomap direct I/O
> > port, no? The use of io_end's within the new direct I/O implementation
> > are effectively redundant...
> 
> It sure may be giving a merge conflict (due to io_end structure).
> But this dioread_nolock series was not dependent over iomap series.

Uh ha. Well, there's been a chunk of code injected into
ext4_end_io_dio() here and by me removing it, I'm not entirely sure
what the downstream effects will be for this specific change...

/M

Ritesh Harjani Nov. 4, 2019, 11:59 a.m. UTC | #9

On 11/4/19 4:13 PM, Ritesh Harjani wrote:
> 
> 
> On 11/4/19 12:46 AM, Theodore Y. Ts'o wrote:
>> On Tue, Oct 29, 2019 at 12:49:24PM +0530, Ritesh Harjani wrote:
>>>
>>> So it looks like these failed tests does not seem to be because of this
>>> patch series. But these are broken in general for at least 1K blocksize.
>>
>> Agreed, I failed to add them to the exclude list for diread_nolock_1k.
>> Thanks for pointing that out!
>>
>> After looking through these patches, it looks good.  So, I've landed
>> this series on the ext4 git tree.
>>
>> There are some potential conflicts with Matthew's DIO using imap patch
>> set.  I tried resolving them in the obvious way (see the tt/mb-dio
>> branch[1] on ext4.git), and unfortunately, there is a flaky test
>> failure with generic/270 --- 2 times out 30 runs of generic/270, the
>> file system is left inconsistent, with problems found in the block
>> allocation bitmap.
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=tt/mb-dio 
>>
>>
>> I've verified that generic/270 isn't a problem on -rc3, and it's not a
>> problem with just your patch series.  So, it's almost certain it's
>> because I screwed up the merge.  I applied each of Matthew's patch one
>> at a time, and conflict was in changes in ext4_end_io_dio, which is
>> dropped in Matthew's patch.  It wasn't obvious though where the
>> dioread-nolock-1k change should be applied in Matthew's patch series.
>> Could you take a look?  Thanks!!

Looked into the above mentioned branch.
I see the patches of iomap patch series were properly
applied. It is ok to completely remove function "ext4_end_io_dio" and
directly call for "ext4_convert_unwritten_extents" from 
"ext4_dio_write_end_io" (which is also done by default).

Actually io_end_vec in dioread_nolock series was used for AIO
DIO, since AIO+DIO was already using io_end structure and all
io_end usage was moved to use io_end_vec.
But since in iomap, we don't use io_end structure so it is ok
to directly call for ext4_convert_unwritten_extents with offset &
size argument.

hmm, I will try and run this generic/270 at my end to see what is going
wrong with this. It would be good if you could share the xfstest config
under which this is failing & if possible the failed logs/report.

Meanwhile, I will test generic/270 on your given branch with
dioread_nolock & 1K bs configuration.

-ritesh

Theodore Ts'o Nov. 4, 2019, 4:08 p.m. UTC | #10

On Mon, Nov 04, 2019 at 09:49:14PM +1100, Matthew Bobrowski wrote:
> > It sure may be giving a merge conflict (due to io_end structure).
> > But this dioread_nolock series was not dependent over iomap series.
> 
> Uh ha. Well, there's been a chunk of code injected into
> ext4_end_io_dio() here and by me removing it, I'm not entirely sure
> what the downstream effects will be for this specific change...

Yeah, that was probably my failure to do the merge correctly; I'm
hoping that Ritesh will be able to fix that up.  If not we can throw
an "experimental" config to enable dioread_nolock on subpage
blocksizes, just to warn people that under some extreme workloads,
they might end up corrupting their allocation bitmap, which then might
lead to data loss.  I suspect it would actually work fine for most
users; but out of paranoia, if we can't figure out the generic/270
failure before the merge window, we can just make dioread_nolock_1k
experimental for now.

  	      	     	       		- Ted

Jan Kara Nov. 6, 2019, 5:23 p.m. UTC | #11

On Wed 16-10-19 13:07:06, Ritesh Harjani wrote:
> This patch series adds the support for blocksize < pagesize for
> dioread_nolock feature.
> 
> Since in case of blocksize < pagesize, we can have multiple
> small buffers of page as unwritten extents, we need to
> maintain a vector of these unwritten extents which needs
> the conversion after the IO is complete. Thus, we maintain
> a list of tuple <offset, size> pair (io_end_vec) for this &
> traverse this list to do the unwritten to written conversion.
> 
> Appreciate any reviews/comments on this patches.

I know Ted has merged the patches already so this is just informational but
I've read the patches and they look fine to me. Thanks for the work! I was
just thinking that we could actually make the vector tracking more
efficient because the io_end always looks like:

one-big-extent-to-fully-write + whatever it takes to fully write out the
last page

So your vectors could be also expressed as "extent to write" + bitmap of
blocks written in the last page. And 64-bits are enough for the bitmap for
anything ext4 supports so we could easily save allocation of ioend_vec etc.
Just a suggestion.

								Honza

Ritesh Harjani Nov. 7, 2019, 11:15 a.m. UTC | #12

On 11/6/19 10:53 PM, Jan Kara wrote:
> On Wed 16-10-19 13:07:06, Ritesh Harjani wrote:
>> This patch series adds the support for blocksize < pagesize for
>> dioread_nolock feature.
>>
>> Since in case of blocksize < pagesize, we can have multiple
>> small buffers of page as unwritten extents, we need to
>> maintain a vector of these unwritten extents which needs
>> the conversion after the IO is complete. Thus, we maintain
>> a list of tuple <offset, size> pair (io_end_vec) for this &
>> traverse this list to do the unwritten to written conversion.
>>
>> Appreciate any reviews/comments on this patches.
> 
> I know Ted has merged the patches already so this is just informational but
> I've read the patches and they look fine to me. Thanks for the work! I was

Appreciate your help too for valuable feedback & pointers at various places.


> just thinking that we could actually make the vector tracking more
> efficient because the io_end always looks like:
> 
> one-big-extent-to-fully-write + whatever it takes to fully write out the
> last page
> 
> So your vectors could be also expressed as "extent to write" + bitmap of
> blocks written in the last page. And 64-bits are enough for the bitmap for
> anything ext4 supports so we could easily save allocation of ioend_vec etc.
> Just a suggestion.

Yes, sounds good to me. Although slab allocations are also fast.
However I agree this should be more efficient and will also avoid the
list management and or list pointer traversal.

Sure, will work over this optimization once I get some closure on some
ongoing open items.


Thanks & appreciate your feedback.
ritesh

[RFC,0/5] Ext4: Add support for blocksize < pagesize for dioread_nolock

Message

Comments