
[RFCv2,1/3] iomap: Move creation of iomap_page early in __iomap_write_begin

Message ID d879704250b5f890a755873aefe3171cbd193ae9.1675093524.git.ritesh.list@gmail.com (mailing list archive)
State New, archived
Series iomap: Add support for subpage dirty state tracking to improve write performance

Commit Message

Ritesh Harjani (IBM) Jan. 30, 2023, 4:14 p.m. UTC
Before this commit[1], we used to call iomap_page_create() before
checking folio_test_uptodate() in __iomap_write_begin().

The problem is that commit[1] moved iop creation later, i.e. after checking
whether the folio is uptodate. If the folio is uptodate, it simply
returns and doesn't allocate an iop.
So during __iomap_write_begin() for bs < ps, we can end up with a folio
which is marked uptodate but does not have an iomap_page structure
allocated. (I think one of the ways this can happen is memory pressure,
where we can end up freeing the folio->private resource.)

Thus the iop structure only gets allocated at writeback time in
iomap_writepage_map(). I think this was not a problem until now, since
we only track uptodate status in the iop (no support for tracking
dirty bitmap status, which later patches will add), and we also end up
setting all the bits in iomap_page_create() if the page is uptodate.

[1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
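
For reference, a condensed sketch of the pre-dirty-tracking iomap_page_create()
that the commit message refers to (illustrative only; the GFP selection,
IOMAP_NOWAIT handling and error paths of the real fs/iomap/buffered-io.c are
simplified here). It shows why deferring the allocation to writeback was
harmless while only uptodate state was tracked: an uptodate folio simply gets
its whole uptodate bitmap filled.

	static struct iomap_page *
	iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
	{
		struct iomap_page *iop = to_iomap_page(folio);
		unsigned int nr_blocks = i_blocks_per_folio(inode, folio);

		if (iop || nr_blocks <= 1)
			return iop;

		/* simplified: the real code honours IOMAP_NOWAIT with GFP_NOWAIT */
		iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
			      GFP_NOFS | __GFP_NOFAIL);
		spin_lock_init(&iop->uptodate_lock);
		/*
		 * An already-uptodate folio has every sub-block uptodate, so the
		 * whole bitmap is filled here; that is why allocating the iop late
		 * costs nothing while only uptodate state is tracked.
		 */
		if (folio_test_uptodate(folio))
			bitmap_fill(iop->uptodate, nr_blocks);
		folio_attach_private(folio, iop);
		return iop;
	}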

Comments

Christoph Hellwig Jan. 30, 2023, 5:02 p.m. UTC | #1
On Mon, Jan 30, 2023 at 09:44:11PM +0530, Ritesh Harjani (IBM) wrote:
> The problem is that commit[1] moved iop creation later i.e. after checking for
> whether the folio is uptodate. And if the folio is uptodate, it simply
> returns and doesn't allocate a iop.
> Now what can happen is that during __iomap_write_begin() for bs < ps,
> there can be a folio which is marked uptodate but does not have a iomap_page
> structure allocated.
> (I think one of the reason it can happen is due to memory pressure, we
> can end up freeing folio->private resource).
> 
> Thus the iop structure will only gets allocated at the time of writeback
> in iomap_writepage_map(). This I think, was a not problem till now since
> we anyway only track uptodate status in iop (no support of tracking
> dirty bitmap status which later patches will add), and we also end up
> setting all the bits in iomap_page_create(), if the page is uptodate.

delayed iop allocation is a feature and not a bug.  We might have to
refine the criteria for sub-page dirty tracking, but in general having
the iop allocated is a memory and performance overhead and should be
avoided as much as possible.  In fact I still have some unfinished
work to allocate it even more lazily.
Ritesh Harjani (IBM) Jan. 30, 2023, 8:21 p.m. UTC | #2
On 23/01/30 09:02AM, Christoph Hellwig wrote:
> On Mon, Jan 30, 2023 at 09:44:11PM +0530, Ritesh Harjani (IBM) wrote:
> > The problem is that commit[1] moved iop creation later i.e. after checking for
> > whether the folio is uptodate. And if the folio is uptodate, it simply
> > returns and doesn't allocate a iop.
> > Now what can happen is that during __iomap_write_begin() for bs < ps,
> > there can be a folio which is marked uptodate but does not have a iomap_page
> > structure allocated.
> > (I think one of the reason it can happen is due to memory pressure, we
> > can end up freeing folio->private resource).
> >
> > Thus the iop structure will only gets allocated at the time of writeback
> > in iomap_writepage_map(). This I think, was a not problem till now since
> > we anyway only track uptodate status in iop (no support of tracking
> > dirty bitmap status which later patches will add), and we also end up
> > setting all the bits in iomap_page_create(), if the page is uptodate.
>
> delayed iop allocation is a feature and not a bug.  We might have to
> refine the criteria for sub-page dirty tracking, but in general having
> the iop allocates is a memory and performance overhead and should be
> avoided as much as possible.  In fact I still have some unfinished
> work to allocate it even more lazily.

So, what I meant here was that commit [1] changed the behavior/functionality
without intending to. I agree it's not a bug.
But when I added dirty bitmap tracking support, I couldn't understand for
some time why we were allocating the iop only at writeback time.
It turned out to be a small line change which slipped into commit [1].
Hence I made this a separate patch so that it doesn't slip through again
without getting noticed/reviewed.

Thanks for the info on the lazy allocation work. Even though it is not a bug,
with subpage dirty tracking in iop->state[], allocating the iop only at
writeback time might cause some performance degradation compared to
allocating the iop in ->write_begin() and marking the required dirty bit
ranges in ->write_end(), as this patch series does; a rough sketch follows
below. (Of course this is true only for the bs < ps case.)

[1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/
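
As a rough illustration of that approach, here is a hedged sketch of marking
only the written blocks dirty in ->write_end(). iop_set_range_dirty() is an
assumed helper name modelled on the later patches of this series, and the
surrounding function is condensed; this is not the series' exact code.

	static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
			size_t copied, struct folio *folio)
	{
		struct iomap_page *iop = to_iomap_page(folio);

		flush_dcache_folio(folio);
		if (unlikely(copied < len && !folio_test_uptodate(folio)))
			return 0;
		iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len);
		/* dirty only the blocks actually written, not the whole folio's iop */
		iop_set_range_dirty(inode, folio, offset_in_folio(folio, pos), copied);
		filemap_dirty_folio(inode->i_mapping, folio);
		return copied;
	}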


Thanks again for your quick review!!
-ritesh
Matthew Wilcox Jan. 30, 2023, 9 p.m. UTC | #3
On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > Thus the iop structure will only gets allocated at the time of writeback
> > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > we anyway only track uptodate status in iop (no support of tracking
> > > dirty bitmap status which later patches will add), and we also end up
> > > setting all the bits in iomap_page_create(), if the page is uptodate.
> >
> > delayed iop allocation is a feature and not a bug.  We might have to
> > refine the criteria for sub-page dirty tracking, but in general having
> > the iop allocates is a memory and performance overhead and should be
> > avoided as much as possible.  In fact I still have some unfinished
> > work to allocate it even more lazily.
> 
> So, what I meant here was that the commit[1] chaged the behavior/functionality
> without indenting to. I agree it's not a bug.

It didn't change the behaviour or functionality.  It broke your patches,
but it certainly doesn't deserve its own commit reverting it -- because
it's not wrong.

> But when I added dirty bitmap tracking support, I couldn't understand for
> sometime on why were we allocating iop only at the time of writeback.
> And it was due to a small line change which somehow slipped into this commit [1].
> Hence I made this as a seperate patch so that it doesn't slip through again w/o
> getting noticed/review.

It didn't "slip through".  It was intended.

> Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> with subpage dirty tracking in iop->state[], if we end up allocating iop only
> at the time of writeback, than that might cause some performance degradation
> compared to, if we allocat iop at ->write_begin() and mark the required dirty
> bit ranges in ->write_end(). Like how we do in this patch series.
> (Ofcourse it is true only for bs < ps use case).
> 
> [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/

You absolutely can allocate it in iomap_write_begin, but you can avoid
allocating it until writeback time if (pos, len) entirely overlap the
folio.  ie:

	if (pos > folio_pos(folio) ||
	    pos + len < folio_pos(folio) + folio_size(folio))
		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
Ritesh Harjani (IBM) Jan. 31, 2023, 6:37 p.m. UTC | #4
On 23/01/30 09:00PM, Matthew Wilcox wrote:
> On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > > Thus the iop structure will only gets allocated at the time of writeback
> > > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > > we anyway only track uptodate status in iop (no support of tracking
> > > > dirty bitmap status which later patches will add), and we also end up
> > > > setting all the bits in iomap_page_create(), if the page is uptodate.
> > >
> > > delayed iop allocation is a feature and not a bug.  We might have to
> > > refine the criteria for sub-page dirty tracking, but in general having
> > > the iop allocates is a memory and performance overhead and should be
> > > avoided as much as possible.  In fact I still have some unfinished
> > > work to allocate it even more lazily.
> >
> > So, what I meant here was that the commit[1] chaged the behavior/functionality
> > without indenting to. I agree it's not a bug.
>
> It didn't change the behaviour or functionality.  It broke your patches,
> but it certainly doesn't deserve its own commit reverting it -- because
> it's not wrong.
>
> > But when I added dirty bitmap tracking support, I couldn't understand for
> > sometime on why were we allocating iop only at the time of writeback.
> > And it was due to a small line change which somehow slipped into this commit [1].
> > Hence I made this as a seperate patch so that it doesn't slip through again w/o
> > getting noticed/review.
>
> It didn't "slip through".  It was intended.
>
> > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> > with subpage dirty tracking in iop->state[], if we end up allocating iop only
> > at the time of writeback, than that might cause some performance degradation
> > compared to, if we allocat iop at ->write_begin() and mark the required dirty
> > bit ranges in ->write_end(). Like how we do in this patch series.
> > (Ofcourse it is true only for bs < ps use case).
> >
> > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/
>
> You absolutely can allocate it in iomap_write_begin, but you can avoid
> allocating it until writeback time if (pos, len) entirely overlap the
> folio.  ie:
>
> 	if (pos > folio_pos(folio) ||
> 	    pos + len < folio_pos(folio) + folio_size(folio))
> 		iop = iomap_page_create(iter->inode, folio, iter->flags, false);

Thanks for the suggestion. However, do you think it would be better if this
were introduced along with the lazy allocation changes Christoph was
mentioning?
My thinking is that with the above approach we delay the allocation of the
iop until writeback for the entire-folio-overlap case. But later in
__iomap_write_begin() we still require an iop if the folio is not uptodate,
so we would again have to check whether the iop is allocated and, if not,
allocate it (for the entire-folio-overlap case); a rough sketch of that
extra check follows below.
That looked like overkill for very little gain in the context of this patch
series. Kindly let me know your thoughts on this.
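
To make that concrete, a hypothetical fragment of what the tail of
__iomap_write_begin() might look like with the suggestion above; this is an
assumption for illustration, not code from the series:

	if (folio_test_uptodate(folio))
		return 0;
	if (pos <= folio_pos(folio) &&
	    pos + len >= folio_pos(folio) + folio_size(folio))
		return 0;	/* full overwrite: nothing to read in, defer the iop */
	folio_clear_error(folio);

	/* partial overwrite: the iop may still be NULL, allocate it on demand */
	if (!iop)
		iop = iomap_page_create(iter->inode, folio, iter->flags);
	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
		return -EAGAIN;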

-ritesh
Matthew Wilcox Jan. 31, 2023, 6:48 p.m. UTC | #5
On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote:
> On 23/01/30 09:00PM, Matthew Wilcox wrote:
> > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > > > Thus the iop structure will only gets allocated at the time of writeback
> > > > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > > > we anyway only track uptodate status in iop (no support of tracking
> > > > > dirty bitmap status which later patches will add), and we also end up
> > > > > setting all the bits in iomap_page_create(), if the page is uptodate.
> > > >
> > > > delayed iop allocation is a feature and not a bug.  We might have to
> > > > refine the criteria for sub-page dirty tracking, but in general having
> > > > the iop allocates is a memory and performance overhead and should be
> > > > avoided as much as possible.  In fact I still have some unfinished
> > > > work to allocate it even more lazily.
> > >
> > > So, what I meant here was that the commit[1] chaged the behavior/functionality
> > > without indenting to. I agree it's not a bug.
> >
> > It didn't change the behaviour or functionality.  It broke your patches,
> > but it certainly doesn't deserve its own commit reverting it -- because
> > it's not wrong.
> >
> > > But when I added dirty bitmap tracking support, I couldn't understand for
> > > sometime on why were we allocating iop only at the time of writeback.
> > > And it was due to a small line change which somehow slipped into this commit [1].
> > > Hence I made this as a seperate patch so that it doesn't slip through again w/o
> > > getting noticed/review.
> >
> > It didn't "slip through".  It was intended.
> >
> > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> > > with subpage dirty tracking in iop->state[], if we end up allocating iop only
> > > at the time of writeback, than that might cause some performance degradation
> > > compared to, if we allocat iop at ->write_begin() and mark the required dirty
> > > bit ranges in ->write_end(). Like how we do in this patch series.
> > > (Ofcourse it is true only for bs < ps use case).
> > >
> > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/
> >
> > You absolutely can allocate it in iomap_write_begin, but you can avoid
> > allocating it until writeback time if (pos, len) entirely overlap the
> > folio.  ie:
> >
> > 	if (pos > folio_pos(folio) ||
> > 	    pos + len < folio_pos(folio) + folio_size(folio))
> > 		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
> 
> Thanks for the suggestion. However do you think it will be better if this is
> introduced along with lazy allocation changes which Christoph was mentioning
> about?
> Why I am thinking that is because, with above approach we delay the allocation
> of iop until writeback, for entire folio overlap case. But then later
> in __iomap_write_begin(), we require iop if folio is not uptodate.
> Hence we again will have to do some checks to see if the iop is not allocated
> then allocate it (which is for entire folio overlap case).
> That somehow looked like an overkill for a very little gain in the context of
> this patch series. Kindly let me know your thoughts on this.

Look at *why* __iomap_write_begin() allocates an iop.  It's to read in the
blocks which are going to be partially-overwritten by the write.  If the
write overlaps the entire folio, there are no parts which need to be read
in, and we can simply return.  Maybe we should make that more obvious:

	if (folio_test_uptodate(folio))
		return 0;
	if (pos <= folio_pos(folio) &&
	    pos + len >= folio_pos(folio) + folio_size(folio))
		return 0;
	folio_clear_error(folio);

(I think pos must always be >= folio_pos(), so that <= could be ==, but
it doesn't hurt anything to use <=)
Ritesh Harjani (IBM) Jan. 31, 2023, 8 p.m. UTC | #6
On 23/01/31 06:48PM, Matthew Wilcox wrote:
> On Wed, Feb 01, 2023 at 12:07:25AM +0530, Ritesh Harjani (IBM) wrote:
> > On 23/01/30 09:00PM, Matthew Wilcox wrote:
> > > On Tue, Jan 31, 2023 at 01:51:50AM +0530, Ritesh Harjani (IBM) wrote:
> > > > > > Thus the iop structure will only gets allocated at the time of writeback
> > > > > > in iomap_writepage_map(). This I think, was a not problem till now since
> > > > > > we anyway only track uptodate status in iop (no support of tracking
> > > > > > dirty bitmap status which later patches will add), and we also end up
> > > > > > setting all the bits in iomap_page_create(), if the page is uptodate.
> > > > >
> > > > > delayed iop allocation is a feature and not a bug.  We might have to
> > > > > refine the criteria for sub-page dirty tracking, but in general having
> > > > > the iop allocates is a memory and performance overhead and should be
> > > > > avoided as much as possible.  In fact I still have some unfinished
> > > > > work to allocate it even more lazily.
> > > >
> > > > So, what I meant here was that the commit[1] chaged the behavior/functionality
> > > > without indenting to. I agree it's not a bug.
> > >
> > > It didn't change the behaviour or functionality.  It broke your patches,
> > > but it certainly doesn't deserve its own commit reverting it -- because
> > > it's not wrong.
> > >
> > > > But when I added dirty bitmap tracking support, I couldn't understand for
> > > > sometime on why were we allocating iop only at the time of writeback.
> > > > And it was due to a small line change which somehow slipped into this commit [1].
> > > > Hence I made this as a seperate patch so that it doesn't slip through again w/o
> > > > getting noticed/review.
> > >
> > > It didn't "slip through".  It was intended.
> > >
> > > > Thanks for the info on the lazy allocation work. Yes, though it is not a bug, but
> > > > with subpage dirty tracking in iop->state[], if we end up allocating iop only
> > > > at the time of writeback, than that might cause some performance degradation
> > > > compared to, if we allocat iop at ->write_begin() and mark the required dirty
> > > > bit ranges in ->write_end(). Like how we do in this patch series.
> > > > (Ofcourse it is true only for bs < ps use case).
> > > >
> > > > [1]: https://lore.kernel.org/all/20220623175157.1715274-5-shr@fb.com/
> > >
> > > You absolutely can allocate it in iomap_write_begin, but you can avoid
> > > allocating it until writeback time if (pos, len) entirely overlap the
> > > folio.  ie:
> > >
> > > 	if (pos > folio_pos(folio) ||
> > > 	    pos + len < folio_pos(folio) + folio_size(folio))
> > > 		iop = iomap_page_create(iter->inode, folio, iter->flags, false);
> >
> > Thanks for the suggestion. However do you think it will be better if this is
> > introduced along with lazy allocation changes which Christoph was mentioning
> > about?
> > Why I am thinking that is because, with above approach we delay the allocation
> > of iop until writeback, for entire folio overlap case. But then later
> > in __iomap_write_begin(), we require iop if folio is not uptodate.
> > Hence we again will have to do some checks to see if the iop is not allocated
> > then allocate it (which is for entire folio overlap case).
> > That somehow looked like an overkill for a very little gain in the context of
> > this patch series. Kindly let me know your thoughts on this.
>
> Look at *why* __iomap_write_begin() allocates an iop.  It's to read in the
> blocks which are going to be partially-overwritten by the write.  If the
> write overlaps the entire folio, there are no parts which need to be read
> in, and we can simply return.

Yes, that makes sense.

> Maybe we should make that more obvious:

Yes, I think this may be required, because otherwise we might end up using
an uninitialized iop; generic/156 (funshare) can easily trigger that.
I will spend some time on the unshare path of iomap.

>
> 	if (folio_test_uptodate(folio))
> 		return 0;
> 	if (pos <= folio_pos(folio) &&
> 	    pos + len >= folio_pos(folio) + folio_size(folio))
> 		return 0;
> 	folio_clear_error(folio);
>
> (I think pos must always be >= folio_pos(), so that <= could be ==, but
> it doesn't hurt anything to use <=)

Thanks for sharing this.

-ritesh

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..e9c85fcf7a1f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -527,7 +527,8 @@  static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		size_t len, struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	struct iomap_page *iop;
+	struct iomap_page *iop = iomap_page_create(iter->inode, folio,
+						   iter->flags);
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
@@ -539,7 +540,6 @@  static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		return 0;
 	folio_clear_error(folio);
 
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
 	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
 		return -EAGAIN;