Message ID | 20231126124720.1249310-6-hch@lst.de (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [01/13] iomap: clear the per-folio dirty bits on all writeback failures | expand |
Christoph Hellwig <hch@lst.de> writes: > Most of iomap_do_writepage is dedidcated to handling a folio crossing or > beyond i_size. Split this is into a separate helper and update the > commens to deal with folios instead of pages and make them more readable. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/iomap/buffered-io.c | 128 ++++++++++++++++++++--------------------- > 1 file changed, 62 insertions(+), 66 deletions(-) Just a small nit below. But otherwise looks good to me. Please feel free to add - Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 8148e4c9765dac..4a5a21809b0182 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio, > wbc_account_cgroup_owner(wbc, &folio->page, len); > } > > +/* > + * Check interaction of the folio with the file end. > + * > + * If the folio is entirely beyond i_size, return false. If it straddles > + * i_size, adjust end_pos and zero all data beyond i_size. > + */ > +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, > + u64 *end_pos) > +{ > + u64 isize = i_size_read(inode); i_size_read(inode) returns loff_t type. Can we make end_pos also as loff_t type. We anyway use loff_t for folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It would be more consistent with the data type then. Thoughts? -ritesh > + > + if (*end_pos > isize) { > + size_t poff = offset_in_folio(folio, isize); > + pgoff_t end_index = isize >> PAGE_SHIFT; > + > + /* > + * If the folio is entirely ouside of i_size, skip it. > + * > + * This can happen due to a truncate operation that is in > + * progress and in that case truncate will finish it off once > + * we've dropped the folio lock. > + * > + * Note that the pgoff_t used for end_index is an unsigned long. > + * If the given offset is greater than 16TB on a 32-bit system, > + * then if we checked if the folio is fully outside i_size with > + * "if (folio->index >= end_index + 1)", "end_index + 1" would > + * overflow and evaluate to 0. Hence this folio would be > + * redirtied and written out repeatedly, which would result in > + * an infinite loop; the user program performing this operation > + * would hang. Instead, we can detect this situation by > + * checking if the folio is totally beyond i_size or if its > + * offset is just equal to the EOF. > + */ > + if (folio->index > end_index || > + (folio->index == end_index && poff == 0)) > + return false; > + > + /* > + * The folio straddles i_size. > + * > + * It must be zeroed out on each and every writepage invocation > + * because it may be mmapped: > + * > + * A file is mapped in multiples of the page size. For a > + * file that is not a multiple of the page size, the > + * remaining memory is zeroed when mapped, and writes to that > + * region are not written out to the file. > + * > + * Also adjust the writeback range to skip all blocks entirely > + * beyond i_size. > + */ > + folio_zero_segment(folio, poff, folio_size(folio)); > + *end_pos = isize; > + } > + > + return true; > +} > + > /* > * We implement an immediate ioend submission policy here to avoid needing to > * chain multiple ioends and hence nest mempool allocations which can violate > @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio, > { > struct iomap_writepage_ctx *wpc = data; > struct inode *inode = folio->mapping->host; > - u64 end_pos, isize; > + u64 end_pos = folio_pos(folio) + folio_size(folio); > > trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); > > - /* > - * Is this folio beyond the end of the file? > - * > - * The folio index is less than the end_index, adjust the end_pos > - * to the highest offset that this folio should represent. > - * ----------------------------------------------------- > - * | file mapping | <EOF> | > - * ----------------------------------------------------- > - * | Page ... | Page N-2 | Page N-1 | Page N | | > - * ^--------------------------------^----------|-------- > - * | desired writeback range | see else | > - * ---------------------------------^------------------| > - */ > - isize = i_size_read(inode); > - end_pos = folio_pos(folio) + folio_size(folio); > - if (end_pos > isize) { > - /* > - * Check whether the page to write out is beyond or straddles > - * i_size or not. > - * ------------------------------------------------------- > - * | file mapping | <EOF> | > - * ------------------------------------------------------- > - * | Page ... | Page N-2 | Page N-1 | Page N | Beyond | > - * ^--------------------------------^-----------|--------- > - * | | Straddles | > - * ---------------------------------^-----------|--------| > - */ > - size_t poff = offset_in_folio(folio, isize); > - pgoff_t end_index = isize >> PAGE_SHIFT; > - > - /* > - * Skip the page if it's fully outside i_size, e.g. > - * due to a truncate operation that's in progress. We've > - * cleaned this page and truncate will finish things off for > - * us. > - * > - * Note that the end_index is unsigned long. If the given > - * offset is greater than 16TB on a 32-bit system then if we > - * checked if the page is fully outside i_size with > - * "if (page->index >= end_index + 1)", "end_index + 1" would > - * overflow and evaluate to 0. Hence this page would be > - * redirtied and written out repeatedly, which would result in > - * an infinite loop; the user program performing this operation > - * would hang. Instead, we can detect this situation by > - * checking if the page is totally beyond i_size or if its > - * offset is just equal to the EOF. > - */ > - if (folio->index > end_index || > - (folio->index == end_index && poff == 0)) > - goto unlock; > - > - /* > - * The page straddles i_size. It must be zeroed out on each > - * and every writepage invocation because it may be mmapped. > - * "A file is mapped in multiples of the page size. For a file > - * that is not a multiple of the page size, the remaining > - * memory is zeroed when mapped, and writes to that region are > - * not written out to the file." > - */ > - folio_zero_segment(folio, poff, folio_size(folio)); > - end_pos = isize; > + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { > + folio_unlock(folio); > + return 0; > } > > return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); > - > -unlock: > - folio_unlock(folio); > - return 0; > } > > int > -- > 2.39.2
Ritesh Harjani (IBM) <ritesh.list@gmail.com> writes: > Christoph Hellwig <hch@lst.de> writes: > >> Most of iomap_do_writepage is dedidcated to handling a folio crossing or >> beyond i_size. Split this is into a separate helper and update the >> commens to deal with folios instead of pages and make them more readable. >> >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> --- >> fs/iomap/buffered-io.c | 128 ++++++++++++++++++++--------------------- >> 1 file changed, 62 insertions(+), 66 deletions(-) > > Just a small nit below. > > But otherwise looks good to me. Please feel free to add - > > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > >> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c >> index 8148e4c9765dac..4a5a21809b0182 100644 >> --- a/fs/iomap/buffered-io.c >> +++ b/fs/iomap/buffered-io.c >> @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio, >> wbc_account_cgroup_owner(wbc, &folio->page, len); >> } >> >> +/* >> + * Check interaction of the folio with the file end. >> + * >> + * If the folio is entirely beyond i_size, return false. If it straddles >> + * i_size, adjust end_pos and zero all data beyond i_size. >> + */ >> +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, >> + u64 *end_pos) >> +{ >> + u64 isize = i_size_read(inode); > > i_size_read(inode) returns loff_t type. Can we make end_pos also as > loff_t type. We anyway use loff_t for > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It > would be more consistent with the data type then. > > Thoughts? aah, that might also require to change the types in iomap_writepage_map(). So I guess the data type consistency change should be a follow up change as this patch does only the refactoring. > > -ritesh > > >> + >> + if (*end_pos > isize) { >> + size_t poff = offset_in_folio(folio, isize); >> + pgoff_t end_index = isize >> PAGE_SHIFT; >> + >> + /* >> + * If the folio is entirely ouside of i_size, skip it. >> + * >> + * This can happen due to a truncate operation that is in >> + * progress and in that case truncate will finish it off once >> + * we've dropped the folio lock. >> + * >> + * Note that the pgoff_t used for end_index is an unsigned long. >> + * If the given offset is greater than 16TB on a 32-bit system, >> + * then if we checked if the folio is fully outside i_size with >> + * "if (folio->index >= end_index + 1)", "end_index + 1" would >> + * overflow and evaluate to 0. Hence this folio would be >> + * redirtied and written out repeatedly, which would result in >> + * an infinite loop; the user program performing this operation >> + * would hang. Instead, we can detect this situation by >> + * checking if the folio is totally beyond i_size or if its >> + * offset is just equal to the EOF. >> + */ >> + if (folio->index > end_index || >> + (folio->index == end_index && poff == 0)) >> + return false; >> + >> + /* >> + * The folio straddles i_size. >> + * >> + * It must be zeroed out on each and every writepage invocation >> + * because it may be mmapped: >> + * >> + * A file is mapped in multiples of the page size. For a >> + * file that is not a multiple of the page size, the >> + * remaining memory is zeroed when mapped, and writes to that >> + * region are not written out to the file. >> + * >> + * Also adjust the writeback range to skip all blocks entirely >> + * beyond i_size. >> + */ >> + folio_zero_segment(folio, poff, folio_size(folio)); >> + *end_pos = isize; >> + } >> + >> + return true; >> +} >> + >> /* >> * We implement an immediate ioend submission policy here to avoid needing to >> * chain multiple ioends and hence nest mempool allocations which can violate >> @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio, >> { >> struct iomap_writepage_ctx *wpc = data; >> struct inode *inode = folio->mapping->host; >> - u64 end_pos, isize; >> + u64 end_pos = folio_pos(folio) + folio_size(folio); >> >> trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); >> >> - /* >> - * Is this folio beyond the end of the file? >> - * >> - * The folio index is less than the end_index, adjust the end_pos >> - * to the highest offset that this folio should represent. >> - * ----------------------------------------------------- >> - * | file mapping | <EOF> | >> - * ----------------------------------------------------- >> - * | Page ... | Page N-2 | Page N-1 | Page N | | >> - * ^--------------------------------^----------|-------- >> - * | desired writeback range | see else | >> - * ---------------------------------^------------------| >> - */ >> - isize = i_size_read(inode); >> - end_pos = folio_pos(folio) + folio_size(folio); >> - if (end_pos > isize) { >> - /* >> - * Check whether the page to write out is beyond or straddles >> - * i_size or not. >> - * ------------------------------------------------------- >> - * | file mapping | <EOF> | >> - * ------------------------------------------------------- >> - * | Page ... | Page N-2 | Page N-1 | Page N | Beyond | >> - * ^--------------------------------^-----------|--------- >> - * | | Straddles | >> - * ---------------------------------^-----------|--------| >> - */ >> - size_t poff = offset_in_folio(folio, isize); >> - pgoff_t end_index = isize >> PAGE_SHIFT; >> - >> - /* >> - * Skip the page if it's fully outside i_size, e.g. >> - * due to a truncate operation that's in progress. We've >> - * cleaned this page and truncate will finish things off for >> - * us. >> - * >> - * Note that the end_index is unsigned long. If the given >> - * offset is greater than 16TB on a 32-bit system then if we >> - * checked if the page is fully outside i_size with >> - * "if (page->index >= end_index + 1)", "end_index + 1" would >> - * overflow and evaluate to 0. Hence this page would be >> - * redirtied and written out repeatedly, which would result in >> - * an infinite loop; the user program performing this operation >> - * would hang. Instead, we can detect this situation by >> - * checking if the page is totally beyond i_size or if its >> - * offset is just equal to the EOF. >> - */ >> - if (folio->index > end_index || >> - (folio->index == end_index && poff == 0)) >> - goto unlock; >> - >> - /* >> - * The page straddles i_size. It must be zeroed out on each >> - * and every writepage invocation because it may be mmapped. >> - * "A file is mapped in multiples of the page size. For a file >> - * that is not a multiple of the page size, the remaining >> - * memory is zeroed when mapped, and writes to that region are >> - * not written out to the file." >> - */ >> - folio_zero_segment(folio, poff, folio_size(folio)); >> - end_pos = isize; >> + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { >> + folio_unlock(folio); >> + return 0; >> } >> >> return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); >> - >> -unlock: >> - folio_unlock(folio); >> - return 0; >> } >> >> int >> -- >> 2.39.2
On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote: > > > > i_size_read(inode) returns loff_t type. Can we make end_pos also as > > loff_t type. We anyway use loff_t for > > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It > > would be more consistent with the data type then. > > > > Thoughts? > > aah, that might also require to change the types in > iomap_writepage_map(). So I guess the data type consistency change > should be a follow up change as this patch does only the refactoring. Yes, I'm trying to stay consistent in the writeback code. IIRC some of the u64 use was to better deal with overflows, but I'll have to look up the history.
On Mon, Nov 27, 2023 at 08:12:19AM +0100, Christoph Hellwig wrote: > On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote: > > > > > > i_size_read(inode) returns loff_t type. Can we make end_pos also as > > > loff_t type. We anyway use loff_t for > > > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It > > > would be more consistent with the data type then. > > > > > > Thoughts? > > > > aah, that might also require to change the types in > > iomap_writepage_map(). So I guess the data type consistency change > > should be a follow up change as this patch does only the refactoring. Separate patch for the cleanup, please. Then we can more easily target it for bisection in case there are signed comparison bugs. I hate C. > Yes, I'm trying to stay consistent in the writeback code. IIRC some > of the u64 use was to better deal with overflows, but I'll have to look > up the history. For this patch, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > >
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 8148e4c9765dac..4a5a21809b0182 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio, wbc_account_cgroup_owner(wbc, &folio->page, len); } +/* + * Check interaction of the folio with the file end. + * + * If the folio is entirely beyond i_size, return false. If it straddles + * i_size, adjust end_pos and zero all data beyond i_size. + */ +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode, + u64 *end_pos) +{ + u64 isize = i_size_read(inode); + + if (*end_pos > isize) { + size_t poff = offset_in_folio(folio, isize); + pgoff_t end_index = isize >> PAGE_SHIFT; + + /* + * If the folio is entirely ouside of i_size, skip it. + * + * This can happen due to a truncate operation that is in + * progress and in that case truncate will finish it off once + * we've dropped the folio lock. + * + * Note that the pgoff_t used for end_index is an unsigned long. + * If the given offset is greater than 16TB on a 32-bit system, + * then if we checked if the folio is fully outside i_size with + * "if (folio->index >= end_index + 1)", "end_index + 1" would + * overflow and evaluate to 0. Hence this folio would be + * redirtied and written out repeatedly, which would result in + * an infinite loop; the user program performing this operation + * would hang. Instead, we can detect this situation by + * checking if the folio is totally beyond i_size or if its + * offset is just equal to the EOF. + */ + if (folio->index > end_index || + (folio->index == end_index && poff == 0)) + return false; + + /* + * The folio straddles i_size. + * + * It must be zeroed out on each and every writepage invocation + * because it may be mmapped: + * + * A file is mapped in multiples of the page size. For a + * file that is not a multiple of the page size, the + * remaining memory is zeroed when mapped, and writes to that + * region are not written out to the file. + * + * Also adjust the writeback range to skip all blocks entirely + * beyond i_size. + */ + folio_zero_segment(folio, poff, folio_size(folio)); + *end_pos = isize; + } + + return true; +} + /* * We implement an immediate ioend submission policy here to avoid needing to * chain multiple ioends and hence nest mempool allocations which can violate @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio, { struct iomap_writepage_ctx *wpc = data; struct inode *inode = folio->mapping->host; - u64 end_pos, isize; + u64 end_pos = folio_pos(folio) + folio_size(folio); trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio)); - /* - * Is this folio beyond the end of the file? - * - * The folio index is less than the end_index, adjust the end_pos - * to the highest offset that this folio should represent. - * ----------------------------------------------------- - * | file mapping | <EOF> | - * ----------------------------------------------------- - * | Page ... | Page N-2 | Page N-1 | Page N | | - * ^--------------------------------^----------|-------- - * | desired writeback range | see else | - * ---------------------------------^------------------| - */ - isize = i_size_read(inode); - end_pos = folio_pos(folio) + folio_size(folio); - if (end_pos > isize) { - /* - * Check whether the page to write out is beyond or straddles - * i_size or not. - * ------------------------------------------------------- - * | file mapping | <EOF> | - * ------------------------------------------------------- - * | Page ... | Page N-2 | Page N-1 | Page N | Beyond | - * ^--------------------------------^-----------|--------- - * | | Straddles | - * ---------------------------------^-----------|--------| - */ - size_t poff = offset_in_folio(folio, isize); - pgoff_t end_index = isize >> PAGE_SHIFT; - - /* - * Skip the page if it's fully outside i_size, e.g. - * due to a truncate operation that's in progress. We've - * cleaned this page and truncate will finish things off for - * us. - * - * Note that the end_index is unsigned long. If the given - * offset is greater than 16TB on a 32-bit system then if we - * checked if the page is fully outside i_size with - * "if (page->index >= end_index + 1)", "end_index + 1" would - * overflow and evaluate to 0. Hence this page would be - * redirtied and written out repeatedly, which would result in - * an infinite loop; the user program performing this operation - * would hang. Instead, we can detect this situation by - * checking if the page is totally beyond i_size or if its - * offset is just equal to the EOF. - */ - if (folio->index > end_index || - (folio->index == end_index && poff == 0)) - goto unlock; - - /* - * The page straddles i_size. It must be zeroed out on each - * and every writepage invocation because it may be mmapped. - * "A file is mapped in multiples of the page size. For a file - * that is not a multiple of the page size, the remaining - * memory is zeroed when mapped, and writes to that region are - * not written out to the file." - */ - folio_zero_segment(folio, poff, folio_size(folio)); - end_pos = isize; + if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) { + folio_unlock(folio); + return 0; } return iomap_writepage_map(wpc, wbc, inode, folio, end_pos); - -unlock: - folio_unlock(folio); - return 0; } int
Most of iomap_do_writepage is dedidcated to handling a folio crossing or beyond i_size. Split this is into a separate helper and update the commens to deal with folios instead of pages and make them more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/iomap/buffered-io.c | 128 ++++++++++++++++++++--------------------- 1 file changed, 62 insertions(+), 66 deletions(-)