diff mbox series

[1/4] filemap: return pos of first dirty folio from range_has_writeback

Message ID 20240718130212.23905-2-bfoster@redhat.com (mailing list archive)
State Accepted, archived
Headers show
Series iomap: zero dirty folios over unwritten mappings on zero range | expand

Commit Message

Brian Foster July 18, 2024, 1:02 p.m. UTC
The iomap layer has a use case that wants to locate the first dirty
or writeback folio in a particular range.
filemap_range_has_writeback() already implements this with the
exception that it only returns a boolean.

Since the _needs_writeback() wrapper is currently the only caller,
tweak has_writeback to take a pointer for the starting offset and
update it to the offset of the first dirty folio found.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 include/linux/pagemap.h | 4 ++--
 mm/filemap.c            | 8 +++++---
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Matthew Wilcox July 18, 2024, 3:09 p.m. UTC | #1
On Thu, Jul 18, 2024 at 09:02:09AM -0400, Brian Foster wrote:
> @@ -655,6 +655,8 @@ bool filemap_range_has_writeback(struct address_space *mapping,
>  				folio_test_writeback(folio))
>  			break;
>  	}
> +	if (folio)
> +		*start_byte = folio_pos(folio);
>  	rcu_read_unlock();
>  	return folio != NULL;
>  }

Distressingly, this is unsafe.

We have no reference on the folio at this point (not one that matters,
anyway).  We have the rcu read lock, yes, but that doesn't protect enough
to make folio_pos() safe.

Since we do't have folio_get() here, the folio can be freed, sent back to
the page allocator, and then reallocated to literally any purpose.  As I'm
reviewing patch 1/4, I have no idea if this is just a hint and you can
survive it being completely wrong, or if this is going to cause problems.
Brian Foster July 18, 2024, 4:03 p.m. UTC | #2
On Thu, Jul 18, 2024 at 04:09:03PM +0100, Matthew Wilcox wrote:
> On Thu, Jul 18, 2024 at 09:02:09AM -0400, Brian Foster wrote:
> > @@ -655,6 +655,8 @@ bool filemap_range_has_writeback(struct address_space *mapping,
> >  				folio_test_writeback(folio))
> >  			break;
> >  	}
> > +	if (folio)
> > +		*start_byte = folio_pos(folio);
> >  	rcu_read_unlock();
> >  	return folio != NULL;
> >  }
> 
> Distressingly, this is unsafe.
> 
> We have no reference on the folio at this point (not one that matters,
> anyway).  We have the rcu read lock, yes, but that doesn't protect enough
> to make folio_pos() safe.
> 
> Since we do't have folio_get() here, the folio can be freed, sent back to
> the page allocator, and then reallocated to literally any purpose.  As I'm
> reviewing patch 1/4, I have no idea if this is just a hint and you can
> survive it being completely wrong, or if this is going to cause problems.
> 

Ah, thanks. I was unsure about this when I hacked it up but then got
more focused on patch 3. I think for this implementation I'd want it to
be an accurate pos of the first dirty/wb folio. I think this could
possibly use filemap_range_has_writeback() (without patch 1) as more of
a hint/optimization, but that might involve doing the FGP_NOCREAT thing
from the previous variant of this prototype has and I was trying to
avoid that.

Do you think it would be reasonable to create a variant of this function
that did the relevant bits from __filemap_get_folio():

        if (!folio_try_get(folio))
                goto repeat;

        if (unlikely(folio != xas_reload(&xas))) {
                folio_put(folio);
                goto repeat;
        }
	/* check dirty/wb etc. */

... in order to either return a correct pos or maybe even a reference to
the folio itself? iomap_zero_iter() wants the locked folio anyways, but
that might be too ugly to pass through the iomap_folio_ops thing in
current form.

If that doesn't work, then I might chalk this up as another reason to
just do the flush thing I was rambling about in the cover letter...

Brian
diff mbox series

Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a0a026d2d244..a15131a3fa12 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1217,7 +1217,7 @@  int __filemap_add_folio(struct address_space *mapping, struct folio *folio,
 		pgoff_t index, gfp_t gfp, void **shadowp);
 
 bool filemap_range_has_writeback(struct address_space *mapping,
-				 loff_t start_byte, loff_t end_byte);
+				 loff_t *start_byte, loff_t end_byte);
 
 /**
  * filemap_range_needs_writeback - check if range potentially needs writeback
@@ -1242,7 +1242,7 @@  static inline bool filemap_range_needs_writeback(struct address_space *mapping,
 	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
 	    !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
 		return false;
-	return filemap_range_has_writeback(mapping, start_byte, end_byte);
+	return filemap_range_has_writeback(mapping, &start_byte, end_byte);
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 657bcd887fdb..be0a219e8d9e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -636,13 +636,13 @@  static bool mapping_needs_writeback(struct address_space *mapping)
 }
 
 bool filemap_range_has_writeback(struct address_space *mapping,
-				 loff_t start_byte, loff_t end_byte)
+				 loff_t *start_byte, loff_t end_byte)
 {
-	XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
+	XA_STATE(xas, &mapping->i_pages, *start_byte >> PAGE_SHIFT);
 	pgoff_t max = end_byte >> PAGE_SHIFT;
 	struct folio *folio;
 
-	if (end_byte < start_byte)
+	if (end_byte < *start_byte)
 		return false;
 
 	rcu_read_lock();
@@ -655,6 +655,8 @@  bool filemap_range_has_writeback(struct address_space *mapping,
 				folio_test_writeback(folio))
 			break;
 	}
+	if (folio)
+		*start_byte = folio_pos(folio);
 	rcu_read_unlock();
 	return folio != NULL;
 }