Message ID | 20200204142514.15826-9-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: Speedup page cache truncation | expand |
On Tue, Feb 04, 2020 at 03:25:14PM +0100, Jan Kara wrote: > When storing NULL in xarray, xas_store() has been clearing all marks > because it could otherwise confuse xas_for_each_marked(). That is > however no longer true and no current user relies on this behavior. > Furthermore it seems as a cleaner API to not do clearing behind caller's > back in case we store NULL. > > This provides a nice boost to truncate numbers due to saving unnecessary > tag initialization when clearing shadow entries. Sample benchmark > showing time to truncate 128 files 1GB each on machine with 64GB of RAM > (so about half of entries are shadow entries): > > AVG STDDEV > Vanilla 4.825s 0.036 > Patched 4.516s 0.014 > > So we can see about 6% reduction in overall truncate time. > > Signed-off-by: Jan Kara <jack@suse.cz> > lib/xarray.c | 9 --------- > 1 file changed, 9 deletions(-) > > diff --git a/lib/xarray.c b/lib/xarray.c > index 4e32497c51bd..f165e83652f1 100644 > +++ b/lib/xarray.c > @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry) > if (xas->xa_sibs) > xas_squash_marks(xas); > } > - if (!entry) > - xas_init_marks(xas); > > for (;;) { > - /* > - * Must clear the marks before setting the entry to NULL, > - * otherwise xas_for_each_marked may find a NULL entry and > - * stop early. rcu_assign_pointer contains a release barrier > - * so the mark clearing will appear to happen before the > - * entry is set to NULL. > - */ > rcu_assign_pointer(*slot, entry); The above removed comment doesn't sound right (the release is paired with READ_ONCE, which is only an acquire for data dependent accesses), is this a reflection of the original bug in this thread? How is RCU mark reading used anyhow? There is no guarenteed ordering of the mark and the value, so nothing iterating under RCU over marks can rely on the marks being accurate. Are the algorithms using this tolerant of that, or provide some kind of external locking? This series looks good to me, and does seem to be an improvement. Actually the clearing of marks by xa_store(, NULL) is creating a very subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes line too: ib_set_client_data() is assuming the marks for the entry will not change, but if the caller passed in NULL they get wrongly reset, and three call sites pass in NULL: drivers/infiniband/ulp/srpt/ib_srpt.c net/rds/ib.c net/smc/smc_ib.c Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") Thanks, Jason
On Wed, Feb 05, 2020 at 02:43:44PM -0400, Jason Gunthorpe wrote: > On Tue, Feb 04, 2020 at 03:25:14PM +0100, Jan Kara wrote: > > When storing NULL in xarray, xas_store() has been clearing all marks > > because it could otherwise confuse xas_for_each_marked(). That is > > however no longer true and no current user relies on this behavior. > > Furthermore it seems as a cleaner API to not do clearing behind caller's > > back in case we store NULL. > > > > This provides a nice boost to truncate numbers due to saving unnecessary > > tag initialization when clearing shadow entries. Sample benchmark > > showing time to truncate 128 files 1GB each on machine with 64GB of RAM > > (so about half of entries are shadow entries): > > > > AVG STDDEV > > Vanilla 4.825s 0.036 > > Patched 4.516s 0.014 > > > > So we can see about 6% reduction in overall truncate time. > > > > Signed-off-by: Jan Kara <jack@suse.cz> > > lib/xarray.c | 9 --------- > > 1 file changed, 9 deletions(-) > > > > diff --git a/lib/xarray.c b/lib/xarray.c > > index 4e32497c51bd..f165e83652f1 100644 > > +++ b/lib/xarray.c > > @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry) > > if (xas->xa_sibs) > > xas_squash_marks(xas); > > } > > - if (!entry) > > - xas_init_marks(xas); > > > > for (;;) { > > - /* > > - * Must clear the marks before setting the entry to NULL, > > - * otherwise xas_for_each_marked may find a NULL entry and > > - * stop early. rcu_assign_pointer contains a release barrier > > - * so the mark clearing will appear to happen before the > > - * entry is set to NULL. > > - */ > > rcu_assign_pointer(*slot, entry); > > The above removed comment doesn't sound right (the release is paired > with READ_ONCE, which is only an acquire for data dependent accesses), > is this a reflection of the original bug in this thread? Yes. I was thinking about a classical race like so: read mark clear mark load entry store NULL but of course CPUs can execute many instructions asynchronously with each other, and read mark clear mark store NULL load entry can't be prevented against for an RCU reader. > How is RCU mark reading used anyhow? We iterate over pages in the page cache with, eg, the dirty bit set. This bug will lead to the loop terminating early and failing to find dirty pages that it should. > Actually the clearing of marks by xa_store(, NULL) is creating a very > subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes > line too: > > ib_set_client_data() is assuming the marks for the entry will not > change, but if the caller passed in NULL they get wrongly reset, and > three call sites pass in NULL: > drivers/infiniband/ulp/srpt/ib_srpt.c > net/rds/ib.c > net/smc/smc_ib.c > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") There's no bug here. If you're actually storing NULL in the array, then the marks would go away. That's inherent -- imagine you have an array with a single entry at 64. Then you store NULL there. That causes the node to be deleted, and the marks must necessarily disappear with it -- there's nowhere to store them! But you aren't storing NULL in the array. I mean, you think you are, and if you load back the entry from the array, you'll get a NULL. But this is an allocating array, and so when you go to store NULL in the array it _actually_ stores an XA_ZERO_ENTRY. Which is converted back to NULL when you load it. You have to call xa_erase() to make an entry disappear from an allocating array. Just storing NULL isn't going to do it.
On 2/4/20 6:25 AM, Jan Kara wrote: > When storing NULL in xarray, xas_store() has been clearing all marks > because it could otherwise confuse xas_for_each_marked(). That is > however no longer true and no current user relies on this behavior. However, let's not forget that the API was also documented to behave in this way--it's not an accidental detail. Below... > Furthermore it seems as a cleaner API to not do clearing behind caller's > back in case we store NULL. > > This provides a nice boost to truncate numbers due to saving unnecessary > tag initialization when clearing shadow entries. Sample benchmark > showing time to truncate 128 files 1GB each on machine with 64GB of RAM > (so about half of entries are shadow entries): > > AVG STDDEV > Vanilla 4.825s 0.036 > Patched 4.516s 0.014 > > So we can see about 6% reduction in overall truncate time. > > Signed-off-by: Jan Kara <jack@suse.cz> > --- > lib/xarray.c | 9 --------- > 1 file changed, 9 deletions(-) > > diff --git a/lib/xarray.c b/lib/xarray.c > index 4e32497c51bd..f165e83652f1 100644 > --- a/lib/xarray.c > +++ b/lib/xarray.c > @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry) > if (xas->xa_sibs) > xas_squash_marks(xas); > } > - if (!entry) > - xas_init_marks(xas); > > for (;;) { > - /* > - * Must clear the marks before setting the entry to NULL, > - * otherwise xas_for_each_marked may find a NULL entry and > - * stop early. rcu_assign_pointer contains a release barrier > - * so the mark clearing will appear to happen before the > - * entry is set to NULL. > - */ So if we do this, I think we'd also want something like this (probably with better wording, this is just a first draft): diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst index 640934b6f7b4..8adeaa8c012e 100644 --- a/Documentation/core-api/xarray.rst +++ b/Documentation/core-api/xarray.rst @@ -66,10 +66,11 @@ pointer at every index. You can then set entries using xa_store() and get entries using xa_load(). xa_store will overwrite any entry with the new entry and return the previous entry stored at that index. You can -use xa_erase() instead of calling xa_store() with a +use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a ``NULL`` entry. There is no difference between an entry that has never -been stored to, one that has been erased and one that has most recently -had ``NULL`` stored to it. +been stored to and one that has been erased. Those, in turn, are the same +as an entry that has had ``NULL`` stored to it and also had its marks +erased via xas_init_marks(). You can conditionally replace an entry at an index by using xa_cmpxchg(). Like cmpxchg(), it will only succeed if > rcu_assign_pointer(*slot, entry); > if (xa_is_node(next) && (!node || node->shift)) > xas_free_nodes(xas, xa_to_node(next)); > thanks,
On Wed, Feb 05, 2020 at 02:19:34PM -0800, John Hubbard wrote: > So if we do this, I think we'd also want something like this (probably with > better wording, this is just a first draft): > > diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst > index 640934b6f7b4..8adeaa8c012e 100644 > --- a/Documentation/core-api/xarray.rst > +++ b/Documentation/core-api/xarray.rst > @@ -66,10 +66,11 @@ pointer at every index. > You can then set entries using xa_store() and get entries > using xa_load(). xa_store will overwrite any entry with the > new entry and return the previous entry stored at that index. You can > -use xa_erase() instead of calling xa_store() with a > +use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a Woah, woah, woah. xa_erase() re-initialises the marks. Nobody's going to change that. Don't confuse the porcelain and plumbing APIs.
On 2/5/20 6:21 PM, Matthew Wilcox wrote: > On Wed, Feb 05, 2020 at 02:19:34PM -0800, John Hubbard wrote: >> So if we do this, I think we'd also want something like this (probably with >> better wording, this is just a first draft): >> >> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst >> index 640934b6f7b4..8adeaa8c012e 100644 >> --- a/Documentation/core-api/xarray.rst >> +++ b/Documentation/core-api/xarray.rst >> @@ -66,10 +66,11 @@ pointer at every index. >> You can then set entries using xa_store() and get entries >> using xa_load(). xa_store will overwrite any entry with the >> new entry and return the previous entry stored at that index. You can >> -use xa_erase() instead of calling xa_store() with a >> +use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a > > Woah, woah, woah. xa_erase() re-initialises the marks. Nobody's going Yes, I get that. But I mis-wrote it, it should have read more like: You can then set entries using xa_store() and get entries using xa_load(). xa_store will overwrite any entry with the new entry and return the previous entry stored at that index. You can use xa_erase(), instead of calling xa_store() with a ``NULL`` entry followed by xas_init_marks(). There is no difference between an entry that has never been stored to and one that has been erased. Those, in turn, are the same as an entry that has had ``NULL`` stored to it and also had its marks erased via xas_init_marks(). > to change that. Don't confuse the porcelain and plumbing APIs. > The API still documents a different behavior than this patchset changes it to, despite my imperfect attempts to update the documentation. So let's please keep the porcelain aligned with the plumbing. :) thanks,
On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote: > You can then set entries using xa_store() and get entries > using xa_load(). xa_store will overwrite any entry with the > new entry and return the previous entry stored at that index. You can > use xa_erase(), instead of calling xa_store() with a > ``NULL`` entry followed by xas_init_marks(). There is no difference between > an entry that has never been stored to and one that has been erased. Those, > in turn, are the same as an entry that has had ``NULL`` stored to it and > also had its marks erased via xas_init_marks(). There's a fundamental misunderstanding here. If you store a NULL, the marks go away. There is no such thing as a marked NULL entry. If you observe such a thing, it can only exist through some kind of permitted RCU race, and the entry must be ignored. If you're holding the xa_lock, there is no way to observe a NULL entry with a search mark set. What Jan is trying to do is allow code that knows what it's doing the ability to say "Skip clearing the marks for performance reasons. The marks are already clear." I'm still mulling over the patches from Jan. There's something I don't like about them, but I can't articulate it in a useful way yet. I'm on board with the general principle, and obviously the xas_for_each_marked() bug needs to be fixed.
On 2/5/20 8:28 PM, Matthew Wilcox wrote: > On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote: >> You can then set entries using xa_store() and get entries >> using xa_load(). xa_store will overwrite any entry with the >> new entry and return the previous entry stored at that index. You can >> use xa_erase(), instead of calling xa_store() with a >> ``NULL`` entry followed by xas_init_marks(). There is no difference between >> an entry that has never been stored to and one that has been erased. Those, >> in turn, are the same as an entry that has had ``NULL`` stored to it and >> also had its marks erased via xas_init_marks(). > > There's a fundamental misunderstanding here. If you store a NULL, the > marks go away. There is no such thing as a marked NULL entry. If you > observe such a thing, it can only exist through some kind of permitted > RCU race, and the entry must be ignored. If you're holding the xa_lock, > there is no way to observe a NULL entry with a search mark set. aha, the key point. Thanks for explaining it. thanks,
On Wed 05-02-20 14:19:34, John Hubbard wrote: > On 2/4/20 6:25 AM, Jan Kara wrote: > > When storing NULL in xarray, xas_store() has been clearing all marks > > because it could otherwise confuse xas_for_each_marked(). That is > > however no longer true and no current user relies on this behavior. > > > However, let's not forget that the API was also documented to behave > in this way--it's not an accidental detail. Below... Yeah, we'll need to sort out details how we want the API to be used but thanks for reminding me I should update the documentation :). It was on my radar but then I forgot... Honza
On Wed 05-02-20 20:28:01, Matthew Wilcox wrote: > On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote: > > You can then set entries using xa_store() and get entries > > using xa_load(). xa_store will overwrite any entry with the > > new entry and return the previous entry stored at that index. You can > > use xa_erase(), instead of calling xa_store() with a > > ``NULL`` entry followed by xas_init_marks(). There is no difference between > > an entry that has never been stored to and one that has been erased. Those, > > in turn, are the same as an entry that has had ``NULL`` stored to it and > > also had its marks erased via xas_init_marks(). > > There's a fundamental misunderstanding here. If you store a NULL, the > marks go away. There is no such thing as a marked NULL entry. If you > observe such a thing, it can only exist through some kind of permitted > RCU race, and the entry must be ignored. If you're holding the xa_lock, > there is no way to observe a NULL entry with a search mark set. > > What Jan is trying to do is allow code that knows what it's doing > the ability to say "Skip clearing the marks for performance reasons. > The marks are already clear." > > I'm still mulling over the patches from Jan. There's something I don't > like about them, but I can't articulate it in a useful way yet. I'm on > board with the general principle, and obviously the xas_for_each_marked() > bug needs to be fixed. There are different ways how to look at what I'm doing :) I was thinking about it more like "xas_store() is for storing value at some index", "xas_erase() is when I want the value at some index removed from the data structure". Because these are principially different operations for any data structure (as much as erasing can be *implemented* by just storing NULL at some index). You seem to recognize this for xa_ functions but you probably considered xas_ functions internal enough that they follow more the "how it is implemented" way of thinking. Now I agree that there are holes in my way of thinking about xas_store() because if you happen to store NULL at some index, marks may get destroyed as a side-effect. And some users of __xa_cmpxchg() (BTW nobody seems to be using xa_cmpxchg_bh()) do use the fact that storing NULL does effectively erase an entry which is BTW inconsistent with xa_store() itself as well... You've been probably thinking more about xarray API semantics than I was so I can be convinced otherwise but at this point, I'd rather move the API more towards "erase is different from storing NULL". Honza
On Wed, Feb 05, 2020 at 01:59:04PM -0800, Matthew Wilcox wrote: > > How is RCU mark reading used anyhow? > > We iterate over pages in the page cache with, eg, the dirty bit set. > This bug will lead to the loop terminating early and failing to find > dirty pages that it should. But the inhernetly weak sync of marks and pointers means that iteration will miss some dirty pages and return some clean pages. It is all OK for some reason? > > Actually the clearing of marks by xa_store(, NULL) is creating a very > > subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes > > line too: > > > > ib_set_client_data() is assuming the marks for the entry will not > > change, but if the caller passed in NULL they get wrongly reset, and > > three call sites pass in NULL: > > drivers/infiniband/ulp/srpt/ib_srpt.c > > net/rds/ib.c > > net/smc/smc_ib.c > > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") > > There's no bug here. > > If you're actually storing NULL in the array, then the marks would go > away. That's inherent -- imagine you have an array with a single entry > at 64. Then you store NULL there. That causes the node to be deleted, > and the marks must necessarily disappear with it -- there's nowhere to > store them! Ah, it is allocating! These little behavior differences are tricky to remember over with infrequent use :( Jason
On Thu 06-02-20 09:49:04, Jason Gunthorpe wrote: > On Wed, Feb 05, 2020 at 01:59:04PM -0800, Matthew Wilcox wrote: > > > > How is RCU mark reading used anyhow? > > > > We iterate over pages in the page cache with, eg, the dirty bit set. > > This bug will lead to the loop terminating early and failing to find > > dirty pages that it should. > > But the inhernetly weak sync of marks and pointers means that > iteration will miss some dirty pages and return some clean pages. It > is all OK for some reason? Yes. The reason is - the definitive source of dirtiness is in page flags so if iteration returns more pages, we don't care. WRT missing pages we only need to make sure pages that were dirty before the iteration started are returned and the current code fulfills that. > > > Actually the clearing of marks by xa_store(, NULL) is creating a very > > > subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes > > > line too: > > > > > > ib_set_client_data() is assuming the marks for the entry will not > > > change, but if the caller passed in NULL they get wrongly reset, and > > > three call sites pass in NULL: > > > drivers/infiniband/ulp/srpt/ib_srpt.c > > > net/rds/ib.c > > > net/smc/smc_ib.c > > > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data") > > > > There's no bug here. > > > > If you're actually storing NULL in the array, then the marks would go > > away. That's inherent -- imagine you have an array with a single entry > > at 64. Then you store NULL there. That causes the node to be deleted, > > and the marks must necessarily disappear with it -- there's nowhere to > > store them! > > Ah, it is allocating! These little behavior differences are tricky to > remember over with infrequent use :( Yeah, that's why I'd prefer if NULL was not "special value" at all and if someone wanted to remove index from xarray he'd always have to use a special function. My patches go towards that direction but not the full way because there's still xa_cmpxchg() whose users use the fact that NULL is in fact 'erase'. Honza
On Thu, Feb 06, 2020 at 03:36:27PM +0100, Jan Kara wrote: > Yeah, that's why I'd prefer if NULL was not "special value" at all and if > someone wanted to remove index from xarray he'd always have to use a > special function. My patches go towards that direction but not the full way > because there's still xa_cmpxchg() whose users use the fact that NULL is in > fact 'erase'. IMHO, this is more appealing. The fact that xa_store(NULL) on non-allocating arrays changes marks seems very surprising/counter intuitive. It feels wise to avoid subtle differences like this between allocating/non-allocating mode. So, it would be more uniform if xa_store and xa_cmpxchg never altered marks. I suppose in practice this means that xa_store(NULL) has to store a XA_ZERO_ENTRY even for non-allocating arrays, and related. Perhaps xa_cmp_erase() could be introduced to take the place of cmpxchg(NULL), and the distinction between erase and store NULL is that erase has the mark-destroying property and guarentees the tree can be pruned. Jason
diff --git a/lib/xarray.c b/lib/xarray.c index 4e32497c51bd..f165e83652f1 100644 --- a/lib/xarray.c +++ b/lib/xarray.c @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry) if (xas->xa_sibs) xas_squash_marks(xas); } - if (!entry) - xas_init_marks(xas); for (;;) { - /* - * Must clear the marks before setting the entry to NULL, - * otherwise xas_for_each_marked may find a NULL entry and - * stop early. rcu_assign_pointer contains a release barrier - * so the mark clearing will appear to happen before the - * entry is set to NULL. - */ rcu_assign_pointer(*slot, entry); if (xa_is_node(next) && (!node || node->shift)) xas_free_nodes(xas, xa_to_node(next));
When storing NULL in xarray, xas_store() has been clearing all marks because it could otherwise confuse xas_for_each_marked(). That is however no longer true and no current user relies on this behavior. Furthermore it seems as a cleaner API to not do clearing behind caller's back in case we store NULL. This provides a nice boost to truncate numbers due to saving unnecessary tag initialization when clearing shadow entries. Sample benchmark showing time to truncate 128 files 1GB each on machine with 64GB of RAM (so about half of entries are shadow entries): AVG STDDEV Vanilla 4.825s 0.036 Patched 4.516s 0.014 So we can see about 6% reduction in overall truncate time. Signed-off-by: Jan Kara <jack@suse.cz> --- lib/xarray.c | 9 --------- 1 file changed, 9 deletions(-)