[RESEND,V2,0/5] Fixes and cleanups to xarray

Message ID 20241111215359.246937-1-shikemeng@huaweicloud.com

Message

Kemeng Shi Nov. 11, 2024, 9:53 p.m. UTC
resend:
-also cc linux-mm

v1->v2:
-Drop patch "Xarray: skip unneeded xas_store() and xas_clear_mark() in
__xa_alloc()"

This series contains some random fixes and cleanups to xarray. Patches 1-3
are fixes and patches 4-5 are cleanups. More details can be found in the
respective patches. Thanks!

Kemeng Shi (5):
  Xarray: Do not return sibling entries from xas_find_marked()
  Xarray: distinguish large entries correctly in xas_split_alloc()
  Xarray: move forward index correctly in xas_pause()
  Xarray: remove repeat check in xas_squash_marks()
  Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent

 lib/test_xarray.c                     | 35 +++++++++++++++++++++++++++
 lib/xarray.c                          | 26 +++++++++++---------
 tools/testing/radix-tree/multiorder.c |  4 +++
 3 files changed, 54 insertions(+), 11 deletions(-)

Comments

Andrew Morton Nov. 11, 2024, 9:28 p.m. UTC | #1
On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:

> This series contains some random fixes and cleanups to xarray. Patches 1-3
> are fixes and patches 4-5 are cleanups. More details can be found in the
> respective patches.

As we're at -rc7 I'm looking to stash low-priority things away for
consideration after 6.13-rc1.

But I don't know if all of these are low-priority things.  Because you
didn't tell me!

Please please always always describe the possible userspace-visible effects of
all fixes.  And all non-fixes, come to that.

So.  Please go through all of the patches in this series and redo the
changelogs, describing the impact each patch might have for our users.

If any of these patches should, in your opinion, be included in 6.12
and/or backported into -stable trees then it would be better to
separate them into a different patch series.  But a single patch series
is acceptable - I can figure this stuff out, if I'm given the necessary
information!  

Thanks.
Matthew Wilcox Nov. 11, 2024, 9:45 p.m. UTC | #2
On Mon, Nov 11, 2024 at 01:28:16PM -0800, Andrew Morton wrote:
> On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> 
> > This series contains some random fixes and cleanups to xarray. Patches 1-3
> > are fixes and patches 4-5 are cleanups. More details can be found in the
> > respective patches.
> 
> As we're at -rc7 I'm looking to stash low-priority things away for
> consideration after 6.13-rc1.
> 
> But I don't know if all of these are low-priority things.  Because you
> didn't tell me!

I don't think any of them fix any bugs that any code currently in the
kernel would stumble across.
Kemeng Shi Nov. 12, 2024, 7:05 a.m. UTC | #3
on 11/12/2024 5:28 AM, Andrew Morton wrote:
> On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> 
>> This series contains some random fixes and cleanups to xarray. Patches 1-3
>> are fixes and patches 4-5 are cleanups. More details can be found in the
>> respective patches.
> 
> As we're at -rc7 I'm looking to stash low-priority things away for
> consideration after 6.13-rc1.
> 
> But I don't know if all of these are low-priority things.  Because you
> didn't tell me!
> 
> Please please always always describe the possible userspace-visible effects of
> all fixes.  And all non-fixes, come to that.
>
> So.  Please go through all of the patches in this series and redo the
> changelogs, describing the impact each patch might have for our users.
Sorry for omitting this information; I will add details of the userspace
impact in the next version.
As there are a lot of users of xarray in the kernel and I'm not familiar with
most of them, I may not be able to send a new series with a description of
the userspace-visible effects in time. I'd like to describe the effects on
in-kernel users of xarray here to get some early input.

Patch 1 fixes an issue where xas_find_marked() can return a sibling entry
to callers when there is a race replacing an old entry covering a small
range with a new entry covering a bigger range. The kernel will crash if
callers treat the returned sibling entry as a valid entry.
For example, find_get_entry() will crash if it gets a sibling entry from
xas_find_marked() while searching for writeback folios.
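
For illustration, here is a simplified, hypothetical sketch of a
find_get_entry()-style lookup (the real function in mm/filemap.c does more;
the name lookup_marked_entry() is made up for this sketch). It shows why a
leaked sibling entry is fatal: the caller immediately treats the return
value as a folio pointer.

#include <linux/mm.h>
#include <linux/xarray.h>

static void *lookup_marked_entry(struct xa_state *xas, unsigned long max,
				 xa_mark_t mark)
{
	struct folio *folio;

retry:
	folio = xas_find_marked(xas, max, mark);
	if (xas_retry(xas, folio))
		goto retry;
	if (!folio || xa_is_value(folio))
		return folio;

	/*
	 * If xas_find_marked() leaked a sibling entry, the reference grab
	 * below would treat an internal pointer as a struct folio and crash.
	 */
	if (!folio_try_get(folio))
		goto retry;
	return folio;
}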

Patch 2 fixes an issue where xas_split_alloc() doesn't correctly handle
large entries whose order is xas->xa_shift + 2 * XA_CHUNK_SHIFT, in which
case the split entries end up with a bigger order than expected. A
subsequent xas_store() will then assign the stored entry a bigger range
than expected, and we will get the wrong entry from slots beyond the
actual range of the stored entry. I'm not sure whether we currently split
large entries with order == xas->xa_shift + 2 * XA_CHUNK_SHIFT anywhere.
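
For context, here is a minimal sketch of how xas_split_alloc() and
xas_split() are normally used to split an existing large entry, loosely
modelled on the __filemap_add_folio() pattern. It assumes
CONFIG_XARRAY_MULTI, and the function name split_large_entry() and the
plain xas_lock() locking are illustrative only.

#include <linux/xarray.h>

static int split_large_entry(struct xarray *xa, unsigned long index,
			     unsigned int new_order)
{
	XA_STATE_ORDER(xas, xa, index, new_order);
	void *old = xa_load(xa, index);
	unsigned int old_order = xa_get_order(xa, index);

	/*
	 * Preallocate the nodes needed to split @old down to @new_order.
	 * This may sleep, so it is done before taking the lock.
	 */
	xas_split_alloc(&xas, old, old_order, GFP_KERNEL);
	if (xas_error(&xas))
		return xas_error(&xas);

	xas_lock(&xas);
	xas_split(&xas, old, old_order);
	xas_unlock(&xas);
	return 0;
}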

Patch 3 fixes an issue where xas_pause() may unexpectedly skip some
entries after xas_load() (which may be called by xas_for_each() for the
first entry) loads a large entry with an index pointing at the middle of
that entry. As a result we may fail to write back some dirty pages and
lose user data.
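
A hypothetical regression-test sketch in the style of lib/test_xarray.c
(xa_store_order() and XA_BUG_ON() are that file's local helpers, and
CONFIG_XARRAY_MULTI is assumed). This is not the test from the patch, just
an illustration of the walk that must not skip the entry at index 4:

static noinline void check_pause_mid_entry(struct xarray *xa)
{
	XA_STATE(xas, xa, 2);	/* start inside the multi-order entry */
	void *entry;
	unsigned int found = 0;

	xa_store_order(xa, 0, 2, xa_mk_value(0), GFP_KERNEL);	/* covers 0..3 */
	xa_store(xa, 4, xa_mk_value(4), GFP_KERNEL);

	rcu_read_lock();
	xas_for_each(&xas, entry, ULONG_MAX) {
		found++;
		/*
		 * Pausing here must make the walk resume at index 4,
		 * not jump past it.
		 */
		xas_pause(&xas);
	}
	rcu_read_unlock();

	/* Expect the order-2 entry plus the entry at index 4. */
	XA_BUG_ON(xa, found != 2);
	xa_destroy(xa);
}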

> 
> If any of these patches should, in your opinion, be included in 6.12
> and/or backported into -stable trees then it would be better to
> separate them into a different patch series.  But a single patch series
> is acceptable - I can figure this stuff out, if I'm given the necessary
> information!  
Based on the impact described above, patches 1-3 should be included in 6.12
and backported into -stable trees. But we need to confirm whether the trigger
conditions exist in real-world users of xarray (filemap, shmem and so on).
I will include more information in the next version.

Thanks.
> 
> Thanks.
>
Matthew Wilcox Nov. 12, 2024, 3:48 p.m. UTC | #4
On Tue, Nov 12, 2024 at 03:05:51PM +0800, Kemeng Shi wrote:
> Patch 1 fixes an issue where xas_find_marked() can return a sibling entry
> to callers when there is a race replacing an old entry covering a small
> range with a new entry covering a bigger range. The kernel will crash if
> callers treat the returned sibling entry as a valid entry.
> For example, find_get_entry() will crash if it gets a sibling entry from
> xas_find_marked() while searching for writeback folios.

You _think_ this can happen, or you've observed a crash where it _has_
happened?  I don't think it can happen due to the various locks we
have.  I'm not opposed to including this patch, but I don't think it
can happen today.

> Patch 3 fixes an issue where xas_pause() may unexpectedly skip some
> entries after xas_load() (which may be called by xas_for_each() for the
> first entry) loads a large entry with an index pointing at the middle of
> that entry. As a result we may fail to write back some dirty pages and
> lose user data.

How do we lose user data?  The inode will still contain dirty pages.
We might skip some pages we should not have skipped, but they'll be
caught by a subsequent pass, no?
Kemeng Shi Nov. 13, 2024, 1:25 a.m. UTC | #5
on 11/12/2024 11:48 PM, Matthew Wilcox wrote:
> On Tue, Nov 12, 2024 at 03:05:51PM +0800, Kemeng Shi wrote:
>> Patch 1 fixes an issue where xas_find_marked() can return a sibling entry
>> to callers when there is a race replacing an old entry covering a small
>> range with a new entry covering a bigger range. The kernel will crash if
>> callers treat the returned sibling entry as a valid entry.
>> For example, find_get_entry() will crash if it gets a sibling entry from
>> xas_find_marked() while searching for writeback folios.
> 
> You _think_ this can happen, or you've observed a crash where it _has_
> happened?  I don't think it can happen due to the various locks we
> have.  I'm not opposed to including this patch, but I don't think it
> can happen today.
Sorry for the confusion. As I mentioned in the last email, we need to confirm
whether the trigger conditions exist in the real world for all three potential
bugs. Here, I want to describe the potential impact if the bugs do happen, to
get some early input.
> 
>> Patch 3 fixes an issue where xas_pause() may unexpectedly skip some
>> entries after xas_load() (which may be called by xas_for_each() for the
>> first entry) loads a large entry with an index pointing at the middle of
>> that entry. As a result we may fail to write back some dirty pages and
>> lose user data.
> 
> How do we lose user data?  The inode will still contain dirty pages.
> We might skip some pages we should not have skipped, but they'll be
> caught by a subsequent pass, no?
> 
After a flush, users may expect dirty pages to be on stable storage and
persistent. But if we fail to write back some dirty pages and the kernel
reboots after the flush, then the data in the unwritten pages is lost to
users. I'm not sure whether I'm missing something here. But again, we need
to confirm whether the trigger condition exists in the real world.
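
To make the concern concrete, here is a hedged sketch of the lock-dropping,
tag-driven walk style used by page cache code, where a wrong resume index
from xas_pause() would silently skip dirty folios. This is purely
illustrative (walk_dirty_folios() is a made-up name) and not code from any
of the patches:

#include <linux/pagemap.h>
#include <linux/sched.h>

static void walk_dirty_folios(struct address_space *mapping,
			      pgoff_t start, pgoff_t end)
{
	XA_STATE(xas, &mapping->i_pages, start);
	struct folio *folio;

	rcu_read_lock();
	xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) {
		if (xas_retry(&xas, folio))
			continue;
		/* ... queue @folio for writeback here ... */
		if (need_resched()) {
			/*
			 * Must resume right after @folio; skipping ahead
			 * would leave later dirty folios unwritten.
			 */
			xas_pause(&xas);
			rcu_read_unlock();
			cond_resched();
			rcu_read_lock();
		}
	}
	rcu_read_unlock();
}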

Thanks.