Message ID: 20241111215359.246937-1-shikemeng@huaweicloud.com (mailing list archive)
Series: Fixes and cleanups to xarray
On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:

> This series contains some random fixes and cleanups to xarray. Patch 1-3
> are fixes and patch 4-5 are cleanups. More details can be found in
> respective patches.

As we're at -rc7 I'm looking to stash low-priority things away for consideration after 6.13-rc1.

But I don't know if all of these are low-priority things. Because you didn't tell me!

Please please always always describe the possible userspace-visible effects of all fixes. And all non-fixes, come to that.

So. Please go through all of the patches in this series and redo the changelogs, describing the impact each patch might have for our users.

If any of these patches should, in your opinion, be included in 6.12 and/or backported into -stable trees then it would be better to separate them into a different patch series. But a single patch series is acceptable - I can figure this stuff out, if I'm given the necessary information!

Thanks.
On Mon, Nov 11, 2024 at 01:28:16PM -0800, Andrew Morton wrote:
> On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> > This series contains some random fixes and cleanups to xarray. Patch 1-3
> > are fixes and patch 4-5 are cleanups. More details can be found in
> > respective patches.
>
> As we're at -rc7 I'm looking to stash low-priority things away for
> consideration after 6.13-rc1.
>
> But I don't know if all of these are low-priority things. Because you
> didn't tell me!

I don't think any of them fix any bugs that any code currently in the kernel would stumble across.
on 11/12/2024 5:28 AM, Andrew Morton wrote:
> On Tue, 12 Nov 2024 05:53:54 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
>> This series contains some random fixes and cleanups to xarray. Patch 1-3
>> are fixes and patch 4-5 are cleanups. More details can be found in
>> respective patches.
>
> As we're at -rc7 I'm looking to stash low-priority things away for
> consideration after 6.13-rc1.
>
> But I don't know if all of these are low-priority things. Because you
> didn't tell me!
>
> Please please always always describe the possible userspace-visible effects of
> all fixes. And all non-fixes, come to that.
>
> So. Please go through all of the patches in this series and redo the
> changelogs, describing the impact each patch might have for our users.

Sorry for missing this information; I will add details of the userspace impact in the next version. As there are a lot of users of xarray in the kernel and I'm not familiar with most of them, I may not be able to send a new series with descriptions of userspace-visible effects in time. So I'd like to describe the effect on in-kernel users of xarray here, to gather some early input.

Patch 1 fixes the issue that xas_find_marked() will return a sibling entry to users when there is a race replacing an old entry covering a small range with a new entry covering a bigger range. The kernel will crash if users treat the returned sibling entry as a valid entry. For example, find_get_entry() will crash if it gets a sibling entry from xas_find_marked() while searching for writeback folios.

Patch 2 fixes the issue that xas_split_alloc() doesn't distinguish large entries whose order == xas->xa_shift + 2 * XA_CHUNK_SHIFT, in which case the split entries have a bigger order than expected. A subsequent xas_store() will then assign the stored entry a bigger range than expected, and we will get the wrong entry from slots beyond the actual range of the stored entry. I'm not sure whether we ever split a large entry with order == xas->xa_shift + 2 * XA_CHUNK_SHIFT.
Patch 3 fixes the issue that xas_pause() may unexpectedly skip some entries after xas_load() (which may be called in xas_for_each() for the first entry) loads a large entry with the index pointing at the middle of the entry. As a result we may fail to write back some dirty pages and lose user data.

> If any of these patches should, in your opinion, be included in 6.12
> and/or backported into -stable trees then it would be better to
> separate them into a different patch series. But a single patch series
> is acceptable - I can figure this stuff out, if I'm given the necessary
> information!

Given the impact described above, patches 1-3 should be included in 6.12 and backported into -stable trees. But we need to confirm whether the trigger conditions actually exist in real-world users of xarray (filemap, shmem and so on). I will include more information in the next version. Thanks.

> Thanks.
On Tue, Nov 12, 2024 at 03:05:51PM +0800, Kemeng Shi wrote:
> Patch 1 fixes the issue that xas_find_marked() will return a sibling entry
> to users when there is a race replacing an old entry covering a small range
> with a new entry covering a bigger range. The kernel will crash if users
> treat the returned sibling entry as a valid entry.
> For example, find_get_entry() will crash if it gets a sibling entry from
> xas_find_marked() while searching for writeback folios.

You _think_ this can happen, or you've observed a crash where it _has_ happened? I don't think it can happen due to the various locks we have. I'm not opposed to including this patch, but I don't think it can happen today.

> Patch 3 fixes the issue that xas_pause() may unexpectedly skip some entries
> after xas_load() (which may be called in xas_for_each() for the first entry)
> loads a large entry with the index pointing at the middle of the entry. As a
> result we may fail to write back some dirty pages and lose user data.

How do we lose user data? The inode will still contain dirty pages. We might skip some pages we should not have skipped, but they'll be caught by a subsequent pass, no?
on 11/12/2024 11:48 PM, Matthew Wilcox wrote:
> On Tue, Nov 12, 2024 at 03:05:51PM +0800, Kemeng Shi wrote:
>> Patch 1 fixes the issue that xas_find_marked() will return a sibling entry
>> to users when there is a race replacing an old entry covering a small range
>> with a new entry covering a bigger range. The kernel will crash if users
>> treat the returned sibling entry as a valid entry.
>> For example, find_get_entry() will crash if it gets a sibling entry from
>> xas_find_marked() while searching for writeback folios.
>
> You _think_ this can happen, or you've observed a crash where it _has_
> happened? I don't think it can happen due to the various locks we
> have. I'm not opposed to including this patch, but I don't think it
> can happen today.

Sorry for the confusion. As I mentioned in my last email, we need to confirm whether the trigger conditions for all three potential bugs exist in the real world. Here I wanted to describe the potential impact, should the bugs happen, to gather some early input.

>> Patch 3 fixes the issue that xas_pause() may unexpectedly skip some entries
>> after xas_load() (which may be called in xas_for_each() for the first entry)
>> loads a large entry with the index pointing at the middle of the entry. As a
>> result we may fail to write back some dirty pages and lose user data.
>
> How do we lose user data? The inode will still contain dirty pages.
> We might skip some pages we should not have skipped, but they'll be
> caught by a subsequent pass, no?

After a flush, users may expect dirty pages to be on storage and persistent. But if we fail to write back some dirty pages and the kernel reboots after the flush, the data in the unwritten pages is lost to users. I'm not sure whether I'm missing something here. But again, we need to confirm whether the trigger condition exists in the real world. Thanks.