mbox series

[6.6,00/28] fix CVE-2024-46701

Message ID 20241024132009.2267260-1-yukuai1@huaweicloud.com (mailing list archive)
Headers show
Series fix CVE-2024-46701 | expand

Message

Yu Kuai Oct. 24, 2024, 1:19 p.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Fix patch is patch 27, relied patches are from:

 - patches from set [1] to add helpers to maple_tree, the last patch to
improve fork() performance is not backported;
 - patches from set [2] to change maple_tree, and follow up fixes;
 - patches from set [3] to convert offset_ctx from xarray to maple_tree;

Please notice that I'm not an expert in this area, and I'm afraid to
make manual changes. That's why patch 16 revert the commit that is
different from mainline and will cause conflict backporting new patches.
patch 28 pick the original mainline patch again.

(And this is what we did to fix the CVE in downstream kernels).

[1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
[2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
[3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/

Andrew Morton (1):
  lib/maple_tree.c: fix build error due to hotfix alteration

Chuck Lever (5):
  libfs: Re-arrange locking in offset_iterate_dir()
  libfs: Define a minimum directory offset
  libfs: Add simple_offset_empty()
  maple_tree: Add mtree_alloc_cyclic()
  libfs: Convert simple directory offsets to use a Maple Tree

Liam R. Howlett (12):
  maple_tree: remove unnecessary default labels from switch statements
  maple_tree: make mas_erase() more robust
  maple_tree: move debug check to __mas_set_range()
  maple_tree: add end of node tracking to the maple state
  maple_tree: use cached node end in mas_next()
  maple_tree: use cached node end in mas_destroy()
  maple_tree: clean up inlines for some functions
  maple_tree: separate ma_state node from status
  maple_tree: remove mas_searchable()
  maple_tree: use maple state end for write operations
  maple_tree: don't find node end in mtree_lookup_walk()
  maple_tree: mtree_range_walk() clean up

Lorenzo Stoakes (1):
  maple_tree: correct tree corruption on spanning store

Peng Zhang (7):
  maple_tree: add mt_free_one() and mt_attr() helpers
  maple_tree: introduce {mtree,mas}_lock_nested()
  maple_tree: introduce interfaces __mt_dup() and mtree_dup()
  maple_tree: skip other tests when BENCH is enabled
  maple_tree: preserve the tree attributes when destroying maple tree
  maple_tree: add test for mtree_dup()
  maple_tree: avoid checking other gaps after getting the largest gap

Yu Kuai (1):
  Revert "maple_tree: correct tree corruption on spanning store"

yangerkun (1):
  libfs: fix infinite directory reads for offset dir

 fs/libfs.c                                  |  129 ++-
 include/linux/fs.h                          |    6 +-
 include/linux/maple_tree.h                  |  356 +++---
 include/linux/mm_types.h                    |    3 +-
 lib/maple_tree.c                            | 1096 +++++++++++++------
 lib/test_maple_tree.c                       |  218 ++--
 mm/internal.h                               |   10 +-
 mm/shmem.c                                  |    4 +-
 tools/include/linux/spinlock.h              |    1 +
 tools/testing/radix-tree/linux/maple_tree.h |    2 +-
 tools/testing/radix-tree/maple.c            |  390 ++++++-
 11 files changed, 1564 insertions(+), 651 deletions(-)

Comments

Greg KH Nov. 6, 2024, 6:16 a.m. UTC | #1
On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Fix patch is patch 27, relied patches are from:
> 
>  - patches from set [1] to add helpers to maple_tree, the last patch to
> improve fork() performance is not backported;

So things slowed down?

>  - patches from set [2] to change maple_tree, and follow up fixes;
>  - patches from set [3] to convert offset_ctx from xarray to maple_tree;
> 
> Please notice that I'm not an expert in this area, and I'm afraid to
> make manual changes. That's why patch 16 revert the commit that is
> different from mainline and will cause conflict backporting new patches.
> patch 28 pick the original mainline patch again.
> 
> (And this is what we did to fix the CVE in downstream kernels).
> 
> [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
> [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
> [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/

This series looks rough.  I want to have the maintainers of these
files/subsystems to ack this before being able to take them.

thanks,

greg k-h
Lorenzo Stoakes Nov. 6, 2024, 2:43 p.m. UTC | #2
NACK.

Do this some other way that isn't a terrible mess.

You've reverted my CRITICAL fix, then didn't cc- me so I'm grumpy.

Even if you bizarrely brought it back later.

Don't fail to cc- people you revert in future, please, especially in
stable. It's not only discourteous it's also an actual security risk.

Thanks.

Also this commit log is ridiculous, you don't even explain WHAT ON EARTH
YOU ARE DOING HERE. It's not just good enough to reference a CVE and expect
us to go research this for you, especially one you've 'addressed' in this
totally bizarre fashion.

On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Fix patch is patch 27, relied patches are from:
>
>  - patches from set [1] to add helpers to maple_tree, the last patch to
> improve fork() performance is not backported;
>  - patches from set [2] to change maple_tree, and follow up fixes;
>  - patches from set [3] to convert offset_ctx from xarray to maple_tree;
>
> Please notice that I'm not an expert in this area, and I'm afraid to
> make manual changes. That's why patch 16 revert the commit that is
> different from mainline and will cause conflict backporting new patches.
> patch 28 pick the original mainline patch again.

This is... what? :/

You have to fix conflicts, that's part of what backporting involves.

Yeah, rethink your whole approach, thanks.

>
> (And this is what we did to fix the CVE in downstream kernels).
>
> [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
> [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
> [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
>
> Andrew Morton (1):
>   lib/maple_tree.c: fix build error due to hotfix alteration
>
> Chuck Lever (5):
>   libfs: Re-arrange locking in offset_iterate_dir()
>   libfs: Define a minimum directory offset
>   libfs: Add simple_offset_empty()
>   maple_tree: Add mtree_alloc_cyclic()
>   libfs: Convert simple directory offsets to use a Maple Tree
>
> Liam R. Howlett (12):
>   maple_tree: remove unnecessary default labels from switch statements
>   maple_tree: make mas_erase() more robust
>   maple_tree: move debug check to __mas_set_range()
>   maple_tree: add end of node tracking to the maple state
>   maple_tree: use cached node end in mas_next()
>   maple_tree: use cached node end in mas_destroy()
>   maple_tree: clean up inlines for some functions
>   maple_tree: separate ma_state node from status
>   maple_tree: remove mas_searchable()
>   maple_tree: use maple state end for write operations
>   maple_tree: don't find node end in mtree_lookup_walk()
>   maple_tree: mtree_range_walk() clean up
>
> Lorenzo Stoakes (1):
>   maple_tree: correct tree corruption on spanning store
>
> Peng Zhang (7):
>   maple_tree: add mt_free_one() and mt_attr() helpers
>   maple_tree: introduce {mtree,mas}_lock_nested()
>   maple_tree: introduce interfaces __mt_dup() and mtree_dup()
>   maple_tree: skip other tests when BENCH is enabled
>   maple_tree: preserve the tree attributes when destroying maple tree
>   maple_tree: add test for mtree_dup()
>   maple_tree: avoid checking other gaps after getting the largest gap
>
> Yu Kuai (1):
>   Revert "maple_tree: correct tree corruption on spanning store"
>
> yangerkun (1):
>   libfs: fix infinite directory reads for offset dir
>
>  fs/libfs.c                                  |  129 ++-
>  include/linux/fs.h                          |    6 +-
>  include/linux/maple_tree.h                  |  356 +++---
>  include/linux/mm_types.h                    |    3 +-
>  lib/maple_tree.c                            | 1096 +++++++++++++------
>  lib/test_maple_tree.c                       |  218 ++--
>  mm/internal.h                               |   10 +-
>  mm/shmem.c                                  |    4 +-
>  tools/include/linux/spinlock.h              |    1 +
>  tools/testing/radix-tree/linux/maple_tree.h |    2 +-
>  tools/testing/radix-tree/maple.c            |  390 ++++++-
>  11 files changed, 1564 insertions(+), 651 deletions(-)
>
> --
> 2.39.2
>
Liam R. Howlett Nov. 6, 2024, 2:44 p.m. UTC | #3
* Greg KH <gregkh@linuxfoundation.org> [241106 01:16]:
> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> > 
> > Fix patch is patch 27, relied patches are from:
> > 
> >  - patches from set [1] to add helpers to maple_tree, the last patch to
> > improve fork() performance is not backported;
> 
> So things slowed down?

Fork got faster in modern kernels.  The backport contains helpers as
they are dependencies for later patches.

> 
> >  - patches from set [2] to change maple_tree, and follow up fixes;
> >  - patches from set [3] to convert offset_ctx from xarray to maple_tree;
> > 
> > Please notice that I'm not an expert in this area, and I'm afraid to
> > make manual changes. That's why patch 16 revert the commit that is
> > different from mainline and will cause conflict backporting new patches.
> > patch 28 pick the original mainline patch again.

You reverted and forward ported a patch but didn't Cc the author of the
patch you changed.  That is probably one of the most important Cc's to
have on this list.

By the way, that fix is already in 6.6

> > 
> > (And this is what we did to fix the CVE in downstream kernels).
> > 
> > [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
> > [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
> > [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
> 
> This series looks rough.  I want to have the maintainers of these
> files/subsystems to ack this before being able to take them.

The entire backporting of all of this to fix an issue is extreme, and
although it will solve the issue, you end up running something very
different than 6.6 for a single fix.

Looking at the details of the cve, it seems very odd.  This is an issue
in libfs and the affected kernel is 6.6 to 6.10.7.  It then goes into
details of how the maple tree allows this - but 6.6 doesn't use the
maple tree in libfs so either the patch needs to be backported to an
older stable (6.6) or the CVE is wrong.

Almost all of these patches are to backport using the maple tree in
libfs and that should not be done.

I don't know if the CVE is incorrectly labeled or if the patch wasn't
backported far enough because I was not involved in the discussion of
this CVE - which seems like an oversight if this is specifically caused
by the maple tree?

The patch in question is 64a7ce76fb90 ("libfs: fix infinite directory
reads for offset dir").  I think we just need the one?

To be clear:
 - Do not take this serioes
 - Someone in libfs land should respond stating if the fix above needs
   to be backported.

Thanks,
Liam
Chuck Lever III Nov. 6, 2024, 3:19 p.m. UTC | #4
> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> 
> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>> 
>> Fix patch is patch 27, relied patches are from:

I assume patch 27 is:

libfs: fix infinite directory reads for offset dir

https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/

I don't think the Maple tree patches are a hard
requirement for this fix. And note that libfs did
not use Maple tree originally because I was told
at that time that Maple tree was not yet mature.

So, a better approach might be to fit the fix
onto linux-6.6.y while sticking with xarray.

This is the first I've heard of this CVE. It
would help if the patch authors got some
notification when these are filed.


>> - patches from set [1] to add helpers to maple_tree, the last patch to
>> improve fork() performance is not backported;
> 
> So things slowed down?
> 
>> - patches from set [2] to change maple_tree, and follow up fixes;
>> - patches from set [3] to convert offset_ctx from xarray to maple_tree;
>> 
>> Please notice that I'm not an expert in this area, and I'm afraid to
>> make manual changes. That's why patch 16 revert the commit that is
>> different from mainline and will cause conflict backporting new patches.
>> patch 28 pick the original mainline patch again.
>> 
>> (And this is what we did to fix the CVE in downstream kernels).
>> 
>> [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
>> [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
>> [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
> 
> This series looks rough.  I want to have the maintainers of these
> files/subsystems to ack this before being able to take them.
> 
> thanks,
> 
> greg k-h

--
Chuck Lever
James Bottomley Nov. 6, 2024, 4:21 p.m. UTC | #5
On Wed, 2024-11-06 at 15:19 +0000, Chuck Lever III wrote:
> This is the first I've heard of this CVE. It
> would help if the patch authors got some
> notification when these are filed.

Greg did it; it came from the kernel CNA:

https://www.cve.org/CVERecord?id=CVE-2024-46701

The way it seems to work is that this is simply a wrapper for the
upstream commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a

Which is what appears as the last stable reference.  I assume someone
investigated and added the vulnerable kernel details.  I think the
theory is that since you reviewed the original upstream patch, stable
just takes care of the backports and CVE management of the existing fix
through the normal stable process.

James
Yu Kuai Nov. 7, 2024, 12:57 a.m. UTC | #6
Hi,

在 2024/11/06 23:19, Chuck Lever III 写道:
> 
> 
>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Fix patch is patch 27, relied patches are from:
> 
> I assume patch 27 is:
> 
> libfs: fix infinite directory reads for offset dir
> 
> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
> 
> I don't think the Maple tree patches are a hard
> requirement for this fix. And note that libfs did
> not use Maple tree originally because I was told
> at that time that Maple tree was not yet mature.
> 
> So, a better approach might be to fit the fix
> onto linux-6.6.y while sticking with xarray.

The painful part is that using xarray is not acceptable, the offet
is just 32 bit and if it overflows, readdir will read nothing. That's
why maple_tree has to be used.

Thanks,
Kuai

> 
> This is the first I've heard of this CVE. It
> would help if the patch authors got some
> notification when these are filed.
> 
> 
>>> - patches from set [1] to add helpers to maple_tree, the last patch to
>>> improve fork() performance is not backported;
>>
>> So things slowed down?
>>
>>> - patches from set [2] to change maple_tree, and follow up fixes;
>>> - patches from set [3] to convert offset_ctx from xarray to maple_tree;
>>>
>>> Please notice that I'm not an expert in this area, and I'm afraid to
>>> make manual changes. That's why patch 16 revert the commit that is
>>> different from mainline and will cause conflict backporting new patches.
>>> patch 28 pick the original mainline patch again.
>>>
>>> (And this is what we did to fix the CVE in downstream kernels).
>>>
>>> [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
>>> [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
>>> [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
>>
>> This series looks rough.  I want to have the maintainers of these
>> files/subsystems to ack this before being able to take them.
>>
>> thanks,
>>
>> greg k-h
> 
> --
> Chuck Lever
> 
>
Yu Kuai Nov. 7, 2024, 1:43 a.m. UTC | #7
Hi,

在 2024/11/06 22:43, Lorenzo Stoakes 写道:
> NACK.
> 
> Do this some other way that isn't a terrible mess.
> 
> You've reverted my CRITICAL fix, then didn't cc- me so I'm grumpy.
> 
> Even if you bizarrely brought it back later.
> 
> Don't fail to cc- people you revert in future, please, especially in
> stable. It's not only discourteous it's also an actual security risk.

ok, that's my fault.
> 
> Thanks.
> 
> Also this commit log is ridiculous, you don't even explain WHAT ON EARTH
> YOU ARE DOING HERE. It's not just good enough to reference a CVE and expect
> us to go research this for you, especially one you've 'addressed' in this
> totally bizarre fashion.
> 
> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Fix patch is patch 27, relied patches are from:
>>
>>   - patches from set [1] to add helpers to maple_tree, the last patch to
>> improve fork() performance is not backported;
>>   - patches from set [2] to change maple_tree, and follow up fixes;
>>   - patches from set [3] to convert offset_ctx from xarray to maple_tree;
>>
>> Please notice that I'm not an expert in this area, and I'm afraid to
>> make manual changes. That's why patch 16 revert the commit that is
>> different from mainline and will cause conflict backporting new patches.
>> patch 28 pick the original mainline patch again.
> 
> This is... what? :/
> 
> You have to fix conflicts, that's part of what backporting involves.

So, that's the best I can do in this area. I agree that this is
unacceptable now. So I'll just ignore this cve for v6.6, unless
some expert in this area will try to fix conflicts for patch 27 in
a better way.

Thanks,
Kuai

> 
> Yeah, rethink your whole approach, thanks.
> 
>>
>> (And this is what we did to fix the CVE in downstream kernels).
>>
>> [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
>> [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
>> [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
>>
>> Andrew Morton (1):
>>    lib/maple_tree.c: fix build error due to hotfix alteration
>>
>> Chuck Lever (5):
>>    libfs: Re-arrange locking in offset_iterate_dir()
>>    libfs: Define a minimum directory offset
>>    libfs: Add simple_offset_empty()
>>    maple_tree: Add mtree_alloc_cyclic()
>>    libfs: Convert simple directory offsets to use a Maple Tree
>>
>> Liam R. Howlett (12):
>>    maple_tree: remove unnecessary default labels from switch statements
>>    maple_tree: make mas_erase() more robust
>>    maple_tree: move debug check to __mas_set_range()
>>    maple_tree: add end of node tracking to the maple state
>>    maple_tree: use cached node end in mas_next()
>>    maple_tree: use cached node end in mas_destroy()
>>    maple_tree: clean up inlines for some functions
>>    maple_tree: separate ma_state node from status
>>    maple_tree: remove mas_searchable()
>>    maple_tree: use maple state end for write operations
>>    maple_tree: don't find node end in mtree_lookup_walk()
>>    maple_tree: mtree_range_walk() clean up
>>
>> Lorenzo Stoakes (1):
>>    maple_tree: correct tree corruption on spanning store
>>
>> Peng Zhang (7):
>>    maple_tree: add mt_free_one() and mt_attr() helpers
>>    maple_tree: introduce {mtree,mas}_lock_nested()
>>    maple_tree: introduce interfaces __mt_dup() and mtree_dup()
>>    maple_tree: skip other tests when BENCH is enabled
>>    maple_tree: preserve the tree attributes when destroying maple tree
>>    maple_tree: add test for mtree_dup()
>>    maple_tree: avoid checking other gaps after getting the largest gap
>>
>> Yu Kuai (1):
>>    Revert "maple_tree: correct tree corruption on spanning store"
>>
>> yangerkun (1):
>>    libfs: fix infinite directory reads for offset dir
>>
>>   fs/libfs.c                                  |  129 ++-
>>   include/linux/fs.h                          |    6 +-
>>   include/linux/maple_tree.h                  |  356 +++---
>>   include/linux/mm_types.h                    |    3 +-
>>   lib/maple_tree.c                            | 1096 +++++++++++++------
>>   lib/test_maple_tree.c                       |  218 ++--
>>   mm/internal.h                               |   10 +-
>>   mm/shmem.c                                  |    4 +-
>>   tools/include/linux/spinlock.h              |    1 +
>>   tools/testing/radix-tree/linux/maple_tree.h |    2 +-
>>   tools/testing/radix-tree/maple.c            |  390 ++++++-
>>   11 files changed, 1564 insertions(+), 651 deletions(-)
>>
>> --
>> 2.39.2
>>
> 
> .
>
Chuck Lever III Nov. 7, 2024, 2:41 p.m. UTC | #8
On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2024/11/06 23:19, Chuck Lever III 写道:
> > 
> > 
> > > On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > 
> > > On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> > > > From: Yu Kuai <yukuai3@huawei.com>
> > > > 
> > > > Fix patch is patch 27, relied patches are from:
> > 
> > I assume patch 27 is:
> > 
> > libfs: fix infinite directory reads for offset dir
> > 
> > https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
> > 
> > I don't think the Maple tree patches are a hard
> > requirement for this fix. And note that libfs did
> > not use Maple tree originally because I was told
> > at that time that Maple tree was not yet mature.
> > 
> > So, a better approach might be to fit the fix
> > onto linux-6.6.y while sticking with xarray.
> 
> The painful part is that using xarray is not acceptable, the offet
> is just 32 bit and if it overflows, readdir will read nothing. That's
> why maple_tree has to be used.

A 32-bit range should be entirely adequate for this usage.

 - The offset allocator wraps when it reaches the maximum, it
   doesn't overflow unless there are actually billions of extant
   entries in the directory, which IMO is not likely.

 - The offset values are dense, so the directory can use all 2- or
   4- billion in the 32-bit integer range before wrapping.

 - No-one complained about this limitation when offset_readdir() was
   first merged. The xarray was replaced for performance reasons,
   not because of the 32-bit range limit.

It is always possible that I have misunderstood your concern!


> Thanks,
> Kuai
> 
> > 
> > This is the first I've heard of this CVE. It
> > would help if the patch authors got some
> > notification when these are filed.
> > 
> > 
> > > > - patches from set [1] to add helpers to maple_tree, the last patch to
> > > > improve fork() performance is not backported;
> > > 
> > > So things slowed down?
> > > 
> > > > - patches from set [2] to change maple_tree, and follow up fixes;
> > > > - patches from set [3] to convert offset_ctx from xarray to maple_tree;
> > > > 
> > > > Please notice that I'm not an expert in this area, and I'm afraid to
> > > > make manual changes. That's why patch 16 revert the commit that is
> > > > different from mainline and will cause conflict backporting new patches.
> > > > patch 28 pick the original mainline patch again.
> > > > 
> > > > (And this is what we did to fix the CVE in downstream kernels).
> > > > 
> > > > [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
> > > > [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
> > > > [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
> > > 
> > > This series looks rough.  I want to have the maintainers of these
> > > files/subsystems to ack this before being able to take them.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > --
> > Chuck Lever
> > 
> > 
>
Liam R. Howlett Nov. 7, 2024, 2:44 p.m. UTC | #9
* Yu Kuai <yukuai1@huaweicloud.com> [241106 19:57]:
> Hi,
> 
> 在 2024/11/06 23:19, Chuck Lever III 写道:
> > 
> > 
> > > On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > 
> > > On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> > > > From: Yu Kuai <yukuai3@huawei.com>
> > > > 
> > > > Fix patch is patch 27, relied patches are from:
> > 
> > I assume patch 27 is:
> > 
> > libfs: fix infinite directory reads for offset dir
> > 
> > https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
> > 
> > I don't think the Maple tree patches are a hard
> > requirement for this fix. And note that libfs did
> > not use Maple tree originally because I was told
> > at that time that Maple tree was not yet mature.
> > 
> > So, a better approach might be to fit the fix
> > onto linux-6.6.y while sticking with xarray.
> 
> The painful part is that using xarray is not acceptable, the offet
> is just 32 bit and if it overflows, readdir will read nothing. That's
> why maple_tree has to be used.

Why does the xarray cause it to overflow vs the maple tree? The maple
tree conversion was for performance reasons, as far as I know [1].

Thanks,
Liam

[1]. https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/

> 
> Thanks,
> Kuai
> 
> > 
> > This is the first I've heard of this CVE. It
> > would help if the patch authors got some
> > notification when these are filed.
> > 
> > 
> > > > - patches from set [1] to add helpers to maple_tree, the last patch to
> > > > improve fork() performance is not backported;
> > > 
> > > So things slowed down?
> > > 
> > > > - patches from set [2] to change maple_tree, and follow up fixes;
> > > > - patches from set [3] to convert offset_ctx from xarray to maple_tree;
> > > > 
> > > > Please notice that I'm not an expert in this area, and I'm afraid to
> > > > make manual changes. That's why patch 16 revert the commit that is
> > > > different from mainline and will cause conflict backporting new patches.
> > > > patch 28 pick the original mainline patch again.
> > > > 
> > > > (And this is what we did to fix the CVE in downstream kernels).
> > > > 
> > > > [1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.com/
> > > > [2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.com/T/
> > > > [3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net/
> > > 
> > > This series looks rough.  I want to have the maintainers of these
> > > files/subsystems to ack this before being able to take them.
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > --
> > Chuck Lever
> > 
> > 
>
Yu Kuai Nov. 8, 2024, 1:19 a.m. UTC | #10
Hi,

在 2024/11/07 22:41, Chuck Lever 写道:
> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>
>>>
>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>
>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>
>>>>> Fix patch is patch 27, relied patches are from:
>>>
>>> I assume patch 27 is:
>>>
>>> libfs: fix infinite directory reads for offset dir
>>>
>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>
>>> I don't think the Maple tree patches are a hard
>>> requirement for this fix. And note that libfs did
>>> not use Maple tree originally because I was told
>>> at that time that Maple tree was not yet mature.
>>>
>>> So, a better approach might be to fit the fix
>>> onto linux-6.6.y while sticking with xarray.
>>
>> The painful part is that using xarray is not acceptable, the offet
>> is just 32 bit and if it overflows, readdir will read nothing. That's
>> why maple_tree has to be used.
> 
> A 32-bit range should be entirely adequate for this usage.
> 
>   - The offset allocator wraps when it reaches the maximum, it
>     doesn't overflow unless there are actually billions of extant
>     entries in the directory, which IMO is not likely.

Yes, it's not likely, but it's possible, and not hard to trigger for
test. And please notice that the offset will increase for each new file,
and file can be removed, while offset stays the same.
> 
>   - The offset values are dense, so the directory can use all 2- or
>     4- billion in the 32-bit integer range before wrapping.

A simple math, if user create and remove 1 file in each seconds, it will
cost about 130 years to overflow. And if user create and remove 1000
files in each second, it will cost about 1 month to overflow.

maple tree use 64 bit value for the offset, which is impossible to
overflow for the rest of our lifes.
> 
>   - No-one complained about this limitation when offset_readdir() was
>     first merged. The xarray was replaced for performance reasons,
>     not because of the 32-bit range limit.
> 
> It is always possible that I have misunderstood your concern!

The problem is that if the next_offset overflows to 0, then after patch
27, offset_dir_open() will record the 0, and later offset_readdir will
return directly, while there can be many files.

Thanks,
Kuai
Chuck Lever III Nov. 8, 2024, 1:23 p.m. UTC | #11
> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> 
> Hi,
> 
> 在 2024/11/07 22:41, Chuck Lever 写道:
>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>> Hi,
>>> 
>>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>> 
>>>> 
>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>> 
>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>> 
>>>>>> Fix patch is patch 27, relied patches are from:
>>>> 
>>>> I assume patch 27 is:
>>>> 
>>>> libfs: fix infinite directory reads for offset dir
>>>> 
>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>> 
>>>> I don't think the Maple tree patches are a hard
>>>> requirement for this fix. And note that libfs did
>>>> not use Maple tree originally because I was told
>>>> at that time that Maple tree was not yet mature.
>>>> 
>>>> So, a better approach might be to fit the fix
>>>> onto linux-6.6.y while sticking with xarray.
>>> 
>>> The painful part is that using xarray is not acceptable, the offet
>>> is just 32 bit and if it overflows, readdir will read nothing. That's
>>> why maple_tree has to be used.
>> A 32-bit range should be entirely adequate for this usage.
>>  - The offset allocator wraps when it reaches the maximum, it
>>    doesn't overflow unless there are actually billions of extant
>>    entries in the directory, which IMO is not likely.
> 
> Yes, it's not likely, but it's possible, and not hard to trigger for
> test.

I question whether such a test reflects any real-world
workload.

Besides, there are a number of other limits that will impact
the ability to create that many entries in one directory.
The number of inodes in one tmpfs instance is limited, for
instance.


> And please notice that the offset will increase for each new file,
> and file can be removed, while offset stays the same.
>>  - The offset values are dense, so the directory can use all 2- or
>>    4- billion in the 32-bit integer range before wrapping.
> 
> A simple math, if user create and remove 1 file in each seconds, it will
> cost about 130 years to overflow. And if user create and remove 1000
> files in each second, it will cost about 1 month to overflow.

The question is what happens when there are no more offset
values available. xa_alloc_cyclic should fail, and file
creation is supposed to fail at that point. If it doesn't,
that's a bug that is outside of the use of xarray or Maple.


> maple tree use 64 bit value for the offset, which is impossible to
> overflow for the rest of our lifes.
>>  - No-one complained about this limitation when offset_readdir() was
>>    first merged. The xarray was replaced for performance reasons,
>>    not because of the 32-bit range limit.
>> It is always possible that I have misunderstood your concern!
> 
> The problem is that if the next_offset overflows to 0, then after patch
> 27, offset_dir_open() will record the 0, and later offset_readdir will
> return directly, while there can be many files.

That's a separate bug that has nothing to do with the maximum
number of entries one directory can have. Again, you don't
need Maple tree to address that.

My understanding from Liam is that backporting Maple into
v6.6 is just not practical to do. We must explore alternate
ways to address these concerns.


--
Chuck Lever
Liam R. Howlett Nov. 8, 2024, 5:03 p.m. UTC | #12
* Chuck Lever III <chuck.lever@oracle.com> [241108 08:23]:
> 
> 
> > On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > 
> > Hi,
> > 
> > 在 2024/11/07 22:41, Chuck Lever 写道:
> >> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
> >>> Hi,
> >>> 
> >>> 在 2024/11/06 23:19, Chuck Lever III 写道:
> >>>> 
> >>>> 
> >>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >>>>> 
> >>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
> >>>>>> From: Yu Kuai <yukuai3@huawei.com>
> >>>>>> 
> >>>>>> Fix patch is patch 27, relied patches are from:
> >>>> 
> >>>> I assume patch 27 is:
> >>>> 
> >>>> libfs: fix infinite directory reads for offset dir
> >>>> 
> >>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
> >>>> 
> >>>> I don't think the Maple tree patches are a hard
> >>>> requirement for this fix. And note that libfs did
> >>>> not use Maple tree originally because I was told
> >>>> at that time that Maple tree was not yet mature.
> >>>> 
> >>>> So, a better approach might be to fit the fix
> >>>> onto linux-6.6.y while sticking with xarray.
> >>> 
> >>> The painful part is that using xarray is not acceptable, the offet
> >>> is just 32 bit and if it overflows, readdir will read nothing. That's
> >>> why maple_tree has to be used.
> >> A 32-bit range should be entirely adequate for this usage.
> >>  - The offset allocator wraps when it reaches the maximum, it
> >>    doesn't overflow unless there are actually billions of extant
> >>    entries in the directory, which IMO is not likely.
> > 
> > Yes, it's not likely, but it's possible, and not hard to trigger for
> > test.
> 
> I question whether such a test reflects any real-world
> workload.
> 
> Besides, there are a number of other limits that will impact
> the ability to create that many entries in one directory.
> The number of inodes in one tmpfs instance is limited, for
> instance.
> 
> 
> > And please notice that the offset will increase for each new file,
> > and file can be removed, while offset stays the same.
> >>  - The offset values are dense, so the directory can use all 2- or
> >>    4- billion in the 32-bit integer range before wrapping.
> > 
> > A simple math, if user create and remove 1 file in each seconds, it will
> > cost about 130 years to overflow. And if user create and remove 1000
> > files in each second, it will cost about 1 month to overflow.
> 
> The question is what happens when there are no more offset
> values available. xa_alloc_cyclic should fail, and file
> creation is supposed to fail at that point. If it doesn't,
> that's a bug that is outside of the use of xarray or Maple.
> 
> 
> > maple tree use 64 bit value for the offset, which is impossible to
> > overflow for the rest of our lifes.
> >>  - No-one complained about this limitation when offset_readdir() was
> >>    first merged. The xarray was replaced for performance reasons,
> >>    not because of the 32-bit range limit.
> >> It is always possible that I have misunderstood your concern!
> > 
> > The problem is that if the next_offset overflows to 0, then after patch
> > 27, offset_dir_open() will record the 0, and later offset_readdir will
> > return directly, while there can be many files.
> 
> That's a separate bug that has nothing to do with the maximum
> number of entries one directory can have. Again, you don't
> need Maple tree to address that.
> 
> My understanding from Liam is that backporting Maple into
> v6.6 is just not practical to do. We must explore alternate
> ways to address these concerns.
> 

The tree itself is in v6.6, but the evolution of the tree to fit the
needs of this and other subsystems isn't something that would be well
tested.  This is really backporting features and that's not the point of
stable.

I think this is what Lorenzo was saying about changing your approach, we
can't backport 28 patches to fix this when it isn't needed.

Thanks,
Liam
Yu Kuai Nov. 9, 2024, 1:30 a.m. UTC | #13
Hi,

在 2024/11/08 21:23, Chuck Lever III 写道:
> 
> 
>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/11/07 22:41, Chuck Lever 写道:
>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>>>
>>>>>
>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>
>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>
>>>>>>> Fix patch is patch 27, relied patches are from:
>>>>>
>>>>> I assume patch 27 is:
>>>>>
>>>>> libfs: fix infinite directory reads for offset dir
>>>>>
>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>>>
>>>>> I don't think the Maple tree patches are a hard
>>>>> requirement for this fix. And note that libfs did
>>>>> not use Maple tree originally because I was told
>>>>> at that time that Maple tree was not yet mature.
>>>>>
>>>>> So, a better approach might be to fit the fix
>>>>> onto linux-6.6.y while sticking with xarray.
>>>>
>>>> The painful part is that using xarray is not acceptable, the offet
>>>> is just 32 bit and if it overflows, readdir will read nothing. That's
>>>> why maple_tree has to be used.
>>> A 32-bit range should be entirely adequate for this usage.
>>>   - The offset allocator wraps when it reaches the maximum, it
>>>     doesn't overflow unless there are actually billions of extant
>>>     entries in the directory, which IMO is not likely.
>>
>> Yes, it's not likely, but it's possible, and not hard to trigger for
>> test.
> 
> I question whether such a test reflects any real-world
> workload.
> 
> Besides, there are a number of other limits that will impact
> the ability to create that many entries in one directory.
> The number of inodes in one tmpfs instance is limited, for
> instance.
> 
> 
>> And please notice that the offset will increase for each new file,
>> and file can be removed, while offset stays the same.

Did you see the above explanation? files can be removed, you don't have
to store that much files to triggger the offset to overflow.

>>>   - The offset values are dense, so the directory can use all 2- or
>>>     4- billion in the 32-bit integer range before wrapping.
>>
>> A simple math, if user create and remove 1 file in each seconds, it will
>> cost about 130 years to overflow. And if user create and remove 1000
>> files in each second, it will cost about 1 month to overflow.
> 
> The question is what happens when there are no more offset
> values available. xa_alloc_cyclic should fail, and file
> creation is supposed to fail at that point. If it doesn't,
> that's a bug that is outside of the use of xarray or Maple.

Can you show me the code that xa_alloc_cyclic should fail? At least
according to the commets, it will return 1 if the allocation succeeded
after wrapping.

  * Context: Any context.  Takes and releases the xa_lock.  May sleep if
  * the @gfp flags permit.
  * Return: 0 if the allocation succeeded without wrapping.  1 if the
  * allocation succeeded after wrapping, -ENOMEM if memory could not be
  * allocated or -EBUSY if there are no free entries in @limit.
  */
static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry,
		struct xa_limit limit, u32 *next, gfp_t gfp)
> 
> 
>> maple tree use 64 bit value for the offset, which is impossible to
>> overflow for the rest of our lifes.
>>>   - No-one complained about this limitation when offset_readdir() was
>>>     first merged. The xarray was replaced for performance reasons,
>>>     not because of the 32-bit range limit.
>>> It is always possible that I have misunderstood your concern!
>>
>> The problem is that if the next_offset overflows to 0, then after patch
>> 27, offset_dir_open() will record the 0, and later offset_readdir will
>> return directly, while there can be many files.
> 
> That's a separate bug that has nothing to do with the maximum
> number of entries one directory can have. Again, you don't
> need Maple tree to address that.
> 
> My understanding from Liam is that backporting Maple into
> v6.6 is just not practical to do. We must explore alternate
> ways to address these concerns.

Like I said, I'll just give up for this cve for v6.6.
> 
> 
> --
> Chuck Lever
> 
>
Yu Kuai Nov. 9, 2024, 1:38 a.m. UTC | #14
Hi,

在 2024/11/09 1:03, Liam R. Howlett 写道:
> * Chuck Lever III <chuck.lever@oracle.com> [241108 08:23]:
>>
>>
>>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>
>>> Hi,
>>>
>>> 在 2024/11/07 22:41, Chuck Lever 写道:
>>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>>>>
>>>>>>
>>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>>
>>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>>
>>>>>>>> Fix patch is patch 27, relied patches are from:
>>>>>>
>>>>>> I assume patch 27 is:
>>>>>>
>>>>>> libfs: fix infinite directory reads for offset dir
>>>>>>
>>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>>>>
>>>>>> I don't think the Maple tree patches are a hard
>>>>>> requirement for this fix. And note that libfs did
>>>>>> not use Maple tree originally because I was told
>>>>>> at that time that Maple tree was not yet mature.
>>>>>>
>>>>>> So, a better approach might be to fit the fix
>>>>>> onto linux-6.6.y while sticking with xarray.
>>>>>
>>>>> The painful part is that using xarray is not acceptable, the offet
>>>>> is just 32 bit and if it overflows, readdir will read nothing. That's
>>>>> why maple_tree has to be used.
>>>> A 32-bit range should be entirely adequate for this usage.
>>>>   - The offset allocator wraps when it reaches the maximum, it
>>>>     doesn't overflow unless there are actually billions of extant
>>>>     entries in the directory, which IMO is not likely.
>>>
>>> Yes, it's not likely, but it's possible, and not hard to trigger for
>>> test.
>>
>> I question whether such a test reflects any real-world
>> workload.
>>
>> Besides, there are a number of other limits that will impact
>> the ability to create that many entries in one directory.
>> The number of inodes in one tmpfs instance is limited, for
>> instance.
>>
>>
>>> And please notice that the offset will increase for each new file,
>>> and file can be removed, while offset stays the same.
>>>>   - The offset values are dense, so the directory can use all 2- or
>>>>     4- billion in the 32-bit integer range before wrapping.
>>>
>>> A simple math, if user create and remove 1 file in each seconds, it will
>>> cost about 130 years to overflow. And if user create and remove 1000
>>> files in each second, it will cost about 1 month to overflow.
>>
>> The question is what happens when there are no more offset
>> values available. xa_alloc_cyclic should fail, and file
>> creation is supposed to fail at that point. If it doesn't,
>> that's a bug that is outside of the use of xarray or Maple.
>>
>>
>>> maple tree use 64 bit value for the offset, which is impossible to
>>> overflow for the rest of our lifes.
>>>>   - No-one complained about this limitation when offset_readdir() was
>>>>     first merged. The xarray was replaced for performance reasons,
>>>>     not because of the 32-bit range limit.
>>>> It is always possible that I have misunderstood your concern!
>>>
>>> The problem is that if the next_offset overflows to 0, then after patch
>>> 27, offset_dir_open() will record the 0, and later offset_readdir will
>>> return directly, while there can be many files.
>>
>> That's a separate bug that has nothing to do with the maximum
>> number of entries one directory can have. Again, you don't
>> need Maple tree to address that.
>>
>> My understanding from Liam is that backporting Maple into
>> v6.6 is just not practical to do. We must explore alternate
>> ways to address these concerns.
>>
> 
> The tree itself is in v6.6, but the evolution of the tree to fit the
> needs of this and other subsystems isn't something that would be well
> tested.  This is really backporting features and that's not the point of
> stable.

Of course.
> 
> I think this is what Lorenzo was saying about changing your approach, we
> can't backport 28 patches to fix this when it isn't needed.

I don't have other approach now, so I'll not follow on fixing this cve.
I'll be great if someone has a beeter apporch. :)

Thanks,
Kuai

> 
> Thanks,
> Liam
> 
> .
>
Chuck Lever III Nov. 9, 2024, 4:58 p.m. UTC | #15
> On Nov 8, 2024, at 8:30 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> 
> Hi,
> 
> 在 2024/11/08 21:23, Chuck Lever III 写道:
>>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>> 
>>> Hi,
>>> 
>>> 在 2024/11/07 22:41, Chuck Lever 写道:
>>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>>>> Hi,
>>>>> 
>>>>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>>>> 
>>>>>> 
>>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>> 
>>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>> 
>>>>>>>> Fix patch is patch 27, relied patches are from:
>>>>>> 
>>>>>> I assume patch 27 is:
>>>>>> 
>>>>>> libfs: fix infinite directory reads for offset dir
>>>>>> 
>>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>>>> 
>>>>>> I don't think the Maple tree patches are a hard
>>>>>> requirement for this fix. And note that libfs did
>>>>>> not use Maple tree originally because I was told
>>>>>> at that time that Maple tree was not yet mature.
>>>>>> 
>>>>>> So, a better approach might be to fit the fix
>>>>>> onto linux-6.6.y while sticking with xarray.
>>>>> 
>>>>> The painful part is that using xarray is not acceptable, the offet
>>>>> is just 32 bit and if it overflows, readdir will read nothing. That's
>>>>> why maple_tree has to be used.
>>>> A 32-bit range should be entirely adequate for this usage.
>>>>  - The offset allocator wraps when it reaches the maximum, it
>>>>    doesn't overflow unless there are actually billions of extant
>>>>    entries in the directory, which IMO is not likely.
>>> 
>>> Yes, it's not likely, but it's possible, and not hard to trigger for
>>> test.
>> I question whether such a test reflects any real-world
>> workload.
>> Besides, there are a number of other limits that will impact
>> the ability to create that many entries in one directory.
>> The number of inodes in one tmpfs instance is limited, for
>> instance.
>>> And please notice that the offset will increase for each new file,
>>> and file can be removed, while offset stays the same.
> 
> Did you see the above explanation? files can be removed, you don't have
> to store that much files to trigger the offset to overflow.
>>>>  - The offset values are dense, so the directory can use all 2- or
>>>>    4- billion in the 32-bit integer range before wrapping.
>>> 
>>> A simple math, if user create and remove 1 file in each seconds, it will
>>> cost about 130 years to overflow. And if user create and remove 1000
>>> files in each second, it will cost about 1 month to overflow.

> The problem is that if the next_offset overflows to 0, then after patch
> 27, offset_dir_open() will record the 0, and later offset_readdir will
> return directly, while there can be many files.


Let me revisit this for a moment. The xa_alloc_cyclic() call
in simple_offset_add() has a range limit argument of 2 - U32_MAX.

So I'm not clear how an overflow (or, more precisely, the
reuse of an offset value) would result in a "0" offset being
recorded. The range limit prevents the use of 0 and 1.

A "0" offset value would be a bug, I agree, but I don't see
how that can happen.


>> The question is what happens when there are no more offset
>> values available. xa_alloc_cyclic should fail, and file
>> creation is supposed to fail at that point. If it doesn't,
>> that's a bug that is outside of the use of xarray or Maple.
> 
> Can you show me the code that xa_alloc_cyclic should fail? At least
> according to the commets, it will return 1 if the allocation succeeded
> after wrapping.
> 
> * Context: Any context.  Takes and releases the xa_lock.  May sleep if
> * the @gfp flags permit.
> * Return: 0 if the allocation succeeded without wrapping.  1 if the
> * allocation succeeded after wrapping, -ENOMEM if memory could not be
> * allocated or -EBUSY if there are no free entries in @limit.
> */
> static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry,
> struct xa_limit limit, u32 *next, gfp_t gfp)

I recall (dimly) that directory entry offset value re-use
is acceptable and preferred, so I think ignoring a "1"
return value from xa_alloc_cyclic() is OK. If there are
no unused offset values available, it will return -EBUSY,
and file creation will fail.

Perhaps Christian or Al can chime in here on whether
directory entry offset value re-use is indeed expected
to be acceptable.

Further, my understanding is that:

https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/

fixes a rename issue that results in an infinite loop,
and that's the (only) issue that underlies CVE-2024-46701.

You are suggesting that there are other overflow problems
with the xarray-based simple_offset implementation. If I
can confirm them, then I can get these fixed in v6.6. But
so far, I'm not sure I completely understand these other
failure modes.

Are you suggesting that the above fix /introduces/ the
0 offset problem?

--
Chuck Lever
Yu Kuai Nov. 11, 2024, 12:56 a.m. UTC | #16
Hi,

在 2024/11/10 0:58, Chuck Lever III 写道:
> 
> 
>> On Nov 8, 2024, at 8:30 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2024/11/08 21:23, Chuck Lever III 写道:
>>>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 在 2024/11/07 22:41, Chuck Lever 写道:
>>>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>>>>> Hi,
>>>>>>
>>>>>> 在 2024/11/06 23:19, Chuck Lever III 写道:
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>>>>>
>>>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>>>
>>>>>>>>> Fix patch is patch 27, relied patches are from:
>>>>>>>
>>>>>>> I assume patch 27 is:
>>>>>>>
>>>>>>> libfs: fix infinite directory reads for offset dir
>>>>>>>
>>>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>>>>>
>>>>>>> I don't think the Maple tree patches are a hard
>>>>>>> requirement for this fix. And note that libfs did
>>>>>>> not use Maple tree originally because I was told
>>>>>>> at that time that Maple tree was not yet mature.
>>>>>>>
>>>>>>> So, a better approach might be to fit the fix
>>>>>>> onto linux-6.6.y while sticking with xarray.
>>>>>>
>>>>>> The painful part is that using xarray is not acceptable, the offet
>>>>>> is just 32 bit and if it overflows, readdir will read nothing. That's
>>>>>> why maple_tree has to be used.
>>>>> A 32-bit range should be entirely adequate for this usage.
>>>>>   - The offset allocator wraps when it reaches the maximum, it
>>>>>     doesn't overflow unless there are actually billions of extant
>>>>>     entries in the directory, which IMO is not likely.
>>>>
>>>> Yes, it's not likely, but it's possible, and not hard to trigger for
>>>> test.
>>> I question whether such a test reflects any real-world
>>> workload.
>>> Besides, there are a number of other limits that will impact
>>> the ability to create that many entries in one directory.
>>> The number of inodes in one tmpfs instance is limited, for
>>> instance.
>>>> And please notice that the offset will increase for each new file,
>>>> and file can be removed, while offset stays the same.
>>
>> Did you see the above explanation? files can be removed, you don't have
>> to store that much files to trigger the offset to overflow.
>>>>>   - The offset values are dense, so the directory can use all 2- or
>>>>>     4- billion in the 32-bit integer range before wrapping.
>>>>
>>>> A simple math, if user create and remove 1 file in each seconds, it will
>>>> cost about 130 years to overflow. And if user create and remove 1000
>>>> files in each second, it will cost about 1 month to overflow.
> 
>> The problem is that if the next_offset overflows to 0, then after patch
>> 27, offset_dir_open() will record the 0, and later offset_readdir will
>> return directly, while there can be many files.
> 
> 
> Let me revisit this for a moment. The xa_alloc_cyclic() call
> in simple_offset_add() has a range limit argument of 2 - U32_MAX.
> 
> So I'm not clear how an overflow (or, more precisely, the
> reuse of an offset value) would result in a "0" offset being
> recorded. The range limit prevents the use of 0 and 1.
> 
> A "0" offset value would be a bug, I agree, but I don't see
> how that can happen.
> 
> 
>>> The question is what happens when there are no more offset
>>> values available. xa_alloc_cyclic should fail, and file
>>> creation is supposed to fail at that point. If it doesn't,
>>> that's a bug that is outside of the use of xarray or Maple.
>>
>> Can you show me the code that xa_alloc_cyclic should fail? At least
>> according to the commets, it will return 1 if the allocation succeeded
>> after wrapping.
>>
>> * Context: Any context.  Takes and releases the xa_lock.  May sleep if
>> * the @gfp flags permit.
>> * Return: 0 if the allocation succeeded without wrapping.  1 if the
>> * allocation succeeded after wrapping, -ENOMEM if memory could not be
>> * allocated or -EBUSY if there are no free entries in @limit.
>> */
>> static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry,
>> struct xa_limit limit, u32 *next, gfp_t gfp)
> 
> I recall (dimly) that directory entry offset value re-use
> is acceptable and preferred, so I think ignoring a "1"
> return value from xa_alloc_cyclic() is OK. If there are
> no unused offset values available, it will return -EBUSY,
> and file creation will fail.
> 
> Perhaps Christian or Al can chime in here on whether
> directory entry offset value re-use is indeed expected
> to be acceptable.

This can't be acceptable in this case, the reason is straightforward,
it will mess readdir, and this is mucth more serious than the cve
itself.

Thanks,
Kuai

> 
> Further, my understanding is that:
> 
> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
> 
> fixes a rename issue that results in an infinite loop,
> and that's the (only) issue that underlies CVE-2024-46701.
> 
> You are suggesting that there are other overflow problems
> with the xarray-based simple_offset implementation. If I
> can confirm them, then I can get these fixed in v6.6. But
> so far, I'm not sure I completely understand these other
> failure modes.
> 
> Are you suggesting that the above fix /introduces/ the
> 0 offset problem?
> 
> --
> Chuck Lever
> 
>