mbox series

[0/3] btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot

Message ID 20221205095122.17011-1-robbieko@synology.com (mailing list archive)
Headers show
Series btrfs: fix unexpected -ENOMEM with percpu_counter_init when create snapshot | expand

Message

robbieko Dec. 5, 2022, 9:51 a.m. UTC
From: Robbie Ko <robbieko@synology.com>

[Issue]
When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM

  WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
  CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
  Call Trace:
    create_pending_snapshot+0xe30/0xe70 [btrfs]
    create_pending_snapshots+0x89/0xb0 [btrfs]
    btrfs_commit_transaction+0x469/0xc60 [btrfs]
    btrfs_mksubvol+0x5bd/0x690 [btrfs]
    btrfs_mksnapshot+0x102/0x170 [btrfs]
    btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
    btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
    btrfs_ioctl+0x2111/0x3130 [btrfs]
    do_vfs_ioctl+0x7ea/0xa80
    SyS_ioctl+0xa1/0xb0
    entry_SYSCALL_64_fastpath+0x1e/0x8e
  ---[ end trace 910c8f86780ca385 ]---
  BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory

[Cause]
During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
However, atomic allocation is required when processing percpu_counter_init
without GFP_KERNEL due to the unique structure of percpu_counter.
In this situation, allocating memory for initializing fs root may cause
unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.

[Fix]
We allocate memory at the beginning of creating a subvolume/snapshot.
This way, we can ensure the memory is enough when initializing fs root.
Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
the transaction won’t abort since it hasn’t started yet.

Robbie Ko (3):
  btrfs: refactor anon_dev with new_fs_root_args for create
    subvolume/snapshot
  btrfs: change snapshot_lock to dynamic pointer
  btrfs: add snapshot_lock to new_fs_root_args

 fs/btrfs/ctree.h       |   2 +-
 fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
 fs/btrfs/disk-io.h     |  12 ++++-
 fs/btrfs/file.c        |   8 +--
 fs/btrfs/inode.c       |  12 ++---
 fs/btrfs/ioctl.c       |  38 +++++++--------
 fs/btrfs/transaction.c |   2 +-
 fs/btrfs/transaction.h |   5 +-
 8 files changed, 123 insertions(+), 63 deletions(-)

Comments

Filipe Manana Dec. 5, 2022, 11:15 a.m. UTC | #1
On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> [Issue]
> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
>
>   WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>   CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661

4.4.180...

How often does that happen on a supported kernel? The oldest supported
kernel is 4.9 at the moment.

>   Call Trace:
>     create_pending_snapshot+0xe30/0xe70 [btrfs]
>     create_pending_snapshots+0x89/0xb0 [btrfs]
>     btrfs_commit_transaction+0x469/0xc60 [btrfs]
>     btrfs_mksubvol+0x5bd/0x690 [btrfs]
>     btrfs_mksnapshot+0x102/0x170 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>     btrfs_ioctl+0x2111/0x3130 [btrfs]
>     do_vfs_ioctl+0x7ea/0xa80
>     SyS_ioctl+0xa1/0xb0
>     entry_SYSCALL_64_fastpath+0x1e/0x8e
>   ---[ end trace 910c8f86780ca385 ]---
>   BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
>
> [Cause]
> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
> However, atomic allocation is required when processing percpu_counter_init
> without GFP_KERNEL due to the unique structure of percpu_counter.
> In this situation, allocating memory for initializing fs root may cause
> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.

This sounds familiar, and we had a regression in mm that made
percepu_counter_init fail very often with -ENOMEM.
See:

https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/

The kernel fix was this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78

I'm assuming that you are probably running a Synology kernel based on
4.4 with a lot of backported patches, is that correct?
Maybe you have the patchset that introduced the regression in that
kernel, but that later fix is not in there.

Thanks.


>
> [Fix]
> We allocate memory at the beginning of creating a subvolume/snapshot.
> This way, we can ensure the memory is enough when initializing fs root.
> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> the transaction won’t abort since it hasn’t started yet.
>
> Robbie Ko (3):
>   btrfs: refactor anon_dev with new_fs_root_args for create
>     subvolume/snapshot
>   btrfs: change snapshot_lock to dynamic pointer
>   btrfs: add snapshot_lock to new_fs_root_args
>
>  fs/btrfs/ctree.h       |   2 +-
>  fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
>  fs/btrfs/disk-io.h     |  12 ++++-
>  fs/btrfs/file.c        |   8 +--
>  fs/btrfs/inode.c       |  12 ++---
>  fs/btrfs/ioctl.c       |  38 +++++++--------
>  fs/btrfs/transaction.c |   2 +-
>  fs/btrfs/transaction.h |   5 +-
>  8 files changed, 123 insertions(+), 63 deletions(-)
>
> --
> 2.17.1
>
robbieko Dec. 6, 2022, 8:58 a.m. UTC | #2
Filipe Manana 於 2022/12/5 下午7:15 寫道:
> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> [Issue]
>> When creating subvolume/snapshot, the transaction may be abort due to -ENOMEM
>>
>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> 4.4.180...
>
> How often does that happen on a supported kernel? The oldest supported
> kernel is 4.9 at the moment.

The occurrence of this issue is extremely low, and it cannot be 
reproduced stably.
We have millions of machines out there, and this issue happened 3 times 
in the last two years.


>
>>    Call Trace:
>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
>>      create_pending_snapshots+0x89/0xb0 [btrfs]
>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
>>      do_vfs_ioctl+0x7ea/0xa80
>>      SyS_ioctl+0xa1/0xb0
>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
>>    ---[ end trace 910c8f86780ca385 ]---
>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: errno=-12 Out of memory
>>
>> [Cause]
>> During creating a subvolume/snapshot, it is necessary to allocate memory for Initializing fs root.
>> Therefore, it can only use GFP_NOFS to allocate memory to avoid deadlock issues.
>> However, atomic allocation is required when processing percpu_counter_init
>> without GFP_KERNEL due to the unique structure of percpu_counter.
>> In this situation, allocating memory for initializing fs root may cause
>> unexpected -ENOMEM when free memory is low and causes btrfs transaction to abort.
> This sounds familiar, and we had a regression in mm that made
> percepu_counter_init fail very often with -ENOMEM.
> See:
>
> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
>
> The kernel fix was this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
>
> I'm assuming that you are probably running a Synology kernel based on
> 4.4 with a lot of backported patches, is that correct?
> Maybe you have the patchset that introduced the regression in that
> kernel, but that later fix is not in there.
>
> Thanks.

Yes, We do backport many patches, but we don't backport "mm: 
memcg/percpu: account percpu memory to memory cgroups",
So we think the issue has nothing to do with "percpu: make 
pcpu_nr_empty_pop_pages per chunk type".

This issue is just a corner case, when percpu_counter_init is executed 
with GFP_NOFS, there is a chance to fail.
So we feel that the snapshot_lock should be preallocated.

Thanks.


>
>> [Fix]
>> We allocate memory at the beginning of creating a subvolume/snapshot.
>> This way, we can ensure the memory is enough when initializing fs root.
>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>> the transaction won’t abort since it hasn’t started yet.
>>
>> Robbie Ko (3):
>>    btrfs: refactor anon_dev with new_fs_root_args for create
>>      subvolume/snapshot
>>    btrfs: change snapshot_lock to dynamic pointer
>>    btrfs: add snapshot_lock to new_fs_root_args
>>
>>   fs/btrfs/ctree.h       |   2 +-
>>   fs/btrfs/disk-io.c     | 107 ++++++++++++++++++++++++++++++-----------
>>   fs/btrfs/disk-io.h     |  12 ++++-
>>   fs/btrfs/file.c        |   8 +--
>>   fs/btrfs/inode.c       |  12 ++---
>>   fs/btrfs/ioctl.c       |  38 +++++++--------
>>   fs/btrfs/transaction.c |   2 +-
>>   fs/btrfs/transaction.h |   5 +-
>>   8 files changed, 123 insertions(+), 63 deletions(-)
>>
>> --
>> 2.17.1
>>
robbieko Dec. 8, 2022, 6:24 a.m. UTC | #3
robbieko 於 2022/12/6 下午4:58 寫道:
>
> Filipe Manana 於 2022/12/5 下午7:15 寫道:
>> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>>> From: Robbie Ko <robbieko@synology.com>
>>>
>>> [Issue]
>>> When creating subvolume/snapshot, the transaction may be abort due 
>>> to -ENOMEM
>>>
>>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937 
>>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>> 4.4.180...
>>
>> How often does that happen on a supported kernel? The oldest supported
>> kernel is 4.9 at the moment.
>
> The occurrence of this issue is extremely low, and it cannot be 
> reproduced stably.
> We have millions of machines out there, and this issue happened 3 
> times in the last two years.
>
>
>>
>>>    Call Trace:
>>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
>>>      create_pending_snapshots+0x89/0xb0 [btrfs]
>>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
>>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
>>>      do_vfs_ioctl+0x7ea/0xa80
>>>      SyS_ioctl+0xa1/0xb0
>>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
>>>    ---[ end trace 910c8f86780ca385 ]---
>>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937: 
>>> errno=-12 Out of memory
>>>
>>> [Cause]
>>> During creating a subvolume/snapshot, it is necessary to allocate 
>>> memory for Initializing fs root.
>>> Therefore, it can only use GFP_NOFS to allocate memory to avoid 
>>> deadlock issues.
>>> However, atomic allocation is required when processing 
>>> percpu_counter_init
>>> without GFP_KERNEL due to the unique structure of percpu_counter.
>>> In this situation, allocating memory for initializing fs root may cause
>>> unexpected -ENOMEM when free memory is low and causes btrfs 
>>> transaction to abort.
>> This sounds familiar, and we had a regression in mm that made
>> percepu_counter_init fail very often with -ENOMEM.
>> See:
>>
>> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ 
>>
>>
>> The kernel fix was this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78 
>>
>>
>> I'm assuming that you are probably running a Synology kernel based on
>> 4.4 with a lot of backported patches, is that correct?
>> Maybe you have the patchset that introduced the regression in that
>> kernel, but that later fix is not in there.
>>
>> Thanks.
>
> Yes, We do backport many patches, but we don't backport "mm: 
> memcg/percpu: account percpu memory to memory cgroups",
> So we think the issue has nothing to do with "percpu: make 
> pcpu_nr_empty_pop_pages per chunk type".
>
> This issue is just a corner case, when percpu_counter_init is executed 
> with GFP_NOFS, there is a chance to fail.
> So we feel that the snapshot_lock should be preallocated.
>
> Thanks.
>
>

Anyone have any suggestions ?


>>
>>> [Fix]
>>> We allocate memory at the beginning of creating a subvolume/snapshot.
>>> This way, we can ensure the memory is enough when initializing fs root.
>>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>>> the transaction won’t abort since it hasn’t started yet.
>>>
>>> Robbie Ko (3):
>>>    btrfs: refactor anon_dev with new_fs_root_args for create
>>>      subvolume/snapshot
>>>    btrfs: change snapshot_lock to dynamic pointer
>>>    btrfs: add snapshot_lock to new_fs_root_args
>>>
>>>   fs/btrfs/ctree.h       |   2 +-
>>>   fs/btrfs/disk-io.c     | 107 
>>> ++++++++++++++++++++++++++++++-----------
>>>   fs/btrfs/disk-io.h     |  12 ++++-
>>>   fs/btrfs/file.c        |   8 +--
>>>   fs/btrfs/inode.c       |  12 ++---
>>>   fs/btrfs/ioctl.c       |  38 +++++++--------
>>>   fs/btrfs/transaction.c |   2 +-
>>>   fs/btrfs/transaction.h |   5 +-
>>>   8 files changed, 123 insertions(+), 63 deletions(-)
>>>
>>> -- 
>>> 2.17.1
>>>
Filipe Manana Dec. 12, 2022, 5:04 p.m. UTC | #4
On Thu, Dec 8, 2022 at 6:24 AM robbieko <robbieko@synology.com> wrote:
>
>
> robbieko 於 2022/12/6 下午4:58 寫道:
> >
> > Filipe Manana 於 2022/12/5 下午7:15 寫道:
> >> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
> >>> From: Robbie Ko <robbieko@synology.com>
> >>>
> >>> [Issue]
> >>> When creating subvolume/snapshot, the transaction may be abort due
> >>> to -ENOMEM
> >>>
> >>>    WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937
> >>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
> >>>    CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
> >> 4.4.180...
> >>
> >> How often does that happen on a supported kernel? The oldest supported
> >> kernel is 4.9 at the moment.
> >
> > The occurrence of this issue is extremely low, and it cannot be
> > reproduced stably.
> > We have millions of machines out there, and this issue happened 3
> > times in the last two years.
> >
> >
> >>
> >>>    Call Trace:
> >>>      create_pending_snapshot+0xe30/0xe70 [btrfs]
> >>>      create_pending_snapshots+0x89/0xb0 [btrfs]
> >>>      btrfs_commit_transaction+0x469/0xc60 [btrfs]
> >>>      btrfs_mksubvol+0x5bd/0x690 [btrfs]
> >>>      btrfs_mksnapshot+0x102/0x170 [btrfs]
> >>>      btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
> >>>      btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
> >>>      btrfs_ioctl+0x2111/0x3130 [btrfs]
> >>>      do_vfs_ioctl+0x7ea/0xa80
> >>>      SyS_ioctl+0xa1/0xb0
> >>>      entry_SYSCALL_64_fastpath+0x1e/0x8e
> >>>    ---[ end trace 910c8f86780ca385 ]---
> >>>    BTRFS: error (device dm-2) in create_pending_snapshot:1937:
> >>> errno=-12 Out of memory
> >>>
> >>> [Cause]
> >>> During creating a subvolume/snapshot, it is necessary to allocate
> >>> memory for Initializing fs root.
> >>> Therefore, it can only use GFP_NOFS to allocate memory to avoid
> >>> deadlock issues.
> >>> However, atomic allocation is required when processing
> >>> percpu_counter_init
> >>> without GFP_KERNEL due to the unique structure of percpu_counter.
> >>> In this situation, allocating memory for initializing fs root may cause
> >>> unexpected -ENOMEM when free memory is low and causes btrfs
> >>> transaction to abort.
> >> This sounds familiar, and we had a regression in mm that made
> >> percepu_counter_init fail very often with -ENOMEM.
> >> See:
> >>
> >> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
> >>
> >>
> >> The kernel fix was this:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
> >>
> >>
> >> I'm assuming that you are probably running a Synology kernel based on
> >> 4.4 with a lot of backported patches, is that correct?
> >> Maybe you have the patchset that introduced the regression in that
> >> kernel, but that later fix is not in there.
> >>
> >> Thanks.
> >
> > Yes, We do backport many patches, but we don't backport "mm:
> > memcg/percpu: account percpu memory to memory cgroups",
> > So we think the issue has nothing to do with "percpu: make
> > pcpu_nr_empty_pop_pages per chunk type".
> >
> > This issue is just a corner case, when percpu_counter_init is executed
> > with GFP_NOFS, there is a chance to fail.
> > So we feel that the snapshot_lock should be preallocated.
> >
> > Thanks.
> >
> >
>
> Anyone have any suggestions ?

Sorry, some holidays and other priorities came in.

The idea seems fine to me. I was just wondering if you weren't hitting
that regression in mm.
Patches 1 and 2 don't apply with current misc-next, btw, they require
manual conflict resolution.

I'll leave review comments on each patch, small things, nothing major.

Thanks.

>
>
> >>
> >>> [Fix]
> >>> We allocate memory at the beginning of creating a subvolume/snapshot.
> >>> This way, we can ensure the memory is enough when initializing fs root.
> >>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
> >>> the transaction won’t abort since it hasn’t started yet.
> >>>
> >>> Robbie Ko (3):
> >>>    btrfs: refactor anon_dev with new_fs_root_args for create
> >>>      subvolume/snapshot
> >>>    btrfs: change snapshot_lock to dynamic pointer
> >>>    btrfs: add snapshot_lock to new_fs_root_args
> >>>
> >>>   fs/btrfs/ctree.h       |   2 +-
> >>>   fs/btrfs/disk-io.c     | 107
> >>> ++++++++++++++++++++++++++++++-----------
> >>>   fs/btrfs/disk-io.h     |  12 ++++-
> >>>   fs/btrfs/file.c        |   8 +--
> >>>   fs/btrfs/inode.c       |  12 ++---
> >>>   fs/btrfs/ioctl.c       |  38 +++++++--------
> >>>   fs/btrfs/transaction.c |   2 +-
> >>>   fs/btrfs/transaction.h |   5 +-
> >>>   8 files changed, 123 insertions(+), 63 deletions(-)
> >>>
> >>> --
> >>> 2.17.1
> >>>
robbieko Dec. 13, 2022, 2:59 a.m. UTC | #5
Filipe Manana 於 2022/12/13 上午1:04 寫道:
> On Thu, Dec 8, 2022 at 6:24 AM robbieko <robbieko@synology.com> wrote:
>>
>> robbieko 於 2022/12/6 下午4:58 寫道:
>>> Filipe Manana 於 2022/12/5 下午7:15 寫道:
>>>> On Mon, Dec 5, 2022 at 10:55 AM robbieko <robbieko@synology.com> wrote:
>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>>
>>>>> [Issue]
>>>>> When creating subvolume/snapshot, the transaction may be abort due
>>>>> to -ENOMEM
>>>>>
>>>>>     WARNING: CPU: 1 PID: 411 at fs/btrfs/transaction.c:1937
>>>>> create_pending_snapshot+0xe30/0xe70 [btrfs]()
>>>>>     CPU: 1 PID: 411 Comm: btrfs Tainted: P O 4.4.180+ #42661
>>>> 4.4.180...
>>>>
>>>> How often does that happen on a supported kernel? The oldest supported
>>>> kernel is 4.9 at the moment.
>>> The occurrence of this issue is extremely low, and it cannot be
>>> reproduced stably.
>>> We have millions of machines out there, and this issue happened 3
>>> times in the last two years.
>>>
>>>
>>>>>     Call Trace:
>>>>>       create_pending_snapshot+0xe30/0xe70 [btrfs]
>>>>>       create_pending_snapshots+0x89/0xb0 [btrfs]
>>>>>       btrfs_commit_transaction+0x469/0xc60 [btrfs]
>>>>>       btrfs_mksubvol+0x5bd/0x690 [btrfs]
>>>>>       btrfs_mksnapshot+0x102/0x170 [btrfs]
>>>>>       btrfs_ioctl_snap_create_transid+0x1ad/0x1c0 [btrfs]
>>>>>       btrfs_ioctl_snap_create_v2+0x102/0x160 [btrfs]
>>>>>       btrfs_ioctl+0x2111/0x3130 [btrfs]
>>>>>       do_vfs_ioctl+0x7ea/0xa80
>>>>>       SyS_ioctl+0xa1/0xb0
>>>>>       entry_SYSCALL_64_fastpath+0x1e/0x8e
>>>>>     ---[ end trace 910c8f86780ca385 ]---
>>>>>     BTRFS: error (device dm-2) in create_pending_snapshot:1937:
>>>>> errno=-12 Out of memory
>>>>>
>>>>> [Cause]
>>>>> During creating a subvolume/snapshot, it is necessary to allocate
>>>>> memory for Initializing fs root.
>>>>> Therefore, it can only use GFP_NOFS to allocate memory to avoid
>>>>> deadlock issues.
>>>>> However, atomic allocation is required when processing
>>>>> percpu_counter_init
>>>>> without GFP_KERNEL due to the unique structure of percpu_counter.
>>>>> In this situation, allocating memory for initializing fs root may cause
>>>>> unexpected -ENOMEM when free memory is low and causes btrfs
>>>>> transaction to abort.
>>>> This sounds familiar, and we had a regression in mm that made
>>>> percepu_counter_init fail very often with -ENOMEM.
>>>> See:
>>>>
>>>> https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
>>>>
>>>>
>>>> The kernel fix was this:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0760fa3d8f7fceeea508b98899f1c826e10ffe78
>>>>
>>>>
>>>> I'm assuming that you are probably running a Synology kernel based on
>>>> 4.4 with a lot of backported patches, is that correct?
>>>> Maybe you have the patchset that introduced the regression in that
>>>> kernel, but that later fix is not in there.
>>>>
>>>> Thanks.
>>> Yes, We do backport many patches, but we don't backport "mm:
>>> memcg/percpu: account percpu memory to memory cgroups",
>>> So we think the issue has nothing to do with "percpu: make
>>> pcpu_nr_empty_pop_pages per chunk type".
>>>
>>> This issue is just a corner case, when percpu_counter_init is executed
>>> with GFP_NOFS, there is a chance to fail.
>>> So we feel that the snapshot_lock should be preallocated.
>>>
>>> Thanks.
>>>
>>>
>> Anyone have any suggestions ?
> Sorry, some holidays and other priorities came in.
>
> The idea seems fine to me. I was just wondering if you weren't hitting
> that regression in mm.
> Patches 1 and 2 don't apply with current misc-next, btw, they require
> manual conflict resolution.

Ok. I will switch to using misc-next for development and send patch v2

Thanks.

> I'll leave review comments on each patch, small things, nothing major.
>
> Thanks.
>
>>
>>>>> [Fix]
>>>>> We allocate memory at the beginning of creating a subvolume/snapshot.
>>>>> This way, we can ensure the memory is enough when initializing fs root.
>>>>> Even -ENOMEM happens at the beginning of creating a subvolume/snapshot,
>>>>> the transaction won’t abort since it hasn’t started yet.
>>>>>
>>>>> Robbie Ko (3):
>>>>>     btrfs: refactor anon_dev with new_fs_root_args for create
>>>>>       subvolume/snapshot
>>>>>     btrfs: change snapshot_lock to dynamic pointer
>>>>>     btrfs: add snapshot_lock to new_fs_root_args
>>>>>
>>>>>    fs/btrfs/ctree.h       |   2 +-
>>>>>    fs/btrfs/disk-io.c     | 107
>>>>> ++++++++++++++++++++++++++++++-----------
>>>>>    fs/btrfs/disk-io.h     |  12 ++++-
>>>>>    fs/btrfs/file.c        |   8 +--
>>>>>    fs/btrfs/inode.c       |  12 ++---
>>>>>    fs/btrfs/ioctl.c       |  38 +++++++--------
>>>>>    fs/btrfs/transaction.c |   2 +-
>>>>>    fs/btrfs/transaction.h |   5 +-
>>>>>    8 files changed, 123 insertions(+), 63 deletions(-)
>>>>>
>>>>> --
>>>>> 2.17.1
>>>>>