
btrfs: lock the inode in shared mode before starting fiemap

Message ID 5c6c1cefa7df5fbe9c4dc2fe517f521760a2f4be.1674492773.git.fdmanana@suse.com (mailing list archive)
State New, archived
Series: btrfs: lock the inode in shared mode before starting fiemap

Commit Message

Filipe Manana Jan. 23, 2023, 4:54 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Currently fiemap does not take the inode's lock (VFS lock), it only locks
a file range in the inode's io tree. This, however, can lead to a deadlock
if we have a concurrent fsync on the file and fiemap code triggers a fault
when accessing the user space buffer with fiemap_fill_next_extent(). The
deadlock happens on the inode's i_mmap_lock semaphore, which is taken both
by fsync and btrfs_page_mkwrite(). This deadlock was recently reported by
syzbot and triggers a trace like the following:

   task:syz-executor361 state:D stack:20264 pid:5668  ppid:5119   flags:0x00004004
   Call Trace:
    <TASK>
    context_switch kernel/sched/core.c:5293 [inline]
    __schedule+0x995/0xe20 kernel/sched/core.c:6606
    schedule+0xcb/0x190 kernel/sched/core.c:6682
    wait_on_state fs/btrfs/extent-io-tree.c:707 [inline]
    wait_extent_bit+0x577/0x6f0 fs/btrfs/extent-io-tree.c:751
    lock_extent+0x1c2/0x280 fs/btrfs/extent-io-tree.c:1742
    find_lock_delalloc_range+0x4e6/0x9c0 fs/btrfs/extent_io.c:488
    writepage_delalloc+0x1ef/0x540 fs/btrfs/extent_io.c:1863
    __extent_writepage+0x736/0x14e0 fs/btrfs/extent_io.c:2174
    extent_write_cache_pages+0x983/0x1220 fs/btrfs/extent_io.c:3091
    extent_writepages+0x219/0x540 fs/btrfs/extent_io.c:3211
    do_writepages+0x3c3/0x680 mm/page-writeback.c:2581
    filemap_fdatawrite_wbc+0x11e/0x170 mm/filemap.c:388
    __filemap_fdatawrite_range mm/filemap.c:421 [inline]
    filemap_fdatawrite_range+0x175/0x200 mm/filemap.c:439
    btrfs_fdatawrite_range fs/btrfs/file.c:3850 [inline]
    start_ordered_ops fs/btrfs/file.c:1737 [inline]
    btrfs_sync_file+0x4ff/0x1190 fs/btrfs/file.c:1839
    generic_write_sync include/linux/fs.h:2885 [inline]
    btrfs_do_write_iter+0xcd3/0x1280 fs/btrfs/file.c:1684
    call_write_iter include/linux/fs.h:2189 [inline]
    new_sync_write fs/read_write.c:491 [inline]
    vfs_write+0x7dc/0xc50 fs/read_write.c:584
    ksys_write+0x177/0x2a0 fs/read_write.c:637
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd
   RIP: 0033:0x7f7d4054e9b9
   RSP: 002b:00007f7d404fa2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
   RAX: ffffffffffffffda RBX: 00007f7d405d87a0 RCX: 00007f7d4054e9b9
   RDX: 0000000000000090 RSI: 0000000020000000 RDI: 0000000000000006
   RBP: 00007f7d405a51d0 R08: 0000000000000000 R09: 0000000000000000
   R10: 0000000000000000 R11: 0000000000000246 R12: 61635f65646f6e69
   R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87a8
    </TASK>
   INFO: task syz-executor361:5697 blocked for more than 145 seconds.
         Not tainted 6.2.0-rc3-syzkaller-00376-g7c6984405241 #0
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   task:syz-executor361 state:D stack:21216 pid:5697  ppid:5119   flags:0x00004004
   Call Trace:
    <TASK>
    context_switch kernel/sched/core.c:5293 [inline]
    __schedule+0x995/0xe20 kernel/sched/core.c:6606
    schedule+0xcb/0x190 kernel/sched/core.c:6682
    rwsem_down_read_slowpath+0x5f9/0x930 kernel/locking/rwsem.c:1095
    __down_read_common+0x54/0x2a0 kernel/locking/rwsem.c:1260
    btrfs_page_mkwrite+0x417/0xc80 fs/btrfs/inode.c:8526
    do_page_mkwrite+0x19e/0x5e0 mm/memory.c:2947
    wp_page_shared+0x15e/0x380 mm/memory.c:3295
    handle_pte_fault mm/memory.c:4949 [inline]
    __handle_mm_fault mm/memory.c:5073 [inline]
    handle_mm_fault+0x1b79/0x26b0 mm/memory.c:5219
    do_user_addr_fault+0x69b/0xcb0 arch/x86/mm/fault.c:1428
    handle_page_fault arch/x86/mm/fault.c:1519 [inline]
    exc_page_fault+0x7a/0x110 arch/x86/mm/fault.c:1575
    asm_exc_page_fault+0x22/0x30 arch/x86/include/asm/idtentry.h:570
   RIP: 0010:copy_user_short_string+0xd/0x40 arch/x86/lib/copy_user_64.S:233
   Code: 74 0a 89 (...)
   RSP: 0018:ffffc9000570f330 EFLAGS: 00050202
   RAX: ffffffff843e6601 RBX: 00007fffffffefc8 RCX: 0000000000000007
   RDX: 0000000000000000 RSI: ffffc9000570f3e0 RDI: 0000000020000120
   RBP: ffffc9000570f490 R08: 0000000000000000 R09: fffff52000ae1e83
   R10: fffff52000ae1e83 R11: 1ffff92000ae1e7c R12: 0000000000000038
   R13: ffffc9000570f3e0 R14: 0000000020000120 R15: ffffc9000570f3e0
    copy_user_generic arch/x86/include/asm/uaccess_64.h:37 [inline]
    raw_copy_to_user arch/x86/include/asm/uaccess_64.h:58 [inline]
    _copy_to_user+0xe9/0x130 lib/usercopy.c:34
    copy_to_user include/linux/uaccess.h:169 [inline]
    fiemap_fill_next_extent+0x22e/0x410 fs/ioctl.c:144
    emit_fiemap_extent+0x22d/0x3c0 fs/btrfs/extent_io.c:3458
    fiemap_process_hole+0xa00/0xad0 fs/btrfs/extent_io.c:3716
    extent_fiemap+0xe27/0x2100 fs/btrfs/extent_io.c:3922
    btrfs_fiemap+0x172/0x1e0 fs/btrfs/inode.c:8209
    ioctl_fiemap fs/ioctl.c:219 [inline]
    do_vfs_ioctl+0x185b/0x2980 fs/ioctl.c:810
    __do_sys_ioctl fs/ioctl.c:868 [inline]
    __se_sys_ioctl+0x83/0x170 fs/ioctl.c:856
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd
   RIP: 0033:0x7f7d4054e9b9
   RSP: 002b:00007f7d390d92f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
   RAX: ffffffffffffffda RBX: 00007f7d405d87b0 RCX: 00007f7d4054e9b9
   RDX: 0000000020000100 RSI: 00000000c020660b RDI: 0000000000000005
   RBP: 00007f7d405a51d0 R08: 00007f7d390d9700 R09: 0000000000000000
   R10: 00007f7d390d9700 R11: 0000000000000246 R12: 61635f65646f6e69
   R13: 65646f7475616f6e R14: 7261637369646f6e R15: 00007f7d405d87b8
    </TASK>

What happens is the following:

1) Task A is doing an fsync, enters btrfs_sync_file() and flushes delalloc
   before locking the inode and the i_mmap_lock semaphore, that is, before
   calling btrfs_inode_lock();

2) After task A flushes delalloc and before it calls btrfs_inode_lock(),
   another task dirties a page;

3) Task B starts a fiemap without FIEMAP_FLAG_SYNC, so the page dirtied
   at step 2 remains dirty and unflushed. It then enters extent_fiemap()
   and locks a file range that includes the range of the page dirtied in
   step 2;

4) Task A calls btrfs_inode_lock() and locks the inode (VFS lock) and the
   inode's i_mmap_lock semaphore in write mode. Then it tries to flush
   delalloc by calling start_ordered_ops(), which will block, at
   find_lock_delalloc_range(), when trying to lock the range of the page
   dirtied at step 2, since this range was locked by the fiemap task (at
   step 3);

5) Task B generates a page fault when accessing the user space fiemap
   buffer with a call to fiemap_fill_next_extent().

   The fault handler needs to call btrfs_page_mkwrite() for some other
   page of our inode, and there we deadlock when trying to lock the
   inode's i_mmap_lock semaphore in read mode, since the fsync task locked
   it in write mode (step 4) and the fsync task cannot progress because
   it's waiting to lock a file range that is currently locked by us (the
   fiemap task, step 3).

Fix this by taking the inode's lock (VFS lock) in shared mode when
entering fiemap. This effectively serializes fiemap with fsync (except the
most expensive part of fsync, the log sync), preventing this deadlock.

Reported-by: syzbot+cc35f55c41e34c30dcb5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/00000000000032dc7305f2a66f46@google.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/extent_io.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Josef Bacik Jan. 30, 2023, 7:30 p.m. UTC | #1
On Mon, Jan 23, 2023 at 04:54:46PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> [...]
> Fix this by taking the inode's lock (VFS lock) in shared mode when
> entering fiemap. This effectively serializes fiemap with fsync (except the
> most expensive part of fsync, the log sync), preventing this deadlock.
> 
> Reported-by: syzbot+cc35f55c41e34c30dcb5@syzkaller.appspotmail.com
> Link: https://lore.kernel.org/linux-btrfs/00000000000032dc7305f2a66f46@google.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
David Sterba Feb. 7, 2023, 3:40 p.m. UTC | #2
On Mon, Jan 23, 2023 at 04:54:46PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>

> Fix this by taking the inode's lock (VFS lock) in shared mode when
> entering fiemap. This effectively serializes fiemap with fsync (except the
> most expensive part of fsync, the log sync), preventing this deadlock.

Could this be a problem, when a continuous fiemap would block fsync on a
file? Fsync is more important and I'd give it priority when the inode
lock is contended, fiemap is only informative and best effort. The
deadlock needs to be fixed of course but some mechanism should be in
place to favor other lock holders than fiemap. A quick idea is to relock
after each, say, 100 extents.
Filipe Manana Feb. 7, 2023, 4:15 p.m. UTC | #3
On Tue, Feb 7, 2023 at 3:46 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Jan 23, 2023 at 04:54:46PM +0000, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
>
> > Fix this by taking the inode's lock (VFS lock) in shared mode when
> > entering fiemap. This effectively serializes fiemap with fsync (except the
> > most expensive part of fsync, the log sync), preventing this deadlock.
>
> Could this be a problem, when a continuous fiemap would block fsync on a
> file? Fsync is more important and I'd give it priority when the inode
> lock is contended, fiemap is only informative and best effort. The
> deadlock needs to be fixed of course but some mechanism should be in
> place to favor other lock holders than fiemap. A quick idea is to relock
> after each, say, 100 extents.

Typically fiemap is called with a small buffer for extent entries; cp
and filefrag, for example, use a small buffer. So they repeatedly call
fiemap with such a small buffer (not more than 100 or 200 entries).

Do you think it's that common to have fiemap and fsync attempts in parallel?

I don't think it's a problem: not only should it be uncommon, but if
it happens it's for a very short time, even for a very long fiemap
buffer.
David Sterba Feb. 8, 2023, 8:31 p.m. UTC | #4
On Tue, Feb 07, 2023 at 04:15:00PM +0000, Filipe Manana wrote:
> On Tue, Feb 7, 2023 at 3:46 PM David Sterba <dsterba@suse.cz> wrote:
> >
> > On Mon, Jan 23, 2023 at 04:54:46PM +0000, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> >
> > > Fix this by taking the inode's lock (VFS lock) in shared mode when
> > > entering fiemap. This effectively serializes fiemap with fsync (except the
> > > most expensive part of fsync, the log sync), preventing this deadlock.
> >
> > Could this be a problem, when a continuous fiemap would block fsync on a
> > file? Fsync is more important and I'd give it priority when the inode
> > lock is contended, fiemap is only informative and best effort. The
> > deadlock needs to be fixed of course but some mechanism should be in
> > place to favor other lock holders than fiemap. A quick idea is to relock
> > after each, say, 100 extents.
> 
> Typically fiemap is called with a small buffer for extent entries; cp
> and filefrag, for example, use a small buffer. So they repeatedly call
> fiemap with such a small buffer (not more than 100 or 200 entries).

I see, so the buffer size can limit that naturally.

> Do you think it's that common to have fiemap and fsync attempts in parallel?
> 
> I don't think it's a problem: not only should it be uncommon, but if
> it happens it's for a very short time, even for a very long fiemap
> buffer.

Yeah, it's not supposed to be common, I was thinking about the corner
case. The worst case is when several parallel fiemaps are running that
don't relinquish the inode lock and another thread writes+fsyncs the
file and stalls. Imagine a use case like a monitoring tool for VM images
that watches used space due to thin provisioning, and e.g. qemu that uses
the image file. The fix is to do fiemap in small chunks as said above.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9bd32daa9b9a..3bbf8703db2a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3826,6 +3826,7 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	lockend = round_up(start + len, inode->root->fs_info->sectorsize);
 	prev_extent_end = lockstart;
 
+	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
 	lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 
 	ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
@@ -4019,6 +4020,7 @@  int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 
 out_unlock:
 	unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
+	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 out:
 	free_extent_state(delalloc_cached_state);
 	btrfs_free_backref_share_ctx(backref_ctx);