Message ID | 20180709202549.auxwkb6memlegb4a@eaf (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi: On Mon, Jul 09, 2018 at 02:16:33PM -0700, Andrew Morton wrote: > On Mon, 9 Jul 2018 17:25:52 -0300 Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> wrote: > > > After an extent is removed from the extent tree, the corresponding bits > > are also cleared from the block allocation file. This is currently done > > without releasing the tree lock. > > > > The problem is that the allocation file has extents of its own; if it is > > fragmented enough, some of them may be in the extent tree as well, and > > hfsplus_get_block() will try to take the lock again. > > > > To avoid deadlock, only hold the extent tree lock during the actual tree > > operations. > > There's now a whole bunch of code which no longer has ->tree_lock > coverage. Are you sure this doesn't introduce races? > > Do we know what specifically the tree_lock is protecting? That doesn't > just mean what is it *supposed* to protect - it might be protecting > other things by accident... The tree_lock is only taken in two other places: hfsplus_ext_read_extent() and hfsplus_ext_write_extent_locked(). This is always for the purpose of working with the extent tree, and hip->extents_lock is taken beforehand to protect the extents kept in the inode. hfsplus_free_extents() doesn't work with the extents in the tree; only with local copies (in hfsplus_free_fork()), or with the cached copies and in-inode extents (in hfsplus_file_truncate()). Those are protected by hip->extents_lock, and are marked as dirty if needed so they can later be copied to the extent tree, under the tree_lock. So hfsplus_free_extents() doesn't do anything that could cause a race until the call to hfsplus_block_free(), which quickly takes the alloc mutex (and will take the extent tree_lock again if it's needed, as well as the extents_lock of the alloc inode itself). There is one problem though. If the extent file itself is full and needs new blocks, a call to hfsplus_ext_write_extent_locked() will at some point call hfsplus_block_allocate() and try to take the alloc mutex while holding the tree_lock. This could deadlock with hfsplus_block_free() because of the inverted order of the locks (only if the allocation file is fragmented enough to have extents in the tree). I mostly ignored situations where the extent file needs to grow and the allocation file has extents in the tree, because even without my patch those will always deadlock at hfsplus_file_extend() when hfsplus_block_allocate() tries to take the extent tree_lock again. And the next call to hfsplus_ext_write_extent_locked() would also deadlock, if it was ever reached. The bug here is that hfsplus_file_extend() was clearly never supposed to be called while holding ->tree_lock, since it takes hip->extents_lock. Even more potential for deadlock there. I think the solution to this whole mess is to drop the lock of the extent tree whenever possible, and keep the locking order as: hip->extents_lock alloc_mutex alloc_hip->extents_lock ->tree_lock If you prefer I can try to solve the whole thing in a single patch, but I first need to understand better what happens when the extent file gains new in-tree extents itself. Of course if you can think of a simpler solution it would be great. I am overwhelmed by this. > > > > --- a/fs/hfsplus/extents.c > > +++ b/fs/hfsplus/extents.c > > > > ... > > > > @@ -576,15 +581,20 @@ void hfsplus_file_truncate(struct inode *inode) > > } > > while (1) { > > if (alloc_cnt == hip->first_blocks) { > > + mutex_unlock(&fd.tree->tree_lock); > > hfsplus_free_extents(sb, hip->first_extents, > > alloc_cnt, alloc_cnt - blk_cnt); > > hfsplus_dump_extent(hip->first_extents); > > hip->first_blocks = blk_cnt; > > + mutex_lock(&fd.tree->tree_lock); > > break; > > } > > offtopic: hfsplus_free_extents() does hfsplus_dump_extent(), so I > wonder whether this hfsplus_dump_extent() call is duplicative. hfsplus_free_extents() does make modifications to the extents outside the tree, so one may want to check what they look like after the call. I hope I was clear. Thanks, Ernest > > > > > ... > >
I forgot to mention that hfsplus_free_extents() is already being called from hfsplus_free_fork() without the tree_lock taken. So any issues this causes are already there.
diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c index e8770935ce6d..8e0f59767694 100644 --- a/fs/hfsplus/extents.c +++ b/fs/hfsplus/extents.c @@ -336,6 +336,9 @@ static int hfsplus_free_extents(struct super_block *sb, int i; int err = 0; + /* Mapping the allocation file may lock the extent tree */ + WARN_ON(mutex_is_locked(&HFSPLUS_SB(sb)->ext_tree->tree_lock)); + hfsplus_dump_extent(extent); for (i = 0; i < 8; extent++, i++) { count = be32_to_cpu(extent->block_count); @@ -415,11 +418,13 @@ int hfsplus_free_fork(struct super_block *sb, u32 cnid, if (res) break; start = be32_to_cpu(fd.key->ext.start_block); - hfsplus_free_extents(sb, ext_entry, - total_blocks - start, - total_blocks); hfs_brec_remove(&fd); + + mutex_unlock(&fd.tree->tree_lock); + hfsplus_free_extents(sb, ext_entry, total_blocks - start, + total_blocks); total_blocks = start; + mutex_lock(&fd.tree->tree_lock); } while (total_blocks > blocks); hfs_find_exit(&fd); @@ -576,15 +581,20 @@ void hfsplus_file_truncate(struct inode *inode) } while (1) { if (alloc_cnt == hip->first_blocks) { + mutex_unlock(&fd.tree->tree_lock); hfsplus_free_extents(sb, hip->first_extents, alloc_cnt, alloc_cnt - blk_cnt); hfsplus_dump_extent(hip->first_extents); hip->first_blocks = blk_cnt; + mutex_lock(&fd.tree->tree_lock); break; } res = __hfsplus_ext_cache_extent(&fd, inode, alloc_cnt); if (res) break; + hfs_brec_remove(&fd); + + mutex_unlock(&fd.tree->tree_lock); start = hip->cached_start; hfsplus_free_extents(sb, hip->cached_extents, alloc_cnt - start, alloc_cnt - blk_cnt); @@ -596,7 +606,7 @@ void hfsplus_file_truncate(struct inode *inode) alloc_cnt = start; hip->cached_start = hip->cached_blocks = 0; hip->extent_state &= ~(HFSPLUS_EXT_DIRTY | HFSPLUS_EXT_NEW); - hfs_brec_remove(&fd); + mutex_lock(&fd.tree->tree_lock); } hfs_find_exit(&fd);
After an extent is removed from the extent tree, the corresponding bits are also cleared from the block allocation file. This is currently done without releasing the tree lock. The problem is that the allocation file has extents of its own; if it is fragmented enough, some of them may be in the extent tree as well, and hfsplus_get_block() will try to take the lock again. To avoid deadlock, only hold the extent tree lock during the actual tree operations. Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> --- fs/hfsplus/extents.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)