diff mbox

[RFC] xfs: fix multi-AG deadlock in xfs_bunmapi

Message ID 20170718172545.18065-1-hch@lst.de (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Christoph Hellwig July 18, 2017, 5:25 p.m. UTC
Just like in the allocator we must avoid touching multiple AGs out of
order when freeing blocks, as freeing still locks the AGF and can cause
the same AB-BA deadlocks as in the allocation path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Darrick J. Wong July 18, 2017, 6:15 p.m. UTC | #1
On Tue, Jul 18, 2017 at 07:25:45PM +0200, Christoph Hellwig wrote:
> Just like in the allocator we must avoid touching multiple AGs out of
> order when freeing blocks, as freeing still locks the AGF and can cause
> the same AB-BA deadlocks as in the allocation path.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>

Looks ok, though I wonder what the bug report looked like. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0a9880777c9c..935adde72a8b 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5435,6 +5435,7 @@ __xfs_bunmapi(
>  	xfs_fsblock_t		sum;
>  	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
>  	xfs_fileoff_t		max_len;
> +	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
>  
>  	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
>  
> @@ -5534,6 +5535,17 @@ __xfs_bunmapi(
>  		 */
>  		del = got;
>  		wasdel = isnullstartblock(del.br_startblock);
> +
> +		/*
> +		 * Make sure we don't touch multiple AGF headers out of order
> +		 * in a single transaction, as that could cause AB-BA deadlocks.
> +		 */
> +		if (!wasdel) {
> +			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
> +			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
> +				break;
> +			prev_agno = agno;
> +		}
>  		if (got.br_startoff < start) {
>  			del.br_startoff = start;
>  			del.br_blockcount -= start - got.br_startoff;
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 18, 2017, 6:27 p.m. UTC | #2
On 18.07.2017 21:15, Darrick J. Wong wrote:
> On Tue, Jul 18, 2017 at 07:25:45PM +0200, Christoph Hellwig wrote:
>> Just like in the allocator we must avoid touching multiple AGs out of
>> order when freeing blocks, as freeing still locks the AGF and can cause
>> the same AB-BA deadlocks as in the allocation path.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> 
> Looks ok, though I wonder what the bug report looked like. :)

https://www.spinics.net/lists/linux-xfs/msg05918.html


> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
>> ---
>>  fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 0a9880777c9c..935adde72a8b 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -5435,6 +5435,7 @@ __xfs_bunmapi(
>>  	xfs_fsblock_t		sum;
>>  	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
>>  	xfs_fileoff_t		max_len;
>> +	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
>>  
>>  	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
>>  
>> @@ -5534,6 +5535,17 @@ __xfs_bunmapi(
>>  		 */
>>  		del = got;
>>  		wasdel = isnullstartblock(del.br_startblock);
>> +
>> +		/*
>> +		 * Make sure we don't touch multiple AGF headers out of order
>> +		 * in a single transaction, as that could cause AB-BA deadlocks.
>> +		 */
>> +		if (!wasdel) {
>> +			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
>> +			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
>> +				break;
>> +			prev_agno = agno;
>> +		}
>>  		if (got.br_startoff < start) {
>>  			del.br_startoff = start;
>>  			del.br_blockcount -= start - got.br_startoff;
>> -- 
>> 2.11.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong July 18, 2017, 6:33 p.m. UTC | #3
On Tue, Jul 18, 2017 at 09:27:43PM +0300, Nikolay Borisov wrote:
> 
> 
> On 18.07.2017 21:15, Darrick J. Wong wrote:
> > On Tue, Jul 18, 2017 at 07:25:45PM +0200, Christoph Hellwig wrote:
> >> Just like in the allocator we must avoid touching multiple AGs out of
> >> order when freeing blocks, as freeing still locks the AGF and can cause
> >> the same AB-BA deadlocks as in the allocation path.
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> > 
> > Looks ok, though I wonder what the bug report looked like. :)
> 
> https://www.spinics.net/lists/linux-xfs/msg05918.html

Ah, ok.  That's what I suspected, so I'm glad to have confirmation.

--D

> 
> 
> > 
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> >> ---
> >>  fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++++++
> >>  1 file changed, 12 insertions(+)
> >>
> >> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> >> index 0a9880777c9c..935adde72a8b 100644
> >> --- a/fs/xfs/libxfs/xfs_bmap.c
> >> +++ b/fs/xfs/libxfs/xfs_bmap.c
> >> @@ -5435,6 +5435,7 @@ __xfs_bunmapi(
> >>  	xfs_fsblock_t		sum;
> >>  	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
> >>  	xfs_fileoff_t		max_len;
> >> +	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
> >>  
> >>  	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
> >>  
> >> @@ -5534,6 +5535,17 @@ __xfs_bunmapi(
> >>  		 */
> >>  		del = got;
> >>  		wasdel = isnullstartblock(del.br_startblock);
> >> +
> >> +		/*
> >> +		 * Make sure we don't touch multiple AGF headers out of order
> >> +		 * in a single transaction, as that could cause AB-BA deadlocks.
> >> +		 */
> >> +		if (!wasdel) {
> >> +			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
> >> +			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
> >> +				break;
> >> +			prev_agno = agno;
> >> +		}
> >>  		if (got.br_startoff < start) {
> >>  			del.br_startoff = start;
> >>  			del.br_blockcount -= start - got.br_startoff;
> >> -- 
> >> 2.11.0
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 19, 2017, 7:36 a.m. UTC | #4
On Tue, Jul 18, 2017 at 11:33:51AM -0700, Darrick J. Wong wrote:
> On Tue, Jul 18, 2017 at 09:27:43PM +0300, Nikolay Borisov wrote:
> > 
> > 
> > On 18.07.2017 21:15, Darrick J. Wong wrote:
> > > On Tue, Jul 18, 2017 at 07:25:45PM +0200, Christoph Hellwig wrote:
> > >> Just like in the allocator we must avoid touching multiple AGs out of
> > >> order when freeing blocks, as freeing still locks the AGF and can cause
> > >> the same AB-BA deadlocks as in the allocation path.
> > >>
> > >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> > >> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> > > 
> > > Looks ok, though I wonder what the bug report looked like. :)
> > 
> > https://www.spinics.net/lists/linux-xfs/msg05918.html
> 
> Ah, ok.  That's what I suspected, so I'm glad to have confirmation.

Btw, I found a little bug when rebasing additional bunmapi work on
top of the patch - we need to add a !isrt to the conditional for
the AG check.  But I'll wait for test results from Nikolay before
resending it, as it should not affect normal non-RT setups.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 19, 2017, 1:11 p.m. UTC | #5
On 18.07.2017 20:25, Christoph Hellwig wrote:
> Just like in the allocator we must avoid touching multiple AGs out of
> order when freeing blocks, as freeing still locks the AGF and can cause
> the same AB-BA deadlocks as in the allocation path.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.d> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>

Unfortunately I this patch is not enough. I just tested on a 3.12 kernel ( the issue was easiest to repro there) 
and I still get stack traces. I will also test on a 4.4-based kernel as well but I assume the results are going to
be the same. 

Excerpt: 

[ 8167.898643] SysRq : Show Blocked State
[ 8167.902006] fio             D 0000000000000005     0 15744  15690 0x00000000
[ 8167.902006]  ffff88007f81d390 0000000000000086 0000000000000002 ffff8800952720a0
[ 8167.902006]  ffff88007f81dfd8 ffff88007f81dfd8 ffff88007f81dfd8 ffff8800952720a0
[ 8167.902006]  ffff88007a462f30 7fffffffffffffff ffff8800952720a0 0000000000000001
[ 8167.902006] Call Trace:
[ 8167.902006]  [<ffffffff81524ff9>] schedule_timeout+0x1d9/0x2a0
[ 8167.902006]  [<ffffffff81528960>] __down+0x70/0x100
[ 8167.902006]  [<ffffffff810a231c>] down+0x3c/0x50
[ 8167.902006]  [<ffffffffa01e0b13>] xfs_buf_lock+0x33/0x100 [xfs]
[ 8167.902006]  [<ffffffffa01e0dce>] _xfs_buf_find+0x1ee/0x420 [xfs]
[ 8167.902006]  [<ffffffffa01e12d3>] xfs_buf_get_map+0x23/0x200 [xfs]
[ 8167.902006]  [<ffffffffa01e2141>] xfs_buf_read_map+0x21/0x160 [xfs]
[ 8167.902006]  [<ffffffffa0254423>] xfs_trans_read_buf_map+0x413/0x5d0 [xfs]
[ 8167.902006]  [<ffffffffa0200561>] xfs_read_agf+0xa1/0x150 [xfs]
[ 8167.902006]  [<ffffffffa020066d>] xfs_alloc_read_agf+0x5d/0x180 [xfs]
[ 8167.902006]  [<ffffffffa0200c00>] xfs_alloc_fix_freelist+0x430/0x4c0 [xfs]
[ 8167.902006]  [<ffffffffa0200ebb>] xfs_alloc_vextent+0x22b/0x7b0 [xfs]
[ 8167.902006]  [<ffffffffa02143f7>] xfs_bmap_btalloc+0x407/0x960 [xfs]
[ 8167.902006]  [<ffffffffa021506e>] __xfs_bmapi_allocate+0xde/0x340 [xfs]
[ 8167.902006]  [<ffffffffa01ddc55>] xfs_bmapi_allocate+0xa5/0xb0 [xfs]
[ 8167.902006]  [<ffffffffa0215914>] xfs_bmapi_write+0x644/0xab0 [xfs]
[ 8167.902006]  [<ffffffffa01ef589>] xfs_iomap_write_direct+0x1e9/0x390 [xfs]
[ 8167.902006]  [<ffffffffa01d9f24>] __xfs_get_blocks+0x494/0x930 [xfs]
[ 8167.902006]  [<ffffffff811efbdc>] do_blockdev_direct_IO+0xadc/0x2ea0
[ 8167.902006]  [<ffffffffa01d8b48>] xfs_vm_direct_IO+0x78/0xa0 [xfs]
[ 8167.902006]  [<ffffffffa01e7751>] xfs_file_dio_aio_write.isra.9+0x171/0x430 [xfs]
[ 8167.902006]  [<ffffffffa01e7d35>] xfs_file_aio_write+0x155/0x170 [xfs]
[ 8167.902006]  [<ffffffff81201847>] do_io_submit+0x797/0x830
[ 8167.902006]  [<ffffffff81533249>] system_call_fastpath+0x16/0x1b
[ 8167.902006]  [<00007f73754296f7>] 0x7f73754296f6
[ 8167.902006] fio             D ffff8801394b20a0     0 15746  15690 0x00000000
[ 8167.902006]  ffff8800741c5390 0000000000000086 0000000000000010 ffff8801394b20a0
[ 8167.902006]  ffff8800741c5fd8 ffff8800741c5fd8 ffff8800741c5fd8 ffff8801394b20a0
[ 8167.902006]  ffff88007a462030 7fffffffffffffff ffff8801394b20a0 00000000015f9001
[ 8167.902006] Call Trace:
[ 8167.902006]  [<ffffffff81524ff9>] schedule_timeout+0x1d9/0x2a0
[ 8167.902006]  [<ffffffff81528960>] __down+0x70/0x100
[ 8167.902006]  [<ffffffff810a231c>] down+0x3c/0x50
[ 8167.902006]  [<ffffffffa01e0b13>] xfs_buf_lock+0x33/0x100 [xfs]
[ 8167.902006]  [<ffffffffa01e0dce>] _xfs_buf_find+0x1ee/0x420 [xfs]
[ 8167.902006]  [<ffffffffa01e12d3>] xfs_buf_get_map+0x23/0x200 [xfs]
[ 8167.902006]  [<ffffffffa01e2141>] xfs_buf_read_map+0x21/0x160 [xfs]
[ 8167.902006]  [<ffffffffa0254423>] xfs_trans_read_buf_map+0x413/0x5d0 [xfs]
[ 8167.902006]  [<ffffffffa0200561>] xfs_read_agf+0xa1/0x150 [xfs]
[ 8167.902006]  [<ffffffffa020066d>] xfs_alloc_read_agf+0x5d/0x180 [xfs]
[ 8167.902006]  [<ffffffffa0200c00>] xfs_alloc_fix_freelist+0x430/0x4c0 [xfs]
[ 8167.902006]  [<ffffffffa0200ebb>] xfs_alloc_vextent+0x22b/0x7b0 [xfs]
[ 8167.902006]  [<ffffffffa02143f7>] xfs_bmap_btalloc+0x407/0x960 [xfs]
[ 8167.902006]  [<ffffffffa021506e>] __xfs_bmapi_allocate+0xde/0x340 [xfs]
[ 8167.902006]  [<ffffffffa01ddc55>] xfs_bmapi_allocate+0xa5/0xb0 [xfs]
[ 8167.902006]  [<ffffffffa0215914>] xfs_bmapi_write+0x644/0xab0 [xfs]
[ 8167.902006]  [<ffffffffa01ef589>] xfs_iomap_write_direct+0x1e9/0x390 [xfs]
[ 8167.902006]  [<ffffffffa01d9f24>] __xfs_get_blocks+0x494/0x930 [xfs]
[ 8167.902006]  [<ffffffff811efbdc>] do_blockdev_direct_IO+0xadc/0x2ea0
[ 8167.902006]  [<ffffffffa01d8b48>] xfs_vm_direct_IO+0x78/0xa0 [xfs]
[ 8167.902006]  [<ffffffffa01e7751>] xfs_file_dio_aio_write.isra.9+0x171/0x430 [xfs]
[ 8167.902006]  [<ffffffffa01e7d35>] xfs_file_aio_write+0x155/0x170 [xfs]
[ 8167.902006]  [<ffffffff81201847>] do_io_submit+0x797/0x830
[ 8167.902006]  [<ffffffff81533249>] system_call_fastpath+0x16/0x1b
[ 8167.902006]  [<00007f73754296f7>] 0x7f73754296f6
[ 8167.902006] fio             D ffffffff81625900     0 15747  15690 0x00000000
[ 8167.902006]  ffff88004be11390 0000000000000086 00000000000003ff ffff8800b9a94140
[ 8167.902006]  ffff88004be11fd8 ffff88004be11fd8 ffff88004be11fd8 ffff8800b9a94140
[ 8167.902006]  ffff88007a462f30 7fffffffffffffff ffff8800b9a94140 0000000000000001
[ 8167.902006] Call Trace:
[ 8167.902006]  [<ffffffff81524ff9>] schedule_timeout+0x1d9/0x2a0
[ 8167.902006]  [<ffffffff81528960>] __down+0x70/0x100
[ 8167.902006]  [<ffffffff810a231c>] down+0x3c/0x50
[ 8167.902006]  [<ffffffffa01e0b13>] xfs_buf_lock+0x33/0x100 [xfs]
[ 8167.902006]  [<ffffffffa01e0dce>] _xfs_buf_find+0x1ee/0x420 [xfs]
[ 8167.902006]  [<ffffffffa01e12d3>] xfs_buf_get_map+0x23/0x200 [xfs]
[ 8167.902006]  [<ffffffffa01e2141>] xfs_buf_read_map+0x21/0x160 [xfs]
[ 8167.902006]  [<ffffffffa0254423>] xfs_trans_read_buf_map+0x413/0x5d0 [xfs]
[ 8167.902006]  [<ffffffffa0200561>] xfs_read_agf+0xa1/0x150 [xfs]
[ 8167.902006]  [<ffffffffa020066d>] xfs_alloc_read_agf+0x5d/0x180 [xfs]
[ 8167.902006]  [<ffffffffa0200c00>] xfs_alloc_fix_freelist+0x430/0x4c0 [xfs]
[ 8167.902006]  [<ffffffffa0200ebb>] xfs_alloc_vextent+0x22b/0x7b0 [xfs]
[ 8167.902006]  [<ffffffffa02143f7>] xfs_bmap_btalloc+0x407/0x960 [xfs]
[ 8167.902006]  [<ffffffffa021506e>] __xfs_bmapi_allocate+0xde/0x340 [xfs]
[ 8167.902006]  [<ffffffffa01ddc55>] xfs_bmapi_allocate+0xa5/0xb0 [xfs]
[ 8167.902006]  [<ffffffffa0215914>] xfs_bmapi_write+0x644/0xab0 [xfs]
[ 8167.902006]  [<ffffffffa01ef589>] xfs_iomap_write_direct+0x1e9/0x390 [xfs]
[ 8167.902006]  [<ffffffffa01d9f24>] __xfs_get_blocks+0x494/0x930 [xfs]
[ 8167.902006]  [<ffffffff811efbdc>] do_blockdev_direct_IO+0xadc/0x2ea0
[ 8167.902006]  [<ffffffffa01d8b48>] xfs_vm_direct_IO+0x78/0xa0 [xfs]
[ 8167.902006]  [<ffffffffa01e7751>] xfs_file_dio_aio_write.isra.9+0x171/0x430 [xfs]
[ 8167.902006]  [<ffffffffa01e7d35>] xfs_file_aio_write+0x155/0x170 [xfs]
[ 8167.902006]  [<ffffffff81201847>] do_io_submit+0x797/0x830
[ 8167.902006]  [<ffffffff81533249>] system_call_fastpath+0x16/0x1b
[ 8167.902006]  [<00007f73754296f7>] 0x7f73754296f6
[ 8167.902006] fio             D ffffffff81625900     0 15748  15690 0x00000000
[ 8167.902006]  ffff8800b5c35390 0000000000000086 00000000000003ff ffff8800b9a90000
[ 8167.902006]  ffff8800b5c35fd8 ffff8800b5c35fd8 ffff8800b5c35fd8 ffff8800b9a90000
[ 8167.902006]  ffff88007a462f30 7fffffffffffffff ffff8800b9a90000 0000000000000001
[ 8167.902006] Call Trace:
[ 8167.902006]  [<ffffffff81524ff9>] schedule_timeout+0x1d9/0x2a0
[ 8167.902006]  [<ffffffff81528960>] __down+0x70/0x100
[ 8167.902006]  [<ffffffffa01e0dce>] _xfs_buf_find+0x1ee/0x420 [xfs]
[ 8167.902006]  [<ffffffffa01e12d3>] xfs_buf_get_map+0x23/0x200 [xfs]
[ 8167.902006]  [<ffffffffa01e2141>] xfs_buf_read_map+0x21/0x160 [xfs]
[ 8167.902006]  [<ffffffffa0254423>] xfs_trans_read_buf_map+0x413/0x5d0 [xfs]
[ 8167.902006]  [<ffffffffa0200561>] xfs_read_agf+0xa1/0x150 [xfs]
[ 8167.902006]  [<ffffffffa020066d>] xfs_alloc_read_agf+0x5d/0x180 [xfs]
[ 8167.902006]  [<ffffffffa0200c00>] xfs_alloc_fix_freelist+0x430/0x4c0 [xfs]
[ 8167.902006]  [<ffffffffa0200ebb>] xfs_alloc_vextent+0x22b/0x7b0 [xfs]
[ 8167.902006]  [<ffffffffa02142ed>] xfs_bmap_btalloc+0x2fd/0x960 [xfs]
[ 8167.902006]  [<ffffffffa021506e>] __xfs_bmapi_allocate+0xde/0x340 [xfs]
[ 8167.902006]  [<ffffffffa01ddc55>] xfs_bmapi_allocate+0xa5/0xb0 [xfs]
[ 8167.902006]  [<ffffffffa0215914>] xfs_bmapi_write+0x644/0xab0 [xfs]
[ 8167.902006]  [<ffffffffa01ef589>] xfs_iomap_write_direct+0x1e9/0x390 [xfs]
[ 8167.902006]  [<ffffffffa01d9f24>] __xfs_get_blocks+0x494/0x930 [xfs]
[ 8167.902006]  [<ffffffff811efbdc>] do_blockdev_direct_IO+0xadc/0x2ea0
[ 8167.902006]  [<ffffffffa01d8b48>] xfs_vm_direct_IO+0x78/0xa0 [xfs]
[ 8167.902006]  [<ffffffffa01e7751>] xfs_file_dio_aio_write.isra.9+0x171/0x430 [xfs]
[ 8167.902006]  [<ffffffffa01e7d35>] xfs_file_aio_write+0x155/0x170 [xfs]
[ 8167.902006]  [<ffffffff81201847>] do_io_submit+0x797/0x830
[ 8167.902006]  [<ffffffff81533249>] system_call_fastpath+0x16/0x1b
[ 8167.902006]  [<00007f73754296f7>] 0x7f73754296f6
[ 8167.902006] fio             D ffffffff81625900     0 15749  15690 0x00000000
[ 8167.902006]  ffff8800b5c0fc10 0000000000000082 0000000000000003 ffff8800b8800000
[ 8167.902006]  ffff8800b5c0ffd8 ffff8800b5c0ffd8 ffff8800b5c0ffd8 ffff8800b8800000
[ 8167.902006]  ffff8800b5c0fd20 ffff8800b5c0fd18 7fffffffffffffff ffff8800b8800000
[ 8167.902006] xfs_io          D ffffffff81625900     0 15854  15342 0x00000000
[ 8167.902006]  ffff880095179910 0000000000000082 ffff8801396f9c80 ffff88004bc40000
[ 8167.902006]  ffff880095179fd8 ffff880095179fd8 ffff880095179fd8 ffff88004bc40000
[ 8167.902006]  ffff88007a462030 7fffffffffffffff ffff88004bc40000 00000000015f9001
[ 8167.902006] Call Trace:
[ 8167.902006]  [<ffffffff81524ff9>] schedule_timeout+0x1d9/0x2a0
[ 8167.902006]  [<ffffffff81528960>] __down+0x70/0x100
[ 8167.902006]  [<ffffffff810a231c>] down+0x3c/0x50
[ 8167.902006]  [<ffffffffa01e0b13>] xfs_buf_lock+0x33/0x100 [xfs]
[ 8167.902006]  [<ffffffffa01e0dce>] _xfs_buf_find+0x1ee/0x420 [xfs]
[ 8167.902006]  [<ffffffffa01e12d3>] xfs_buf_get_map+0x23/0x200 [xfs]
[ 8167.902006]  [<ffffffffa01e2141>] xfs_buf_read_map+0x21/0x160 [xfs]
[ 8167.902006]  [<ffffffffa0254423>] xfs_trans_read_buf_map+0x413/0x5d0 [xfs]
[ 8167.902006]  [<ffffffffa0200561>] xfs_read_agf+0xa1/0x150 [xfs]
[ 8167.902006]  [<ffffffffa020066d>] xfs_alloc_read_agf+0x5d/0x180 [xfs]
[ 8167.902006]  [<ffffffffa0200c00>] xfs_alloc_fix_freelist+0x430/0x4c0 [xfs]
[ 8167.902006]  [<ffffffffa0201509>] xfs_free_extent+0xc9/0x140 [xfs]
[ 8167.902006]  [<ffffffffa01dd69a>] xfs_bmap_finish+0x15a/0x1b0 [xfs]
[ 8167.902006]  [<ffffffffa023a21e>] xfs_itruncate_extents+0x2ce/0x470 [xfs]
[ 8167.902006]  [<ffffffffa01f19c3>] xfs_setattr_size+0x353/0x3f0 [xfs]
[ 8167.902006]  [<ffffffffa01f1b70>] xfs_vn_setattr+0x60/0x80 [xfs]
[ 8167.902006]  [<ffffffff811d0a51>] notify_change+0x241/0x3b0
[ 8167.902006]  [<ffffffff811b1b92>] do_truncate+0x62/0x90
[ 8167.902006]  [<ffffffff811b1f0b>] do_sys_ftruncate.constprop.11+0x12b/0x180
[ 8167.902006]  [<ffffffff81533249>] system_call_fastpath+0x16/0x1b
[ 8167.902006]  [<00007f5cb95239f7>] 0x7f5cb95239f6

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 19, 2017, 3:09 p.m. UTC | #6
On 19.07.2017 16:11, Nikolay Borisov wrote:
> 
> 
> On 18.07.2017 20:25, Christoph Hellwig wrote:
>> Just like in the allocator we must avoid touching multiple AGs out of
>> order when freeing blocks, as freeing still locks the AGF and can cause
>> the same AB-BA deadlocks as in the allocation path.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.d> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
> 
> Unfortunately I this patch is not enough. I just tested on a 3.12 kernel ( the issue was easiest to repro there) 
> and I still get stack traces. I will also test on a 4.4-based kernel as well but I assume the results are going to
> be the same. 

I can confirm that on 4.4 I also hit the same deadlock. So this patch is
not sufficient.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 20, 2017, 7:47 a.m. UTC | #7
On Wed, Jul 19, 2017 at 06:09:29PM +0300, Nikolay Borisov wrote:
> I can confirm that on 4.4 I also hit the same deadlock. So this patch is
> not sufficient.

Odd.  And the patch to always just process a single extent from
xfs_itruncate_extents fixes the issue for you, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 20, 2017, 7:47 a.m. UTC | #8
On 20.07.2017 10:47, Christoph Hellwig wrote:
> Odd.  And the patch to always just process a single extent from
> xfs_itruncate_extents fixes the issue for you, right?

As far as I remember yes.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 20, 2017, 7:49 a.m. UTC | #9
On Thu, Jul 20, 2017 at 10:47:50AM +0300, Nikolay Borisov wrote:
> 
> 
> On 20.07.2017 10:47, Christoph Hellwig wrote:
> > Odd.  And the patch to always just process a single extent from
> > xfs_itruncate_extents fixes the issue for you, right?
> 
> As far as I remember yes.

Can you retest it under otherwise identical conditions?
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 20, 2017, 7:51 a.m. UTC | #10
On 20.07.2017 10:49, Christoph Hellwig wrote:
>>> Odd.  And the patch to always just process a single extent from
>>> xfs_itruncate_extents fixes the issue for you, right?
>> As far as I remember yes.
> Can you retest it under otherwise identical conditions?
Will report back when I have the results.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 20, 2017, 2:58 p.m. UTC | #11
On 20.07.2017 10:49, Christoph Hellwig wrote:
> On Thu, Jul 20, 2017 at 10:47:50AM +0300, Nikolay Borisov wrote:
>>
>>
>> On 20.07.2017 10:47, Christoph Hellwig wrote:
>>> Odd.  And the patch to always just process a single extent from
>>> xfs_itruncate_extents fixes the issue for you, right?
>>
>> As far as I remember yes.
> 
> Can you retest it under otherwise identical conditions?

Okay, I confirm that indeed changing the define from 2 to 1 fixes the
issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 21, 2017, 10:26 a.m. UTC | #12
On Thu, Jul 20, 2017 at 05:58:42PM +0300, Nikolay Borisov wrote:
> Okay, I confirm that indeed changing the define from 2 to 1 fixes the
> issue.

Thanks for the report, although that leaves me puzzling why..
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 26, 2017, 1:04 p.m. UTC | #13
On Thu, Jul 20, 2017 at 05:58:42PM +0300, Nikolay Borisov wrote:
> 
> 
> On 20.07.2017 10:49, Christoph Hellwig wrote:
> > On Thu, Jul 20, 2017 at 10:47:50AM +0300, Nikolay Borisov wrote:
> >>
> >>
> >> On 20.07.2017 10:47, Christoph Hellwig wrote:
> >>> Odd.  And the patch to always just process a single extent from
> >>> xfs_itruncate_extents fixes the issue for you, right?
> >>
> >> As far as I remember yes.
> > 
> > Can you retest it under otherwise identical conditions?
> 
> Okay, I confirm that indeed changing the define from 2 to 1 fixes the
> issue.

But this is just on 4.4 and older, right?  We had issues in the
allocation path there as well, which should be fixed in 4.11 and later,
and in recrnt 4.9-stable releases.  Without those the allocation path
might sometimes lock out of order as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 26, 2017, 1:59 p.m. UTC | #14
On 26.07.2017 16:04, Christoph Hellwig wrote:
> But this is just on 4.4 and older, right?  We had issues in the
> allocation path there as well, which should be fixed in 4.11 and later,
> and in recrnt 4.9-stable releases.  Without those the allocation path
> might sometimes lock out of order as well.

I will try to repro on 4.11 WITHOUT your patch just to see that the
issue happens there as well. But if memory serves me right on newer
kernels it was much harder to hit (but not impossible).
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 28, 2017, 6:05 a.m. UTC | #15
On 26.07.2017 16:04, Christoph Hellwig wrote:
> On Thu, Jul 20, 2017 at 05:58:42PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 20.07.2017 10:49, Christoph Hellwig wrote:
>>> On Thu, Jul 20, 2017 at 10:47:50AM +0300, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 20.07.2017 10:47, Christoph Hellwig wrote:
>>>>> Odd.  And the patch to always just process a single extent from
>>>>> xfs_itruncate_extents fixes the issue for you, right?
>>>>
>>>> As far as I remember yes.
>>>
>>> Can you retest it under otherwise identical conditions?
>>
>> Okay, I confirm that indeed changing the define from 2 to 1 fixes the
>> issue.
> 
> But this is just on 4.4 and older, right?  We had issues in the
> allocation path there as well, which should be fixed in 4.11 and later,
> and in recrnt 4.9-stable releases.  Without those the allocation path
> might sometimes lock out of order as well.
> 

Couldn't repro on 4.11 and 4.10 (which supposedly shouldn't have the
fixes). I tried running generic/299 200 times on each kernel. Which
exactly are the patches in 4.11 which fix the alloc path?

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 31, 2017, 12:09 p.m. UTC | #16
Hi Nikolay,

On Fri, Jul 28, 2017 at 09:05:51AM +0300, Nikolay Borisov wrote:
> Couldn't repro on 4.11 and 4.10 (which supposedly shouldn't have the
> fixes). I tried running generic/299 200 times on each kernel. Which
> exactly are the patches in 4.11 which fix the alloc path?

My suspects in 4.11 would be:

xfs: don't fail xfs_extent_busy allocation
xfs: improve handling of busy extents in the low-level allocator
xfs: fix len comparison in xfs_extent_busy_trim

4.10 also had a couple allocator fixes, but nothing that looks
directly related.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0a9880777c9c..935adde72a8b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5435,6 +5435,7 @@  __xfs_bunmapi(
 	xfs_fsblock_t		sum;
 	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
 	xfs_fileoff_t		max_len;
+	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
 
 	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
 
@@ -5534,6 +5535,17 @@  __xfs_bunmapi(
 		 */
 		del = got;
 		wasdel = isnullstartblock(del.br_startblock);
+
+		/*
+		 * Make sure we don't touch multiple AGF headers out of order
+		 * in a single transaction, as that could cause AB-BA deadlocks.
+		 */
+		if (!wasdel) {
+			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
+			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
+				break;
+			prev_agno = agno;
+		}
 		if (got.br_startoff < start) {
 			del.br_startoff = start;
 			del.br_blockcount -= start - got.br_startoff;