
6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

Message ID 5ee26e48-d949-502e-5884-6369c9e6b278@gmail.com (mailing list archive)
State New, archived

Commit Message

Gabriel C Aug. 5, 2016, 8:03 p.m. UTC
On 04.08.2016 18:53, Lutz Vieweg wrote:
> 
> Today I was hit by what I think is probably the same bug:
> A btrfs on a close-to-4TB sized block device, only half filled
> to almost exactly 2 TB, suddenly says "no space left on device"
> upon any attempt to write to it. The filesystem was NOT automatically
> switched to read-only by the kernel, I should mention.
> 
> Re-mounting (which is a pain as this filesystem is used for
> $HOMEs of a multitude of active users who I have to kick from
> the server for doing things like re-mounting) removed the symptom
> for now, but from what I can read in the linux-btrfs mailing list
> archives, it is pretty likely the symptom will re-appear.
> 
> Here are some more details:
> 
> Software versions:
>> linux-4.6.1 (vanilla from kernel.org)
...
> 
> dmesg output from the time the "no space left on device"-symptom
> appeared:
> 
>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]

....
> ...
>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]


Sounds like the bug I hit too.

To fix this you'll need:


crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf
commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
Author: Chris Mason <clm@fb.com>
Date:   Tue Jul 19 05:52:36 2016 -0700

    Btrfs: fix delalloc accounting after copy_from_user faults

    Commit 56244ef151c3cd11 was almost but not quite enough to fix the
    reservation math after btrfs_copy_from_user returned partial copies.

    Some users are still seeing warnings in btrfs_destroy_inode, and with a
    long enough test run I'm able to trigger them as well.

    This patch fixes the accounting math again, bringing it much closer to
    the way it was before the sectorsize conversion Chandan did.  The
    problem is accounting for the offset into the page/sector when we do a
    partial copy.  This one just uses the dirty_sectors variable which
    should already be updated properly.

    Signed-off-by: Chris Mason <clm@fb.com>
    cc: stable@vger.kernel.org # v4.6+

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Lutz Vieweg Aug. 25, 2016, 3:48 p.m. UTC | #1
On 08/05/2016 10:03 PM, Gabriel C wrote:
> On 04.08.2016 18:53, Lutz Vieweg wrote:
>>
>> Today I was hit by what I think is probably the same bug:
>> A btrfs on a close-to-4TB sized block device, only half filled
>> to almost exactly 2 TB, suddenly says "no space left on device"
>> upon any attempt to write to it. The filesystem was NOT automatically
>> switched to read-only by the kernel, I should mention.
>>
>> Re-mounting (which is a pain as this filesystem is used for
>> $HOMEs of a multitude of active users who I have to kick from
>> the server for doing things like re-mounting) removed the symptom
>> for now, but from what I can read in the linux-btrfs mailing list
>> archives, it is pretty likely the symptom will re-appear.
>>
>> Here are some more details:
>>
>> Software versions:
>>> linux-4.6.1 (vanilla from kernel.org)
> ...
>>
>> dmesg output from the time the "no space left on device"-symptom
>> appeared:
>>
>>> [5171203.601620] WARNING: CPU: 4 PID: 23208 at fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x263/0x2a0 [btrfs]
>
> ....
>> ...
>>> [5171230.306037] WARNING: CPU: 18 PID: 12656 at fs/btrfs/extent-tree.c:4233 btrfs_free_reserved_data_space_noquota+0xf3/0x100 [btrfs]
>
> Sounds like the bug I hit too.
>
> To fix this you'll need:
>
> crazy@zwerg:~/Work/linux-git$ git show 8b8b08cbf
> commit 8b8b08cbfb9021af4b54b4175fc4c51d655aac8c
> Author: Chris Mason <clm@fb.com>
> Date:   Tue Jul 19 05:52:36 2016 -0700
>
>      Btrfs: fix delalloc accounting after copy_from_user faults

Thanks for this hint!

Yesterday (20 days after the first time this bug struck us, and after re-mounting
the filesystem) we were hit by the same bug again - twice! - once in the morning,
and again in the evening.

That called for immediate action, and short of reverting the whole setup to XFS,
installing a new kernel with the above (and other) btrfs fix(es) was the one thing
I could try.

The system is now running linux-4.7.2, which does contain those patches.
If that doesn't fix it, we're really running out of options.

Regards,

Lutz Vieweg



Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f3f61d1..bcfb4a2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@  again:
                 * managed to copy.
                 */
                if (num_sectors > dirty_sectors) {
-                       /*
-                        * we round down because we don't want to count
-                        * any partial blocks actually sent through the
-                        * IO machines
-                        */
-                       release_bytes = round_down(release_bytes - copied,
-                                     root->sectorsize);
+
+                       /* release everything except the sectors we dirtied */
+                       release_bytes -= dirty_sectors <<
+                               root->fs_info->sb->s_blocksize_bits;
+
                        if (copied > 0) {
                                spin_lock(&BTRFS_I(inode)->lock);
                                BTRFS_I(inode)->outstanding_extents++;
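
The effect of the hunk above can be illustrated with a small userspace
sketch. This is not kernel code: the helper names (release_old, release_new,
worked_example) are hypothetical, and the way the reservation and
dirty_sectors are derived here (rounding the byte range, including the offset
into the first sector, up to whole sectors) is an assumption about code that
the hunk does not show. Under those assumptions, the old formula ignores the
sector offset and can release more bytes than were left undirtied.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down/up to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Removed lines: release what was not copied, rounded down to a sector. */
static uint64_t release_old(uint64_t reserved, uint64_t copied,
			    uint64_t sectorsize)
{
	return round_down_pow2(reserved - copied, sectorsize);
}

/* Added lines: release everything except the sectors that were dirtied. */
static uint64_t release_new(uint64_t reserved, uint64_t dirty_sectors,
			    unsigned int blocksize_bits)
{
	return reserved - (dirty_sectors << blocksize_bits);
}

/* Illustrative numbers only; none of them come from the bug report. */
static void worked_example(void)
{
	const uint64_t sectorsize = 4096;
	const unsigned int bits = 12;		/* log2(sectorsize) */
	const uint64_t sector_offset = 3000;	/* write starts mid-sector */
	const uint64_t write_bytes = 8192;
	const uint64_t copied = 2000;		/* partial copy after a fault */

	/* ASSUMPTION: reservation covers the offset plus the write length. */
	uint64_t reserved = round_up_pow2(write_bytes + sector_offset,
					  sectorsize);
	/* ASSUMPTION: dirtied sectors cover the offset plus the copied bytes. */
	uint64_t dirty_sectors =
		round_up_pow2(copied + sector_offset, sectorsize) >> bits;

	assert(reserved == 12288);	/* 3 sectors reserved */
	assert(dirty_sectors == 2);	/* bytes 3000..4999 touch 2 sectors */

	/* Old math releases 2 sectors although only 1 is left undirtied. */
	assert(release_old(reserved, copied, sectorsize) == 8192);
	/* New math releases exactly the 1 undirtied sector. */
	assert(release_new(reserved, dirty_sectors, bits) == 4096);
}
```

With these example numbers the old calculation leaves only 4096 bytes
reserved for 8192 bytes of dirtied sectors, which is the kind of mismatch the
commit message attributes to "the offset into the page/sector" on a partial
copy.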