diff mbox

btrfs: fix crash when no drive supports DISCARD

Message ID 1305648031-26379-1-git-send-email-dsterba@suse.cz (mailing list archive)
State New, archived
Headers show

Commit Message

David Sterba May 17, 2011, 4 p.m. UTC
xfstests/013 crashes when the test partition is mounted with -o discard:

walk_up_log_tree
  btrfs_free_reserved_extent
    btrfs_discard_extent
      return -EOPNOTSUPP
    BUG_ON ret

btrfs_discard_extent() should be fine when drive does not support the
DISCARD operation and filter the EOPNOTSUPP retcode, but currently it
does this only when some bytes were succesfully discarded.

Signed-off-by: David Sterba <dsterba@suse.cz>
CC: stable@kernel.org
---
 fs/btrfs/extent-tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Comments

Chris Mason May 17, 2011, 6:49 p.m. UTC | #1
Excerpts from David Sterba's message of 2011-05-17 12:00:31 -0400:
> xfstests/013 crashes when the test partition is mounted with -o discard:
> 
> walk_up_log_tree
>   btrfs_free_reserved_extent
>     btrfs_discard_extent
>       return -EOPNOTSUPP
>     BUG_ON ret
> 
> btrfs_discard_extent() should be fine when drive does not support the
> DISCARD operation and filter the EOPNOTSUPP retcode, but currently it
> does this only when some bytes were succesfully discarded.

Fair enough, we aren't really doing anything with EOPNOTSUPP now, so we
can just drop it in btrfs_discard_extent.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba May 20, 2011, 11:52 a.m. UTC | #2
On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
> that time, and maybe we could do somthing like remove the discard
> mount_opt in the fs_info so we can avoid calling it again.

I do not agree that discard should be removed from mount_opt, because
one may add TRIM-capable devices later, it's a whole filesystem option,
while. A disappeared discard mount option will probably cause panic
in administrator's head.

However, if a drive does not support TRIM, the btrfs_issue_discard calls
can take a shortcut and do not call up to blkdev_issue_discard (though
it does return immediatelly), caching the state after first call. But
this is matter of the lower level call (blkdev) and should not be
propagated beyond to the extent "level" (ie. btrfs_discard_extent).


david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik May 20, 2011, 2:15 p.m. UTC | #3
On 05/20/2011 07:52 AM, David Sterba wrote:
> On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
>> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
>> that time, and maybe we could do somthing like remove the discard
>> mount_opt in the fs_info so we can avoid calling it again.
> 
> I do not agree that discard should be removed from mount_opt, because
> one may add TRIM-capable devices later, it's a whole filesystem option,
> while. A disappeared discard mount option will probably cause panic
> in administrator's head.
> 
> However, if a drive does not support TRIM, the btrfs_issue_discard calls
> can take a shortcut and do not call up to blkdev_issue_discard (though
> it does return immediatelly), caching the state after first call. But
> this is matter of the lower level call (blkdev) and should not be
> propagated beyond to the extent "level" (ie. btrfs_discard_extent).
> 

There ought to just be a flag added to btrfs_device to say whether or
not it supports discard, and if we get back EOPNOTSUPP we stop putting
discards down on that device, that way if we have some devices that do
it and some that don't (like for instance if we do that tiered caching
with ssd's thing) we can make sure discard is actually done on drives
that care about it.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Li Dongyang May 21, 2011, 8:26 a.m. UTC | #4
On Friday, May 20, 2011 10:15:46 PM Josef Bacik wrote:
> On 05/20/2011 07:52 AM, David Sterba wrote:
> > On Wed, May 18, 2011 at 11:29:14AM +0800, Li Dongyang wrote:
> >> Thanks for the fix, I thought EOPNOTSUPP could be useful by the caller
> >> that time, and maybe we could do somthing like remove the discard
> >> mount_opt in the fs_info so we can avoid calling it again.
> > 
> > I do not agree that discard should be removed from mount_opt, because
> > one may add TRIM-capable devices later, it's a whole filesystem option,
> > while. A disappeared discard mount option will probably cause panic
> > in administrator's head.
> > 
> > However, if a drive does not support TRIM, the btrfs_issue_discard calls
> > can take a shortcut and do not call up to blkdev_issue_discard (though
> > it does return immediatelly), caching the state after first call. But
> > this is matter of the lower level call (blkdev) and should not be
> > propagated beyond to the extent "level" (ie. btrfs_discard_extent).
> > 
> 
> There ought to just be a flag added to btrfs_device to say whether or
> not it supports discard, and if we get back EOPNOTSUPP we stop putting
> discards down on that device, that way if we have some devices that do
> it and some that don't (like for instance if we do that tiered caching
> with ssd's thing) we can make sure discard is actually done on drives
> that care about it.  Thanks,
> 
> Josef
> 
This should be the best way. but if we do not export the EOPNOTSUPP to callers
of btrfs_discard_extent, we will get no errors if we run fstrim on a btrfs which
all the devices don't have trim, I think this is not what we want.

Before commit 5378e60734f5b7bfe1b43dc191aaf6131c1befe7, btrfs_discard_extent
will only return the error of btrfs_map_block,so I think we should move the BUG_ONs in
to btrfs_discard_extent from the callers as actually they are testing the result of
btrfs_map_block.

Thanks
Li Dongyang
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9ee6bd5..feab2ab 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1790,7 +1790,7 @@  static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 		}
 		kfree(multi);
 	}
-	if (discarded_bytes && ret == -EOPNOTSUPP)
+	if (ret == -EOPNOTSUPP)
 		ret = 0;
 
 	if (actual_bytes)