Message ID | 20180803124526.18497-2-anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: handle the BUG_ON in btrfs_num_devices() | expand |
On 3.08.2018 15:45, Anand Jain wrote: > Its a logical bug if we hit fs_devices::num_devices == 1 and if the > replace is running because, as fs_devices::num_devices counts the in memory > devices, so it should include the replace target which is running as > indicated by the flag. If this happens return the -EINVAL back. > > Suggested-by: Nikolay Borisov <nborisov@suse.com> > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > Hi, > As it fixes the BUG_ON I have spun a new patch for this. > Instead of -EINVAL should we use ASSERT? > > fs/btrfs/volumes.c | 27 ++++++++++++++++++++------- > 1 file changed, 20 insertions(+), 7 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 7359596ac8eb..ed2399caff80 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1855,9 +1855,11 @@ void btrfs_assign_next_active_device(struct btrfs_device *device, > } > > /* Returns btrfs_fs_devices::num_devices minus replace device if any */ > -static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) > +static int btrfs_num_devices(struct btrfs_fs_info *fs_info, u64 *num_devices) Why do you resort to this travesty of returning the value in an input parameter? Having the function return int, assuming that we will always have a positive device num and in case of an error return a negative value. In the worst case when we get to see a btrfs fs consisting of 2 billion devices then we can start worrying that an int here won't do it. > { > - u64 num_devices = fs_info->fs_devices->num_devices; > + int ret = 0; > + > + *num_devices = fs_info->fs_devices->num_devices; > > /* > * balance and replace co-exists in a scenario as below.. > @@ -1867,12 +1869,13 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) > */ > btrfs_dev_replace_read_lock(&fs_info->dev_replace); > if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) { > - BUG_ON(num_devices < 1); > - num_devices--; > + if (*num_devices < 1) > + ret = -EINVAL; > + (*num_devices)--; > } > btrfs_dev_replace_read_unlock(&fs_info->dev_replace); > > - return num_devices; > + return ret; > } > > int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, > @@ -1886,7 +1889,12 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, > > mutex_lock(&uuid_mutex); > > - num_devices = btrfs_num_devices(fs_info); > + ret = btrfs_num_devices(fs_info, &num_devices); > + if (ret) { The canonical form, used across the whole code base of btrfs, for checking for an error is 'if (ret <0)' as such please stick to it in this and all future patches. (I have a vague recollection this is not the first time I have given you this feedback) > + btrfs_err(fs_info, "logical bug num_devices %llu < 0", > + num_devices); > + return ret; > + } > > ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1); > if (ret) > @@ -3755,7 +3763,12 @@ int btrfs_balance(struct btrfs_fs_info *fs_info, > } > } > > - num_devices = btrfs_num_devices(fs_info); > + ret = btrfs_num_devices(fs_info, &num_devices); > + if (ret) { ditto > + btrfs_err(fs_info, "hits a logical bug num_devices %llu < 0", > + num_devices); > + return ret; > + } > > allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP; > if (num_devices > 1) > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/03/2018 09:33 PM, Nikolay Borisov wrote: > > > On 3.08.2018 15:45, Anand Jain wrote: >> Its a logical bug if we hit fs_devices::num_devices == 1 and if the >> replace is running because, as fs_devices::num_devices counts the in memory >> devices, so it should include the replace target which is running as >> indicated by the flag. If this happens return the -EINVAL back. >> >> Suggested-by: Nikolay Borisov <nborisov@suse.com> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> Hi, >> As it fixes the BUG_ON I have spun a new patch for this. >> Instead of -EINVAL should we use ASSERT? >> >> fs/btrfs/volumes.c | 27 ++++++++++++++++++++------- >> 1 file changed, 20 insertions(+), 7 deletions(-) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 7359596ac8eb..ed2399caff80 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -1855,9 +1855,11 @@ void btrfs_assign_next_active_device(struct btrfs_device *device, >> } >> >> /* Returns btrfs_fs_devices::num_devices minus replace device if any */ >> -static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) >> +static int btrfs_num_devices(struct btrfs_fs_info *fs_info, u64 *num_devices) > > Why do you resort to this travesty of returning the value in an input > parameter? Having the function return int, assuming that we will always > have a positive device num and in case of an error return a negative > value. In the worst case when we get to see a btrfs fs consisting of 2 > billion devices then we can start worrying that an int here won't do it. Its theoretically wrong. I wonder if David is OK with this? Will wait for his comments. >> { >> - u64 num_devices = fs_info->fs_devices->num_devices; >> + int ret = 0; >> + >> + *num_devices = fs_info->fs_devices->num_devices; >> >> /* >> * balance and replace co-exists in a scenario as below.. >> @@ -1867,12 +1869,13 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) >> */ >> btrfs_dev_replace_read_lock(&fs_info->dev_replace); >> if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) { >> - BUG_ON(num_devices < 1); >> - num_devices--; >> + if (*num_devices < 1) >> + ret = -EINVAL; >> + (*num_devices)--; >> } >> btrfs_dev_replace_read_unlock(&fs_info->dev_replace); >> >> - return num_devices; >> + return ret; >> } >> >> int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, >> @@ -1886,7 +1889,12 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, >> >> mutex_lock(&uuid_mutex); >> >> - num_devices = btrfs_num_devices(fs_info); >> + ret = btrfs_num_devices(fs_info, &num_devices); >> + if (ret) { > > The canonical form, used across the whole code base of btrfs, for > checking for an error is 'if (ret <0)' as such please stick to it in > this and all future patches. Not a big deal will fix. > (I have a vague recollection this is not the first time I have given you > this feedback) >> + btrfs_err(fs_info, "logical bug num_devices %llu < 0", >> + num_devices); >> + return ret; >> + } >> >> ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1); >> if (ret) >> @@ -3755,7 +3763,12 @@ int btrfs_balance(struct btrfs_fs_info *fs_info, >> } >> } >> >> - num_devices = btrfs_num_devices(fs_info); >> + ret = btrfs_num_devices(fs_info, &num_devices); >> + if (ret) { > ditto ok. Thanks, Anand >> + btrfs_err(fs_info, "hits a logical bug num_devices %llu < 0", >> + num_devices); >> + return ret; >> + } >> >> allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP; >> if (num_devices > 1) >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 06, 2018 at 04:57:45PM +0800, Anand Jain wrote: > > > On 08/03/2018 09:33 PM, Nikolay Borisov wrote: > > > > > > On 3.08.2018 15:45, Anand Jain wrote: > >> Its a logical bug if we hit fs_devices::num_devices == 1 and if the > >> replace is running because, as fs_devices::num_devices counts the in memory > >> devices, so it should include the replace target which is running as > >> indicated by the flag. If this happens return the -EINVAL back. > >> > >> Suggested-by: Nikolay Borisov <nborisov@suse.com> > >> Signed-off-by: Anand Jain <anand.jain@oracle.com> > >> --- > >> Hi, > >> As it fixes the BUG_ON I have spun a new patch for this. > >> Instead of -EINVAL should we use ASSERT? > >> > >> fs/btrfs/volumes.c | 27 ++++++++++++++++++++------- > >> 1 file changed, 20 insertions(+), 7 deletions(-) > >> > >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > >> index 7359596ac8eb..ed2399caff80 100644 > >> --- a/fs/btrfs/volumes.c > >> +++ b/fs/btrfs/volumes.c > >> @@ -1855,9 +1855,11 @@ void btrfs_assign_next_active_device(struct btrfs_device *device, > >> } > >> > >> /* Returns btrfs_fs_devices::num_devices minus replace device if any */ > >> -static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) > >> +static int btrfs_num_devices(struct btrfs_fs_info *fs_info, u64 *num_devices) > > > > Why do you resort to this travesty of returning the value in an input > > parameter? Having the function return int, assuming that we will always > > have a positive device num and in case of an error return a negative > > value. In the worst case when we get to see a btrfs fs consisting of 2 > > billion devices then we can start worrying that an int here won't do it. > > Its theoretically wrong. I wonder if David is OK with this? Will wait > for his comments. Theoretically, as in having 2^64 devices where the numbers would clash with error code. Practically we're talking about tens maybe hundreds of devices, anything that fits to 32 bits in the forseeable future. Yes I agree with Nikolai. But the BUG_ON can be removed, or the number of devices can be checked in advance as mentioned in other mails, I'm starting to lose track of what's the last version. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/08/2018 01:09 AM, David Sterba wrote: > On Mon, Aug 06, 2018 at 04:57:45PM +0800, Anand Jain wrote: >> >> >> On 08/03/2018 09:33 PM, Nikolay Borisov wrote: >>> >>> >>> On 3.08.2018 15:45, Anand Jain wrote: >>>> Its a logical bug if we hit fs_devices::num_devices == 1 and if the >>>> replace is running because, as fs_devices::num_devices counts the in memory >>>> devices, so it should include the replace target which is running as >>>> indicated by the flag. If this happens return the -EINVAL back. >>>> >>>> Suggested-by: Nikolay Borisov <nborisov@suse.com> >>>> Signed-off-by: Anand Jain <anand.jain@oracle.com> >>>> --- >>>> Hi, >>>> As it fixes the BUG_ON I have spun a new patch for this. >>>> Instead of -EINVAL should we use ASSERT? >>>> >>>> fs/btrfs/volumes.c | 27 ++++++++++++++++++++------- >>>> 1 file changed, 20 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>>> index 7359596ac8eb..ed2399caff80 100644 >>>> --- a/fs/btrfs/volumes.c >>>> +++ b/fs/btrfs/volumes.c >>>> @@ -1855,9 +1855,11 @@ void btrfs_assign_next_active_device(struct btrfs_device *device, >>>> } >>>> >>>> /* Returns btrfs_fs_devices::num_devices minus replace device if any */ >>>> -static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) >>>> +static int btrfs_num_devices(struct btrfs_fs_info *fs_info, u64 *num_devices) >>> >>> Why do you resort to this travesty of returning the value in an input >>> parameter? Having the function return int, assuming that we will always >>> have a positive device num and in case of an error return a negative >>> value. In the worst case when we get to see a btrfs fs consisting of 2 >>> billion devices then we can start worrying that an int here won't do it. >> >> Its theoretically wrong. I wonder if David is OK with this? Will wait >> for his comments. > > Theoretically, as in having 2^64 devices where the numbers would clash > with error code. Practically we're talking about tens maybe hundreds of > devices, anything that fits to 32 bits in the forseeable future. Yes I > agree with Nikolai. > > But the BUG_ON can be removed, or the number of devices can be checked > in advance as mentioned in other mails, I'm starting to lose track of > what's the last version. Other email thread [1] were talking about replacing the BUG_ON with ASSERT, which means we don't need this patch. [1] [PATCH v3 4/4] btrfs: add helper btrfs_num_devices() to deduce num_devices Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7359596ac8eb..ed2399caff80 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1855,9 +1855,11 @@ void btrfs_assign_next_active_device(struct btrfs_device *device, } /* Returns btrfs_fs_devices::num_devices minus replace device if any */ -static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) +static int btrfs_num_devices(struct btrfs_fs_info *fs_info, u64 *num_devices) { - u64 num_devices = fs_info->fs_devices->num_devices; + int ret = 0; + + *num_devices = fs_info->fs_devices->num_devices; /* * balance and replace co-exists in a scenario as below.. @@ -1867,12 +1869,13 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info) */ btrfs_dev_replace_read_lock(&fs_info->dev_replace); if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) { - BUG_ON(num_devices < 1); - num_devices--; + if (*num_devices < 1) + ret = -EINVAL; + (*num_devices)--; } btrfs_dev_replace_read_unlock(&fs_info->dev_replace); - return num_devices; + return ret; } int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, @@ -1886,7 +1889,12 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path, mutex_lock(&uuid_mutex); - num_devices = btrfs_num_devices(fs_info); + ret = btrfs_num_devices(fs_info, &num_devices); + if (ret) { + btrfs_err(fs_info, "logical bug num_devices %llu < 0", + num_devices); + return ret; + } ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1); if (ret) @@ -3755,7 +3763,12 @@ int btrfs_balance(struct btrfs_fs_info *fs_info, } } - num_devices = btrfs_num_devices(fs_info); + ret = btrfs_num_devices(fs_info, &num_devices); + if (ret) { + btrfs_err(fs_info, "hits a logical bug num_devices %llu < 0", + num_devices); + return ret; + } allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP; if (num_devices > 1)
Its a logical bug if we hit fs_devices::num_devices == 1 and if the replace is running because, as fs_devices::num_devices counts the in memory devices, so it should include the replace target which is running as indicated by the flag. If this happens return the -EINVAL back. Suggested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> --- Hi, As it fixes the BUG_ON I have spun a new patch for this. Instead of -EINVAL should we use ASSERT? fs/btrfs/volumes.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-)