Message ID | 20170919155534.25334-1-colin.king@canonical.com (mailing list archive) |
---|---|
State | New, archived |
On 2017.09.19 16:55:34 +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> An earlier fix changed the return type from find_bb_size however the
> integer return is being assigned to a unsigned int so the -ve error
> check will never be detected. Make bb_size an int to fix this.
>
> Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
>
> Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> index 2c0ccbb817dc..f41cbf664b69 100644
> --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
> +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
>  	struct intel_shadow_bb_entry *entry_obj;
>  	struct intel_vgpu *vgpu = s->vgpu;
>  	unsigned long gma = 0;
> -	uint32_t bb_size;
> +	int bb_size;
>  	void *dst = NULL;
>  	int ret = 0;
>

Applied this, thanks!
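The class of bug this patch addresses can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: `find_size_stub`, `shadow_with_unsigned` and `shadow_with_signed` are hypothetical names, and the stub only assumes the convention described in the commit message (a size on success, a negative error code on failure).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for find_bb_size(): a size on success,
 * a negative errno value on failure. */
static int find_size_stub(int fail)
{
	return fail ? -EFAULT : 64;
}

/* Pre-patch pattern: the int result is stored in an unsigned variable. */
int shadow_with_unsigned(int fail)
{
	uint32_t bb_size = find_size_stub(fail);

	if (bb_size < 0)	/* never true for an unsigned type */
		return -EINVAL;
	return 0;
}

/* Post-patch pattern: a signed variable preserves the negative error code. */
int shadow_with_signed(int fail)
{
	int bb_size = find_size_stub(fail);

	if (bb_size < 0)
		return bb_size;
	return 0;
}
```

With these definitions, `shadow_with_unsigned(1)` returns 0 because the negative value wraps to a large unsigned number and the check is skipped, while `shadow_with_signed(1)` returns `-EFAULT`. Compilers can flag the first pattern (for example via GCC's `-Wtype-limits`), which is the same kind of finding Coverity reported here.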
On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:
> On 2017.09.19 16:55:34 +0100, Colin King wrote:
> > [...]
> > -	uint32_t bb_size;
> > +	int bb_size;
> > [...]
>
> Applied this, thanks!

Is it possible for bb_size to be both >= 2g and valid?
On 2017.09.19 19:35:23 -0700, Joe Perches wrote:
> On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:
> > Applied this, thanks!
>
> Is it possible for bb_size to be both >= 2g and valid?

That should never be possible in practice, and if it really is that big, I think something is already insane.
On Thu, 2017-09-21 at 06:44 +0800, Zhenyu Wang wrote:
> On 2017.09.19 19:35:23 -0700, Joe Perches wrote:
> > Is it possible for bb_size to be both >= 2g and valid?
>
> That should never be possible in practice, and if it really is that
> big, I think something is already insane.

It's a good idea to document these assumptions as WARN_ONs. In i915, if the value is completely internal to the kernel, we use GEM_BUG_ON for these so that our CI will notice breakage. If the value is not purely driver-internal, a WARN_ON is the appropriate action.

Otherwise the information is lost and the next person reading the code will have the same question in mind.

Regards, Joonas
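As an illustration of documenting such an assumption in code, here is a small userspace sketch. The `WARN_ON` below is a stand-in written for this example (it prints a message and evaluates to the condition); it is not the kernel's definition, and `bb_size_to_int` is a hypothetical helper rather than anything in the GVT-g sources.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/* Userspace stand-in: report the condition and return whether it fired. */
static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Hypothetical helper: narrow a size to int. The WARN_ON records the
 * "never 2G or more" assumption where the next reader will see it.
 */
int bb_size_to_int(unsigned long size)
{
	if (WARN_ON(size > INT_MAX))
		return -EINVAL;
	return (int)size;
}
```

A normal size passes through unchanged, while a size above `INT_MAX` prints a warning to stderr and yields `-EINVAL` instead of silently turning into a negative number.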
Hi Joonas:

Thanks for the introduction. I have been thinking about the possibility of introducing GEM_BUG_ON into GVT-g recently and have been investigating it. I'm just a bit confused about when to use GEM_BUG_ON versus WARN_ON.

GEM_BUG_ON is only enabled when kernel debug is enabled, which is mostly disabled in a production kernel. In the case of i915, I'm sure it will be enabled in CI testing so that it can catch broken code paths. Looking at GVT-g, the similar scenario is that we enable it in QA testing.

Let's say GEM_BUG_ON does its work very well in QA testing, but the QA tests do not fully cover all the conditions. Then something might still be broken when it comes to the production kernel for users, where GEM_BUG_ON will be disabled and will not catch it, I guess.

That's the confusion that nagged at me during the investigation: if GEM_BUG_ON is not always active, then it looks like WARN_ON should always be used.... I hope to learn more about the story behind this. :)

Thanks,
Zhi.
On Thu, 2017-09-21 at 16:17 +0000, Wang, Zhi A wrote:
> Hi Joonas:
>
> Thanks for the introduction. I have been thinking about the
> possibility of introducing GEM_BUG_ON into GVT-g recently and
> have been investigating it. I'm just a bit confused about when to
> use GEM_BUG_ON versus WARN_ON.

GEM_BUG_ON is basically there to catch things that we never expect to happen within the driver, so we often list function preconditions as GEM_BUG_ONs. It's there for the same reason as lockdep_assert_held and KASAN: these are sometimes heavy checks that we really want to run when functionally validating the kernel. GEM_BUG_ON came into existence because adding checks for obvious conditions in the critical GEM command submission path is not sustainable for production performance.

The expectation is that each GEM_BUG_ON has a test case in I-G-T that has the potential to hit it if the driver were modified not to respect those preconditions. So once our test set passes, we can disable the GEM_BUG_ONs, be confident of the internal driver quality, and get release performance.

WARN_ON is mostly used for cases where the hardware behaves differently than we expect. We can't remove them, as we don't have all the hardware in the world to test, but we try to exercise them too through I-G-Ts. The test will often be the subtest that was written to reproduce the problem with our expectations of the hardware in case of hangs and other bugs. After we've corrected the driver behaviour, or got a hardware W/A assigned, we keep the test and add a WARN_ON to make sure there is no regression back to the same situation. This is at least what should happen; given time constraints, there may be variations.

A user behaving unexpectedly should never result in a WARN_ON (or even worse, a BUG_ON). There should only be debug messages (so as not to trigger the CI) and errors propagated back to the user:

https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#recommended-ioctl-return-values

Bare BUG_ON should only be used when there's a danger of corrupting system memory or filesystems, so in a graphics driver that's not very often. Controlled propagation of errors, and maybe a WARN_ON, is always preferred if possible.

> GEM_BUG_ON is only enabled when kernel debug is enabled, which is
> mostly disabled in a production kernel. In the case of i915, I'm sure
> it will be enabled in CI testing so that it can catch broken code
> paths. Looking at GVT-g, the similar scenario is that we enable it in
> QA testing.
>
> Let's say GEM_BUG_ON does its work very well in QA testing, but the QA
> tests do not fully cover all the conditions. Then something might
> still be broken when it comes to the production kernel for users,
> where GEM_BUG_ON will be disabled and will not catch it, I guess.
>
> That's the confusion that nagged at me during the investigation: if
> GEM_BUG_ON is not always active, then it looks like WARN_ON should
> always be used.... I hope to learn more about the story behind this. :)

So if the claim is that some object is "never going to be bigger than 2G", there should be either:

1. A GEM_BUG_ON-like assertion for it and a test that tries to hit it, for example by trying to allocate a huge object, which should be rejected with -EINVAL.

2. A test to see if the object is bigger, propagating back the error if it is. This results either in an error reported to the user, if the origin of the object is outside of kernel <-> hardware, or in a WARN_ON if it's strange hardware or kernel driver behavior.

You should choose depending on how often your function gets called and how critical the execution time is.

Hopefully this clarified things.

Regards, Joonas
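The two options can be contrasted in a short userspace sketch. This is only an analogy under stated assumptions: `DEBUG_BUG_ON` is a hypothetical name built on the standard `assert()`, which is compiled out when `NDEBUG` is defined, loosely mirroring how the thread describes GEM_BUG_ON being disabled in production builds. `size_unchecked` and `size_checked` are likewise hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Debug-only assertion: active in debug builds, removed under NDEBUG. */
#define DEBUG_BUG_ON(cond) assert(!(cond))

/*
 * Option 1: hot-path helper for driver-internal values. Callers must
 * already guarantee the size fits in an int; the precondition is only
 * verified in debug builds.
 */
int size_unchecked(unsigned long size)
{
	DEBUG_BUG_ON(size > INT_MAX);
	return (int)size;
}

/*
 * Option 2: always-on check for values that may come from outside the
 * driver. An oversized request is reported to the caller as -EINVAL.
 */
int size_checked(unsigned long size)
{
	if (size > INT_MAX)
		return -EINVAL;
	return (int)size;
}
```

The trade-off matches the advice above: the first form costs nothing in a release build but relies on tests to exercise the precondition, while the second pays for a comparison on every call and keeps working in production.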
Thanks for the reply. Learned a lot. :)

GEM_BUG_ON is new to me since it wasn't there at the beginning of GVT-g upstream. It showed up later. So I left a lot of WARN_ON in the code and some of them should be GEM_BUG_ON now.

Now I can figure out those differences. We can discuss with our QA to see if they would like to enable I915_GEM_DEBUG and then we can move to GEM_BUG_ON also, or maybe we can have a dedicated GVT_BUG_ON. :)

Thank you so much. Have a great weekend.

Thanks,
Zhi.
On Fri, 2017-09-22 at 17:50 +0000, Wang, Zhi A wrote:
> Thanks for the reply. Learned a lot. :)
>
> GEM_BUG_ON is new to me since it wasn't there at the beginning of
> GVT-g upstream. It showed up later. So I left a lot of WARN_ON in the
> code and some of them should be GEM_BUG_ON now.
>
> Now I can figure out those differences. We can discuss with our QA to
> see if they would like to enable I915_GEM_DEBUG and then we can move
> to GEM_BUG_ON also, or maybe we can have a dedicated GVT_BUG_ON. :)
> Thank you so much. Have a great weekend.

GVT_BUG_ON is probably the way to go :)

Regards, Joonas
diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
index 2c0ccbb817dc..f41cbf664b69 100644
--- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
+++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
@@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
 	struct intel_shadow_bb_entry *entry_obj;
 	struct intel_vgpu *vgpu = s->vgpu;
 	unsigned long gma = 0;
-	uint32_t bb_size;
+	int bb_size;
 	void *dst = NULL;
 	int ret = 0;