diff mbox

[drm-next] drm/i915/gvt: ensure -ve return value is handled correctly

Message ID 20170919155534.25334-1-colin.king@canonical.com (mailing list archive)
State New, archived
Headers show

Commit Message

Colin King Sept. 19, 2017, 3:55 p.m. UTC
From: Colin Ian King <colin.king@canonical.com>

An earlier fix changed the return type from find_bb_size however the
integer return is being assigned to a unsigned int so the -ve error
check will never be detected. Make bb_size an int to fix this.

Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")

Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Zhenyu Wang Sept. 19, 2017, 9:46 p.m. UTC | #1
On 2017.09.19 16:55:34 +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> An earlier fix changed the return type from find_bb_size however the
> integer return is being assigned to a unsigned int so the -ve error
> check will never be detected. Make bb_size an int to fix this.
> 
> Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
> 
> Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> index 2c0ccbb817dc..f41cbf664b69 100644
> --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
> +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
>  	struct intel_shadow_bb_entry *entry_obj;
>  	struct intel_vgpu *vgpu = s->vgpu;
>  	unsigned long gma = 0;
> -	uint32_t bb_size;
> +	int bb_size;
>  	void *dst = NULL;
>  	int ret = 0;
>  

Applied this, thanks!
Joe Perches Sept. 20, 2017, 2:35 a.m. UTC | #2
On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:
> On 2017.09.19 16:55:34 +0100, Colin King wrote:
> > From: Colin Ian King <colin.king@canonical.com>
> > 
> > An earlier fix changed the return type from find_bb_size however the
> > integer return is being assigned to a unsigned int so the -ve error
> > check will never be detected. Make bb_size an int to fix this.
> > 
> > Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
> > 
> > Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
> > Signed-off-by: Colin Ian King <colin.king@canonical.com>
> > ---
> >  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > index 2c0ccbb817dc..f41cbf664b69 100644
> > --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
> >  	struct intel_shadow_bb_entry *entry_obj;
> >  	struct intel_vgpu *vgpu = s->vgpu;
> >  	unsigned long gma = 0;
> > -	uint32_t bb_size;
> > +	int bb_size;
> >  	void *dst = NULL;
> >  	int ret = 0;
> >  
> 
> Applied this, thanks!

Is it possible for bb_size to be both >= 2g and valid?
Zhenyu Wang Sept. 20, 2017, 10:44 p.m. UTC | #3
On 2017.09.19 19:35:23 -0700, Joe Perches wrote:
> On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:
> > On 2017.09.19 16:55:34 +0100, Colin King wrote:
> > > From: Colin Ian King <colin.king@canonical.com>
> > > 
> > > An earlier fix changed the return type from find_bb_size however the
> > > integer return is being assigned to a unsigned int so the -ve error
> > > check will never be detected. Make bb_size an int to fix this.
> > > 
> > > Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
> > > 
> > > Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
> > > Signed-off-by: Colin Ian King <colin.king@canonical.com>
> > > ---
> > >  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > index 2c0ccbb817dc..f41cbf664b69 100644
> > > --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
> > >  	struct intel_shadow_bb_entry *entry_obj;
> > >  	struct intel_vgpu *vgpu = s->vgpu;
> > >  	unsigned long gma = 0;
> > > -	uint32_t bb_size;
> > > +	int bb_size;
> > >  	void *dst = NULL;
> > >  	int ret = 0;
> > >  
> > 
> > Applied this, thanks!
> 
> Is it possible for bb_size to be both >= 2g and valid?

Never be possible in practise and if really that big I think something
is already insane indeed.
Joonas Lahtinen Sept. 21, 2017, 2:31 p.m. UTC | #4
On Thu, 2017-09-21 at 06:44 +0800, Zhenyu Wang wrote:
> On 2017.09.19 19:35:23 -0700, Joe Perches wrote:
> > On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:
> > > On 2017.09.19 16:55:34 +0100, Colin King wrote:
> > > > From: Colin Ian King <colin.king@canonical.com>
> > > > 
> > > > An earlier fix changed the return type from find_bb_size however the
> > > > integer return is being assigned to a unsigned int so the -ve error
> > > > check will never be detected. Make bb_size an int to fix this.
> > > > 
> > > > Detected by CoverityScan CID#1456886 ("Unsigned compared against 0")
> > > > 
> > > > Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for perform_bb_shadow")
> > > > Signed-off-by: Colin Ian King <colin.king@canonical.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > > index 2c0ccbb817dc..f41cbf664b69 100644
> > > > --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > > +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
> > > > @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)
> > > >  	struct intel_shadow_bb_entry *entry_obj;
> > > >  	struct intel_vgpu *vgpu = s->vgpu;
> > > >  	unsigned long gma = 0;
> > > > -	uint32_t bb_size;
> > > > +	int bb_size;
> > > >  	void *dst = NULL;
> > > >  	int ret = 0;
> > > >  
> > > 
> > > Applied this, thanks!
> > 
> > Is it possible for bb_size to be both >= 2g and valid?
> 
> Never be possible in practise and if really that big I think something
> is already insane indeed.

It's good idea to document these assumptions as WARN_ON's. In i915, if
the value is completely internal to kernel, we're using GEM_BUG_ON for
these so that our CI will notice breakage. If it's not a driver
internal value only, a WARN_ON is the appropriate action.

Otherwise the information is lost and the next person reading the code
will have the same question in mind.

Regards, Joonas
Wang, Zhi A Sept. 21, 2017, 4:17 p.m. UTC | #5
Hi Joonas:

Thanks for the introduction. I have been thinking about the possibility of introducing GEM_BUG_ON into GVT-g recently and investigating on it. I'm just a bit confused about the usage between GEM_BUG_ON and WARN_ON.

GEM_BUG_ON is only enabled when kernel debug is enabled, which mostly is disabled in a production kernel. In the case of i915, I'm sure it will be enabled in CI test so that it can catch broken code path. Looking into GVT-g, the similar scenario is we enable it in QA test.

Let's say GEM_BUG_ON can do its work very well in QA test but QA test is not fully covered all the condition, then something might be still broken when it comes to the production kernel for user and GEM_BUG_ON will be disabled and will not catch that, I guess.

That's my confusion which scratched my mind during the investigation: If GEM_BUG_ON is not always working, then it looks WARN_ON should always be used.... Expected to learn more about the story behind. :)

Thanks,
Zhi.

-----Original Message-----
From: intel-gvt-dev [mailto:intel-gvt-dev-bounces@lists.freedesktop.org] On Behalf Of Joonas Lahtinen

Sent: Thursday, September 21, 2017 5:32 PM
To: Zhenyu Wang <zhenyuw@linux.intel.com>; Joe Perches <joe@perches.com>
Cc: Gao, Fred <fred.gao@intel.com>; David Airlie <airlied@linux.ie>; intel-gfx@lists.freedesktop.org; kernel-janitors@vger.kernel.org; linux-kernel@vger.kernel.org; Jani Nikula <jani.nikula@linux.intel.com>; dri-devel@lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Colin King <colin.king@canonical.com>; intel-gvt-dev@lists.freedesktop.org; Wang, Zhi A <zhi.a.wang@intel.com>
Subject: Re: [PATCH][drm-next] drm/i915/gvt: ensure -ve return value is handled correctly

On Thu, 2017-09-21 at 06:44 +0800, Zhenyu Wang wrote:
> On 2017.09.19 19:35:23 -0700, Joe Perches wrote:

> > On Wed, 2017-09-20 at 05:46 +0800, Zhenyu Wang wrote:

> > > On 2017.09.19 16:55:34 +0100, Colin King wrote:

> > > > From: Colin Ian King <colin.king@canonical.com>

> > > > 

> > > > An earlier fix changed the return type from find_bb_size however 

> > > > the integer return is being assigned to a unsigned int so the 

> > > > -ve error check will never be detected. Make bb_size an int to fix this.

> > > > 

> > > > Detected by CoverityScan CID#1456886 ("Unsigned compared against 

> > > > 0")

> > > > 

> > > > Fixes: 1e3197d6ad73 ("drm/i915/gvt: Refine error handling for 

> > > > perform_bb_shadow")

> > > > Signed-off-by: Colin Ian King <colin.king@canonical.com>

> > > > ---

> > > >  drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-

> > > >  1 file changed, 1 insertion(+), 1 deletion(-)

> > > > 

> > > > diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c 

> > > > b/drivers/gpu/drm/i915/gvt/cmd_parser.c

> > > > index 2c0ccbb817dc..f41cbf664b69 100644

> > > > --- a/drivers/gpu/drm/i915/gvt/cmd_parser.c

> > > > +++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c

> > > > @@ -1628,7 +1628,7 @@ static int perform_bb_shadow(struct parser_exec_state *s)

> > > >  	struct intel_shadow_bb_entry *entry_obj;

> > > >  	struct intel_vgpu *vgpu = s->vgpu;

> > > >  	unsigned long gma = 0;

> > > > -	uint32_t bb_size;

> > > > +	int bb_size;

> > > >  	void *dst = NULL;

> > > >  	int ret = 0;

> > > >  

> > > 

> > > Applied this, thanks!

> > 

> > Is it possible for bb_size to be both >= 2g and valid?

> 

> Never be possible in practise and if really that big I think something 

> is already insane indeed.


It's good idea to document these assumptions as WARN_ON's. In i915, if the value is completely internal to kernel, we're using GEM_BUG_ON for these so that our CI will notice breakage. If it's not a driver internal value only, a WARN_ON is the appropriate action.

Otherwise the information is lost and the next person reading the code will have the same question in mind.

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
intel-gvt-dev mailing list
intel-gvt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Joonas Lahtinen Sept. 22, 2017, 11:11 a.m. UTC | #6
On Thu, 2017-09-21 at 16:17 +0000, Wang, Zhi A wrote:
> Hi Joonas:
> 
> Thanks for the introduction. I have been thinking about the
> possibility of introducing GEM_BUG_ON into GVT-g recently and
> investigating on it. I'm just a bit confused about the usage between
> GEM_BUG_ON and WARN_ON.

GEM_BUG_ON is basically there to catch things that we do not expect
ever to happen within the driver. So we often list the function
preconditions as GEM_BUG_ON. It's there for the same reason as the
lockdep_assert_held and KASAN. It's sometimes heavy checks that we
really want to run when functionally validating kernel.

GEM_BUG_ON became to existence because adding checks for obvious
conditions at the critical command submission path GEM is not
sustainable for performance in production.

The expectation is that each GEM_BUG_ON has a testcase in I-G-T that
has the potential to hit it if driver was modified not to respect those
preconditions. So once our testest passes, we can disable the
GEM_BUG_ONs and be confident of the internal driver quality and get the
release performance.

WARN_ON is mostly used for the cases when the hardware is behaving
differently than we expect. We can't remove them as we don't have all
the hardware in the world to test, but we try to exercise them too
through I-G-Ts. The test will often be the subtest that was written to
reproduce the problem with our expectations of hardware in case of
hangs and other bugs. After we've corrected the driver behaviour, or
got a hardware W/A assigned, we keep the test and add a WARN_ON to make
sure there will be no regression back to the same situation.

This is at least what should happen, given time constraints, there may
be variations.

User behaving unexpectedly should never result in WARN_ON (or even
worse, BUG_ON), should always just be debug messages displayed (not to
trigger the CI) and errors propagated back to user:

https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#recommended
-ioctl-return-values

Bare BUG_ON should only be used when there's the danger of corrupting
system memory or filesystems, so from graphics driver, that's not very
often. Controlled propagation of errors and maybe WARN_ON is always
preferred if possible.


> GEM_BUG_ON is only enabled when kernel debug is enabled, which mostly
> is disabled in a production kernel. In the case of i915, I'm sure it
> will be enabled in CI test so that it can catch broken code path.
> Looking into GVT-g, the similar scenario is we enable it in QA test.
> 
> Let's say GEM_BUG_ON can do its work very well in QA test but QA test
> is not fully covered all the condition, then something might be still
> broken when it comes to the production kernel for user and GEM_BUG_ON
> will be disabled and will not catch that, I guess.
> 
> That's my confusion which scratched my mind during the investigation:
> If GEM_BUG_ON is not always working, then it looks WARN_ON should
> always be used.... Expected to learn more about the story behind. :)

So if the saying is some object is "never going to be bigger than 2G",
there should be either:

1. GEM_BUG_ON like assertion for it and a test that tries to hit it, by
trying to allocate a huge object for example, and should get rejection
as -EINVAL

2. Test to see if the object is bigger, and propagate back the error if
it is. Either resulting in user reported error if the origin of the
object is outside of kernel <-> hardware. Or a WARN_ON if it's strange
hardware or kernel driver behavior.

You should choose depending on how often your function gets called, and
how critical the execution time is.

Hopefully this clarified things.

Regards, Joonas
Wang, Zhi A Sept. 22, 2017, 5:50 p.m. UTC | #7
Thanks for the reply. Learned a lot. :)

GEM_BUG_ON is new to me since it wasn't there at the beginning of GVT-g upstream. It showed up later. So I left a lot of WARN_ON in the code and some of them should be GEM_BUG_ON now.

Now I can figure out those differences. We can discuss with our QA to see if they would like to enable I915_GEM_DEBUG and then we can move to GEM_BUG_ON also, or maybe we can have a dedicated GVT_BUG_ON. :) Thank you so much. Have a great weekend.

Thanks,
Zhi.

-----Original Message-----
From: Joonas Lahtinen [mailto:joonas.lahtinen@linux.intel.com] 

Sent: Friday, September 22, 2017 2:11 PM
To: Wang, Zhi A <zhi.a.wang@intel.com>; Zhenyu Wang <zhenyuw@linux.intel.com>; Joe Perches <joe@perches.com>
Cc: Gao, Fred <fred.gao@intel.com>; David Airlie <airlied@linux.ie>; intel-gfx@lists.freedesktop.org; kernel-janitors@vger.kernel.org; linux-kernel@vger.kernel.org; Jani Nikula <jani.nikula@linux.intel.com>; dri-devel@lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Colin King <colin.king@canonical.com>; intel-gvt-dev@lists.freedesktop.org
Subject: Re: [PATCH][drm-next] drm/i915/gvt: ensure -ve return value is handled correctly

On Thu, 2017-09-21 at 16:17 +0000, Wang, Zhi A wrote:
> Hi Joonas:

> 

> Thanks for the introduction. I have been thinking about the 

> possibility of introducing GEM_BUG_ON into GVT-g recently and 

> investigating on it. I'm just a bit confused about the usage between 

> GEM_BUG_ON and WARN_ON.


GEM_BUG_ON is basically there to catch things that we do not expect ever to happen within the driver. So we often list the function preconditions as GEM_BUG_ON. It's there for the same reason as the lockdep_assert_held and KASAN. It's sometimes heavy checks that we really want to run when functionally validating kernel.

GEM_BUG_ON became to existence because adding checks for obvious conditions at the critical command submission path GEM is not sustainable for performance in production.

The expectation is that each GEM_BUG_ON has a testcase in I-G-T that has the potential to hit it if driver was modified not to respect those preconditions. So once our testest passes, we can disable the GEM_BUG_ONs and be confident of the internal driver quality and get the release performance.

WARN_ON is mostly used for the cases when the hardware is behaving differently than we expect. We can't remove them as we don't have all the hardware in the world to test, but we try to exercise them too through I-G-Ts. The test will often be the subtest that was written to reproduce the problem with our expectations of hardware in case of hangs and other bugs. After we've corrected the driver behaviour, or got a hardware W/A assigned, we keep the test and add a WARN_ON to make sure there will be no regression back to the same situation.

This is at least what should happen, given time constraints, there may be variations.

User behaving unexpectedly should never result in WARN_ON (or even worse, BUG_ON), should always just be debug messages displayed (not to trigger the CI) and errors propagated back to user:

https://01.org/linuxgraphics/gfx-docs/drm/gpu/drm-uapi.html#recommended
-ioctl-return-values

Bare BUG_ON should only be used when there's the danger of corrupting system memory or filesystems, so from graphics driver, that's not very often. Controlled propagation of errors and maybe WARN_ON is always preferred if possible.


> GEM_BUG_ON is only enabled when kernel debug is enabled, which mostly 

> is disabled in a production kernel. In the case of i915, I'm sure it 

> will be enabled in CI test so that it can catch broken code path.

> Looking into GVT-g, the similar scenario is we enable it in QA test.

> 

> Let's say GEM_BUG_ON can do its work very well in QA test but QA test 

> is not fully covered all the condition, then something might be still 

> broken when it comes to the production kernel for user and GEM_BUG_ON 

> will be disabled and will not catch that, I guess.

> 

> That's my confusion which scratched my mind during the investigation:

> If GEM_BUG_ON is not always working, then it looks WARN_ON should 

> always be used.... Expected to learn more about the story behind. :)


So if the saying is some object is "never going to be bigger than 2G", there should be either:

1. GEM_BUG_ON like assertion for it and a test that tries to hit it, by trying to allocate a huge object for example, and should get rejection as -EINVAL

2. Test to see if the object is bigger, and propagate back the error if it is. Either resulting in user reported error if the origin of the object is outside of kernel <-> hardware. Or a WARN_ON if it's strange hardware or kernel driver behavior.

You should choose depending on how often your function gets called, and how critical the execution time is.

Hopefully this clarified things.

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
Joonas Lahtinen Sept. 25, 2017, 9:32 a.m. UTC | #8
On Fri, 2017-09-22 at 17:50 +0000, Wang, Zhi A wrote:
> Thanks for the reply. Learned a lot. :)
> 
> GEM_BUG_ON is new to me since it wasn't there at the beginning of
> GVT-g upstream. It showed up later. So I left a lot of WARN_ON in the
> code and some of them should be GEM_BUG_ON now.
> 
> Now I can figure out those differences. We can discuss with our QA to
> see if they would like to enable I915_GEM_DEBUG and then we can move
> to GEM_BUG_ON also, or maybe we can have a dedicated GVT_BUG_ON. :)
> Thank you so much. Have a great weekend.

GVT_BUG_ON is probably the way to go :)

Regards, Joonas
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
index 2c0ccbb817dc..f41cbf664b69 100644
--- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
+++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
@@ -1628,7 +1628,7 @@  static int perform_bb_shadow(struct parser_exec_state *s)
 	struct intel_shadow_bb_entry *entry_obj;
 	struct intel_vgpu *vgpu = s->vgpu;
 	unsigned long gma = 0;
-	uint32_t bb_size;
+	int bb_size;
 	void *dst = NULL;
 	int ret = 0;