diff mbox

[02/13] drm/i915: Implement command buffer parsing logic

Message ID 1391032514-19136-3-git-send-email-bradley.d.volkin@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

bradley.d.volkin@intel.com Jan. 29, 2014, 9:55 p.m. UTC
From: Brad Volkin <bradley.d.volkin@intel.com>

The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

Each ring maintains tables of commands and registers which the parser uses in
scanning batch buffers submitted to that ring.

The set of commands that the parser must check for is significantly smaller
than the number of commands supported, especially on the render ring. As such,
the parser tables (built up in subsequent patches) contain only those commands
required by the parser. This generally works because command opcode ranges have
standard command length encodings. So for commands that the parser does not need
to check, it can easily skip them. This is implementated via a per-ring length
decoding vfunc.

Unfortunately, there are a number of commands that do not follow the standard
length encoding for their opcode range, primarily amongst the MI_* commands. To
handle this, the parser provides a way to define explicit "skip" entries in the
per-ring command tables.

Other command table entries will map fairly directly to high level categories
mentioned above: rejected, master-only, register whitelist. A number of checks,
including the privileged memory checks, are implemented via a general bitmasking
mechanism.

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/Makefile              |   3 +-
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
 drivers/gpu/drm/i915/i915_params.c         |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
 7 files changed, 556 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c

Comments

Chris Wilson Jan. 29, 2014, 10:28 p.m. UTC | #1
On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + *
> + * The caller must supply space for a default descriptor via the default_desc
> + * parameter. If no descriptor for the specified command exists in the ring's
> + * command parser tables, this function fills in default_desc based on the
> + * ring's default length encoding and returns default_desc.
> + */
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd(struct intel_ring_buffer *ring,
> +	 u32 cmd_header,
> +	 struct drm_i915_cmd_descriptor *default_desc)
> +{
> +	u32 mask;
> +	int i;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +
> +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> +		if (desc)
> +			return desc;
> +	}
> +
> +	mask = ring->get_cmd_length_mask(cmd_header);
> +	if (!mask)
> +		return NULL;
> +
> +	BUG_ON(!default_desc);
> +	default_desc->flags = CMD_DESC_SKIP;
> +	default_desc->length.mask = mask;

If we turn off all hw validation (through use of the secure bit) should
we not default to a whitelist of commands? Otherwise it just seems to be
a case of running a fuzzer until we kill the machine.
-Chris
Daniel Vetter Jan. 30, 2014, 8:53 a.m. UTC | #2
On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > +/*
> > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > + *
> > + * The caller must supply space for a default descriptor via the default_desc
> > + * parameter. If no descriptor for the specified command exists in the ring's
> > + * command parser tables, this function fills in default_desc based on the
> > + * ring's default length encoding and returns default_desc.
> > + */
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd(struct intel_ring_buffer *ring,
> > +	 u32 cmd_header,
> > +	 struct drm_i915_cmd_descriptor *default_desc)
> > +{
> > +	u32 mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +
> > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > +		if (desc)
> > +			return desc;
> > +	}
> > +
> > +	mask = ring->get_cmd_length_mask(cmd_header);
> > +	if (!mask)
> > +		return NULL;
> > +
> > +	BUG_ON(!default_desc);
> > +	default_desc->flags = CMD_DESC_SKIP;
> > +	default_desc->length.mask = mask;
> 
> If we turn off all hw validation (through use of the secure bit) should
> we not default to a whitelist of commands? Otherwise it just seems to be
> a case of running a fuzzer until we kill the machine.

Preventing hangs and dos is imo not the attack model, gpus are too fickle
for that. The attach model here is to prevent priveledge escalation and
information leaks. I.e. we want just containement of all read/write access
to the gtt space.

I think for that purpose an explicit whitelist of commands which target
things outside of the (pp)gtt is sufficient. radeon's checker design is
completely different, but pretty much the only command they have is
to load register values. Intel gpus otoh have a big set of special-purpose
commands to load (most) of the rendering pipeline state. So we have
hw built-in register whitelists for all that stuff since you just can't
load arbitrary registers and state with those commands.

Also note that for raw register access Bradley's scanner _is_ whitelist
based. And for general reads/writes gpu designers confirmed that those are
all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
as long as we check for the exceptions and otherwise only whitelist MI_
commands we know about we should be covered.

So I think this is sound.
-Daniel
Daniel Vetter Jan. 30, 2014, 9:05 a.m. UTC | #3
On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > +/*
> > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > + *
> > > + * The caller must supply space for a default descriptor via the default_desc
> > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > + * command parser tables, this function fills in default_desc based on the
> > > + * ring's default length encoding and returns default_desc.
> > > + */
> > > +static const struct drm_i915_cmd_descriptor*
> > > +find_cmd(struct intel_ring_buffer *ring,
> > > +	 u32 cmd_header,
> > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > +{
> > > +	u32 mask;
> > > +	int i;
> > > +
> > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > +		const struct drm_i915_cmd_descriptor *desc;
> > > +
> > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > +		if (desc)
> > > +			return desc;
> > > +	}
> > > +
> > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > +	if (!mask)
> > > +		return NULL;
> > > +
> > > +	BUG_ON(!default_desc);
> > > +	default_desc->flags = CMD_DESC_SKIP;
> > > +	default_desc->length.mask = mask;
> > 
> > If we turn off all hw validation (through use of the secure bit) should
> > we not default to a whitelist of commands? Otherwise it just seems to be
> > a case of running a fuzzer until we kill the machine.
> 
> Preventing hangs and dos is imo not the attack model, gpus are too fickle
> for that. The attach model here is to prevent priveledge escalation and
> information leaks. I.e. we want just containement of all read/write access
> to the gtt space.
> 
> I think for that purpose an explicit whitelist of commands which target
> things outside of the (pp)gtt is sufficient. radeon's checker design is
> completely different, but pretty much the only command they have is
> to load register values. Intel gpus otoh have a big set of special-purpose
> commands to load (most) of the rendering pipeline state. So we have
> hw built-in register whitelists for all that stuff since you just can't
> load arbitrary registers and state with those commands.
> 
> Also note that for raw register access Bradley's scanner _is_ whitelist
> based. And for general reads/writes gpu designers confirmed that those are
> all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> as long as we check for the exceptions and otherwise only whitelist MI_
> commands we know about we should be covered.
> 
> So I think this is sound.

Hm, but while scrolling through the checker I haven't spotted a "reject
everything unknown" for MI_CLIENT commands. Bradley, have I missed that?

I think submitting an invented MI_CLIENT command would also be a good
testcase.
-Daniel
Daniel Vetter Jan. 30, 2014, 9:07 a.m. UTC | #4
On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> The command parser scans batch buffers submitted via execbuffer ioctls before
> the driver submits them to hardware. At a high level, it looks for several
> things:
> 
> 1) Commands which are explicitly defined as privileged or which should only be
>    used by the kernel driver. The parser generally rejects such commands, with
>    the provision that it may allow some from the drm master process.
> 2) Commands which access registers. To support correct/enhanced userspace
>    functionality, particularly certain OpenGL extensions, the parser provides a
>    whitelist of registers which userspace may safely access (for both normal and
>    drm master processes).
> 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
>    parser always rejects such commands.
> 
> Each ring maintains tables of commands and registers which the parser uses in
> scanning batch buffers submitted to that ring.
> 
> The set of commands that the parser must check for is significantly smaller
> than the number of commands supported, especially on the render ring. As such,
> the parser tables (built up in subsequent patches) contain only those commands
> required by the parser. This generally works because command opcode ranges have
> standard command length encodings. So for commands that the parser does not need
> to check, it can easily skip them. This is implementated via a per-ring length
> decoding vfunc.
> 
> Unfortunately, there are a number of commands that do not follow the standard
> length encoding for their opcode range, primarily amongst the MI_* commands. To
> handle this, the parser provides a way to define explicit "skip" entries in the
> per-ring command tables.
> 
> Other command table entries will map fairly directly to high level categories
> mentioned above: rejected, master-only, register whitelist. A number of checks,
> including the privileged memory checks, are implemented via a general bitmasking
> mechanism.
> 
> OTC-Tracker: AXIA-4631
> Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>


> +#include "i915_drv.h"
> +
> +#define CLIENT_MASK      0xE0000000
> +#define SUBCLIENT_MASK   0x18000000
> +#define MI_CLIENT        0x00000000
> +#define RC_CLIENT        0x60000000
> +#define BC_CLIENT        0x40000000
> +#define MEDIA_SUBCLIENT  0x10000000

I think these would fit neatly right next to all the other MI_* #defines
in i915_reg.h. The other idea that just crossed my mind is to extract all
the command #defines into a new i915_cmd.h (included by i915_drv.h by
default) since i915_reg.h is already giant.

But that should be done as a follow-up patch to avoid patch shuffling
hell.
-Daniel
Daniel Vetter Jan. 30, 2014, 9:12 a.m. UTC | #5
On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > +/*
> > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > + *
> > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > + * command parser tables, this function fills in default_desc based on the
> > > > + * ring's default length encoding and returns default_desc.
> > > > + */
> > > > +static const struct drm_i915_cmd_descriptor*
> > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > +	 u32 cmd_header,
> > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > +{
> > > > +	u32 mask;
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > +
> > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > +		if (desc)
> > > > +			return desc;
> > > > +	}
> > > > +
> > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > +	if (!mask)
> > > > +		return NULL;
> > > > +
> > > > +	BUG_ON(!default_desc);
> > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > +	default_desc->length.mask = mask;
> > > 
> > > If we turn off all hw validation (through use of the secure bit) should
> > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > a case of running a fuzzer until we kill the machine.
> > 
> > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > for that. The attach model here is to prevent priveledge escalation and
> > information leaks. I.e. we want just containement of all read/write access
> > to the gtt space.
> > 
> > I think for that purpose an explicit whitelist of commands which target
> > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > completely different, but pretty much the only command they have is
> > to load register values. Intel gpus otoh have a big set of special-purpose
> > commands to load (most) of the rendering pipeline state. So we have
> > hw built-in register whitelists for all that stuff since you just can't
> > load arbitrary registers and state with those commands.
> > 
> > Also note that for raw register access Bradley's scanner _is_ whitelist
> > based. And for general reads/writes gpu designers confirmed that those are
> > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > as long as we check for the exceptions and otherwise only whitelist MI_
> > commands we know about we should be covered.
> > 
> > So I think this is sound.
> 
> Hm, but while scrolling through the checker I haven't spotted a "reject
> everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> 
> I think submitting an invented MI_CLIENT command would also be a good
> testcase.

One more: I think it would be good to have an overview comment at the top
of i915_cmd_parser.c which details the security attack model and the
overall blacklist/whitelist design of the checker. We don't (yet) have
autogenerated documentation for i915, but that's something I'm working on.
And the kerneldoc system can also pull in multi-paragraph overview
comments besides the usual api documentation, so it's good to have things
ready.
-Daniel
Chris Wilson Jan. 30, 2014, 10:57 a.m. UTC | #6
On Thu, Jan 30, 2014 at 10:07:42AM +0100, Daniel Vetter wrote:
> On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> > 
> > The command parser scans batch buffers submitted via execbuffer ioctls before
> > the driver submits them to hardware. At a high level, it looks for several
> > things:
> > 
> > 1) Commands which are explicitly defined as privileged or which should only be
> >    used by the kernel driver. The parser generally rejects such commands, with
> >    the provision that it may allow some from the drm master process.
> > 2) Commands which access registers. To support correct/enhanced userspace
> >    functionality, particularly certain OpenGL extensions, the parser provides a
> >    whitelist of registers which userspace may safely access (for both normal and
> >    drm master processes).
> > 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
> >    parser always rejects such commands.
> > 
> > Each ring maintains tables of commands and registers which the parser uses in
> > scanning batch buffers submitted to that ring.
> > 
> > The set of commands that the parser must check for is significantly smaller
> > than the number of commands supported, especially on the render ring. As such,
> > the parser tables (built up in subsequent patches) contain only those commands
> > required by the parser. This generally works because command opcode ranges have
> > standard command length encodings. So for commands that the parser does not need
> > to check, it can easily skip them. This is implementated via a per-ring length
> > decoding vfunc.
> > 
> > Unfortunately, there are a number of commands that do not follow the standard
> > length encoding for their opcode range, primarily amongst the MI_* commands. To
> > handle this, the parser provides a way to define explicit "skip" entries in the
> > per-ring command tables.
> > 
> > Other command table entries will map fairly directly to high level categories
> > mentioned above: rejected, master-only, register whitelist. A number of checks,
> > including the privileged memory checks, are implemented via a general bitmasking
> > mechanism.
> > 
> > OTC-Tracker: AXIA-4631
> > Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> 
> 
> > +#include "i915_drv.h"
> > +
> > +#define CLIENT_MASK      0xE0000000
> > +#define SUBCLIENT_MASK   0x18000000
> > +#define MI_CLIENT        0x00000000
> > +#define RC_CLIENT        0x60000000
> > +#define BC_CLIENT        0x40000000
> > +#define MEDIA_SUBCLIENT  0x10000000
> 
> I think these would fit neatly right next to all the other MI_* #defines
> in i915_reg.h. The other idea that just crossed my mind is to extract all
> the command #defines into a new i915_cmd.h (included by i915_drv.h by
> default) since i915_reg.h is already giant.

i915_cmd.h please. (i915_cs.h? i915_command_stream.h?)
-Chris
Daniel Vetter Jan. 30, 2014, 11:07 a.m. UTC | #7
On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > +/*
> > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > + *
> > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > + * ring's default length encoding and returns default_desc.
> > > > > + */
> > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > +	 u32 cmd_header,
> > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > +{
> > > > > +	u32 mask;
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > +
> > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > +		if (desc)
> > > > > +			return desc;
> > > > > +	}
> > > > > +
> > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > +	if (!mask)
> > > > > +		return NULL;
> > > > > +
> > > > > +	BUG_ON(!default_desc);
> > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > +	default_desc->length.mask = mask;
> > > > 
> > > > If we turn off all hw validation (through use of the secure bit) should
> > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > a case of running a fuzzer until we kill the machine.
> > > 
> > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > for that. The attach model here is to prevent priveledge escalation and
> > > information leaks. I.e. we want just containement of all read/write access
> > > to the gtt space.
> > > 
> > > I think for that purpose an explicit whitelist of commands which target
> > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > completely different, but pretty much the only command they have is
> > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > commands to load (most) of the rendering pipeline state. So we have
> > > hw built-in register whitelists for all that stuff since you just can't
> > > load arbitrary registers and state with those commands.
> > > 
> > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > based. And for general reads/writes gpu designers confirmed that those are
> > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > commands we know about we should be covered.
> > > 
> > > So I think this is sound.
> > 
> > Hm, but while scrolling through the checker I haven't spotted a "reject
> > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > 
> > I think submitting an invented MI_CLIENT command would also be a good
> > testcase.
> 
> One more: I think it would be good to have an overview comment at the top
> of i915_cmd_parser.c which details the security attack model and the
> overall blacklist/whitelist design of the checker. We don't (yet) have
> autogenerated documentation for i915, but that's something I'm working on.
> And the kerneldoc system can also pull in multi-paragraph overview
> comments besides the usual api documentation, so it's good to have things
> ready.

Chatted with Chris a bit more on irc about this, and for more paranoia I
guess we should also reject any unknown client and media subclient
commands.
-Daniel
bradley.d.volkin@intel.com Jan. 30, 2014, 5:55 p.m. UTC | #8
On Thu, Jan 30, 2014 at 01:12:06AM -0800, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > +/*
> > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > + *
> > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > + * ring's default length encoding and returns default_desc.
> > > > > + */
> > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > +	 u32 cmd_header,
> > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > +{
> > > > > +	u32 mask;
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > +
> > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > +		if (desc)
> > > > > +			return desc;
> > > > > +	}
> > > > > +
> > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > +	if (!mask)
> > > > > +		return NULL;
> > > > > +
> > > > > +	BUG_ON(!default_desc);
> > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > +	default_desc->length.mask = mask;
> > > > 
> > > > If we turn off all hw validation (through use of the secure bit) should
> > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > a case of running a fuzzer until we kill the machine.
> > > 
> > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > for that. The attach model here is to prevent priveledge escalation and
> > > information leaks. I.e. we want just containement of all read/write access
> > > to the gtt space.
> > > 
> > > I think for that purpose an explicit whitelist of commands which target
> > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > completely different, but pretty much the only command they have is
> > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > commands to load (most) of the rendering pipeline state. So we have
> > > hw built-in register whitelists for all that stuff since you just can't
> > > load arbitrary registers and state with those commands.
> > > 
> > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > based. And for general reads/writes gpu designers confirmed that those are
> > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > commands we know about we should be covered.
> > > 
> > > So I think this is sound.
> > 
> > Hm, but while scrolling through the checker I haven't spotted a "reject
> > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > 
> > I think submitting an invented MI_CLIENT command would also be a good
> > testcase.
> 
> One more: I think it would be good to have an overview comment at the top
> of i915_cmd_parser.c which details the security attack model and the
> overall blacklist/whitelist design of the checker. We don't (yet) have
> autogenerated documentation for i915, but that's something I'm working on.
> And the kerneldoc system can also pull in multi-paragraph overview
> comments besides the usual api documentation, so it's good to have things
> ready.

Ok, I'll add that and maybe kerneldoc for the non-static functions to the
next revision.
- Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
bradley.d.volkin@intel.com Jan. 30, 2014, 6:05 p.m. UTC | #9
On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > +/*
> > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > + *
> > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > + */
> > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > +	 u32 cmd_header,
> > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > +{
> > > > > > +	u32 mask;
> > > > > > +	int i;
> > > > > > +
> > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > +
> > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > +		if (desc)
> > > > > > +			return desc;
> > > > > > +	}
> > > > > > +
> > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > +	if (!mask)
> > > > > > +		return NULL;
> > > > > > +
> > > > > > +	BUG_ON(!default_desc);
> > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > +	default_desc->length.mask = mask;
> > > > > 
> > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > a case of running a fuzzer until we kill the machine.
> > > > 
> > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > for that. The attach model here is to prevent priveledge escalation and
> > > > information leaks. I.e. we want just containement of all read/write access
> > > > to the gtt space.
> > > > 
> > > > I think for that purpose an explicit whitelist of commands which target
> > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > completely different, but pretty much the only command they have is
> > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > commands to load (most) of the rendering pipeline state. So we have
> > > > hw built-in register whitelists for all that stuff since you just can't
> > > > load arbitrary registers and state with those commands.
> > > > 
> > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > commands we know about we should be covered.
> > > > 
> > > > So I think this is sound.
> > > 
> > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > 
> > > I think submitting an invented MI_CLIENT command would also be a good
> > > testcase.
> > 
> > One more: I think it would be good to have an overview comment at the top
> > of i915_cmd_parser.c which details the security attack model and the
> > overall blacklist/whitelist design of the checker. We don't (yet) have
> > autogenerated documentation for i915, but that's something I'm working on.
> > And the kerneldoc system can also pull in multi-paragraph overview
> > comments besides the usual api documentation, so it's good to have things
> > ready.
> 
> Chatted with Chris a bit more on irc about this, and for more paranoia I
> guess we should also reject any unknown client and media subclient
> commands.

Hmm, not sure I follow. Can you elaborate?

Are you suggesting we add all the MI and Media commands to the tables and reject
any command from those client/subclients that is not found in the table? Or that
we look at the client and subclient fields of the command and reject if they are
not from a set of expected values? Or other?

- Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
bradley.d.volkin@intel.com Feb. 3, 2014, 11 p.m. UTC | #10
Ping. Daniel or Chris, can one of you clarify this request? Thanks.

On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > +/*
> > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > + *
> > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > + */
> > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > +	 u32 cmd_header,
> > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > +{
> > > > > > > +	u32 mask;
> > > > > > > +	int i;
> > > > > > > +
> > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > +
> > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > +		if (desc)
> > > > > > > +			return desc;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > +	if (!mask)
> > > > > > > +		return NULL;
> > > > > > > +
> > > > > > > +	BUG_ON(!default_desc);
> > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > +	default_desc->length.mask = mask;
> > > > > > 
> > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > a case of running a fuzzer until we kill the machine.
> > > > > 
> > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > for that. The attach model here is to prevent priveledge escalation and
> > > > > information leaks. I.e. we want just containement of all read/write access
> > > > > to the gtt space.
> > > > > 
> > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > completely different, but pretty much the only command they have is
> > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > load arbitrary registers and state with those commands.
> > > > > 
> > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > commands we know about we should be covered.
> > > > > 
> > > > > So I think this is sound.
> > > > 
> > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > 
> > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > testcase.
> > > 
> > > One more: I think it would be good to have an overview comment at the top
> > > of i915_cmd_parser.c which details the security attack model and the
> > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > autogenerated documentation for i915, but that's something I'm working on.
> > > And the kerneldoc system can also pull in multi-paragraph overview
> > > comments besides the usual api documentation, so it's good to have things
> > > ready.
> > 
> > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > guess we should also reject any unknown client and media subclient
> > commands.
> 
> Hmm, not sure I follow. Can you elaborate?
> 
> Are you suggesting we add all the MI and Media commands to the tables and reject
> any command from those client/subclients that is not found in the table? Or that
> we look at the client and subclient fields of the command and reject if they are
> not from a set of expected values? Or other?
> 
> - Brad
> 
> > -Daniel
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Feb. 4, 2014, 10:20 a.m. UTC | #11
On Mon, Feb 03, 2014 at 03:00:19PM -0800, Volkin, Bradley D wrote:
> Ping. Daniel or Chris, can one of you clarify this request? Thanks.

I've been enjoying fosdem ...

> On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> > On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > > +/*
> > > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > > + *
> > > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > > + */
> > > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > > +	 u32 cmd_header,
> > > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > > +{
> > > > > > > > +	u32 mask;
> > > > > > > > +	int i;
> > > > > > > > +
> > > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > > +
> > > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > > +		if (desc)
> > > > > > > > +			return desc;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > > +	if (!mask)
> > > > > > > > +		return NULL;
> > > > > > > > +
> > > > > > > > +	BUG_ON(!default_desc);
> > > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > > +	default_desc->length.mask = mask;
> > > > > > > 
> > > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > > a case of running a fuzzer until we kill the machine.
> > > > > > 
> > > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > > for that. The attach model here is to prevent priveledge escalation and
> > > > > > information leaks. I.e. we want just containement of all read/write access
> > > > > > to the gtt space.
> > > > > > 
> > > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > > completely different, but pretty much the only command they have is
> > > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > > load arbitrary registers and state with those commands.
> > > > > > 
> > > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > > commands we know about we should be covered.
> > > > > > 
> > > > > > So I think this is sound.
> > > > > 
> > > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > > 
> > > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > > testcase.
> > > > 
> > > > One more: I think it would be good to have an overview comment at the top
> > > > of i915_cmd_parser.c which details the security attack model and the
> > > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > > autogenerated documentation for i915, but that's something I'm working on.
> > > > And the kerneldoc system can also pull in multi-paragraph overview
> > > > comments besides the usual api documentation, so it's good to have things
> > > > ready.
> > > 
> > > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > > guess we should also reject any unknown client and media subclient
> > > commands.
> > 
> > Hmm, not sure I follow. Can you elaborate?
> > 
> > Are you suggesting we add all the MI and Media commands to the tables and reject
> > any command from those client/subclients that is not found in the table? Or that
> > we look at the client and subclient fields of the command and reject if they are
> > not from a set of expected values? Or other?

Yeah, I think we should check the client/subclient fields for know values,
but not have explicit lists for each command. So overall control-flow
would be:

1. Check client/subclient against whitelist (per-ring obviously, so reject
blitter commands on non-blt rings ofc).

2. If the client indicates an MI command, check against MI_ whitelist.

3. For all other clients check against blacklist/greylist (e.g.
pipe_control where we just need to forbid global gtt writes).

Iirc we need a special-cases list for blitter commands anyway due to their
irregular lenght, so maybe we could do a full whitelist for that one, too.

Cheers, Daniel
bradley.d.volkin@intel.com Feb. 4, 2014, 6:45 p.m. UTC | #12
On Tue, Feb 04, 2014 at 02:20:36AM -0800, Daniel Vetter wrote:
> On Mon, Feb 03, 2014 at 03:00:19PM -0800, Volkin, Bradley D wrote:
> > Ping. Daniel or Chris, can one of you clarify this request? Thanks.
> 
> I've been enjoying fosdem ...
> 
> > On Thu, Jan 30, 2014 at 10:05:27AM -0800, Volkin, Bradley D wrote:
> > > On Thu, Jan 30, 2014 at 03:07:15AM -0800, Daniel Vetter wrote:
> > > > On Thu, Jan 30, 2014 at 10:12:06AM +0100, Daniel Vetter wrote:
> > > > > On Thu, Jan 30, 2014 at 10:05:28AM +0100, Daniel Vetter wrote:
> > > > > > On Thu, Jan 30, 2014 at 09:53:28AM +0100, Daniel Vetter wrote:
> > > > > > > On Wed, Jan 29, 2014 at 10:28:36PM +0000, Chris Wilson wrote:
> > > > > > > > On Wed, Jan 29, 2014 at 01:55:03PM -0800, bradley.d.volkin@intel.com wrote:
> > > > > > > > > +/*
> > > > > > > > > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > > > > > > > > + *
> > > > > > > > > + * The caller must supply space for a default descriptor via the default_desc
> > > > > > > > > + * parameter. If no descriptor for the specified command exists in the ring's
> > > > > > > > > + * command parser tables, this function fills in default_desc based on the
> > > > > > > > > + * ring's default length encoding and returns default_desc.
> > > > > > > > > + */
> > > > > > > > > +static const struct drm_i915_cmd_descriptor*
> > > > > > > > > +find_cmd(struct intel_ring_buffer *ring,
> > > > > > > > > +	 u32 cmd_header,
> > > > > > > > > +	 struct drm_i915_cmd_descriptor *default_desc)
> > > > > > > > > +{
> > > > > > > > > +	u32 mask;
> > > > > > > > > +	int i;
> > > > > > > > > +
> > > > > > > > > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > > > > > > > > +		const struct drm_i915_cmd_descriptor *desc;
> > > > > > > > > +
> > > > > > > > > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > > > > > > > > +		if (desc)
> > > > > > > > > +			return desc;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	mask = ring->get_cmd_length_mask(cmd_header);
> > > > > > > > > +	if (!mask)
> > > > > > > > > +		return NULL;
> > > > > > > > > +
> > > > > > > > > +	BUG_ON(!default_desc);
> > > > > > > > > +	default_desc->flags = CMD_DESC_SKIP;
> > > > > > > > > +	default_desc->length.mask = mask;
> > > > > > > > 
> > > > > > > > If we turn off all hw validation (through use of the secure bit) should
> > > > > > > > we not default to a whitelist of commands? Otherwise it just seems to be
> > > > > > > > a case of running a fuzzer until we kill the machine.
> > > > > > > 
> > > > > > > Preventing hangs and dos is imo not the attack model, gpus are too fickle
> > > > > > > for that. The attach model here is to prevent priveledge escalation and
> > > > > > > information leaks. I.e. we want just containement of all read/write access
> > > > > > > to the gtt space.
> > > > > > > 
> > > > > > > I think for that purpose an explicit whitelist of commands which target
> > > > > > > things outside of the (pp)gtt is sufficient. radeon's checker design is
> > > > > > > completely different, but pretty much the only command they have is
> > > > > > > to load register values. Intel gpus otoh have a big set of special-purpose
> > > > > > > commands to load (most) of the rendering pipeline state. So we have
> > > > > > > hw built-in register whitelists for all that stuff since you just can't
> > > > > > > load arbitrary registers and state with those commands.
> > > > > > > 
> > > > > > > Also note that for raw register access Bradley's scanner _is_ whitelist
> > > > > > > based. And for general reads/writes gpu designers confirmed that those are
> > > > > > > all MI_ commands (with very few specific exceptions like PIPE_CONTROL), so
> > > > > > > as long as we check for the exceptions and otherwise only whitelist MI_
> > > > > > > commands we know about we should be covered.
> > > > > > > 
> > > > > > > So I think this is sound.
> > > > > > 
> > > > > > Hm, but while scrolling through the checker I haven't spotted a "reject
> > > > > > everything unknown" for MI_CLIENT commands. Bradley, have I missed that?
> > > > > > 
> > > > > > I think submitting an invented MI_CLIENT command would also be a good
> > > > > > testcase.
> > > > > 
> > > > > One more: I think it would be good to have an overview comment at the top
> > > > > of i915_cmd_parser.c which details the security attack model and the
> > > > > overall blacklist/whitelist design of the checker. We don't (yet) have
> > > > > autogenerated documentation for i915, but that's something I'm working on.
> > > > > And the kerneldoc system can also pull in multi-paragraph overview
> > > > > comments besides the usual api documentation, so it's good to have things
> > > > > ready.
> > > > 
> > > > Chatted with Chris a bit more on irc about this, and for more paranoia I
> > > > guess we should also reject any unknown client and media subclient
> > > > commands.
> > > 
> > > Hmm, not sure I follow. Can you elaborate?
> > > 
> > > Are you suggesting we add all the MI and Media commands to the tables and reject
> > > any command from those client/subclients that is not found in the table? Or that
> > > we look at the client and subclient fields of the command and reject if they are
> > > not from a set of expected values? Or other?
> 
> Yeah, I think we should check the client/subclient fields for know values,
> but not have explicit lists for each command. So overall control-flow
> would be:
> 
> 1. Check client/subclient against whitelist (per-ring obviously, so reject
> blitter commands on non-blt rings ofc).

Ok, that's easy enough. I'm not sure the benefit is that large, but I can add this.

> 
> 2. If the client indicates an MI command, check against MI_ whitelist.
> 
> 3. For all other clients check against blacklist/greylist (e.g.
> pipe_control where we just need to forbid global gtt writes).

I'm a bit concerned about 2 and 3 because the behavior and table structures seem
fairly different from what we have now. I'm hesitant to make too big a change here
so close to having something functional.

Please correct me if I've missed or misunderstood anything, but my thinking is...

The current table structure is that we have tables per-ring and per-gen (plus the table
for common MI commands) and all tables are treated as blacklist/greylist. The proposed
flow here would indicate that we need tables per-ring, per-client, per-gen and that some
would be treated as a whitelist and some as a blacklist/greylist.

I think the benefit to these changes amounts to preventing clients from issuing invalid
MI commands, but clients can do that today, so it's not a regression right? It could also
make it easier to safely cover new platforms if MI commands were added, though the parser
is strictly limited to gen7 today and would need additional work to enable it for new gens.

The set of MI commands for current gens is small compared to the other command ranges.
On the one hand, that makes it easier to create a whitelist of the commands. On the other
hand, it also makes it easy to just audit the commands in the spec and validate that the
blacklist/greylist covers the potential issues.

So, if we maintain the conservative parser enabling and re-audit the lists when enabling
new gens, and if we can live with clients issuing totally invalid commands (and I get the
feeling that we have to), then I think the current solution gets us the benefits we want
with less complexity.

Let me know what you think. I'll continue working through the other feedback either way.

> 
> Iirc we need a special-cases list for blitter commands anyway due to their
> irregular lenght, so maybe we could do a full whitelist for that one, too.

All rings have commands with irregular length encodings. I believe they also all have
commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).

Thanks,
Brad

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Feb. 4, 2014, 7:33 p.m. UTC | #13
On Tue, Feb 04, 2014 at 10:45:45AM -0800, Volkin, Bradley D wrote:
> The current table structure is that we have tables per-ring and per-gen (plus the table
> for common MI commands) and all tables are treated as blacklist/greylist. The proposed
> flow here would indicate that we need tables per-ring, per-client, per-gen and that some
> would be treated as a whitelist and some as a blacklist/greylist.
> 
> I think the benefit to these changes amounts to preventing clients from issuing invalid
> MI commands, but clients can do that today, so it's not a regression right? It could also
> make it easier to safely cover new platforms if MI commands were added, though the parser
> is strictly limited to gen7 today and would need additional work to enable it for new gens.

The benefit is in enabling future platforms - with an explicit MI
whitelist we'd forced to re-audit MI_ commands fully and there's no chance
to miss something. Also the MI_ commands are the tricky ones, so if one
slips through we have a problem.

Relying on the command streamer hw to catch invalid opcodes is something
we need to, so no benefit for that really.

> The set of MI commands for current gens is small compared to the other command ranges.
> On the one hand, that makes it easier to create a whitelist of the commands. On the other
> hand, it also makes it easy to just audit the commands in the spec and validate that the
> blacklist/greylist covers the potential issues.

Since we only care about the set of allowed MI commands the list probably
even shrinks a bit.

> So, if we maintain the conservative parser enabling and re-audit the lists when enabling
> new gens, and if we can live with clients issuing totally invalid commands (and I get the
> feeling that we have to), then I think the current solution gets us the benefits we want
> with less complexity.
> 
> Let me know what you think. I'll continue working through the other feedback either way.

I didn't really consider that the MI whitelist would have a bigger impact
on the code really. So if you expect this change to cause a bit a delay in
getting the next round ready then we can postpone this a bit - for me the
important part is to get the parser in soonish so that we can start to
catch regressions (if there are any we haven't considered yet). But
longer-term I think switching to a whiteliste for MI commands is the right
approach.

> > Iirc we need a special-cases list for blitter commands anyway due to their
> > irregular lenght, so maybe we could do a full whitelist for that one, too.
> 
> All rings have commands with irregular length encodings. I believe they also all have
> commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).

Yeah, variable length commands are everywhere, but I've thought commands
which don't use the usual lenght-2 encoding (i.e. those which are just 1
dword) are restricted to MI commands and the blitter. Am I mistaken on
this?

Cheers, Daniel
bradley.d.volkin@intel.com Feb. 5, 2014, 12:56 a.m. UTC | #14
On Tue, Feb 04, 2014 at 11:33:31AM -0800, Daniel Vetter wrote:
> On Tue, Feb 04, 2014 at 10:45:45AM -0800, Volkin, Bradley D wrote:
> > The current table structure is that we have tables per-ring and per-gen (plus the table
> > for common MI commands) and all tables are treated as blacklist/greylist. The proposed
> > flow here would indicate that we need tables per-ring, per-client, per-gen and that some
> > would be treated as a whitelist and some as a blacklist/greylist.
> > 
> > I think the benefit to these changes amounts to preventing clients from issuing invalid
> > MI commands, but clients can do that today, so it's not a regression right? It could also
> > make it easier to safely cover new platforms if MI commands were added, though the parser
> > is strictly limited to gen7 today and would need additional work to enable it for new gens.
> 
> The benefit is in enabling future platforms - with an explicit MI
> whitelist we'd forced to re-audit MI_ commands fully and there's no chance
> to miss something. Also the MI_ commands are the tricky ones, so if one
> slips through we have a problem.
> 
> Relying on the command streamer hw to catch invalid opcodes is something
> we need to, so no benefit for that really.
> 
> > The set of MI commands for current gens is small compared to the other command ranges.
> > On the one hand, that makes it easier to create a whitelist of the commands. On the other
> > hand, it also makes it easy to just audit the commands in the spec and validate that the
> > blacklist/greylist covers the potential issues.
> 
> Since we only care about the set of allowed MI commands the list probably
> even shrinks a bit.
> 
> > So, if we maintain the conservative parser enabling and re-audit the lists when enabling
> > new gens, and if we can live with clients issuing totally invalid commands (and I get the
> > feeling that we have to), then I think the current solution gets us the benefits we want
> > with less complexity.
> > 
> > Let me know what you think. I'll continue working through the other feedback either way.
> 
> I didn't really consider that the MI whitelist would have a bigger impact
> on the code really. So if you expect this change to cause a bit a delay in
> getting the next round ready then we can postpone this a bit - for me the
> important part is to get the parser in soonish so that we can start to
> catch regressions (if there are any we haven't considered yet). But
> longer-term I think switching to a whiteliste for MI commands is the right
> approach.

Ok, let me get the next round out and then I'll look at this in more detail.

> 
> > > Iirc we need a special-cases list for blitter commands anyway due to their
> > > irregular lenght, so maybe we could do a full whitelist for that one, too.
> > 
> > All rings have commands with irregular length encodings. I believe they also all have
> > commands with non-fixed lengths (e.g. XY_TEXT_IMMEDIATE_BLT on blt, MEDIA_OBJECT on render).
> 
> Yeah, variable length commands are everywhere, but I've thought commands
> which don't use the usual lenght-2 encoding (i.e. those which are just 1
> dword) are restricted to MI commands and the blitter. Am I mistaken on
> this?

From what I see, there's the 1-dword MI commands on all rings. MFX_WAIT uses
length-1 on VCS. The blitter looks like length-2 everywhere.

Also, the MI commands and a few places on RCS and BCS use more/fewer bits for
the length field than other commands for that client/subclient.
-Brad

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Jani Nikula Feb. 5, 2014, 3:15 p.m. UTC | #15
On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
>
> The command parser scans batch buffers submitted via execbuffer ioctls before
> the driver submits them to hardware. At a high level, it looks for several
> things:
>
> 1) Commands which are explicitly defined as privileged or which should only be
>    used by the kernel driver. The parser generally rejects such commands, with
>    the provision that it may allow some from the drm master process.
> 2) Commands which access registers. To support correct/enhanced userspace
>    functionality, particularly certain OpenGL extensions, the parser provides a
>    whitelist of registers which userspace may safely access (for both normal and
>    drm master processes).
> 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
>    parser always rejects such commands.
>
> Each ring maintains tables of commands and registers which the parser uses in
> scanning batch buffers submitted to that ring.
>
> The set of commands that the parser must check for is significantly smaller
> than the number of commands supported, especially on the render ring. As such,
> the parser tables (built up in subsequent patches) contain only those commands
> required by the parser. This generally works because command opcode ranges have
> standard command length encodings. So for commands that the parser does not need
> to check, it can easily skip them. This is implementated via a per-ring length
> decoding vfunc.
>
> Unfortunately, there are a number of commands that do not follow the standard
> length encoding for their opcode range, primarily amongst the MI_* commands. To
> handle this, the parser provides a way to define explicit "skip" entries in the
> per-ring command tables.
>
> Other command table entries will map fairly directly to high level categories
> mentioned above: rejected, master-only, register whitelist. A number of checks,
> including the privileged memory checks, are implemented via a general bitmasking
> mechanism.
>
> OTC-Tracker: AXIA-4631
> Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/Makefile              |   3 +-
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
>  drivers/gpu/drm/i915/i915_params.c         |   5 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
>  7 files changed, 556 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 4850494..2da81bf 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
>  	  dvo_tfp410.o \
>  	  dvo_sil164.o \
>  	  dvo_ns2501.o \
> -	  i915_gem_dmabuf.o
> +	  i915_gem_dmabuf.o \
> +	  i915_cmd_parser.o

If you add this anywhere but last, you only need to touch one line
instead of two. It's nitpicky, but helps with things like git blame
(which would now point at you for i915_gem_dmabuf.o too ;).

>  
>  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
>  
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> new file mode 100644
> index 0000000..7639dbc
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -0,0 +1,404 @@
> +/*
> + * Copyright © 2013 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *    Brad Volkin <bradley.d.volkin@intel.com>
> + *
> + */
> +
> +#include "i915_drv.h"
> +
> +#define CLIENT_MASK      0xE0000000
> +#define SUBCLIENT_MASK   0x18000000
> +#define MI_CLIENT        0x00000000
> +#define RC_CLIENT        0x60000000
> +#define BC_CLIENT        0x40000000
> +#define MEDIA_SUBCLIENT  0x10000000
> +
> +static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == RC_CLIENT) {
> +		if (subclient == MEDIA_SUBCLIENT)
> +			return 0xFFFF;
> +		else
> +			return 0xFF;
> +	}
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == RC_CLIENT) {
> +		if (subclient == MEDIA_SUBCLIENT)
> +			return 0xFFF;
> +		else
> +			return 0xFF;
> +	}
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
> +{
> +	u32 client = cmd_header & CLIENT_MASK;
> +
> +	if (client == MI_CLIENT)
> +		return 0x3F;
> +	else if (client == BC_CLIENT)
> +		return 0xFF;
> +
> +	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
> +	return 0;
> +}
> +
> +static void validate_cmds_sorted(struct intel_ring_buffer *ring)
> +{
> +	int i;
> +
> +	if (!ring->cmd_tables || ring->cmd_table_count == 0)
> +		return;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
> +		u32 previous = 0;
> +		int j;
> +
> +		for (j = 0; j < table->count; j++) {
> +			const struct drm_i915_cmd_descriptor *desc =
> +				&table->table[i];
> +			u32 curr = desc->cmd.value & desc->cmd.mask;
> +
> +			if (curr < previous) {
> +				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
> +					  ring->id, i, j, curr);
> +				return;

So this checks the hand-filled tables, right?

I think this should not stop at the first error, but rather scan the
whole table and DRM_ERROR all cases where curr < previous, and after the
full scan BUG_ON() if there were any errors.

> +			}
> +
> +			previous = curr;
> +		}
> +	}
> +}
> +
> +static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
> +{
> +	int i;
> +	u32 previous = 0;
> +
> +	for (i = 0; i < reg_count; i++) {
> +		u32 curr = reg_table[i];
> +
> +		if (curr < previous) {
> +			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
> +				  ring_id, i, curr);
> +			return;

Same here.

> +		}
> +
> +		previous = curr;
> +	}
> +}
> +
> +static void validate_regs_sorted(struct intel_ring_buffer *ring)
> +{
> +	if (ring->reg_table && ring->reg_count > 0)
> +		check_sorted(ring->id, ring->reg_table, ring->reg_count);
> +
> +	if (ring->master_reg_table && ring->master_reg_count > 0)
> +		check_sorted(ring->id, ring->master_reg_table,
> +			     ring->master_reg_count);

Somehow I think the ifs here are redundant. check_sorted() is a no-op if
reg_count == 0, and if reg_count > 0 while reg_table == NULL, it
deserves to oops!

> +}
> +
> +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> +{
> +	if (!IS_GEN7(ring->dev))
> +		return;
> +
> +	switch (ring->id) {
> +	case RCS:
> +		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> +		break;
> +	case VCS:
> +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		break;
> +	case BCS:
> +		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> +		break;
> +	case VECS:
> +		/* VECS can use the same length_mask function as VCS */
> +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> +		break;
> +	default:
> +		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
> +			  ring->id);

You'll oops later for calling NULL ring->get_cmd_length_mask(), so might
as well BUG() here.

> +	}
> +
> +	validate_cmds_sorted(ring);
> +	validate_regs_sorted(ring);
> +}
> +
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> +		  u32 cmd_header)
> +{
> +	int i;
> +
> +	for (i = 0; i < table->count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> +
> +		if (masked_cmd == masked_value)
> +			return desc;
> +	}
> +
> +	return NULL;
> +}
> +
> +/*
> + * Returns a pointer to a descriptor for the command specified by cmd_header.
> + *
> + * The caller must supply space for a default descriptor via the default_desc
> + * parameter. If no descriptor for the specified command exists in the ring's
> + * command parser tables, this function fills in default_desc based on the
> + * ring's default length encoding and returns default_desc.
> + */
> +static const struct drm_i915_cmd_descriptor*
> +find_cmd(struct intel_ring_buffer *ring,
> +	 u32 cmd_header,
> +	 struct drm_i915_cmd_descriptor *default_desc)
> +{
> +	u32 mask;
> +	int i;
> +
> +	for (i = 0; i < ring->cmd_table_count; i++) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +
> +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> +		if (desc)
> +			return desc;
> +	}
> +
> +	mask = ring->get_cmd_length_mask(cmd_header);
> +	if (!mask)
> +		return NULL;
> +
> +	BUG_ON(!default_desc);
> +	default_desc->flags = CMD_DESC_SKIP;
> +	default_desc->length.mask = mask;
> +
> +	return default_desc;
> +}
> +
> +static int valid_reg(const u32 *table, int count, u32 addr)

I like bools for boolean stuff.

> +{
> +	if (table && count != 0) {
> +		int i;
> +
> +		for (i = 0; i < count; i++) {
> +			if (table[i] == addr)
> +				return 1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static u32 *vmap_batch(struct drm_i915_gem_object *obj)
> +{
> +	int i;
> +	void *addr = NULL;
> +	struct sg_page_iter sg_iter;
> +	struct page **pages;
> +
> +	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
> +	if (pages == NULL) {
> +		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> +		goto finish;
> +	}
> +
> +	i = 0;
> +	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> +		pages[i] = sg_page_iter_page(&sg_iter);
> +		i++;
> +	}
> +
> +	addr = vmap(pages, i, 0, PAGE_KERNEL);
> +	if (addr == NULL) {
> +		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> +		goto finish;
> +	}
> +
> +finish:
> +	if (pages)
> +		drm_free_large(pages);
> +	return (u32*)addr;
> +}
> +
> +int i915_needs_cmd_parser(struct intel_ring_buffer *ring)

bool

> +{
> +	/* No command tables indicates a platform without parsing */
> +	if (!ring->cmd_tables)
> +		return 0;
> +
> +	return i915.enable_cmd_parser;
> +}
> +
> +#define LENGTH_BIAS 2
> +
> +int i915_parse_cmds(struct intel_ring_buffer *ring,
> +		    struct drm_i915_gem_object *batch_obj,
> +		    u32 batch_start_offset,
> +		    bool is_master)
> +{
> +	int ret = 0;
> +	u32 *cmd, *batch_base, *batch_end;
> +	struct drm_i915_cmd_descriptor default_desc = { 0 };
> +	int needs_clflush = 0;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ret;
> +	}
> +
> +	batch_base = vmap_batch(batch_obj);
> +	if (!batch_base) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		i915_gem_object_unpin_pages(batch_obj);
> +		return -ENOMEM;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> +
> +	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> +	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
> +
> +	while (cmd < batch_end) {
> +		const struct drm_i915_cmd_descriptor *desc;
> +		u32 length;
> +
> +		if (*cmd == MI_BATCH_BUFFER_END)
> +			break;
> +
> +		desc = find_cmd(ring, *cmd, &default_desc);
> +		if (!desc) {
> +			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> +					 *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_FIXED)
> +			length = desc->length.fixed;
> +		else
> +			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> +
> +		if ((batch_end - cmd) < length) {
> +			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
> +					 *cmd,
> +					 length,
> +					 batch_end - cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_REJECT) {
> +			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> +			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> +					 *cmd);
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		if (desc->flags & CMD_DESC_REGISTER) {
> +			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
> +
> +			if (!valid_reg(ring->reg_table,
> +				       ring->reg_count, reg_addr)) {
> +				if (!is_master ||
> +				    !valid_reg(ring->master_reg_table,
> +					       ring->master_reg_count,
> +					       reg_addr)) {
> +					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
> +							 reg_addr,
> +							 *cmd,
> +							 ring->id);
> +					ret = -EINVAL;
> +					break;
> +				}
> +			}
> +		}
> +
> +		if (desc->flags & CMD_DESC_BITMASK) {
> +			int i;
> +
> +			for (i = 0; i < desc->bits_count; i++) {
> +				u32 dword = cmd[desc->bits[i].offset] &
> +					desc->bits[i].mask;
> +
> +				if (dword != desc->bits[i].expected) {
> +					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> +							 *cmd,
> +							 desc->bits[i].mask,
> +							 desc->bits[i].expected,
> +							 dword, ring->id);
> +					ret = -EINVAL;
> +					break;
> +				}
> +			}
> +
> +			if (ret)
> +				break;
> +		}
> +
> +		cmd += length;
> +	}
> +
> +	if (cmd >= batch_end) {
> +		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> +		ret = -EINVAL;
> +	}
> +
> +	vunmap(batch_base);
> +
> +	i915_gem_object_unpin_pages(batch_obj);
> +
> +	return ret;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index bfb30df..8aed80f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1765,6 +1765,91 @@ struct drm_i915_file_private {
>  	atomic_t rps_wait_boost;
>  };
>  
> +/**
> + * A command that requires special handling by the command parser.
> + */

You have plenty of kernel-doc comments here and in other patches. They
do expect a certain format, however. Please either make them regular
comments (the easy way) or adhere to proper kernel-doc format.

> +struct drm_i915_cmd_descriptor {
> +	/**
> +	 * Flags describing how the command parser processes the command.
> +	 *
> +	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
> +	 *                 a length mask if not set
> +	 * CMD_DESC_SKIP: The command is allowed but does not follow the
> +	 *                standard length encoding for the opcode range in
> +	 *                which it falls
> +	 * CMD_DESC_REJECT: The command is never allowed
> +	 * CMD_DESC_REGISTER: The command should be checked against the
> +	 *                    register whitelist for the appropriate ring
> +	 * CMD_DESC_MASTER: The command is allowed if the submitting process
> +	 *                  is the DRM master
> +	 */
> +	u32 flags;
> +#define CMD_DESC_FIXED    (1<<0)
> +#define CMD_DESC_SKIP     (1<<1)
> +#define CMD_DESC_REJECT   (1<<2)
> +#define CMD_DESC_REGISTER (1<<3)
> +#define CMD_DESC_BITMASK  (1<<4)
> +#define CMD_DESC_MASTER   (1<<5)

Feels like flags should be named FLAG, not DESC. *shrug*.

> +
> +	/**
> +	 * The command's unique identification bits and the bitmask to get them.
> +	 * This isn't strictly the opcode field as defined in the spec and may
> +	 * also include type, subtype, and/or subop fields.
> +	 */
> +	struct {
> +		u32 value;
> +		u32 mask;
> +	} cmd;
> +
> +	/**
> +	 * The command's length. The command is either fixed length (i.e. does
> +	 * not include a length field) or has a length field mask. The flag
> +	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
> +	 * a length mask. All command entries in a command table must include
> +	 * length information.
> +	 */
> +	union {
> +		u32 fixed;
> +		u32 mask;
> +	} length;
> +
> +	/**
> +	 * Describes where to find a register address in the command to check
> +	 * against the ring's register whitelist. Only valid if flags has the
> +	 * CMD_DESC_REGISTER bit set.
> +	 */
> +	struct {
> +		u32 offset;
> +		u32 mask;
> +	} reg;
> +
> +#define MAX_CMD_DESC_BITMASKS 3
> +	/**
> +	 * Describes command checks where a particular dword is masked and
> +	 * compared against an expected value. If the command does not match
> +	 * the expected value, the parser rejects it. Only valid if flags has
> +	 * the CMD_DESC_BITMASK bit set.
> +	 */
> +	struct {
> +		u32 offset;
> +		u32 mask;
> +		u32 expected;
> +	} bits[MAX_CMD_DESC_BITMASKS];
> +	/** Number of valid entries in the bits array */
> +	int bits_count;
> +};
> +
> +/**
> + * A table of commands requiring special handling by the command parser.
> + *
> + * Each ring has an array of tables. Each table consists of an array of command
> + * descriptors, which must be sorted with command opcodes in ascending order.
> + */
> +struct drm_i915_cmd_table {
> +	const struct drm_i915_cmd_descriptor *table;
> +	int count;
> +};
> +
>  #define INTEL_INFO(dev)	(to_i915(dev)->info)
>  
>  #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
> @@ -1923,6 +2008,7 @@ struct i915_params {
>  	bool prefault_disable;
>  	bool reset;
>  	int invert_brightness;
> +	int enable_cmd_parser;
>  };
>  extern struct i915_params i915 __read_mostly;
>  
> @@ -2428,6 +2514,14 @@ void i915_destroy_error_state(struct drm_device *dev);
>  void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
>  const char *i915_cache_level_str(int type);
>  
> +/* i915_cmd_parser.c */
> +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
> +int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
> +int i915_parse_cmds(struct intel_ring_buffer *ring,
> +		    struct drm_i915_gem_object *batch_obj,
> +		    u32 batch_start_offset,
> +		    bool is_master);
> +
>  /* i915_suspend.c */
>  extern int i915_save_state(struct drm_device *dev);
>  extern int i915_restore_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 032def9..c953445 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1180,6 +1180,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	}
>  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
>  
> +	if (i915_needs_cmd_parser(ring)) {
> +		ret = i915_parse_cmds(ring,
> +				      batch_obj,
> +				      args->batch_start_offset,
> +				      file->is_master);
> +		if (ret)
> +			goto err;
> +
> +		/*
> +		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> +		 * from MI_BATCH_BUFFER_START commands issued in the
> +		 * dispatch_execbuffer implementations. We specifically don't
> +		 * want that set when the command parser is enabled.
> +		 */
> +		flags |= I915_DISPATCH_SECURE;
> +	}
> +
>  	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
>  	 * batch" bit. Hence we need to pin secure batches into the global gtt.
>  	 * hsw should have this fixed, but bdw mucks it up again. */
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index c743057..6d3d906 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
>  	.prefault_disable = 0,
>  	.reset = true,
>  	.invert_brightness = 0,
> +	.enable_cmd_parser = 0

Please add a comma in the end so the next addition won't have to, just
like this doesn't have to touch the previous line.

>  };
>  
>  module_param_named(modeset, i915.modeset, int, 0400);
> @@ -153,3 +154,7 @@ MODULE_PARM_DESC(invert_brightness,
>  	"report PCI device ID, subsystem vendor and subsystem device ID "
>  	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
>  	"It will then be included in an upcoming module version.");
> +
> +module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
> +MODULE_PARM_DESC(enable_cmd_parser,
> +		"Enable command parsing (default: false)");

If it's a bool, make it a bool, or change the default text to 0.

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index a0d61f8..77fc61d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1388,6 +1388,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	if (IS_I830(ring->dev) || IS_845G(ring->dev))
>  		ring->effective_size -= 128;
>  
> +	i915_cmd_parser_init_ring(ring);
> +
>  	return 0;
>  
>  err_unmap:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 71a73f4..cff2b35 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -162,6 +162,38 @@ struct  intel_ring_buffer {
>  		u32 gtt_offset;
>  		volatile u32 *cpu_page;
>  	} scratch;
> +
> +	/**
> +	 * Tables of commands the command parser needs to know about
> +	 * for this ring.
> +	 */
> +	const struct drm_i915_cmd_table *cmd_tables;
> +	int cmd_table_count;
> +
> +	/**
> +	 * Table of registers allowed in commands that read/write registers.
> +	 */
> +	const u32 *reg_table;
> +	int reg_count;
> +
> +	/**
> +	 * Table of registers allowed in commands that read/write registers, but
> +	 * only from the DRM master.
> +	 */
> +	const u32 *master_reg_table;
> +	int master_reg_count;
> +
> +	/**
> +	 * Returns the bitmask for the length field of the specified command.
> +	 * Return 0 for an unrecognized/invalid command.
> +	 *
> +	 * If the command parser finds an entry for a command in the ring's
> +	 * cmd_tables, it gets the command's length based on the table entry.
> +	 * If not, it calls this function to determine the per-ring length field
> +	 * encoding for the command (i.e. certain opcode ranges use certain bits
> +	 * to encode the command length in the header).
> +	 */
> +	u32 (*get_cmd_length_mask)(u32 cmd_header);
>  };

Plenty of non-conforming kernel-doc comments here too.

>  
>  static inline bool
> -- 
> 1.8.5.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
bradley.d.volkin@intel.com Feb. 5, 2014, 6:36 p.m. UTC | #16
On Wed, Feb 05, 2014 at 07:15:35AM -0800, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > From: Brad Volkin <bradley.d.volkin@intel.com>
> >
> > The command parser scans batch buffers submitted via execbuffer ioctls before
> > the driver submits them to hardware. At a high level, it looks for several
> > things:
> >
> > 1) Commands which are explicitly defined as privileged or which should only be
> >    used by the kernel driver. The parser generally rejects such commands, with
> >    the provision that it may allow some from the drm master process.
> > 2) Commands which access registers. To support correct/enhanced userspace
> >    functionality, particularly certain OpenGL extensions, the parser provides a
> >    whitelist of registers which userspace may safely access (for both normal and
> >    drm master processes).
> > 3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
> >    parser always rejects such commands.
> >
> > Each ring maintains tables of commands and registers which the parser uses in
> > scanning batch buffers submitted to that ring.
> >
> > The set of commands that the parser must check for is significantly smaller
> > than the number of commands supported, especially on the render ring. As such,
> > the parser tables (built up in subsequent patches) contain only those commands
> > required by the parser. This generally works because command opcode ranges have
> > standard command length encodings. So for commands that the parser does not need
> > to check, it can easily skip them. This is implementated via a per-ring length
> > decoding vfunc.
> >
> > Unfortunately, there are a number of commands that do not follow the standard
> > length encoding for their opcode range, primarily amongst the MI_* commands. To
> > handle this, the parser provides a way to define explicit "skip" entries in the
> > per-ring command tables.
> >
> > Other command table entries will map fairly directly to high level categories
> > mentioned above: rejected, master-only, register whitelist. A number of checks,
> > including the privileged memory checks, are implemented via a general bitmasking
> > mechanism.
> >
> > OTC-Tracker: AXIA-4631
> > Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
> > Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/Makefile              |   3 +-
> >  drivers/gpu/drm/i915/i915_cmd_parser.c     | 404 +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_drv.h            |  94 +++++++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  17 ++
> >  drivers/gpu/drm/i915/i915_params.c         |   5 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h    |  32 +++
> >  7 files changed, 556 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 4850494..2da81bf 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -47,7 +47,8 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
> >  	  dvo_tfp410.o \
> >  	  dvo_sil164.o \
> >  	  dvo_ns2501.o \
> > -	  i915_gem_dmabuf.o
> > +	  i915_gem_dmabuf.o \
> > +	  i915_cmd_parser.o
> 
> If you add this anywhere but last, you only need to touch one line
> instead of two. It's nitpicky, but helps with things like git blame
> (which would now point at you for i915_gem_dmabuf.o too ;).

Sounds good

> 
> >  
> >  i915-$(CONFIG_COMPAT)   += i915_ioc32.o
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > new file mode 100644
> > index 0000000..7639dbc
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> > @@ -0,0 +1,404 @@
> > +/*
> > + * Copyright © 2013 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the "Software"),
> > + * to deal in the Software without restriction, including without limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the next
> > + * paragraph) shall be included in all copies or substantial portions of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> > + * IN THE SOFTWARE.
> > + *
> > + * Authors:
> > + *    Brad Volkin <bradley.d.volkin@intel.com>
> > + *
> > + */
> > +
> > +#include "i915_drv.h"
> > +
> > +#define CLIENT_MASK      0xE0000000
> > +#define SUBCLIENT_MASK   0x18000000
> > +#define MI_CLIENT        0x00000000
> > +#define RC_CLIENT        0x60000000
> > +#define BC_CLIENT        0x40000000
> > +#define MEDIA_SUBCLIENT  0x10000000
> > +
> > +static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == RC_CLIENT) {
> > +		if (subclient == MEDIA_SUBCLIENT)
> > +			return 0xFFFF;
> > +		else
> > +			return 0xFF;
> > +	}
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +	u32 subclient = cmd_header & SUBCLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == RC_CLIENT) {
> > +		if (subclient == MEDIA_SUBCLIENT)
> > +			return 0xFFF;
> > +		else
> > +			return 0xFF;
> > +	}
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
> > +{
> > +	u32 client = cmd_header & CLIENT_MASK;
> > +
> > +	if (client == MI_CLIENT)
> > +		return 0x3F;
> > +	else if (client == BC_CLIENT)
> > +		return 0xFF;
> > +
> > +	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
> > +	return 0;
> > +}
> > +
> > +static void validate_cmds_sorted(struct intel_ring_buffer *ring)
> > +{
> > +	int i;
> > +
> > +	if (!ring->cmd_tables || ring->cmd_table_count == 0)
> > +		return;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
> > +		u32 previous = 0;
> > +		int j;
> > +
> > +		for (j = 0; j < table->count; j++) {
> > +			const struct drm_i915_cmd_descriptor *desc =
> > +				&table->table[i];
> > +			u32 curr = desc->cmd.value & desc->cmd.mask;
> > +
> > +			if (curr < previous) {
> > +				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
> > +					  ring->id, i, j, curr);
> > +				return;
> 
> So this checks the hand-filled tables, right?
> 
> I think this should not stop at the first error, but rather scan the
> whole table and DRM_ERROR all cases where curr < previous, and after the
> full scan BUG_ON() if there were any errors.

Will change, and below

> 
> > +			}
> > +
> > +			previous = curr;
> > +		}
> > +	}
> > +}
> > +
> > +static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
> > +{
> > +	int i;
> > +	u32 previous = 0;
> > +
> > +	for (i = 0; i < reg_count; i++) {
> > +		u32 curr = reg_table[i];
> > +
> > +		if (curr < previous) {
> > +			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
> > +				  ring_id, i, curr);
> > +			return;
> 
> Same here.
> 
> > +		}
> > +
> > +		previous = curr;
> > +	}
> > +}
> > +
> > +static void validate_regs_sorted(struct intel_ring_buffer *ring)
> > +{
> > +	if (ring->reg_table && ring->reg_count > 0)
> > +		check_sorted(ring->id, ring->reg_table, ring->reg_count);
> > +
> > +	if (ring->master_reg_table && ring->master_reg_count > 0)
> > +		check_sorted(ring->id, ring->master_reg_table,
> > +			     ring->master_reg_count);
> 
> Somehow I think the ifs here are redundant. check_sorted() is a no-op if
> reg_count == 0, and if reg_count > 0 while reg_table == NULL, it
> deserves to oops!

Agreed

> 
> > +}
> > +
> > +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
> > +{
> > +	if (!IS_GEN7(ring->dev))
> > +		return;
> > +
> > +	switch (ring->id) {
> > +	case RCS:
> > +		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
> > +		break;
> > +	case VCS:
> > +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> > +		break;
> > +	case BCS:
> > +		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
> > +		break;
> > +	case VECS:
> > +		/* VECS can use the same length_mask function as VCS */
> > +		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
> > +		break;
> > +	default:
> > +		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
> > +			  ring->id);
> 
> You'll oops later for calling NULL ring->get_cmd_length_mask(), so might
> as well BUG() here.

Agreed

> 
> > +	}
> > +
> > +	validate_cmds_sorted(ring);
> > +	validate_regs_sorted(ring);
> > +}
> > +
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd_in_table(const struct drm_i915_cmd_table *table,
> > +		  u32 cmd_header)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < table->count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
> > +		u32 masked_cmd = desc->cmd.mask & cmd_header;
> > +		u32 masked_value = desc->cmd.value & desc->cmd.mask;
> > +
> > +		if (masked_cmd == masked_value)
> > +			return desc;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * Returns a pointer to a descriptor for the command specified by cmd_header.
> > + *
> > + * The caller must supply space for a default descriptor via the default_desc
> > + * parameter. If no descriptor for the specified command exists in the ring's
> > + * command parser tables, this function fills in default_desc based on the
> > + * ring's default length encoding and returns default_desc.
> > + */
> > +static const struct drm_i915_cmd_descriptor*
> > +find_cmd(struct intel_ring_buffer *ring,
> > +	 u32 cmd_header,
> > +	 struct drm_i915_cmd_descriptor *default_desc)
> > +{
> > +	u32 mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ring->cmd_table_count; i++) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +
> > +		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
> > +		if (desc)
> > +			return desc;
> > +	}
> > +
> > +	mask = ring->get_cmd_length_mask(cmd_header);
> > +	if (!mask)
> > +		return NULL;
> > +
> > +	BUG_ON(!default_desc);
> > +	default_desc->flags = CMD_DESC_SKIP;
> > +	default_desc->length.mask = mask;
> > +
> > +	return default_desc;
> > +}
> > +
> > +static int valid_reg(const u32 *table, int count, u32 addr)
> 
> I like bools for boolean stuff.

I'll reevaluate int vs bool throughout. I think the use is a bit inconsistent
throughout the driver at the moment, but I don't mind improving it.

> 
> > +{
> > +	if (table && count != 0) {
> > +		int i;
> > +
> > +		for (i = 0; i < count; i++) {
> > +			if (table[i] == addr)
> > +				return 1;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static u32 *vmap_batch(struct drm_i915_gem_object *obj)
> > +{
> > +	int i;
> > +	void *addr = NULL;
> > +	struct sg_page_iter sg_iter;
> > +	struct page **pages;
> > +
> > +	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
> > +	if (pages == NULL) {
> > +		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> > +		goto finish;
> > +	}
> > +
> > +	i = 0;
> > +	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> > +		pages[i] = sg_page_iter_page(&sg_iter);
> > +		i++;
> > +	}
> > +
> > +	addr = vmap(pages, i, 0, PAGE_KERNEL);
> > +	if (addr == NULL) {
> > +		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
> > +		goto finish;
> > +	}
> > +
> > +finish:
> > +	if (pages)
> > +		drm_free_large(pages);
> > +	return (u32*)addr;
> > +}
> > +
> > +int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
> 
> bool
> 
> > +{
> > +	/* No command tables indicates a platform without parsing */
> > +	if (!ring->cmd_tables)
> > +		return 0;
> > +
> > +	return i915.enable_cmd_parser;
> > +}
> > +
> > +#define LENGTH_BIAS 2
> > +
> > +int i915_parse_cmds(struct intel_ring_buffer *ring,
> > +		    struct drm_i915_gem_object *batch_obj,
> > +		    u32 batch_start_offset,
> > +		    bool is_master)
> > +{
> > +	int ret = 0;
> > +	u32 *cmd, *batch_base, *batch_end;
> > +	struct drm_i915_cmd_descriptor default_desc = { 0 };
> > +	int needs_clflush = 0;
> > +
> > +	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
> > +	if (ret) {
> > +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> > +		return ret;
> > +	}
> > +
> > +	batch_base = vmap_batch(batch_obj);
> > +	if (!batch_base) {
> > +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> > +		i915_gem_object_unpin_pages(batch_obj);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (needs_clflush)
> > +		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
> > +
> > +	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
> > +	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
> > +
> > +	while (cmd < batch_end) {
> > +		const struct drm_i915_cmd_descriptor *desc;
> > +		u32 length;
> > +
> > +		if (*cmd == MI_BATCH_BUFFER_END)
> > +			break;
> > +
> > +		desc = find_cmd(ring, *cmd, &default_desc);
> > +		if (!desc) {
> > +			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
> > +					 *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_FIXED)
> > +			length = desc->length.fixed;
> > +		else
> > +			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
> > +
> > +		if ((batch_end - cmd) < length) {
> > +			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
> > +					 *cmd,
> > +					 length,
> > +					 batch_end - cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_REJECT) {
> > +			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
> > +			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
> > +					 *cmd);
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_REGISTER) {
> > +			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
> > +
> > +			if (!valid_reg(ring->reg_table,
> > +				       ring->reg_count, reg_addr)) {
> > +				if (!is_master ||
> > +				    !valid_reg(ring->master_reg_table,
> > +					       ring->master_reg_count,
> > +					       reg_addr)) {
> > +					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
> > +							 reg_addr,
> > +							 *cmd,
> > +							 ring->id);
> > +					ret = -EINVAL;
> > +					break;
> > +				}
> > +			}
> > +		}
> > +
> > +		if (desc->flags & CMD_DESC_BITMASK) {
> > +			int i;
> > +
> > +			for (i = 0; i < desc->bits_count; i++) {
> > +				u32 dword = cmd[desc->bits[i].offset] &
> > +					desc->bits[i].mask;
> > +
> > +				if (dword != desc->bits[i].expected) {
> > +					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
> > +							 *cmd,
> > +							 desc->bits[i].mask,
> > +							 desc->bits[i].expected,
> > +							 dword, ring->id);
> > +					ret = -EINVAL;
> > +					break;
> > +				}
> > +			}
> > +
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +		cmd += length;
> > +	}
> > +
> > +	if (cmd >= batch_end) {
> > +		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	vunmap(batch_base);
> > +
> > +	i915_gem_object_unpin_pages(batch_obj);
> > +
> > +	return ret;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index bfb30df..8aed80f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1765,6 +1765,91 @@ struct drm_i915_file_private {
> >  	atomic_t rps_wait_boost;
> >  };
> >  
> > +/**
> > + * A command that requires special handling by the command parser.
> > + */
> 
> You have plenty of kernel-doc comments here and in other patches. They
> do expect a certain format, however. Please either make them regular
> comments (the easy way) or adhere to proper kernel-doc format.

Again, current use is inconsistent in the driver, but I'll reevaluate throughout.

> 
> > +struct drm_i915_cmd_descriptor {
> > +	/**
> > +	 * Flags describing how the command parser processes the command.
> > +	 *
> > +	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
> > +	 *                 a length mask if not set
> > +	 * CMD_DESC_SKIP: The command is allowed but does not follow the
> > +	 *                standard length encoding for the opcode range in
> > +	 *                which it falls
> > +	 * CMD_DESC_REJECT: The command is never allowed
> > +	 * CMD_DESC_REGISTER: The command should be checked against the
> > +	 *                    register whitelist for the appropriate ring
> > +	 * CMD_DESC_MASTER: The command is allowed if the submitting process
> > +	 *                  is the DRM master
> > +	 */
> > +	u32 flags;
> > +#define CMD_DESC_FIXED    (1<<0)
> > +#define CMD_DESC_SKIP     (1<<1)
> > +#define CMD_DESC_REJECT   (1<<2)
> > +#define CMD_DESC_REGISTER (1<<3)
> > +#define CMD_DESC_BITMASK  (1<<4)
> > +#define CMD_DESC_MASTER   (1<<5)
> 
> Feels like flags should be named FLAG, not DESC. *shrug*.
> 
> > +
> > +	/**
> > +	 * The command's unique identification bits and the bitmask to get them.
> > +	 * This isn't strictly the opcode field as defined in the spec and may
> > +	 * also include type, subtype, and/or subop fields.
> > +	 */
> > +	struct {
> > +		u32 value;
> > +		u32 mask;
> > +	} cmd;
> > +
> > +	/**
> > +	 * The command's length. The command is either fixed length (i.e. does
> > +	 * not include a length field) or has a length field mask. The flag
> > +	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
> > +	 * a length mask. All command entries in a command table must include
> > +	 * length information.
> > +	 */
> > +	union {
> > +		u32 fixed;
> > +		u32 mask;
> > +	} length;
> > +
> > +	/**
> > +	 * Describes where to find a register address in the command to check
> > +	 * against the ring's register whitelist. Only valid if flags has the
> > +	 * CMD_DESC_REGISTER bit set.
> > +	 */
> > +	struct {
> > +		u32 offset;
> > +		u32 mask;
> > +	} reg;
> > +
> > +#define MAX_CMD_DESC_BITMASKS 3
> > +	/**
> > +	 * Describes command checks where a particular dword is masked and
> > +	 * compared against an expected value. If the command does not match
> > +	 * the expected value, the parser rejects it. Only valid if flags has
> > +	 * the CMD_DESC_BITMASK bit set.
> > +	 */
> > +	struct {
> > +		u32 offset;
> > +		u32 mask;
> > +		u32 expected;
> > +	} bits[MAX_CMD_DESC_BITMASKS];
> > +	/** Number of valid entries in the bits array */
> > +	int bits_count;
> > +};
> > +
> > +/**
> > + * A table of commands requiring special handling by the command parser.
> > + *
> > + * Each ring has an array of tables. Each table consists of an array of command
> > + * descriptors, which must be sorted with command opcodes in ascending order.
> > + */
> > +struct drm_i915_cmd_table {
> > +	const struct drm_i915_cmd_descriptor *table;
> > +	int count;
> > +};
> > +
> >  #define INTEL_INFO(dev)	(to_i915(dev)->info)
> >  
> >  #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
> > @@ -1923,6 +2008,7 @@ struct i915_params {
> >  	bool prefault_disable;
> >  	bool reset;
> >  	int invert_brightness;
> > +	int enable_cmd_parser;
> >  };
> >  extern struct i915_params i915 __read_mostly;
> >  
> > @@ -2428,6 +2514,14 @@ void i915_destroy_error_state(struct drm_device *dev);
> >  void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
> >  const char *i915_cache_level_str(int type);
> >  
> > +/* i915_cmd_parser.c */
> > +void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
> > +int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
> > +int i915_parse_cmds(struct intel_ring_buffer *ring,
> > +		    struct drm_i915_gem_object *batch_obj,
> > +		    u32 batch_start_offset,
> > +		    bool is_master);
> > +
> >  /* i915_suspend.c */
> >  extern int i915_save_state(struct drm_device *dev);
> >  extern int i915_restore_state(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 032def9..c953445 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1180,6 +1180,23 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  	}
> >  	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
> >  
> > +	if (i915_needs_cmd_parser(ring)) {
> > +		ret = i915_parse_cmds(ring,
> > +				      batch_obj,
> > +				      args->batch_start_offset,
> > +				      file->is_master);
> > +		if (ret)
> > +			goto err;
> > +
> > +		/*
> > +		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
> > +		 * from MI_BATCH_BUFFER_START commands issued in the
> > +		 * dispatch_execbuffer implementations. We specifically don't
> > +		 * want that set when the command parser is enabled.
> > +		 */
> > +		flags |= I915_DISPATCH_SECURE;
> > +	}
> > +
> >  	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
> >  	 * batch" bit. Hence we need to pin secure batches into the global gtt.
> >  	 * hsw should have this fixed, but bdw mucks it up again. */
> > diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> > index c743057..6d3d906 100644
> > --- a/drivers/gpu/drm/i915/i915_params.c
> > +++ b/drivers/gpu/drm/i915/i915_params.c
> > @@ -47,6 +47,7 @@ struct i915_params i915 __read_mostly = {
> >  	.prefault_disable = 0,
> >  	.reset = true,
> >  	.invert_brightness = 0,
> > +	.enable_cmd_parser = 0
> 
> Please add a comma in the end so the next addition won't have to, just
> like this doesn't have to touch the previous line.

Will do throughout

> 
> >  };
> >  
> >  module_param_named(modeset, i915.modeset, int, 0400);
> > @@ -153,3 +154,7 @@ MODULE_PARM_DESC(invert_brightness,
> >  	"report PCI device ID, subsystem vendor and subsystem device ID "
> >  	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
> >  	"It will then be included in an upcoming module version.");
> > +
> > +module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
> > +MODULE_PARM_DESC(enable_cmd_parser,
> > +		"Enable command parsing (default: false)");
> 
> If it's a bool, make it a bool, or change the default text to 0.

I'll change it to 0 for now. We might want to do the -1/0/1 style thing at some
point, though I don't necessarily have a good case for it right now.
- Brad

> 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index a0d61f8..77fc61d 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1388,6 +1388,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
> >  	if (IS_I830(ring->dev) || IS_845G(ring->dev))
> >  		ring->effective_size -= 128;
> >  
> > +	i915_cmd_parser_init_ring(ring);
> > +
> >  	return 0;
> >  
> >  err_unmap:
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 71a73f4..cff2b35 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -162,6 +162,38 @@ struct  intel_ring_buffer {
> >  		u32 gtt_offset;
> >  		volatile u32 *cpu_page;
> >  	} scratch;
> > +
> > +	/**
> > +	 * Tables of commands the command parser needs to know about
> > +	 * for this ring.
> > +	 */
> > +	const struct drm_i915_cmd_table *cmd_tables;
> > +	int cmd_table_count;
> > +
> > +	/**
> > +	 * Table of registers allowed in commands that read/write registers.
> > +	 */
> > +	const u32 *reg_table;
> > +	int reg_count;
> > +
> > +	/**
> > +	 * Table of registers allowed in commands that read/write registers, but
> > +	 * only from the DRM master.
> > +	 */
> > +	const u32 *master_reg_table;
> > +	int master_reg_count;
> > +
> > +	/**
> > +	 * Returns the bitmask for the length field of the specified command.
> > +	 * Return 0 for an unrecognized/invalid command.
> > +	 *
> > +	 * If the command parser finds an entry for a command in the ring's
> > +	 * cmd_tables, it gets the command's length based on the table entry.
> > +	 * If not, it calls this function to determine the per-ring length field
> > +	 * encoding for the command (i.e. certain opcode ranges use certain bits
> > +	 * to encode the command length in the header).
> > +	 */
> > +	u32 (*get_cmd_length_mask)(u32 cmd_header);
> >  };
> 
> Plenty of non-conforming kernel-doc comments here too.
> 
> >  
> >  static inline bool
> > -- 
> > 1.8.5.2
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
>
Jani Nikula Feb. 7, 2014, 1:58 p.m. UTC | #17
On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> +static int valid_reg(const u32 *table, int count, u32 addr)
> +{
> +	if (table && count != 0) {
> +		int i;
> +
> +		for (i = 0; i < count; i++) {
> +			if (table[i] == addr)
> +				return 1;
> +		}
> +	}

You go to great lengths to validate the register tables are sorted, but
in the end you don't take advantage of this fact by bailing out early if
the lookup goes past the addr.

Is this optimization the main reason for having the tables sorted, or
are there other reasons too (I couldn't find any)?

I'm beginning to wonder if this is a premature optimization that adds
extra code. For master restricted registers you will always scan the
regular reg table completely first. Perhaps a better option would be to
have all registers in the same table, with a separate master flag,
ordered by how frequently they are expected to be used. We do want to
optimize for the happy day scenario. But maybe it's too early to tell.

I'm inclined to ripping out the sort requirement and check, if the sole
purpose is optimization, for simplicity's sake.


BR,
Jani.
Daniel Vetter Feb. 7, 2014, 2:45 p.m. UTC | #18
On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
> On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > +static int valid_reg(const u32 *table, int count, u32 addr)
> > +{
> > +	if (table && count != 0) {
> > +		int i;
> > +
> > +		for (i = 0; i < count; i++) {
> > +			if (table[i] == addr)
> > +				return 1;
> > +		}
> > +	}
> 
> You go to great lengths to validate the register tables are sorted, but
> in the end you don't take advantage of this fact by bailing out early if
> the lookup goes past the addr.
> 
> Is this optimization the main reason for having the tables sorted, or
> are there other reasons too (I couldn't find any)?
> 
> I'm beginning to wonder if this is a premature optimization that adds
> extra code. For master restricted registers you will always scan the
> regular reg table completely first. Perhaps a better option would be to
> have all registers in the same table, with a separate master flag,
> ordered by how frequently they are expected to be used. We do want to
> optimize for the happy day scenario. But maybe it's too early to tell.
> 
> I'm inclined to ripping out the sort requirement and check, if the sole
> purpose is optimization, for simplicity's sake.

tbh I don't mind the sorting requirement, and iirc Brad has patches
already for binary search. Once we start to rely on the sorting we can
easily add a little functions which checks for that at ring
initialization, so I also don't see any concerns wrt code fragility.
-Daniel
bradley.d.volkin@intel.com Feb. 11, 2014, 6:12 p.m. UTC | #19
On Fri, Feb 07, 2014 at 06:45:48AM -0800, Daniel Vetter wrote:
> On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
> > On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
> > > +static int valid_reg(const u32 *table, int count, u32 addr)
> > > +{
> > > +	if (table && count != 0) {
> > > +		int i;
> > > +
> > > +		for (i = 0; i < count; i++) {
> > > +			if (table[i] == addr)
> > > +				return 1;
> > > +		}
> > > +	}
> > 
> > You go to great lengths to validate the register tables are sorted, but
> > in the end you don't take advantage of this fact by bailing out early if
> > the lookup goes past the addr.
> > 
> > Is this optimization the main reason for having the tables sorted, or
> > are there other reasons too (I couldn't find any)?
> > 
> > I'm beginning to wonder if this is a premature optimization that adds
> > extra code. For master restricted registers you will always scan the
> > regular reg table completely first. Perhaps a better option would be to
> > have all registers in the same table, with a separate master flag,
> > ordered by how frequently they are expected to be used. We do want to
> > optimize for the happy day scenario. But maybe it's too early to tell.
> > 
> > I'm inclined to ripping out the sort requirement and check, if the sole
> > purpose is optimization, for simplicity's sake.
> 
> tbh I don't mind the sorting requirement, and iirc Brad has patches
> already for binary search. Once we start to rely on the sorting we can
> easily add a little functions which checks for that at ring
> initialization, so I also don't see any concerns wrt code fragility.

Sorry for the delayed response. The background here is that I originally
just had the tables sorted with a comment to say as much. The idea was that
if the linear search became an issue, switching algorithms would be easier.
Chris suggested just moving to bsearch and checking that the tables are sorted
as part of the v1 series review. I implemented bsearch and found that the perf
change was the same to slightly worse. So I added the sorting check and kept
the linear search until we have better data.

Thanks,
Brad

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Jani Nikula Feb. 11, 2014, 6:21 p.m. UTC | #20
On Tue, 11 Feb 2014, "Volkin, Bradley D" <bradley.d.volkin@intel.com> wrote:
> On Fri, Feb 07, 2014 at 06:45:48AM -0800, Daniel Vetter wrote:
>> On Fri, Feb 07, 2014 at 03:58:46PM +0200, Jani Nikula wrote:
>> > On Wed, 29 Jan 2014, bradley.d.volkin@intel.com wrote:
>> > > +static int valid_reg(const u32 *table, int count, u32 addr)
>> > > +{
>> > > +	if (table && count != 0) {
>> > > +		int i;
>> > > +
>> > > +		for (i = 0; i < count; i++) {
>> > > +			if (table[i] == addr)
>> > > +				return 1;
>> > > +		}
>> > > +	}
>> > 
>> > You go to great lengths to validate the register tables are sorted, but
>> > in the end you don't take advantage of this fact by bailing out early if
>> > the lookup goes past the addr.
>> > 
>> > Is this optimization the main reason for having the tables sorted, or
>> > are there other reasons too (I couldn't find any)?
>> > 
>> > I'm beginning to wonder if this is a premature optimization that adds
>> > extra code. For master restricted registers you will always scan the
>> > regular reg table completely first. Perhaps a better option would be to
>> > have all registers in the same table, with a separate master flag,
>> > ordered by how frequently they are expected to be used. We do want to
>> > optimize for the happy day scenario. But maybe it's too early to tell.
>> > 
>> > I'm inclined to ripping out the sort requirement and check, if the sole
>> > purpose is optimization, for simplicity's sake.
>> 
>> tbh I don't mind the sorting requirement, and iirc Brad has patches
>> already for binary search. Once we start to rely on the sorting we can
>> easily add a little functions which checks for that at ring
>> initialization, so I also don't see any concerns wrt code fragility.
>
> Sorry for the delayed response. The background here is that I originally
> just had the tables sorted with a comment to say as much. The idea was that
> if the linear search became an issue, switching algorithms would be easier.
> Chris suggested just moving to bsearch and checking that the tables are sorted
> as part of the v1 series review. I implemented bsearch and found that the perf
> change was the same to slightly worse. So I added the sorting check and kept
> the linear search until we have better data.

Ok. For the linear search I think you could add the check if you've
iterated past the register and bail out early, and gather the data with
that.

BR,
Jani.


>
> Thanks,
> Brad
>
>> -Daniel
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 4850494..2da81bf 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -47,7 +47,8 @@  i915-y := i915_drv.o i915_dma.o i915_irq.o \
 	  dvo_tfp410.o \
 	  dvo_sil164.o \
 	  dvo_ns2501.o \
-	  i915_gem_dmabuf.o
+	  i915_gem_dmabuf.o \
+	  i915_cmd_parser.o
 
 i915-$(CONFIG_COMPAT)   += i915_ioc32.o
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
new file mode 100644
index 0000000..7639dbc
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -0,0 +1,404 @@ 
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Brad Volkin <bradley.d.volkin@intel.com>
+ *
+ */
+
+#include "i915_drv.h"
+
+#define CLIENT_MASK      0xE0000000
+#define SUBCLIENT_MASK   0x18000000
+#define MI_CLIENT        0x00000000
+#define RC_CLIENT        0x60000000
+#define BC_CLIENT        0x40000000
+#define MEDIA_SUBCLIENT  0x10000000
+
+static u32 gen7_render_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal rcs cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_bsd_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+	u32 subclient = cmd_header & SUBCLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == RC_CLIENT) {
+		if (subclient == MEDIA_SUBCLIENT)
+			return 0xFFF;
+		else
+			return 0xFF;
+	}
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal bsd cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static u32 gen7_blt_get_cmd_length_mask(u32 cmd_header)
+{
+	u32 client = cmd_header & CLIENT_MASK;
+
+	if (client == MI_CLIENT)
+		return 0x3F;
+	else if (client == BC_CLIENT)
+		return 0xFF;
+
+	DRM_DEBUG_DRIVER("CMD: Abnormal blt cmd length! 0x%08X\n", cmd_header);
+	return 0;
+}
+
+static void validate_cmds_sorted(struct intel_ring_buffer *ring)
+{
+	int i;
+
+	if (!ring->cmd_tables || ring->cmd_table_count == 0)
+		return;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_table *table = &ring->cmd_tables[i];
+		u32 previous = 0;
+		int j;
+
+		for (j = 0; j < table->count; j++) {
+			const struct drm_i915_cmd_descriptor *desc =
+				&table->table[i];
+			u32 curr = desc->cmd.value & desc->cmd.mask;
+
+			if (curr < previous) {
+				DRM_ERROR("CMD: table not sorted ring=%d table=%d entry=%d cmd=0x%08X\n",
+					  ring->id, i, j, curr);
+				return;
+			}
+
+			previous = curr;
+		}
+	}
+}
+
+static void check_sorted(int ring_id, const u32 *reg_table, int reg_count)
+{
+	int i;
+	u32 previous = 0;
+
+	for (i = 0; i < reg_count; i++) {
+		u32 curr = reg_table[i];
+
+		if (curr < previous) {
+			DRM_ERROR("CMD: table not sorted ring=%d entry=%d reg=0x%08X\n",
+				  ring_id, i, curr);
+			return;
+		}
+
+		previous = curr;
+	}
+}
+
+static void validate_regs_sorted(struct intel_ring_buffer *ring)
+{
+	if (ring->reg_table && ring->reg_count > 0)
+		check_sorted(ring->id, ring->reg_table, ring->reg_count);
+
+	if (ring->master_reg_table && ring->master_reg_count > 0)
+		check_sorted(ring->id, ring->master_reg_table,
+			     ring->master_reg_count);
+}
+
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring)
+{
+	if (!IS_GEN7(ring->dev))
+		return;
+
+	switch (ring->id) {
+	case RCS:
+		ring->get_cmd_length_mask = gen7_render_get_cmd_length_mask;
+		break;
+	case VCS:
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
+		break;
+	case BCS:
+		ring->get_cmd_length_mask = gen7_blt_get_cmd_length_mask;
+		break;
+	case VECS:
+		/* VECS can use the same length_mask function as VCS */
+		ring->get_cmd_length_mask = gen7_bsd_get_cmd_length_mask;
+		break;
+	default:
+		DRM_ERROR("CMD: cmd_parser_init with unknown ring: %d\n",
+			  ring->id);
+	}
+
+	validate_cmds_sorted(ring);
+	validate_regs_sorted(ring);
+}
+
+static const struct drm_i915_cmd_descriptor*
+find_cmd_in_table(const struct drm_i915_cmd_table *table,
+		  u32 cmd_header)
+{
+	int i;
+
+	for (i = 0; i < table->count; i++) {
+		const struct drm_i915_cmd_descriptor *desc = &table->table[i];
+		u32 masked_cmd = desc->cmd.mask & cmd_header;
+		u32 masked_value = desc->cmd.value & desc->cmd.mask;
+
+		if (masked_cmd == masked_value)
+			return desc;
+	}
+
+	return NULL;
+}
+
+/*
+ * Returns a pointer to a descriptor for the command specified by cmd_header.
+ *
+ * The caller must supply space for a default descriptor via the default_desc
+ * parameter. If no descriptor for the specified command exists in the ring's
+ * command parser tables, this function fills in default_desc based on the
+ * ring's default length encoding and returns default_desc.
+ */
+static const struct drm_i915_cmd_descriptor*
+find_cmd(struct intel_ring_buffer *ring,
+	 u32 cmd_header,
+	 struct drm_i915_cmd_descriptor *default_desc)
+{
+	u32 mask;
+	int i;
+
+	for (i = 0; i < ring->cmd_table_count; i++) {
+		const struct drm_i915_cmd_descriptor *desc;
+
+		desc = find_cmd_in_table(&ring->cmd_tables[i], cmd_header);
+		if (desc)
+			return desc;
+	}
+
+	mask = ring->get_cmd_length_mask(cmd_header);
+	if (!mask)
+		return NULL;
+
+	BUG_ON(!default_desc);
+	default_desc->flags = CMD_DESC_SKIP;
+	default_desc->length.mask = mask;
+
+	return default_desc;
+}
+
+static int valid_reg(const u32 *table, int count, u32 addr)
+{
+	if (table && count != 0) {
+		int i;
+
+		for (i = 0; i < count; i++) {
+			if (table[i] == addr)
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
+static u32 *vmap_batch(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return (u32*)addr;
+}
+
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring)
+{
+	/* No command tables indicates a platform without parsing */
+	if (!ring->cmd_tables)
+		return 0;
+
+	return i915.enable_cmd_parser;
+}
+
+#define LENGTH_BIAS 2
+
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset,
+		    bool is_master)
+{
+	int ret = 0;
+	u32 *cmd, *batch_base, *batch_end;
+	struct drm_i915_cmd_descriptor default_desc = { 0 };
+	int needs_clflush = 0;
+
+	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		return ret;
+	}
+
+	batch_base = vmap_batch(batch_obj);
+	if (!batch_base) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		i915_gem_object_unpin_pages(batch_obj);
+		return -ENOMEM;
+	}
+
+	if (needs_clflush)
+		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
+
+	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
+
+	while (cmd < batch_end) {
+		const struct drm_i915_cmd_descriptor *desc;
+		u32 length;
+
+		if (*cmd == MI_BATCH_BUFFER_END)
+			break;
+
+		desc = find_cmd(ring, *cmd, &default_desc);
+		if (!desc) {
+			DRM_DEBUG_DRIVER("CMD: Unrecognized command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_FIXED)
+			length = desc->length.fixed;
+		else
+			length = ((*cmd & desc->length.mask) + LENGTH_BIAS);
+
+		if ((batch_end - cmd) < length) {
+			DRM_DEBUG_DRIVER("CMD: Command length exceeds batch length: 0x%08X length=%d batchlen=%ld\n",
+					 *cmd,
+					 length,
+					 batch_end - cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_REJECT) {
+			DRM_DEBUG_DRIVER("CMD: Rejected command: 0x%08X\n", *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if ((desc->flags & CMD_DESC_MASTER) && !is_master) {
+			DRM_DEBUG_DRIVER("CMD: Rejected master-only command: 0x%08X\n",
+					 *cmd);
+			ret = -EINVAL;
+			break;
+		}
+
+		if (desc->flags & CMD_DESC_REGISTER) {
+			u32 reg_addr = cmd[desc->reg.offset] & desc->reg.mask;
+
+			if (!valid_reg(ring->reg_table,
+				       ring->reg_count, reg_addr)) {
+				if (!is_master ||
+				    !valid_reg(ring->master_reg_table,
+					       ring->master_reg_count,
+					       reg_addr)) {
+					DRM_DEBUG_DRIVER("CMD: Rejected register 0x%08X in command: 0x%08X (ring=%d)\n",
+							 reg_addr,
+							 *cmd,
+							 ring->id);
+					ret = -EINVAL;
+					break;
+				}
+			}
+		}
+
+		if (desc->flags & CMD_DESC_BITMASK) {
+			int i;
+
+			for (i = 0; i < desc->bits_count; i++) {
+				u32 dword = cmd[desc->bits[i].offset] &
+					desc->bits[i].mask;
+
+				if (dword != desc->bits[i].expected) {
+					DRM_DEBUG_DRIVER("CMD: Rejected command 0x%08X for bitmask 0x%08X (exp=0x%08X act=0x%08X) (ring=%d)\n",
+							 *cmd,
+							 desc->bits[i].mask,
+							 desc->bits[i].expected,
+							 dword, ring->id);
+					ret = -EINVAL;
+					break;
+				}
+			}
+
+			if (ret)
+				break;
+		}
+
+		cmd += length;
+	}
+
+	if (cmd >= batch_end) {
+		DRM_DEBUG_DRIVER("CMD: Got to the end of the buffer w/o a BBE cmd!\n");
+		ret = -EINVAL;
+	}
+
+	vunmap(batch_base);
+
+	i915_gem_object_unpin_pages(batch_obj);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bfb30df..8aed80f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1765,6 +1765,91 @@  struct drm_i915_file_private {
 	atomic_t rps_wait_boost;
 };
 
+/**
+ * A command that requires special handling by the command parser.
+ */
+struct drm_i915_cmd_descriptor {
+	/**
+	 * Flags describing how the command parser processes the command.
+	 *
+	 * CMD_DESC_FIXED: The command has a fixed length if this is set,
+	 *                 a length mask if not set
+	 * CMD_DESC_SKIP: The command is allowed but does not follow the
+	 *                standard length encoding for the opcode range in
+	 *                which it falls
+	 * CMD_DESC_REJECT: The command is never allowed
+	 * CMD_DESC_REGISTER: The command should be checked against the
+	 *                    register whitelist for the appropriate ring
+	 * CMD_DESC_MASTER: The command is allowed if the submitting process
+	 *                  is the DRM master
+	 */
+	u32 flags;
+#define CMD_DESC_FIXED    (1<<0)
+#define CMD_DESC_SKIP     (1<<1)
+#define CMD_DESC_REJECT   (1<<2)
+#define CMD_DESC_REGISTER (1<<3)
+#define CMD_DESC_BITMASK  (1<<4)
+#define CMD_DESC_MASTER   (1<<5)
+
+	/**
+	 * The command's unique identification bits and the bitmask to get them.
+	 * This isn't strictly the opcode field as defined in the spec and may
+	 * also include type, subtype, and/or subop fields.
+	 */
+	struct {
+		u32 value;
+		u32 mask;
+	} cmd;
+
+	/**
+	 * The command's length. The command is either fixed length (i.e. does
+	 * not include a length field) or has a length field mask. The flag
+	 * CMD_DESC_FIXED indicates a fixed length. Otherwise, the command has
+	 * a length mask. All command entries in a command table must include
+	 * length information.
+	 */
+	union {
+		u32 fixed;
+		u32 mask;
+	} length;
+
+	/**
+	 * Describes where to find a register address in the command to check
+	 * against the ring's register whitelist. Only valid if flags has the
+	 * CMD_DESC_REGISTER bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+	} reg;
+
+#define MAX_CMD_DESC_BITMASKS 3
+	/**
+	 * Describes command checks where a particular dword is masked and
+	 * compared against an expected value. If the command does not match
+	 * the expected value, the parser rejects it. Only valid if flags has
+	 * the CMD_DESC_BITMASK bit set.
+	 */
+	struct {
+		u32 offset;
+		u32 mask;
+		u32 expected;
+	} bits[MAX_CMD_DESC_BITMASKS];
+	/** Number of valid entries in the bits array */
+	int bits_count;
+};
+
+/**
+ * A table of commands requiring special handling by the command parser.
+ *
+ * Each ring has an array of tables. Each table consists of an array of command
+ * descriptors, which must be sorted with command opcodes in ascending order.
+ */
+struct drm_i915_cmd_table {
+	const struct drm_i915_cmd_descriptor *table;
+	int count;
+};
+
 #define INTEL_INFO(dev)	(to_i915(dev)->info)
 
 #define IS_I830(dev)		((dev)->pdev->device == 0x3577)
@@ -1923,6 +2008,7 @@  struct i915_params {
 	bool prefault_disable;
 	bool reset;
 	int invert_brightness;
+	int enable_cmd_parser;
 };
 extern struct i915_params i915 __read_mostly;
 
@@ -2428,6 +2514,14 @@  void i915_destroy_error_state(struct drm_device *dev);
 void i915_get_extra_instdone(struct drm_device *dev, uint32_t *instdone);
 const char *i915_cache_level_str(int type);
 
+/* i915_cmd_parser.c */
+void i915_cmd_parser_init_ring(struct intel_ring_buffer *ring);
+int i915_needs_cmd_parser(struct intel_ring_buffer *ring);
+int i915_parse_cmds(struct intel_ring_buffer *ring,
+		    struct drm_i915_gem_object *batch_obj,
+		    u32 batch_start_offset,
+		    bool is_master);
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
 extern int i915_restore_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 032def9..c953445 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1180,6 +1180,23 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
+	if (i915_needs_cmd_parser(ring)) {
+		ret = i915_parse_cmds(ring,
+				      batch_obj,
+				      args->batch_start_offset,
+				      file->is_master);
+		if (ret)
+			goto err;
+
+		/*
+		 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
+		 * from MI_BATCH_BUFFER_START commands issued in the
+		 * dispatch_execbuffer implementations. We specifically don't
+		 * want that set when the command parser is enabled.
+		 */
+		flags |= I915_DISPATCH_SECURE;
+	}
+
 	/* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index c743057..6d3d906 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -47,6 +47,7 @@  struct i915_params i915 __read_mostly = {
 	.prefault_disable = 0,
 	.reset = true,
 	.invert_brightness = 0,
+	.enable_cmd_parser = 0
 };
 
 module_param_named(modeset, i915.modeset, int, 0400);
@@ -153,3 +154,7 @@  MODULE_PARM_DESC(invert_brightness,
 	"report PCI device ID, subsystem vendor and subsystem device ID "
 	"to dri-devel@lists.freedesktop.org, if your machine needs it. "
 	"It will then be included in an upcoming module version.");
+
+module_param_named(enable_cmd_parser, i915.enable_cmd_parser, int, 0600);
+MODULE_PARM_DESC(enable_cmd_parser,
+		"Enable command parsing (default: false)");
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a0d61f8..77fc61d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1388,6 +1388,8 @@  static int intel_init_ring_buffer(struct drm_device *dev,
 	if (IS_I830(ring->dev) || IS_845G(ring->dev))
 		ring->effective_size -= 128;
 
+	i915_cmd_parser_init_ring(ring);
+
 	return 0;
 
 err_unmap:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71a73f4..cff2b35 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -162,6 +162,38 @@  struct  intel_ring_buffer {
 		u32 gtt_offset;
 		volatile u32 *cpu_page;
 	} scratch;
+
+	/**
+	 * Tables of commands the command parser needs to know about
+	 * for this ring.
+	 */
+	const struct drm_i915_cmd_table *cmd_tables;
+	int cmd_table_count;
+
+	/**
+	 * Table of registers allowed in commands that read/write registers.
+	 */
+	const u32 *reg_table;
+	int reg_count;
+
+	/**
+	 * Table of registers allowed in commands that read/write registers, but
+	 * only from the DRM master.
+	 */
+	const u32 *master_reg_table;
+	int master_reg_count;
+
+	/**
+	 * Returns the bitmask for the length field of the specified command.
+	 * Return 0 for an unrecognized/invalid command.
+	 *
+	 * If the command parser finds an entry for a command in the ring's
+	 * cmd_tables, it gets the command's length based on the table entry.
+	 * If not, it calls this function to determine the per-ring length field
+	 * encoding for the command (i.e. certain opcode ranges use certain bits
+	 * to encode the command length in the header).
+	 */
+	u32 (*get_cmd_length_mask)(u32 cmd_header);
 };
 
 static inline bool