
[08/21] drm/i915/gem: Disallow bonding of virtual engines

Message ID 20210423223131.879208-9-jason@jlekstrand.net (mailing list archive)
State New, archived
Series drm/i915/gem: ioctl clean-ups

Commit Message

Jason Ekstrand April 23, 2021, 10:31 p.m. UTC
This adds a bunch of complexity which the media driver has never
actually used.  The media driver does technically bond a balanced engine
to another engine but the balanced engine only has one engine in the
sibling set.  This doesn't actually result in a virtual engine.

Unless some userspace badly wants it, there's no good reason to support
this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
leave the validation code in place in case we ever decide we want to do
something interesting with the bonding information.
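
For illustration, a minimal userspace sketch of that degenerate case
(a single-sibling balanced engine bonded to a master) might look
roughly like the following; the engine choices and the helper are
assumptions for the example, not the media driver's actual code:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Illustrative sketch only: one "balanced" slot with a single sibling,
 * bonded to a master engine.  Engine class/instance values are made up
 * for the example. */
static void setup_single_sibling_bond(int fd, uint32_t ctx_id)
{
	I915_DEFINE_CONTEXT_ENGINES_BOND(bond, 1) = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
		.master = { I915_ENGINE_CLASS_VIDEO, 0 },
		.virtual_index = 1,			/* slot 1 below */
		.num_bonds = 1,
		.engines = { { I915_ENGINE_CLASS_VIDEO, 1 } },
	};
	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(balance, 1) = {
		.base = {
			.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
			.next_extension = (uint64_t)(uintptr_t)&bond,
		},
		.engine_index = 1,
		.num_siblings = 1,	/* one sibling: no virtual engine results */
		.engines = { { I915_ENGINE_CLASS_VIDEO, 1 } },
	};
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
		.extensions = (uint64_t)(uintptr_t)&balance,
		.engines = {
			{ I915_ENGINE_CLASS_VIDEO, 0 },		/* slot 0: master   */
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_NONE },	/* slot 1: balanced */
		},
	};
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uint64_t)(uintptr_t)&engines,
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}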

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
 .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
 .../drm/i915/gt/intel_execlists_submission.h  |   4 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
 6 files changed, 7 insertions(+), 353 deletions(-)

Comments

Jason Ekstrand April 27, 2021, 1:51 p.m. UTC | #1
On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> This adds a bunch of complexity which the media driver has never
> actually used.  The media driver does technically bond a balanced engine
> to another engine but the balanced engine only has one engine in the
> sibling set.  This doesn't actually result in a virtual engine.
>
> Unless some userspace badly wants it, there's no good reason to support
> this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> leave the validation code in place in case we ever decide we want to do
> something interesting with the bonding information.
>
> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
>  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
>  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
>  6 files changed, 7 insertions(+), 353 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index e8179918fa306..5f8d0faf783aa 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
>         }
>         virtual = set->engines->engines[idx]->engine;
>
> +       if (intel_engine_is_virtual(virtual)) {
> +               drm_dbg(&i915->drm,
> +                       "Bonding with virtual engines not allowed\n");
> +               return -EINVAL;
> +       }
> +
>         err = check_user_mbz(&ext->flags);
>         if (err)
>                 return err;
> @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
>                                 n, ci.engine_class, ci.engine_instance);
>                         return -EINVAL;
>                 }
> -
> -               /*
> -                * A non-virtual engine has no siblings to choose between; and
> -                * a submit fence will always be directed to the one engine.
> -                */
> -               if (intel_engine_is_virtual(virtual)) {
> -                       err = intel_virtual_engine_attach_bond(virtual,
> -                                                              master,
> -                                                              bond);
> -                       if (err)
> -                               return err;
> -               }
>         }
>
>         return 0;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index d640bba6ad9ab..efb2fa3522a42 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
>                         err = i915_request_await_execution(eb.request,
>                                                            in_fence,
> -                                                          eb.engine->bond_execute);
> +                                                          NULL);
>                 else
>                         err = i915_request_await_dma_fence(eb.request,
>                                                            in_fence);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 883bafc449024..68cfe5080325c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -446,13 +446,6 @@ struct intel_engine_cs {
>          */
>         void            (*submit_request)(struct i915_request *rq);
>
> -       /*
> -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> -        * request down to the bonded pairs.
> -        */
> -       void            (*bond_execute)(struct i915_request *rq,
> -                                       struct dma_fence *signal);
> -
>         /*
>          * Call when the priority on a request has changed and it and its
>          * dependencies may need rescheduling. Note the request itself may
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de124870af44d..b6e2b59f133b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -181,18 +181,6 @@ struct virtual_engine {
>                 int prio;
>         } nodes[I915_NUM_ENGINES];
>
> -       /*
> -        * Keep track of bonded pairs -- restrictions upon on our selection
> -        * of physical engines any particular request may be submitted to.
> -        * If we receive a submit-fence from a master engine, we will only
> -        * use one of sibling_mask physical engines.
> -        */
> -       struct ve_bond {
> -               const struct intel_engine_cs *master;
> -               intel_engine_mask_t sibling_mask;
> -       } *bonds;
> -       unsigned int num_bonds;
> -
>         /* And finally, which physical engines this virtual engine maps onto. */
>         unsigned int num_siblings;
>         struct intel_engine_cs *siblings[];
> @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>         intel_breadcrumbs_free(ve->base.breadcrumbs);
>         intel_engine_free_request_pool(&ve->base);
>
> -       kfree(ve->bonds);
>         kfree(ve);
>  }
>
> @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
>         spin_unlock_irqrestore(&ve->base.active.lock, flags);
>  }
>
> -static struct ve_bond *
> -virtual_find_bond(struct virtual_engine *ve,
> -                 const struct intel_engine_cs *master)
> -{
> -       int i;
> -
> -       for (i = 0; i < ve->num_bonds; i++) {
> -               if (ve->bonds[i].master == master)
> -                       return &ve->bonds[i];
> -       }
> -
> -       return NULL;
> -}
> -
> -static void
> -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> -{
> -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> -       intel_engine_mask_t allowed, exec;
> -       struct ve_bond *bond;
> -
> -       allowed = ~to_request(signal)->engine->mask;
> -
> -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> -       if (bond)
> -               allowed &= bond->sibling_mask;
> -
> -       /* Restrict the bonded request to run on only the available engines */
> -       exec = READ_ONCE(rq->execution_mask);
> -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> -               ;
> -
> -       /* Prevent the master from being re-run on the bonded engines */
> -       to_request(signal)->execution_mask &= ~allowed;

I sent a v2 of this patch because it turns out I deleted a bit too
much code.  This function, in particular, has to stay, unfortunately.
When a batch is submitted with a SUBMIT_FENCE, this is used to push
the work onto a different engine than the one it's supposed to run
in parallel with.  This means we can't dead-code this function or
the bond_execute function pointer and related stuff.
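
To make the case concrete, a rough userspace sketch (engine/buffer
setup assumed, not the media driver's actual code) of gating one
batch's submission on another's looks like:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Rough sketch only: hold batch eb1 back until batch eb0 is actually
 * submitted to hardware, so the two run in parallel on different
 * engines.  Both execbuf structs are assumed to be otherwise filled
 * in by the caller. */
static void submit_parallel_pair(int fd,
				 struct drm_i915_gem_execbuffer2 *eb0,
				 struct drm_i915_gem_execbuffer2 *eb1)
{
	int submit_fence;

	/* First batch: request an out-fence to use as a submit fence. */
	eb0->flags |= I915_EXEC_FENCE_OUT;
	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb0);
	submit_fence = eb0->rsvd2 >> 32;

	/* Second batch: gate submission (not completion) on the first. */
	eb1->flags |= I915_EXEC_FENCE_SUBMIT;
	eb1->rsvd2 = (uint32_t)submit_fence;
	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb1);
}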

--Jason


> -}
> -
>  struct intel_context *
>  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>                                unsigned int count)
> @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>
>         ve->base.schedule = i915_schedule;
>         ve->base.submit_request = virtual_submit_request;
> -       ve->base.bond_execute = virtual_bond_execute;
>
>         INIT_LIST_HEAD(virtual_queue(ve));
>         ve->base.execlists.queue_priority_hint = INT_MIN;
> @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
>         if (IS_ERR(dst))
>                 return dst;
>
> -       if (se->num_bonds) {
> -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> -
> -               de->bonds = kmemdup(se->bonds,
> -                                   sizeof(*se->bonds) * se->num_bonds,
> -                                   GFP_KERNEL);
> -               if (!de->bonds) {
> -                       intel_context_put(dst);
> -                       return ERR_PTR(-ENOMEM);
> -               }
> -
> -               de->num_bonds = se->num_bonds;
> -       }
> -
>         return dst;
>  }
>
> -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> -                                    const struct intel_engine_cs *master,
> -                                    const struct intel_engine_cs *sibling)
> -{
> -       struct virtual_engine *ve = to_virtual_engine(engine);
> -       struct ve_bond *bond;
> -       int n;
> -
> -       /* Sanity check the sibling is part of the virtual engine */
> -       for (n = 0; n < ve->num_siblings; n++)
> -               if (sibling == ve->siblings[n])
> -                       break;
> -       if (n == ve->num_siblings)
> -               return -EINVAL;
> -
> -       bond = virtual_find_bond(ve, master);
> -       if (bond) {
> -               bond->sibling_mask |= sibling->mask;
> -               return 0;
> -       }
> -
> -       bond = krealloc(ve->bonds,
> -                       sizeof(*bond) * (ve->num_bonds + 1),
> -                       GFP_KERNEL);
> -       if (!bond)
> -               return -ENOMEM;
> -
> -       bond[ve->num_bonds].master = master;
> -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> -
> -       ve->bonds = bond;
> -       ve->num_bonds++;
> -
> -       return 0;
> -}
> -
>  void intel_execlists_show_requests(struct intel_engine_cs *engine,
>                                    struct drm_printer *m,
>                                    void (*show_request)(struct drm_printer *m,
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> index fd61dae820e9e..80cec37a56ba9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>  struct intel_context *
>  intel_execlists_clone_virtual(struct intel_engine_cs *src);
>
> -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> -                                    const struct intel_engine_cs *master,
> -                                    const struct intel_engine_cs *sibling);
> -
>  bool
>  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1081cd36a2bd3..f03446d587160 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
>         return 0;
>  }
>
> -static int bond_virtual_engine(struct intel_gt *gt,
> -                              unsigned int class,
> -                              struct intel_engine_cs **siblings,
> -                              unsigned int nsibling,
> -                              unsigned int flags)
> -#define BOND_SCHEDULE BIT(0)
> -{
> -       struct intel_engine_cs *master;
> -       struct i915_request *rq[16];
> -       enum intel_engine_id id;
> -       struct igt_spinner spin;
> -       unsigned long n;
> -       int err;
> -
> -       /*
> -        * A set of bonded requests is intended to be run concurrently
> -        * across a number of engines. We use one request per-engine
> -        * and a magic fence to schedule each of the bonded requests
> -        * at the same time. A consequence of our current scheduler is that
> -        * we only move requests to the HW ready queue when the request
> -        * becomes ready, that is when all of its prerequisite fences have
> -        * been signaled. As one of those fences is the master submit fence,
> -        * there is a delay on all secondary fences as the HW may be
> -        * currently busy. Equally, as all the requests are independent,
> -        * they may have other fences that delay individual request
> -        * submission to HW. Ergo, we do not guarantee that all requests are
> -        * immediately submitted to HW at the same time, just that if the
> -        * rules are abided by, they are ready at the same time as the
> -        * first is submitted. Userspace can embed semaphores in its batch
> -        * to ensure parallel execution of its phases as it requires.
> -        * Though naturally it gets requested that perhaps the scheduler should
> -        * take care of parallel execution, even across preemption events on
> -        * different HW. (The proper answer is of course "lalalala".)
> -        *
> -        * With the submit-fence, we have identified three possible phases
> -        * of synchronisation depending on the master fence: queued (not
> -        * ready), executing, and signaled. The first two are quite simple
> -        * and checked below. However, the signaled master fence handling is
> -        * contentious. Currently we do not distinguish between a signaled
> -        * fence and an expired fence, as once signaled it does not convey
> -        * any information about the previous execution. It may even be freed
> -        * and hence checking later it may not exist at all. Ergo we currently
> -        * do not apply the bonding constraint for an already signaled fence,
> -        * as our expectation is that it should not constrain the secondaries
> -        * and is outside of the scope of the bonded request API (i.e. all
> -        * userspace requests are meant to be running in parallel). As
> -        * it imposes no constraint, and is effectively a no-op, we do not
> -        * check below as normal execution flows are checked extensively above.
> -        *
> -        * XXX Is the degenerate handling of signaled submit fences the
> -        * expected behaviour for userpace?
> -        */
> -
> -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> -
> -       if (igt_spinner_init(&spin, gt))
> -               return -ENOMEM;
> -
> -       err = 0;
> -       rq[0] = ERR_PTR(-ENOMEM);
> -       for_each_engine(master, gt, id) {
> -               struct i915_sw_fence fence = {};
> -               struct intel_context *ce;
> -
> -               if (master->class == class)
> -                       continue;
> -
> -               ce = intel_context_create(master);
> -               if (IS_ERR(ce)) {
> -                       err = PTR_ERR(ce);
> -                       goto out;
> -               }
> -
> -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> -
> -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> -               intel_context_put(ce);
> -               if (IS_ERR(rq[0])) {
> -                       err = PTR_ERR(rq[0]);
> -                       goto out;
> -               }
> -               i915_request_get(rq[0]);
> -
> -               if (flags & BOND_SCHEDULE) {
> -                       onstack_fence_init(&fence);
> -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> -                                                              &fence,
> -                                                              GFP_KERNEL);
> -               }
> -
> -               i915_request_add(rq[0]);
> -               if (err < 0)
> -                       goto out;
> -
> -               if (!(flags & BOND_SCHEDULE) &&
> -                   !igt_wait_for_spinner(&spin, rq[0])) {
> -                       err = -EIO;
> -                       goto out;
> -               }
> -
> -               for (n = 0; n < nsibling; n++) {
> -                       struct intel_context *ve;
> -
> -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> -                       if (IS_ERR(ve)) {
> -                               err = PTR_ERR(ve);
> -                               onstack_fence_fini(&fence);
> -                               goto out;
> -                       }
> -
> -                       err = intel_virtual_engine_attach_bond(ve->engine,
> -                                                              master,
> -                                                              siblings[n]);
> -                       if (err) {
> -                               intel_context_put(ve);
> -                               onstack_fence_fini(&fence);
> -                               goto out;
> -                       }
> -
> -                       err = intel_context_pin(ve);
> -                       intel_context_put(ve);
> -                       if (err) {
> -                               onstack_fence_fini(&fence);
> -                               goto out;
> -                       }
> -
> -                       rq[n + 1] = i915_request_create(ve);
> -                       intel_context_unpin(ve);
> -                       if (IS_ERR(rq[n + 1])) {
> -                               err = PTR_ERR(rq[n + 1]);
> -                               onstack_fence_fini(&fence);
> -                               goto out;
> -                       }
> -                       i915_request_get(rq[n + 1]);
> -
> -                       err = i915_request_await_execution(rq[n + 1],
> -                                                          &rq[0]->fence,
> -                                                          ve->engine->bond_execute);
> -                       i915_request_add(rq[n + 1]);
> -                       if (err < 0) {
> -                               onstack_fence_fini(&fence);
> -                               goto out;
> -                       }
> -               }
> -               onstack_fence_fini(&fence);
> -               intel_engine_flush_submission(master);
> -               igt_spinner_end(&spin);
> -
> -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> -                       pr_err("Master request did not execute (on %s)!\n",
> -                              rq[0]->engine->name);
> -                       err = -EIO;
> -                       goto out;
> -               }
> -
> -               for (n = 0; n < nsibling; n++) {
> -                       if (i915_request_wait(rq[n + 1], 0,
> -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> -                               err = -EIO;
> -                               goto out;
> -                       }
> -
> -                       if (rq[n + 1]->engine != siblings[n]) {
> -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> -                                      siblings[n]->name,
> -                                      rq[n + 1]->engine->name,
> -                                      rq[0]->engine->name);
> -                               err = -EINVAL;
> -                               goto out;
> -                       }
> -               }
> -
> -               for (n = 0; !IS_ERR(rq[n]); n++)
> -                       i915_request_put(rq[n]);
> -               rq[0] = ERR_PTR(-ENOMEM);
> -       }
> -
> -out:
> -       for (n = 0; !IS_ERR(rq[n]); n++)
> -               i915_request_put(rq[n]);
> -       if (igt_flush_test(gt->i915))
> -               err = -EIO;
> -
> -       igt_spinner_fini(&spin);
> -       return err;
> -}
> -
> -static int live_virtual_bond(void *arg)
> -{
> -       static const struct phase {
> -               const char *name;
> -               unsigned int flags;
> -       } phases[] = {
> -               { "", 0 },
> -               { "schedule", BOND_SCHEDULE },
> -               { },
> -       };
> -       struct intel_gt *gt = arg;
> -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> -       unsigned int class;
> -       int err;
> -
> -       if (intel_uc_uses_guc_submission(&gt->uc))
> -               return 0;
> -
> -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> -               const struct phase *p;
> -               int nsibling;
> -
> -               nsibling = select_siblings(gt, class, siblings);
> -               if (nsibling < 2)
> -                       continue;
> -
> -               for (p = phases; p->name; p++) {
> -                       err = bond_virtual_engine(gt,
> -                                                 class, siblings, nsibling,
> -                                                 p->flags);
> -                       if (err) {
> -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> -                                      __func__, p->name, class, nsibling, err);
> -                               return err;
> -                       }
> -               }
> -       }
> -
> -       return 0;
> -}
> -
>  static int reset_virtual_engine(struct intel_gt *gt,
>                                 struct intel_engine_cs **siblings,
>                                 unsigned int nsibling)
> @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>                 SUBTEST(live_virtual_mask),
>                 SUBTEST(live_virtual_preserved),
>                 SUBTEST(live_virtual_slice),
> -               SUBTEST(live_virtual_bond),
>                 SUBTEST(live_virtual_reset),
>         };
>
> --
> 2.31.1
>
Daniel Vetter April 28, 2021, 10:13 a.m. UTC | #2
On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > This adds a bunch of complexity which the media driver has never
> > actually used.  The media driver does technically bond a balanced engine
> > to another engine but the balanced engine only has one engine in the
> > sibling set.  This doesn't actually result in a virtual engine.
> >
> > Unless some userspace badly wants it, there's no good reason to support
> > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > leave the validation code in place in case we ever decide we want to do
> > something interesting with the bonding information.
> >
> > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> >  6 files changed, 7 insertions(+), 353 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index e8179918fa306..5f8d0faf783aa 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> >         }
> >         virtual = set->engines->engines[idx]->engine;
> >
> > +       if (intel_engine_is_virtual(virtual)) {
> > +               drm_dbg(&i915->drm,
> > +                       "Bonding with virtual engines not allowed\n");
> > +               return -EINVAL;
> > +       }
> > +
> >         err = check_user_mbz(&ext->flags);
> >         if (err)
> >                 return err;
> > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> >                                 n, ci.engine_class, ci.engine_instance);
> >                         return -EINVAL;
> >                 }
> > -
> > -               /*
> > -                * A non-virtual engine has no siblings to choose between; and
> > -                * a submit fence will always be directed to the one engine.
> > -                */
> > -               if (intel_engine_is_virtual(virtual)) {
> > -                       err = intel_virtual_engine_attach_bond(virtual,
> > -                                                              master,
> > -                                                              bond);
> > -                       if (err)
> > -                               return err;
> > -               }
> >         }
> >
> >         return 0;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index d640bba6ad9ab..efb2fa3522a42 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> >                         err = i915_request_await_execution(eb.request,
> >                                                            in_fence,
> > -                                                          eb.engine->bond_execute);
> > +                                                          NULL);
> >                 else
> >                         err = i915_request_await_dma_fence(eb.request,
> >                                                            in_fence);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 883bafc449024..68cfe5080325c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> >          */
> >         void            (*submit_request)(struct i915_request *rq);
> >
> > -       /*
> > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > -        * request down to the bonded pairs.
> > -        */
> > -       void            (*bond_execute)(struct i915_request *rq,
> > -                                       struct dma_fence *signal);
> > -
> >         /*
> >          * Call when the priority on a request has changed and it and its
> >          * dependencies may need rescheduling. Note the request itself may
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index de124870af44d..b6e2b59f133b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -181,18 +181,6 @@ struct virtual_engine {
> >                 int prio;
> >         } nodes[I915_NUM_ENGINES];
> >
> > -       /*
> > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > -        * of physical engines any particular request may be submitted to.
> > -        * If we receive a submit-fence from a master engine, we will only
> > -        * use one of sibling_mask physical engines.
> > -        */
> > -       struct ve_bond {
> > -               const struct intel_engine_cs *master;
> > -               intel_engine_mask_t sibling_mask;
> > -       } *bonds;
> > -       unsigned int num_bonds;
> > -
> >         /* And finally, which physical engines this virtual engine maps onto. */
> >         unsigned int num_siblings;
> >         struct intel_engine_cs *siblings[];
> > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> >         intel_engine_free_request_pool(&ve->base);
> >
> > -       kfree(ve->bonds);
> >         kfree(ve);
> >  }
> >
> > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> >  }
> >
> > -static struct ve_bond *
> > -virtual_find_bond(struct virtual_engine *ve,
> > -                 const struct intel_engine_cs *master)
> > -{
> > -       int i;
> > -
> > -       for (i = 0; i < ve->num_bonds; i++) {
> > -               if (ve->bonds[i].master == master)
> > -                       return &ve->bonds[i];
> > -       }
> > -
> > -       return NULL;
> > -}
> > -
> > -static void
> > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > -{
> > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > -       intel_engine_mask_t allowed, exec;
> > -       struct ve_bond *bond;
> > -
> > -       allowed = ~to_request(signal)->engine->mask;
> > -
> > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > -       if (bond)
> > -               allowed &= bond->sibling_mask;
> > -
> > -       /* Restrict the bonded request to run on only the available engines */
> > -       exec = READ_ONCE(rq->execution_mask);
> > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > -               ;
> > -
> > -       /* Prevent the master from being re-run on the bonded engines */
> > -       to_request(signal)->execution_mask &= ~allowed;
> 
> I sent a v2 of this patch because it turns out I deleted a bit too
> much code.  This function, in particular, has to stay, unfortunately.
> When a batch is submitted with a SUBMIT_FENCE, this is used to push
> the work onto a different engine than the one it's supposed to run
> in parallel with.  This means we can't dead-code this function or
> the bond_execute function pointer and related stuff.

Uh that's disappointing, since if I understand your point correctly, the
sibling engines should all be singletons, not load-balancing virtual ones.
So there really should not be any need to pick the right one at execution
time.

At least my understanding is that we're only limiting the engine set
further, so if both signaller and signalled request can only run on
singletons (which must be distinct, or the bonded parameter validation is
busted), there's really nothing to do here.
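
As a standalone illustration of that point (plain bit masks and made-up
engine bits, not kernel code): narrowing a singleton execution mask by
the bond restriction cannot change it.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t master_mask  = 1u << 0;  /* the master's physical engine */
	uint32_t sibling_mask = 1u << 2;  /* the single bonded sibling    */

	uint32_t execution_mask = sibling_mask;          /* only one choice  */
	uint32_t allowed = ~master_mask & sibling_mask;  /* bond restriction */

	/* With distinct singletons the intersection is a no-op. */
	assert((execution_mask & allowed) == execution_mask);
	return 0;
}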

Also, this is the locking code that freaks me out about the current bonded
execlist code ...

Dazzled and confused.
-Daniel

> 
> --Jason
> 
> 
> > -}
> > -
> >  struct intel_context *
> >  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >                                unsigned int count)
> > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >
> >         ve->base.schedule = i915_schedule;
> >         ve->base.submit_request = virtual_submit_request;
> > -       ve->base.bond_execute = virtual_bond_execute;
> >
> >         INIT_LIST_HEAD(virtual_queue(ve));
> >         ve->base.execlists.queue_priority_hint = INT_MIN;
> > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> >         if (IS_ERR(dst))
> >                 return dst;
> >
> > -       if (se->num_bonds) {
> > -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> > -
> > -               de->bonds = kmemdup(se->bonds,
> > -                                   sizeof(*se->bonds) * se->num_bonds,
> > -                                   GFP_KERNEL);
> > -               if (!de->bonds) {
> > -                       intel_context_put(dst);
> > -                       return ERR_PTR(-ENOMEM);
> > -               }
> > -
> > -               de->num_bonds = se->num_bonds;
> > -       }
> > -
> >         return dst;
> >  }
> >
> > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > -                                    const struct intel_engine_cs *master,
> > -                                    const struct intel_engine_cs *sibling)
> > -{
> > -       struct virtual_engine *ve = to_virtual_engine(engine);
> > -       struct ve_bond *bond;
> > -       int n;
> > -
> > -       /* Sanity check the sibling is part of the virtual engine */
> > -       for (n = 0; n < ve->num_siblings; n++)
> > -               if (sibling == ve->siblings[n])
> > -                       break;
> > -       if (n == ve->num_siblings)
> > -               return -EINVAL;
> > -
> > -       bond = virtual_find_bond(ve, master);
> > -       if (bond) {
> > -               bond->sibling_mask |= sibling->mask;
> > -               return 0;
> > -       }
> > -
> > -       bond = krealloc(ve->bonds,
> > -                       sizeof(*bond) * (ve->num_bonds + 1),
> > -                       GFP_KERNEL);
> > -       if (!bond)
> > -               return -ENOMEM;
> > -
> > -       bond[ve->num_bonds].master = master;
> > -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> > -
> > -       ve->bonds = bond;
> > -       ve->num_bonds++;
> > -
> > -       return 0;
> > -}
> > -
> >  void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >                                    struct drm_printer *m,
> >                                    void (*show_request)(struct drm_printer *m,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > index fd61dae820e9e..80cec37a56ba9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >  struct intel_context *
> >  intel_execlists_clone_virtual(struct intel_engine_cs *src);
> >
> > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > -                                    const struct intel_engine_cs *master,
> > -                                    const struct intel_engine_cs *sibling);
> > -
> >  bool
> >  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> >
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 1081cd36a2bd3..f03446d587160 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> >         return 0;
> >  }
> >
> > -static int bond_virtual_engine(struct intel_gt *gt,
> > -                              unsigned int class,
> > -                              struct intel_engine_cs **siblings,
> > -                              unsigned int nsibling,
> > -                              unsigned int flags)
> > -#define BOND_SCHEDULE BIT(0)
> > -{
> > -       struct intel_engine_cs *master;
> > -       struct i915_request *rq[16];
> > -       enum intel_engine_id id;
> > -       struct igt_spinner spin;
> > -       unsigned long n;
> > -       int err;
> > -
> > -       /*
> > -        * A set of bonded requests is intended to be run concurrently
> > -        * across a number of engines. We use one request per-engine
> > -        * and a magic fence to schedule each of the bonded requests
> > -        * at the same time. A consequence of our current scheduler is that
> > -        * we only move requests to the HW ready queue when the request
> > -        * becomes ready, that is when all of its prerequisite fences have
> > -        * been signaled. As one of those fences is the master submit fence,
> > -        * there is a delay on all secondary fences as the HW may be
> > -        * currently busy. Equally, as all the requests are independent,
> > -        * they may have other fences that delay individual request
> > -        * submission to HW. Ergo, we do not guarantee that all requests are
> > -        * immediately submitted to HW at the same time, just that if the
> > -        * rules are abided by, they are ready at the same time as the
> > -        * first is submitted. Userspace can embed semaphores in its batch
> > -        * to ensure parallel execution of its phases as it requires.
> > -        * Though naturally it gets requested that perhaps the scheduler should
> > -        * take care of parallel execution, even across preemption events on
> > -        * different HW. (The proper answer is of course "lalalala".)
> > -        *
> > -        * With the submit-fence, we have identified three possible phases
> > -        * of synchronisation depending on the master fence: queued (not
> > -        * ready), executing, and signaled. The first two are quite simple
> > -        * and checked below. However, the signaled master fence handling is
> > -        * contentious. Currently we do not distinguish between a signaled
> > -        * fence and an expired fence, as once signaled it does not convey
> > -        * any information about the previous execution. It may even be freed
> > -        * and hence checking later it may not exist at all. Ergo we currently
> > -        * do not apply the bonding constraint for an already signaled fence,
> > -        * as our expectation is that it should not constrain the secondaries
> > -        * and is outside of the scope of the bonded request API (i.e. all
> > -        * userspace requests are meant to be running in parallel). As
> > -        * it imposes no constraint, and is effectively a no-op, we do not
> > -        * check below as normal execution flows are checked extensively above.
> > -        *
> > -        * XXX Is the degenerate handling of signaled submit fences the
> > -        * expected behaviour for userpace?
> > -        */
> > -
> > -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > -
> > -       if (igt_spinner_init(&spin, gt))
> > -               return -ENOMEM;
> > -
> > -       err = 0;
> > -       rq[0] = ERR_PTR(-ENOMEM);
> > -       for_each_engine(master, gt, id) {
> > -               struct i915_sw_fence fence = {};
> > -               struct intel_context *ce;
> > -
> > -               if (master->class == class)
> > -                       continue;
> > -
> > -               ce = intel_context_create(master);
> > -               if (IS_ERR(ce)) {
> > -                       err = PTR_ERR(ce);
> > -                       goto out;
> > -               }
> > -
> > -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > -
> > -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > -               intel_context_put(ce);
> > -               if (IS_ERR(rq[0])) {
> > -                       err = PTR_ERR(rq[0]);
> > -                       goto out;
> > -               }
> > -               i915_request_get(rq[0]);
> > -
> > -               if (flags & BOND_SCHEDULE) {
> > -                       onstack_fence_init(&fence);
> > -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > -                                                              &fence,
> > -                                                              GFP_KERNEL);
> > -               }
> > -
> > -               i915_request_add(rq[0]);
> > -               if (err < 0)
> > -                       goto out;
> > -
> > -               if (!(flags & BOND_SCHEDULE) &&
> > -                   !igt_wait_for_spinner(&spin, rq[0])) {
> > -                       err = -EIO;
> > -                       goto out;
> > -               }
> > -
> > -               for (n = 0; n < nsibling; n++) {
> > -                       struct intel_context *ve;
> > -
> > -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> > -                       if (IS_ERR(ve)) {
> > -                               err = PTR_ERR(ve);
> > -                               onstack_fence_fini(&fence);
> > -                               goto out;
> > -                       }
> > -
> > -                       err = intel_virtual_engine_attach_bond(ve->engine,
> > -                                                              master,
> > -                                                              siblings[n]);
> > -                       if (err) {
> > -                               intel_context_put(ve);
> > -                               onstack_fence_fini(&fence);
> > -                               goto out;
> > -                       }
> > -
> > -                       err = intel_context_pin(ve);
> > -                       intel_context_put(ve);
> > -                       if (err) {
> > -                               onstack_fence_fini(&fence);
> > -                               goto out;
> > -                       }
> > -
> > -                       rq[n + 1] = i915_request_create(ve);
> > -                       intel_context_unpin(ve);
> > -                       if (IS_ERR(rq[n + 1])) {
> > -                               err = PTR_ERR(rq[n + 1]);
> > -                               onstack_fence_fini(&fence);
> > -                               goto out;
> > -                       }
> > -                       i915_request_get(rq[n + 1]);
> > -
> > -                       err = i915_request_await_execution(rq[n + 1],
> > -                                                          &rq[0]->fence,
> > -                                                          ve->engine->bond_execute);
> > -                       i915_request_add(rq[n + 1]);
> > -                       if (err < 0) {
> > -                               onstack_fence_fini(&fence);
> > -                               goto out;
> > -                       }
> > -               }
> > -               onstack_fence_fini(&fence);
> > -               intel_engine_flush_submission(master);
> > -               igt_spinner_end(&spin);
> > -
> > -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > -                       pr_err("Master request did not execute (on %s)!\n",
> > -                              rq[0]->engine->name);
> > -                       err = -EIO;
> > -                       goto out;
> > -               }
> > -
> > -               for (n = 0; n < nsibling; n++) {
> > -                       if (i915_request_wait(rq[n + 1], 0,
> > -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> > -                               err = -EIO;
> > -                               goto out;
> > -                       }
> > -
> > -                       if (rq[n + 1]->engine != siblings[n]) {
> > -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > -                                      siblings[n]->name,
> > -                                      rq[n + 1]->engine->name,
> > -                                      rq[0]->engine->name);
> > -                               err = -EINVAL;
> > -                               goto out;
> > -                       }
> > -               }
> > -
> > -               for (n = 0; !IS_ERR(rq[n]); n++)
> > -                       i915_request_put(rq[n]);
> > -               rq[0] = ERR_PTR(-ENOMEM);
> > -       }
> > -
> > -out:
> > -       for (n = 0; !IS_ERR(rq[n]); n++)
> > -               i915_request_put(rq[n]);
> > -       if (igt_flush_test(gt->i915))
> > -               err = -EIO;
> > -
> > -       igt_spinner_fini(&spin);
> > -       return err;
> > -}
> > -
> > -static int live_virtual_bond(void *arg)
> > -{
> > -       static const struct phase {
> > -               const char *name;
> > -               unsigned int flags;
> > -       } phases[] = {
> > -               { "", 0 },
> > -               { "schedule", BOND_SCHEDULE },
> > -               { },
> > -       };
> > -       struct intel_gt *gt = arg;
> > -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > -       unsigned int class;
> > -       int err;
> > -
> > -       if (intel_uc_uses_guc_submission(&gt->uc))
> > -               return 0;
> > -
> > -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > -               const struct phase *p;
> > -               int nsibling;
> > -
> > -               nsibling = select_siblings(gt, class, siblings);
> > -               if (nsibling < 2)
> > -                       continue;
> > -
> > -               for (p = phases; p->name; p++) {
> > -                       err = bond_virtual_engine(gt,
> > -                                                 class, siblings, nsibling,
> > -                                                 p->flags);
> > -                       if (err) {
> > -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > -                                      __func__, p->name, class, nsibling, err);
> > -                               return err;
> > -                       }
> > -               }
> > -       }
> > -
> > -       return 0;
> > -}
> > -
> >  static int reset_virtual_engine(struct intel_gt *gt,
> >                                 struct intel_engine_cs **siblings,
> >                                 unsigned int nsibling)
> > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> >                 SUBTEST(live_virtual_mask),
> >                 SUBTEST(live_virtual_preserved),
> >                 SUBTEST(live_virtual_slice),
> > -               SUBTEST(live_virtual_bond),
> >                 SUBTEST(live_virtual_reset),
> >         };
> >
> > --
> > 2.31.1
> >
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Tvrtko Ursulin April 28, 2021, 3:51 p.m. UTC | #3
On 23/04/2021 23:31, Jason Ekstrand wrote:
> This adds a bunch of complexity which the media driver has never
> actually used.  The media driver does technically bond a balanced engine
> to another engine but the balanced engine only has one engine in the
> sibling set.  This doesn't actually result in a virtual engine.

For historical reference, this is not because uapi was over-engineered 
but because certain SKUs never materialized.

Regards,

Tvrtko

> Unless some userspace badly wants it, there's no good reason to support
> this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> leave the validation code in place in case we ever decide we want to do
> something interesting with the bonding information.
> 
> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
>   .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
>   .../drm/i915/gt/intel_execlists_submission.h  |   4 -
>   drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
>   6 files changed, 7 insertions(+), 353 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index e8179918fa306..5f8d0faf783aa 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
>   	}
>   	virtual = set->engines->engines[idx]->engine;
>   
> +	if (intel_engine_is_virtual(virtual)) {
> +		drm_dbg(&i915->drm,
> +			"Bonding with virtual engines not allowed\n");
> +		return -EINVAL;
> +	}
> +
>   	err = check_user_mbz(&ext->flags);
>   	if (err)
>   		return err;
> @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
>   				n, ci.engine_class, ci.engine_instance);
>   			return -EINVAL;
>   		}
> -
> -		/*
> -		 * A non-virtual engine has no siblings to choose between; and
> -		 * a submit fence will always be directed to the one engine.
> -		 */
> -		if (intel_engine_is_virtual(virtual)) {
> -			err = intel_virtual_engine_attach_bond(virtual,
> -							       master,
> -							       bond);
> -			if (err)
> -				return err;
> -		}
>   	}
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index d640bba6ad9ab..efb2fa3522a42 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   		if (args->flags & I915_EXEC_FENCE_SUBMIT)
>   			err = i915_request_await_execution(eb.request,
>   							   in_fence,
> -							   eb.engine->bond_execute);
> +							   NULL);
>   		else
>   			err = i915_request_await_dma_fence(eb.request,
>   							   in_fence);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 883bafc449024..68cfe5080325c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -446,13 +446,6 @@ struct intel_engine_cs {
>   	 */
>   	void		(*submit_request)(struct i915_request *rq);
>   
> -	/*
> -	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> -	 * request down to the bonded pairs.
> -	 */
> -	void            (*bond_execute)(struct i915_request *rq,
> -					struct dma_fence *signal);
> -
>   	/*
>   	 * Call when the priority on a request has changed and it and its
>   	 * dependencies may need rescheduling. Note the request itself may
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de124870af44d..b6e2b59f133b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -181,18 +181,6 @@ struct virtual_engine {
>   		int prio;
>   	} nodes[I915_NUM_ENGINES];
>   
> -	/*
> -	 * Keep track of bonded pairs -- restrictions upon on our selection
> -	 * of physical engines any particular request may be submitted to.
> -	 * If we receive a submit-fence from a master engine, we will only
> -	 * use one of sibling_mask physical engines.
> -	 */
> -	struct ve_bond {
> -		const struct intel_engine_cs *master;
> -		intel_engine_mask_t sibling_mask;
> -	} *bonds;
> -	unsigned int num_bonds;
> -
>   	/* And finally, which physical engines this virtual engine maps onto. */
>   	unsigned int num_siblings;
>   	struct intel_engine_cs *siblings[];
> @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>   	intel_breadcrumbs_free(ve->base.breadcrumbs);
>   	intel_engine_free_request_pool(&ve->base);
>   
> -	kfree(ve->bonds);
>   	kfree(ve);
>   }
>   
> @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
>   	spin_unlock_irqrestore(&ve->base.active.lock, flags);
>   }
>   
> -static struct ve_bond *
> -virtual_find_bond(struct virtual_engine *ve,
> -		  const struct intel_engine_cs *master)
> -{
> -	int i;
> -
> -	for (i = 0; i < ve->num_bonds; i++) {
> -		if (ve->bonds[i].master == master)
> -			return &ve->bonds[i];
> -	}
> -
> -	return NULL;
> -}
> -
> -static void
> -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> -{
> -	struct virtual_engine *ve = to_virtual_engine(rq->engine);
> -	intel_engine_mask_t allowed, exec;
> -	struct ve_bond *bond;
> -
> -	allowed = ~to_request(signal)->engine->mask;
> -
> -	bond = virtual_find_bond(ve, to_request(signal)->engine);
> -	if (bond)
> -		allowed &= bond->sibling_mask;
> -
> -	/* Restrict the bonded request to run on only the available engines */
> -	exec = READ_ONCE(rq->execution_mask);
> -	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> -		;
> -
> -	/* Prevent the master from being re-run on the bonded engines */
> -	to_request(signal)->execution_mask &= ~allowed;
> -}
> -
>   struct intel_context *
>   intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   			       unsigned int count)
> @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   
>   	ve->base.schedule = i915_schedule;
>   	ve->base.submit_request = virtual_submit_request;
> -	ve->base.bond_execute = virtual_bond_execute;
>   
>   	INIT_LIST_HEAD(virtual_queue(ve));
>   	ve->base.execlists.queue_priority_hint = INT_MIN;
> @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
>   	if (IS_ERR(dst))
>   		return dst;
>   
> -	if (se->num_bonds) {
> -		struct virtual_engine *de = to_virtual_engine(dst->engine);
> -
> -		de->bonds = kmemdup(se->bonds,
> -				    sizeof(*se->bonds) * se->num_bonds,
> -				    GFP_KERNEL);
> -		if (!de->bonds) {
> -			intel_context_put(dst);
> -			return ERR_PTR(-ENOMEM);
> -		}
> -
> -		de->num_bonds = se->num_bonds;
> -	}
> -
>   	return dst;
>   }
>   
> -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> -				     const struct intel_engine_cs *master,
> -				     const struct intel_engine_cs *sibling)
> -{
> -	struct virtual_engine *ve = to_virtual_engine(engine);
> -	struct ve_bond *bond;
> -	int n;
> -
> -	/* Sanity check the sibling is part of the virtual engine */
> -	for (n = 0; n < ve->num_siblings; n++)
> -		if (sibling == ve->siblings[n])
> -			break;
> -	if (n == ve->num_siblings)
> -		return -EINVAL;
> -
> -	bond = virtual_find_bond(ve, master);
> -	if (bond) {
> -		bond->sibling_mask |= sibling->mask;
> -		return 0;
> -	}
> -
> -	bond = krealloc(ve->bonds,
> -			sizeof(*bond) * (ve->num_bonds + 1),
> -			GFP_KERNEL);
> -	if (!bond)
> -		return -ENOMEM;
> -
> -	bond[ve->num_bonds].master = master;
> -	bond[ve->num_bonds].sibling_mask = sibling->mask;
> -
> -	ve->bonds = bond;
> -	ve->num_bonds++;
> -
> -	return 0;
> -}
> -
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> index fd61dae820e9e..80cec37a56ba9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   struct intel_context *
>   intel_execlists_clone_virtual(struct intel_engine_cs *src);
>   
> -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> -				     const struct intel_engine_cs *master,
> -				     const struct intel_engine_cs *sibling);
> -
>   bool
>   intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1081cd36a2bd3..f03446d587160 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
>   	return 0;
>   }
>   
> -static int bond_virtual_engine(struct intel_gt *gt,
> -			       unsigned int class,
> -			       struct intel_engine_cs **siblings,
> -			       unsigned int nsibling,
> -			       unsigned int flags)
> -#define BOND_SCHEDULE BIT(0)
> -{
> -	struct intel_engine_cs *master;
> -	struct i915_request *rq[16];
> -	enum intel_engine_id id;
> -	struct igt_spinner spin;
> -	unsigned long n;
> -	int err;
> -
> -	/*
> -	 * A set of bonded requests is intended to be run concurrently
> -	 * across a number of engines. We use one request per-engine
> -	 * and a magic fence to schedule each of the bonded requests
> -	 * at the same time. A consequence of our current scheduler is that
> -	 * we only move requests to the HW ready queue when the request
> -	 * becomes ready, that is when all of its prerequisite fences have
> -	 * been signaled. As one of those fences is the master submit fence,
> -	 * there is a delay on all secondary fences as the HW may be
> -	 * currently busy. Equally, as all the requests are independent,
> -	 * they may have other fences that delay individual request
> -	 * submission to HW. Ergo, we do not guarantee that all requests are
> -	 * immediately submitted to HW at the same time, just that if the
> -	 * rules are abided by, they are ready at the same time as the
> -	 * first is submitted. Userspace can embed semaphores in its batch
> -	 * to ensure parallel execution of its phases as it requires.
> -	 * Though naturally it gets requested that perhaps the scheduler should
> -	 * take care of parallel execution, even across preemption events on
> -	 * different HW. (The proper answer is of course "lalalala".)
> -	 *
> -	 * With the submit-fence, we have identified three possible phases
> -	 * of synchronisation depending on the master fence: queued (not
> -	 * ready), executing, and signaled. The first two are quite simple
> -	 * and checked below. However, the signaled master fence handling is
> -	 * contentious. Currently we do not distinguish between a signaled
> -	 * fence and an expired fence, as once signaled it does not convey
> -	 * any information about the previous execution. It may even be freed
> -	 * and hence checking later it may not exist at all. Ergo we currently
> -	 * do not apply the bonding constraint for an already signaled fence,
> -	 * as our expectation is that it should not constrain the secondaries
> -	 * and is outside of the scope of the bonded request API (i.e. all
> -	 * userspace requests are meant to be running in parallel). As
> -	 * it imposes no constraint, and is effectively a no-op, we do not
> -	 * check below as normal execution flows are checked extensively above.
> -	 *
> -	 * XXX Is the degenerate handling of signaled submit fences the
> -	 * expected behaviour for userpace?
> -	 */
> -
> -	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> -
> -	if (igt_spinner_init(&spin, gt))
> -		return -ENOMEM;
> -
> -	err = 0;
> -	rq[0] = ERR_PTR(-ENOMEM);
> -	for_each_engine(master, gt, id) {
> -		struct i915_sw_fence fence = {};
> -		struct intel_context *ce;
> -
> -		if (master->class == class)
> -			continue;
> -
> -		ce = intel_context_create(master);
> -		if (IS_ERR(ce)) {
> -			err = PTR_ERR(ce);
> -			goto out;
> -		}
> -
> -		memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> -
> -		rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> -		intel_context_put(ce);
> -		if (IS_ERR(rq[0])) {
> -			err = PTR_ERR(rq[0]);
> -			goto out;
> -		}
> -		i915_request_get(rq[0]);
> -
> -		if (flags & BOND_SCHEDULE) {
> -			onstack_fence_init(&fence);
> -			err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> -							       &fence,
> -							       GFP_KERNEL);
> -		}
> -
> -		i915_request_add(rq[0]);
> -		if (err < 0)
> -			goto out;
> -
> -		if (!(flags & BOND_SCHEDULE) &&
> -		    !igt_wait_for_spinner(&spin, rq[0])) {
> -			err = -EIO;
> -			goto out;
> -		}
> -
> -		for (n = 0; n < nsibling; n++) {
> -			struct intel_context *ve;
> -
> -			ve = intel_execlists_create_virtual(siblings, nsibling);
> -			if (IS_ERR(ve)) {
> -				err = PTR_ERR(ve);
> -				onstack_fence_fini(&fence);
> -				goto out;
> -			}
> -
> -			err = intel_virtual_engine_attach_bond(ve->engine,
> -							       master,
> -							       siblings[n]);
> -			if (err) {
> -				intel_context_put(ve);
> -				onstack_fence_fini(&fence);
> -				goto out;
> -			}
> -
> -			err = intel_context_pin(ve);
> -			intel_context_put(ve);
> -			if (err) {
> -				onstack_fence_fini(&fence);
> -				goto out;
> -			}
> -
> -			rq[n + 1] = i915_request_create(ve);
> -			intel_context_unpin(ve);
> -			if (IS_ERR(rq[n + 1])) {
> -				err = PTR_ERR(rq[n + 1]);
> -				onstack_fence_fini(&fence);
> -				goto out;
> -			}
> -			i915_request_get(rq[n + 1]);
> -
> -			err = i915_request_await_execution(rq[n + 1],
> -							   &rq[0]->fence,
> -							   ve->engine->bond_execute);
> -			i915_request_add(rq[n + 1]);
> -			if (err < 0) {
> -				onstack_fence_fini(&fence);
> -				goto out;
> -			}
> -		}
> -		onstack_fence_fini(&fence);
> -		intel_engine_flush_submission(master);
> -		igt_spinner_end(&spin);
> -
> -		if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> -			pr_err("Master request did not execute (on %s)!\n",
> -			       rq[0]->engine->name);
> -			err = -EIO;
> -			goto out;
> -		}
> -
> -		for (n = 0; n < nsibling; n++) {
> -			if (i915_request_wait(rq[n + 1], 0,
> -					      MAX_SCHEDULE_TIMEOUT) < 0) {
> -				err = -EIO;
> -				goto out;
> -			}
> -
> -			if (rq[n + 1]->engine != siblings[n]) {
> -				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> -				       siblings[n]->name,
> -				       rq[n + 1]->engine->name,
> -				       rq[0]->engine->name);
> -				err = -EINVAL;
> -				goto out;
> -			}
> -		}
> -
> -		for (n = 0; !IS_ERR(rq[n]); n++)
> -			i915_request_put(rq[n]);
> -		rq[0] = ERR_PTR(-ENOMEM);
> -	}
> -
> -out:
> -	for (n = 0; !IS_ERR(rq[n]); n++)
> -		i915_request_put(rq[n]);
> -	if (igt_flush_test(gt->i915))
> -		err = -EIO;
> -
> -	igt_spinner_fini(&spin);
> -	return err;
> -}
> -
> -static int live_virtual_bond(void *arg)
> -{
> -	static const struct phase {
> -		const char *name;
> -		unsigned int flags;
> -	} phases[] = {
> -		{ "", 0 },
> -		{ "schedule", BOND_SCHEDULE },
> -		{ },
> -	};
> -	struct intel_gt *gt = arg;
> -	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> -	unsigned int class;
> -	int err;
> -
> -	if (intel_uc_uses_guc_submission(&gt->uc))
> -		return 0;
> -
> -	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> -		const struct phase *p;
> -		int nsibling;
> -
> -		nsibling = select_siblings(gt, class, siblings);
> -		if (nsibling < 2)
> -			continue;
> -
> -		for (p = phases; p->name; p++) {
> -			err = bond_virtual_engine(gt,
> -						  class, siblings, nsibling,
> -						  p->flags);
> -			if (err) {
> -				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> -				       __func__, p->name, class, nsibling, err);
> -				return err;
> -			}
> -		}
> -	}
> -
> -	return 0;
> -}
> -
>   static int reset_virtual_engine(struct intel_gt *gt,
>   				struct intel_engine_cs **siblings,
>   				unsigned int nsibling)
> @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_virtual_mask),
>   		SUBTEST(live_virtual_preserved),
>   		SUBTEST(live_virtual_slice),
> -		SUBTEST(live_virtual_bond),
>   		SUBTEST(live_virtual_reset),
>   	};
>   
>
Jason Ekstrand April 28, 2021, 5:18 p.m. UTC | #4
On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > >
> > > This adds a bunch of complexity which the media driver has never
> > > actually used.  The media driver does technically bond a balanced engine
> > > to another engine but the balanced engine only has one engine in the
> > > sibling set.  This doesn't actually result in a virtual engine.
> > >
> > > Unless some userspace badly wants it, there's no good reason to support
> > > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > > leave the validation code in place in case we ever decide we want to do
> > > something interesting with the bonding information.
> > >
> > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> > >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> > >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> > >  6 files changed, 7 insertions(+), 353 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > index e8179918fa306..5f8d0faf783aa 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > >         }
> > >         virtual = set->engines->engines[idx]->engine;
> > >
> > > +       if (intel_engine_is_virtual(virtual)) {
> > > +               drm_dbg(&i915->drm,
> > > +                       "Bonding with virtual engines not allowed\n");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > >         err = check_user_mbz(&ext->flags);
> > >         if (err)
> > >                 return err;
> > > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > >                                 n, ci.engine_class, ci.engine_instance);
> > >                         return -EINVAL;
> > >                 }
> > > -
> > > -               /*
> > > -                * A non-virtual engine has no siblings to choose between; and
> > > -                * a submit fence will always be directed to the one engine.
> > > -                */
> > > -               if (intel_engine_is_virtual(virtual)) {
> > > -                       err = intel_virtual_engine_attach_bond(virtual,
> > > -                                                              master,
> > > -                                                              bond);
> > > -                       if (err)
> > > -                               return err;
> > > -               }
> > >         }
> > >
> > >         return 0;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > index d640bba6ad9ab..efb2fa3522a42 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > >                         err = i915_request_await_execution(eb.request,
> > >                                                            in_fence,
> > > -                                                          eb.engine->bond_execute);
> > > +                                                          NULL);
> > >                 else
> > >                         err = i915_request_await_dma_fence(eb.request,
> > >                                                            in_fence);
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > index 883bafc449024..68cfe5080325c 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> > >          */
> > >         void            (*submit_request)(struct i915_request *rq);
> > >
> > > -       /*
> > > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > > -        * request down to the bonded pairs.
> > > -        */
> > > -       void            (*bond_execute)(struct i915_request *rq,
> > > -                                       struct dma_fence *signal);
> > > -
> > >         /*
> > >          * Call when the priority on a request has changed and it and its
> > >          * dependencies may need rescheduling. Note the request itself may
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > index de124870af44d..b6e2b59f133b7 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > @@ -181,18 +181,6 @@ struct virtual_engine {
> > >                 int prio;
> > >         } nodes[I915_NUM_ENGINES];
> > >
> > > -       /*
> > > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > > -        * of physical engines any particular request may be submitted to.
> > > -        * If we receive a submit-fence from a master engine, we will only
> > > -        * use one of sibling_mask physical engines.
> > > -        */
> > > -       struct ve_bond {
> > > -               const struct intel_engine_cs *master;
> > > -               intel_engine_mask_t sibling_mask;
> > > -       } *bonds;
> > > -       unsigned int num_bonds;
> > > -
> > >         /* And finally, which physical engines this virtual engine maps onto. */
> > >         unsigned int num_siblings;
> > >         struct intel_engine_cs *siblings[];
> > > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> > >         intel_engine_free_request_pool(&ve->base);
> > >
> > > -       kfree(ve->bonds);
> > >         kfree(ve);
> > >  }
> > >
> > > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> > >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > >  }
> > >
> > > -static struct ve_bond *
> > > -virtual_find_bond(struct virtual_engine *ve,
> > > -                 const struct intel_engine_cs *master)
> > > -{
> > > -       int i;
> > > -
> > > -       for (i = 0; i < ve->num_bonds; i++) {
> > > -               if (ve->bonds[i].master == master)
> > > -                       return &ve->bonds[i];
> > > -       }
> > > -
> > > -       return NULL;
> > > -}
> > > -
> > > -static void
> > > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > > -{
> > > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > > -       intel_engine_mask_t allowed, exec;
> > > -       struct ve_bond *bond;
> > > -
> > > -       allowed = ~to_request(signal)->engine->mask;
> > > -
> > > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > > -       if (bond)
> > > -               allowed &= bond->sibling_mask;
> > > -
> > > -       /* Restrict the bonded request to run on only the available engines */
> > > -       exec = READ_ONCE(rq->execution_mask);
> > > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > > -               ;
> > > -
> > > -       /* Prevent the master from being re-run on the bonded engines */
> > > -       to_request(signal)->execution_mask &= ~allowed;
> >
> > I sent a v2 of this patch because it turns out I deleted a bit too
> > much code.  This function in particular, has to stay, unfortunately.
> > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > the work onto a different engine than the one it's supposed to
> > run in parallel with.  This means we can't dead-code this function or
> > the bond_execution function pointer and related stuff.
>
> Uh that's disappointing, since if I understand your point correctly, the
> sibling engines should all be singletons, not load balancing virtual ones.
> So there really should not be any need to pick the right one at execution
> time.

The media driver itself seems to work fine if I delete all the code.
It's just an IGT testcase that blows up.  I'll do more digging to see
if I can better isolate why.
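
For reference, the userspace pattern the bonded path exercises looks
roughly like the sketch below -- hypothetical code written against the
execbuffer2 uapi, not lifted from the media driver or the IGT test. The
function name is mine, eb0/eb1 are assumed to be fully populated
execbuffer2 structs that already target the master engine and the bonded
virtual engine respectively, and error handling is omitted:

#include <unistd.h>
#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

static void submit_bonded_pair(int fd,
                               struct drm_i915_gem_execbuffer2 *eb0,
                               struct drm_i915_gem_execbuffer2 *eb1)
{
        int submit_fd;

        /* First batch: ask for an out-fence on the master engine's work. */
        eb0->flags |= I915_EXEC_FENCE_OUT;
        drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb0);
        submit_fd = eb0->rsvd2 >> 32;

        /*
         * Second batch: gate it on the first batch with
         * I915_EXEC_FENCE_SUBMIT, which allows it to be submitted once
         * the first request starts executing rather than once it
         * completes.
         */
        eb1->flags |= I915_EXEC_FENCE_SUBMIT;
        eb1->rsvd2 = submit_fd;
        drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb1);

        close(submit_fd);
}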

--Jason

> At least my understanding is that we're only limiting the engine set
> further, so if both signaller and signalled request can only run on
> singletons (which must be distinct, or the bonded parameter validation is
> busted) there's really nothing to do here.
>
> Also this is the locking code that freaks me out about the current bonded
> execlist code ...
>
> Dazzled and confused.
> -Daniel
>
> >
> > --Jason
> >
> >
> > > -}
> > > -
> > >  struct intel_context *
> > >  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > >                                unsigned int count)
> > > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > >
> > >         ve->base.schedule = i915_schedule;
> > >         ve->base.submit_request = virtual_submit_request;
> > > -       ve->base.bond_execute = virtual_bond_execute;
> > >
> > >         INIT_LIST_HEAD(virtual_queue(ve));
> > >         ve->base.execlists.queue_priority_hint = INT_MIN;
> > > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> > >         if (IS_ERR(dst))
> > >                 return dst;
> > >
> > > -       if (se->num_bonds) {
> > > -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> > > -
> > > -               de->bonds = kmemdup(se->bonds,
> > > -                                   sizeof(*se->bonds) * se->num_bonds,
> > > -                                   GFP_KERNEL);
> > > -               if (!de->bonds) {
> > > -                       intel_context_put(dst);
> > > -                       return ERR_PTR(-ENOMEM);
> > > -               }
> > > -
> > > -               de->num_bonds = se->num_bonds;
> > > -       }
> > > -
> > >         return dst;
> > >  }
> > >
> > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > -                                    const struct intel_engine_cs *master,
> > > -                                    const struct intel_engine_cs *sibling)
> > > -{
> > > -       struct virtual_engine *ve = to_virtual_engine(engine);
> > > -       struct ve_bond *bond;
> > > -       int n;
> > > -
> > > -       /* Sanity check the sibling is part of the virtual engine */
> > > -       for (n = 0; n < ve->num_siblings; n++)
> > > -               if (sibling == ve->siblings[n])
> > > -                       break;
> > > -       if (n == ve->num_siblings)
> > > -               return -EINVAL;
> > > -
> > > -       bond = virtual_find_bond(ve, master);
> > > -       if (bond) {
> > > -               bond->sibling_mask |= sibling->mask;
> > > -               return 0;
> > > -       }
> > > -
> > > -       bond = krealloc(ve->bonds,
> > > -                       sizeof(*bond) * (ve->num_bonds + 1),
> > > -                       GFP_KERNEL);
> > > -       if (!bond)
> > > -               return -ENOMEM;
> > > -
> > > -       bond[ve->num_bonds].master = master;
> > > -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> > > -
> > > -       ve->bonds = bond;
> > > -       ve->num_bonds++;
> > > -
> > > -       return 0;
> > > -}
> > > -
> > >  void intel_execlists_show_requests(struct intel_engine_cs *engine,
> > >                                    struct drm_printer *m,
> > >                                    void (*show_request)(struct drm_printer *m,
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > index fd61dae820e9e..80cec37a56ba9 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > >  struct intel_context *
> > >  intel_execlists_clone_virtual(struct intel_engine_cs *src);
> > >
> > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > -                                    const struct intel_engine_cs *master,
> > > -                                    const struct intel_engine_cs *sibling);
> > > -
> > >  bool
> > >  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > index 1081cd36a2bd3..f03446d587160 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> > >         return 0;
> > >  }
> > >
> > > -static int bond_virtual_engine(struct intel_gt *gt,
> > > -                              unsigned int class,
> > > -                              struct intel_engine_cs **siblings,
> > > -                              unsigned int nsibling,
> > > -                              unsigned int flags)
> > > -#define BOND_SCHEDULE BIT(0)
> > > -{
> > > -       struct intel_engine_cs *master;
> > > -       struct i915_request *rq[16];
> > > -       enum intel_engine_id id;
> > > -       struct igt_spinner spin;
> > > -       unsigned long n;
> > > -       int err;
> > > -
> > > -       /*
> > > -        * A set of bonded requests is intended to be run concurrently
> > > -        * across a number of engines. We use one request per-engine
> > > -        * and a magic fence to schedule each of the bonded requests
> > > -        * at the same time. A consequence of our current scheduler is that
> > > -        * we only move requests to the HW ready queue when the request
> > > -        * becomes ready, that is when all of its prerequisite fences have
> > > -        * been signaled. As one of those fences is the master submit fence,
> > > -        * there is a delay on all secondary fences as the HW may be
> > > -        * currently busy. Equally, as all the requests are independent,
> > > -        * they may have other fences that delay individual request
> > > -        * submission to HW. Ergo, we do not guarantee that all requests are
> > > -        * immediately submitted to HW at the same time, just that if the
> > > -        * rules are abided by, they are ready at the same time as the
> > > -        * first is submitted. Userspace can embed semaphores in its batch
> > > -        * to ensure parallel execution of its phases as it requires.
> > > -        * Though naturally it gets requested that perhaps the scheduler should
> > > -        * take care of parallel execution, even across preemption events on
> > > -        * different HW. (The proper answer is of course "lalalala".)
> > > -        *
> > > -        * With the submit-fence, we have identified three possible phases
> > > -        * of synchronisation depending on the master fence: queued (not
> > > -        * ready), executing, and signaled. The first two are quite simple
> > > -        * and checked below. However, the signaled master fence handling is
> > > -        * contentious. Currently we do not distinguish between a signaled
> > > -        * fence and an expired fence, as once signaled it does not convey
> > > -        * any information about the previous execution. It may even be freed
> > > -        * and hence checking later it may not exist at all. Ergo we currently
> > > -        * do not apply the bonding constraint for an already signaled fence,
> > > -        * as our expectation is that it should not constrain the secondaries
> > > -        * and is outside of the scope of the bonded request API (i.e. all
> > > -        * userspace requests are meant to be running in parallel). As
> > > -        * it imposes no constraint, and is effectively a no-op, we do not
> > > -        * check below as normal execution flows are checked extensively above.
> > > -        *
> > > -        * XXX Is the degenerate handling of signaled submit fences the
> > > -        * expected behaviour for userpace?
> > > -        */
> > > -
> > > -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > > -
> > > -       if (igt_spinner_init(&spin, gt))
> > > -               return -ENOMEM;
> > > -
> > > -       err = 0;
> > > -       rq[0] = ERR_PTR(-ENOMEM);
> > > -       for_each_engine(master, gt, id) {
> > > -               struct i915_sw_fence fence = {};
> > > -               struct intel_context *ce;
> > > -
> > > -               if (master->class == class)
> > > -                       continue;
> > > -
> > > -               ce = intel_context_create(master);
> > > -               if (IS_ERR(ce)) {
> > > -                       err = PTR_ERR(ce);
> > > -                       goto out;
> > > -               }
> > > -
> > > -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > > -
> > > -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > > -               intel_context_put(ce);
> > > -               if (IS_ERR(rq[0])) {
> > > -                       err = PTR_ERR(rq[0]);
> > > -                       goto out;
> > > -               }
> > > -               i915_request_get(rq[0]);
> > > -
> > > -               if (flags & BOND_SCHEDULE) {
> > > -                       onstack_fence_init(&fence);
> > > -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > > -                                                              &fence,
> > > -                                                              GFP_KERNEL);
> > > -               }
> > > -
> > > -               i915_request_add(rq[0]);
> > > -               if (err < 0)
> > > -                       goto out;
> > > -
> > > -               if (!(flags & BOND_SCHEDULE) &&
> > > -                   !igt_wait_for_spinner(&spin, rq[0])) {
> > > -                       err = -EIO;
> > > -                       goto out;
> > > -               }
> > > -
> > > -               for (n = 0; n < nsibling; n++) {
> > > -                       struct intel_context *ve;
> > > -
> > > -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> > > -                       if (IS_ERR(ve)) {
> > > -                               err = PTR_ERR(ve);
> > > -                               onstack_fence_fini(&fence);
> > > -                               goto out;
> > > -                       }
> > > -
> > > -                       err = intel_virtual_engine_attach_bond(ve->engine,
> > > -                                                              master,
> > > -                                                              siblings[n]);
> > > -                       if (err) {
> > > -                               intel_context_put(ve);
> > > -                               onstack_fence_fini(&fence);
> > > -                               goto out;
> > > -                       }
> > > -
> > > -                       err = intel_context_pin(ve);
> > > -                       intel_context_put(ve);
> > > -                       if (err) {
> > > -                               onstack_fence_fini(&fence);
> > > -                               goto out;
> > > -                       }
> > > -
> > > -                       rq[n + 1] = i915_request_create(ve);
> > > -                       intel_context_unpin(ve);
> > > -                       if (IS_ERR(rq[n + 1])) {
> > > -                               err = PTR_ERR(rq[n + 1]);
> > > -                               onstack_fence_fini(&fence);
> > > -                               goto out;
> > > -                       }
> > > -                       i915_request_get(rq[n + 1]);
> > > -
> > > -                       err = i915_request_await_execution(rq[n + 1],
> > > -                                                          &rq[0]->fence,
> > > -                                                          ve->engine->bond_execute);
> > > -                       i915_request_add(rq[n + 1]);
> > > -                       if (err < 0) {
> > > -                               onstack_fence_fini(&fence);
> > > -                               goto out;
> > > -                       }
> > > -               }
> > > -               onstack_fence_fini(&fence);
> > > -               intel_engine_flush_submission(master);
> > > -               igt_spinner_end(&spin);
> > > -
> > > -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > > -                       pr_err("Master request did not execute (on %s)!\n",
> > > -                              rq[0]->engine->name);
> > > -                       err = -EIO;
> > > -                       goto out;
> > > -               }
> > > -
> > > -               for (n = 0; n < nsibling; n++) {
> > > -                       if (i915_request_wait(rq[n + 1], 0,
> > > -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> > > -                               err = -EIO;
> > > -                               goto out;
> > > -                       }
> > > -
> > > -                       if (rq[n + 1]->engine != siblings[n]) {
> > > -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > > -                                      siblings[n]->name,
> > > -                                      rq[n + 1]->engine->name,
> > > -                                      rq[0]->engine->name);
> > > -                               err = -EINVAL;
> > > -                               goto out;
> > > -                       }
> > > -               }
> > > -
> > > -               for (n = 0; !IS_ERR(rq[n]); n++)
> > > -                       i915_request_put(rq[n]);
> > > -               rq[0] = ERR_PTR(-ENOMEM);
> > > -       }
> > > -
> > > -out:
> > > -       for (n = 0; !IS_ERR(rq[n]); n++)
> > > -               i915_request_put(rq[n]);
> > > -       if (igt_flush_test(gt->i915))
> > > -               err = -EIO;
> > > -
> > > -       igt_spinner_fini(&spin);
> > > -       return err;
> > > -}
> > > -
> > > -static int live_virtual_bond(void *arg)
> > > -{
> > > -       static const struct phase {
> > > -               const char *name;
> > > -               unsigned int flags;
> > > -       } phases[] = {
> > > -               { "", 0 },
> > > -               { "schedule", BOND_SCHEDULE },
> > > -               { },
> > > -       };
> > > -       struct intel_gt *gt = arg;
> > > -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > > -       unsigned int class;
> > > -       int err;
> > > -
> > > -       if (intel_uc_uses_guc_submission(&gt->uc))
> > > -               return 0;
> > > -
> > > -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > > -               const struct phase *p;
> > > -               int nsibling;
> > > -
> > > -               nsibling = select_siblings(gt, class, siblings);
> > > -               if (nsibling < 2)
> > > -                       continue;
> > > -
> > > -               for (p = phases; p->name; p++) {
> > > -                       err = bond_virtual_engine(gt,
> > > -                                                 class, siblings, nsibling,
> > > -                                                 p->flags);
> > > -                       if (err) {
> > > -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > > -                                      __func__, p->name, class, nsibling, err);
> > > -                               return err;
> > > -                       }
> > > -               }
> > > -       }
> > > -
> > > -       return 0;
> > > -}
> > > -
> > >  static int reset_virtual_engine(struct intel_gt *gt,
> > >                                 struct intel_engine_cs **siblings,
> > >                                 unsigned int nsibling)
> > > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> > >                 SUBTEST(live_virtual_mask),
> > >                 SUBTEST(live_virtual_preserved),
> > >                 SUBTEST(live_virtual_slice),
> > > -               SUBTEST(live_virtual_bond),
> > >                 SUBTEST(live_virtual_reset),
> > >         };
> > >
> > > --
> > > 2.31.1
> > >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Matthew Brost April 28, 2021, 5:18 p.m. UTC | #5
On Wed, Apr 28, 2021 at 12:18:29PM -0500, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > >
> > > > This adds a bunch of complexity which the media driver has never
> > > > actually used.  The media driver does technically bond a balanced engine
> > > > to another engine but the balanced engine only has one engine in the
> > > > sibling set.  This doesn't actually result in a virtual engine.
> > > >
> > > > Unless some userspace badly wants it, there's no good reason to support
> > > > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > > > leave the validation code in place in case we ever decide we want to do
> > > > something interesting with the bonding information.
> > > >
> > > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > > ---
> > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> > > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> > > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> > > >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> > > >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> > > >  6 files changed, 7 insertions(+), 353 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > index e8179918fa306..5f8d0faf783aa 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > >         }
> > > >         virtual = set->engines->engines[idx]->engine;
> > > >
> > > > +       if (intel_engine_is_virtual(virtual)) {
> > > > +               drm_dbg(&i915->drm,
> > > > +                       "Bonding with virtual engines not allowed\n");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > >         err = check_user_mbz(&ext->flags);
> > > >         if (err)
> > > >                 return err;
> > > > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > >                                 n, ci.engine_class, ci.engine_instance);
> > > >                         return -EINVAL;
> > > >                 }
> > > > -
> > > > -               /*
> > > > -                * A non-virtual engine has no siblings to choose between; and
> > > > -                * a submit fence will always be directed to the one engine.
> > > > -                */
> > > > -               if (intel_engine_is_virtual(virtual)) {
> > > > -                       err = intel_virtual_engine_attach_bond(virtual,
> > > > -                                                              master,
> > > > -                                                              bond);
> > > > -                       if (err)
> > > > -                               return err;
> > > > -               }
> > > >         }
> > > >
> > > >         return 0;
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > index d640bba6ad9ab..efb2fa3522a42 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > > >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > > >                         err = i915_request_await_execution(eb.request,
> > > >                                                            in_fence,
> > > > -                                                          eb.engine->bond_execute);
> > > > +                                                          NULL);
> > > >                 else
> > > >                         err = i915_request_await_dma_fence(eb.request,
> > > >                                                            in_fence);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > index 883bafc449024..68cfe5080325c 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> > > >          */
> > > >         void            (*submit_request)(struct i915_request *rq);
> > > >
> > > > -       /*
> > > > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > > > -        * request down to the bonded pairs.
> > > > -        */
> > > > -       void            (*bond_execute)(struct i915_request *rq,
> > > > -                                       struct dma_fence *signal);
> > > > -
> > > >         /*
> > > >          * Call when the priority on a request has changed and it and its
> > > >          * dependencies may need rescheduling. Note the request itself may
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > index de124870af44d..b6e2b59f133b7 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > @@ -181,18 +181,6 @@ struct virtual_engine {
> > > >                 int prio;
> > > >         } nodes[I915_NUM_ENGINES];
> > > >
> > > > -       /*
> > > > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > > > -        * of physical engines any particular request may be submitted to.
> > > > -        * If we receive a submit-fence from a master engine, we will only
> > > > -        * use one of sibling_mask physical engines.
> > > > -        */
> > > > -       struct ve_bond {
> > > > -               const struct intel_engine_cs *master;
> > > > -               intel_engine_mask_t sibling_mask;
> > > > -       } *bonds;
> > > > -       unsigned int num_bonds;
> > > > -
> > > >         /* And finally, which physical engines this virtual engine maps onto. */
> > > >         unsigned int num_siblings;
> > > >         struct intel_engine_cs *siblings[];
> > > > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > > >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> > > >         intel_engine_free_request_pool(&ve->base);
> > > >
> > > > -       kfree(ve->bonds);
> > > >         kfree(ve);
> > > >  }
> > > >
> > > > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> > > >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > > >  }
> > > >
> > > > -static struct ve_bond *
> > > > -virtual_find_bond(struct virtual_engine *ve,
> > > > -                 const struct intel_engine_cs *master)
> > > > -{
> > > > -       int i;
> > > > -
> > > > -       for (i = 0; i < ve->num_bonds; i++) {
> > > > -               if (ve->bonds[i].master == master)
> > > > -                       return &ve->bonds[i];
> > > > -       }
> > > > -
> > > > -       return NULL;
> > > > -}
> > > > -
> > > > -static void
> > > > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > > > -{
> > > > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > > > -       intel_engine_mask_t allowed, exec;
> > > > -       struct ve_bond *bond;
> > > > -
> > > > -       allowed = ~to_request(signal)->engine->mask;
> > > > -
> > > > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > > > -       if (bond)
> > > > -               allowed &= bond->sibling_mask;
> > > > -
> > > > -       /* Restrict the bonded request to run on only the available engines */
> > > > -       exec = READ_ONCE(rq->execution_mask);
> > > > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > > > -               ;
> > > > -
> > > > -       /* Prevent the master from being re-run on the bonded engines */
> > > > -       to_request(signal)->execution_mask &= ~allowed;
> > >
> > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > much code.  This function in particular, has to stay, unfortunately.
> > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > the work onto a different engine than the one it's supposed to
> > > run in parallel with.  This means we can't dead-code this function or
> > > the bond_execution function pointer and related stuff.
> >
> > Uh that's disappointing, since if I understand your point correctly, the
> > sibling engines should all be singletons, not load balancing virtual ones.
> > So there really should not be any need to pick the right one at execution
> > time.
> 
> The media driver itself seems to work fine if I delete all the code.
> It's just an IGT testcase that blows up.  I'll do more digging to see
> if I can better isolate why.
> 

Jumping in here mid-thread. For what it's worth, to make execlists work
with the upcoming parallel submission extension I leveraged some of the
existing bonding code, so I wouldn't be too eager to delete this code
until that lands.
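
For anyone following along, the bonding uapi being discussed is set up
from userspace roughly as below -- a hypothetical sketch of the
degenerate single-sibling case, written against the i915_drm.h
definitions as I read them. The function name and engine instances are
arbitrary and error handling is omitted:

#include <string.h>
#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

static void setup_bonded_engines(int fd, uint32_t ctx_id)
{
        I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(balance, 1);
        I915_DEFINE_CONTEXT_ENGINES_BOND(bond, 1);
        I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2);
        struct drm_i915_gem_context_param p;

        memset(&balance, 0, sizeof(balance));
        memset(&bond, 0, sizeof(bond));
        memset(&engines, 0, sizeof(engines));

        /* engines[0]: a balanced engine with a single VCS0 sibling. */
        balance.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE;
        balance.base.next_extension = (uintptr_t)&bond;
        balance.engine_index = 0;
        balance.num_siblings = 1;
        balance.engines[0].engine_class = I915_ENGINE_CLASS_VIDEO;
        balance.engines[0].engine_instance = 0;

        /* Bond engines[0] to submit-fences coming from VCS1 (engines[1]). */
        bond.base.name = I915_CONTEXT_ENGINES_EXT_BOND;
        bond.master.engine_class = I915_ENGINE_CLASS_VIDEO;
        bond.master.engine_instance = 1;
        bond.virtual_index = 0;
        bond.num_bonds = 1;
        bond.engines[0].engine_class = I915_ENGINE_CLASS_VIDEO;
        bond.engines[0].engine_instance = 0;

        /* Slot 0 is the placeholder filled in by the load-balance ext. */
        engines.extensions = (uintptr_t)&balance;
        engines.engines[0].engine_class = I915_ENGINE_CLASS_INVALID;
        engines.engines[0].engine_instance = I915_ENGINE_CLASS_INVALID_NONE;
        engines.engines[1].engine_class = I915_ENGINE_CLASS_VIDEO;
        engines.engines[1].engine_instance = 1;

        memset(&p, 0, sizeof(p));
        p.ctx_id = ctx_id;
        p.param = I915_CONTEXT_PARAM_ENGINES;
        p.size = sizeof(engines);
        p.value = (uintptr_t)&engines;
        drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}

With only the one sibling, the balanced engine collapses to a single
physical engine, which is the degenerate case being talked about here.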

Matt

> --Jason
> 
> > At least my understanding is that we're only limiting the engine set
> > further, so if both signaller and signalled request can only run on
> > singletons (which must be distinct, or the bonded parameter validation is
> > busted) there's really nothing to do here.
> >
> > Also this is the locking code that freaks me out about the current bonded
> > execlist code ...
> >
> > Dazzled and confused.
> > -Daniel
> >
> > >
> > > --Jason
> > >
> > >
> > > > -}
> > > > -
> > > >  struct intel_context *
> > > >  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > >                                unsigned int count)
> > > > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > >
> > > >         ve->base.schedule = i915_schedule;
> > > >         ve->base.submit_request = virtual_submit_request;
> > > > -       ve->base.bond_execute = virtual_bond_execute;
> > > >
> > > >         INIT_LIST_HEAD(virtual_queue(ve));
> > > >         ve->base.execlists.queue_priority_hint = INT_MIN;
> > > > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> > > >         if (IS_ERR(dst))
> > > >                 return dst;
> > > >
> > > > -       if (se->num_bonds) {
> > > > -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> > > > -
> > > > -               de->bonds = kmemdup(se->bonds,
> > > > -                                   sizeof(*se->bonds) * se->num_bonds,
> > > > -                                   GFP_KERNEL);
> > > > -               if (!de->bonds) {
> > > > -                       intel_context_put(dst);
> > > > -                       return ERR_PTR(-ENOMEM);
> > > > -               }
> > > > -
> > > > -               de->num_bonds = se->num_bonds;
> > > > -       }
> > > > -
> > > >         return dst;
> > > >  }
> > > >
> > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > -                                    const struct intel_engine_cs *master,
> > > > -                                    const struct intel_engine_cs *sibling)
> > > > -{
> > > > -       struct virtual_engine *ve = to_virtual_engine(engine);
> > > > -       struct ve_bond *bond;
> > > > -       int n;
> > > > -
> > > > -       /* Sanity check the sibling is part of the virtual engine */
> > > > -       for (n = 0; n < ve->num_siblings; n++)
> > > > -               if (sibling == ve->siblings[n])
> > > > -                       break;
> > > > -       if (n == ve->num_siblings)
> > > > -               return -EINVAL;
> > > > -
> > > > -       bond = virtual_find_bond(ve, master);
> > > > -       if (bond) {
> > > > -               bond->sibling_mask |= sibling->mask;
> > > > -               return 0;
> > > > -       }
> > > > -
> > > > -       bond = krealloc(ve->bonds,
> > > > -                       sizeof(*bond) * (ve->num_bonds + 1),
> > > > -                       GFP_KERNEL);
> > > > -       if (!bond)
> > > > -               return -ENOMEM;
> > > > -
> > > > -       bond[ve->num_bonds].master = master;
> > > > -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> > > > -
> > > > -       ve->bonds = bond;
> > > > -       ve->num_bonds++;
> > > > -
> > > > -       return 0;
> > > > -}
> > > > -
> > > >  void intel_execlists_show_requests(struct intel_engine_cs *engine,
> > > >                                    struct drm_printer *m,
> > > >                                    void (*show_request)(struct drm_printer *m,
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > index fd61dae820e9e..80cec37a56ba9 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > >  struct intel_context *
> > > >  intel_execlists_clone_virtual(struct intel_engine_cs *src);
> > > >
> > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > -                                    const struct intel_engine_cs *master,
> > > > -                                    const struct intel_engine_cs *sibling);
> > > > -
> > > >  bool
> > > >  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > index 1081cd36a2bd3..f03446d587160 100644
> > > > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> > > >         return 0;
> > > >  }
> > > >
> > > > -static int bond_virtual_engine(struct intel_gt *gt,
> > > > -                              unsigned int class,
> > > > -                              struct intel_engine_cs **siblings,
> > > > -                              unsigned int nsibling,
> > > > -                              unsigned int flags)
> > > > -#define BOND_SCHEDULE BIT(0)
> > > > -{
> > > > -       struct intel_engine_cs *master;
> > > > -       struct i915_request *rq[16];
> > > > -       enum intel_engine_id id;
> > > > -       struct igt_spinner spin;
> > > > -       unsigned long n;
> > > > -       int err;
> > > > -
> > > > -       /*
> > > > -        * A set of bonded requests is intended to be run concurrently
> > > > -        * across a number of engines. We use one request per-engine
> > > > -        * and a magic fence to schedule each of the bonded requests
> > > > -        * at the same time. A consequence of our current scheduler is that
> > > > -        * we only move requests to the HW ready queue when the request
> > > > -        * becomes ready, that is when all of its prerequisite fences have
> > > > -        * been signaled. As one of those fences is the master submit fence,
> > > > -        * there is a delay on all secondary fences as the HW may be
> > > > -        * currently busy. Equally, as all the requests are independent,
> > > > -        * they may have other fences that delay individual request
> > > > -        * submission to HW. Ergo, we do not guarantee that all requests are
> > > > -        * immediately submitted to HW at the same time, just that if the
> > > > -        * rules are abided by, they are ready at the same time as the
> > > > -        * first is submitted. Userspace can embed semaphores in its batch
> > > > -        * to ensure parallel execution of its phases as it requires.
> > > > -        * Though naturally it gets requested that perhaps the scheduler should
> > > > -        * take care of parallel execution, even across preemption events on
> > > > -        * different HW. (The proper answer is of course "lalalala".)
> > > > -        *
> > > > -        * With the submit-fence, we have identified three possible phases
> > > > -        * of synchronisation depending on the master fence: queued (not
> > > > -        * ready), executing, and signaled. The first two are quite simple
> > > > -        * and checked below. However, the signaled master fence handling is
> > > > -        * contentious. Currently we do not distinguish between a signaled
> > > > -        * fence and an expired fence, as once signaled it does not convey
> > > > -        * any information about the previous execution. It may even be freed
> > > > -        * and hence checking later it may not exist at all. Ergo we currently
> > > > -        * do not apply the bonding constraint for an already signaled fence,
> > > > -        * as our expectation is that it should not constrain the secondaries
> > > > -        * and is outside of the scope of the bonded request API (i.e. all
> > > > -        * userspace requests are meant to be running in parallel). As
> > > > -        * it imposes no constraint, and is effectively a no-op, we do not
> > > > -        * check below as normal execution flows are checked extensively above.
> > > > -        *
> > > > -        * XXX Is the degenerate handling of signaled submit fences the
> > > > -        * expected behaviour for userpace?
> > > > -        */
> > > > -
> > > > -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > > > -
> > > > -       if (igt_spinner_init(&spin, gt))
> > > > -               return -ENOMEM;
> > > > -
> > > > -       err = 0;
> > > > -       rq[0] = ERR_PTR(-ENOMEM);
> > > > -       for_each_engine(master, gt, id) {
> > > > -               struct i915_sw_fence fence = {};
> > > > -               struct intel_context *ce;
> > > > -
> > > > -               if (master->class == class)
> > > > -                       continue;
> > > > -
> > > > -               ce = intel_context_create(master);
> > > > -               if (IS_ERR(ce)) {
> > > > -                       err = PTR_ERR(ce);
> > > > -                       goto out;
> > > > -               }
> > > > -
> > > > -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > > > -
> > > > -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > > > -               intel_context_put(ce);
> > > > -               if (IS_ERR(rq[0])) {
> > > > -                       err = PTR_ERR(rq[0]);
> > > > -                       goto out;
> > > > -               }
> > > > -               i915_request_get(rq[0]);
> > > > -
> > > > -               if (flags & BOND_SCHEDULE) {
> > > > -                       onstack_fence_init(&fence);
> > > > -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > > > -                                                              &fence,
> > > > -                                                              GFP_KERNEL);
> > > > -               }
> > > > -
> > > > -               i915_request_add(rq[0]);
> > > > -               if (err < 0)
> > > > -                       goto out;
> > > > -
> > > > -               if (!(flags & BOND_SCHEDULE) &&
> > > > -                   !igt_wait_for_spinner(&spin, rq[0])) {
> > > > -                       err = -EIO;
> > > > -                       goto out;
> > > > -               }
> > > > -
> > > > -               for (n = 0; n < nsibling; n++) {
> > > > -                       struct intel_context *ve;
> > > > -
> > > > -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > -                       if (IS_ERR(ve)) {
> > > > -                               err = PTR_ERR(ve);
> > > > -                               onstack_fence_fini(&fence);
> > > > -                               goto out;
> > > > -                       }
> > > > -
> > > > -                       err = intel_virtual_engine_attach_bond(ve->engine,
> > > > -                                                              master,
> > > > -                                                              siblings[n]);
> > > > -                       if (err) {
> > > > -                               intel_context_put(ve);
> > > > -                               onstack_fence_fini(&fence);
> > > > -                               goto out;
> > > > -                       }
> > > > -
> > > > -                       err = intel_context_pin(ve);
> > > > -                       intel_context_put(ve);
> > > > -                       if (err) {
> > > > -                               onstack_fence_fini(&fence);
> > > > -                               goto out;
> > > > -                       }
> > > > -
> > > > -                       rq[n + 1] = i915_request_create(ve);
> > > > -                       intel_context_unpin(ve);
> > > > -                       if (IS_ERR(rq[n + 1])) {
> > > > -                               err = PTR_ERR(rq[n + 1]);
> > > > -                               onstack_fence_fini(&fence);
> > > > -                               goto out;
> > > > -                       }
> > > > -                       i915_request_get(rq[n + 1]);
> > > > -
> > > > -                       err = i915_request_await_execution(rq[n + 1],
> > > > -                                                          &rq[0]->fence,
> > > > -                                                          ve->engine->bond_execute);
> > > > -                       i915_request_add(rq[n + 1]);
> > > > -                       if (err < 0) {
> > > > -                               onstack_fence_fini(&fence);
> > > > -                               goto out;
> > > > -                       }
> > > > -               }
> > > > -               onstack_fence_fini(&fence);
> > > > -               intel_engine_flush_submission(master);
> > > > -               igt_spinner_end(&spin);
> > > > -
> > > > -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > > > -                       pr_err("Master request did not execute (on %s)!\n",
> > > > -                              rq[0]->engine->name);
> > > > -                       err = -EIO;
> > > > -                       goto out;
> > > > -               }
> > > > -
> > > > -               for (n = 0; n < nsibling; n++) {
> > > > -                       if (i915_request_wait(rq[n + 1], 0,
> > > > -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> > > > -                               err = -EIO;
> > > > -                               goto out;
> > > > -                       }
> > > > -
> > > > -                       if (rq[n + 1]->engine != siblings[n]) {
> > > > -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > > > -                                      siblings[n]->name,
> > > > -                                      rq[n + 1]->engine->name,
> > > > -                                      rq[0]->engine->name);
> > > > -                               err = -EINVAL;
> > > > -                               goto out;
> > > > -                       }
> > > > -               }
> > > > -
> > > > -               for (n = 0; !IS_ERR(rq[n]); n++)
> > > > -                       i915_request_put(rq[n]);
> > > > -               rq[0] = ERR_PTR(-ENOMEM);
> > > > -       }
> > > > -
> > > > -out:
> > > > -       for (n = 0; !IS_ERR(rq[n]); n++)
> > > > -               i915_request_put(rq[n]);
> > > > -       if (igt_flush_test(gt->i915))
> > > > -               err = -EIO;
> > > > -
> > > > -       igt_spinner_fini(&spin);
> > > > -       return err;
> > > > -}
> > > > -
> > > > -static int live_virtual_bond(void *arg)
> > > > -{
> > > > -       static const struct phase {
> > > > -               const char *name;
> > > > -               unsigned int flags;
> > > > -       } phases[] = {
> > > > -               { "", 0 },
> > > > -               { "schedule", BOND_SCHEDULE },
> > > > -               { },
> > > > -       };
> > > > -       struct intel_gt *gt = arg;
> > > > -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > > > -       unsigned int class;
> > > > -       int err;
> > > > -
> > > > -       if (intel_uc_uses_guc_submission(&gt->uc))
> > > > -               return 0;
> > > > -
> > > > -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > > > -               const struct phase *p;
> > > > -               int nsibling;
> > > > -
> > > > -               nsibling = select_siblings(gt, class, siblings);
> > > > -               if (nsibling < 2)
> > > > -                       continue;
> > > > -
> > > > -               for (p = phases; p->name; p++) {
> > > > -                       err = bond_virtual_engine(gt,
> > > > -                                                 class, siblings, nsibling,
> > > > -                                                 p->flags);
> > > > -                       if (err) {
> > > > -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > > > -                                      __func__, p->name, class, nsibling, err);
> > > > -                               return err;
> > > > -                       }
> > > > -               }
> > > > -       }
> > > > -
> > > > -       return 0;
> > > > -}
> > > > -
> > > >  static int reset_virtual_engine(struct intel_gt *gt,
> > > >                                 struct intel_engine_cs **siblings,
> > > >                                 unsigned int nsibling)
> > > > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> > > >                 SUBTEST(live_virtual_mask),
> > > >                 SUBTEST(live_virtual_preserved),
> > > >                 SUBTEST(live_virtual_slice),
> > > > -               SUBTEST(live_virtual_bond),
> > > >                 SUBTEST(live_virtual_reset),
> > > >         };
> > > >
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
Jason Ekstrand April 28, 2021, 5:46 p.m. UTC | #6
On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Wed, Apr 28, 2021 at 12:18:29PM -0500, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > > >
> > > > > This adds a bunch of complexity which the media driver has never
> > > > > actually used.  The media driver does technically bond a balanced engine
> > > > > to another engine but the balanced engine only has one engine in the
> > > > > sibling set.  This doesn't actually result in a virtual engine.
> > > > >
> > > > > Unless some userspace badly wants it, there's no good reason to support
> > > > > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > > > > leave the validation code in place in case we ever decide we want to do
> > > > > something interesting with the bonding information.
> > > > >
> > > > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> > > > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> > > > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> > > > >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> > > > >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > > > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> > > > >  6 files changed, 7 insertions(+), 353 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > index e8179918fa306..5f8d0faf783aa 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > >         }
> > > > >         virtual = set->engines->engines[idx]->engine;
> > > > >
> > > > > +       if (intel_engine_is_virtual(virtual)) {
> > > > > +               drm_dbg(&i915->drm,
> > > > > +                       "Bonding with virtual engines not allowed\n");
> > > > > +               return -EINVAL;
> > > > > +       }
> > > > > +
> > > > >         err = check_user_mbz(&ext->flags);
> > > > >         if (err)
> > > > >                 return err;
> > > > > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > >                                 n, ci.engine_class, ci.engine_instance);
> > > > >                         return -EINVAL;
> > > > >                 }
> > > > > -
> > > > > -               /*
> > > > > -                * A non-virtual engine has no siblings to choose between; and
> > > > > -                * a submit fence will always be directed to the one engine.
> > > > > -                */
> > > > > -               if (intel_engine_is_virtual(virtual)) {
> > > > > -                       err = intel_virtual_engine_attach_bond(virtual,
> > > > > -                                                              master,
> > > > > -                                                              bond);
> > > > > -                       if (err)
> > > > > -                               return err;
> > > > > -               }
> > > > >         }
> > > > >
> > > > >         return 0;
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > index d640bba6ad9ab..efb2fa3522a42 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > > > >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > > > >                         err = i915_request_await_execution(eb.request,
> > > > >                                                            in_fence,
> > > > > -                                                          eb.engine->bond_execute);
> > > > > +                                                          NULL);
> > > > >                 else
> > > > >                         err = i915_request_await_dma_fence(eb.request,
> > > > >                                                            in_fence);
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > index 883bafc449024..68cfe5080325c 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> > > > >          */
> > > > >         void            (*submit_request)(struct i915_request *rq);
> > > > >
> > > > > -       /*
> > > > > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > > > > -        * request down to the bonded pairs.
> > > > > -        */
> > > > > -       void            (*bond_execute)(struct i915_request *rq,
> > > > > -                                       struct dma_fence *signal);
> > > > > -
> > > > >         /*
> > > > >          * Call when the priority on a request has changed and it and its
> > > > >          * dependencies may need rescheduling. Note the request itself may
> > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > index de124870af44d..b6e2b59f133b7 100644
> > > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > @@ -181,18 +181,6 @@ struct virtual_engine {
> > > > >                 int prio;
> > > > >         } nodes[I915_NUM_ENGINES];
> > > > >
> > > > > -       /*
> > > > > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > > > > -        * of physical engines any particular request may be submitted to.
> > > > > -        * If we receive a submit-fence from a master engine, we will only
> > > > > -        * use one of sibling_mask physical engines.
> > > > > -        */
> > > > > -       struct ve_bond {
> > > > > -               const struct intel_engine_cs *master;
> > > > > -               intel_engine_mask_t sibling_mask;
> > > > > -       } *bonds;
> > > > > -       unsigned int num_bonds;
> > > > > -
> > > > >         /* And finally, which physical engines this virtual engine maps onto. */
> > > > >         unsigned int num_siblings;
> > > > >         struct intel_engine_cs *siblings[];
> > > > > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > > > >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> > > > >         intel_engine_free_request_pool(&ve->base);
> > > > >
> > > > > -       kfree(ve->bonds);
> > > > >         kfree(ve);
> > > > >  }
> > > > >
> > > > > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> > > > >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > > > >  }
> > > > >
> > > > > -static struct ve_bond *
> > > > > -virtual_find_bond(struct virtual_engine *ve,
> > > > > -                 const struct intel_engine_cs *master)
> > > > > -{
> > > > > -       int i;
> > > > > -
> > > > > -       for (i = 0; i < ve->num_bonds; i++) {
> > > > > -               if (ve->bonds[i].master == master)
> > > > > -                       return &ve->bonds[i];
> > > > > -       }
> > > > > -
> > > > > -       return NULL;
> > > > > -}
> > > > > -
> > > > > -static void
> > > > > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > > > > -{
> > > > > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > > > > -       intel_engine_mask_t allowed, exec;
> > > > > -       struct ve_bond *bond;
> > > > > -
> > > > > -       allowed = ~to_request(signal)->engine->mask;
> > > > > -
> > > > > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > > > > -       if (bond)
> > > > > -               allowed &= bond->sibling_mask;
> > > > > -
> > > > > -       /* Restrict the bonded request to run on only the available engines */
> > > > > -       exec = READ_ONCE(rq->execution_mask);
> > > > > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > > > > -               ;
> > > > > -
> > > > > -       /* Prevent the master from being re-run on the bonded engines */
> > > > > -       to_request(signal)->execution_mask &= ~allowed;
> > > >
> > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > much code.  This function, in particular, has to stay, unfortunately.
> > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > the work onto a different engine than the one it's supposed to run
> > > > in parallel with.  This means we can't dead-code this function or
> > > > the bond_execute function pointer and related stuff.
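
(A minimal sketch of that SUBMIT_FENCE flow from the userspace side, assuming
a context engine map with the master engine at index 0 and the bonded virtual
engine at index 1.  The execbuf details are recalled from memory and buffer,
batch, and error handling are omitted, so treat it as illustrative only.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/*
 * Illustrative only: submit the "master" batch with an out-fence, then
 * gate the bonded batch on it with I915_EXEC_FENCE_SUBMIT so the two
 * start executing together rather than back to back.
 */
static void submit_bonded_pair(int fd, uint32_t ctx_id,
                               struct drm_i915_gem_exec_object2 *master_objs,
                               unsigned int master_count,
                               struct drm_i915_gem_exec_object2 *bonded_objs,
                               unsigned int bonded_count)
{
        struct drm_i915_gem_execbuffer2 master_eb = {
                .buffers_ptr  = (uintptr_t)master_objs,
                .buffer_count = master_count,
                .flags        = 0 /* engines[0]: the master engine */ |
                                I915_EXEC_FENCE_OUT,
                .rsvd1        = ctx_id,
        };
        struct drm_i915_gem_execbuffer2 bonded_eb;
        int submit_fence;

        /* The _WR variant lets the kernel return the out-fence fd in rsvd2. */
        ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &master_eb);
        submit_fence = master_eb.rsvd2 >> 32;

        bonded_eb = (struct drm_i915_gem_execbuffer2) {
                .buffers_ptr  = (uintptr_t)bonded_objs,
                .buffer_count = bonded_count,
                .flags        = 1 /* engines[1]: the bonded virtual engine */ |
                                I915_EXEC_FENCE_SUBMIT,
                .rsvd1        = ctx_id,
                .rsvd2        = (uint32_t)submit_fence,
        };
        ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &bonded_eb);
}

The second execbuf is what lands in i915_request_await_execution() with
eb.engine->bond_execute on the kernel side, per the execbuffer.c hunk above.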
> > >
> > > Uh that's disappointing, since if I understand your point correctly, the
> > > sibling engines should all be singletons, not load balancing virtual ones.
> > > So there really should not be any need to pick the right one at execution
> > > time.
> >
> > The media driver itself seems to work fine if I delete all the code.
> > It's just an IGT testcase that blows up.  I'll do more digging to see
> > if I can better isolate why.
> >
>
> Jumping on here mid-thread. For what it's worth, to make execlists work
> with the upcoming parallel submission extension I leveraged some of the
> existing bonding code, so I wouldn't be too eager to delete this code
> until that lands.

Mind being a bit more specific about that?  The motivation for this
patch is that the current bonding handling and uAPI is, well, very odd
and confusing IMO.  It doesn't let you create sets of bonded engines.
Instead you create engines and then bond them together after the fact.
I didn't want to blindly duplicate those oddities with the proto-ctx
stuff unless they were useful.  With parallel submit, I would expect
we want a more explicit API where you specify a set of engine
class/instance pairs to bond together into a single engine similar to
how the current balancing API works.
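
To make that "bond it together after the fact" flow concrete, here is a rough
userspace sketch of how a one-sibling balanced engine ends up bonded to a
master engine today.  The struct and macro names are the ones from the
i915_drm.h uAPI, but the field layout is from memory and error handling is
omitted, so read it as an illustration rather than a reference:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int bond_after_the_fact(int fd, uint32_t ctx_id)
{
        struct i915_engine_class_instance master = {
                .engine_class    = I915_ENGINE_CLASS_VIDEO,
                .engine_instance = 0,
        };
        struct i915_engine_class_instance sibling = {
                .engine_class    = I915_ENGINE_CLASS_VIDEO,
                .engine_instance = 1,
        };

        /* Bond engines[1] (the balanced engine) to the master engine. */
        I915_DEFINE_CONTEXT_ENGINES_BOND(bond, 1) = {
                .base          = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
                .master        = master,
                .virtual_index = 1,
                .num_bonds     = 1,
                .engines       = { sibling },
        };

        /* engines[1] is a load-balanced engine with a single sibling. */
        I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(balance, 1) = {
                .base = {
                        .name           = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
                        .next_extension = (uintptr_t)&bond,
                },
                .engine_index = 1,
                .num_siblings = 1,
                .engines      = { sibling },
        };

        /* engines[0] is the master; engines[1] is the balanced slot. */
        I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
                .extensions = (uintptr_t)&balance,
                .engines    = {
                        [0] = master,
                        [1] = { I915_ENGINE_CLASS_INVALID,
                                I915_ENGINE_CLASS_INVALID_NONE },
                },
        };

        struct drm_i915_gem_context_param p = {
                .ctx_id = ctx_id,
                .param  = I915_CONTEXT_PARAM_ENGINES,
                .size   = sizeof(engines),
                .value  = (uintptr_t)&engines,
        };

        return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}

Nothing in the engine map itself says those two engines are meant to run
together; the BOND extension bolts that on afterwards, which is the part
that doesn't translate well into the proto-ctx world.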

Of course, that's all focused on the API and not the internals.  But,
again, I'm not sure how we want things to look internally.  What we've
got now doesn't seem great for the GuC submission model but I'm very
much not the expert there.  I don't want to be working at cross
purposes to you and I'm happy to leave bits if you think they're
useful.  But I thought I was clearing things away so that you can put
in what you actually want for GuC/parallel submit.

--Jason

> Matt
>
> > --Jason
> >
> > > At least my understanding is that we're only limiting the engine set
> > > further, so if both signaller and signalled request can only run on
> > > singletons (which must be distinct, or the bonded parameter validation is
> > > busted) there's really nothing to do here.
> > >
> > > Also this is the locking code that freaks me out about the current bonded
> > > execlist code ...
> > >
> > > Dazzled and confused.
> > > -Daniel
> > >
> > > >
> > > > --Jason
> > > >
> > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
Matthew Brost April 28, 2021, 5:55 p.m. UTC | #7
On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > On Wed, Apr 28, 2021 at 12:18:29PM -0500, Jason Ekstrand wrote:
> > > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > > On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > > > >
> > > > > > This adds a bunch of complexity which the media driver has never
> > > > > > actually used.  The media driver does technically bond a balanced engine
> > > > > > to another engine but the balanced engine only has one engine in the
> > > > > > sibling set.  This doesn't actually result in a virtual engine.
> > > > > >
> > > > > > Unless some userspace badly wants it, there's no good reason to support
> > > > > > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > > > > > leave the validation code in place in case we ever decide we want to do
> > > > > > something interesting with the bonding information.
> > > > > >
> > > > > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> > > > > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> > > > > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> > > > > >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> > > > > >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > > > > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> > > > > >  6 files changed, 7 insertions(+), 353 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > index e8179918fa306..5f8d0faf783aa 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > > >         }
> > > > > >         virtual = set->engines->engines[idx]->engine;
> > > > > >
> > > > > > +       if (intel_engine_is_virtual(virtual)) {
> > > > > > +               drm_dbg(&i915->drm,
> > > > > > +                       "Bonding with virtual engines not allowed\n");
> > > > > > +               return -EINVAL;
> > > > > > +       }
> > > > > > +
> > > > > >         err = check_user_mbz(&ext->flags);
> > > > > >         if (err)
> > > > > >                 return err;
> > > > > > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > > >                                 n, ci.engine_class, ci.engine_instance);
> > > > > >                         return -EINVAL;
> > > > > >                 }
> > > > > > -
> > > > > > -               /*
> > > > > > -                * A non-virtual engine has no siblings to choose between; and
> > > > > > -                * a submit fence will always be directed to the one engine.
> > > > > > -                */
> > > > > > -               if (intel_engine_is_virtual(virtual)) {
> > > > > > -                       err = intel_virtual_engine_attach_bond(virtual,
> > > > > > -                                                              master,
> > > > > > -                                                              bond);
> > > > > > -                       if (err)
> > > > > > -                               return err;
> > > > > > -               }
> > > > > >         }
> > > > > >
> > > > > >         return 0;
> > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > index d640bba6ad9ab..efb2fa3522a42 100644
> > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > > > > >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > > > > >                         err = i915_request_await_execution(eb.request,
> > > > > >                                                            in_fence,
> > > > > > -                                                          eb.engine->bond_execute);
> > > > > > +                                                          NULL);
> > > > > >                 else
> > > > > >                         err = i915_request_await_dma_fence(eb.request,
> > > > > >                                                            in_fence);
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > index 883bafc449024..68cfe5080325c 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> > > > > >          */
> > > > > >         void            (*submit_request)(struct i915_request *rq);
> > > > > >
> > > > > > -       /*
> > > > > > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > > > > > -        * request down to the bonded pairs.
> > > > > > -        */
> > > > > > -       void            (*bond_execute)(struct i915_request *rq,
> > > > > > -                                       struct dma_fence *signal);
> > > > > > -
> > > > > >         /*
> > > > > >          * Call when the priority on a request has changed and it and its
> > > > > >          * dependencies may need rescheduling. Note the request itself may
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > index de124870af44d..b6e2b59f133b7 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > @@ -181,18 +181,6 @@ struct virtual_engine {
> > > > > >                 int prio;
> > > > > >         } nodes[I915_NUM_ENGINES];
> > > > > >
> > > > > > -       /*
> > > > > > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > > > > > -        * of physical engines any particular request may be submitted to.
> > > > > > -        * If we receive a submit-fence from a master engine, we will only
> > > > > > -        * use one of sibling_mask physical engines.
> > > > > > -        */
> > > > > > -       struct ve_bond {
> > > > > > -               const struct intel_engine_cs *master;
> > > > > > -               intel_engine_mask_t sibling_mask;
> > > > > > -       } *bonds;
> > > > > > -       unsigned int num_bonds;
> > > > > > -
> > > > > >         /* And finally, which physical engines this virtual engine maps onto. */
> > > > > >         unsigned int num_siblings;
> > > > > >         struct intel_engine_cs *siblings[];
> > > > > > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > > > > >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> > > > > >         intel_engine_free_request_pool(&ve->base);
> > > > > >
> > > > > > -       kfree(ve->bonds);
> > > > > >         kfree(ve);
> > > > > >  }
> > > > > >
> > > > > > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> > > > > >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > > > > >  }
> > > > > >
> > > > > > -static struct ve_bond *
> > > > > > -virtual_find_bond(struct virtual_engine *ve,
> > > > > > -                 const struct intel_engine_cs *master)
> > > > > > -{
> > > > > > -       int i;
> > > > > > -
> > > > > > -       for (i = 0; i < ve->num_bonds; i++) {
> > > > > > -               if (ve->bonds[i].master == master)
> > > > > > -                       return &ve->bonds[i];
> > > > > > -       }
> > > > > > -
> > > > > > -       return NULL;
> > > > > > -}
> > > > > > -
> > > > > > -static void
> > > > > > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > > > > > -{
> > > > > > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > > > > > -       intel_engine_mask_t allowed, exec;
> > > > > > -       struct ve_bond *bond;
> > > > > > -
> > > > > > -       allowed = ~to_request(signal)->engine->mask;
> > > > > > -
> > > > > > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > > > > > -       if (bond)
> > > > > > -               allowed &= bond->sibling_mask;
> > > > > > -
> > > > > > -       /* Restrict the bonded request to run on only the available engines */
> > > > > > -       exec = READ_ONCE(rq->execution_mask);
> > > > > > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > > > > > -               ;
> > > > > > -
> > > > > > -       /* Prevent the master from being re-run on the bonded engines */
> > > > > > -       to_request(signal)->execution_mask &= ~allowed;
> > > > >
> > > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > > much code.  This function, in particular, has to stay, unfortunately.
> > > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > > the work onto a different engine than the one it's supposed to run
> > > > > in parallel with.  This means we can't dead-code this function or
> > > > > the bond_execute function pointer and related stuff.
> > > >
> > > > Uh that's disappointing, since if I understand your point correctly, the
> > > > sibling engines should all be singletons, not load balancing virtual ones.
> > > > So there really should not be any need to pick the right one at execution
> > > > time.
> > >
> > > The media driver itself seems to work fine if I delete all the code.
> > > It's just an IGT testcase that blows up.  I'll do more digging to see
> > > if I can better isolate why.
> > >
> >
> > Jumping on here mid-thread. For what it's worth, to make execlists work
> > with the upcoming parallel submission extension I leveraged some of the
> > existing bonding code, so I wouldn't be too eager to delete this code
> > until that lands.
> 
> Mind being a bit more specific about that?  The motivation for this
> patch is that the current bonding handling and uAPI is, well, very odd
> and confusing IMO.  It doesn't let you create sets of bonded engines.
> Instead you create engines and then bond them together after the fact.
> I didn't want to blindly duplicate those oddities with the proto-ctx
> stuff unless they were useful.  With parallel submit, I would expect
> we want a more explicit API where you specify a set of engine
> class/instance pairs to bond together into a single engine similar to
> how the current balancing API works.
> 
> Of course, that's all focused on the API and not the internals.  But,
> again, I'm not sure how we want things to look internally.  What we've
> got now doesn't seem great for the GuC submission model but I'm very
> much not the expert there.  I don't want to be working at cross
> purposes to you and I'm happy to leave bits if you think they're
> useful.  But I thought I was clearing things away so that you can put
> in what you actually want for GuC/parallel submit.
> 

Removing all the UAPI things is fine, but I wouldn't delete some of the
internal stuff (e.g. intel_virtual_engine_attach_bond, bond
intel_context_ops, the hook for a submit fence, etc.) as that will
still likely be used for the new parallel submission interface with
execlists. As you say, the new UAPI won't allow crazy configurations,
only simple ones.

Matt

> --Jason
> 
> > Matt
> >
> > > --Jason
> > >
> > > > At least my understanding is that we're only limiting the engine set
> > > > further, so if both signaller and signalled request can only run on
> > > > singletons (which must be distinct, or the bonded parameter validation is
> > > > busted) there's really nothing to do here.
> > > >
> > > > Also this is the locking code that freaks me out about the current bonded
> > > > execlist code ...
> > > >
> > > > Dazzled and confused.
> > > > -Daniel
> > > >
> > > > >
> > > > > --Jason
> > > > >
> > > > >
> > > > > > -}
> > > > > > -
> > > > > >  struct intel_context *
> > > > > >  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > >                                unsigned int count)
> > > > > > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > >
> > > > > >         ve->base.schedule = i915_schedule;
> > > > > >         ve->base.submit_request = virtual_submit_request;
> > > > > > -       ve->base.bond_execute = virtual_bond_execute;
> > > > > >
> > > > > >         INIT_LIST_HEAD(virtual_queue(ve));
> > > > > >         ve->base.execlists.queue_priority_hint = INT_MIN;
> > > > > > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> > > > > >         if (IS_ERR(dst))
> > > > > >                 return dst;
> > > > > >
> > > > > > -       if (se->num_bonds) {
> > > > > > -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> > > > > > -
> > > > > > -               de->bonds = kmemdup(se->bonds,
> > > > > > -                                   sizeof(*se->bonds) * se->num_bonds,
> > > > > > -                                   GFP_KERNEL);
> > > > > > -               if (!de->bonds) {
> > > > > > -                       intel_context_put(dst);
> > > > > > -                       return ERR_PTR(-ENOMEM);
> > > > > > -               }
> > > > > > -
> > > > > > -               de->num_bonds = se->num_bonds;
> > > > > > -       }
> > > > > > -
> > > > > >         return dst;
> > > > > >  }
> > > > > >
> > > > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > > > -                                    const struct intel_engine_cs *master,
> > > > > > -                                    const struct intel_engine_cs *sibling)
> > > > > > -{
> > > > > > -       struct virtual_engine *ve = to_virtual_engine(engine);
> > > > > > -       struct ve_bond *bond;
> > > > > > -       int n;
> > > > > > -
> > > > > > -       /* Sanity check the sibling is part of the virtual engine */
> > > > > > -       for (n = 0; n < ve->num_siblings; n++)
> > > > > > -               if (sibling == ve->siblings[n])
> > > > > > -                       break;
> > > > > > -       if (n == ve->num_siblings)
> > > > > > -               return -EINVAL;
> > > > > > -
> > > > > > -       bond = virtual_find_bond(ve, master);
> > > > > > -       if (bond) {
> > > > > > -               bond->sibling_mask |= sibling->mask;
> > > > > > -               return 0;
> > > > > > -       }
> > > > > > -
> > > > > > -       bond = krealloc(ve->bonds,
> > > > > > -                       sizeof(*bond) * (ve->num_bonds + 1),
> > > > > > -                       GFP_KERNEL);
> > > > > > -       if (!bond)
> > > > > > -               return -ENOMEM;
> > > > > > -
> > > > > > -       bond[ve->num_bonds].master = master;
> > > > > > -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> > > > > > -
> > > > > > -       ve->bonds = bond;
> > > > > > -       ve->num_bonds++;
> > > > > > -
> > > > > > -       return 0;
> > > > > > -}
> > > > > > -
> > > > > >  void intel_execlists_show_requests(struct intel_engine_cs *engine,
> > > > > >                                    struct drm_printer *m,
> > > > > >                                    void (*show_request)(struct drm_printer *m,
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > index fd61dae820e9e..80cec37a56ba9 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > >  struct intel_context *
> > > > > >  intel_execlists_clone_virtual(struct intel_engine_cs *src);
> > > > > >
> > > > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > > > -                                    const struct intel_engine_cs *master,
> > > > > > -                                    const struct intel_engine_cs *sibling);
> > > > > > -
> > > > > >  bool
> > > > > >  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > index 1081cd36a2bd3..f03446d587160 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> > > > > >         return 0;
> > > > > >  }
> > > > > >
> > > > > > -static int bond_virtual_engine(struct intel_gt *gt,
> > > > > > -                              unsigned int class,
> > > > > > -                              struct intel_engine_cs **siblings,
> > > > > > -                              unsigned int nsibling,
> > > > > > -                              unsigned int flags)
> > > > > > -#define BOND_SCHEDULE BIT(0)
> > > > > > -{
> > > > > > -       struct intel_engine_cs *master;
> > > > > > -       struct i915_request *rq[16];
> > > > > > -       enum intel_engine_id id;
> > > > > > -       struct igt_spinner spin;
> > > > > > -       unsigned long n;
> > > > > > -       int err;
> > > > > > -
> > > > > > -       /*
> > > > > > -        * A set of bonded requests is intended to be run concurrently
> > > > > > -        * across a number of engines. We use one request per-engine
> > > > > > -        * and a magic fence to schedule each of the bonded requests
> > > > > > -        * at the same time. A consequence of our current scheduler is that
> > > > > > -        * we only move requests to the HW ready queue when the request
> > > > > > -        * becomes ready, that is when all of its prerequisite fences have
> > > > > > -        * been signaled. As one of those fences is the master submit fence,
> > > > > > -        * there is a delay on all secondary fences as the HW may be
> > > > > > -        * currently busy. Equally, as all the requests are independent,
> > > > > > -        * they may have other fences that delay individual request
> > > > > > -        * submission to HW. Ergo, we do not guarantee that all requests are
> > > > > > -        * immediately submitted to HW at the same time, just that if the
> > > > > > -        * rules are abided by, they are ready at the same time as the
> > > > > > -        * first is submitted. Userspace can embed semaphores in its batch
> > > > > > -        * to ensure parallel execution of its phases as it requires.
> > > > > > -        * Though naturally it gets requested that perhaps the scheduler should
> > > > > > -        * take care of parallel execution, even across preemption events on
> > > > > > -        * different HW. (The proper answer is of course "lalalala".)
> > > > > > -        *
> > > > > > -        * With the submit-fence, we have identified three possible phases
> > > > > > -        * of synchronisation depending on the master fence: queued (not
> > > > > > -        * ready), executing, and signaled. The first two are quite simple
> > > > > > -        * and checked below. However, the signaled master fence handling is
> > > > > > -        * contentious. Currently we do not distinguish between a signaled
> > > > > > -        * fence and an expired fence, as once signaled it does not convey
> > > > > > -        * any information about the previous execution. It may even be freed
> > > > > > -        * and hence checking later it may not exist at all. Ergo we currently
> > > > > > -        * do not apply the bonding constraint for an already signaled fence,
> > > > > > -        * as our expectation is that it should not constrain the secondaries
> > > > > > -        * and is outside of the scope of the bonded request API (i.e. all
> > > > > > -        * userspace requests are meant to be running in parallel). As
> > > > > > -        * it imposes no constraint, and is effectively a no-op, we do not
> > > > > > -        * check below as normal execution flows are checked extensively above.
> > > > > > -        *
> > > > > > -        * XXX Is the degenerate handling of signaled submit fences the
> > > > > > -        * expected behaviour for userpace?
> > > > > > -        */
> > > > > > -
> > > > > > -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > > > > > -
> > > > > > -       if (igt_spinner_init(&spin, gt))
> > > > > > -               return -ENOMEM;
> > > > > > -
> > > > > > -       err = 0;
> > > > > > -       rq[0] = ERR_PTR(-ENOMEM);
> > > > > > -       for_each_engine(master, gt, id) {
> > > > > > -               struct i915_sw_fence fence = {};
> > > > > > -               struct intel_context *ce;
> > > > > > -
> > > > > > -               if (master->class == class)
> > > > > > -                       continue;
> > > > > > -
> > > > > > -               ce = intel_context_create(master);
> > > > > > -               if (IS_ERR(ce)) {
> > > > > > -                       err = PTR_ERR(ce);
> > > > > > -                       goto out;
> > > > > > -               }
> > > > > > -
> > > > > > -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > > > > > -
> > > > > > -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > > > > > -               intel_context_put(ce);
> > > > > > -               if (IS_ERR(rq[0])) {
> > > > > > -                       err = PTR_ERR(rq[0]);
> > > > > > -                       goto out;
> > > > > > -               }
> > > > > > -               i915_request_get(rq[0]);
> > > > > > -
> > > > > > -               if (flags & BOND_SCHEDULE) {
> > > > > > -                       onstack_fence_init(&fence);
> > > > > > -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > > > > > -                                                              &fence,
> > > > > > -                                                              GFP_KERNEL);
> > > > > > -               }
> > > > > > -
> > > > > > -               i915_request_add(rq[0]);
> > > > > > -               if (err < 0)
> > > > > > -                       goto out;
> > > > > > -
> > > > > > -               if (!(flags & BOND_SCHEDULE) &&
> > > > > > -                   !igt_wait_for_spinner(&spin, rq[0])) {
> > > > > > -                       err = -EIO;
> > > > > > -                       goto out;
> > > > > > -               }
> > > > > > -
> > > > > > -               for (n = 0; n < nsibling; n++) {
> > > > > > -                       struct intel_context *ve;
> > > > > > -
> > > > > > -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > > > -                       if (IS_ERR(ve)) {
> > > > > > -                               err = PTR_ERR(ve);
> > > > > > -                               onstack_fence_fini(&fence);
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -
> > > > > > -                       err = intel_virtual_engine_attach_bond(ve->engine,
> > > > > > -                                                              master,
> > > > > > -                                                              siblings[n]);
> > > > > > -                       if (err) {
> > > > > > -                               intel_context_put(ve);
> > > > > > -                               onstack_fence_fini(&fence);
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -
> > > > > > -                       err = intel_context_pin(ve);
> > > > > > -                       intel_context_put(ve);
> > > > > > -                       if (err) {
> > > > > > -                               onstack_fence_fini(&fence);
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -
> > > > > > -                       rq[n + 1] = i915_request_create(ve);
> > > > > > -                       intel_context_unpin(ve);
> > > > > > -                       if (IS_ERR(rq[n + 1])) {
> > > > > > -                               err = PTR_ERR(rq[n + 1]);
> > > > > > -                               onstack_fence_fini(&fence);
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -                       i915_request_get(rq[n + 1]);
> > > > > > -
> > > > > > -                       err = i915_request_await_execution(rq[n + 1],
> > > > > > -                                                          &rq[0]->fence,
> > > > > > -                                                          ve->engine->bond_execute);
> > > > > > -                       i915_request_add(rq[n + 1]);
> > > > > > -                       if (err < 0) {
> > > > > > -                               onstack_fence_fini(&fence);
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -               }
> > > > > > -               onstack_fence_fini(&fence);
> > > > > > -               intel_engine_flush_submission(master);
> > > > > > -               igt_spinner_end(&spin);
> > > > > > -
> > > > > > -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > > > > > -                       pr_err("Master request did not execute (on %s)!\n",
> > > > > > -                              rq[0]->engine->name);
> > > > > > -                       err = -EIO;
> > > > > > -                       goto out;
> > > > > > -               }
> > > > > > -
> > > > > > -               for (n = 0; n < nsibling; n++) {
> > > > > > -                       if (i915_request_wait(rq[n + 1], 0,
> > > > > > -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> > > > > > -                               err = -EIO;
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -
> > > > > > -                       if (rq[n + 1]->engine != siblings[n]) {
> > > > > > -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > > > > > -                                      siblings[n]->name,
> > > > > > -                                      rq[n + 1]->engine->name,
> > > > > > -                                      rq[0]->engine->name);
> > > > > > -                               err = -EINVAL;
> > > > > > -                               goto out;
> > > > > > -                       }
> > > > > > -               }
> > > > > > -
> > > > > > -               for (n = 0; !IS_ERR(rq[n]); n++)
> > > > > > -                       i915_request_put(rq[n]);
> > > > > > -               rq[0] = ERR_PTR(-ENOMEM);
> > > > > > -       }
> > > > > > -
> > > > > > -out:
> > > > > > -       for (n = 0; !IS_ERR(rq[n]); n++)
> > > > > > -               i915_request_put(rq[n]);
> > > > > > -       if (igt_flush_test(gt->i915))
> > > > > > -               err = -EIO;
> > > > > > -
> > > > > > -       igt_spinner_fini(&spin);
> > > > > > -       return err;
> > > > > > -}
> > > > > > -
> > > > > > -static int live_virtual_bond(void *arg)
> > > > > > -{
> > > > > > -       static const struct phase {
> > > > > > -               const char *name;
> > > > > > -               unsigned int flags;
> > > > > > -       } phases[] = {
> > > > > > -               { "", 0 },
> > > > > > -               { "schedule", BOND_SCHEDULE },
> > > > > > -               { },
> > > > > > -       };
> > > > > > -       struct intel_gt *gt = arg;
> > > > > > -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > > > > > -       unsigned int class;
> > > > > > -       int err;
> > > > > > -
> > > > > > -       if (intel_uc_uses_guc_submission(&gt->uc))
> > > > > > -               return 0;
> > > > > > -
> > > > > > -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > > > > > -               const struct phase *p;
> > > > > > -               int nsibling;
> > > > > > -
> > > > > > -               nsibling = select_siblings(gt, class, siblings);
> > > > > > -               if (nsibling < 2)
> > > > > > -                       continue;
> > > > > > -
> > > > > > -               for (p = phases; p->name; p++) {
> > > > > > -                       err = bond_virtual_engine(gt,
> > > > > > -                                                 class, siblings, nsibling,
> > > > > > -                                                 p->flags);
> > > > > > -                       if (err) {
> > > > > > -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > > > > > -                                      __func__, p->name, class, nsibling, err);
> > > > > > -                               return err;
> > > > > > -                       }
> > > > > > -               }
> > > > > > -       }
> > > > > > -
> > > > > > -       return 0;
> > > > > > -}
> > > > > > -
> > > > > >  static int reset_virtual_engine(struct intel_gt *gt,
> > > > > >                                 struct intel_engine_cs **siblings,
> > > > > >                                 unsigned int nsibling)
> > > > > > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> > > > > >                 SUBTEST(live_virtual_mask),
> > > > > >                 SUBTEST(live_virtual_preserved),
> > > > > >                 SUBTEST(live_virtual_slice),
> > > > > > -               SUBTEST(live_virtual_bond),
> > > > > >                 SUBTEST(live_virtual_reset),
> > > > > >         };
> > > > > >
> > > > > > --
> > > > > > 2.31.1
> > > > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
Jason Ekstrand April 28, 2021, 6:17 p.m. UTC | #8
On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 12:18:29PM -0500, Jason Ekstrand wrote:
> > > > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > > > On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > > > > >
> > > > > > > This adds a bunch of complexity which the media driver has never
> > > > > > > actually used.  The media driver does technically bond a balanced engine
> > > > > > > to another engine but the balanced engine only has one engine in the
> > > > > > > sibling set.  This doesn't actually result in a virtual engine.
> > > > > > >
> > > > > > > Unless some userspace badly wants it, there's no good reason to support
> > > > > > > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > > > > > > leave the validation code in place in case we ever decide we want to do
> > > > > > > something interesting with the bonding information.
> > > > > > >
> > > > > > > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> > > > > > >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> > > > > > >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> > > > > > >  .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> > > > > > >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > > > > > >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> > > > > > >  6 files changed, 7 insertions(+), 353 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > > index e8179918fa306..5f8d0faf783aa 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > > > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > > > >         }
> > > > > > >         virtual = set->engines->engines[idx]->engine;
> > > > > > >
> > > > > > > +       if (intel_engine_is_virtual(virtual)) {
> > > > > > > +               drm_dbg(&i915->drm,
> > > > > > > +                       "Bonding with virtual engines not allowed\n");
> > > > > > > +               return -EINVAL;
> > > > > > > +       }
> > > > > > > +
> > > > > > >         err = check_user_mbz(&ext->flags);
> > > > > > >         if (err)
> > > > > > >                 return err;
> > > > > > > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> > > > > > >                                 n, ci.engine_class, ci.engine_instance);
> > > > > > >                         return -EINVAL;
> > > > > > >                 }
> > > > > > > -
> > > > > > > -               /*
> > > > > > > -                * A non-virtual engine has no siblings to choose between; and
> > > > > > > -                * a submit fence will always be directed to the one engine.
> > > > > > > -                */
> > > > > > > -               if (intel_engine_is_virtual(virtual)) {
> > > > > > > -                       err = intel_virtual_engine_attach_bond(virtual,
> > > > > > > -                                                              master,
> > > > > > > -                                                              bond);
> > > > > > > -                       if (err)
> > > > > > > -                               return err;
> > > > > > > -               }
> > > > > > >         }
> > > > > > >
> > > > > > >         return 0;
> > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > index d640bba6ad9ab..efb2fa3522a42 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > > > > > > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > > > > > >                 if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > > > > > >                         err = i915_request_await_execution(eb.request,
> > > > > > >                                                            in_fence,
> > > > > > > -                                                          eb.engine->bond_execute);
> > > > > > > +                                                          NULL);
> > > > > > >                 else
> > > > > > >                         err = i915_request_await_dma_fence(eb.request,
> > > > > > >                                                            in_fence);
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > > index 883bafc449024..68cfe5080325c 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > > > > > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> > > > > > >          */
> > > > > > >         void            (*submit_request)(struct i915_request *rq);
> > > > > > >
> > > > > > > -       /*
> > > > > > > -        * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > > > > > > -        * request down to the bonded pairs.
> > > > > > > -        */
> > > > > > > -       void            (*bond_execute)(struct i915_request *rq,
> > > > > > > -                                       struct dma_fence *signal);
> > > > > > > -
> > > > > > >         /*
> > > > > > >          * Call when the priority on a request has changed and it and its
> > > > > > >          * dependencies may need rescheduling. Note the request itself may
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > > index de124870af44d..b6e2b59f133b7 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > > > > @@ -181,18 +181,6 @@ struct virtual_engine {
> > > > > > >                 int prio;
> > > > > > >         } nodes[I915_NUM_ENGINES];
> > > > > > >
> > > > > > > -       /*
> > > > > > > -        * Keep track of bonded pairs -- restrictions upon on our selection
> > > > > > > -        * of physical engines any particular request may be submitted to.
> > > > > > > -        * If we receive a submit-fence from a master engine, we will only
> > > > > > > -        * use one of sibling_mask physical engines.
> > > > > > > -        */
> > > > > > > -       struct ve_bond {
> > > > > > > -               const struct intel_engine_cs *master;
> > > > > > > -               intel_engine_mask_t sibling_mask;
> > > > > > > -       } *bonds;
> > > > > > > -       unsigned int num_bonds;
> > > > > > > -
> > > > > > >         /* And finally, which physical engines this virtual engine maps onto. */
> > > > > > >         unsigned int num_siblings;
> > > > > > >         struct intel_engine_cs *siblings[];
> > > > > > > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> > > > > > >         intel_breadcrumbs_free(ve->base.breadcrumbs);
> > > > > > >         intel_engine_free_request_pool(&ve->base);
> > > > > > >
> > > > > > > -       kfree(ve->bonds);
> > > > > > >         kfree(ve);
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> > > > > > >         spin_unlock_irqrestore(&ve->base.active.lock, flags);
> > > > > > >  }
> > > > > > >
> > > > > > > -static struct ve_bond *
> > > > > > > -virtual_find_bond(struct virtual_engine *ve,
> > > > > > > -                 const struct intel_engine_cs *master)
> > > > > > > -{
> > > > > > > -       int i;
> > > > > > > -
> > > > > > > -       for (i = 0; i < ve->num_bonds; i++) {
> > > > > > > -               if (ve->bonds[i].master == master)
> > > > > > > -                       return &ve->bonds[i];
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       return NULL;
> > > > > > > -}
> > > > > > > -
> > > > > > > -static void
> > > > > > > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > > > > > > -{
> > > > > > > -       struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > > > > > > -       intel_engine_mask_t allowed, exec;
> > > > > > > -       struct ve_bond *bond;
> > > > > > > -
> > > > > > > -       allowed = ~to_request(signal)->engine->mask;
> > > > > > > -
> > > > > > > -       bond = virtual_find_bond(ve, to_request(signal)->engine);
> > > > > > > -       if (bond)
> > > > > > > -               allowed &= bond->sibling_mask;
> > > > > > > -
> > > > > > > -       /* Restrict the bonded request to run on only the available engines */
> > > > > > > -       exec = READ_ONCE(rq->execution_mask);
> > > > > > > -       while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > > > > > > -               ;
> > > > > > > -
> > > > > > > -       /* Prevent the master from being re-run on the bonded engines */
> > > > > > > -       to_request(signal)->execution_mask &= ~allowed;
> > > > > >
> > > > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > > > much code.  This function, in particular, has to stay, unfortunately.
> > > > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > > > the work onto a different engine than the one it's supposed to
> > > > > > run in parallel with.  This means we can't dead-code this function or
> > > > > > the bond_execution function pointer and related stuff.
> > > > >
> > > > > Uh that's disappointing, since if I understand your point correctly, the
> > > > > sibling engines should all be singletons, not load balancing virtual ones.
> > > > > So there really should not be any need to pick the right one at execution
> > > > > time.
> > > >
> > > > The media driver itself seems to work fine if I delete all the code.
> > > > It's just an IGT testcase that blows up.  I'll do more digging to see
> > > > if I can better isolate why.
> > > >
> > >
> > > Jumping on here mid-thread. For what it is worth, to make execlists work
> > > with the upcoming parallel submission extension I leveraged some of the
> > > existing bonding code so I wouldn't be too eager to delete this code
> > > until that lands.
> >
> > Mind being a bit more specific about that?  The motivation for this
> > patch is that the current bonding handling and uAPI is, well, very odd
> > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > Instead you create engines and then bond them together after the fact.
> > I didn't want to blindly duplicate those oddities with the proto-ctx
> > stuff unless they were useful.  With parallel submit, I would expect
> > we want a more explicit API where you specify a set of engine
> > class/instance pairs to bond together into a single engine similar to
> > how the current balancing API works.
> >
> > Of course, that's all focused on the API and not the internals.  But,
> > again, I'm not sure how we want things to look internally.  What we've
> > got now doesn't seem great for the GuC submission model but I'm very
> > much not the expert there.  I don't want to be working at cross
> > purposes to you and I'm happy to leave bits if you think they're
> > useful.  But I thought I was clearing things away so that you can put
> > in what you actually want for GuC/parallel submit.
> >
>
> > Removing all the UAPI things is fine but I wouldn't delete some of the
> internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> intel_context_ops, the hook for a submit fence, etc...) as that will
> still likely be used for the new parallel submission interface with
> > execlists. As you say, the new UAPI won't allow crazy configurations,
> only simple ones.

I'm fine with leaving some of the internal bits for a little while if
it makes pulling the GuC scheduler in easier.  I'm just a bit
skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
thoughts?

--Jason

> Matt
>
> > --Jason
> >
> > > Matt
> > >
> > > > --Jason
> > > >
> > > > > At least my understanding is that we're only limiting the engine set
> > > > > further, so if both signaller and signalled request can only run on
> > > > > singletons (which must be distinct, or the bonded parameter validation is
> > > > > busted) there's really nothing to do here.
> > > > >
> > > > > Also this is the locking code that freaks me out about the current bonded
> > > > > execlist code ...
> > > > >
> > > > > Dazzled and confused.
> > > > > -Daniel
> > > > >
> > > > > >
> > > > > > --Jason
> > > > > >
> > > > > >
> > > > > > > -}
> > > > > > > -
> > > > > > >  struct intel_context *
> > > > > > >  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > > >                                unsigned int count)
> > > > > > > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > > >
> > > > > > >         ve->base.schedule = i915_schedule;
> > > > > > >         ve->base.submit_request = virtual_submit_request;
> > > > > > > -       ve->base.bond_execute = virtual_bond_execute;
> > > > > > >
> > > > > > >         INIT_LIST_HEAD(virtual_queue(ve));
> > > > > > >         ve->base.execlists.queue_priority_hint = INT_MIN;
> > > > > > > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> > > > > > >         if (IS_ERR(dst))
> > > > > > >                 return dst;
> > > > > > >
> > > > > > > -       if (se->num_bonds) {
> > > > > > > -               struct virtual_engine *de = to_virtual_engine(dst->engine);
> > > > > > > -
> > > > > > > -               de->bonds = kmemdup(se->bonds,
> > > > > > > -                                   sizeof(*se->bonds) * se->num_bonds,
> > > > > > > -                                   GFP_KERNEL);
> > > > > > > -               if (!de->bonds) {
> > > > > > > -                       intel_context_put(dst);
> > > > > > > -                       return ERR_PTR(-ENOMEM);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               de->num_bonds = se->num_bonds;
> > > > > > > -       }
> > > > > > > -
> > > > > > >         return dst;
> > > > > > >  }
> > > > > > >
> > > > > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > > > > -                                    const struct intel_engine_cs *master,
> > > > > > > -                                    const struct intel_engine_cs *sibling)
> > > > > > > -{
> > > > > > > -       struct virtual_engine *ve = to_virtual_engine(engine);
> > > > > > > -       struct ve_bond *bond;
> > > > > > > -       int n;
> > > > > > > -
> > > > > > > -       /* Sanity check the sibling is part of the virtual engine */
> > > > > > > -       for (n = 0; n < ve->num_siblings; n++)
> > > > > > > -               if (sibling == ve->siblings[n])
> > > > > > > -                       break;
> > > > > > > -       if (n == ve->num_siblings)
> > > > > > > -               return -EINVAL;
> > > > > > > -
> > > > > > > -       bond = virtual_find_bond(ve, master);
> > > > > > > -       if (bond) {
> > > > > > > -               bond->sibling_mask |= sibling->mask;
> > > > > > > -               return 0;
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       bond = krealloc(ve->bonds,
> > > > > > > -                       sizeof(*bond) * (ve->num_bonds + 1),
> > > > > > > -                       GFP_KERNEL);
> > > > > > > -       if (!bond)
> > > > > > > -               return -ENOMEM;
> > > > > > > -
> > > > > > > -       bond[ve->num_bonds].master = master;
> > > > > > > -       bond[ve->num_bonds].sibling_mask = sibling->mask;
> > > > > > > -
> > > > > > > -       ve->bonds = bond;
> > > > > > > -       ve->num_bonds++;
> > > > > > > -
> > > > > > > -       return 0;
> > > > > > > -}
> > > > > > > -
> > > > > > >  void intel_execlists_show_requests(struct intel_engine_cs *engine,
> > > > > > >                                    struct drm_printer *m,
> > > > > > >                                    void (*show_request)(struct drm_printer *m,
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > > index fd61dae820e9e..80cec37a56ba9 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > > > > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > > > >  struct intel_context *
> > > > > > >  intel_execlists_clone_virtual(struct intel_engine_cs *src);
> > > > > > >
> > > > > > > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > > > > > > -                                    const struct intel_engine_cs *master,
> > > > > > > -                                    const struct intel_engine_cs *sibling);
> > > > > > > -
> > > > > > >  bool
> > > > > > >  intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > > index 1081cd36a2bd3..f03446d587160 100644
> > > > > > > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > > > > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> > > > > > >         return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > -static int bond_virtual_engine(struct intel_gt *gt,
> > > > > > > -                              unsigned int class,
> > > > > > > -                              struct intel_engine_cs **siblings,
> > > > > > > -                              unsigned int nsibling,
> > > > > > > -                              unsigned int flags)
> > > > > > > -#define BOND_SCHEDULE BIT(0)
> > > > > > > -{
> > > > > > > -       struct intel_engine_cs *master;
> > > > > > > -       struct i915_request *rq[16];
> > > > > > > -       enum intel_engine_id id;
> > > > > > > -       struct igt_spinner spin;
> > > > > > > -       unsigned long n;
> > > > > > > -       int err;
> > > > > > > -
> > > > > > > -       /*
> > > > > > > -        * A set of bonded requests is intended to be run concurrently
> > > > > > > -        * across a number of engines. We use one request per-engine
> > > > > > > -        * and a magic fence to schedule each of the bonded requests
> > > > > > > -        * at the same time. A consequence of our current scheduler is that
> > > > > > > -        * we only move requests to the HW ready queue when the request
> > > > > > > -        * becomes ready, that is when all of its prerequisite fences have
> > > > > > > -        * been signaled. As one of those fences is the master submit fence,
> > > > > > > -        * there is a delay on all secondary fences as the HW may be
> > > > > > > -        * currently busy. Equally, as all the requests are independent,
> > > > > > > -        * they may have other fences that delay individual request
> > > > > > > -        * submission to HW. Ergo, we do not guarantee that all requests are
> > > > > > > -        * immediately submitted to HW at the same time, just that if the
> > > > > > > -        * rules are abided by, they are ready at the same time as the
> > > > > > > -        * first is submitted. Userspace can embed semaphores in its batch
> > > > > > > -        * to ensure parallel execution of its phases as it requires.
> > > > > > > -        * Though naturally it gets requested that perhaps the scheduler should
> > > > > > > -        * take care of parallel execution, even across preemption events on
> > > > > > > -        * different HW. (The proper answer is of course "lalalala".)
> > > > > > > -        *
> > > > > > > -        * With the submit-fence, we have identified three possible phases
> > > > > > > -        * of synchronisation depending on the master fence: queued (not
> > > > > > > -        * ready), executing, and signaled. The first two are quite simple
> > > > > > > -        * and checked below. However, the signaled master fence handling is
> > > > > > > -        * contentious. Currently we do not distinguish between a signaled
> > > > > > > -        * fence and an expired fence, as once signaled it does not convey
> > > > > > > -        * any information about the previous execution. It may even be freed
> > > > > > > -        * and hence checking later it may not exist at all. Ergo we currently
> > > > > > > -        * do not apply the bonding constraint for an already signaled fence,
> > > > > > > -        * as our expectation is that it should not constrain the secondaries
> > > > > > > -        * and is outside of the scope of the bonded request API (i.e. all
> > > > > > > -        * userspace requests are meant to be running in parallel). As
> > > > > > > -        * it imposes no constraint, and is effectively a no-op, we do not
> > > > > > > -        * check below as normal execution flows are checked extensively above.
> > > > > > > -        *
> > > > > > > -        * XXX Is the degenerate handling of signaled submit fences the
> > > > > > > -        * expected behaviour for userpace?
> > > > > > > -        */
> > > > > > > -
> > > > > > > -       GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > > > > > > -
> > > > > > > -       if (igt_spinner_init(&spin, gt))
> > > > > > > -               return -ENOMEM;
> > > > > > > -
> > > > > > > -       err = 0;
> > > > > > > -       rq[0] = ERR_PTR(-ENOMEM);
> > > > > > > -       for_each_engine(master, gt, id) {
> > > > > > > -               struct i915_sw_fence fence = {};
> > > > > > > -               struct intel_context *ce;
> > > > > > > -
> > > > > > > -               if (master->class == class)
> > > > > > > -                       continue;
> > > > > > > -
> > > > > > > -               ce = intel_context_create(master);
> > > > > > > -               if (IS_ERR(ce)) {
> > > > > > > -                       err = PTR_ERR(ce);
> > > > > > > -                       goto out;
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > > > > > > -
> > > > > > > -               rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > > > > > > -               intel_context_put(ce);
> > > > > > > -               if (IS_ERR(rq[0])) {
> > > > > > > -                       err = PTR_ERR(rq[0]);
> > > > > > > -                       goto out;
> > > > > > > -               }
> > > > > > > -               i915_request_get(rq[0]);
> > > > > > > -
> > > > > > > -               if (flags & BOND_SCHEDULE) {
> > > > > > > -                       onstack_fence_init(&fence);
> > > > > > > -                       err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > > > > > > -                                                              &fence,
> > > > > > > -                                                              GFP_KERNEL);
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               i915_request_add(rq[0]);
> > > > > > > -               if (err < 0)
> > > > > > > -                       goto out;
> > > > > > > -
> > > > > > > -               if (!(flags & BOND_SCHEDULE) &&
> > > > > > > -                   !igt_wait_for_spinner(&spin, rq[0])) {
> > > > > > > -                       err = -EIO;
> > > > > > > -                       goto out;
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               for (n = 0; n < nsibling; n++) {
> > > > > > > -                       struct intel_context *ve;
> > > > > > > -
> > > > > > > -                       ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > > > > -                       if (IS_ERR(ve)) {
> > > > > > > -                               err = PTR_ERR(ve);
> > > > > > > -                               onstack_fence_fini(&fence);
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -
> > > > > > > -                       err = intel_virtual_engine_attach_bond(ve->engine,
> > > > > > > -                                                              master,
> > > > > > > -                                                              siblings[n]);
> > > > > > > -                       if (err) {
> > > > > > > -                               intel_context_put(ve);
> > > > > > > -                               onstack_fence_fini(&fence);
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -
> > > > > > > -                       err = intel_context_pin(ve);
> > > > > > > -                       intel_context_put(ve);
> > > > > > > -                       if (err) {
> > > > > > > -                               onstack_fence_fini(&fence);
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -
> > > > > > > -                       rq[n + 1] = i915_request_create(ve);
> > > > > > > -                       intel_context_unpin(ve);
> > > > > > > -                       if (IS_ERR(rq[n + 1])) {
> > > > > > > -                               err = PTR_ERR(rq[n + 1]);
> > > > > > > -                               onstack_fence_fini(&fence);
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -                       i915_request_get(rq[n + 1]);
> > > > > > > -
> > > > > > > -                       err = i915_request_await_execution(rq[n + 1],
> > > > > > > -                                                          &rq[0]->fence,
> > > > > > > -                                                          ve->engine->bond_execute);
> > > > > > > -                       i915_request_add(rq[n + 1]);
> > > > > > > -                       if (err < 0) {
> > > > > > > -                               onstack_fence_fini(&fence);
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -               }
> > > > > > > -               onstack_fence_fini(&fence);
> > > > > > > -               intel_engine_flush_submission(master);
> > > > > > > -               igt_spinner_end(&spin);
> > > > > > > -
> > > > > > > -               if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > > > > > > -                       pr_err("Master request did not execute (on %s)!\n",
> > > > > > > -                              rq[0]->engine->name);
> > > > > > > -                       err = -EIO;
> > > > > > > -                       goto out;
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               for (n = 0; n < nsibling; n++) {
> > > > > > > -                       if (i915_request_wait(rq[n + 1], 0,
> > > > > > > -                                             MAX_SCHEDULE_TIMEOUT) < 0) {
> > > > > > > -                               err = -EIO;
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -
> > > > > > > -                       if (rq[n + 1]->engine != siblings[n]) {
> > > > > > > -                               pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > > > > > > -                                      siblings[n]->name,
> > > > > > > -                                      rq[n + 1]->engine->name,
> > > > > > > -                                      rq[0]->engine->name);
> > > > > > > -                               err = -EINVAL;
> > > > > > > -                               goto out;
> > > > > > > -                       }
> > > > > > > -               }
> > > > > > > -
> > > > > > > -               for (n = 0; !IS_ERR(rq[n]); n++)
> > > > > > > -                       i915_request_put(rq[n]);
> > > > > > > -               rq[0] = ERR_PTR(-ENOMEM);
> > > > > > > -       }
> > > > > > > -
> > > > > > > -out:
> > > > > > > -       for (n = 0; !IS_ERR(rq[n]); n++)
> > > > > > > -               i915_request_put(rq[n]);
> > > > > > > -       if (igt_flush_test(gt->i915))
> > > > > > > -               err = -EIO;
> > > > > > > -
> > > > > > > -       igt_spinner_fini(&spin);
> > > > > > > -       return err;
> > > > > > > -}
> > > > > > > -
> > > > > > > -static int live_virtual_bond(void *arg)
> > > > > > > -{
> > > > > > > -       static const struct phase {
> > > > > > > -               const char *name;
> > > > > > > -               unsigned int flags;
> > > > > > > -       } phases[] = {
> > > > > > > -               { "", 0 },
> > > > > > > -               { "schedule", BOND_SCHEDULE },
> > > > > > > -               { },
> > > > > > > -       };
> > > > > > > -       struct intel_gt *gt = arg;
> > > > > > > -       struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > > > > > > -       unsigned int class;
> > > > > > > -       int err;
> > > > > > > -
> > > > > > > -       if (intel_uc_uses_guc_submission(&gt->uc))
> > > > > > > -               return 0;
> > > > > > > -
> > > > > > > -       for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > > > > > > -               const struct phase *p;
> > > > > > > -               int nsibling;
> > > > > > > -
> > > > > > > -               nsibling = select_siblings(gt, class, siblings);
> > > > > > > -               if (nsibling < 2)
> > > > > > > -                       continue;
> > > > > > > -
> > > > > > > -               for (p = phases; p->name; p++) {
> > > > > > > -                       err = bond_virtual_engine(gt,
> > > > > > > -                                                 class, siblings, nsibling,
> > > > > > > -                                                 p->flags);
> > > > > > > -                       if (err) {
> > > > > > > -                               pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > > > > > > -                                      __func__, p->name, class, nsibling, err);
> > > > > > > -                               return err;
> > > > > > > -                       }
> > > > > > > -               }
> > > > > > > -       }
> > > > > > > -
> > > > > > > -       return 0;
> > > > > > > -}
> > > > > > > -
> > > > > > >  static int reset_virtual_engine(struct intel_gt *gt,
> > > > > > >                                 struct intel_engine_cs **siblings,
> > > > > > >                                 unsigned int nsibling)
> > > > > > > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> > > > > > >                 SUBTEST(live_virtual_mask),
> > > > > > >                 SUBTEST(live_virtual_preserved),
> > > > > > >                 SUBTEST(live_virtual_slice),
> > > > > > > -               SUBTEST(live_virtual_bond),
> > > > > > >                 SUBTEST(live_virtual_reset),
> > > > > > >         };
> > > > > > >
> > > > > > > --
> > > > > > > 2.31.1
> > > > > > >
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
Jason Ekstrand April 28, 2021, 6:58 p.m. UTC | #9
On Wed, Apr 28, 2021 at 12:18 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > much code.  This function, in particular, has to stay, unfortunately.
> > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > the work onto a different engine than the one it's supposed to
> > > run in parallel with.  This means we can't dead-code this function or
> > > the bond_execution function pointer and related stuff.
> >
> > Uh that's disappointing, since if I understand your point correctly, the
> > sibling engines should all be singletons, not load balancing virtual ones.
> > So there really should not be any need to pick the right one at execution
> > time.
>
> The media driver itself seems to work fine if I delete all the code.
> It's just an IGT testcase that blows up.  I'll do more digging to see
> if I can better isolate why.

I did more digging and I figured out why this test hangs.  The test
looks at an engine class where there's more than one of that class
(currently only vcs) and creates a context where engine[0] is all of
the engines of that class bonded together and engine[1-N] is each of
those engines individually.  It then tests that you can submit a batch
to one of the individual engines and then submit with
EXEC_FENCE_SUBMIT to the balanced engine and the kernel will sort it
out.  This doesn't seem like a use-case we care about.
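
Roughly the submission pattern that test exercises, as a standalone sketch
rather than the actual IGT code: it assumes a context whose engine map
already has the balanced engine at slot 0 and the individual vcs engines at
slots 1..N, reuses one hypothetical object list for both batches, and leaves
out all buffer/batch setup and error handling.

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Submit one batch to an individual engine (engine map slot 1) and a second
 * batch to the balanced engine (slot 0), tied together with
 * EXEC_FENCE_SUBMIT so the second is held back until the first starts. */
static int submit_bonded_pair(int fd, uint32_t ctx_id,
                              struct drm_i915_gem_exec_object2 *objs,
                              unsigned int nobj)
{
        struct drm_i915_gem_execbuffer2 eb;
        int submit_fence, ret;

        memset(&eb, 0, sizeof(eb));
        eb.buffers_ptr = (uintptr_t)objs;
        eb.buffer_count = nobj;
        eb.rsvd1 = ctx_id;

        /* First batch: engine map slot 1, export an out-fence for pairing. */
        eb.flags = 1 | I915_EXEC_FENCE_OUT;
        if (ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb))
                return -1;
        submit_fence = eb.rsvd2 >> 32;

        /* Second batch: the balanced engine at slot 0, gated on the first
         * batch starting execution rather than completing. */
        eb.flags = 0 | I915_EXEC_FENCE_SUBMIT;
        eb.rsvd2 = submit_fence;
        ret = ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);

        close(submit_fence);
        return ret;
}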

If we cared about anything, I would expect it to be submitting to two
balanced contexts and expecting "pick any two" behavior.  But that's
not what the test is testing for.

--Jason
Daniel Vetter April 29, 2021, 12:14 p.m. UTC | #10
On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > Jumping on here mid-thread. For what it is worth, to make execlists work
> > > > with the upcoming parallel submission extension I leveraged some of the
> > > > existing bonding code so I wouldn't be too eager to delete this code
> > > > until that lands.
> > >
> > > Mind being a bit more specific about that?  The motivation for this
> > > patch is that the current bonding handling and uAPI is, well, very odd
> > > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > > Instead you create engines and then bond them together after the fact.
> > > I didn't want to blindly duplicate those oddities with the proto-ctx
> > > stuff unless they were useful.  With parallel submit, I would expect
> > > we want a more explicit API where you specify a set of engine
> > > class/instance pairs to bond together into a single engine similar to
> > > how the current balancing API works.
> > >
> > > Of course, that's all focused on the API and not the internals.  But,
> > > again, I'm not sure how we want things to look internally.  What we've
> > > got now doesn't seem great for the GuC submission model but I'm very
> > > much not the expert there.  I don't want to be working at cross
> > > purposes to you and I'm happy to leave bits if you think they're
> > > useful.  But I thought I was clearing things away so that you can put
> > > in what you actually want for GuC/parallel submit.
> > >
> >
> > Removing all the UAPI things is fine but I wouldn't delete some of the
> > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > intel_context_ops, the hook for a submit fence, etc...) as that will
> > still likely be used for the new parallel submission interface with
> > execlists. As you say, the new UAPI won't allow crazy configurations,
> > only simple ones.
> 
> I'm fine with leaving some of the internal bits for a little while if
> it makes pulling the GuC scheduler in easier.  I'm just a bit
> skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> thoughts?

Yeah I'm also wondering why we need this. Essentially your insight (which
Tony Ye from the media team confirmed) is that the media umd never uses
bonding on virtual engines.

So the only thing we need is the await_fence submit_fence logic to stall
the subsequent patches just long enough. I think that stays.

All the additional logic with the cmpxchg lockless trickery and all that
isn't needed, because we _never_ have to select an engine for bonded
submission: It's always the single one available.
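
As a tiny standalone illustration of that point (not driver code, and the
engine bit positions are made up): with a single sibling, the mask
arithmetic from the deleted virtual_bond_execute() quoted earlier in the
thread can never take anything away.

#include <assert.h>
#include <stdint.h>

int main(void)
{
        const uint32_t RCS0 = 1u << 0;  /* the master request's engine */
        const uint32_t VCS0 = 1u << 1;  /* the one and only sibling */

        uint32_t execution_mask = VCS0; /* singleton "virtual" engine */
        uint32_t sibling_mask = VCS0;   /* bond restricted to that sibling */
        uint32_t allowed = ~RCS0;       /* everything but the master */

        allowed &= sibling_mask;        /* -> VCS0 */
        execution_mask &= allowed;      /* -> VCS0, i.e. unchanged */

        assert(execution_mask == VCS0);
        return 0;
}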

This would mean that for execlist parallel submit we can apply a
limitation (beyond what GuC supports perhaps) and it's all ok. With that
everything except the submit fence await logic itself can go I think.

Also one for Matt: We decided to ZBB implementing parallel submit on
execlists; it's going to be just for GuC. At least until someone starts
screaming really loudly.

Cheers, Daniel
Daniel Vetter April 29, 2021, 12:16 p.m. UTC | #11
On Wed, Apr 28, 2021 at 01:58:17PM -0500, Jason Ekstrand wrote:
> On Wed, Apr 28, 2021 at 12:18 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > much code.  This function, in particular, has to stay, unfortunately.
> > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > the work onto a different engine than the one it's supposed to
> > > > run in parallel with.  This means we can't dead-code this function or
> > > > the bond_execution function pointer and related stuff.
> > >
> > > Uh that's disappointing, since if I understand your point correctly, the
> > > sibling engines should all be singletons, not load balancing virtual ones.
> > > So there really should not be any need to pick the right one at execution
> > > time.
> >
> > The media driver itself seems to work fine if I delete all the code.
> > It's just an IGT testcase that blows up.  I'll do more digging to see
> > if I can better isolate why.
> 
> I did more digging and I figured out why this test hangs.  The test
> looks at an engine class where there's more than one of that class
> (currently only vcs) and creates a context where engine[0] is all of
> the engines of that class bonded together and engine[1-N] is each of
> those engines individually.  It then tests that you can submit a batch
> to one of the individual engines and then submit with
> EXEC_FENCE_SUBMIT to the balanced engine and the kernel will sort it
> out.  This doesn't seem like a use-case we care about.
> 
> If we cared about anything, I would expect it to be submitting to two
> balanced contexts and expecting "pick any two" behavior.  But that's
> not what the test is testing for.

Yeah ditch it.

Instead make sure that the bonded setparam/ctx validation checks that (a
rough sketch of the first two follows the list):
1) no virtual engines are used
2) no engine used twice
3) anything else stupid you can come up with that we should make sure is
blocked.
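
Check 1) is essentially the intel_engine_is_virtual() rejection the patch
already adds in set_engines__bond(). A rough standalone sketch of what
check 2) amounts to, using a struct that just mirrors the uapi engine
class/instance pair; this is illustration only, not the driver's actual
validation code.

#include <stdbool.h>
#include <stdint.h>

/* Mirrors the uapi engine class/instance pair, for illustration only. */
struct class_instance {
        uint16_t engine_class;
        uint16_t engine_instance;
};

/* Check 2): reject a bond list naming the same physical engine twice. */
static bool bond_list_has_duplicates(const struct class_instance *bonds,
                                     unsigned int num_bonds)
{
        unsigned int i, j;

        for (i = 0; i < num_bonds; i++)
                for (j = i + 1; j < num_bonds; j++)
                        if (bonds[i].engine_class == bonds[j].engine_class &&
                            bonds[i].engine_instance == bonds[j].engine_instance)
                                return true;

        return false;
}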

Cheers, Daniel
Daniel Vetter April 29, 2021, 12:24 p.m. UTC | #12
On Wed, Apr 28, 2021 at 04:51:19PM +0100, Tvrtko Ursulin wrote:
> 
> On 23/04/2021 23:31, Jason Ekstrand wrote:
> > This adds a bunch of complexity which the media driver has never
> > actually used.  The media driver does technically bond a balanced engine
> > to another engine but the balanced engine only has one engine in the
> > sibling set.  This doesn't actually result in a virtual engine.
> 
> For historical reference, this is not because uapi was over-engineered but
> because certain SKUs never materialized.

Jason said that for SKUs with lots of media engines the media driver sets
up a set of contexts in userspace with all the pairings (and I guess then
load balances in userspace or something like that). Tony Ye also seems to
have confirmed that. So I'm not clear on which SKU this is.

Or maybe the real deal is only future platforms, and there we have the GuC
scheduler backend.

Not against adding a bit more context to the commit message, but we need
to make sure what we put there is actually correct. Maybe best to ask
Tony/Carl as part of getting an ack from them.
-Daniel

> 
> Regards,
> 
> Tvrtko
> 
> > Unless some userspace badly wants it, there's no good reason to support
> > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > leave the validation code in place in case we ever decide we want to do
> > something interesting with the bonding information.
> > 
> > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> >   .../drm/i915/gt/intel_execlists_submission.c  | 100 --------
> >   .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> >   drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 ------------------
> >   6 files changed, 7 insertions(+), 353 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index e8179918fa306..5f8d0faf783aa 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> >   	}
> >   	virtual = set->engines->engines[idx]->engine;
> > +	if (intel_engine_is_virtual(virtual)) {
> > +		drm_dbg(&i915->drm,
> > +			"Bonding with virtual engines not allowed\n");
> > +		return -EINVAL;
> > +	}
> > +
> >   	err = check_user_mbz(&ext->flags);
> >   	if (err)
> >   		return err;
> > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
> >   				n, ci.engine_class, ci.engine_instance);
> >   			return -EINVAL;
> >   		}
> > -
> > -		/*
> > -		 * A non-virtual engine has no siblings to choose between; and
> > -		 * a submit fence will always be directed to the one engine.
> > -		 */
> > -		if (intel_engine_is_virtual(virtual)) {
> > -			err = intel_virtual_engine_attach_bond(virtual,
> > -							       master,
> > -							       bond);
> > -			if (err)
> > -				return err;
> > -		}
> >   	}
> >   	return 0;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index d640bba6ad9ab..efb2fa3522a42 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >   		if (args->flags & I915_EXEC_FENCE_SUBMIT)
> >   			err = i915_request_await_execution(eb.request,
> >   							   in_fence,
> > -							   eb.engine->bond_execute);
> > +							   NULL);
> >   		else
> >   			err = i915_request_await_dma_fence(eb.request,
> >   							   in_fence);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 883bafc449024..68cfe5080325c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> >   	 */
> >   	void		(*submit_request)(struct i915_request *rq);
> > -	/*
> > -	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> > -	 * request down to the bonded pairs.
> > -	 */
> > -	void            (*bond_execute)(struct i915_request *rq,
> > -					struct dma_fence *signal);
> > -
> >   	/*
> >   	 * Call when the priority on a request has changed and it and its
> >   	 * dependencies may need rescheduling. Note the request itself may
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index de124870af44d..b6e2b59f133b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -181,18 +181,6 @@ struct virtual_engine {
> >   		int prio;
> >   	} nodes[I915_NUM_ENGINES];
> > -	/*
> > -	 * Keep track of bonded pairs -- restrictions upon on our selection
> > -	 * of physical engines any particular request may be submitted to.
> > -	 * If we receive a submit-fence from a master engine, we will only
> > -	 * use one of sibling_mask physical engines.
> > -	 */
> > -	struct ve_bond {
> > -		const struct intel_engine_cs *master;
> > -		intel_engine_mask_t sibling_mask;
> > -	} *bonds;
> > -	unsigned int num_bonds;
> > -
> >   	/* And finally, which physical engines this virtual engine maps onto. */
> >   	unsigned int num_siblings;
> >   	struct intel_engine_cs *siblings[];
> > @@ -3307,7 +3295,6 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >   	intel_breadcrumbs_free(ve->base.breadcrumbs);
> >   	intel_engine_free_request_pool(&ve->base);
> > -	kfree(ve->bonds);
> >   	kfree(ve);
> >   }
> > @@ -3560,42 +3547,6 @@ static void virtual_submit_request(struct i915_request *rq)
> >   	spin_unlock_irqrestore(&ve->base.active.lock, flags);
> >   }
> > -static struct ve_bond *
> > -virtual_find_bond(struct virtual_engine *ve,
> > -		  const struct intel_engine_cs *master)
> > -{
> > -	int i;
> > -
> > -	for (i = 0; i < ve->num_bonds; i++) {
> > -		if (ve->bonds[i].master == master)
> > -			return &ve->bonds[i];
> > -	}
> > -
> > -	return NULL;
> > -}
> > -
> > -static void
> > -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> > -{
> > -	struct virtual_engine *ve = to_virtual_engine(rq->engine);
> > -	intel_engine_mask_t allowed, exec;
> > -	struct ve_bond *bond;
> > -
> > -	allowed = ~to_request(signal)->engine->mask;
> > -
> > -	bond = virtual_find_bond(ve, to_request(signal)->engine);
> > -	if (bond)
> > -		allowed &= bond->sibling_mask;
> > -
> > -	/* Restrict the bonded request to run on only the available engines */
> > -	exec = READ_ONCE(rq->execution_mask);
> > -	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> > -		;
> > -
> > -	/* Prevent the master from being re-run on the bonded engines */
> > -	to_request(signal)->execution_mask &= ~allowed;
> > -}
> > -
> >   struct intel_context *
> >   intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >   			       unsigned int count)
> > @@ -3649,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >   	ve->base.schedule = i915_schedule;
> >   	ve->base.submit_request = virtual_submit_request;
> > -	ve->base.bond_execute = virtual_bond_execute;
> >   	INIT_LIST_HEAD(virtual_queue(ve));
> >   	ve->base.execlists.queue_priority_hint = INT_MIN;
> > @@ -3747,59 +3697,9 @@ intel_execlists_clone_virtual(struct intel_engine_cs *src)
> >   	if (IS_ERR(dst))
> >   		return dst;
> > -	if (se->num_bonds) {
> > -		struct virtual_engine *de = to_virtual_engine(dst->engine);
> > -
> > -		de->bonds = kmemdup(se->bonds,
> > -				    sizeof(*se->bonds) * se->num_bonds,
> > -				    GFP_KERNEL);
> > -		if (!de->bonds) {
> > -			intel_context_put(dst);
> > -			return ERR_PTR(-ENOMEM);
> > -		}
> > -
> > -		de->num_bonds = se->num_bonds;
> > -	}
> > -
> >   	return dst;
> >   }
> > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > -				     const struct intel_engine_cs *master,
> > -				     const struct intel_engine_cs *sibling)
> > -{
> > -	struct virtual_engine *ve = to_virtual_engine(engine);
> > -	struct ve_bond *bond;
> > -	int n;
> > -
> > -	/* Sanity check the sibling is part of the virtual engine */
> > -	for (n = 0; n < ve->num_siblings; n++)
> > -		if (sibling == ve->siblings[n])
> > -			break;
> > -	if (n == ve->num_siblings)
> > -		return -EINVAL;
> > -
> > -	bond = virtual_find_bond(ve, master);
> > -	if (bond) {
> > -		bond->sibling_mask |= sibling->mask;
> > -		return 0;
> > -	}
> > -
> > -	bond = krealloc(ve->bonds,
> > -			sizeof(*bond) * (ve->num_bonds + 1),
> > -			GFP_KERNEL);
> > -	if (!bond)
> > -		return -ENOMEM;
> > -
> > -	bond[ve->num_bonds].master = master;
> > -	bond[ve->num_bonds].sibling_mask = sibling->mask;
> > -
> > -	ve->bonds = bond;
> > -	ve->num_bonds++;
> > -
> > -	return 0;
> > -}
> > -
> >   void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >   				   struct drm_printer *m,
> >   				   void (*show_request)(struct drm_printer *m,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > index fd61dae820e9e..80cec37a56ba9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > @@ -39,10 +39,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> >   struct intel_context *
> >   intel_execlists_clone_virtual(struct intel_engine_cs *src);
> > -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > -				     const struct intel_engine_cs *master,
> > -				     const struct intel_engine_cs *sibling);
> > -
> >   bool
> >   intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 1081cd36a2bd3..f03446d587160 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -4311,234 +4311,6 @@ static int live_virtual_preserved(void *arg)
> >   	return 0;
> >   }
> > -static int bond_virtual_engine(struct intel_gt *gt,
> > -			       unsigned int class,
> > -			       struct intel_engine_cs **siblings,
> > -			       unsigned int nsibling,
> > -			       unsigned int flags)
> > -#define BOND_SCHEDULE BIT(0)
> > -{
> > -	struct intel_engine_cs *master;
> > -	struct i915_request *rq[16];
> > -	enum intel_engine_id id;
> > -	struct igt_spinner spin;
> > -	unsigned long n;
> > -	int err;
> > -
> > -	/*
> > -	 * A set of bonded requests is intended to be run concurrently
> > -	 * across a number of engines. We use one request per-engine
> > -	 * and a magic fence to schedule each of the bonded requests
> > -	 * at the same time. A consequence of our current scheduler is that
> > -	 * we only move requests to the HW ready queue when the request
> > -	 * becomes ready, that is when all of its prerequisite fences have
> > -	 * been signaled. As one of those fences is the master submit fence,
> > -	 * there is a delay on all secondary fences as the HW may be
> > -	 * currently busy. Equally, as all the requests are independent,
> > -	 * they may have other fences that delay individual request
> > -	 * submission to HW. Ergo, we do not guarantee that all requests are
> > -	 * immediately submitted to HW at the same time, just that if the
> > -	 * rules are abided by, they are ready at the same time as the
> > -	 * first is submitted. Userspace can embed semaphores in its batch
> > -	 * to ensure parallel execution of its phases as it requires.
> > -	 * Though naturally it gets requested that perhaps the scheduler should
> > -	 * take care of parallel execution, even across preemption events on
> > -	 * different HW. (The proper answer is of course "lalalala".)
> > -	 *
> > -	 * With the submit-fence, we have identified three possible phases
> > -	 * of synchronisation depending on the master fence: queued (not
> > -	 * ready), executing, and signaled. The first two are quite simple
> > -	 * and checked below. However, the signaled master fence handling is
> > -	 * contentious. Currently we do not distinguish between a signaled
> > -	 * fence and an expired fence, as once signaled it does not convey
> > -	 * any information about the previous execution. It may even be freed
> > -	 * and hence checking later it may not exist at all. Ergo we currently
> > -	 * do not apply the bonding constraint for an already signaled fence,
> > -	 * as our expectation is that it should not constrain the secondaries
> > -	 * and is outside of the scope of the bonded request API (i.e. all
> > -	 * userspace requests are meant to be running in parallel). As
> > -	 * it imposes no constraint, and is effectively a no-op, we do not
> > -	 * check below as normal execution flows are checked extensively above.
> > -	 *
> > -	 * XXX Is the degenerate handling of signaled submit fences the
> > -	 * expected behaviour for userpace?
> > -	 */
> > -
> > -	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> > -
> > -	if (igt_spinner_init(&spin, gt))
> > -		return -ENOMEM;
> > -
> > -	err = 0;
> > -	rq[0] = ERR_PTR(-ENOMEM);
> > -	for_each_engine(master, gt, id) {
> > -		struct i915_sw_fence fence = {};
> > -		struct intel_context *ce;
> > -
> > -		if (master->class == class)
> > -			continue;
> > -
> > -		ce = intel_context_create(master);
> > -		if (IS_ERR(ce)) {
> > -			err = PTR_ERR(ce);
> > -			goto out;
> > -		}
> > -
> > -		memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
> > -
> > -		rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
> > -		intel_context_put(ce);
> > -		if (IS_ERR(rq[0])) {
> > -			err = PTR_ERR(rq[0]);
> > -			goto out;
> > -		}
> > -		i915_request_get(rq[0]);
> > -
> > -		if (flags & BOND_SCHEDULE) {
> > -			onstack_fence_init(&fence);
> > -			err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
> > -							       &fence,
> > -							       GFP_KERNEL);
> > -		}
> > -
> > -		i915_request_add(rq[0]);
> > -		if (err < 0)
> > -			goto out;
> > -
> > -		if (!(flags & BOND_SCHEDULE) &&
> > -		    !igt_wait_for_spinner(&spin, rq[0])) {
> > -			err = -EIO;
> > -			goto out;
> > -		}
> > -
> > -		for (n = 0; n < nsibling; n++) {
> > -			struct intel_context *ve;
> > -
> > -			ve = intel_execlists_create_virtual(siblings, nsibling);
> > -			if (IS_ERR(ve)) {
> > -				err = PTR_ERR(ve);
> > -				onstack_fence_fini(&fence);
> > -				goto out;
> > -			}
> > -
> > -			err = intel_virtual_engine_attach_bond(ve->engine,
> > -							       master,
> > -							       siblings[n]);
> > -			if (err) {
> > -				intel_context_put(ve);
> > -				onstack_fence_fini(&fence);
> > -				goto out;
> > -			}
> > -
> > -			err = intel_context_pin(ve);
> > -			intel_context_put(ve);
> > -			if (err) {
> > -				onstack_fence_fini(&fence);
> > -				goto out;
> > -			}
> > -
> > -			rq[n + 1] = i915_request_create(ve);
> > -			intel_context_unpin(ve);
> > -			if (IS_ERR(rq[n + 1])) {
> > -				err = PTR_ERR(rq[n + 1]);
> > -				onstack_fence_fini(&fence);
> > -				goto out;
> > -			}
> > -			i915_request_get(rq[n + 1]);
> > -
> > -			err = i915_request_await_execution(rq[n + 1],
> > -							   &rq[0]->fence,
> > -							   ve->engine->bond_execute);
> > -			i915_request_add(rq[n + 1]);
> > -			if (err < 0) {
> > -				onstack_fence_fini(&fence);
> > -				goto out;
> > -			}
> > -		}
> > -		onstack_fence_fini(&fence);
> > -		intel_engine_flush_submission(master);
> > -		igt_spinner_end(&spin);
> > -
> > -		if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
> > -			pr_err("Master request did not execute (on %s)!\n",
> > -			       rq[0]->engine->name);
> > -			err = -EIO;
> > -			goto out;
> > -		}
> > -
> > -		for (n = 0; n < nsibling; n++) {
> > -			if (i915_request_wait(rq[n + 1], 0,
> > -					      MAX_SCHEDULE_TIMEOUT) < 0) {
> > -				err = -EIO;
> > -				goto out;
> > -			}
> > -
> > -			if (rq[n + 1]->engine != siblings[n]) {
> > -				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> > -				       siblings[n]->name,
> > -				       rq[n + 1]->engine->name,
> > -				       rq[0]->engine->name);
> > -				err = -EINVAL;
> > -				goto out;
> > -			}
> > -		}
> > -
> > -		for (n = 0; !IS_ERR(rq[n]); n++)
> > -			i915_request_put(rq[n]);
> > -		rq[0] = ERR_PTR(-ENOMEM);
> > -	}
> > -
> > -out:
> > -	for (n = 0; !IS_ERR(rq[n]); n++)
> > -		i915_request_put(rq[n]);
> > -	if (igt_flush_test(gt->i915))
> > -		err = -EIO;
> > -
> > -	igt_spinner_fini(&spin);
> > -	return err;
> > -}
> > -
> > -static int live_virtual_bond(void *arg)
> > -{
> > -	static const struct phase {
> > -		const char *name;
> > -		unsigned int flags;
> > -	} phases[] = {
> > -		{ "", 0 },
> > -		{ "schedule", BOND_SCHEDULE },
> > -		{ },
> > -	};
> > -	struct intel_gt *gt = arg;
> > -	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > -	unsigned int class;
> > -	int err;
> > -
> > -	if (intel_uc_uses_guc_submission(&gt->uc))
> > -		return 0;
> > -
> > -	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > -		const struct phase *p;
> > -		int nsibling;
> > -
> > -		nsibling = select_siblings(gt, class, siblings);
> > -		if (nsibling < 2)
> > -			continue;
> > -
> > -		for (p = phases; p->name; p++) {
> > -			err = bond_virtual_engine(gt,
> > -						  class, siblings, nsibling,
> > -						  p->flags);
> > -			if (err) {
> > -				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> > -				       __func__, p->name, class, nsibling, err);
> > -				return err;
> > -			}
> > -		}
> > -	}
> > -
> > -	return 0;
> > -}
> > -
> >   static int reset_virtual_engine(struct intel_gt *gt,
> >   				struct intel_engine_cs **siblings,
> >   				unsigned int nsibling)
> > @@ -4712,7 +4484,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
> >   		SUBTEST(live_virtual_mask),
> >   		SUBTEST(live_virtual_preserved),
> >   		SUBTEST(live_virtual_slice),
> > -		SUBTEST(live_virtual_bond),
> >   		SUBTEST(live_virtual_reset),
> >   	};
> > 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tvrtko Ursulin April 29, 2021, 12:54 p.m. UTC | #13
On 29/04/2021 13:24, Daniel Vetter wrote:
> On Wed, Apr 28, 2021 at 04:51:19PM +0100, Tvrtko Ursulin wrote:
>>
>> On 23/04/2021 23:31, Jason Ekstrand wrote:
>>> This adds a bunch of complexity which the media driver has never
>>> actually used.  The media driver does technically bond a balanced engine
>>> to another engine but the balanced engine only has one engine in the
>>> sibling set.  This doesn't actually result in a virtual engine.
>>
>> For historical reference, this is not because uapi was over-engineered but
>> because certain SKUs never materialized.
> 
> Jason said that for SKU with lots of media engines media-driver sets up a
> set of ctx in userspace with all the pairings (and I guess then load
> balances in userspace or something like that). Tony Ye also seems to have
> confirmed that. So I'm not clear on which SKU this is?

Not sure if I should disclose it here. But anyway, the platform which is 
currently upstream and was supposed to be the first to use this uapi was 
originally going to have at least 4 vcs engines, or even 8 vcs + 4 vecs 
at some point. That was the requirement the uapi was designed for. For 
that kind of platform there were supposed to be two virtual engines 
created, with bonding, for instance parent = [vcs0, vcs2], child = 
[vcs1, vcs3]; bonds = [vcs0 - vcs1, vcs2 - vcs3]. With more engines the 
merrier.
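
To make that layout concrete, a rough userspace sketch of such a setup, 
using the load-balance and bond extensions, could look like the below 
(illustrative only - error handling is omitted and the exact struct 
layouts should be double checked against i915_drm.h):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static void setup_bonded_pair(int fd, uint32_t ctx_id)
{
	struct i915_engine_class_instance vcs[4] = {
		{ I915_ENGINE_CLASS_VIDEO, 0 },
		{ I915_ENGINE_CLASS_VIDEO, 1 },
		{ I915_ENGINE_CLASS_VIDEO, 2 },
		{ I915_ENGINE_CLASS_VIDEO, 3 },
	};

	/* engines[0]: parent VE balanced over {vcs0, vcs2} */
	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(parent, 2) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
		.engine_index = 0,
		.num_siblings = 2,
		.engines = { vcs[0], vcs[2] },
	};

	/* engines[1]: child VE balanced over {vcs1, vcs3} */
	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(child, 2) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
		.engine_index = 1,
		.num_siblings = 2,
		.engines = { vcs[1], vcs[3] },
	};

	/* bonds on the child VE: vcs0 -> vcs1 and vcs2 -> vcs3 */
	I915_DEFINE_CONTEXT_ENGINES_BOND(bond0, 1) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_BOND,
		.virtual_index = 1,
		.master = vcs[0],
		.num_bonds = 1,
		.engines = { vcs[1] },
	};
	I915_DEFINE_CONTEXT_ENGINES_BOND(bond1, 1) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_BOND,
		.virtual_index = 1,
		.master = vcs[2],
		.num_bonds = 1,
		.engines = { vcs[3] },
	};

	/* chain the extensions and install the engine map */
	parent.base.next_extension = (uintptr_t)&child;
	child.base.next_extension = (uintptr_t)&bond0;
	bond0.base.next_extension = (uintptr_t)&bond1;

	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
		.extensions = (uintptr_t)&parent,
		.engines = {
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_NONE },
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_NONE },
		},
	};

	struct drm_i915_gem_context_param param = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uintptr_t)&engines,
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param);
}

The parent batch would then go to slot 0 and the secondary batch to 
slot 1 with EXEC_FENCE_SUBMIT, with the bonds steering the secondary 
onto the sibling paired with whichever engine the parent actually landed 
on. That is also exactly the kind of configuration the patch at hand now 
rejects, since no shipping userspace ever set it up.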

Userspace load balancing, from memory, came into the picture only as a 
consequence of balancing between two types of media pipelines, which was 
either working around rcs contention or the lack of sfc, or both. Along 
the lines of - one stage of a media pipeline can be done either as GPGPU 
work, or on the media engine, and so userspace was deciding to spawn "a 
bit of these and a bit of those" to utilise all the GPU blocks. Not 
really about frame split virtual engines and bonding, but completely 
different load balancing, between gpgpu and fixed pipeline.

> Or maybe the real deal is only future platforms, and there we have GuC
> scheduler backend.

Yes, because SKUs never materialised.

> Not against adding a bit more context to the commit message, but we need
> to make sure what we put there is actually correct. Maybe best to ask
> Tony/Carl as part of getting an ack from them.

I think there is no need - the fact that the uapi was designed for way 
more engines than we got to have is straightforward enough.

The only unasked-for flexibility in the uapi was that bonding can express 
any dependency and not only the N consecutive engines the media fixed 
function needed at the time. I say "at the time" because the 
"consecutive" engines requirement in fact also got more complicated / 
broken in a following gen (via fusing and logical instance remapping), 
proving the point that having the uapi disassociated from the hw 
limitations of the _day_ was a good call.

Regards,

Tvrtko
Jason Ekstrand April 29, 2021, 3:41 p.m. UTC | #14
On Thu, Apr 29, 2021 at 7:54 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 29/04/2021 13:24, Daniel Vetter wrote:
> > On Wed, Apr 28, 2021 at 04:51:19PM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 23/04/2021 23:31, Jason Ekstrand wrote:
> >>> This adds a bunch of complexity which the media driver has never
> >>> actually used.  The media driver does technically bond a balanced engine
> >>> to another engine but the balanced engine only has one engine in the
> >>> sibling set.  This doesn't actually result in a virtual engine.
> >>
> >> For historical reference, this is not because uapi was over-engineered but
> >> because certain SKUs never materialized.
> >
> > Jason said that for SKU with lots of media engines media-driver sets up a
> > set of ctx in userspace with all the pairings (and I guess then load
> > balances in userspace or something like that). Tony Ye also seems to have
> > confirmed that. So I'm not clear on which SKU this is?
>
> Not sure if I should disclose it here. But anyway, platform which is
> currently in upstream and was supposed to be the first to use this uapi
> was supposed to have at least 4 vcs engines initially, or even 8 vcs + 4
> vecs at some point. That was the requirement uapi was designed for. For
> that kind of platform there were supposed to be two virtual engines
> created, with bonding, for instance parent = [vcs0, vcs2], child =
> [vcs1, vcs3]; bonds = [vcs0 - vcs1, vcs2 - vcs3]. With more engines the
> merrier.

I've added the following to the commit message:

    This functionality was originally added to handle cases where we may
    have more than two video engines and media might want to load-balance
    their bonded submits by, for instance, submitting to a balanced vcs0-1
    as the primary and then vcs2-3 as the secondary.  However, no such
    hardware has shipped thus far and, if we ever want to enable such
    use-cases in the future, we'll use the up-and-coming parallel submit API
    which targets GuC submission.

--Jason

> Userspace load balancing, from memory, came into the picture only as a
> consequence of balancing between two types of media pipelines which was
> either working around the rcs contention or lack of sfc, or both. Along
> the lines of - one stage of a media pipeline can be done either as GPGPU
> work, or on the media engine, and so userspace was deciding to spawn "a
> bit of these and a bit of those" to utilise all the GPU blocks. Not
> really about frame split virtual engines and bonding, but completely
> different load balancing, between gpgpu and fixed pipeline.

> > Or maybe the real deal is only future platforms, and there we have GuC
> > scheduler backend.
>
> Yes, because SKUs never materialised.
>
> > Not against adding a bit more context to the commit message, but we need
> > to make sure what we put there is actually correct. Maybe best to ask
> > Tony/Carl as part of getting an ack from them.
>
> I think there is no need - fact uapi was designed for way more engines
> than we got to have is straight forward enough.
>
> Only unasked for flexibility in the uapi was the fact bonding can
> express any dependency and not only N consecutive engines as media fixed
> function needed at the time. I say "at the time" because in fact the
> "consecutive" engines requirement also got more complicated / broken in
> a following gen (via fusing and logical instance remapping), proving the
> point having the uapi disassociated from the hw limitations of the _day_
> was a good call.
>
> Regards,
>
> Tvrtko
Jason Ekstrand April 29, 2021, 4:02 p.m. UTC | #15
On Thu, Apr 29, 2021 at 7:16 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Apr 28, 2021 at 01:58:17PM -0500, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 12:18 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > > much code.  This function in particular, has to stay, unfortunately.
> > > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > > the work onto a different engine than than the one it's supposed to
> > > > > run in parallel with.  This means we can't dead-code this function or
> > > > > the bond_execution function pointer and related stuff.
> > > >
> > > > Uh that's disappointing, since if I understand your point correctly, the
> > > > sibling engines should all be singletons, not load balancing virtual ones.
> > > > So there really should not be any need to pick the right one at execution
> > > > time.
> > >
> > > The media driver itself seems to work fine if I delete all the code.
> > > It's just an IGT testcase that blows up.  I'll do more digging to see
> > > if I can better isolate why.
> >
> > I did more digging and I figured out why this test hangs.  The test
> > looks at an engine class where there's more than one of that class
> > (currently only vcs) and creates a context where engine[0] is all of
> > the engines of that class bonded together and engine[1-N] is each of
> > those engines individually.  It then tests that you can submit a batch
> > to one of the individual engines and then submit with
> > EXEC_FENCE_SUBMIT to the balanced engine and the kernel will sort it
> > out.  This doesn't seem like a use-case we care about.
> >
> > If we cared about anything, I would expect it to be submitting to two
> > balanced contexts and expecting "pick any two" behavior.  But that's
> > not what the test is testing for.
>
> Yeah ditch it.
>
> Instead make sure that the bonded setparam/ctx validation makes sure that
> 1) no virtual engines are used
> 2) no engine used twice
> 3) anything else stupid you can come up with that we should make sure is
> blocked.

I've re-introduced the deletion and I'll add nuking that test to my
IGT series.  I did it as a separate patch as the FENCE_SUBMIT logic
and the bonding are somewhat separate concerns.

As far as validation goes, I don't think we need any more for this
case.  You used FENCE_SUBMIT and didn't properly isolate things such
that the two run on different engines.  Not our problem.
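
For reference, the submission pattern the test exercised is roughly the 
following (a hand-wavy sketch, not the actual IGT code; engine map setup, 
buffer setup and error handling are omitted, and the slot numbers are 
just for illustration):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static void submit_bonded_pair(int fd, uint32_t ctx_id,
			       struct drm_i915_gem_exec_object2 *obj)
{
	/* first batch goes to one of the individual physical engines
	 * (slot 1 in the context's engine map) and hands back a fence */
	struct drm_i915_gem_execbuffer2 eb1 = {
		.buffers_ptr = (uintptr_t)obj,
		.buffer_count = 1,
		.flags = 1 | I915_EXEC_FENCE_OUT,
	};
	i915_execbuffer2_set_context_id(eb1, ctx_id);
	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb1);

	int submit_fence = eb1.rsvd2 >> 32;	/* out-fence fd */

	/* second batch goes to the balanced engine (slot 0) with
	 * FENCE_SUBMIT so it is scheduled alongside the first */
	struct drm_i915_gem_execbuffer2 eb2 = {
		.buffers_ptr = (uintptr_t)obj,
		.buffer_count = 1,
		.flags = 0 | I915_EXEC_FENCE_SUBMIT,
		.rsvd2 = (uint32_t)submit_fence,
	};
	i915_execbuffer2_set_context_id(eb2, ctx_id);
	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb2);
}

With the bond gone, nothing steers the second submission away from the 
engine the first one landed on, which is exactly what the test tripped 
over.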

--Jason
Daniel Vetter April 29, 2021, 5:14 p.m. UTC | #16
On Thu, Apr 29, 2021 at 11:02:27AM -0500, Jason Ekstrand wrote:
> On Thu, Apr 29, 2021 at 7:16 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Apr 28, 2021 at 01:58:17PM -0500, Jason Ekstrand wrote:
> > > On Wed, Apr 28, 2021 at 12:18 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> > > >
> > > > On Wed, Apr 28, 2021 at 5:13 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> > > > > > I sent a v2 of this patch because it turns out I deleted a bit too
> > > > > > much code.  This function in particular, has to stay, unfortunately.
> > > > > > When a batch is submitted with a SUBMIT_FENCE, this is used to push
> > > > > > the work onto a different engine than than the one it's supposed to
> > > > > > run in parallel with.  This means we can't dead-code this function or
> > > > > > the bond_execution function pointer and related stuff.
> > > > >
> > > > > Uh that's disappointing, since if I understand your point correctly, the
> > > > > sibling engines should all be singletons, not load balancing virtual ones.
> > > > > So there really should not be any need to pick the right one at execution
> > > > > time.
> > > >
> > > > The media driver itself seems to work fine if I delete all the code.
> > > > It's just an IGT testcase that blows up.  I'll do more digging to see
> > > > if I can better isolate why.
> > >
> > > I did more digging and I figured out why this test hangs.  The test
> > > looks at an engine class where there's more than one of that class
> > > (currently only vcs) and creates a context where engine[0] is all of
> > > the engines of that class bonded together and engine[1-N] is each of
> > > those engines individually.  It then tests that you can submit a batch
> > > to one of the individual engines and then submit with
> > > EXEC_FENCE_SUBMIT to the balanced engine and the kernel will sort it
> > > out.  This doesn't seem like a use-case we care about.
> > >
> > > If we cared about anything, I would expect it to be submitting to two
> > > balanced contexts and expecting "pick any two" behavior.  But that's
> > > not what the test is testing for.
> >
> > Yeah ditch it.
> >
> > Instead make sure that the bonded setparam/ctx validation makes sure that
> > 1) no virtual engines are used
> > 2) no engine used twice
> > 3) anything else stupid you can come up with that we should make sure is
> > blocked.
> 
> I've re-introduced the deletion and I'll add nuking that test to my
> IGT series.  I did it as a separate patch as the FENCE_SUBMIT logic
> and the bonding are somewhat separate concerns.
> 
> As far as validation goes, I don't think we need any more for this
> case.  You used FENCE_SUBMIT and didn't properly isolate things such
> that the two run on different engines.  Not our problem.

Oh I just meant validating the bonded ctx extension thing. Not validating
submit fence, that's rather hopeless since it really allows anything you
can think of, by design.
-Daniel
Matthew Brost April 30, 2021, 4:03 a.m. UTC | #17
On Thu, Apr 29, 2021 at 02:14:19PM +0200, Daniel Vetter wrote:
> On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> > On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > Jumping on here mid-thread. For what is is worth to make execlists work
> > > > > with the upcoming parallel submission extension I leveraged some of the
> > > > > existing bonding code so I wouldn't be too eager to delete this code
> > > > > until that lands.
> > > >
> > > > Mind being a bit more specific about that?  The motivation for this
> > > > patch is that the current bonding handling and uAPI is, well, very odd
> > > > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > > > Instead you create engines and then bond them together after the fact.
> > > > I didn't want to blindly duplicate those oddities with the proto-ctx
> > > > stuff unless they were useful.  With parallel submit, I would expect
> > > > we want a more explicit API where you specify a set of engine
> > > > class/instance pairs to bond together into a single engine similar to
> > > > how the current balancing API works.
> > > >
> > > > Of course, that's all focused on the API and not the internals.  But,
> > > > again, I'm not sure how we want things to look internally.  What we've
> > > > got now doesn't seem great for the GuC submission model but I'm very
> > > > much not the expert there.  I don't want to be working at cross
> > > > purposes to you and I'm happy to leave bits if you think they're
> > > > useful.  But I thought I was clearing things away so that you can put
> > > > in what you actually want for GuC/parallel submit.
> > > >
> > >
> > > Removing all the UAPI things are fine but I wouldn't delete some of the
> > > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > > intel_context_ops, the hook for a submit fence, etc...) as that will
> > > still likely be used for the new parallel submission interface with
> > > execlists. As you say the new UAPI wont allow crazy configurations,
> > > only simple ones.
> > 
> > I'm fine with leaving some of the internal bits for a little while if
> > it makes pulling the GuC scheduler in easier.  I'm just a bit
> > skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> > thoughts?
> 
> Yeah I'm also wondering why we need this. Essentially your insight (and
> Tony Ye from media team confirmed) is that media umd never uses bonded on
> virtual engines.
>

Well, you should use virtual engines with the parallel submission
interface if you are using it correctly.

e.g. You want a 2 wide parallel submission and there are 4 engine
instances.

You'd create 2 VEs:

A: 0, 2
B: 1, 3
set_parallel

For GuC submission we just configure the context and the GuC load
balances it.

For execlists we'd need to create bonds.
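
Sketched out (illustrative only - the set_parallel step below is a
placeholder, since the actual extension hadn't landed at the time of
writing; the two VEs use the existing load-balance extension):

#include <stdint.h>
#include <drm/i915_drm.h>

static void setup_parallel_pair(void)
{
	struct i915_engine_class_instance vcs[4] = {
		{ I915_ENGINE_CLASS_VIDEO, 0 },
		{ I915_ENGINE_CLASS_VIDEO, 1 },
		{ I915_ENGINE_CLASS_VIDEO, 2 },
		{ I915_ENGINE_CLASS_VIDEO, 3 },
	};

	/* VE A in slot 0: {vcs0, vcs2} */
	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(ve_a, 2) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
		.engine_index = 0,
		.num_siblings = 2,
		.engines = { vcs[0], vcs[2] },
	};

	/* VE B in slot 1: {vcs1, vcs3} */
	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(ve_b, 2) = {
		.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
		.engine_index = 1,
		.num_siblings = 2,
		.engines = { vcs[1], vcs[3] },
	};
	ve_a.base.next_extension = (uintptr_t)&ve_b;

	/*
	 * ...install the map via I915_CONTEXT_PARAM_ENGINES as usual, then
	 * the (hypothetical here, still under review) set_parallel
	 * extension would mark slots 0 and 1 as a 2-wide parallel pair, so
	 * a single execbuf can carry one BB per slot.
	 */
}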

Also, likely the reason virtual engines weren't used with the old
interface was that we only had 2 instances max per class, so there was no
need for virtual engines. For my example above, if they were using the
interface correctly they would have to use virtual engines too.
 
> So the only thing we need is the await_fence submit_fence logic to stall
> the subsequent patches just long enough. I think that stays.
>

My implementation of the new parallel submission interface with
execlists used bonds + priority boosts to ensure both are present at
the same time. This was used for both non-virtual and virtual engines.
It was never reviewed though and the code died on the list.

> All the additional logic with the cmpxchg lockless trickery and all that
> isn't needed, because we _never_ have to select an engine for bonded
> submission: It's always the single one available.
> 
> This would mean that for execlist parallel submit we can apply a
> limitation (beyond what GuC supports perhaps) and it's all ok. With that
> everything except the submit fence await logic itself can go I think.
> 
> Also one for Matt: We decided to ZBB implementing parallel submit on
> execlist, it's going to be just for GuC. At least until someone starts
> screaming really loudly.

If this is the case, then bonds can be deleted.

Matt

> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter April 30, 2021, 10:11 a.m. UTC | #18
On Thu, Apr 29, 2021 at 09:03:48PM -0700, Matthew Brost wrote:
> On Thu, Apr 29, 2021 at 02:14:19PM +0200, Daniel Vetter wrote:
> > On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> > > On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > >
> > > > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > > Jumping on here mid-thread. For what is is worth to make execlists work
> > > > > > with the upcoming parallel submission extension I leveraged some of the
> > > > > > existing bonding code so I wouldn't be too eager to delete this code
> > > > > > until that lands.
> > > > >
> > > > > Mind being a bit more specific about that?  The motivation for this
> > > > > patch is that the current bonding handling and uAPI is, well, very odd
> > > > > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > > > > Instead you create engines and then bond them together after the fact.
> > > > > I didn't want to blindly duplicate those oddities with the proto-ctx
> > > > > stuff unless they were useful.  With parallel submit, I would expect
> > > > > we want a more explicit API where you specify a set of engine
> > > > > class/instance pairs to bond together into a single engine similar to
> > > > > how the current balancing API works.
> > > > >
> > > > > Of course, that's all focused on the API and not the internals.  But,
> > > > > again, I'm not sure how we want things to look internally.  What we've
> > > > > got now doesn't seem great for the GuC submission model but I'm very
> > > > > much not the expert there.  I don't want to be working at cross
> > > > > purposes to you and I'm happy to leave bits if you think they're
> > > > > useful.  But I thought I was clearing things away so that you can put
> > > > > in what you actually want for GuC/parallel submit.
> > > > >
> > > >
> > > > Removing all the UAPI things are fine but I wouldn't delete some of the
> > > > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > > > intel_context_ops, the hook for a submit fence, etc...) as that will
> > > > still likely be used for the new parallel submission interface with
> > > > execlists. As you say the new UAPI wont allow crazy configurations,
> > > > only simple ones.
> > > 
> > > I'm fine with leaving some of the internal bits for a little while if
> > > it makes pulling the GuC scheduler in easier.  I'm just a bit
> > > skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> > > thoughts?
> > 
> > Yeah I'm also wondering why we need this. Essentially your insight (and
> > Tony Ye from media team confirmed) is that media umd never uses bonded on
> > virtual engines.
> >
> 
> Well you should use virtual engines with parallel submission interface 
> if are you using it correctly.
> 
> e.g. You want a 2 wide parallel submission and there are 4 engine
> instances.
> 
> You'd create 2 VEs:
> 
> A: 0, 2
> B: 1, 3
> set_parallel

So tbh I'm not really liking this part. At least my understanding is that
with GuC this is really one overall virtual engine, backed by a multi-lrc.

So it should fill one engine slot, not fill multiple virtual engines and
then be an awkward thing wrapped on top.

I think (but maybe my understanding of GuC and the parallel submit execbuf
interface is wrong) that the parallel engine should occupy a single VE
slot, not require additional VE just for fun (maybe the execlist backend
would require that internally, but that should not leak into the higher
levels, much less the uapi). And you submit your multi-batch execbuf on
that single parallel VE, which then gets passed to GuC as a multi-LRC.
Internally in the backend there's a bit of fan-out to put the right
MI_BB_START into the right rings and all that, but again I think that
should be backend concerns.

Or am I missing something big here?

> For GuC submission we just configure context and the GuC load balances
> it.
> 
> For execlists we'd need to create bonds.
> 
> Also likely the reason virtual engines wasn't used with the old
> interface was we only had 2 instances max per class so no need for
> virtual engines. If they used it for my above example if they were using
> the interface correctly they would have to use virtual engines too.

They do actually use virtual engines; it's just that the virtual engine
only contains a single one, and internally i915 folds that into the hw
engine directly. So we can take away the entire implementation complexity.

Also I still think for execlists we shouldn't bother with trying to enable
parallel submit. Or at least only go down that road if there's no other
reasonable option.

> > So the only thing we need is the await_fence submit_fence logic to stall
> > the subsequent patches just long enough. I think that stays.
> >
> 
> My implementation, for the new parallel submission interface, with
> execlists used a bonds + priority boosts to ensure both are present at
> the same time. This was used for both non-virtual and virtual engines.
> This was never reviewed though and the code died on the list.

:-(

> > All the additional logic with the cmpxchg lockless trickery and all that
> > isn't needed, because we _never_ have to select an engine for bonded
> > submission: It's always the single one available.
> > 
> > This would mean that for execlist parallel submit we can apply a
> > limitation (beyond what GuC supports perhaps) and it's all ok. With that
> > everything except the submit fence await logic itself can go I think.
> > 
> > Also one for Matt: We decided to ZBB implementing parallel submit on
> > execlist, it's going to be just for GuC. At least until someone starts
> > screaming really loudly.
> 
> If this is the case, then bonds can be deleted.

Yeah that's the goal we're aiming for.
-Daniel
Matthew Brost May 1, 2021, 5:17 p.m. UTC | #19
On Fri, Apr 30, 2021 at 12:11:07PM +0200, Daniel Vetter wrote:
> On Thu, Apr 29, 2021 at 09:03:48PM -0700, Matthew Brost wrote:
> > On Thu, Apr 29, 2021 at 02:14:19PM +0200, Daniel Vetter wrote:
> > > On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> > > > On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > >
> > > > > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > > > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > > > Jumping on here mid-thread. For what is is worth to make execlists work
> > > > > > > with the upcoming parallel submission extension I leveraged some of the
> > > > > > > existing bonding code so I wouldn't be too eager to delete this code
> > > > > > > until that lands.
> > > > > >
> > > > > > Mind being a bit more specific about that?  The motivation for this
> > > > > > patch is that the current bonding handling and uAPI is, well, very odd
> > > > > > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > > > > > Instead you create engines and then bond them together after the fact.
> > > > > > I didn't want to blindly duplicate those oddities with the proto-ctx
> > > > > > stuff unless they were useful.  With parallel submit, I would expect
> > > > > > we want a more explicit API where you specify a set of engine
> > > > > > class/instance pairs to bond together into a single engine similar to
> > > > > > how the current balancing API works.
> > > > > >
> > > > > > Of course, that's all focused on the API and not the internals.  But,
> > > > > > again, I'm not sure how we want things to look internally.  What we've
> > > > > > got now doesn't seem great for the GuC submission model but I'm very
> > > > > > much not the expert there.  I don't want to be working at cross
> > > > > > purposes to you and I'm happy to leave bits if you think they're
> > > > > > useful.  But I thought I was clearing things away so that you can put
> > > > > > in what you actually want for GuC/parallel submit.
> > > > > >
> > > > >
> > > > > Removing all the UAPI things are fine but I wouldn't delete some of the
> > > > > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > > > > intel_context_ops, the hook for a submit fence, etc...) as that will
> > > > > still likely be used for the new parallel submission interface with
> > > > > execlists. As you say the new UAPI wont allow crazy configurations,
> > > > > only simple ones.
> > > > 
> > > > I'm fine with leaving some of the internal bits for a little while if
> > > > it makes pulling the GuC scheduler in easier.  I'm just a bit
> > > > skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> > > > thoughts?
> > > 
> > > Yeah I'm also wondering why we need this. Essentially your insight (and
> > > Tony Ye from media team confirmed) is that media umd never uses bonded on
> > > virtual engines.
> > >
> > 
> > Well you should use virtual engines with parallel submission interface 
> > if are you using it correctly.
> > 
> > e.g. You want a 2 wide parallel submission and there are 4 engine
> > instances.
> > 
> > You'd create 2 VEs:
> > 
> > A: 0, 2
> > B: 1, 3
> > set_parallel
> 
> So tbh I'm not really liking this part. At least my understanding is that
> with GuC this is really one overall virtual engine, backed by a multi-lrc.
> 
> So it should fill one engine slot, not fill multiple virtual engines and
> then be an awkward thing wrapped on top.
> 
> I think (but maybe my understanding of GuC and the parallel submit execbuf
> interface is wrong) that the parallel engine should occupy a single VE
> slot, not require additional VE just for fun (maybe the execlist backend
> would require that internally, but that should not leak into the higher
> levels, much less the uapi). And you submit your multi-batch execbuf on
> that single parallel VE, which then gets passed to GuC as a multi-LRC.
> Internally in the backend there's a bit of fan-out to put the right
> MI_BB_START into the right rings and all that, but again I think that
> should be backend concerns.
> 
> Or am I missing something big here?

Unfortunately that is not how the interface works. The user must
configure the engine set with either physical or virtual engines, which
determine the valid placements of each BB (LRC, ring, whatever we want
to call it), and call the set parallel extension, which validates the
engine layout. After that the engines are ready to be used with multi-BB
submission in a single IOCTL.

We discussed this internally with the i915 developers + with the media
team for like 6 months and this is where we landed after some very
contentious discussions. One of the proposals was pretty similar to your
understanding but got NACK'd as it was too specific to what our HW can
do / what the UMDs need rather than being able to do tons of wild things
our HW / UMDs will never support (sounds familiar, right?). 

What we landed on is still simpler than most of the other proposals - we
almost really went off the deep end but voices of reason thankfully won
out.

> 
> > For GuC submission we just configure context and the GuC load balances
> > it.
> > 
> > For execlists we'd need to create bonds.
> > 
> > Also likely the reason virtual engines wasn't used with the old
> > interface was we only had 2 instances max per class so no need for
> > virtual engines. If they used it for my above example if they were using
> > the interface correctly they would have to use virtual engines too.
> 
> They do actually use virtual engines, it's just the virtual engine only
> contains a single one, and internally i915 folds that into the hw engine
> directly. So we can take away the entire implementation complexity.
> 
> Also I still think for execlist we shouldn't bother with trying to enable
> parallel submit. Or at least only way down if there's no other reasonable
> option.
>

Agreed, but honestly if we have to, it isn't going to be that painful. I
think my patch to enable this was a couple hundred lines.

Matt
 
> > > So the only thing we need is the await_fence submit_fence logic to stall
> > > the subsequent patches just long enough. I think that stays.
> > >
> > 
> > My implementation, for the new parallel submission interface, with
> > execlists used a bonds + priority boosts to ensure both are present at
> > the same time. This was used for both non-virtual and virtual engines.
> > This was never reviewed though and the code died on the list.
> 
> :-(
> 
> > > All the additional logic with the cmpxchg lockless trickery and all that
> > > isn't needed, because we _never_ have to select an engine for bonded
> > > submission: It's always the single one available.
> > > 
> > > This would mean that for execlist parallel submit we can apply a
> > > limitation (beyond what GuC supports perhaps) and it's all ok. With that
> > > everything except the submit fence await logic itself can go I think.
> > > 
> > > Also one for Matt: We decided to ZBB implementing parallel submit on
> > > execlist, it's going to be just for GuC. At least until someone starts
> > > screaming really loudly.
> > 
> > If this is the case, then bonds can be deleted.
> 
> Yeah that's the goal we're aiming for.
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter May 4, 2021, 7:36 a.m. UTC | #20
On Sat, May 01, 2021 at 10:17:46AM -0700, Matthew Brost wrote:
> On Fri, Apr 30, 2021 at 12:11:07PM +0200, Daniel Vetter wrote:
> > On Thu, Apr 29, 2021 at 09:03:48PM -0700, Matthew Brost wrote:
> > > On Thu, Apr 29, 2021 at 02:14:19PM +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> > > > > On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > >
> > > > > > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > > > > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost <matthew.brost@intel.com> wrote:
> > > > > > > > Jumping on here mid-thread. For what is is worth to make execlists work
> > > > > > > > with the upcoming parallel submission extension I leveraged some of the
> > > > > > > > existing bonding code so I wouldn't be too eager to delete this code
> > > > > > > > until that lands.
> > > > > > >
> > > > > > > Mind being a bit more specific about that?  The motivation for this
> > > > > > > patch is that the current bonding handling and uAPI is, well, very odd
> > > > > > > and confusing IMO.  It doesn't let you create sets of bonded engines.
> > > > > > > Instead you create engines and then bond them together after the fact.
> > > > > > > I didn't want to blindly duplicate those oddities with the proto-ctx
> > > > > > > stuff unless they were useful.  With parallel submit, I would expect
> > > > > > > we want a more explicit API where you specify a set of engine
> > > > > > > class/instance pairs to bond together into a single engine similar to
> > > > > > > how the current balancing API works.
> > > > > > >
> > > > > > > Of course, that's all focused on the API and not the internals.  But,
> > > > > > > again, I'm not sure how we want things to look internally.  What we've
> > > > > > > got now doesn't seem great for the GuC submission model but I'm very
> > > > > > > much not the expert there.  I don't want to be working at cross
> > > > > > > purposes to you and I'm happy to leave bits if you think they're
> > > > > > > useful.  But I thought I was clearing things away so that you can put
> > > > > > > in what you actually want for GuC/parallel submit.
> > > > > > >
> > > > > >
> > > > > > Removing all the UAPI things are fine but I wouldn't delete some of the
> > > > > > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > > > > > intel_context_ops, the hook for a submit fence, etc...) as that will
> > > > > > still likely be used for the new parallel submission interface with
> > > > > > execlists. As you say the new UAPI wont allow crazy configurations,
> > > > > > only simple ones.
> > > > > 
> > > > > I'm fine with leaving some of the internal bits for a little while if
> > > > > it makes pulling the GuC scheduler in easier.  I'm just a bit
> > > > > skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> > > > > thoughts?
> > > > 
> > > > Yeah I'm also wondering why we need this. Essentially your insight (and
> > > > Tony Ye from media team confirmed) is that media umd never uses bonded on
> > > > virtual engines.
> > > >
> > > 
> > > Well you should use virtual engines with parallel submission interface 
> > > if are you using it correctly.
> > > 
> > > e.g. You want a 2 wide parallel submission and there are 4 engine
> > > instances.
> > > 
> > > You'd create 2 VEs:
> > > 
> > > A: 0, 2
> > > B: 1, 3
> > > set_parallel
> > 
> > So tbh I'm not really liking this part. At least my understanding is that
> > with GuC this is really one overall virtual engine, backed by a multi-lrc.
> > 
> > So it should fill one engine slot, not fill multiple virtual engines and
> > then be an awkward thing wrapped on top.
> > 
> > I think (but maybe my understanding of GuC and the parallel submit execbuf
> > interface is wrong) that the parallel engine should occupy a single VE
> > slot, not require additional VE just for fun (maybe the execlist backend
> > would require that internally, but that should not leak into the higher
> > levels, much less the uapi). And you submit your multi-batch execbuf on
> > that single parallel VE, which then gets passed to GuC as a multi-LRC.
> > Internally in the backend there's a bit of fan-out to put the right
> > MI_BB_START into the right rings and all that, but again I think that
> > should be backend concerns.
> > 
> > Or am I missing something big here?
> 
> Unfortunately that is not how the interface works. The user must
> configure the engine set with either physical or virtual engines which
> determine the valid placements of each BB (LRC, ring, whatever we want
> to call it) and call the set parallel extension which validations engine
> layout. After that the engines are ready be used with multi-BB
> submission in single IOCTL. 
> 
> We discussed this internally with the i915 developers + with the media
> for like 6 months and this is where we landed after some very
> contentious discussions. One of the proposals was pretty similar to your
> understanding but got NACK'd as it was too specific to what our HW can
> do / what the UMDs need rather than being able to do tons of wild things
> our HW / UMDs will never support (sounds familiar, right?). 
> 
> What we landed on is still simpler than most of the other proposals - we
> almost really went off the deep end but voices of reason thankfully won
> out.

Yeah I know some of the story here. But the thing is, we're ripping out
tons of these design decisions because they're just plain bogus.

And this very much looks like one, and since it's new uapi, it's better to
correct it before we finalize it in upstream for 10+ years. Or we're just
right back to where we are right now, and this hole is too deep for my
taste.

btw this kind of discussion is what the rfc patch with documentation of
our new uapi & plans is meant for.

> > > For GuC submission we just configure context and the GuC load balances
> > > it.
> > > 
> > > For execlists we'd need to create bonds.
> > > 
> > > Also likely the reason virtual engines wasn't used with the old
> > > interface was we only had 2 instances max per class so no need for
> > > virtual engines. If they used it for my above example if they were using
> > > the interface correctly they would have to use virtual engines too.
> > 
> > They do actually use virtual engines, it's just the virtual engine only
> > contains a single one, and internally i915 folds that into the hw engine
> > directly. So we can take away the entire implementation complexity.
> > 
> > Also I still think for execlist we shouldn't bother with trying to enable
> > parallel submit. Or at least only way down if there's no other reasonable
> > option.
> >
> 
> Agree but honestly if we have to it isn't going to be that painful. I
> think my patch to enable this was a couple hundred lines.

Ah that sounds good at least, as a fallback.
-Daniel

> 
> Matt
>  
> > > > So the only thing we need is the await_fence submit_fence logic to stall
> > > > the subsequent patches just long enough. I think that stays.
> > > >
> > > 
> > > My implementation, for the new parallel submission interface, with
> > > execlists used a bonds + priority boosts to ensure both are present at
> > > the same time. This was used for both non-virtual and virtual engines.
> > > This was never reviewed though and the code died on the list.
> > 
> > :-(
> > 
> > > > All the additional logic with the cmpxchg lockless trickery and all that
> > > > isn't needed, because we _never_ have to select an engine for bonded
> > > > submission: It's always the single one available.
> > > > 
> > > > This would mean that for execlist parallel submit we can apply a
> > > > limitation (beyond what GuC supports perhaps) and it's all ok. With that
> > > > everything except the submit fence await logic itself can go I think.
> > > > 
> > > > Also one for Matt: We decided to ZBB implementing parallel submit on
> > > > execlist, it's going to be just for GuC. At least until someone starts
> > > > screaming really loudly.
> > > 
> > > If this is the case, then bonds can be deleted.
> > 
> > Yeah that's the goal we're aiming for.
> > -Daniel
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e8179918fa306..5f8d0faf783aa 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1553,6 +1553,12 @@  set_engines__bond(struct i915_user_extension __user *base, void *data)
 	}
 	virtual = set->engines->engines[idx]->engine;
 
+	if (intel_engine_is_virtual(virtual)) {
+		drm_dbg(&i915->drm,
+			"Bonding with virtual engines not allowed\n");
+		return -EINVAL;
+	}
+
 	err = check_user_mbz(&ext->flags);
 	if (err)
 		return err;
@@ -1593,18 +1599,6 @@  set_engines__bond(struct i915_user_extension __user *base, void *data)
 				n, ci.engine_class, ci.engine_instance);
 			return -EINVAL;
 		}
-
-		/*
-		 * A non-virtual engine has no siblings to choose between; and
-		 * a submit fence will always be directed to the one engine.
-		 */
-		if (intel_engine_is_virtual(virtual)) {
-			err = intel_virtual_engine_attach_bond(virtual,
-							       master,
-							       bond);
-			if (err)
-				return err;
-		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d640bba6ad9ab..efb2fa3522a42 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3474,7 +3474,7 @@  i915_gem_do_execbuffer(struct drm_device *dev,
 		if (args->flags & I915_EXEC_FENCE_SUBMIT)
 			err = i915_request_await_execution(eb.request,
 							   in_fence,
-							   eb.engine->bond_execute);
+							   NULL);
 		else
 			err = i915_request_await_dma_fence(eb.request,
 							   in_fence);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 883bafc449024..68cfe5080325c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -446,13 +446,6 @@  struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
-	/*
-	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
-	 * request down to the bonded pairs.
-	 */
-	void            (*bond_execute)(struct i915_request *rq,
-					struct dma_fence *signal);
-
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44d..b6e2b59f133b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -181,18 +181,6 @@  struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
-	/*
-	 * Keep track of bonded pairs -- restrictions upon on our selection
-	 * of physical engines any particular request may be submitted to.
-	 * If we receive a submit-fence from a master engine, we will only
-	 * use one of sibling_mask physical engines.
-	 */
-	struct ve_bond {
-		const struct intel_engine_cs *master;
-		intel_engine_mask_t sibling_mask;
-	} *bonds;
-	unsigned int num_bonds;
-
 	/* And finally, which physical engines this virtual engine maps onto. */
 	unsigned int num_siblings;
 	struct intel_engine_cs *siblings[];
@@ -3307,7 +3295,6 @@  static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	intel_breadcrumbs_free(ve->base.breadcrumbs);
 	intel_engine_free_request_pool(&ve->base);
 
-	kfree(ve->bonds);
 	kfree(ve);
 }
 
@@ -3560,42 +3547,6 @@  static void virtual_submit_request(struct i915_request *rq)
 	spin_unlock_irqrestore(&ve->base.active.lock, flags);
 }
 
-static struct ve_bond *
-virtual_find_bond(struct virtual_engine *ve,
-		  const struct intel_engine_cs *master)
-{
-	int i;
-
-	for (i = 0; i < ve->num_bonds; i++) {
-		if (ve->bonds[i].master == master)
-			return &ve->bonds[i];
-	}
-
-	return NULL;
-}
-
-static void
-virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
-{
-	struct virtual_engine *ve = to_virtual_engine(rq->engine);
-	intel_engine_mask_t allowed, exec;
-	struct ve_bond *bond;
-
-	allowed = ~to_request(signal)->engine->mask;
-
-	bond = virtual_find_bond(ve, to_request(signal)->engine);
-	if (bond)
-		allowed &= bond->sibling_mask;
-
-	/* Restrict the bonded request to run on only the available engines */
-	exec = READ_ONCE(rq->execution_mask);
-	while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
-		;
-
-	/* Prevent the master from being re-run on the bonded engines */
-	to_request(signal)->execution_mask &= ~allowed;
-}
-
 struct intel_context *
 intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 			       unsigned int count)
@@ -3649,7 +3600,6 @@  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 
 	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
-	ve->base.bond_execute = virtual_bond_execute;
 
 	INIT_LIST_HEAD(virtual_queue(ve));
 	ve->base.execlists.queue_priority_hint = INT_MIN;
@@ -3747,59 +3697,9 @@  intel_execlists_clone_virtual(struct intel_engine_cs *src)
 	if (IS_ERR(dst))
 		return dst;
 
-	if (se->num_bonds) {
-		struct virtual_engine *de = to_virtual_engine(dst->engine);
-
-		de->bonds = kmemdup(se->bonds,
-				    sizeof(*se->bonds) * se->num_bonds,
-				    GFP_KERNEL);
-		if (!de->bonds) {
-			intel_context_put(dst);
-			return ERR_PTR(-ENOMEM);
-		}
-
-		de->num_bonds = se->num_bonds;
-	}
-
 	return dst;
 }
 
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling)
-{
-	struct virtual_engine *ve = to_virtual_engine(engine);
-	struct ve_bond *bond;
-	int n;
-
-	/* Sanity check the sibling is part of the virtual engine */
-	for (n = 0; n < ve->num_siblings; n++)
-		if (sibling == ve->siblings[n])
-			break;
-	if (n == ve->num_siblings)
-		return -EINVAL;
-
-	bond = virtual_find_bond(ve, master);
-	if (bond) {
-		bond->sibling_mask |= sibling->mask;
-		return 0;
-	}
-
-	bond = krealloc(ve->bonds,
-			sizeof(*bond) * (ve->num_bonds + 1),
-			GFP_KERNEL);
-	if (!bond)
-		return -ENOMEM;
-
-	bond[ve->num_bonds].master = master;
-	bond[ve->num_bonds].sibling_mask = sibling->mask;
-
-	ve->bonds = bond;
-	ve->num_bonds++;
-
-	return 0;
-}
-
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index fd61dae820e9e..80cec37a56ba9 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -39,10 +39,6 @@  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 struct intel_context *
 intel_execlists_clone_virtual(struct intel_engine_cs *src);
 
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
-				     const struct intel_engine_cs *master,
-				     const struct intel_engine_cs *sibling);
-
 bool
 intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 1081cd36a2bd3..f03446d587160 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -4311,234 +4311,6 @@  static int live_virtual_preserved(void *arg)
 	return 0;
 }
 
-static int bond_virtual_engine(struct intel_gt *gt,
-			       unsigned int class,
-			       struct intel_engine_cs **siblings,
-			       unsigned int nsibling,
-			       unsigned int flags)
-#define BOND_SCHEDULE BIT(0)
-{
-	struct intel_engine_cs *master;
-	struct i915_request *rq[16];
-	enum intel_engine_id id;
-	struct igt_spinner spin;
-	unsigned long n;
-	int err;
-
-	/*
-	 * A set of bonded requests is intended to be run concurrently
-	 * across a number of engines. We use one request per-engine
-	 * and a magic fence to schedule each of the bonded requests
-	 * at the same time. A consequence of our current scheduler is that
-	 * we only move requests to the HW ready queue when the request
-	 * becomes ready, that is when all of its prerequisite fences have
-	 * been signaled. As one of those fences is the master submit fence,
-	 * there is a delay on all secondary fences as the HW may be
-	 * currently busy. Equally, as all the requests are independent,
-	 * they may have other fences that delay individual request
-	 * submission to HW. Ergo, we do not guarantee that all requests are
-	 * immediately submitted to HW at the same time, just that if the
-	 * rules are abided by, they are ready at the same time as the
-	 * first is submitted. Userspace can embed semaphores in its batch
-	 * to ensure parallel execution of its phases as it requires.
-	 * Though naturally it gets requested that perhaps the scheduler should
-	 * take care of parallel execution, even across preemption events on
-	 * different HW. (The proper answer is of course "lalalala".)
-	 *
-	 * With the submit-fence, we have identified three possible phases
-	 * of synchronisation depending on the master fence: queued (not
-	 * ready), executing, and signaled. The first two are quite simple
-	 * and checked below. However, the signaled master fence handling is
-	 * contentious. Currently we do not distinguish between a signaled
-	 * fence and an expired fence, as once signaled it does not convey
-	 * any information about the previous execution. It may even be freed
-	 * and hence checking later it may not exist at all. Ergo we currently
-	 * do not apply the bonding constraint for an already signaled fence,
-	 * as our expectation is that it should not constrain the secondaries
-	 * and is outside of the scope of the bonded request API (i.e. all
-	 * userspace requests are meant to be running in parallel). As
-	 * it imposes no constraint, and is effectively a no-op, we do not
-	 * check below as normal execution flows are checked extensively above.
-	 *
-	 * XXX Is the degenerate handling of signaled submit fences the
-	 * expected behaviour for userpace?
-	 */
-
-	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
-
-	if (igt_spinner_init(&spin, gt))
-		return -ENOMEM;
-
-	err = 0;
-	rq[0] = ERR_PTR(-ENOMEM);
-	for_each_engine(master, gt, id) {
-		struct i915_sw_fence fence = {};
-		struct intel_context *ce;
-
-		if (master->class == class)
-			continue;
-
-		ce = intel_context_create(master);
-		if (IS_ERR(ce)) {
-			err = PTR_ERR(ce);
-			goto out;
-		}
-
-		memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
-
-		rq[0] = igt_spinner_create_request(&spin, ce, MI_NOOP);
-		intel_context_put(ce);
-		if (IS_ERR(rq[0])) {
-			err = PTR_ERR(rq[0]);
-			goto out;
-		}
-		i915_request_get(rq[0]);
-
-		if (flags & BOND_SCHEDULE) {
-			onstack_fence_init(&fence);
-			err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
-							       &fence,
-							       GFP_KERNEL);
-		}
-
-		i915_request_add(rq[0]);
-		if (err < 0)
-			goto out;
-
-		if (!(flags & BOND_SCHEDULE) &&
-		    !igt_wait_for_spinner(&spin, rq[0])) {
-			err = -EIO;
-			goto out;
-		}
-
-		for (n = 0; n < nsibling; n++) {
-			struct intel_context *ve;
-
-			ve = intel_execlists_create_virtual(siblings, nsibling);
-			if (IS_ERR(ve)) {
-				err = PTR_ERR(ve);
-				onstack_fence_fini(&fence);
-				goto out;
-			}
-
-			err = intel_virtual_engine_attach_bond(ve->engine,
-							       master,
-							       siblings[n]);
-			if (err) {
-				intel_context_put(ve);
-				onstack_fence_fini(&fence);
-				goto out;
-			}
-
-			err = intel_context_pin(ve);
-			intel_context_put(ve);
-			if (err) {
-				onstack_fence_fini(&fence);
-				goto out;
-			}
-
-			rq[n + 1] = i915_request_create(ve);
-			intel_context_unpin(ve);
-			if (IS_ERR(rq[n + 1])) {
-				err = PTR_ERR(rq[n + 1]);
-				onstack_fence_fini(&fence);
-				goto out;
-			}
-			i915_request_get(rq[n + 1]);
-
-			err = i915_request_await_execution(rq[n + 1],
-							   &rq[0]->fence,
-							   ve->engine->bond_execute);
-			i915_request_add(rq[n + 1]);
-			if (err < 0) {
-				onstack_fence_fini(&fence);
-				goto out;
-			}
-		}
-		onstack_fence_fini(&fence);
-		intel_engine_flush_submission(master);
-		igt_spinner_end(&spin);
-
-		if (i915_request_wait(rq[0], 0, HZ / 10) < 0) {
-			pr_err("Master request did not execute (on %s)!\n",
-			       rq[0]->engine->name);
-			err = -EIO;
-			goto out;
-		}
-
-		for (n = 0; n < nsibling; n++) {
-			if (i915_request_wait(rq[n + 1], 0,
-					      MAX_SCHEDULE_TIMEOUT) < 0) {
-				err = -EIO;
-				goto out;
-			}
-
-			if (rq[n + 1]->engine != siblings[n]) {
-				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
-				       siblings[n]->name,
-				       rq[n + 1]->engine->name,
-				       rq[0]->engine->name);
-				err = -EINVAL;
-				goto out;
-			}
-		}
-
-		for (n = 0; !IS_ERR(rq[n]); n++)
-			i915_request_put(rq[n]);
-		rq[0] = ERR_PTR(-ENOMEM);
-	}
-
-out:
-	for (n = 0; !IS_ERR(rq[n]); n++)
-		i915_request_put(rq[n]);
-	if (igt_flush_test(gt->i915))
-		err = -EIO;
-
-	igt_spinner_fini(&spin);
-	return err;
-}
-
-static int live_virtual_bond(void *arg)
-{
-	static const struct phase {
-		const char *name;
-		unsigned int flags;
-	} phases[] = {
-		{ "", 0 },
-		{ "schedule", BOND_SCHEDULE },
-		{ },
-	};
-	struct intel_gt *gt = arg;
-	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
-	unsigned int class;
-	int err;
-
-	if (intel_uc_uses_guc_submission(&gt->uc))
-		return 0;
-
-	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
-		const struct phase *p;
-		int nsibling;
-
-		nsibling = select_siblings(gt, class, siblings);
-		if (nsibling < 2)
-			continue;
-
-		for (p = phases; p->name; p++) {
-			err = bond_virtual_engine(gt,
-						  class, siblings, nsibling,
-						  p->flags);
-			if (err) {
-				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
-				       __func__, p->name, class, nsibling, err);
-				return err;
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int reset_virtual_engine(struct intel_gt *gt,
 				struct intel_engine_cs **siblings,
 				unsigned int nsibling)
@@ -4712,7 +4484,6 @@  int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_virtual_mask),
 		SUBTEST(live_virtual_preserved),
 		SUBTEST(live_virtual_slice),
-		SUBTEST(live_virtual_bond),
 		SUBTEST(live_virtual_reset),
 	};