From patchwork Fri Aug 5 21:13:23 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 9265749 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7DA7660760 for ; Fri, 5 Aug 2016 21:13:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6B92A28455 for ; Fri, 5 Aug 2016 21:13:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6028028488; Fri, 5 Aug 2016 21:13:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D80DD28455 for ; Fri, 5 Aug 2016 21:13:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B6FA36EC50; Fri, 5 Aug 2016 21:13:32 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-wm0-x242.google.com (mail-wm0-x242.google.com [IPv6:2a00:1450:400c:c09::242]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9C9706EC50 for ; Fri, 5 Aug 2016 21:13:31 +0000 (UTC) Received: by mail-wm0-x242.google.com with SMTP id q128so5382291wma.1 for ; Fri, 05 Aug 2016 14:13:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=iVtHGn89Lfarv6So2C3xqVtBDmd+BwjRCG4fa+WBd4Y=; b=0GVfqbMtY2frN7fdGjmSyRBi4usGvnpVxnnzkXw6m0Frn6qPUBAlGPl+7YlN5JFAXW hayMuf9R1alLxn9Dw3cqJqHkF5afAm9lSmGqJEaDC1Pvg98eQjBIVkuVJxrsxTFY/QSA L9tG6af2OSDAH/hPXqrF0+Gwjwz/uMDtQKR/VN8a+d6qGADaxSuUJD5M3AiDz/H1JeNI G7XuH0q1HlsUSkMEEO97kHbxglPIi0vDD0vKR+KQNx4BZPI+Xz2a447B8Ldr4NlghoDy 9kOQAB+ij/Vmn8nQPr291idvRUhXRouxR81kw/h2IJolc4aqH5lhpKWWRP565OuEvdU6 3O5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=iVtHGn89Lfarv6So2C3xqVtBDmd+BwjRCG4fa+WBd4Y=; b=XsmDKJ7hYY1ateIiBd/BVvcU7K7y4j8E8qoqSrQ8bpr+esvO3T5ztRqDM+ozhKCd7g zdfiVwckdx9U590pvwhL5JNfsyLo4o00p2yQCKuqRS9WXgmdK3Vl+bRoXV6RAZk/SGgG 3m1HB/EUCopfV1CL8PmnmYQDNxpP8xlSpEXnkT2XM4yPE0CPEAoYF/oBoaeoz+2sly13 2R1ZrOgm2oqq7MArq9yVK448HFUIONxef6o24Q8+QIzti8dwDB6agVKM8icBUN5DqAcV sLsjKGnU5a+3CVV0bsAoXEsweuzA9VbGD/7Q66aNSkj/+3V9+dNw3/p+3wf5IUBtA3Hz B8ng== X-Gm-Message-State: AEkoouv17MCQMqiOd655DWFmEQUn7stcheDkD3ieIdtN3CUC/JxP8qqIpBcpYgwh2Cx3eQ== X-Received: by 10.28.126.75 with SMTP id z72mr5083535wmc.74.1470431610222; Fri, 05 Aug 2016 14:13:30 -0700 (PDT) Received: from haswell.alporthouse.com ([78.156.65.138]) by smtp.gmail.com with ESMTPSA id p83sm10327864wma.18.2016.08.05.14.13.29 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 05 Aug 2016 14:13:29 -0700 (PDT) From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Fri, 5 Aug 2016 22:13:23 +0100 Message-Id: <1470431603-6289-2-git-send-email-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.8.1 In-Reply-To: <1470431603-6289-1-git-send-email-chris@chris-wilson.co.uk> References: <1470431603-6289-1-git-send-email-chris@chris-wilson.co.uk> Cc: Daniel Vetter , "Goel, Akash" Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Do not overwrite the request with zero on reallocation X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915: Enable lockless lookup of request tracking via RCU"), we acknowledge that we may race with another thread that could have reallocated the request. In order for the first thread not to blow up, the second thread must not clear the request completed before overwriting it. In the RCU lookup, we allow for the engine/seqno to be replaced but we do not allow for it to be zeroed. The choice we make is to either add extra checking to the RCU lookup, or embrace the inherent races (as intended). It is more complicated as we need to manually clear everything we depend upon being zero initialised, but we benefit from not emiting the memset() to clear the entire frequently allocated structure (that memset turns up in throughput profiles). And at the same time, the lookup remains flexible for future adjustments. v2: Old style LRC requires another variable to be initialize. (The danger inherent in not zeroing everything.) v3: request->batch also needs to be cleared Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...") Signed-off-by: Chris Wilson Cc: "Goel, Akash" Cc: Daniel Vetter Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++- drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c index 6a1661643d3d..b7ffde002a62 100644 --- a/drivers/gpu/drm/i915/i915_gem_request.c +++ b/drivers/gpu/drm/i915/i915_gem_request.c @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine, if (req && i915_gem_request_completed(req)) i915_gem_request_retire(req); - req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL); + /* Beware: Dragons be flying overhead. + * + * We use RCU to look up requests in flight. The lookups may + * race with the request being allocated from the slab freelist. + * That is the request we are writing to here, may be in the process + * of being read by __i915_gem_active_get_request_rcu(). As such, + * we have to be very careful when overwriting the contents. During + * the RCU lookup, we change chase the request->engine pointer, + * read the request->fence.seqno and increment the reference count. + * + * The reference count is incremented atomically. If it is zero, + * the lookup knows the request is unallocated and complete. Otherwise, + * it is either still in use, or has been reallocated and reset + * with fence_init(). This increment is safe for release as we check + * that the request we have a reference to and matches the active + * request. + * + * Before we increment the refcount, we chase the request->engine + * pointer. We must not call kmem_cache_zalloc() or else we set + * that pointer to NULL and cause a crash during the lookup. If + * we see the request is completed (based on the value of the + * old engine and seqno), the lookup is complete and reports NULL. + * If we decide the request is not completed (new engine or seqno), + * then we grab a reference and double check that it is still the + * active request - which it won't be and restart the lookup. + * + * Do not use kmem_cache_zalloc() here! + */ + req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL); if (!req) return ERR_PTR(-ENOMEM); @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine, req->engine = engine; req->ctx = i915_gem_context_get(ctx); + /* No zalloc, must clear what we need by hand */ + req->signaling.wait.tsk = NULL; + req->previous_context = NULL; + req->file_priv = NULL; + req->batch_obj = NULL; + req->elsp_submitted = 0; + /* * Reserve space in the ring buffer for all the commands required to * eventually emit this request. This is to guarantee that the diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h index b2456dede3ad..721eb8cbce9b 100644 --- a/drivers/gpu/drm/i915/i915_gem_request.h +++ b/drivers/gpu/drm/i915/i915_gem_request.h @@ -51,6 +51,13 @@ struct intel_signal_node { * emission time to be associated with the request for tracking how far ahead * of the GPU the submission is. * + * When modifying this structure be very aware that we perform a lockless + * RCU lookup of it that may race against reallocation of the struct + * from the slab freelist. We intentionally do not zero the structure on + * allocation so that the lookup can use the dangling pointers (and is + * cogniscent that those pointers may be wrong). Instead, everything that + * needs to be initialised must be done so explicitly. + * * The requests are reference counted. */ struct drm_i915_gem_request { @@ -465,6 +472,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active) * just report the active tracker is idle. If the new request is * incomplete, then we acquire a reference on it and check that * it remained the active request. + * + * It is then imperative that we do not zero the request on + * reallocation, so that we can chase the dangling pointers! + * See i915_gem_request_alloc(). */ do { struct drm_i915_gem_request *request;