From: George Dunlap
Date: Tue, 7 Jan 2020 12:02:43 +0000
Message-ID: <20200107120243.222183-1-george.dunlap@citrix.com>
Subject: [Xen-devel] [PATCH] CODING_STYLE: Document how to handle unexpected conditions
Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Wilk, Andrew Cooper, George Dunlap, Jan Beulich, Ian Jackson

It's not always clear what the best way is to handle unexpected
conditions: whether with ASSERT(), domain_crash(), BUG_ON(), or some
other method. All methods have a risk of introducing security
vulnerabilities and unnecessary instabilities to production systems.

Provide guidelines for different options and when to use them.

Signed-off-by: George Dunlap
Acked-by: Jan Beulich
Acked-by: Julien Grall
---
v4:
- s/guest should/guests shouldn't/;
- Add a note about the effect of domain_crash() further up the stack.
v3:
- A number of minor edits
- Expand on domain_crash a bit.
v2:
- Clarify meaning of "or" clause
- Add domain_crash as an option
- Make it clear that ASSERT() is not an error handling mechanism.

CC: Ian Jackson
CC: Wei Liu
CC: Andrew Cooper
CC: Jan Beulich
CC: Konrad Wilk
CC: Stefano Stabellini
CC: Julien Grall
---
 CODING_STYLE | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/CODING_STYLE b/CODING_STYLE
index 810b71c16d..9f50d9cec4 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -133,3 +133,105 @@ the end of files. It should be:
  * indent-tabs-mode: nil
  * End:
  */
+
+Handling unexpected conditions
+------------------------------
+
+GUIDELINES:
+
+Passing errors up the stack should be used when the caller is already
+expecting to handle errors, and the state when the error was
+discovered isn't broken, or isn't too hard to fix.
+
+domain_crash() should be used when passing errors up the stack is too
+difficult, and/or when fixing up the state of a guest is impractical, but
+where fixing up the state of Xen will allow Xen to continue running.
+This is particularly appropriate when the guest is exhibiting behavior
+well-behaved guests shouldn't.
+
+BUG_ON() should be used when you can't pass errors up the stack, and
+either continuing or crashing the guest would likely cause an
+information leak or privilege escalation vulnerability.
+
+ASSERT() IS NOT AN ERROR HANDLING MECHANISM. ASSERT is a way to move
+detection of a bug earlier in the programming cycle; it is a
+more-noticeable printk. It should only be added after one of the
+other three error-handling mechanisms has been evaluated for
+reliability and security.
+
+RATIONALE:
+
+It's frequently the case that code is written with the assumption that
+certain conditions can never happen. There are several possible
+actions programmers can take in these situations:
+
+* Programmers can simply not handle those cases in any way, other than
+perhaps to write a comment documenting what the assumption is.
+
+* Programmers can try to handle the case gracefully -- fixing up
+in-progress state and returning an error to the user.
+
+* Programmers can crash the guest.
+
+* Programmers can use ASSERT(), which will cause the check to be
+executed in DEBUG builds, and cause the hypervisor to crash if it's
+violated.
+
+* Programmers can use BUG_ON(), which will cause the check to be
+executed in both DEBUG and non-DEBUG builds, and cause the hypervisor
+to crash if it's violated.
+
+In selecting which response to use, we want to achieve several goals:
+
+- To minimize risk of introducing security vulnerabilities,
+  particularly as the code evolves over time
+
+- To efficiently spend programmer time
+
+- To detect violations of assumptions as early as possible
+
+- To minimize the impact of bugs on production use cases
+
+The guidelines above attempt to balance these:
+
+- When the caller is expecting to handle errors, and there is no
+broken state at the time the unexpected condition is discovered, or
+when fixing the state is straightforward, then fixing up the state and
+returning an error is the most robust thing to do. However, if the
+caller isn't expecting to handle errors, or if the state is difficult
+to fix, then returning an error may require extensive refactoring,
+which is not a good use of programmer time when they're certain that
+this condition cannot occur.
+
+- BUG_ON() will stop all hypervisor action immediately. In situations
+where continuing might allow an attacker to escalate privilege, a
+BUG_ON() can change a privilege escalation or information leak into a
+denial-of-service (an improvement). But in situations where
+continuing (say, returning an error) might be safe, then BUG_ON() can
+change a benign failure into denial-of-service (a degradation).
+
+- domain_crash() is similar to BUG_ON(), but with a more limited
+effect: it stops that domain immediately. In situations where
+continuing might cause guest or hypervisor corruption, but destroying
+the guest allows the hypervisor to continue, this can change a more
+serious bug into a guest denial-of-service. But in situations where
+returning an error might be safe, then domain_crash() can change a
+benign failure into a guest denial-of-service.
+
+- ASSERT() will stop the hypervisor during development, but allow
+hypervisor action to continue during production. In situations where
+continuing will at worst result in a denial-of-service, and at best
+may have little effect other than perhaps quirky behavior, using an
+ASSERT() will allow violation of assumptions to be detected as soon as
+possible, while not causing undue degradation in production
+hypervisors. However, in situations where continuing could cause
+privilege escalation or information leaks, using an ASSERT() can
+introduce security vulnerabilities.
+
+Note, however, that domain_crash() has its own traps: callers far up the
+call stack may not realize that the domain is now dying as a result of
+an innocuous-looking operation, particularly if somewhere on the
+call stack between the initial function call and the failure, no error
+is returned. Using domain_crash() requires careful inspection and
+documentation of the code to make sure all callers up the stack handle
+a newly-dead domain gracefully.
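
As an illustration only (not part of the patch), the four mechanisms
described in the guidelines can be sketched in plain C. Everything here
is a hypothetical stand-in: the struct domain and struct page layouts,
the DEBUG_BUILD switch, the request opcodes, and the simplified
ASSERT(), BUG_ON() and domain_crash() definitions are assumptions made
so the example compiles on its own. They are not Xen's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not Xen's real structures. */
struct domain {
    int id;
    bool is_dying;
    unsigned int nr_pages, max_pages, handled;
};
struct page { unsigned int refcount; };

enum { OP_NOP = 0, OP_YIELD = 1 };   /* hypothetical guest opcodes */

/* Stand-in BUG_ON(): checked in every build, stops the whole program. */
#define BUG_ON(cond) do {                                      \
    if (cond) {                                                \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort();                                               \
    }                                                          \
} while (0)

/* Stand-in ASSERT(): only checked when DEBUG_BUILD is defined. */
#ifdef DEBUG_BUILD
#define ASSERT(cond) BUG_ON(!(cond))
#else
#define ASSERT(cond) do { if (0 && (cond)) { } } while (0)
#endif

/* Stand-in domain_crash(): marks only this domain as dying. */
void domain_crash(struct domain *d) { d->is_dying = true; }

/*
 * Return an error: the caller already handles errors and no state has
 * been modified yet. The ASSERT() only makes a broken invariant louder
 * in debug builds; the error return is the actual handling.
 */
int claim_pages(struct domain *d, unsigned int nr)
{
    if (d->nr_pages > d->max_pages) {
        ASSERT(0);
        return -EINVAL;
    }
    if (nr > d->max_pages - d->nr_pages)
        return -ENOMEM;
    d->nr_pages += nr;
    return 0;
}

/*
 * domain_crash(): this function has no way to report an error, and
 * only a misbehaving guest sends an unknown opcode.
 */
void handle_guest_request(struct domain *d, unsigned int op)
{
    switch (op) {
    case OP_NOP:
    case OP_YIELD:
        d->handled++;
        break;
    default:
        domain_crash(d);
        break;
    }
}

/* A caller further up the stack must notice the newly-dead domain. */
int process_requests(struct domain *d, const unsigned int *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        handle_guest_request(d, ops[i]);
        if (d->is_dying)
            return -EINVAL;
    }
    return 0;
}

/*
 * BUG_ON(): in this sketch, a refcount underflow is assumed to risk a
 * page being freed while still in use, so stopping is the safer choice.
 */
void put_page(struct page *pg)
{
    BUG_ON(pg->refcount == 0);
    pg->refcount--;
}
```

In this sketch, process_requests() shows the trap noted above:
handle_guest_request() returns nothing, so the caller has to check
is_dying itself to avoid continuing to work on a dead domain.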