From patchwork Wed Jan 10 11:18:48 2024
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13516001
From: Neeraj Upadhyay
Subject: [RFC 1/9] doc: Add document for apparmor refcount management
Date: Wed, 10 Jan 2024 16:48:48 +0530
Message-ID: <20240110111856.87370-1-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-security-module@vger.kernel.org

Add a document to describe refcount management of AppArmor kernel objects.

Signed-off-by: Neeraj Upadhyay
---
 .../admin-guide/LSM/ApparmorRefcount.rst      | 351 ++++++++++++++++++
 Documentation/admin-guide/LSM/index.rst       |   1 +
 2 files changed, 352 insertions(+)
 create mode 100644 Documentation/admin-guide/LSM/ApparmorRefcount.rst

diff --git a/Documentation/admin-guide/LSM/ApparmorRefcount.rst b/Documentation/admin-guide/LSM/ApparmorRefcount.rst
new file mode 100644
index 000000000000..8132f1b661bb
--- /dev/null
+++ b/Documentation/admin-guide/LSM/ApparmorRefcount.rst
@@ -0,0 +1,351 @@
+============================
+AppArmor Refcount Management
+============================
+
+Introduction
+============
+
+AppArmor task confinement is based on task profiles. Task profiles
+specify the access rules - list of files which the task is allowed
+to open/read/write, socket bind, mount, signal, ptrace and other
+capabilities of the task.
+
+A sample raw task profile (typically present in /etc/apparmor.d/)
+would look like this:
+
+::
+
+  1.  profile test_app /usr/bin/test_app {
+  2.      @{sys}/devices/** r,
+  3.      /var/log/testapp/access.log w,
+  4.      /var/log/testapp/error.log w,
+  5.
+  6.      /lib/ld-*.so* mrix,
+  7.
+  8.      ^hat {
+  9.          /dev/pts/* rw,
+  10.     }
+  11.
+  12.     change_profile -> restricted_access_profile,
+  13.
+  14.     /usr/*bash cx -> local_profile,
+  15.
+  16.     profile local_profile {
+  17.         ...
+  18.     }
+  19. }
+
+
+The above example shows a sample profile. A quick description of
+each line is given below:
+
+1   Defines a profile with name ``test_app`` and attachment specification
+    ``/usr/bin/test_app``. The attachment specification is used for
+    associating the application with a profile, during launch.
+
+2, 3, 4
+    Specifies read and write access to various paths.
+
+6   Read access for the shared object (.so) and an inherit profile transition.
+
+8   Hat profile. Used for running a portion of the program with different
+    permissions, compared to the other portions of the program. For example,
+    to run unauthenticated traffic and authenticated traffic in separate
+    profiles in OpenSSH; running user supplied CGI scripts in a separate
+    profile in Apache.
+
+12  Change profile rule, to switch a child process to a profile different
+    from that of the parent process, on exec.
+
+14  Profile transition for processes started from the current process. For
+    example, transition to a different profile for ``ls``, which is invoked
+    from a shell program.
+
+
+Objects and their Refcount Management
+=====================================
+
+There are various object resources within AppArmor:
+
+- Namespaces
+
+  There is a root namespace associated with apparmorfs. This is the default
+  namespace, with which profiles are associated by default.
+
+  Profiles can be associated with a different namespace (for chroot,
+  container environments).
+
+  Namespaces are represented using ``struct aa_ns``. Some of the relevant
+  fields are::
+
+    struct aa_policy base
+    struct aa_profile *unconfined
+    struct list_head sub_ns
+    struct aa_ns *parent
+
+  ``struct aa_policy`` contains a list of profiles associated with this ns.
+
+  The ``unconfined`` profile manages the refcount for this namespace. It is
+  also used as the default profile for tasks in this namespace and as a
+  proxy label, when profiles are removed.
+
+  ``sub_ns`` is the list of child namespaces.
+
+  ``parent`` is the parent namespace of this namespace.
+
+  A parent and its child namespaces keep references to each other::
+
+          +---------------------+
+          |                     |
+          |       root_ns       |
+          |                     |
+          +---------------------+
+             ^  /         \  ^
+            /  /           \  \
+    parent /  /             \  \ parent
+          /  /    sub_ns     \  \
+         /  /                 \  \
+        /  /                   \  \
+       /  v                     v  \
+    +-----------+         +-----------+
+    |           |         |           |
+    |    ns1    |         |    ns2    |
+    |           |         |           |
+    +-----------+         +-----------+
+
+  Here, ``root_ns`` is the root apparmor namespace. It maintains a
+  reference to all child namespaces which are present in ``->sub_ns``.
+  The child namespaces ``ns1``, ``ns2`` keep a reference to their
+  ``parent``, which is the ``root_ns``.
+
+
+- Profiles
+
+  Profiles are represented as ``struct aa_profile``
+
+  Some of the fields of interest are::
+
+    struct aa_policy base
+    struct aa_profile __rcu *parent
+    struct aa_ns *ns
+    struct aa_loaddata *rawdata
+    struct aa_label label
+
+  ``base`` - Maintains the list of child subprofiles - hats
+
+  ``parent`` - If subprofile, pointer to the parent profile
+
+  ``ns`` - Parent namespace
+
+  ``rawdata`` - Used for matching profile data, for profile updates
+
+  ``label`` - Refcount object
+
+  A profile keeps a reference to the namespace it is associated with.
+  In addition, there is a reference kept for all profiles in
+  ``base.profiles`` of a namespace::
+
+          +-----------------------------+
+          |                             |
+          |           root_ns           |
+          |                             |
+          +-----------------------------+
+                ^  /              ^  |
+               /  /           ns  |  |
+       parent /  /                |  |
+             /  /  sub_ns         |  |base.profiles
+            /  /                  |  |
+           /  /                   |  |
+          /  v                    |  v
+      +-----------+          +-----------+
+      |           |          |           |
+      |    ns1    |          |     A     |
+      |           |          |           |
+      +-----------+          +-----------+
+        base  |  ^
+     .profiles|  | parent
+              v  |
+      +-----------+
+      |           |
+      |     P     |
+      |           |
+      +-----------+
+
+  For subprofiles, a refcount is kept for the ``->parent`` profile.
+  For each child in ``->base.profiles``, a refcount is kept::
+
+               +--------------+
+               |              |
+               |   root_ns    |
+               |              |
+               +-------^------+
+           base.    |  |
+           profiles v  |ns
+              +---------------+
+              |               |
+             ^|       A       |^
+     parent / |               | \parent
+           /  +---------------+  \
+          /  /   base.profiles\   \
+         /  /                   v  \
+    +------v-+                 +----\---+
+    |        |                 |        |
+    |   B    |                 |   C    |
+    |        |                 |        |
+    +--------+                 +--------+
+
+
+- Labels
+
+  Label manages refcount for the ``aa_profile`` objects. It is
+  represented as ``struct aa_label``. Some of the fields are::
+
+    struct kref count
+    struct aa_proxy *proxy
+    long flags
+    int size
+    struct aa_profile *vec[]
+
+  ``count`` - Refcounting for the enclosing object.
+  ``proxy`` - Redirection of stale profiles
+  ``flags`` - state (STALE), type (NS, PROFILE)
+  ``vec`` - if ``size`` > 1, for compound labels (for stacked profiles)
+
+
+  For compound (stacked) labels, a reference is kept for each of
+  the stacked profiles::
+
+      +--------+      +---------+  +-------+
+      |   A    |      |    B    |  |   C   |
+      |        |      |         |  |       |
+      +-----^--+      +---------+  +-------+
+          ^  \             ^           ^
+           \  \            |           |
+            \  \  +---------------+    |
+             \  \ |  A//&:ns1:B   |    |
+              \  \|               |    |
+               \  +---------------+    |
+                \                      |
+                 \                     |
+                  \ +-------------------+
+                   \|A//&:ns1:B//&:ns2:C|
+                    |                   |
+                    +-------------------+
+
+- Proxy
+
+  A proxy is associated with a label, and is used for redirecting
+  running tasks to the new profile on profile change. Proxies are
+  represented as ``struct aa_proxy``::
+
+    struct aa_label __rcu *label
+
+  ``label`` - Redirect to the latest profile label.
+
+  While a label is not stale, its proxy points to the same label.
+  On profile updates, the label is marked as stale,
+  and its proxy is redirected to the new profile label::
+
+      +------------+          +-----------+
+      |            |          |           |
+      |    old     | -------->|    P1     |
+      |            | <--------|           |
+      +------------+          +-----------+
+
+
+      +------------+         +------------+
+      |            |         |            |
+      |    old     |-------->|     P1     |
+      |            |         |            |
+      +------------+         +--^---------+
+                                |  |
+      +------------+            |  |
+      |            |-----------/   |
+      |    new     |<-------------/
+      |            |
+      +------------+
+
+Lifecycle of the AppArmor Kernel Objects
+========================================
+
+#. Init
+
+   #. During AppArmor init, root ns (RNS:1) and its unconfined
+      profile are created (RNS:1). If initialization completes
+      successfully, the ``root_ns`` initial ref is never destroyed
+      (?).
+
+   #. Userspace init scripts load the current set of defined profiles
+      into the kernel, typically through the ``apparmor_parser`` command.
+
+      The loaded profiles have an initial refcount (P1:1, P2:1).
+      A profile P1, which is in the default namespace, keeps a reference
+      to root ns (RNS:2). If a profile P2 is in a different namespace,
+      NS1, that namespace object is allocated (NS1:1) and the namespace
+      is added to ``sub_ns`` of ``root_ns`` (NS1:2). The child namespace
+      NS1 keeps a reference to parent ``root_ns`` (RNS:3). P2 keeps a
+      reference to NS1 (NS1:2). The root ns keeps a reference to P1 in
+      ``->profiles`` list (P1:2). NS1 keeps a reference to P2 in its
+      ``->profiles`` list (P2:2). In addition, label proxies keep
+      reference to P1 and P2 (P1:3, P2:3).
+
+#. Runtime
+
+   #. As part of the bprm cred updates (``apparmor_bprm_creds_for_exec()``),
+      the current task T1 is attached to a profile (P1), based on the best
+      attachment match rule. T1 keeps a refcount for P1, while the current
+      ``cred`` is active (P1:4).
+
+   #. If P1 is replaced with a new profile P3, P1 is removed from the root
+      ns profiles list (P1:3), the proxy is redirected to P3 (P1:2), the
+      initial label reference is released (P1:1), and P1's label is marked stale.
+
+   #. Any T1 access that goes through an AppArmor hook references the
+      current task's cred label::
+
+        __begin_current_label_crit_section()
+            struct aa_label *label = cred_label(cred);
+
+            if (label_is_stale(label))
+                label = aa_get_newest_label(label);
+
+            return label;
+
+        aa_get_newest_label(struct aa_label *l)
+            return aa_get_label_rcu(&l->proxy->label);
+
+        aa_get_label_rcu(struct aa_label __rcu **l)
+            rcu_read_lock();
+            do {
+                c = rcu_dereference(*l);
+            } while (c && !kref_get_unless_zero(&c->count));
+            rcu_read_unlock();
+
+   #. On task exit and cred freeing, the last reference for P1 is
+      released (P1:0).
+
+#. Release
+
+   Below is the set of release operations, based on the label's
+   parent object type.
+
+   #. If ns is not assigned (early init error exit), do not wait for
+      an RCU grace period. Otherwise use ``call_rcu()``.
+
+   #. If label is associated with a namespace (unconfined label)
+      #. Drop parent ns reference.
+
+   #. If label is associated with a profile
+      #. Drop parent profile reference.
+      #. Drop ns reference.
+
+   #. Drop all vector profile references for stacked profiles.
+
+
+Links
+=====
+
+Userspace tool - https://gitlab.com/apparmor/apparmor
+  Profile syntax - parser/apparmor.d.pod
+  Sample change hats - changehat/
+  Other documentation - libraries/libapparmor/doc
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index a6ba95fbaa9f..c608db9e7107 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -41,6 +41,7 @@ subdirectories.
    :maxdepth: 1
 
    apparmor
+   ApparmorRefcount
    LoadPin
    SELinux
    Smack

From patchwork Wed Jan 10 11:18:49 2024
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13516002
From: Neeraj Upadhyay
Subject: [RFC 2/9] apparmor: Switch labels to percpu refcount in atomic mode
Date: Wed, 10 Jan 2024 16:48:49 +0530
Message-ID: <20240110111856.87370-2-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-security-module@vger.kernel.org

In preparation for using percpu refcounts for labels, replace the label
kref with a percpu refcount. The percpu ref is initialized in atomic mode,
because using percpu mode requires tracking ref kill points. As the atomic
counter is now in a different cacheline, rearrange some of the fields
(flags, proxy) to optimize some of the fast paths for unconfined labels.

In addition to the requirement to clean up the percpu ref using
percpu_ref_exit() in the label destruction path, other potential impacts
of this patch are:

- Increase in memory requirement (for per-cpu counters) for each label.

- Displacement of aa_label struct members to a different cacheline, as
  the percpu ref takes the space of two pointers.

- Moving of the atomic counter outside of the cacheline of the aa_label
  struct.

Signed-off-by: Neeraj Upadhyay
---
 security/apparmor/include/label.h  | 16 ++++++++--------
 security/apparmor/include/policy.h |  8 ++++----
 security/apparmor/label.c          | 11 ++++++++---
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/security/apparmor/include/label.h b/security/apparmor/include/label.h
index 2a72e6b17d68..4b29a4679c74 100644
--- a/security/apparmor/include/label.h
+++ b/security/apparmor/include/label.h
@@ -121,12 +121,12 @@ struct label_it {
  * @ent: set of profiles for label, actual size determined by @size
  */
 struct aa_label {
-	struct kref count;
+	struct percpu_ref count;
+	long flags;
+	struct aa_proxy *proxy;
 	struct rb_node node;
 	struct rcu_head rcu;
-	struct aa_proxy *proxy;
 	__counted char *hname;
-	long flags;
 	u32 secid;
 	int size;
 	struct aa_profile *vec[];
@@ -276,7 +276,7 @@ void __aa_labelset_update_subtree(struct aa_ns *ns);
 
 void aa_label_destroy(struct aa_label *label);
 void aa_label_free(struct aa_label *label);
-void aa_label_kref(struct kref *kref);
+void aa_label_percpu_ref(struct percpu_ref *ref);
 bool aa_label_init(struct aa_label *label, int size, gfp_t gfp);
 struct aa_label *aa_label_alloc(int size, struct aa_proxy *proxy, gfp_t gfp);
 
@@ -373,7 +373,7 @@ int aa_label_match(struct aa_profile *profile, struct aa_ruleset *rules,
  */
 static inline struct aa_label *__aa_get_label(struct aa_label *l)
 {
-	if (l && kref_get_unless_zero(&l->count))
+	if (l && percpu_ref_tryget(&l->count))
 		return l;
 
 	return NULL;
@@ -382,7 +382,7 @@ static inline struct aa_label *__aa_get_label(struct aa_label *l)
 static inline struct aa_label *aa_get_label(struct aa_label *l)
 {
 	if (l)
-		kref_get(&(l->count));
+		percpu_ref_get(&(l->count));
 
 	return l;
 }
@@ -402,7 +402,7 @@ static inline struct aa_label *aa_get_label_rcu(struct aa_label __rcu **l)
 	rcu_read_lock();
 	do {
 		c = rcu_dereference(*l);
-	} while (c && !kref_get_unless_zero(&c->count));
+	} while (c && !percpu_ref_tryget(&c->count));
 	rcu_read_unlock();
 
 	return c;
@@ -442,7 +442,7 @@ static inline struct aa_label *aa_get_newest_label(struct aa_label *l)
 static inline void aa_put_label(struct aa_label *l)
 {
 	if (l)
-		kref_put(&l->count, aa_label_kref);
+		percpu_ref_put(&l->count);
 }
 
 
diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h
index 75088cc310b6..5849b6b94cea 100644
--- a/security/apparmor/include/policy.h
+++ b/security/apparmor/include/policy.h
@@ -329,7 +329,7 @@ static inline aa_state_t ANY_RULE_MEDIATES(struct list_head *head,
 static inline struct aa_profile *aa_get_profile(struct aa_profile *p)
 {
 	if (p)
-		kref_get(&(p->label.count));
+		percpu_ref_get(&(p->label.count));
 
 	return p;
 }
@@ -343,7 +343,7 @@ static inline struct aa_profile *aa_get_profile(struct aa_profile *p)
  */
 static inline struct aa_profile *aa_get_profile_not0(struct aa_profile *p)
 {
-	if (p && kref_get_unless_zero(&p->label.count))
+	if (p && percpu_ref_tryget(&p->label.count))
 		return p;
 
 	return NULL;
@@ -363,7 +363,7 @@ static inline struct aa_profile *aa_get_profile_rcu(struct aa_profile __rcu **p)
 	rcu_read_lock();
 	do {
 		c = rcu_dereference(*p);
-	} while (c && !kref_get_unless_zero(&c->label.count));
+	} while (c && !percpu_ref_tryget(&c->label.count));
 	rcu_read_unlock();
 
 	return c;
@@ -376,7 +376,7 @@ static inline struct aa_profile *aa_get_profile_rcu(struct aa_profile __rcu **p)
 static inline void aa_put_profile(struct aa_profile *p)
 {
 	if (p)
-		kref_put(&p->label.count, aa_label_kref);
+		percpu_ref_put(&p->label.count);
 }
 
 static inline int AUDIT_MODE(struct aa_profile *profile)
diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index c71e4615dd46..aa9e6eac3ecc 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -336,6 +336,7 @@ void aa_label_destroy(struct aa_label *label)
 		rcu_assign_pointer(label->proxy->label, NULL);
 		aa_put_proxy(label->proxy);
 	}
+	percpu_ref_exit(&label->count);
 	aa_free_secid(label->secid);
 
 	label->proxy = (struct aa_proxy *) PROXY_POISON + 1;
@@ -369,9 +370,9 @@ static void label_free_rcu(struct rcu_head *head)
 	label_free_switch(label);
 }
 
-void aa_label_kref(struct kref *kref)
+void aa_label_percpu_ref(struct percpu_ref *ref)
 {
-	struct aa_label *label = container_of(kref, struct aa_label, count);
+	struct aa_label *label = container_of(ref, struct aa_label, count);
 	struct aa_ns *ns = labels_ns(label);
 
 	if (!ns) {
@@ -408,7 +409,11 @@ bool aa_label_init(struct aa_label *label, int size, gfp_t gfp)
 
 	label->size = size;			/* doesn't include null */
 	label->vec[size] = NULL;		/* null terminate */
-	kref_init(&label->count);
+	if (percpu_ref_init(&label->count, aa_label_percpu_ref, PERCPU_REF_INIT_ATOMIC, gfp)) {
+		aa_free_secid(label->secid);
+		return false;
+	}
+
 	RB_CLEAR_NODE(&label->node);
 
 	return true;

From patchwork Wed Jan 10 11:18:50 2024
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13516003
From: Neeraj Upadhyay
Subject: [RFC 3/9] apparmor: Switch unconfined namespaces refcount to percpu mode
Date: Wed, 10 Jan 2024 16:48:50 +0530
Message-ID: <20240110111856.87370-3-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-security-module@vger.kernel.org
X-Forefront-Antispam-Report:
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(396003)(346002)(376002)(136003)(39860400002)(230922051799003)(451199024)(82310400011)(1800799012)(186009)(64100799003)(46966006)(40470700004)(36840700001)(47076005)(83380400001)(16526019)(426003)(1076003)(40480700001)(40460700003)(26005)(336012)(2616005)(7696005)(6666004)(478600001)(356005)(8936002)(8676002)(36756003)(110136005)(54906003)(4326008)(70206006)(70586007)(316002)(86362001)(81166007)(82740400003)(36860700001)(5660300002)(7416002)(2906002)(41300700001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Jan 2024 11:20:39.2124 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 705eecff-5954-47cc-51c6-08dc11ce2cd3 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DS2PEPF00003442.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: LV3PR12MB9329 Switch unconfined labels to percpu mode, to avoid high memory contention on refcount get and put operations, when multiple cpus try to perform these operations at the same time. Unconfined label for root and sub namespaces are killed at the point of aa_free_root_ns(). Though labels/profiles in various namespaces could potentially still be active after this point, aa_free_root_ns() is not typically called when apparmor enforcement is enabled. 
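Why a kill is needed here, rather than an ordinary put, can be seen in a simplified single-threaded model. This is only a sketch: `struct model_ref` and the `model_*` helpers are hypothetical names invented for illustration and are not the kernel's `struct percpu_ref` API. The model assumes only what the kernel's percpu_ref documents: in percpu mode a put never checks for zero, and a kill switches the ref back to atomic mode and drops the initial reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; not the kernel's struct percpu_ref. */
enum model_mode { MODEL_ATOMIC, MODEL_PERCPU };

struct model_ref {
	enum model_mode mode;
	long count;	/* includes the initial reference */
	bool dead;	/* set by model_kill() */
	bool released;	/* set where the release callback would run */
};

static void model_init(struct model_ref *r)
{
	r->mode = MODEL_ATOMIC;	/* refs start in atomic mode */
	r->count = 1;		/* the initial reference */
	r->dead = false;
	r->released = false;
}

static void model_switch_to_percpu(struct model_ref *r)
{
	if (!r->dead)
		r->mode = MODEL_PERCPU;
}

static void model_get(struct model_ref *r)
{
	r->count++;
}

static void model_put(struct model_ref *r)
{
	r->count--;
	/* Reaching zero is only noticed in atomic mode. */
	if (r->mode == MODEL_ATOMIC && r->count == 0)
		r->released = true;
}

static void model_kill(struct model_ref *r)
{
	r->dead = true;
	r->mode = MODEL_ATOMIC;	/* later puts can now see zero */
	model_put(r);		/* drop the initial reference */
}
```

In this model a ref that stays in percpu mode is never released, however many puts it receives, which is why the namespace teardown paths have to kill the unconfined label's ref.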
Signed-off-by: Neeraj Upadhyay --- security/apparmor/include/policy.h | 24 ++++++++++++++++++++++++ security/apparmor/include/policy_ns.h | 24 ++++++++++++++++++++++++ security/apparmor/policy_ns.c | 6 ++++-- 3 files changed, 52 insertions(+), 2 deletions(-) diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h index 5849b6b94cea..1e3b29ba6c03 100644 --- a/security/apparmor/include/policy.h +++ b/security/apparmor/include/policy.h @@ -379,6 +379,30 @@ static inline void aa_put_profile(struct aa_profile *p) percpu_ref_put(&p->label.count); } +/** + * aa_switch_ref_profile - switch percpu-ref mode for profile @p + * @p: profile (MAYBE NULL) + */ +static inline void aa_switch_ref_profile(struct aa_profile *p, bool percpu) +{ + if (p) { + if (percpu) + percpu_ref_switch_to_percpu(&p->label.count); + else + percpu_ref_switch_to_atomic_sync(&p->label.count); + } +} + +/** + * aa_kill_ref_profile - percpu-ref kill for profile @p + * @p: profile (MAYBE NULL) + */ +static inline void aa_kill_ref_profile(struct aa_profile *p) +{ + if (p) + percpu_ref_kill(&p->label.count); +} + static inline int AUDIT_MODE(struct aa_profile *profile) { if (aa_g_audit != AUDIT_NORMAL) diff --git a/security/apparmor/include/policy_ns.h b/security/apparmor/include/policy_ns.h index d646070fd966..f3db01c5e193 100644 --- a/security/apparmor/include/policy_ns.h +++ b/security/apparmor/include/policy_ns.h @@ -127,6 +127,30 @@ static inline void aa_put_ns(struct aa_ns *ns) aa_put_profile(ns->unconfined); } +/** + * aa_switch_ref_ns - switch percpu-ref mode for @ns + * @ns: namespace to switch percpu-ref mode of + * + * Switch percpu-ref mode of @ns between percpu and atomic + */ +static inline void aa_switch_ref_ns(struct aa_ns *ns, bool percpu) +{ + if (ns) + aa_switch_ref_profile(ns->unconfined, percpu); +} + +/** + * aa_kill_ref_ns - do percpu-ref kill for @ns + * @ns: namespace to perform percpu-ref kill for + * + * Do percpu-ref kill of @ns refcount + */ +static 
inline void aa_kill_ref_ns(struct aa_ns *ns) +{ + if (ns) + aa_kill_ref_profile(ns->unconfined); +} + /** * __aa_findn_ns - find a namespace on a list by @name * @head: list to search for namespace on (NOT NULL) diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c index 1f02cfe1d974..ca633cfbd936 100644 --- a/security/apparmor/policy_ns.c +++ b/security/apparmor/policy_ns.c @@ -124,6 +124,7 @@ static struct aa_ns *alloc_ns(const char *prefix, const char *name) goto fail_unconfined; /* ns and ns->unconfined share ns->unconfined refcount */ ns->unconfined->ns = ns; + aa_switch_ref_ns(ns, true); atomic_set(&ns->uniq_null, 0); @@ -336,7 +337,7 @@ void __aa_remove_ns(struct aa_ns *ns) /* remove ns from namespace list */ list_del_rcu(&ns->base.list); destroy_ns(ns); - aa_put_ns(ns); + aa_kill_ref_ns(ns); } /** @@ -377,6 +378,7 @@ int __init aa_alloc_root_ns(void) } kernel_t = &kernel_p->label; root_ns->unconfined->ns = aa_get_ns(root_ns); + aa_switch_ref_ns(root_ns, true); return 0; } @@ -392,5 +394,5 @@ void __init aa_free_root_ns(void) aa_label_free(kernel_t); destroy_ns(ns); - aa_put_ns(ns); + aa_kill_ref_ns(ns); } From patchwork Wed Jan 10 11:18:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13516004 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2040.outbound.protection.outlook.com [40.107.236.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E681481D6; Wed, 10 Jan 2024 11:21:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com 
header.b="ntpzGnGi" ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CwIEtErzN4qt1T36iJqsR1VX2fCtoyRhOpY1UTe08EeQosayY2pikpBSBT97/eq9BFnn90HkqoCrcMEn80Sp1RAZbWPKNpTwJFCWdMx/hvpChUNiqhPdalbF7DH6c26sssLReKRk7Sb0zGQScHtQoigaRGkd7kXseIrLTqOBFBAAgoA6qrDubji+xSlYTA0gjt0TSrvdrnJff6sh9qLraiikCxKoLtrro04pYmp5Ck1ebXl/FrYubXNjZgw67ssrZpnBcv7MznMq1KtvCDyVcYFTa/G9I/ZRdHT+M1o71qgS9Vx+1beY7iz2oMcBnTdOqT+gKip+e7Du0KnBADGYDQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qqLIlC44fG9l3ORd4q9RiH8jMihPPd98yhdxmQ3nUvI=; b=S/nN+JTGjYftRGZ78BF0tZzJSVpviuehdvudYaQ3C7jndO2wwbVnlH5ONt3EE+azt93x0D+E7nGHL1G/KAG3aYRh5ndCt9RQ5C//f1Y4flJsf7otoT1B0J2OYMKA6//P1BcWC7bFYDbvROd1DBw04HXTHUjpsewf/JNhByGLaXd8dfMZh4DZjYL+IGQ1Bwqn9DuHUSQvsbcnd3APgBPa3H/zE8cx+vOYAUdOhm8QpGbHZeIAW+r6nib1PLdQXfA86EzEUEab2lma4Sb9mkPaU+fXGrOQoUpnYmxHBkKGJxMalN6xlV7YLq1HLl8KQuF39JKFf5yR2bPdyFc/eE+BIA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=canonical.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=qqLIlC44fG9l3ORd4q9RiH8jMihPPd98yhdxmQ3nUvI=; b=ntpzGnGi1OkwjgfyIPtUbzEovxoRuau80ENxAstht7wlbR596NjnpxUMfQ/5bJr99Z0hUBOIV4tXQU5xrLSJSpCoH3eXP7B8/HVi74a4mKmkzadSni5zceSJ7+Kk1fTGB5hOIHDxDf5I3hGsOAZ3hqZV/etpzkZ5bVGVsr8+N2I= Received: from DM6PR02CA0113.namprd02.prod.outlook.com (2603:10b6:5:1b4::15) by SN7PR12MB7130.namprd12.prod.outlook.com (2603:10b6:806:2a2::22) with Microsoft SMTP Server (version=TLS1_2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.18; Wed, 10 Jan 2024 11:21:28 +0000 Received: from DS2PEPF00003444.namprd04.prod.outlook.com (2603:10b6:5:1b4:cafe::b7) by DM6PR02CA0113.outlook.office365.com (2603:10b6:5:1b4::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.18 via Frontend Transport; Wed, 10 Jan 2024 11:21:28 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by DS2PEPF00003444.mail.protection.outlook.com (10.167.17.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7181.13 via Frontend Transport; Wed, 10 Jan 2024 11:21:28 +0000 Received: from BLR-L-NUPADHYA.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Wed, 10 Jan 2024 05:21:23 -0600 From: Neeraj Upadhyay To: , , , CC: , , , , , , , , , , , Neeraj Upadhyay Subject: [RFC 4/9] apparmor: Add infrastructure to reclaim percpu labels Date: Wed, 10 Jan 2024 16:48:51 +0530 Message-ID: <20240110111856.87370-4-Neeraj.Upadhyay@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS2PEPF00003444:EE_|SN7PR12MB7130:EE_ X-MS-Office365-Filtering-Correlation-Id: 
6a3e4873-0678-4f29-4376-08dc11ce4a42 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: IUZ+ifiGX75DN6nvElocYRnHCXlefuvEHkuoQnS1SWuqfeL412Amnv+Yo++ilFEchh2MQgqwzqUwuwLTzFislrGYoNH4wvVvI2pEE/WPRKDfhfiYrqh/uAxcOhsWJFgjISPUVHW2nkHWHlqK7yLDdH7ptRv4ZuYKLoR8c9zYVZ283mQo5lsgS37LDq9+VbZYSX0cXT1MbVzQ5jE6MOAh95Xqn83zN/OIKwzpP+67jKusCiUTMJ9fDKYtXyiOiycJ68pr8UFT8GBo+10RtJr81Xsvnspj99IHsfaBDAr+e8cZNwuD5R1X1lK2HupjuDtB0BI2OaCpkBcT5bNd6OdO6c1+Ad49joAeuX2Eq30VmRBVUvJgcmBAt5KpNo5l0wxIu0exnCETIxkplCAVRXbx5ZIoy0+IT3/zroz7m8JBWxU28u/9CDJnvbQJrNYe3FGVm8edAf1fePbhGMIOcutrnUMWA6XHKEs8iOuMv45Pwjf68se6WybWxP+TkNodEIiSYUqhuCpzIbqSSqvdtGFx2+AkA3qkBLK6QQlJMocZ9qaKJp3IYmPXJs7kB6DIIC4h32K1T8ATnlV/lGXJ2BnN7T9rA41MWinKlQ3HLwDYwfPVUrZiaBQjIZYJeIi6bQy1+AmoH9qHU76moGh+6+t0NZbmQh0ubpXPrq6Xxm8ekBaN3y09X1lxpGGSENEXaIYMH5cZSCaja1C5xA5cSw3GXlYh7JE0x9uduBroZTfEfN0knLfzDH66A6bpcLevpdaPgoeGu8wEaKaaciVbsmw9Dg== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(136003)(396003)(346002)(376002)(39860400002)(230922051799003)(186009)(451199024)(82310400011)(1800799012)(64100799003)(36840700001)(40470700004)(46966006)(40460700003)(40480700001)(478600001)(426003)(336012)(26005)(16526019)(2616005)(6666004)(7696005)(1076003)(86362001)(83380400001)(110136005)(70206006)(70586007)(36860700001)(54906003)(316002)(47076005)(82740400003)(5660300002)(36756003)(41300700001)(356005)(7416002)(81166007)(2906002)(4326008)(8936002)(8676002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Jan 2024 11:21:28.5934 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 6a3e4873-0678-4f29-4376-08dc11ce4a42 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DS2PEPF00003444.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR12MB7130

Nginx performance testing with AppArmor enabled (with nginx running in the
unconfined profile) on kernel versions 6.1 and 6.5 shows a significant drop
in throughput scalability when nginx workers are scaled to a higher number
of CPUs across various L3 cache domains.

Below is one sample of the throughput scalability loss, based on results
from an AMD Zen4 system with 96 cores and 2 SMT threads per core, so 192
CPUs overall:

Config      Cache Domains     apparmor=off        apparmor=on
                              scaling eff (%)     scaling eff (%)
8C16T          1                 100%                100%
16C32T         2                  95%                 94%
24C48T         3                  94%                 93%
48C96T         6                  92%                 88%
96C192T       12                  85%                 68%

There is a significant drop in scaling efficiency when we move to
96 cores/192 SMT threads.

The perf tool shows most of the contention coming from the functions below:

    6.56%     nginx  [kernel.vmlinux]      [k] apparmor_current_getsecid_subj
    6.22%     nginx  [kernel.vmlinux]      [k] apparmor_file_open

Most of these CPU cycles are spent on memory contention in the
atomic_fetch_add and atomic_fetch_sub operations issued by kref_get() and
kref_put() on the label.

A part of the contention was removed by commit 2516fde1fa00 ("apparmor:
Optimize retrieving current task secid"), which is part of the 6.7-rc1
release. After including this commit, the scaling efficiency improved as
shown below:

Config      Cache Domains     apparmor=on         apparmor=on (patched)
                              scaling eff (%)     scaling eff (%)
8C16T          1                 100%                100%
16C32T         2                  97%                 93%
24C48T         3                  94%                 92%
48C96T         6                  88%                 88%
96C192T       12                  65%                 79%

However, the scaling efficiency impact is still significant even after
including the commit. In addition, the performance impact is even higher
when we move to >192 CPUs.
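To illustrate why a single shared counter contends and a per-CPU one does not, here is a minimal user-space sketch. It is not kernel code: `struct sharded_ref`, `NSHARDS`, `CACHELINE` and the helper names are hypothetical, and the `cpu` argument stands in for the CPU the caller runs on. Each get or put touches only the caller's own cache-line-sized slot, and the slots are summed only when the true count is needed. This is similar in spirit to how the kernel's percpu_ref behaves in percpu mode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSHARDS   4	/* hypothetical: stands in for the number of CPUs */
#define CACHELINE 64	/* assumed cache-line size */

/* One counter per "CPU", aligned so that shards never share a cache line. */
struct shard {
	_Alignas(CACHELINE) atomic_long delta;
};

struct sharded_ref {
	struct shard shards[NSHARDS];
};

static void sharded_ref_init(struct sharded_ref *r)
{
	for (size_t i = 0; i < NSHARDS; i++)
		atomic_init(&r->shards[i].delta, 0);
	/* Account the initial reference to shard 0. */
	atomic_store_explicit(&r->shards[0].delta, 1, memory_order_relaxed);
}

/* get/put modify only the caller's shard, so no cache line is shared. */
static void sharded_ref_get(struct sharded_ref *r, unsigned int cpu)
{
	atomic_fetch_add_explicit(&r->shards[cpu % NSHARDS].delta, 1,
				  memory_order_relaxed);
}

static void sharded_ref_put(struct sharded_ref *r, unsigned int cpu)
{
	atomic_fetch_sub_explicit(&r->shards[cpu % NSHARDS].delta, 1,
				  memory_order_relaxed);
}

/* The sum is only meaningful once concurrent get/put callers are quiesced. */
static long sharded_ref_sum(struct sharded_ref *r)
{
	long sum = 0;

	for (size_t i = 0; i < NSHARDS; i++)
		sum += atomic_load_explicit(&r->shards[i].delta,
					    memory_order_relaxed);
	return sum;
}
```

A get on one shard and the matching put on another leave individual shards at +1 and -1; only the sum is the real count. The cost of this design is that "did the count reach zero?" cannot be answered cheaply while the counter is sharded.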
In addition, the memory contention impact would increase when there is a
high frequency of label update operations and labels are marked stale more
frequently.

This patch adds a mechanism to manage reclaim of apparmor labels when they
are working in percpu mode. Using a percpu refcount for the apparmor label
refcount helps solve the throughput scalability drop seen on nginx.

Config      Cache Domains     apparmor=on (percpuref)
                              scaling eff (%)
8C16T          1                 100%
16C32T         2                  96%
24C48T         3                  94%
48C96T         6                  93%
96C192T       12                  90%

Below is the sequence of operations, which shows the refcount management
with this approach:

1. During label initialization, the percpu ref is initialized in atomic
   mode. This covers the cases where the label hasn't gone live (->ns
   isn't assigned), which occur mostly in initialization error paths.

2. Labels are switched to percpu mode at various points: when a label is
   added to the labelset tree, and when an unconfined profile has been
   assigned a namespace.

3. As part of the initial prototype, only the in-tree labels are managed
   by the kworker. These labels are added to a lockless list (llist).
   The unconfined labels invoke a percpu_ref_kill() operation when the
   namespace is destroyed.

4. The kworker does a periodic scan of all the labels in the llist.
   It performs the following sequence of operations:

   a. Enqueue a dummy node to mark the start of the scan. This dummy node
      is used as the start point of the scan and ensures that no
      additional synchronization is required with new label node
      additions to the llist. Any new labels will be processed in the
      next run of the kworker.

                      SCAN START PTR
                          |
                          v
      +----------+     +------+    +------+    +------+
      |          |     |      |    |      |    |      |
      |   head   ------> dummy|--->|label |--->| label|--->NULL
      |          |     | node |    |      |    |      |
      +----------+     +------+    +------+    +------+


      New label addition:

                            SCAN START PTR
                                 |
                                 v
      +----------+  +------+  +------+    +------+    +------+
      |          |  |      |  |      |    |      |    |      |
      |   head   |--> label|--> dummy|--->|label |--->| label|--->NULL
      |          |  |      |  | node |    |      |    |      |
      +----------+  +------+  +------+    +------+    +------+

   b. Traverse the llist, starting from dummy->next. If the node is a
      dummy node, mark it free. If the node is a label node:

      i) Switch the label ref to atomic mode. The ref switch waits for
         the existing percpu_ref_get() and percpu_ref_put() operations
         to complete, by waiting for an RCU grace period.

         Once the switch is complete, any percpu_ref_get() and
         percpu_ref_put() operations use atomic operations.

      ii) Drop the initial reference, which was taken while adding the
          label node to the llist.

      iii) Use a percpu_ref_tryget() increment operation on the ref, to
           see if we dropped the last ref count. If we dropped the last
           count, we remove the node from the llist.

           All of these operations are done inside an RCU critical
           section, to avoid a race with the release operations, which
           can potentially trigger as soon as we drop the initial ref
           count.

      iv) If we didn't drop the last ref, switch the counter back to
          percpu mode.

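The scan described in step 4 can be sketched as a small single-threaded user-space model. This is an illustration under simplifying assumptions, not the kernel implementation: `struct node`, `push()`, `put()`, `tryget()` and `scan()` are hypothetical names, a plain `long` stands in for the label refcount in atomic mode, and the percpu/atomic mode switches, RCU protection, the per-run batch limit and all concurrency are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	bool is_marker;	/* dummy scan-start node */
	long refs;	/* stand-in for the label refcount in atomic mode */
};

static struct node *list_head;

/* Push at the head of the list, as llist_add() does. */
static void push(struct node *n)
{
	n->next = list_head;
	list_head = n;
}

static void put(struct node *n)
{
	n->refs--;
}

/* Take a reference unless the count has already dropped to zero. */
static bool tryget(struct node *n)
{
	if (n->refs == 0)
		return false;
	n->refs++;
	return true;
}

/* One scan pass; returns how many label nodes were unlinked. */
static int scan(struct node *marker)
{
	struct node *prev, *pos, *next;
	int reclaimed = 0;

	marker->is_marker = true;
	push(marker);	/* nodes pushed from now on sit before the marker */
	prev = marker;
	for (pos = marker->next; pos; pos = next) {
		next = pos->next;
		if (pos->is_marker) {	/* marker left by an earlier pass */
			prev->next = next;
			continue;
		}
		put(pos);		/* drop the reference held by the list */
		if (!tryget(pos)) {	/* it was the last reference */
			prev->next = next;
			reclaimed++;
			continue;
		}
		prev = pos;		/* still in use, keep it on the list */
	}
	return reclaimed;
}
```

Because the marker is pushed before the walk starts, nodes added during a pass land in front of it and are never visited by that pass; they are picked up by the next pass, which pushes its own marker and unlinks the old one on the way.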
Signed-off-by: Neeraj Upadhyay
---
 security/apparmor/include/label.h |   3 +
 security/apparmor/lsm.c           | 145 ++++++++++++++++++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/security/apparmor/include/label.h b/security/apparmor/include/label.h
index 4b29a4679c74..0fc4879930dd 100644
--- a/security/apparmor/include/label.h
+++ b/security/apparmor/include/label.h
@@ -125,6 +125,7 @@ struct aa_label {
 	long flags;
 	struct aa_proxy *proxy;
 	struct rb_node node;
+	struct llist_node reclaim_node;
 	struct rcu_head rcu;
 	__counted char *hname;
 	u32 secid;
@@ -465,4 +466,6 @@ static inline void aa_put_proxy(struct aa_proxy *proxy)
 
 void __aa_proxy_redirect(struct aa_label *orig, struct aa_label *new);
 
+void aa_label_reclaim_add_label(struct aa_label *label);
+
 #endif /* __AA_LABEL_H */
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index e490a7000408..cf8429f5c88e 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -64,6 +64,143 @@ static LIST_HEAD(aa_global_buffers);
 static DEFINE_SPINLOCK(aa_buffers_lock);
 static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers);
 
+static struct workqueue_struct *aa_label_reclaim_wq;
+static void aa_label_reclaim_work_fn(struct work_struct *work);
+
+/*
+ * Dummy llist nodes, for lockless list traversal and deletions by
+ * the reclaim worker, while nodes are added from normal label
+ * insertion paths.
+ */
+struct aa_label_reclaim_node {
+	bool inuse;
+	struct llist_node node;
+};
+
+/*
+ * We need two dummy head nodes for lockless list manipulations from reclaim
+ * worker - first dummy node will be used in current reclaim iteration;
+ * the second one will be used in next iteration. Next iteration marks
+ * the first dummy node as free, for use in following iteration.
+ */
+#define AA_LABEL_RECLAIM_NODE_MAX 2
+
+#define AA_MAX_LABEL_RECLAIMS 100
+#define AA_LABEL_RECLAIM_INTERVAL_MS 5000
+
+static LLIST_HEAD(aa_label_reclaim_head);
+static struct llist_node *last_reclaim_label;
+static struct aa_label_reclaim_node aa_label_reclaim_nodes[AA_LABEL_RECLAIM_NODE_MAX];
+static DECLARE_DELAYED_WORK(aa_label_reclaim_work, aa_label_reclaim_work_fn);
+
+void aa_label_reclaim_add_label(struct aa_label *label)
+{
+	percpu_ref_get(&label->count);
+	llist_add(&label->reclaim_node, &aa_label_reclaim_head);
+}
+
+static bool aa_label_is_reclaim_node(struct llist_node *node)
+{
+	return &aa_label_reclaim_nodes[0].node <= node &&
+		node <= &aa_label_reclaim_nodes[AA_LABEL_RECLAIM_NODE_MAX - 1].node;
+}
+
+static struct llist_node *aa_label_get_reclaim_node(void)
+{
+	int i;
+	struct aa_label_reclaim_node *rn;
+
+	for (i = 0; i < AA_LABEL_RECLAIM_NODE_MAX; i++) {
+		rn = &aa_label_reclaim_nodes[i];
+		if (!rn->inuse) {
+			rn->inuse = true;
+			return &rn->node;
+		}
+	}
+
+	return NULL;
+}
+
+static void aa_label_put_reclaim_node(struct llist_node *node)
+{
+	struct aa_label_reclaim_node *rn = container_of(node, struct aa_label_reclaim_node, node);
+
+	rn->inuse = false;
+}
+
+static void aa_put_all_reclaim_nodes(void)
+{
+	int i;
+
+	for (i = 0; i < AA_LABEL_RECLAIM_NODE_MAX; i++)
+		aa_label_reclaim_nodes[i].inuse = false;
+}
+
+static void aa_label_reclaim_work_fn(struct work_struct *work)
+{
+	struct llist_node *pos, *first, *head, *prev, *next;
+	struct llist_node *reclaim_node;
+	struct aa_label *label;
+	int cnt = 0;
+	bool held;
+
+	first = aa_label_reclaim_head.first;
+	if (!first)
+		goto queue_reclaim_work;
+
+	if (last_reclaim_label == NULL || last_reclaim_label->next == NULL) {
+		reclaim_node = aa_label_get_reclaim_node();
+		WARN_ONCE(!reclaim_node, "Reclaim heads exhausted\n");
+		if (unlikely(!reclaim_node)) {
+			head = first->next;
+			if (!head) {
+				aa_put_all_reclaim_nodes();
+				goto queue_reclaim_work;
+			}
+			prev = first;
+		} else {
+			llist_add(reclaim_node, &aa_label_reclaim_head);
+			prev = reclaim_node;
+			head = prev->next;
+		}
+	} else {
+		prev = last_reclaim_label;
+		head = prev->next;
+	}
+
+	last_reclaim_label = NULL;
+	llist_for_each_safe(pos, next, head) {
+		/* Free reclaim node, which is present in the list */
+		if (aa_label_is_reclaim_node(pos)) {
+			prev->next = pos->next;
+			aa_label_put_reclaim_node(pos);
+			continue;
+		}
+
+		label = container_of(pos, struct aa_label, reclaim_node);
+		percpu_ref_switch_to_atomic_sync(&label->count);
+		rcu_read_lock();
+		percpu_ref_put(&label->count);
+		held = percpu_ref_tryget(&label->count);
+		if (!held)
+			prev->next = pos->next;
+		rcu_read_unlock();
+		if (!held)
+			continue;
+		percpu_ref_switch_to_percpu(&label->count);
+		cnt++;
+		if (cnt == AA_MAX_LABEL_RECLAIMS) {
+			last_reclaim_label = pos;
+			break;
+		}
+		prev = pos;
+	}
+
+queue_reclaim_work:
+	queue_delayed_work(aa_label_reclaim_wq, &aa_label_reclaim_work,
+			msecs_to_jiffies(AA_LABEL_RECLAIM_INTERVAL_MS));
+}
+
 /*
  * LSM hook functions
  */
@@ -2277,6 +2414,14 @@ static int __init apparmor_init(void)
 		aa_free_root_ns();
 		goto buffers_out;
 	}
+
+	aa_label_reclaim_wq = alloc_workqueue("aa_label_reclaim",
+			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0);
+	WARN_ON(!aa_label_reclaim_wq);
+	if (aa_label_reclaim_wq)
+		queue_delayed_work(aa_label_reclaim_wq, &aa_label_reclaim_work,
+				   msecs_to_jiffies(AA_LABEL_RECLAIM_INTERVAL_MS));
+
 	security_add_hooks(apparmor_hooks, ARRAY_SIZE(apparmor_hooks),
 				&apparmor_lsmid);
 
From patchwork Wed Jan 10 11:18:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13516005 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2054.outbound.protection.outlook.com [40.107.223.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C49133AC1B; Wed, 10 Jan 2024 11:22:28
+0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="XhgXFvMT" ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Wb8VQJ7ghzzhy3jqXG+fRpKCQmEljT+fkQw4JOqfqcMxE6jD+oektfSYE+rb1K7374R80fTYeg4IAGbnd2A1VGSEdQGyWZ8mC/pzRfB+54TB41zyJNt8IyNdZJhP8qbxXHot7CVbEwcI4rLTW0AhEcdPMngOzgW4z+E6kUWqCOfkCRrfiwiUHsecA/Uw8WvJzs1sY4UbhC7+O3zzDzYBVrvP/OdgKljIyYR0B9H0MfKCRQp7EefYa1XY749vgOQVF8nwOZcgh8PbFoqwkm+zVkzelNDH98ClFf+vBqt6z45tJxnDQacB7bkoTIUFDCrHlOaNZCIuqLsOT16hxX+Eqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0rr1bPkkatKxkHaiM++UqjLMrfR19DsIFy2nmaWSHlo=; b=LEpmWi3HAf77LLR00WhSxvFzWykUnW9inXgo+LI8WvNbT137SVCdaJtVioSaYeG5kMB5tE+hCz5gRoj9vU6Ynb4zlSgXuYgC9cqLFnI2ICOAvxWHfd8dWoqy5duvg6b6JsGvuSJ70ILpTWBSZt40G3eBmCMiPhvCfYZ2Khh9OAiGnG8mU6Wq+rGVUuf0dDQKjVCybCPw/TxYb0RPujn+5eOwo4gJ+TdfTV/KFyHxgXo5W1H9Sh2ptA1biJKRaKly2bJrw1BAy2qcTp8ebP2v69R/YCvK2imD0JGsBWrBX5eSm3Qg8wuRjqM3j+UxI3rhsVIP6qqlb0ke1Mj3X7B7Cw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=canonical.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=0rr1bPkkatKxkHaiM++UqjLMrfR19DsIFy2nmaWSHlo=; 
b=XhgXFvMTfro+qzWKknwEr/vazmrW6MF1RTMv4rN1cOPmr/sxwBak2PI9Q5eI6BvJ4l6dG1ri8ZDzhuvKLmpyHhOp0PwKVfL9edWBlmVo+Bvme4V9xmsgLpmT9NMt996vTrjex89h8gkqpbGYp+YdgFWoo0vS78OWUEcyCoS0Qxc= Received: from CY5PR19CA0005.namprd19.prod.outlook.com (2603:10b6:930:15::27) by DM6PR12MB4878.namprd12.prod.outlook.com (2603:10b6:5:1b8::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.18; Wed, 10 Jan 2024 11:22:24 +0000 Received: from CY4PEPF0000EE37.namprd05.prod.outlook.com (2603:10b6:930:15:cafe::eb) by CY5PR19CA0005.outlook.office365.com (2603:10b6:930:15::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.18 via Frontend Transport; Wed, 10 Jan 2024 11:22:24 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EE37.mail.protection.outlook.com (10.167.242.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7181.14 via Frontend Transport; Wed, 10 Jan 2024 11:22:24 +0000 Received: from BLR-L-NUPADHYA.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Wed, 10 Jan 2024 05:22:19 -0600 From: Neeraj Upadhyay To: , , , CC: , , , , , , , , , , , Neeraj Upadhyay Subject: [RFC 5/9] apparmor: Switch intree labels to percpu mode Date: Wed, 10 Jan 2024 16:48:52 +0530 Message-ID: <20240110111856.87370-5-Neeraj.Upadhyay@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: 
linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY4PEPF0000EE37:EE_|DM6PR12MB4878:EE_ X-MS-Office365-Filtering-Correlation-Id: ece1ef0e-692e-47e8-8735-08dc11ce6b99 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: xIgFn61mU9HrI1F0+Il5rxpxkDzhX3MdDgvh7RBt4dkQdiK3CZWd9ZGBm1v9hscAMAZvEJFiHEZdoaSBMExMTruBunO/p7H859/tVEe3vud5JEvbnZDkddrcDOWmxRDPkcx1nTXJ4GIGAaC6TZbtgrGjcvxpwgzE01aBPoUp2yI21ajr3XbXRJf9Z8LebtWgtW//OxxlosRUCEhH7oTVN10MY30sNCICE1V/hKBB40TIKIZ3aTOfhEkD9olXEOXq5V2N8r91hAT7wSRrYJOeuIo7MCkVhFyxjFNHz3gC0Oycwv4dvv64q64CyR5vf2x185JpWg2rpEr13M2f4wjAO9lcswwPwsYLgKxcmHPSeJMtrT50z9yMDfMHSLrE7cuZ/6mZT2NSK8/dZlhK9IqAHkd3E6FNYtFGPStNfeqZEwv1ph7GBKMsirJyN3mKdm+ONMviZjECC+X/OCl+mLFxJ0rc8mjKopkbOWRK9optNd4iQpNGyZFgr9hlgE94b5Ukw1sxd6FfOAWWPJ1A0x9zDoC7SvwUMqaK/S0Rn/BH+gBJK5B8MYFVYsNkJfzmoTLF7i9mzEXc0RPOq/zk8/k5pD1paM2Pyp0NEhpe7vUyw5BpCgv894Q/4i5YT4Rx8c2Yju4XECir5fGDN2IMk/7Lfhfr+UdNXa3LRrFk9Xlh8tUimvYnZEOd+8a2RkOfSBFsjRyQBD4egLY2Rn9hPY+veDYLuhLrF0wg2VXJ2usSZTDNvNfq3aUk9w4MtxWAwwRs9frKAYtZlIxktbIkAOveFw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(136003)(396003)(376002)(346002)(39860400002)(230922051799003)(82310400011)(451199024)(186009)(1800799012)(64100799003)(46966006)(40470700004)(36840700001)(40480700001)(40460700003)(16526019)(478600001)(7696005)(426003)(336012)(6666004)(2616005)(1076003)(26005)(82740400003)(81166007)(86362001)(356005)(36756003)(41300700001)(70586007)(4326008)(83380400001)(36860700001)(5660300002)(4744005)(7416002)(2906002)(47076005)(8936002)(8676002)(110136005)(70206006)(54906003)(316002)(36900700001);DIR:OUT;SFP:1101; 
X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Jan 2024 11:22:24.4832 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: ece1ef0e-692e-47e8-8735-08dc11ce6b99 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000EE37.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4878 Now that we have an infrastructure to reclaim the percpu labels, switch intree labels to percpu mode, to improve the ref scalability. Signed-off-by: Neeraj Upadhyay --- security/apparmor/label.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/security/apparmor/label.c b/security/apparmor/label.c index aa9e6eac3ecc..1299262f54e1 100644 --- a/security/apparmor/label.c +++ b/security/apparmor/label.c @@ -710,6 +710,8 @@ static struct aa_label *__label_insert(struct aa_labelset *ls, rb_link_node(&label->node, parent, new); rb_insert_color(&label->node, &ls->root); label->flags |= FLAG_IN_TREE; + percpu_ref_switch_to_percpu(&label->count); + aa_label_reclaim_add_label(label); return aa_get_label(label); } From patchwork Wed Jan 10 11:18:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13516010 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2086.outbound.protection.outlook.com [40.107.244.86]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 73A1447F47; Wed, 10 Jan 2024 11:23:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=amd.com Received: from
CYXPR02CA0079.namprd02.prod.outlook.com (2603:10b6:930:ce::27) by SJ2PR12MB9087.namprd12.prod.outlook.com (2603:10b6:a03:562::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7159.23; Wed, 10 Jan 2024 11:23:18 +0000 Received: from CY4PEPF0000EE36.namprd05.prod.outlook.com (2603:10b6:930:ce:cafe::6a) by CYXPR02CA0079.outlook.office365.com (2603:10b6:930:ce::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.17 via Frontend Transport; Wed, 10 Jan 2024 11:23:18 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EE36.mail.protection.outlook.com (10.167.242.42) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7181.14 via Frontend Transport; Wed, 10 Jan 2024 11:23:18 +0000 Received: from BLR-L-NUPADHYA.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Wed, 10 Jan 2024 05:23:12 -0600 From: Neeraj Upadhyay To: , , , CC: , , , , , , , , , , , Neeraj Upadhyay Subject: [RFC 6/9] apparmor: Initial prototype for optimizing ref switch Date: Wed, 10 Jan 2024 16:48:53 +0530 Message-ID: <20240110111856.87370-6-Neeraj.Upadhyay@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) 
This patch adds a prototype for optimizing the atomic window during
label scan, by switching to an immortal percpu ref.

Below is the sequence of operations to do this:

1. Ensure that both the immortal ref and the label ref are in percpu
   mode. Reinit the immortal ref in percpu mode.

   Swap the percpu and atomic counters of the label refcount and the
   immortal ref.

                        percpu-ref
                   +-------------------+
   +-------+       | percpu-ctr-addr1  |
   | label |------>|-------------------|    +----------------+
   +-------+       | data              |--->| Atomic counter1|
                   +-------------------+    +----------------+
   +-------+       +-------------------+
   |ImmLbl |------>| percpu-ctr-addr2  |    +----------------+
   +-------+       |-------------------|--->| Atomic counter2|
                   | data              |    +----------------+
                   +-------------------+

   label  ->percpu-ctr-addr = percpu-ctr-addr2
   ImmLbl ->percpu-ctr-addr = percpu-ctr-addr1
   label  ->data->count     = Atomic counter2
   ImmLbl ->data->count     = Atomic counter1

2. Check the counters collected in the immortal label, by switching it
   to atomic mode.

3. If the count is 0:
   a. Switch the immortal counter to percpu again, giving it an
      initial count of 1.
   b. Swap the label and immortal counters again. The immortal ref now
      has the counter values from new percpu ref get and put
      operations on the label ref, from the point when we did the
      initial swap operation.
   c. Transfer the percpu counts in the immortal ref to the atomic
      counter of the label percpu refcount.
   d. Kill the immortal ref, for reinit on the next iteration.
   e. Switch the label percpu ref to atomic mode.
   f. If the counter is 1, drop the initial ref.
4. If the count is not 0, terminate the operations and re-swap the
   counters:
   a. Switch the immortal counter to percpu again, giving it an
      initial count of 1.
   b. Swap the label and immortal counters again. The immortal ref now
      has the counter values from new percpu ref get and put
      operations on the label ref, from the point when we did the
      initial swap operation.
   c. Transfer the percpu counts in the immortal ref to the atomic
      counter of the label percpu refcount.
   d. Kill the immortal ref, for reinit on the next iteration.

Using this approach, we ensure that label ref users do not switch to
atomic mode while there are active references on the label. However,
this approach requires multiple percpu ref mode switches, and it adds
high overhead and complexity to the scanning code.

Signed-off-by: Neeraj Upadhyay
---
 include/linux/percpu-refcount.h |   2 +
 lib/percpu-refcount.c           |  93 +++++++++++++++++++++++++++++
 security/apparmor/lsm.c         | 101 ++++++++++++++++++++++++++++----
 3 files changed, 185 insertions(+), 11 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..9e30c458cc00 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -131,6 +131,8 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 void percpu_ref_resurrect(struct percpu_ref *ref);
 void percpu_ref_reinit(struct percpu_ref *ref);
 bool percpu_ref_is_zero(struct percpu_ref *ref);
+void percpu_ref_swap_percpu_sync(struct percpu_ref *ref1, struct percpu_ref *ref2);
+void percpu_ref_transfer_percpu_count(struct percpu_ref *ref1, struct percpu_ref *ref2);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 668f6aa6a75d..36814446db34 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -477,3 +477,96 @@ void percpu_ref_resurrect(struct percpu_ref *ref)
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_resurrect);
+
+static void percpu_ref_swap_percpu_rcu(struct rcu_head *rcu)
+{
+	struct percpu_ref_data *data = container_of(rcu,
+			struct percpu_ref_data, rcu);
+	struct percpu_ref *ref = data->ref;
+
+	data->confirm_switch(ref);
+	data->confirm_switch = NULL;
+	wake_up_all(&percpu_ref_switch_waitq);
+
+}
+
+static void __percpu_ref_swap_percpu(struct percpu_ref *ref, percpu_ref_func_t *confirm_switch)
+{
+	ref->data->confirm_switch = confirm_switch ?:
+		percpu_ref_noop_confirm_switch;
+	call_rcu_hurry(&ref->data->rcu,
+		       percpu_ref_swap_percpu_rcu);
+}
+
+/**
+ * percpu_ref_swap_percpu_sync - Swap percpu counter of one ref with other
+ * @ref1: First percpu_ref to swap the counter
+ * @ref2: Second percpu_ref for counter swap
+ */
+void percpu_ref_swap_percpu_sync(struct percpu_ref *ref1, struct percpu_ref *ref2)
+{
+	unsigned long __percpu *percpu_count;
+	unsigned long flags;
+	struct percpu_ref_data *data1 = ref1->data;
+	struct percpu_ref_data *data2 = ref2->data;
+	unsigned long percpu_cnt_ptr1 = ref1->percpu_count_ptr;
+	unsigned long percpu_cnt_ptr2 = ref2->percpu_count_ptr;
+	atomic_long_t count1 = ref1->data->count;
+	atomic_long_t count2 = ref2->data->count;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+	wait_event_lock_irq(percpu_ref_switch_waitq,
+			    !data1->confirm_switch && !data2->confirm_switch,
+			    percpu_ref_switch_lock);
+	if (!__ref_is_percpu(ref1, &percpu_count) ||
+	    !__ref_is_percpu(ref2, &percpu_count)) {
+		spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+		return;
+	}
+	WRITE_ONCE(ref1->percpu_count_ptr, percpu_cnt_ptr2);
+	WRITE_ONCE(ref2->percpu_count_ptr, percpu_cnt_ptr1);
+
+	__percpu_ref_swap_percpu(ref1, NULL);
+	__percpu_ref_swap_percpu(ref2, NULL);
+	ref1->data->count = count2;
+	ref2->data->count = count1;
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+	wait_event(percpu_ref_switch_waitq, !ref1->data->confirm_switch &&
+		   !ref2->data->confirm_switch);
+}
+
+/**
+ * percpu_ref_transfer_percpu_count - Transfer percpu counts of one ref to other
+ * @ref1: percpu_ref to transfer the counters to
+ * @ref2: percpu_ref to transfer the counters from
+ *
+ * The per cpu counts of ref2 are transferred to the atomic counter of ref1.
+ * The ref2 is expected to be inactive.
+ */
+void percpu_ref_transfer_percpu_count(struct percpu_ref *ref1, struct percpu_ref *ref2)
+{
+	unsigned long __percpu *percpu_count = percpu_count_ptr(ref2);
+	struct percpu_ref_data *data1 = ref1->data;
+	struct percpu_ref_data *data2 = ref2->data;
+	unsigned long count = 0;
+	unsigned long flags;
+	int cpu;
+
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+	wait_event_lock_irq(percpu_ref_switch_waitq,
+			    !data1->confirm_switch && !data2->confirm_switch,
+			    percpu_ref_switch_lock);
+
+	if (!__ref_is_percpu(ref1, &percpu_count) ||
+	    !__ref_is_percpu(ref2, &percpu_count)) {
+		spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+		return;
+	}
+
+	for_each_possible_cpu(cpu) {
+		count += *per_cpu_ptr(percpu_count, cpu);
+		*per_cpu_ptr(percpu_count, cpu) = 0;
+	}
+	atomic_long_add((long)count, &ref1->data->count);
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+}
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index cf8429f5c88e..d0d4ebad1e26 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -92,6 +92,7 @@ static LLIST_HEAD(aa_label_reclaim_head);
 static struct llist_node *last_reclaim_label;
 static struct aa_label_reclaim_node aa_label_reclaim_nodes[AA_LABEL_RECLAIM_NODE_MAX];
 static DECLARE_DELAYED_WORK(aa_label_reclaim_work, aa_label_reclaim_work_fn);
+static struct percpu_ref aa_label_reclaim_ref;
 
 void aa_label_reclaim_add_label(struct aa_label *label)
 {
@@ -135,14 +136,18 @@ static void aa_put_all_reclaim_nodes(void)
 	for (i = 0; i < AA_LABEL_RECLAIM_NODE_MAX; i++)
 		aa_label_reclaim_nodes[i].inuse = false;
 }
+static void aa_release_reclaim_ref_noop(struct percpu_ref *ref)
+{
+}
 
 static void aa_label_reclaim_work_fn(struct work_struct *work)
 {
 	struct llist_node *pos, *first, *head, *prev, *next;
+	static bool reclaim_ref_dead_once;
 	struct llist_node *reclaim_node;
 	struct aa_label *label;
 	int cnt = 0;
-	bool held;
+	bool held, ref_is_zero;
 
 	first = aa_label_reclaim_head.first;
 	if (!first)
@@ -178,16 +183,72 @@ static void aa_label_reclaim_work_fn(struct work_struct *work)
 		}
 
 		label = container_of(pos, struct aa_label, reclaim_node);
-		percpu_ref_switch_to_atomic_sync(&label->count);
-		rcu_read_lock();
-		percpu_ref_put(&label->count);
-		held = percpu_ref_tryget(&label->count);
-		if (!held)
-			prev->next = pos->next;
-		rcu_read_unlock();
-		if (!held)
-			continue;
-		percpu_ref_switch_to_percpu(&label->count);
+		if (reclaim_ref_dead_once)
+			percpu_ref_reinit(&aa_label_reclaim_ref);
+
+		/*
+		 * Switch counters of label ref and reclaim ref.
+		 * Label's refcount becomes 1
+		 * Percpu refcount has the current refcount value
+		 * of the label percpu_ref.
+		 */
+		percpu_ref_swap_percpu_sync(&label->count, &aa_label_reclaim_ref);
+
+		/* Switch reclaim ref to atomic, to check for 0 */
+		percpu_ref_switch_to_atomic_sync(&aa_label_reclaim_ref);
+
+		/*
+		 * Release a count (original label percpu ref had an extra count,
+		 * from the llist addition).
+		 * When all percpu references have been released, this should
+		 * be the initial count, which gets dropped.
+		 */
+		percpu_ref_put(&aa_label_reclaim_ref);
+		/*
+		 * Release function of reclaim ref is noop; we store the result
+		 * for later processing after common code.
+		 */
+		if (percpu_ref_is_zero(&aa_label_reclaim_ref))
+			ref_is_zero = true;
+
+		/*
+		 * Restore back initial count. Switch reclaim ref to
+		 * percpu, for switching back the label percpu and
+		 * atomic counters.
+		 */
+		percpu_ref_get(&aa_label_reclaim_ref);
+		percpu_ref_switch_to_percpu(&aa_label_reclaim_ref);
+		/*
+		 * Swap the refs again. Label gets all old counts
+		 * in its atomic counter after this operation.
+		 */
+		percpu_ref_swap_percpu_sync(&label->count, &aa_label_reclaim_ref);
+
+		/*
+		 * Transfer the percpu counts, which got added, while this
+		 * switch was going on. The counters are accumulated into
+		 * the label ref's atomic counter.
+		 */
+		percpu_ref_transfer_percpu_count(&label->count, &aa_label_reclaim_ref);
+
+		/* Kill reclaim ref for reinitialization, for next iteration */
+		percpu_ref_kill(&aa_label_reclaim_ref);
+		reclaim_ref_dead_once = true;
+
+		/* If refcount of label ref was found to be 0, reclaim it now! */
+		if (ref_is_zero) {
+			percpu_ref_switch_to_atomic_sync(&label->count);
+			rcu_read_lock();
+			percpu_ref_put(&label->count);
+			held = percpu_ref_tryget(&label->count);
+			if (!held)
+				prev->next = pos->next;
+			rcu_read_unlock();
+			if (!held)
+				continue;
+			percpu_ref_switch_to_percpu(&label->count);
+		}
+
 		cnt++;
 		if (cnt == AA_MAX_LABEL_RECLAIMS) {
 			last_reclaim_label = pos;
@@ -2136,6 +2197,16 @@ static int __init set_init_ctx(void)
 	return 0;
 }
 
+static int __init clear_init_ctx(void)
+{
+	struct cred *cred = (__force struct cred *)current->real_cred;
+
+	set_cred_label(cred, NULL);
+	aa_put_label(ns_unconfined(root_ns));
+
+	return 0;
+}
+
 static void destroy_buffers(void)
 {
 	union aa_buffer *aa_buf;
@@ -2422,6 +2493,14 @@ static int __init apparmor_init(void)
 	queue_delayed_work(aa_label_reclaim_wq, &aa_label_reclaim_work,
 			   AA_LABEL_RECLAIM_INTERVAL_MS);
 
+	if (percpu_ref_init(&aa_label_reclaim_ref, aa_release_reclaim_ref_noop,
+			    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
+		AA_ERROR("Failed to allocate label reclaim percpu ref\n");
+		aa_free_root_ns();
+		clear_init_ctx();
+		goto buffers_out;
+	}
+
 	security_add_hooks(apparmor_hooks, ARRAY_SIZE(apparmor_hooks),
 				&apparmor_lsmid);

From patchwork Wed Jan 10 11:18:54 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13516011 Received: from NAM10-BN7-obe.outbound.protection.outlook.com
(mail-bn7nam10on2045.outbound.protection.outlook.com [40.107.92.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78F513AC1B; Wed, 10 Jan 2024 11:23:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=GKOfcYzr6aIMXZfzvl1jxcNmRnh0a8Ahz66pFK708H0=; b=vqrEIsEYvDdOG+Yf+mK+7jUZU+2lMAdry0NDyF1hc0Kw8jelYGbsTEU+Rr1i4o464CMOHq6Erpf82YKx5PRWoMlFa3A+5a65Px/gJT2wAA4FRMtVjuA5DGjfwuLNl7iDwCPreDuDXaDQaKQM78Ko6a88nbm0qCXljVbOu4VogyA= Received: from CY5PR15CA0119.namprd15.prod.outlook.com (2603:10b6:930:68::27) by SA1PR12MB8841.namprd12.prod.outlook.com (2603:10b6:806:376::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7159.23; Wed, 10 Jan 2024 11:23:51 +0000 Received: from CY4PEPF0000EE37.namprd05.prod.outlook.com (2603:10b6:930:68:cafe::91) by CY5PR15CA0119.outlook.office365.com (2603:10b6:930:68::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7181.17 via Frontend Transport; Wed, 10 Jan 2024 11:23:51 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EE37.mail.protection.outlook.com (10.167.242.43) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7181.14 via Frontend Transport; Wed, 10 Jan 2024 11:23:51 +0000 Received: from BLR-L-NUPADHYA.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.34; Wed, 10 Jan 2024 05:23:45 -0600 From: Neeraj Upadhyay To: , , , CC: , , , , , , , , , , , Neeraj Upadhyay Subject: [RFC 7/9] percpu-rcuref: Add basic infrastructure Date: Wed, 10 Jan 2024 
16:48:54 +0530 Message-ID: <20240110111856.87370-7-Neeraj.Upadhyay@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-security-module@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-Forefront-Antispam-Report:
Add infrastructure for managing reclaims of percpu refs which use an
RCU grace period before reclaiming the referenced objects.

The refcount management of a percpu rcuref is the same as that of a
normal percpu refcount, with the only difference that, instead of an
explicit shutdown percpu_ref_kill() operation by the user, the initial
ref is managed by a kworker.

The ref can be initialized to start either in managed or unmanaged
mode. In managed mode, the ref is a set of percpu counters. There is an
extra reference acquired for the llist node, which provides the notion
of the initial ref in percpu refcount.

During normal operation, users' get() and put() operations
increment/decrement the percpu counters.
There is no check for drop-to-zero while in percpu mode.

Periodically, the manager kworker thread scans all percpu rcurefs. It
switches each ref to centralized atomic counter mode and checks whether
the object has no references left. The ref is dropped if there are no
references. Otherwise, the ref is switched back to percpu mode again.
During this ref scan, there is a window where the ref operates in
atomic mode. This window spans one RCU grace period.

There is a provision to start a percpu rcuref in unmanaged mode. This
is provided for cases where there is a need to avoid a dependency on
the kworker and the RCU grace period. In addition, unmanaged mode can
be used for a ref for which the release function initially does not
wait for an RCU grace period, for example when the enclosing object
initialization fails and there is a rollback operation in error paths.
Later, when object initialization is complete, the ref can be switched
to percpu managed mode.

Signed-off-by: Neeraj Upadhyay
---
 .../admin-guide/kernel-parameters.txt |   8 +
 include/linux/percpu-rcurefcount.h    | 115 ++++++
 lib/Makefile                          |   2 +-
 lib/percpu-rcurefcount.c              | 336 ++++++++++++++++++
 4 files changed, 460 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/percpu-rcurefcount.h
 create mode 100644 lib/percpu-rcurefcount.c

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e0891ac76ab3..b2536c4223c1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4576,6 +4576,14 @@
 			allocator. This parameter is primarily for debugging
 			and performance comparison.
 
+	percpu-rcurefcount.ref_scan_interval= [KNL]
+			Interval (in ms) between two scans of managed
+			percpu rcurefs.
+
+	percpu-rcurefcount.max_ref_scan_count= [KNL]
+			Maximum number of percpu rcurefs scanned during
+			one scan of managed refs.
+
 	pirq=		[SMP,APIC] Manual mp-table setup
 			See Documentation/arch/x86/i386/IO-APIC.rst.
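Not part of the patch: a minimal user-space sketch of the managed-mode
scan described in the commit message above, assuming a single thread
and a fixed CPU count. All names here (sim_ref, sim_ref_scan,
NR_SIM_CPUS) are hypothetical and do not exist in the kernel. The real
percpu_ref additionally biases the atomic counter and relies on an RCU
grace period to make the mode switch safe against concurrent get/put;
none of that is modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SIM_CPUS 4	/* hypothetical CPU count, for the simulation only */

/* Simplified stand-in for a percpu ref; NOT the kernel's struct percpu_ref. */
struct sim_ref {
	long percpu[NR_SIM_CPUS];	/* per-CPU deltas; each may go negative */
	long atomic;			/* central counter, used in atomic mode */
	bool percpu_mode;
};

/* Managed init: one reference for the scanner, one for the user. */
static void sim_ref_init(struct sim_ref *ref)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		ref->percpu[cpu] = 0;
	ref->atomic = 2;
	ref->percpu_mode = true;
}

static void sim_ref_get(struct sim_ref *ref, int cpu)
{
	if (ref->percpu_mode)
		ref->percpu[cpu]++;
	else
		ref->atomic++;
}

static void sim_ref_put(struct sim_ref *ref, int cpu)
{
	if (ref->percpu_mode)
		ref->percpu[cpu]--;	/* no drop-to-zero check in percpu mode */
	else
		ref->atomic--;
}

/* Fold all per-CPU deltas into the central counter. */
static void sim_ref_switch_to_atomic(struct sim_ref *ref)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		ref->atomic += ref->percpu[cpu];
		ref->percpu[cpu] = 0;
	}
	ref->percpu_mode = false;
}

/*
 * One scan step: returns true if no user references remain, in which
 * case the object could be released. Otherwise the scanner's reference
 * is re-taken and the ref goes back to percpu mode.
 */
static bool sim_ref_scan(struct sim_ref *ref)
{
	sim_ref_switch_to_atomic(ref);
	ref->atomic--;			/* drop the scanner's reference */
	if (ref->atomic == 0)
		return true;
	ref->atomic++;			/* still in use: restore it */
	ref->percpu_mode = true;
	return false;
}
```

A get on one CPU followed by a put on another leaves non-zero per-CPU
deltas that only cancel out once they are summed, which is why the
zero check has to wait for the switch to atomic mode.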
diff --git a/include/linux/percpu-rcurefcount.h b/include/linux/percpu-rcurefcount.h
new file mode 100644
index 000000000000..6022aee1f76e
--- /dev/null
+++ b/include/linux/percpu-rcurefcount.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Percpu refcounts with RCU protected release operation.
+ *
+ * Percpu rcurefs are similar to percpu refs. However, they are specialized
+ * for use cases where the release of the object is protected by an RCU grace
+ * period.
+ *
+ * The initial ref is managed by the reclaim logic; so, users do not need to
+ * keep track of their initial ref. This is particularly useful when the
+ * object has references active beyond the release of the initial reference.
+ *
+ * The current implementation is just a wrapper around the percpu refcount
+ * implementation, to reuse the existing percpu and atomic ref switch
+ * management. Switching to a standalone implementation might be required
+ * if the percpu ref implementation switches to non-RCU managed read sections.
+ */
+
+#ifndef _LINUX_PERCPU_RCUREFCOUNT_H
+#define _LINUX_PERCPU_RCUREFCOUNT_H
+
+#include
+
+struct percpu_rcuref;
+
+struct percpu_rcuref {
+	struct percpu_ref pcpu_ref;
+	struct llist_node node;
+};
+
+int __must_check percpu_rcuref_init(struct percpu_rcuref *rcuref,
+				    percpu_ref_func_t *release, gfp_t gfp);
+int __must_check percpu_rcuref_init_unmanaged(struct percpu_rcuref *rcuref,
+					      percpu_ref_func_t *release, gfp_t gfp);
+int percpu_rcuref_manage(struct percpu_rcuref *rcuref);
+bool percpu_rcuref_is_zero(struct percpu_rcuref *rcuref);
+void percpu_rcuref_exit(struct percpu_rcuref *rcuref);
+
+/**
+ * percpu_rcuref_get_many - increment a percpu rcuref count
+ * @rcuref: percpu_rcuref to get
+ * @nr: number of references to get
+ *
+ * Analogous to percpu_ref_get_many().
+ */
+static inline void percpu_rcuref_get_many(struct percpu_rcuref *rcuref, unsigned long nr)
+{
+	percpu_ref_get_many(&rcuref->pcpu_ref, nr);
+}
+
+/**
+ * percpu_rcuref_get - increment a percpu rcuref count
+ * @rcuref: percpu_rcuref to get
+ *
+ * Analogous to percpu_ref_get().
+ *
+ */
+static inline void percpu_rcuref_get(struct percpu_rcuref *rcuref)
+{
+	percpu_rcuref_get_many(rcuref, 1);
+}
+
+/**
+ * percpu_rcuref_tryget_many - try to increment a percpu rcuref count
+ * @rcuref: percpu_rcuref to try-get
+ * @nr: number of references to get
+ *
+ * Increment a percpu rcuref count by @nr unless its count already reached zero.
+ * Returns %true on success; %false on failure.
+ *
+ */
+static inline bool percpu_rcuref_tryget_many(struct percpu_rcuref *rcuref,
+					     unsigned long nr)
+{
+	return percpu_ref_tryget_many(&rcuref->pcpu_ref, nr);
+}
+
+/**
+ * percpu_rcuref_tryget - try to increment a percpu rcuref count
+ * @rcuref: percpu_rcuref to try-get
+ *
+ * Increment a percpu rcurefcount unless its count already reached zero.
+ * Returns %true on success; %false on failure.
+ *
+ */
+static inline bool percpu_rcuref_tryget(struct percpu_rcuref *rcuref)
+{
+	return percpu_rcuref_tryget_many(rcuref, 1);
+}
+
+/**
+ * percpu_rcuref_put_many - decrement a percpu rcuref count
+ * @rcuref: percpu_rcuref to put
+ * @nr: number of references to put
+ *
+ * Decrement the refcount, and if 0, call the release function (which was passed
+ * to percpu_rcuref_init())
+ */
+static inline void percpu_rcuref_put_many(struct percpu_rcuref *rcuref, unsigned long nr)
+{
+	percpu_ref_put_many(&rcuref->pcpu_ref, nr);
+}
+
+/**
+ * percpu_rcuref_put - decrement a percpu rcuref count
+ * @rcuref: percpu_rcuref to put
+ *
+ * Decrement the refcount, and if 0, call the release function (which was passed
+ * to percpu_rcuref_init())
+ */
+static inline void percpu_rcuref_put(struct percpu_rcuref *rcuref)
+{
+	percpu_rcuref_put_many(rcuref, 1);
+}
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..11da2c586591 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -46,7 +46,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 list_sort.o uuid.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
-	 percpu-refcount.o rhashtable.o base64.o \
+	 percpu-refcount.o percpu-rcurefcount.o rhashtable.o base64.o \
 	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
 	 generic-radix-tree.o bitmap-str.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
diff --git a/lib/percpu-rcurefcount.c b/lib/percpu-rcurefcount.c
new file mode 100644
index 000000000000..d0f2d5e88f98
--- /dev/null
+++ b/lib/percpu-rcurefcount.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include
+#include
+
+static LLIST_HEAD(pcpu_rcuref_head);
+
+/*
+ * The refcount management of percpu rcuref is the same as
+ * normal percpu refcount, with the only difference that,
+ * instead of an explicit shutdown percpu_ref_kill() operation
+ * by the user, the initial ref is managed by a kworker.
+ * + * The ref can be initialized to start either in managed or + * unmanaged mode. In managed mode, the ref is a set of percpu + * counters. An extra reference is acquired for the llist node; + * it plays the role of the initial ref in percpu refcount. + * + * During normal operation, the users' ref get() and put() operations + * increment/decrement the percpu counters. There is no check + * for drop-to-zero while in percpu mode. + * + * Periodically, the manager kworker thread scans all percpu + * rcurefs. It switches ref to centralized atomic counter mode + * and checks whether the object has no references left. The ref is + * dropped if there are no references. Otherwise, the ref is switched + * back to percpu mode again. During this ref scan, there is a + * window where ref operates in atomic mode. This window spans + * one RCU grace period. + * + * There is a provision to start a percpu rcuref in unmanaged mode. + * This is provided for cases where there is a need to avoid a + * dependency on the kworker and an RCU grace period. In addition, + * unmanaged mode can be used for a ref for which the release + * function initially does not wait for an RCU grace period, for + * example when the enclosing object initialization fails, and + * there is a rollback operation in error paths. Later, when + * object initialization is complete, ref can be switched to + * percpu managed mode. + */ +/** + * percpu_rcuref_init - initialize a percpu rcuref count + * @rcuref: percpu_rcuref to initialize + * @release: function which will be called when refcount hits 0 + * @gfp: allocation mask to use + * + * Initializes @rcuref. @rcuref starts out in percpu mode with a refcount of 2. + * The initial ref is managed by the pcpu rcuref release worker kthread. + * The second reference is for the user. + * + * Note that @release must not sleep - it can block release of other + * pcpu rcurefs.
+ */ +int percpu_rcuref_init(struct percpu_rcuref *rcuref, percpu_ref_func_t *release, gfp_t gfp) +{ + int ret; + + ret = percpu_ref_init(&rcuref->pcpu_ref, release, + PERCPU_REF_ALLOW_REINIT, gfp); + if (ret) + return ret; + percpu_ref_get(&rcuref->pcpu_ref); + llist_add(&rcuref->node, &pcpu_rcuref_head); + return 0; +} +EXPORT_SYMBOL_GPL(percpu_rcuref_init); + +/** + * percpu_rcuref_init_unmanaged - initialize a percpu rcuref count in + * unmanaged (atomic) mode. + * @rcuref: percpu_rcuref to initialize + * @release: function which will be called when refcount hits 0 + * @gfp: allocation mask to use + * + * Initializes @rcuref. @rcuref starts out in unmanaged/atomic mode + * with a refcount of 1. + * The initial ref is passed to the user and ref management is + * automatic: the last put operation releases the ref. + * The ref may be initialized in this mode, to avoid dependency + * on workqueue and RCU, for early boot code; and for cases where + * a ref starts as non-RCU release and switches to RCU grace period + * based release of the reference. The percpu_rcuref_manage() call + * can be used to switch this ref to managed mode, while the ref + * is active. This operation is non-reversible, and the ref remains + * in managed mode, for its lifetime, until it is released by the + * pcpu release kworker. + * + * Note that @release must not sleep - if the ref is switched to + * managed mode, it can block release of other pcpu rcurefs. + */ +int percpu_rcuref_init_unmanaged(struct percpu_rcuref *rcuref, + percpu_ref_func_t *release, gfp_t gfp) +{ + int ret; + + ret = percpu_ref_init(&rcuref->pcpu_ref, release, PERCPU_REF_INIT_ATOMIC, gfp); + if (!ret) + init_llist_node(&rcuref->node); + return ret; +} +EXPORT_SYMBOL_GPL(percpu_rcuref_init_unmanaged); + +/** + * percpu_rcuref_manage - Switch an unmanaged ref to percpu mode.
+ * + * @rcuref: percpu_rcuref to switch to managed mode + * + * Returns 0 on success; a negative value if @rcuref is not active + * or is already managed. + */ +int percpu_rcuref_manage(struct percpu_rcuref *rcuref) +{ + if (WARN_ONCE(!percpu_rcuref_tryget(rcuref), "Percpu rcuref is not active\n")) + return -1; + if (WARN_ONCE(llist_on_list(&rcuref->node), "Percpu rcuref already managed\n")) { + percpu_rcuref_put(rcuref); + return -2; + } + percpu_ref_switch_to_percpu(&rcuref->pcpu_ref); + /* Ensure ordering of percpu mode switch and node scan */ + smp_mb(); + llist_add(&rcuref->node, &pcpu_rcuref_head); + return 0; +} +EXPORT_SYMBOL_GPL(percpu_rcuref_manage); + +/** + * percpu_rcuref_is_zero - test whether a percpu rcuref count reached zero + * @rcuref: percpu_rcuref to test + * + * Returns %true if @rcuref reached zero. + */ +bool percpu_rcuref_is_zero(struct percpu_rcuref *rcuref) +{ + return percpu_ref_is_zero(&rcuref->pcpu_ref); +} +EXPORT_SYMBOL_GPL(percpu_rcuref_is_zero); + +/** + * percpu_rcuref_exit - undo percpu_rcuref_init() + * @rcuref: percpu_rcuref to exit + * + * This function exits @rcuref. The caller is responsible for ensuring that + * @rcuref is no longer in active use. The usual places to invoke this + * function from are the @rcuref->release() callback or in init failure path + * where percpu_rcuref_init() succeeded but other parts of the initialization + * of the embedding object failed. + */ +void percpu_rcuref_exit(struct percpu_rcuref *rcuref) +{ + percpu_ref_exit(&rcuref->pcpu_ref); + init_llist_node(&rcuref->node); +} + +#define DEFAULT_PCPU_RCUREF_SCAN_INTERVAL_MS 5000 +/* Interval duration between two ref scans, in milliseconds. */ +static ulong ref_scan_interval = DEFAULT_PCPU_RCUREF_SCAN_INTERVAL_MS; +module_param(ref_scan_interval, ulong, 0444); + +#define DEFAULT_PCPU_RCUREF_MAX_SCAN_COUNT 100 +/* Number of pcpu refs scanned in one iteration of worker execution.
*/ +static int max_ref_scan_count = DEFAULT_PCPU_RCUREF_MAX_SCAN_COUNT; +module_param(max_ref_scan_count, int, 0444); + +static void percpu_rcuref_release_work_fn(struct work_struct *work); + +/* + * Sentinel llist nodes, for lockless list traversal and deletions by + * the pcpu rcuref release worker, while nodes are added + * from percpu_rcuref_init() and percpu_rcuref_manage(). + * + * Sentinel node marks the head of list traversal for the current + * iteration of kworker execution. + */ +struct pcpu_rcuref_sen_node { + bool inuse; + struct llist_node node; +}; + +/* + * We need two sentinel nodes for lockless list manipulations from release + * worker - first node will be used in current reclaim iteration. The second + * node will be used in next iteration. Next iteration marks the first node + * as free, for use in following iteration. + */ +#define PCPU_RCUREF_SEN_NODES_COUNT 2 + +/* Track last processed percpu rcuref node */ +static struct llist_node *last_pcu_rcuref_node; + +static struct pcpu_rcuref_sen_node + pcpu_rcuref_sen_nodes[PCPU_RCUREF_SEN_NODES_COUNT]; + +static DECLARE_DELAYED_WORK(percpu_rcuref_release_work, + percpu_rcuref_release_work_fn); + +static bool percpu_rcuref_is_sen_node(struct llist_node *node) +{ + return &pcpu_rcuref_sen_nodes[0].node <= node && + node <= &pcpu_rcuref_sen_nodes[PCPU_RCUREF_SEN_NODES_COUNT - 1].node; +} + +static struct llist_node *percpu_rcuref_get_sen_node(void) +{ + int i; + struct pcpu_rcuref_sen_node *sn; + + for (i = 0; i < PCPU_RCUREF_SEN_NODES_COUNT; i++) { + sn = &pcpu_rcuref_sen_nodes[i]; + if (!sn->inuse) { + sn->inuse = true; + return &sn->node; + } + } + + return NULL; +} + +static void percpu_rcuref_put_sen_node(struct llist_node *node) +{ + struct pcpu_rcuref_sen_node *sn = container_of(node, struct pcpu_rcuref_sen_node, node); + + sn->inuse = false; +} + +static void percpu_rcuref_put_all_sen_nodes_except(struct llist_node *node) +{ + int i; + + for (i = 0; i < PCPU_RCUREF_SEN_NODES_COUNT;
i++) { + if (&pcpu_rcuref_sen_nodes[i].node == node) + continue; + pcpu_rcuref_sen_nodes[i].inuse = false; + init_llist_node(&pcpu_rcuref_sen_nodes[i].node); + } +} + +static struct workqueue_struct *percpu_rcuref_wq; + +static void percpu_rcuref_release_work_fn(struct work_struct *work) +{ + struct llist_node *pos, *first, *head, *prev, *next; + struct percpu_rcuref *rcuref; + struct llist_node *sen_node; + int count = 0; + bool held; + + first = READ_ONCE(pcpu_rcuref_head.first); + if (!first) + goto queue_release_work; + + if (last_pcu_rcuref_node == NULL || last_pcu_rcuref_node->next == NULL) { +retry_sentinel_get: + sen_node = percpu_rcuref_get_sen_node(); + /* + * All sentinel nodes are in use? This should not happen, as we + * require only one sentinel for the start of list traversal and + * other sentinel node is freed during the traversal. + */ + if (WARN_ONCE(!sen_node, "Percpu RCU ref sentinel nodes exhausted\n")) { + /* Use first node as the sentinel node */ + head = first->next; + if (!head) { + struct llist_node *ign_node = NULL; + /* + * We exhausted sentinel nodes. However, there aren't + * enough nodes in the llist. So, we have leaked + * sentinel nodes. Reclaim sentinels and retry. 
+ */ + if (percpu_rcuref_is_sen_node(first)) + ign_node = first; + percpu_rcuref_put_all_sen_nodes_except(ign_node); + goto retry_sentinel_get; + } + prev = first; + } else { + llist_add(sen_node, &pcpu_rcuref_head); + prev = sen_node; + head = prev->next; + } + } else { + prev = last_pcu_rcuref_node; + head = prev->next; + } + + last_pcu_rcuref_node = NULL; + llist_for_each_safe(pos, next, head) { + /* Free sentinel node which is present in the list */ + if (percpu_rcuref_is_sen_node(pos)) { + prev->next = pos->next; + percpu_rcuref_put_sen_node(pos); + continue; + } + + rcuref = container_of(pos, struct percpu_rcuref, node); + percpu_ref_switch_to_atomic_sync(&rcuref->pcpu_ref); + /* + * Drop the ref while in RCU read-side critical section, to + * prevent obj free while we are manipulating the node. + */ + rcu_read_lock(); + percpu_ref_put(&rcuref->pcpu_ref); + held = percpu_ref_tryget(&rcuref->pcpu_ref); + if (!held) { + prev->next = pos->next; + init_llist_node(pos); + } + rcu_read_unlock(); + if (!held) + continue; + percpu_ref_switch_to_percpu(&rcuref->pcpu_ref); + count++; + if (count == max_ref_scan_count) { + last_pcu_rcuref_node = pos; + break; + } + prev = pos; + } + +queue_release_work: + queue_delayed_work(percpu_rcuref_wq, &percpu_rcuref_release_work, + msecs_to_jiffies(ref_scan_interval)); +} + +static __init int percpu_rcuref_setup(void) +{ + percpu_rcuref_wq = alloc_workqueue("percpu_rcuref", + WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0); + if (!percpu_rcuref_wq) + return -ENOMEM; + + queue_delayed_work(percpu_rcuref_wq, &percpu_rcuref_release_work, + msecs_to_jiffies(ref_scan_interval)); + return 0; +} +early_initcall(percpu_rcuref_setup); From patchwork Wed Jan 10 11:18:55 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Neeraj Upadhyay X-Patchwork-Id: 13516012 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2040.outbound.protection.outlook.com [40.107.94.40]) (using TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 816BC47F73; Wed, 10 Jan 2024 11:24:23 +0000 (UTC)
From: Neeraj Upadhyay
Subject: [RFC 8/9] apparmor: Switch labels to percpu rcurefcount in unmanaged mode
Date: Wed, 10 Jan 2024 16:48:55 +0530
Message-ID: <20240110111856.87370-8-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-security-module@vger.kernel.org

Replace the label kref with a percpu rcurefcount. The percpu rcuref is initialized in unmanaged/atomic mode, because labels that do not yet have a namespace associated with them are not released via an RCU grace period. A subsequent patch switches labels to managed/percpu mode at points where RCU grace period based cleanup is guaranteed.
Signed-off-by: Neeraj Upadhyay
---
 include/linux/percpu-refcount.h       |   2 -
 lib/percpu-refcount.c                 |  93 -----------
 security/apparmor/include/label.h     |  14 +-
 security/apparmor/include/policy.h    |  32 +---
 security/apparmor/include/policy_ns.h |  24 ---
 security/apparmor/label.c             |   8 +-
 security/apparmor/lsm.c               | 224 --------------------------
 security/apparmor/policy_ns.c         |   6 +-
 8 files changed, 15 insertions(+), 388 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h index 9e30c458cc00..d73a1c08c3e3 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -131,8 +131,6 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, void percpu_ref_resurrect(struct percpu_ref *ref); void percpu_ref_reinit(struct percpu_ref *ref); bool percpu_ref_is_zero(struct percpu_ref *ref); -void percpu_ref_swap_percpu_sync(struct percpu_ref *ref1, struct percpu_ref *ref2); -void percpu_ref_transfer_percpu_count(struct percpu_ref *ref1, struct percpu_ref *ref2); /** * percpu_ref_kill - drop the initial ref diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 36814446db34..668f6aa6a75d 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -477,96 +477,3 @@ void percpu_ref_resurrect(struct percpu_ref *ref) spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); } EXPORT_SYMBOL_GPL(percpu_ref_resurrect); - -static void percpu_ref_swap_percpu_rcu(struct rcu_head *rcu) -{ - struct percpu_ref_data *data = container_of(rcu, - struct percpu_ref_data, rcu); - struct percpu_ref *ref = data->ref; - - data->confirm_switch(ref); - data->confirm_switch = NULL; - wake_up_all(&percpu_ref_switch_waitq); - -} - -static void __percpu_ref_swap_percpu(struct percpu_ref *ref, percpu_ref_func_t *confirm_switch) -{ - ref->data->confirm_switch = confirm_switch ?: - percpu_ref_noop_confirm_switch; - call_rcu_hurry(&ref->data->rcu, - percpu_ref_swap_percpu_rcu); -} - -/** - * percpuref_swap_percpu_sync - Swap percpu counter
of one ref with other - * @ref1: First perpcu_ref to swap the counter - * @ref2: Second percpu_ref for counter swap - */ -void percpu_ref_swap_percpu_sync(struct percpu_ref *ref1, struct percpu_ref *ref2) -{ - unsigned long __percpu *percpu_count; - unsigned long flags; - struct percpu_ref_data *data1 = ref1->data; - struct percpu_ref_data *data2 = ref2->data; - unsigned long percpu_cnt_ptr1 = ref1->percpu_count_ptr; - unsigned long percpu_cnt_ptr2 = ref2->percpu_count_ptr; - atomic_long_t count1 = ref1->data->count; - atomic_long_t count2 = ref2->data->count; - - spin_lock_irqsave(&percpu_ref_switch_lock, flags); - wait_event_lock_irq(percpu_ref_switch_waitq, - !data1->confirm_switch && !data2->confirm_switch, - percpu_ref_switch_lock); - if (!__ref_is_percpu(ref1, &percpu_count) || - !__ref_is_percpu(ref2, &percpu_count)) { - spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); - return; - } - WRITE_ONCE(ref1->percpu_count_ptr, percpu_cnt_ptr2); - WRITE_ONCE(ref2->percpu_count_ptr, percpu_cnt_ptr1); - - __percpu_ref_swap_percpu(ref1, NULL); - __percpu_ref_swap_percpu(ref2, NULL); - ref1->data->count = count2; - ref2->data->count = count1; - spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); - wait_event(percpu_ref_switch_waitq, !ref1->data->confirm_switch && - !ref2->data->confirm_switch); -} - -/** - * percpu_ref_transfer_percpu_count - Transfer percpu counts of one ref to other - * @ref1: perpcu_ref to transfer the counters to - * @ref2: percpu_ref to transfer the counters from - * - * The per cpu counts of ref2 are transferred to the atomic counter of ref1. - * The ref2 is expected to be inactive. 
- */ -void percpu_ref_transfer_percpu_count(struct percpu_ref *ref1, struct percpu_ref *ref2) -{ - unsigned long __percpu *percpu_count = percpu_count_ptr(ref2); - struct percpu_ref_data *data1 = ref1->data; - struct percpu_ref_data *data2 = ref2->data; - unsigned long count = 0; - unsigned long flags; - int cpu; - - spin_lock_irqsave(&percpu_ref_switch_lock, flags); - wait_event_lock_irq(percpu_ref_switch_waitq, - !data1->confirm_switch && !data2->confirm_switch, - percpu_ref_switch_lock); - - if (!__ref_is_percpu(ref1, &percpu_count) || - !__ref_is_percpu(ref2, &percpu_count)) { - spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); - return; - } - - for_each_possible_cpu(cpu) { - count += *per_cpu_ptr(percpu_count, cpu); - *per_cpu_ptr(percpu_count, cpu) = 0; - } - atomic_long_add((long)count, &ref1->data->count); - spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); -} diff --git a/security/apparmor/include/label.h b/security/apparmor/include/label.h index 0fc4879930dd..3feb3a65a00c 100644 --- a/security/apparmor/include/label.h +++ b/security/apparmor/include/label.h @@ -14,6 +14,7 @@ #include #include #include +#include #include "apparmor.h" #include "lib.h" @@ -121,11 +122,10 @@ struct label_it { * @ent: set of profiles for label, actual size determined by @size */ struct aa_label { - struct percpu_ref count; + struct percpu_rcuref count; long flags; struct aa_proxy *proxy; struct rb_node node; - struct llist_node reclaim_node; struct rcu_head rcu; __counted char *hname; u32 secid; @@ -374,7 +374,7 @@ int aa_label_match(struct aa_profile *profile, struct aa_ruleset *rules, */ static inline struct aa_label *__aa_get_label(struct aa_label *l) { - if (l && percpu_ref_tryget(&l->count)) + if (l && percpu_rcuref_tryget(&l->count)) return l; return NULL; @@ -383,7 +383,7 @@ static inline struct aa_label *__aa_get_label(struct aa_label *l) static inline struct aa_label *aa_get_label(struct aa_label *l) { if (l) - percpu_ref_get(&(l->count)); + 
percpu_rcuref_get(&(l->count)); return l; } @@ -403,7 +403,7 @@ static inline struct aa_label *aa_get_label_rcu(struct aa_label __rcu **l) rcu_read_lock(); do { c = rcu_dereference(*l); - } while (c && !percpu_ref_tryget(&c->count)); + } while (c && !percpu_rcuref_tryget(&c->count)); rcu_read_unlock(); return c; @@ -443,7 +443,7 @@ static inline struct aa_label *aa_get_newest_label(struct aa_label *l) static inline void aa_put_label(struct aa_label *l) { if (l) - percpu_ref_put(&l->count); + percpu_rcuref_put(&l->count); } @@ -466,6 +466,4 @@ static inline void aa_put_proxy(struct aa_proxy *proxy) void __aa_proxy_redirect(struct aa_label *orig, struct aa_label *new); -void aa_label_reclaim_add_label(struct aa_label *label); - #endif /* __AA_LABEL_H */ diff --git a/security/apparmor/include/policy.h b/security/apparmor/include/policy.h index 1e3b29ba6c03..5b2473a09103 100644 --- a/security/apparmor/include/policy.h +++ b/security/apparmor/include/policy.h @@ -329,7 +329,7 @@ static inline aa_state_t ANY_RULE_MEDIATES(struct list_head *head, static inline struct aa_profile *aa_get_profile(struct aa_profile *p) { if (p) - percpu_ref_get(&(p->label.count)); + percpu_rcuref_get(&(p->label.count)); return p; } @@ -343,7 +343,7 @@ static inline struct aa_profile *aa_get_profile(struct aa_profile *p) */ static inline struct aa_profile *aa_get_profile_not0(struct aa_profile *p) { - if (p && percpu_ref_tryget(&p->label.count)) + if (p && percpu_rcuref_tryget(&p->label.count)) return p; return NULL; @@ -363,7 +363,7 @@ static inline struct aa_profile *aa_get_profile_rcu(struct aa_profile __rcu **p) rcu_read_lock(); do { c = rcu_dereference(*p); - } while (c && !percpu_ref_tryget(&c->label.count)); + } while (c && !percpu_rcuref_tryget(&c->label.count)); rcu_read_unlock(); return c; @@ -376,31 +376,7 @@ static inline struct aa_profile *aa_get_profile_rcu(struct aa_profile __rcu **p) static inline void aa_put_profile(struct aa_profile *p) { if (p) - 
percpu_ref_put(&p->label.count); -} - -/** - * aa_switch_ref_profile - switch percpu-ref mode for profile @p - * @p: profile (MAYBE NULL) - */ -static inline void aa_switch_ref_profile(struct aa_profile *p, bool percpu) -{ - if (p) { - if (percpu) - percpu_ref_switch_to_percpu(&p->label.count); - else - percpu_ref_switch_to_atomic_sync(&p->label.count); - } -} - -/** - * aa_kill_ref_profile - percpu-ref kill for profile @p - * @p: profile (MAYBE NULL) - */ -static inline void aa_kill_ref_profile(struct aa_profile *p) -{ - if (p) - percpu_ref_kill(&p->label.count); + percpu_rcuref_put(&p->label.count); } static inline int AUDIT_MODE(struct aa_profile *profile) diff --git a/security/apparmor/include/policy_ns.h b/security/apparmor/include/policy_ns.h index f3db01c5e193..d646070fd966 100644 --- a/security/apparmor/include/policy_ns.h +++ b/security/apparmor/include/policy_ns.h @@ -127,30 +127,6 @@ static inline void aa_put_ns(struct aa_ns *ns) aa_put_profile(ns->unconfined); } -/** - * aa_switch_ref_ns - switch percpu-ref mode for @ns - * @ns: namespace to switch percpu-ref mode of - * - * Switch percpu-ref mode of @ns between percpu and atomic - */ -static inline void aa_switch_ref_ns(struct aa_ns *ns, bool percpu) -{ - if (ns) - aa_switch_ref_profile(ns->unconfined, percpu); -} - -/** - * aa_kill_ref_ns - do percpu-ref kill for @ns - * @ns: namespace to perform percpu-ref kill for - * - * Do percpu-ref kill of @ns refcount - */ -static inline void aa_kill_ref_ns(struct aa_ns *ns) -{ - if (ns) - aa_kill_ref_profile(ns->unconfined); -} - /** * __aa_findn_ns - find a namespace on a list by @name * @head: list to search for namespace on (NOT NULL) diff --git a/security/apparmor/label.c b/security/apparmor/label.c index 1299262f54e1..f28dec1c3e70 100644 --- a/security/apparmor/label.c +++ b/security/apparmor/label.c @@ -336,7 +336,7 @@ void aa_label_destroy(struct aa_label *label) rcu_assign_pointer(label->proxy->label, NULL); aa_put_proxy(label->proxy); } - 
percpu_ref_exit(&label->count); + percpu_rcuref_exit(&label->count); aa_free_secid(label->secid); label->proxy = (struct aa_proxy *) PROXY_POISON + 1; @@ -372,7 +372,7 @@ static void label_free_rcu(struct rcu_head *head) void aa_label_percpu_ref(struct percpu_ref *ref) { - struct aa_label *label = container_of(ref, struct aa_label, count); + struct aa_label *label = container_of(ref, struct aa_label, count.pcpu_ref); struct aa_ns *ns = labels_ns(label); if (!ns) { @@ -409,7 +409,7 @@ bool aa_label_init(struct aa_label *label, int size, gfp_t gfp) label->size = size; /* doesn't include null */ label->vec[size] = NULL; /* null terminate */ - if (percpu_ref_init(&label->count, aa_label_percpu_ref, PERCPU_REF_INIT_ATOMIC, gfp)) { + if (percpu_rcuref_init_unmanaged(&label->count, aa_label_percpu_ref, gfp)) { aa_free_secid(label->secid); return false; } @@ -710,8 +710,6 @@ static struct aa_label *__label_insert(struct aa_labelset *ls, rb_link_node(&label->node, parent, new); rb_insert_color(&label->node, &ls->root); label->flags |= FLAG_IN_TREE; - percpu_ref_switch_to_percpu(&label->count); - aa_label_reclaim_add_label(label); return aa_get_label(label); } diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c index d0d4ebad1e26..e490a7000408 100644 --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -64,204 +64,6 @@ static LIST_HEAD(aa_global_buffers); static DEFINE_SPINLOCK(aa_buffers_lock); static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers); -static struct workqueue_struct *aa_label_reclaim_wq; -static void aa_label_reclaim_work_fn(struct work_struct *work); - -/* - * Dummy llist nodes, for lockless list traveral and deletions by - * the reclaim worker, while nodes are added from normal label - * insertion paths. 
- */
-struct aa_label_reclaim_node {
-	bool inuse;
-	struct llist_node node;
-};
-
-/*
- * We need two dummy head nodes for lockless list manipulations from reclaim
- * worker - first dummy node will be used in current reclaim iteration;
- * the second one will be used in next iteration. Next iteration marks
- * the first dummy node as free, for use in following iteration.
- */
-#define AA_LABEL_RECLAIM_NODE_MAX 2
-
-#define AA_MAX_LABEL_RECLAIMS 100
-#define AA_LABEL_RECLAIM_INTERVAL_MS 5000
-
-static LLIST_HEAD(aa_label_reclaim_head);
-static struct llist_node *last_reclaim_label;
-static struct aa_label_reclaim_node aa_label_reclaim_nodes[AA_LABEL_RECLAIM_NODE_MAX];
-static DECLARE_DELAYED_WORK(aa_label_reclaim_work, aa_label_reclaim_work_fn);
-static struct percpu_ref aa_label_reclaim_ref;
-
-void aa_label_reclaim_add_label(struct aa_label *label)
-{
-	percpu_ref_get(&label->count);
-	llist_add(&label->reclaim_node, &aa_label_reclaim_head);
-}
-
-static bool aa_label_is_reclaim_node(struct llist_node *node)
-{
-	return &aa_label_reclaim_nodes[0].node <= node &&
-		node <= &aa_label_reclaim_nodes[AA_LABEL_RECLAIM_NODE_MAX - 1].node;
-}
-
-static struct llist_node *aa_label_get_reclaim_node(void)
-{
-	int i;
-	struct aa_label_reclaim_node *rn;
-
-	for (i = 0; i < AA_LABEL_RECLAIM_NODE_MAX; i++) {
-		rn = &aa_label_reclaim_nodes[i];
-		if (!rn->inuse) {
-			rn->inuse = true;
-			return &rn->node;
-		}
-	}
-
-	return NULL;
-}
-
-static void aa_label_put_reclaim_node(struct llist_node *node)
-{
-	struct aa_label_reclaim_node *rn = container_of(node, struct aa_label_reclaim_node, node);
-
-	rn->inuse = false;
-}
-
-static void aa_put_all_reclaim_nodes(void)
-{
-	int i;
-
-	for (i = 0; i < AA_LABEL_RECLAIM_NODE_MAX; i++)
-		aa_label_reclaim_nodes[i].inuse = false;
-}
-static void aa_release_reclaim_ref_noop(struct percpu_ref *ref)
-{
-}
-
-static void aa_label_reclaim_work_fn(struct work_struct *work)
-{
-	struct llist_node *pos, *first, *head, *prev, *next;
-	static bool reclaim_ref_dead_once;
-	struct llist_node *reclaim_node;
-	struct aa_label *label;
-	int cnt = 0;
-	bool held, ref_is_zero;
-
-	first = aa_label_reclaim_head.first;
-	if (!first)
-		goto queue_reclaim_work;
-
-	if (last_reclaim_label == NULL || last_reclaim_label->next == NULL) {
-		reclaim_node = aa_label_get_reclaim_node();
-		WARN_ONCE(!reclaim_node, "Reclaim heads exhausted\n");
-		if (unlikely(!reclaim_node)) {
-			head = first->next;
-			if (!head) {
-				aa_put_all_reclaim_nodes();
-				goto queue_reclaim_work;
-			}
-			prev = first;
-		} else {
-			llist_add(reclaim_node, &aa_label_reclaim_head);
-			prev = reclaim_node;
-			head = prev->next;
-		}
-	} else {
-		prev = last_reclaim_label;
-		head = prev->next;
-	}
-
-	last_reclaim_label = NULL;
-	llist_for_each_safe(pos, next, head) {
-		/* Free reclaim node, which is present in the list */
-		if (aa_label_is_reclaim_node(pos)) {
-			prev->next = pos->next;
-			aa_label_put_reclaim_node(pos);
-			continue;
-		}
-
-		label = container_of(pos, struct aa_label, reclaim_node);
-		if (reclaim_ref_dead_once)
-			percpu_ref_reinit(&aa_label_reclaim_ref);
-
-		/*
-		 * Switch counters of label ref and reclaim ref.
-		 * Label's refcount becomes 1
-		 * Percpu refcount has the current refcount value
-		 * of the label percpu_ref.
-		 */
-		percpu_ref_swap_percpu_sync(&label->count, &aa_label_reclaim_ref);
-
-		/* Switch reclaim ref to percpu, to check for 0 */
-		percpu_ref_switch_to_atomic_sync(&aa_label_reclaim_ref);
-
-		/*
-		 * Release a count (original label percpu ref had an extra count,
-		 * from the llist addition).
-		 * When all percpu references have been released, this should
-		 * be the initial count, which gets dropped.
-		 */
-		percpu_ref_put(&aa_label_reclaim_ref);
-		/*
-		 * Release function of reclaim ref is noop; we store the result
-		 * for later processing after common code.
-		 */
-		if (percpu_ref_is_zero(&aa_label_reclaim_ref))
-			ref_is_zero = true;
-
-		/*
-		 * Restore back initial count. Switch reclaim ref to
-		 * percpu, for switching back the label percpu and
-		 * atomic counters.
-		 */
-		percpu_ref_get(&aa_label_reclaim_ref);
-		percpu_ref_switch_to_percpu(&aa_label_reclaim_ref);
-		/*
-		 * Swap the refs again. Label gets all old counts
-		 * in its atomic counter after this operation.
-		 */
-		percpu_ref_swap_percpu_sync(&label->count, &aa_label_reclaim_ref);
-
-		/*
-		 * Transfer the percpu counts, which got added, while this
-		 * switch was going on. The counters are accumulated into
-		 * the label ref's atomic counter.
-		 */
-		percpu_ref_transfer_percpu_count(&label->count, &aa_label_reclaim_ref);
-
-		/* Kill reclaim ref for reinitialization, for next iteration */
-		percpu_ref_kill(&aa_label_reclaim_ref);
-		reclaim_ref_dead_once = true;
-
-		/* If refcount of label ref was found to be 0, reclaim it now! */
-		if (ref_is_zero) {
-			percpu_ref_switch_to_atomic_sync(&label->count);
-			rcu_read_lock();
-			percpu_ref_put(&label->count);
-			held = percpu_ref_tryget(&label->count);
-			if (!held)
-				prev->next = pos->next;
-			rcu_read_unlock();
-			if (!held)
-				continue;
-			percpu_ref_switch_to_percpu(&label->count);
-		}
-
-		cnt++;
-		if (cnt == AA_MAX_LABEL_RECLAIMS) {
-			last_reclaim_label = pos;
-			break;
-		}
-		prev = pos;
-	}
-
-queue_reclaim_work:
-	queue_delayed_work(aa_label_reclaim_wq, &aa_label_reclaim_work,
-		msecs_to_jiffies(AA_LABEL_RECLAIM_INTERVAL_MS));
-}
-
 /*
  * LSM hook functions
  */
@@ -2197,16 +1999,6 @@ static int __init set_init_ctx(void)
 	return 0;
 }
 
-static int __init clear_init_ctx(void)
-{
-	struct cred *cred = (__force struct cred *)current->real_cred;
-
-	set_cred_label(cred, NULL);
-	aa_put_label(ns_unconfined(root_ns));
-
-	return 0;
-}
-
 static void destroy_buffers(void)
 {
 	union aa_buffer *aa_buf;
@@ -2485,22 +2277,6 @@ static int __init apparmor_init(void)
 		aa_free_root_ns();
 		goto buffers_out;
 	}
-
-	aa_label_reclaim_wq = alloc_workqueue("aa_label_reclaim",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_FREEZABLE, 0);
-	WARN_ON(!aa_label_reclaim_wq);
-	if (aa_label_reclaim_wq)
-		queue_delayed_work(aa_label_reclaim_wq, &aa_label_reclaim_work,
-				   AA_LABEL_RECLAIM_INTERVAL_MS);
-
-	if (!percpu_ref_init(&aa_label_reclaim_ref, aa_release_reclaim_ref_noop,
-			     PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
-		AA_ERROR("Failed to allocate label reclaim percpu ref\n");
-		aa_free_root_ns();
-		clear_init_ctx();
-		goto buffers_out;
-	}
-
 	security_add_hooks(apparmor_hooks, ARRAY_SIZE(apparmor_hooks),
 				&apparmor_lsmid);
 
diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c
index ca633cfbd936..1f02cfe1d974 100644
--- a/security/apparmor/policy_ns.c
+++ b/security/apparmor/policy_ns.c
@@ -124,7 +124,6 @@ static struct aa_ns *alloc_ns(const char *prefix, const char *name)
 		goto fail_unconfined;
 	/* ns and ns->unconfined share ns->unconfined refcount */
 	ns->unconfined->ns = ns;
-	aa_switch_ref_ns(ns, true);
 
 	atomic_set(&ns->uniq_null, 0);
 
@@ -337,7 +336,7 @@ void __aa_remove_ns(struct aa_ns *ns)
 	/* remove ns from namespace list */
 	list_del_rcu(&ns->base.list);
 	destroy_ns(ns);
-	aa_kill_ref_ns(ns);
+	aa_put_ns(ns);
 }
 
 /**
@@ -378,7 +377,6 @@ int __init aa_alloc_root_ns(void)
 	}
 	kernel_t = &kernel_p->label;
 	root_ns->unconfined->ns = aa_get_ns(root_ns);
-	aa_switch_ref_ns(root_ns, true);
 
 	return 0;
 }
@@ -394,5 +392,5 @@ void __init aa_free_root_ns(void)
 
 	aa_label_free(kernel_t);
 	destroy_ns(ns);
-	aa_kill_ref_ns(ns);
+	aa_put_ns(ns);
 }

From patchwork Wed Jan 10 11:18:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neeraj Upadhyay
X-Patchwork-Id: 13516013
From: Neeraj Upadhyay
Subject: [RFC 9/9] apparmor: Switch unconfined and in tree labels to managed ref mode
Date: Wed, 10 Jan 2024 16:48:56 +0530
Message-ID: <20240110111856.87370-9-Neeraj.Upadhyay@amd.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-security-module@vger.kernel.org

Switch unconfined and in-tree labels to percpu managed mode of percpu
rcuref. This helps avoid memory contention in ref get and put operations.

Signed-off-by: Neeraj Upadhyay
---
 security/apparmor/label.c     | 1 +
 security/apparmor/policy_ns.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/security/apparmor/label.c b/security/apparmor/label.c
index f28dec1c3e70..57fcd5b3e48a 100644
--- a/security/apparmor/label.c
+++ b/security/apparmor/label.c
@@ -710,6 +710,7 @@ static struct aa_label *__label_insert(struct aa_labelset *ls,
 	rb_link_node(&label->node, parent, new);
 	rb_insert_color(&label->node, &ls->root);
 	label->flags |= FLAG_IN_TREE;
+	percpu_rcuref_manage(&label->count);
 
 	return aa_get_label(label);
 }
diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c
index 1f02cfe1d974..ff261b119c53 100644
--- a/security/apparmor/policy_ns.c
+++ b/security/apparmor/policy_ns.c
@@ -124,6 +124,7 @@ static struct aa_ns *alloc_ns(const char *prefix, const char *name)
 		goto fail_unconfined;
 	/* ns and ns->unconfined share ns->unconfined refcount */
 	ns->unconfined->ns = ns;
+	percpu_rcuref_manage(&ns->unconfined->label.count);
 
 	atomic_set(&ns->uniq_null, 0);
 
@@ -377,6 +378,7 @@ int __init aa_alloc_root_ns(void)
 	}
 	kernel_t = &kernel_p->label;
 	root_ns->unconfined->ns = aa_get_ns(root_ns);
+	percpu_rcuref_manage(&root_ns->unconfined->label.count);
 
 	return 0;
 }
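
Illustrative note (not part of the patch): the commit message above says that
a percpu managed mode avoids contention on ref get and put. The userspace
sketch below is one way to picture that idea. It is not the kernel's
percpu_ref code and not the percpu_rcuref API proposed in this series; the
names sharded_ref, sref_get, sref_put and sref_switch_to_atomic are
hypothetical, NSHARDS stands in for the number of CPUs, and the mode switch is
single-threaded here, whereas real kernel code would have to wait for
concurrent users before folding the counters.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NSHARDS 4	/* hypothetical stand-in for the number of CPUs */

struct sharded_ref {
	atomic_long central;	/* authoritative count once in atomic mode */
	long shard[NSHARDS];	/* per-"CPU" deltas while in sharded mode */
	bool atomic_mode;	/* false: sharded mode, true: atomic mode */
};

static void sref_init(struct sharded_ref *r)
{
	atomic_init(&r->central, 1);	/* initial reference held by the owner */
	for (int i = 0; i < NSHARDS; i++)
		r->shard[i] = 0;
	r->atomic_mode = false;
}

/* Take a reference; in sharded mode only the caller's own slot is touched. */
static void sref_get(struct sharded_ref *r, int cpu)
{
	if (r->atomic_mode)
		atomic_fetch_add(&r->central, 1);
	else
		r->shard[cpu]++;
}

/*
 * Drop a reference. Returns true only when the last reference is dropped,
 * which can be detected only in atomic mode. In sharded mode a slot may go
 * negative, because a put can run on a different "CPU" than the get.
 */
static bool sref_put(struct sharded_ref *r, int cpu)
{
	if (r->atomic_mode)
		return atomic_fetch_sub(&r->central, 1) == 1;
	r->shard[cpu]--;
	return false;
}

/* Fold all per-shard deltas into the central counter and leave sharded mode. */
static void sref_switch_to_atomic(struct sharded_ref *r)
{
	long sum = 0;

	for (int i = 0; i < NSHARDS; i++) {
		sum += r->shard[i];
		r->shard[i] = 0;
	}
	atomic_fetch_add(&r->central, sum);
	r->atomic_mode = true;
}
```

In sharded mode, gets and puts from different "CPUs" never write the same
counter, which is the contention-avoidance property the commit message refers
to. The cost is that reaching zero cannot be observed until the deltas have
been folded into the central counter.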