From patchwork Thu May 28 13:54:35 2020
X-Patchwork-Submitter: Dan Schatzberg
X-Patchwork-Id: 11575877
From: Dan Schatzberg
Cc: Dan Schatzberg, Jens Axboe, Alexander Viro, Jan Kara,
 Amir Goldstein, Tejun Heo, Li Zefan, Johannes Weiner, Michal Hocko,
 Vladimir Davydov, Andrew Morton, Hugh Dickins, Roman Gushchin,
 Shakeel Butt, Chris Down, Yang Shi, Thomas Gleixner,
 "Peter Zijlstra (Intel)", Ingo Molnar, Mathieu Desnoyers,
 Andrea Arcangeli,
 linux-block@vger.kernel.org (open list:BLOCK LAYER),
 linux-kernel@vger.kernel.org (open list),
 linux-fsdevel@vger.kernel.org (open list:FILESYSTEMS (VFS and infrastructure)),
 cgroups@vger.kernel.org (open list:CONTROL GROUP (CGROUP)),
 linux-mm@kvack.org (open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG))
Subject: [PATCH v6 0/4] Charge loop device i/o to issuing cgroup
Date: Thu, 28 May 2020 09:54:35 -0400
Message-Id: <20200528135444.11508-1-schatzberg.dan@gmail.com>
X-Mailer: git-send-email 2.21.3

Much of the discussion about this has died down. There's been a
concern raised that we could generalize infrastructure across loop,
md, etc. This may be possible in the future, but it isn't clear to me
what this would look like. I'm inclined to fix the existing issue with
loop devices now (this is a problem we hit at FB) and address
consolidation with other cases if and when those need to be addressed.

Changes since V6:

* Added separate spinlock for worker synchronization
* Minor style changes

Changes since V5:

* Fixed a missing css_put when failing to allocate a worker
* Minor style changes

Changes since V4:

Only patches 1 and 2 have changed.

* Fixed irq lock ordering bug
* Simplified loop detach
* Added support for nesting memalloc_use_memcg

Changes since V3:

* Fix race on loop device destruction and deferred worker cleanup
* Ensure charge on shmem_swapin_page works just like getpage
* Minor style changes

Changes since V2:

* Deferred destruction of workqueue items so in the common case there
  is no allocation needed

Changes since V1:

* Split out and reordered patches so cgroup charging changes are
  separate from kworker -> workqueue change
* Add mem_css to struct loop_cmd to simplify logic

The loop device runs all i/o to the backing file on a separate kworker
thread, which results in all i/o being charged to the root cgroup. This
allows a loop device to be used to trivially bypass resource limits
and other policy. This patch series fixes this gap in accounting.

A simple script to demonstrate this behavior on a cgroupv2 machine:

'''
#!/bin/bash
set -e

CGROUP=/sys/fs/cgroup/test.slice
LOOP_DEV=/dev/loop0

if [[ ! -d $CGROUP ]]
then
    sudo mkdir $CGROUP
fi

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit to tmpfs -> OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
dd if=/dev/zero of=/tmp/file bs=1M count=256" || true

grep oom_kill $CGROUP/memory.events

# Set a memory limit, write more than that limit through loopback
# device -> no OOM kill
sudo unshare -m bash -c "
echo \$\$ > $CGROUP/cgroup.procs;
echo 0 > $CGROUP/memory.swap.max;
echo 64M > $CGROUP/memory.max;
mount -t tmpfs -o size=512m tmpfs /tmp;
truncate -s 512m /tmp/backing_file
losetup $LOOP_DEV /tmp/backing_file
dd if=/dev/zero of=$LOOP_DEV bs=1M count=256;
losetup -D $LOOP_DEV" || true

grep oom_kill $CGROUP/memory.events
'''

Naively charging cgroups could result in priority inversions through
the single kworker thread in the case where multiple cgroups are
reading/writing to the same loop device. This patch series makes some
minor modifications to the loop driver so that each cgroup can make
forward progress independently, avoiding this inversion.

With this patch series applied, the above script triggers OOM kills
when writing through the loop device, as expected.

Dan Schatzberg (3):
  loop: Use worker per cgroup instead of kworker
  mm: Charge active memcg when no mm is set
  loop: Charge i/o to mem and blk cg

Johannes Weiner (1):
  mm: support nesting memalloc_use_memcg()

 drivers/block/loop.c                 | 244 ++++++++++++++++++++++-----
 drivers/block/loop.h                 |  15 +-
 fs/buffer.c                          |   6 +-
 fs/notify/fanotify/fanotify.c        |   5 +-
 fs/notify/inotify/inotify_fsnotify.c |   5 +-
 include/linux/memcontrol.h           |   6 +
 include/linux/sched/mm.h             |  28 +--
 kernel/cgroup/cgroup.c               |   1 +
 mm/memcontrol.c                      |  11 +-
 mm/shmem.c                           |   4 +-
 10 files changed, 246 insertions(+), 79 deletions(-)
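
For readers unfamiliar with the inversion mentioned in the cover letter,
here is a minimal user-space sketch of the per-cgroup worker idea. It is
only an illustration and is not the code in drivers/block/loop.c: every
name in it (struct worker, struct loop_dev, queue_cmd, run_workers,
demo) is hypothetical, and the "blocked" flag is an assumed stand-in for
a cgroup whose i/o is stalled by throttling or reclaim. With one FIFO
per cgroup, a stalled cgroup holds back only its own commands.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORKERS 8
#define MAX_CMDS    16

/* Hypothetical per-cgroup worker: one FIFO of command ids per cgroup. */
struct worker {
    int cgroup_id;
    int cmds[MAX_CMDS];
    int head, tail;
    int blocked;            /* assumed stand-in for a throttled cgroup */
};

/* Hypothetical loop device holding one worker per issuing cgroup. */
struct loop_dev {
    struct worker workers[MAX_WORKERS];
    int nr_workers;
};

/* Look up the worker for a cgroup, creating it on first use. */
static struct worker *find_or_create_worker(struct loop_dev *lo, int cgroup_id)
{
    struct worker *w;
    int i;

    for (i = 0; i < lo->nr_workers; i++)
        if (lo->workers[i].cgroup_id == cgroup_id)
            return &lo->workers[i];
    if (lo->nr_workers == MAX_WORKERS)
        return NULL;
    w = &lo->workers[lo->nr_workers++];
    memset(w, 0, sizeof(*w));
    w->cgroup_id = cgroup_id;
    return w;
}

/* Queue a command on the FIFO of the cgroup that issued it. */
static int queue_cmd(struct loop_dev *lo, int cgroup_id, int cmd)
{
    struct worker *w;

    assert(lo != NULL);
    w = find_or_create_worker(lo, cgroup_id);
    if (!w || w->tail == MAX_CMDS)
        return -1;
    w->cmds[w->tail++] = cmd;
    return 0;
}

/* One pass over all workers: blocked workers are skipped, the rest drain. */
static int run_workers(struct loop_dev *lo, int *completed, int max)
{
    int i, n = 0;

    for (i = 0; i < lo->nr_workers; i++) {
        struct worker *w = &lo->workers[i];

        if (w->blocked)
            continue;
        while (w->head < w->tail && n < max)
            completed[n++] = w->cmds[w->head++];
    }
    return n;
}

/* Two cgroups share one device; cgroup 1 is stalled, cgroup 2 is not. */
int demo(void)
{
    struct loop_dev lo = {0};
    int completed[MAX_CMDS];

    queue_cmd(&lo, 1, 100);
    queue_cmd(&lo, 2, 200);
    queue_cmd(&lo, 1, 101);
    queue_cmd(&lo, 2, 201);
    find_or_create_worker(&lo, 1)->blocked = 1;

    /* Only cgroup 2's commands complete while cgroup 1 is stalled. */
    return run_workers(&lo, completed, MAX_CMDS);
}
```

In this sketch demo() completes both of cgroup 2's commands even though
cgroup 1 queued first and is stalled. With a single shared FIFO, command
100 would sit at the head of the queue and hold up everything behind it.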