From patchwork Tue Jan 7 12:03:52 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928772
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue'
Date: Tue, 7 Jan 2025 20:03:52 +0800
Message-ID: <20250107120417.1237392-2-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Remove two unused fields (`io_addr` and `max_io_sz`) from `struct ublk_queue`.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 934ab9332c80..77ce3231eba4 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -143,8 +143,6 @@ struct ublk_queue {
 	struct llist_head io_cmds;
-	unsigned long io_addr;	/* mapped vm address */
-	unsigned int max_io_sz;
 	bool force_abort;
 	bool timeout;
 	bool canceling;

From patchwork Tue Jan 7 12:03:53 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928773
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 02/22] ublk: convert several bool type fields into bitfield of `ublk_queue`
Date: Tue, 7 Jan 2025 20:03:53 +0800
Message-ID: <20250107120417.1237392-3-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC
Convert several `bool` fields of `struct ublk_queue` into single-bit bitfields, so that one hole of padding is removed and 4 bytes are saved in `struct ublk_queue`.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 77ce3231eba4..00363e8affc6 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -143,10 +143,10 @@ struct ublk_queue {
 	struct llist_head io_cmds;
-	bool force_abort;
-	bool timeout;
-	bool canceling;
-	bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
+	unsigned short force_abort:1;
+	unsigned short timeout:1;
+	unsigned short canceling:1;
+	unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
 	unsigned short nr_io_ready;	/* how many ios setup */
 	spinlock_t cancel_lock;
 	struct ublk_device *dev;
@@ -1257,7 +1257,7 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
 	if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
 		if (!ubq->timeout) {
 			send_sig(SIGKILL, ubq->ubq_daemon, 0);
-			ubq->timeout = true;
+			ubq->timeout = 1;
 		}
 
 		return BLK_EH_DONE;
@@ -1459,7 +1459,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
 		spin_unlock(&ubq->cancel_lock);
 		return false;
 	}
-	ubq->canceling = true;
+	ubq->canceling = 1;
 	spin_unlock(&ubq->cancel_lock);
 
 	spin_lock(&ub->lock);
@@ -1609,7 +1609,7 @@ static void ublk_unquiesce_dev(struct ublk_device *ub)
 	 * can move on.
 	 */
 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
-		ublk_get_queue(ub, i)->force_abort = true;
+		ublk_get_queue(ub, i)->force_abort = 1;
 
 	blk_mq_unquiesce_queue(ub->ub_disk->queue);
 	/* We may have requeued some rqs in ublk_quiesce_queue() */
@@ -1672,7 +1672,7 @@ static void ublk_nosrv_work(struct work_struct *work)
 		blk_mq_quiesce_queue(ub->ub_disk->queue);
 		ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
 		for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-			ublk_get_queue(ub, i)->fail_io = true;
+			ublk_get_queue(ub, i)->fail_io = 1;
 		}
 		blk_mq_unquiesce_queue(ub->ub_disk->queue);
 	}
@@ -2744,8 +2744,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
 	put_task_struct(ubq->ubq_daemon);
 	/* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
 	ubq->ubq_daemon = NULL;
-	ubq->timeout = false;
-	ubq->canceling = false;
+	ubq->timeout = 0;
+	ubq->canceling = 0;
 
 	for (i = 0; i < ubq->q_depth; i++) {
 		struct ublk_io *io = &ubq->ios[i];
@@ -2844,7 +2844,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 		blk_mq_quiesce_queue(ub->ub_disk->queue);
 		ub->dev_info.state = UBLK_S_DEV_LIVE;
 		for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
-			ublk_get_queue(ub, i)->fail_io = false;
+			ublk_get_queue(ub, i)->fail_io = 0;
 		}
 		blk_mq_unquiesce_queue(ub->ub_disk->queue);
 	}
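A small standalone illustration of the layout change (not part of the series; the field names mirror the patch, everything else is invented for demonstration): four one-byte bool flags plus the padding in front of the following unsigned short collapse into a single 16-bit word once they become 1-bit fields.

#include <stdio.h>
#include <stdbool.h>

/* before: four bools occupy 4 bytes, then padding before the short */
struct q_flags_bool {
	bool force_abort;
	bool timeout;
	bool canceling;
	bool fail_io;
	unsigned short nr_io_ready;
};

/* after: the four flags share one unsigned short as 1-bit fields */
struct q_flags_bits {
	unsigned short force_abort:1;
	unsigned short timeout:1;
	unsigned short canceling:1;
	unsigned short fail_io:1;
	unsigned short nr_io_ready;
};

int main(void)
{
	/* on common ABIs this prints "6 4" */
	printf("%zu %zu\n", sizeof(struct q_flags_bool),
	       sizeof(struct q_flags_bits));
	return 0;
}

In the real `struct ublk_queue` the surrounding members differ, so the exact saving is the 4 bytes quoted in the commit message, but the mechanism is the same.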
From patchwork Tue Jan 7 12:03:54 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928774
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 03/22] ublk: add helper of ublk_need_map_io()
Date: Tue, 7 Jan 2025 20:03:54 +0800
Message-ID: <20250107120417.1237392-4-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

ublk_need_map_io() is more readable, and it can also cover the coming UBLK_BPF case.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 00363e8affc6..1a63a1aa99ed 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -582,6 +582,11 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 	return ubq->flags & UBLK_F_USER_COPY;
 }
 
+static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
+{
+	return !ublk_support_user_copy(ubq);
+}
+
 static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
 {
 	/*
@@ -909,7 +914,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
 
-	if (ublk_support_user_copy(ubq))
+	if (!ublk_need_map_io(ubq))
 		return rq_bytes;
 
 	/*
@@ -933,7 +938,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
 
-	if (ublk_support_user_copy(ubq))
+	if (!ublk_need_map_io(ubq))
 		return rq_bytes;
 
 	if (ublk_need_unmap_req(req)) {
@@ -1809,7 +1814,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
 			goto out;
 
-		if (!ublk_support_user_copy(ubq)) {
+		if (ublk_need_map_io(ubq)) {
 			/*
 			 * FETCH_RQ has to provide IO buffer if NEED GET
 			 * DATA is not enabled
@@ -1831,7 +1836,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 		if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
 			goto out;
 
-		if (!ublk_support_user_copy(ubq)) {
+		if (ublk_need_map_io(ubq)) {
 			/*
 			 * COMMIT_AND_FETCH_REQ has to provide IO buffer if
 			 * NEED GET DATA is not enabled or it is Read IO.
From patchwork Tue Jan 7 12:03:55 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928775
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 04/22] ublk: move ublk into one standalone directory
Date: Tue, 7 Jan 2025 20:03:55 +0800
Message-ID: <20250107120417.1237392-5-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Prepare for supporting ublk-bpf, which needs to add more source files, so create drivers/block/ublk/ to avoid polluting drivers/block/. Meanwhile rename the source file to ublk/main.c.

Signed-off-by: Ming Lei
---
 MAINTAINERS                               |  2 +-
 drivers/block/Kconfig                     | 32 +-------------------
 drivers/block/Makefile                    |  2 +-
 drivers/block/ublk/Kconfig                | 36 +++++++++++++++++++++++
 drivers/block/ublk/Makefile               |  7 +++++
 drivers/block/{ublk_drv.c => ublk/main.c} |  0
 6 files changed, 46 insertions(+), 33 deletions(-)
 create mode 100644 drivers/block/ublk/Kconfig
 create mode 100644 drivers/block/ublk/Makefile
 rename drivers/block/{ublk_drv.c => ublk/main.c} (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index c575de4903db..890f6195d03f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23982,7 +23982,7 @@ M:	Ming Lei
 L:	linux-block@vger.kernel.org
 S:	Maintained
 F:	Documentation/block/ublk.rst
-F:	drivers/block/ublk_drv.c
+F:	drivers/block/ublk/
 F:	include/uapi/linux/ublk_cmd.h
 
 UBSAN
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a97f2c40c640..4e5144183ade 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -379,37 +379,7 @@ config BLK_DEV_RBD
 
 	  If unsure, say N.
 
-config BLK_DEV_UBLK
-	tristate "Userspace block driver (Experimental)"
-	select IO_URING
-	help
-	  io_uring based userspace block driver. Together with ublk server, ublk
-	  has been working well, but interface with userspace or command data
-	  definition isn't finalized yet, and might change according to future
-	  requirement, so mark is as experimental now.
-
-	  Say Y if you want to get better performance because task_work_add()
-	  can be used in IO path for replacing io_uring cmd, which will become
-	  shared between IO tasks and ubq daemon, meantime task_work_add() can
-	  can handle batch more effectively, but task_work_add() isn't exported
-	  for module, so ublk has to be built to kernel.
-
-config BLKDEV_UBLK_LEGACY_OPCODES
-	bool "Support legacy command opcode"
-	depends on BLK_DEV_UBLK
-	default y
-	help
-	  ublk driver started to take plain command encoding, which turns out
-	  one bad way. The traditional ioctl command opcode encodes more
-	  info and basically defines each code uniquely, so opcode conflict
-	  is avoided, and driver can handle wrong command easily, meantime it
-	  may help security subsystem to audit io_uring command.
-
-	  Say Y if your application still uses legacy command opcode.
-
-	  Say N if you don't want to support legacy command opcode. It is
-	  suggested to enable N if your application(ublk server) switches to
-	  ioctl command encoding.
+source "drivers/block/ublk/Kconfig"
 
 source "drivers/block/rnbd/Kconfig"
 
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 1105a2d4fdcb..a6fdc62b817c 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -40,6 +40,6 @@ obj-$(CONFIG_BLK_DEV_RNBD)	+= rnbd/
 
 obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk/
 
-obj-$(CONFIG_BLK_DEV_UBLK)	+= ublk_drv.o
+obj-$(CONFIG_BLK_DEV_UBLK)	+= ublk/
 
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/ublk/Kconfig b/drivers/block/ublk/Kconfig
new file mode 100644
index 000000000000..b06e3df09779
--- /dev/null
+++ b/drivers/block/ublk/Kconfig
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# ublkl block device driver configuration
+#
+
+config BLK_DEV_UBLK
+	tristate "Userspace block driver (Experimental)"
+	select IO_URING
+	help
+	  io_uring based userspace block driver. Together with ublk server, ublk
+	  has been working well, but interface with userspace or command data
+	  definition isn't finalized yet, and might change according to future
+	  requirement, so mark is as experimental now.
+
+	  Say Y if you want to get better performance because task_work_add()
+	  can be used in IO path for replacing io_uring cmd, which will become
+	  shared between IO tasks and ubq daemon, meantime task_work_add() can
+	  can handle batch more effectively, but task_work_add() isn't exported
+	  for module, so ublk has to be built to kernel.
+
+config BLKDEV_UBLK_LEGACY_OPCODES
+	bool "Support legacy command opcode"
+	depends on BLK_DEV_UBLK
+	default y
+	help
+	  ublk driver started to take plain command encoding, which turns out
+	  one bad way. The traditional ioctl command opcode encodes more
+	  info and basically defines each code uniquely, so opcode conflict
+	  is avoided, and driver can handle wrong command easily, meantime it
+	  may help security subsystem to audit io_uring command.
+
+	  Say Y if your application still uses legacy command opcode.
+
+	  Say N if you don't want to support legacy command opcode. It is
+	  suggested to enable N if your application(ublk server) switches to
+	  ioctl command encoding.
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
new file mode 100644
index 000000000000..30e06b74dd82
--- /dev/null
+++ b/drivers/block/ublk/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# needed for trace events
+ccflags-y += -I$(src)
+
+ublk_drv-$(CONFIG_BLK_DEV_UBLK)	:= main.o
+obj-$(CONFIG_BLK_DEV_UBLK)	+= ublk_drv.o
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk/main.c
similarity index 100%
rename from drivers/block/ublk_drv.c
rename to drivers/block/ublk/main.c
From patchwork Tue Jan 7 12:03:56 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928776
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 05/22] ublk: move private definitions into private header
Date: Tue, 7 Jan 2025 20:03:56 +0800
Message-ID: <20250107120417.1237392-6-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Add one private header file and move private definitions into this file.
Signed-off-by: Ming Lei
---
 drivers/block/ublk/main.c | 150 +-----------------------------------
 drivers/block/ublk/ublk.h | 157 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+), 149 deletions(-)
 create mode 100644 drivers/block/ublk/ublk.h

diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 1a63a1aa99ed..2510193303bb 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -19,7 +19,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -35,162 +34,15 @@
 #include
 #include
 #include
-#include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
 #include
-#include
-
-#define UBLK_MINORS		(1U << MINORBITS)
-
-/* private ioctl command mirror */
-#define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
-
-/* All UBLK_F_* have to be included into UBLK_F_ALL */
-#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
-		| UBLK_F_URING_CMD_COMP_IN_TASK \
-		| UBLK_F_NEED_GET_DATA \
-		| UBLK_F_USER_RECOVERY \
-		| UBLK_F_USER_RECOVERY_REISSUE \
-		| UBLK_F_UNPRIVILEGED_DEV \
-		| UBLK_F_CMD_IOCTL_ENCODE \
-		| UBLK_F_USER_COPY \
-		| UBLK_F_ZONED \
-		| UBLK_F_USER_RECOVERY_FAIL_IO)
-
-#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
-		| UBLK_F_USER_RECOVERY_REISSUE \
-		| UBLK_F_USER_RECOVERY_FAIL_IO)
-
-/* All UBLK_PARAM_TYPE_* should be included here */
-#define UBLK_PARAM_TYPE_ALL \
-	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
-	 UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
-
-struct ublk_rq_data {
-	struct llist_node node;
-
-	struct kref ref;
-};
-
-struct ublk_uring_cmd_pdu {
-	struct ublk_queue *ubq;
-	u16 tag;
-};
-
-/*
- * io command is active: sqe cmd is received, and its cqe isn't done
- *
- * If the flag is set, the io command is owned by ublk driver, and waited
- * for incoming blk-mq request from the ublk block device.
- *
- * If the flag is cleared, the io command will be completed, and owned by
- * ublk server.
- */
-#define UBLK_IO_FLAG_ACTIVE	0x01
-
-/*
- * IO command is completed via cqe, and it is being handled by ublksrv, and
- * not committed yet
- *
- * Basically exclusively with UBLK_IO_FLAG_ACTIVE, so can be served for
- * cross verification
- */
-#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
-
-/*
- * IO command is aborted, so this flag is set in case of
- * !UBLK_IO_FLAG_ACTIVE.
- *
- * After this flag is observed, any pending or new incoming request
- * associated with this io command will be failed immediately
- */
-#define UBLK_IO_FLAG_ABORTED 0x04
-
-/*
- * UBLK_IO_FLAG_NEED_GET_DATA is set because IO command requires
- * get data buffer address from ublksrv.
- *
- * Then, bio data could be copied into this data buffer for a WRITE request
- * after the IO command is issued again and UBLK_IO_FLAG_NEED_GET_DATA is unset.
- */
-#define UBLK_IO_FLAG_NEED_GET_DATA 0x08
-
-/* atomic RW with ubq->cancel_lock */
-#define UBLK_IO_FLAG_CANCELED	0x80000000
-
-struct ublk_io {
-	/* userspace buffer address from io cmd */
-	__u64	addr;
-	unsigned int flags;
-	int res;
-
-	struct io_uring_cmd *cmd;
-};
-
-struct ublk_queue {
-	int q_id;
-	int q_depth;
-
-	unsigned long flags;
-	struct task_struct	*ubq_daemon;
-	char *io_cmd_buf;
-
-	struct llist_head	io_cmds;
-
-	unsigned short force_abort:1;
-	unsigned short timeout:1;
-	unsigned short canceling:1;
-	unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
-	unsigned short nr_io_ready;	/* how many ios setup */
-	spinlock_t	cancel_lock;
-	struct ublk_device *dev;
-	struct ublk_io ios[];
-};
-
-struct ublk_device {
-	struct gendisk		*ub_disk;
-
-	char	*__queues;
-
-	unsigned int	queue_size;
-	struct ublksrv_ctrl_dev_info	dev_info;
-
-	struct blk_mq_tag_set	tag_set;
-
-	struct cdev		cdev;
-	struct device		cdev_dev;
-
-#define UB_STATE_OPEN		0
-#define UB_STATE_USED		1
-#define UB_STATE_DELETED	2
-	unsigned long		state;
-	int			ub_number;
-
-	struct mutex		mutex;
-
-	spinlock_t		lock;
-	struct mm_struct	*mm;
-
-	struct ublk_params	params;
-
-	struct completion	completion;
-	unsigned int		nr_queues_ready;
-	unsigned int		nr_privileged_daemon;
-
-	struct work_struct	nosrv_work;
-};
-
-/* header of ublk_params */
-struct ublk_params_header {
-	__u32	len;
-	__u32	types;
-};
+#include "ublk.h"
 
 static bool ublk_abort_requests(struct ublk_device *ub,
 		struct ublk_queue *ubq);
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
new file mode 100644
index 000000000000..12e39a33015a
--- /dev/null
+++ b/drivers/block/ublk/ublk.h
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef UBLK_INTERNAL_HEADER
+#define UBLK_INTERNAL_HEADER
+
+#include
+#include
+#include
+#include
+
+#define UBLK_MINORS		(1U << MINORBITS)
+
+/* private ioctl command mirror */
+#define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
+
+/* All UBLK_F_* have to be included into UBLK_F_ALL */
+#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
+		| UBLK_F_URING_CMD_COMP_IN_TASK \
+		| UBLK_F_NEED_GET_DATA \
+		| UBLK_F_USER_RECOVERY \
+		| UBLK_F_USER_RECOVERY_REISSUE \
+		| UBLK_F_UNPRIVILEGED_DEV \
+		| UBLK_F_CMD_IOCTL_ENCODE \
+		| UBLK_F_USER_COPY \
+		| UBLK_F_ZONED \
+		| UBLK_F_USER_RECOVERY_FAIL_IO)
+
+#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
+		| UBLK_F_USER_RECOVERY_REISSUE \
+		| UBLK_F_USER_RECOVERY_FAIL_IO)
+
+/* All UBLK_PARAM_TYPE_* should be included here */
+#define UBLK_PARAM_TYPE_ALL \
+	(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
+	 UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
+
+struct ublk_rq_data {
+	struct llist_node node;
+
+	struct kref ref;
+};
+
+struct ublk_uring_cmd_pdu {
+	struct ublk_queue *ubq;
+	u16 tag;
+};
+
+/*
+ * io command is active: sqe cmd is received, and its cqe isn't done
+ *
+ * If the flag is set, the io command is owned by ublk driver, and waited
+ * for incoming blk-mq request from the ublk block device.
+ *
+ * If the flag is cleared, the io command will be completed, and owned by
+ * ublk server.
+ */
+#define UBLK_IO_FLAG_ACTIVE	0x01
+
+/*
+ * IO command is completed via cqe, and it is being handled by ublksrv, and
+ * not committed yet
+ *
+ * Basically exclusively with UBLK_IO_FLAG_ACTIVE, so can be served for
+ * cross verification
+ */
+#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
+
+/*
+ * IO command is aborted, so this flag is set in case of
+ * !UBLK_IO_FLAG_ACTIVE.
+ *
+ * After this flag is observed, any pending or new incoming request
+ * associated with this io command will be failed immediately
+ */
+#define UBLK_IO_FLAG_ABORTED 0x04
+
+/*
+ * UBLK_IO_FLAG_NEED_GET_DATA is set because IO command requires
+ * get data buffer address from ublksrv.
+ *
+ * Then, bio data could be copied into this data buffer for a WRITE request
+ * after the IO command is issued again and UBLK_IO_FLAG_NEED_GET_DATA is unset.
+ */
+#define UBLK_IO_FLAG_NEED_GET_DATA 0x08
+
+/* atomic RW with ubq->cancel_lock */
+#define UBLK_IO_FLAG_CANCELED	0x80000000
+
+struct ublk_io {
+	/* userspace buffer address from io cmd */
+	__u64	addr;
+	unsigned int flags;
+	int res;
+
+	struct io_uring_cmd *cmd;
+};
+
+struct ublk_queue {
+	int q_id;
+	int q_depth;
+
+	unsigned long flags;
+	struct task_struct	*ubq_daemon;
+	char *io_cmd_buf;
+
+	struct llist_head	io_cmds;
+
+	unsigned short force_abort:1;
+	unsigned short timeout:1;
+	unsigned short canceling:1;
+	unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
+	unsigned short nr_io_ready;	/* how many ios setup */
+	spinlock_t	cancel_lock;
+	struct ublk_device *dev;
+	struct ublk_io ios[];
+};
+
+struct ublk_device {
+	struct gendisk		*ub_disk;
+
+	char	*__queues;
+
+	unsigned int	queue_size;
+	struct ublksrv_ctrl_dev_info	dev_info;
+
+	struct blk_mq_tag_set	tag_set;
+
+	struct cdev		cdev;
+	struct device		cdev_dev;
+
+#define UB_STATE_OPEN		0
+#define UB_STATE_USED		1
+#define UB_STATE_DELETED	2
+	unsigned long		state;
+	int			ub_number;
+
+	struct mutex		mutex;
+
+	spinlock_t		lock;
+	struct mm_struct	*mm;
+
+	struct ublk_params	params;
+
+	struct completion	completion;
+	unsigned int		nr_queues_ready;
+	unsigned int		nr_privileged_daemon;
+
+	struct work_struct	nosrv_work;
+};
+
+/* header of ublk_params */
+struct ublk_params_header {
+	__u32	len;
+	__u32	types;
+};
+
+
+#endif

From patchwork Tue Jan 7 12:03:57 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928777
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 06/22] ublk: move several helpers to private header
Date: Tue, 7 Jan 2025 20:03:57 +0800
Message-ID: <20250107120417.1237392-7-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Move several helpers into the private header to make them visible to the whole driver, and prepare for supporting ublk-bpf.
Signed-off-by: Ming Lei
---
 drivers/block/ublk/main.c | 16 +++-------------
 drivers/block/ublk/ublk.h | 11 +++++++++++
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 2510193303bb..aefb414ebf6c 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -47,8 +47,6 @@ static bool ublk_abort_requests(struct ublk_device *ub,
 		struct ublk_queue *ubq);
 static inline unsigned int ublk_req_build_flags(struct request *req);
-static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
-						   int tag);
 static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
 {
 	return ub->dev_info.flags & UBLK_F_USER_COPY;
@@ -325,7 +323,6 @@ static blk_status_t ublk_setup_iod_zoned(struct ublk_queue *ubq,
 
 #endif
 
-static inline void __ublk_complete_rq(struct request *req);
 static void ublk_complete_rq(struct kref *ref);
 
 static dev_t ublk_chr_devt;
@@ -496,7 +493,7 @@ static noinline struct ublk_device *ublk_get_device(struct ublk_device *ub)
 }
 
 /* Called in slow path only, keep it noinline for trace purpose */
-static noinline void ublk_put_device(struct ublk_device *ub)
+void ublk_put_device(struct ublk_device *ub)
 {
 	put_device(&ub->cdev_dev);
 }
@@ -512,13 +509,6 @@ static inline bool ublk_rq_has_data(const struct request *rq)
 	return bio_has_data(rq->bio);
 }
 
-static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
-						   int tag)
-{
-	return (struct ublksrv_io_desc *)
-		&(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
-}
-
 static inline char *ublk_queue_cmd_buf(struct ublk_device *ub, int q_id)
 {
 	return ublk_get_queue(ub, q_id)->io_cmd_buf;
@@ -887,7 +877,7 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
 }
 
 /* todo: handle partial completion */
-static inline void __ublk_complete_rq(struct request *req)
+void __ublk_complete_rq(struct request *req)
 {
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 	struct ublk_io *io = &ubq->ios[req->tag];
@@ -2082,7 +2072,7 @@ static void ublk_remove(struct ublk_device *ub)
 	ublks_added--;
 }
 
-static struct ublk_device *ublk_get_device_from_id(int idx)
+struct ublk_device *ublk_get_device_from_id(int idx)
 {
 	struct ublk_device *ub = NULL;
 
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 12e39a33015a..76aee4225c78 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -154,4 +154,15 @@ struct ublk_params_header {
 };
 
+static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
+						   int tag)
+{
+	return (struct ublksrv_io_desc *)
+		&(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
+}
+
+struct ublk_device *ublk_get_device_from_id(int idx);
+void ublk_put_device(struct ublk_device *ub);
+void __ublk_complete_rq(struct request *req);
+
 #endif
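To show what the export buys, here is a sketch under assumptions, not part of the patch: once ublk_get_iod(), ublk_get_device_from_id(), ublk_put_device() and __ublk_complete_rq() are reachable via ublk.h, another compilation unit in drivers/block/ublk/ can use them. The file name and function below are hypothetical.

// SPDX-License-Identifier: GPL-2.0-or-later
/* hypothetical extra compilation unit, e.g. drivers/block/ublk/bpf.c */
#include "ublk.h"

/* log one queued IO descriptor of a device looked up by index */
static void example_dump_iod(int dev_idx, struct ublk_queue *ubq, int tag)
{
	struct ublk_device *ub = ublk_get_device_from_id(dev_idx);
	struct ublksrv_io_desc *iod;

	if (!ub)
		return;

	iod = ublk_get_iod(ubq, tag);
	pr_info("ublk %u q%d tag %d: op_flags 0x%x nr_sectors %u\n",
		ub->dev_info.dev_id, ubq->q_id, tag,
		iod->op_flags, iod->nr_sectors);
	ublk_put_device(ub);
}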
From patchwork Tue Jan 7 12:03:58 2025
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928778
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song, Ming Lei
Subject: [RFC PATCH 07/22] ublk: bpf: add bpf prog attach helpers
Date: Tue, 7 Jan 2025 20:03:58 +0800
Message-ID: <20250107120417.1237392-8-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Add bpf prog attach helpers and prepare for supporting ublk-bpf, in which multiple ublk devices may attach to the same bpf prog, and there can be multiple bpf progs.

`bpf_prog_consumer` is embedded in the bpf prog user side, such as a ublk device, while `bpf_prog_provider` is embedded in the bpf struct_ops prog side.

Signed-off-by: Ming Lei
---
 drivers/block/ublk/bpf_reg.h | 77 ++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 drivers/block/ublk/bpf_reg.h

diff --git a/drivers/block/ublk/bpf_reg.h b/drivers/block/ublk/bpf_reg.h
new file mode 100644
index 000000000000..79d02e93aea8
--- /dev/null
+++ b/drivers/block/ublk/bpf_reg.h
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef UBLK_INT_BPF_REG_HEADER
+#define UBLK_INT_BPF_REG_HEADER
+
+#include
+
+struct bpf_prog_consumer;
+struct bpf_prog_provider;
+
+typedef int (*bpf_prog_attach_t)(struct bpf_prog_consumer *consumer,
+				 struct bpf_prog_provider *provider);
+typedef void (*bpf_prog_detach_t)(struct bpf_prog_consumer *consumer,
+				  bool unreg);
+
+struct bpf_prog_consumer_ops {
+	bpf_prog_attach_t attach_fn;
+	bpf_prog_detach_t detach_fn;
+};
+
+struct bpf_prog_consumer {
+	const struct bpf_prog_consumer_ops *ops;
+	unsigned int prog_id;
+	struct list_head node;
+	struct bpf_prog_provider *provider;
+};
+
+struct bpf_prog_provider {
+	struct list_head list;
+};
+
+static inline void bpf_prog_provider_init(struct bpf_prog_provider *provider)
+{
+	INIT_LIST_HEAD(&provider->list);
+}
+
+static inline bool bpf_prog_provider_is_empty(
+		struct bpf_prog_provider *provider)
+{
+	return list_empty(&provider->list);
+}
+
+static inline int bpf_prog_consumer_attach(struct bpf_prog_consumer *consumer,
+					   struct bpf_prog_provider *provider)
+{
+	const struct bpf_prog_consumer_ops *ops = consumer->ops;
+
+	if (!ops || !ops->attach_fn)
+		return -EINVAL;
+
+	if (ops->attach_fn) {
+		int ret = ops->attach_fn(consumer, provider);
+
+		if (ret)
+			return ret;
+	}
+	consumer->provider = provider;
+	list_add(&consumer->node, &provider->list);
+	return 0;
+}
+
+static inline void bpf_prog_consumer_detach(struct bpf_prog_consumer *consumer,
+					    bool unreg)
+{
+	const struct bpf_prog_consumer_ops *ops = consumer->ops;
+
+	if (!consumer->provider)
+		return;
+
+	if (!list_empty(&consumer->node)) {
+		if (ops && ops->detach_fn)
+			ops->detach_fn(consumer, unreg);
+		list_del_init(&consumer->node);
+		consumer->provider = NULL;
+	}
+}
+
+#endif
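A hedged usage sketch of the helpers above: the bpf_reg.h API is taken from the patch, while the example_* types and callbacks stand in for the ublk side that later patches wire up. A consumer embeds struct bpf_prog_consumer, points it at a struct bpf_prog_consumer_ops, and is linked onto the provider's list on a successful attach; because the provider keeps a list of consumers, several ublk devices can share one struct_ops prog.

/* illustrative only - example_* names are made up, the API is from bpf_reg.h */
#include "bpf_reg.h"

struct example_dev {
	struct bpf_prog_consumer consumer;
};

static int example_attach(struct bpf_prog_consumer *consumer,
			  struct bpf_prog_provider *provider)
{
	/* device-side setup before being linked to the provider */
	return 0;
}

static void example_detach(struct bpf_prog_consumer *consumer, bool unreg)
{
	/* device-side teardown; unreg says the prog itself is going away */
}

static const struct bpf_prog_consumer_ops example_consumer_ops = {
	.attach_fn = example_attach,
	.detach_fn = example_detach,
};

static int example_bind(struct example_dev *dev,
			struct bpf_prog_provider *provider)
{
	dev->consumer.ops = &example_consumer_ops;
	/* on success dev->consumer is added to provider->list */
	return bpf_prog_consumer_attach(&dev->consumer, provider);
}

static void example_unbind(struct example_dev *dev)
{
	bpf_prog_consumer_detach(&dev->consumer, false);
}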
cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E3431EE7BB; Tue, 7 Jan 2025 12:08:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251718; cv=none; b=VeTuTKo7Kj+bictEogWA5Ihp+uHEf0rhQQPmzR3en6wKKMXKmpu7IH1NwR6tOuqizwwoJMFrMqDpPic1+YHWdlZiAvDQbICLifCZhpwvt/p9/72pcr4xA7KJ+3wxjwb38yGQNlCGmIdN6fqsSxnWPKsWBgeM+TWlRdYYK0xt878= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251718; c=relaxed/simple; bh=u0GtLfmQGUEfx5RJXT7SHBRXTxZavPgYb3DiQyrE32g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=MJ09Uy72Zt1zP/3Pwa4la6QgzJtsReK8WUwVzkdA3E9G3Tut9v1i7Fl+y74robm2ALLk4tzKJTctnFizpS5ZbgTQWpWK6wWTVIkJgMUzmXQZKM4P8DKev1xaBjnzMZPkTL41GeVT121NRW+rEwp58QzeT/Gf/BabEpb/lPevqzo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VriRcPjo; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VriRcPjo" Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-21628b3fe7dso218295135ad.3; Tue, 07 Jan 2025 04:08:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251700; x=1736856500; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8SxNlHBwGatFKyZh1YrXFFqd/Sw71g3y4E/dnODeXAw=; b=VriRcPjo2EO3y7m9YhkIQc0kKH1mE7WJgP0lbvRK+HOLZtEM69ZlvXJdDjAUGHH1Qo EamcceRrqtQxcDTz0vRIkUbPBQwi4IJwr+Eqr9ahnyZu0Aao6SsFSR2k3fJUS1IZKUeN Ym5SLLt1DUvumvBtMwIFqvoCAr5TI9GdiRWoYGCZUqFHZP2tH/grXHAN7UNweuZeHEUL O7jOUuJJZREA6cy7cH5dh7HpukL+wHBd7AnBHYJqo7LSBCbIPjfF9GUc8GgfVut/h75x vURhr6Qltx6nRnoo6JerbTp9HUVs1nVF9yTuxlrPhKs+0VlIKXKtDNSJs7Yqqxut+v2+ GEMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251700; x=1736856500; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8SxNlHBwGatFKyZh1YrXFFqd/Sw71g3y4E/dnODeXAw=; b=V2JwA9pZwjEFuQuFLh7qXlZiCb85yCdgYIMHGdmLrei9g6yGfxXL/UVo1RMISdfpF4 tOR0Ikx/NoJhZ+yuNxsSLyqHlhQ61rfwFcGB+bUS2Tjc6Y1myIiwqpUMcwvtF1ZqFo9V ompHo7vvYY4F6uqGxGLEKmYVTn5kz/mqYd2BA5LDhjP7d/b1rYSWMusmzxhaI4uIfeXI WpBSSQdubSoDyUBMO8EziS/Pq077ShTmeEA2HwnohswmS8fMv7tEWHwmtsYOvx0tNvWD eTiYqGd5Dn+dGlwm1rUKkit+yh7A9Mtds+r3IaR8YoB3O256WBQyfnG1XdydIR4+6MuV ECgA== X-Forwarded-Encrypted: i=1; AJvYcCXABYfqe2l7NYoMKWaxEmI/LTDGjsUr3MKvoTDnyiM7+5MNqvDcQgOp4DEbT2/hGm0qNIf0MKJJE5RTRg==@vger.kernel.org X-Gm-Message-State: AOJu0Yy7Sx+ic8+1aR1wgFKFpUXSZJTaqCsf0W5krIq6HQOIHU3ptShx 2PsmxRpxdB6eu1+9On/Bv7n6cIJzdeR30ACKRyvOvC/38Ohv1KO3 X-Gm-Gg: ASbGncvz2tG3/vkveGSTsuJNbxmRTCTx9qBF8Lel2iEFaP+wMWKkg1ycPZ3Doq2dTF4 Vry6tWIOXUP1x5MrP0qbzYf/vDoX9nxxPJF/I0plIUSF8J9HmZ6PaXaTCxj+wajONMNOkLLgiRE aVq6m3/7jD91wy+LzR8RRDrCh/W2Ol0l3ElWJz78NuxxGXkY3o2xW6sf8GV2rhx4ZF2GtBtWxpR 
i+gnMlzAbF5kQA/nt3iUy3Q0Zpad0lVZ5DUlID5Sgt74taEyFcdwqJJRj9TvUsYSFbG X-Google-Smtp-Source: AGHT+IEhaylrPRTNtO3wFMYD29Rk/aijozZgikBVkn5Ht+Ts8WhoNQYAiteilc2iDOwyVJSZwxrfJQ== X-Received: by 2002:a05:6a20:7288:b0:1e1:aa24:2e5c with SMTP id adf61e73a8af0-1e5e083f156mr89926543637.38.1736251699700; Tue, 07 Jan 2025 04:08:19 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:19 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops Date: Tue, 7 Jan 2025 20:03:59 +0800 Message-ID: <20250107120417.1237392-9-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Add struct_ops support for ublk, so we can use struct_ops bpf prog to handle ublk IO command with application defined struct_ops. Follows the motivation for ublk-bpf: 1) support stacking ublk - there are many 3rd party volume manager, ublk may be built over ublk device for simplifying implementation, however, multiple userspace-kernel context switch for handling one single IO can't be accepted from performance view of point - ublk-bpf can avoid user-kernel context switch in most fast io path, so makes ublk over ublk possible 2) complicated virtual block device - many complicated virtual block devices have admin&meta code path and normal io fast path; meta & admin IO handling is usually complicated, so it can be moved to userspace for relieving development burden; meantime IO fast path can be kept in kernel space for the sake of high performance. - bpf provides rich maps, which can help a lot for communication between userspace and prog or between prog and prog. - one typical example is qcow2, which meta io handling can be moved to userspace, and fast io path is implemented with ublk-bpf in which one efficient bpf map can be looked up first and see if this virtual LBA & host LBA is found in the map, handle the IO with ublk-bpf if the mapping is hit, otherwise forward to userspace to deal with meta IO. 3) some simple high performance virtual devices - such as null & loop, the whole implementation can be done in bpf prog Export `struct ublk_bpf_ops` as bpf struct_ops, so that bpf prog can implement callbacks for handling ublk io commands: - if `UBLK_BPF_IO_QUEUED` is returned from ->queue_io_cmd() or ->queue_io_cmd_daemon(), this io command has been queued in bpf prog, so it won't be forwarded to userspace - if `UBLK_BPF_IO_REDIRECT` is returned from ->queue_io_cmd() or ->queue_io_cmd_daemon(), this io command will be forwarded to userspace - if `UBLK_BPF_IO_CONTINUE` is returned from ->queue_io_cmd() or ->queue_io_cmd_daemon(), part of this io command is queued, and `ublk_bpf_return_t` carries how many bytes queued, so ublk driver will continue to call the callback to queue remained bytes of this io command further, this way is helpful for implementing stacking devices by splitting io command. Also ->release_io_cmd() is added for providing chance to notify bpf prog that this io command is going to be released. 
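To make the above return-value contract concrete, here is a small sketch (illustration only, not part of this patch) of how a ->queue_io_cmd() style callback can encode its disposition with ublk_bpf_return_val(); example_can_handle() and example_queue_chunk() are hypothetical target helpers, and the byte count has to be a non-zero multiple of 512 as checked by ublk_run_bpf_handler():

static ublk_bpf_return_t example_queue_io_cmd(struct ublk_bpf_io *io,
                                              unsigned int done)
{
        unsigned int bytes;

        /* hypothetical helper: can this command be handled in bpf at all? */
        if (!example_can_handle(io))
                return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);

        /*
         * hypothetical helper: queue the next chunk starting at 'done' bytes,
         * returning how many bytes were queued (a multiple of 512), or 0 once
         * the remainder of the command has been queued completely
         */
        bytes = example_queue_chunk(io, done);
        if (bytes)
                /* driver re-invokes the callback with 'done' advanced by 'bytes' */
                return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, bytes);

        /* whole command queued by the bpf prog, nothing is forwarded to userspace */
        return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
}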
Signed-off-by: Ming Lei --- drivers/block/ublk/Kconfig | 16 +++ drivers/block/ublk/Makefile | 3 + drivers/block/ublk/bpf.h | 184 ++++++++++++++++++++++++ drivers/block/ublk/bpf_ops.c | 261 +++++++++++++++++++++++++++++++++++ drivers/block/ublk/main.c | 29 +++- drivers/block/ublk/ublk.h | 33 +++++ 6 files changed, 524 insertions(+), 2 deletions(-) create mode 100644 drivers/block/ublk/bpf.h create mode 100644 drivers/block/ublk/bpf_ops.c diff --git a/drivers/block/ublk/Kconfig b/drivers/block/ublk/Kconfig index b06e3df09779..23aa97d51956 100644 --- a/drivers/block/ublk/Kconfig +++ b/drivers/block/ublk/Kconfig @@ -34,3 +34,19 @@ config BLKDEV_UBLK_LEGACY_OPCODES Say N if you don't want to support legacy command opcode. It is suggested to enable N if your application(ublk server) switches to ioctl command encoding. + +config UBLK_BPF + bool "UBLK-BPF support" + depends on BPF + depends on BLK_DEV_UBLK + help + This option allows to support eBPF programs on the UBLK subsystem. + eBPF programs can handle fast IO code path directly in kernel space, + and avoid to switch to ublk daemon userspace conext, meantime zero + copy can be supported directly. + + Usually target code need to partition into two parts: fast io code path + which is run as eBPF prog in kernel context, and slow & complicated + meta/admin code path which is run in ublk daemon userspace context. + And use efficient bpf map for communication between user mode and + kernel bpf prog. diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile index 30e06b74dd82..7058b0fc13bf 100644 --- a/drivers/block/ublk/Makefile +++ b/drivers/block/ublk/Makefile @@ -4,4 +4,7 @@ ccflags-y += -I$(src) ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o +ifeq ($(CONFIG_UBLK_BPF), y) +ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o +endif obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h new file mode 100644 index 000000000000..e3505c9ab86a --- /dev/null +++ b/drivers/block/ublk/bpf.h @@ -0,0 +1,184 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +#ifndef UBLK_INT_BPF_HEADER +#define UBLK_INT_BPF_HEADER + +#include "bpf_reg.h" + +typedef unsigned long ublk_bpf_return_t; +typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int); +typedef void (*release_io_cmd_t)(struct ublk_bpf_io *io); + +#ifdef CONFIG_UBLK_BPF +#include + +/* + * enum ublk_bpf_disposition - how to dispose the bpf io command + * + * @UBLK_BPF_IO_QUEUED: io command queued completely by bpf prog, so this + * cmd needn't to be forwarded to ublk daemon any more + * @UBLK_BPF_IO_REDIRECT: io command can't be queued by bpf prog, so this + * cmd will be forwarded to ublk daemon + * @UBLK_BPF_IO_CONTINUE: io command is being queued, and can be disposed + * further by bpf prog, so bpf callback will be called further + */ +enum ublk_bpf_disposition { + UBLK_BPF_IO_QUEUED = 0, + UBLK_BPF_IO_REDIRECT, + UBLK_BPF_IO_CONTINUE, +}; + +/** + * struct ublk_bpf_ops - A BPF struct_ops of callbacks allowing to implement + * ublk target from bpf program + * @id: ops id + * @queue_io_cmd: callback for queuing io command in ublk io context + * @queue_io_cmd_daemon: callback for queuing io command in ublk daemon + */ +struct ublk_bpf_ops { + /* struct_ops id, used for ublk device to attach prog */ + unsigned id; + + /* queue io command from ublk io context, can't be sleepable */ + queue_io_cmd_t queue_io_cmd; + + /* queue io command from target io daemon context, can be sleepable */ + queue_io_cmd_t queue_io_cmd_daemon; + + /* 
called when the io command reference drops to zero, can't be sleepable */ + release_io_cmd_t release_io_cmd; + + /* private: don't show in doc, must be the last field */ + struct bpf_prog_provider provider; +}; + +#define UBLK_BPF_DISPOSITION_BITS (4) +#define UBLK_BPF_DISPOSITION_SHIFT (BITS_PER_LONG - UBLK_BPF_DISPOSITION_BITS) + +static inline enum ublk_bpf_disposition ublk_bpf_get_disposition(ublk_bpf_return_t ret) +{ + return ret >> UBLK_BPF_DISPOSITION_SHIFT; +} + +static inline unsigned int ublk_bpf_get_return_bytes(ublk_bpf_return_t ret) +{ + return ret & ((1UL << UBLK_BPF_DISPOSITION_SHIFT) - 1); +} + +static inline ublk_bpf_return_t ublk_bpf_return_val(enum ublk_bpf_disposition rc, + unsigned int bytes) +{ + return (ublk_bpf_return_t) ((unsigned long)rc << UBLK_BPF_DISPOSITION_SHIFT) | bytes; +} + +static inline struct request *ublk_bpf_get_req(const struct ublk_bpf_io *io) +{ + struct ublk_rq_data *data = container_of(io, struct ublk_rq_data, bpf_data); + struct request *req = blk_mq_rq_from_pdu(data); + + return req; +} + +static inline void ublk_bpf_io_dec_ref(struct ublk_bpf_io *io) +{ + if (refcount_dec_and_test(&io->ref)) { + struct request *req = ublk_bpf_get_req(io); + + if (req->mq_hctx) { + const struct ublk_queue *ubq = req->mq_hctx->driver_data; + + if (ubq->bpf_ops && ubq->bpf_ops->release_io_cmd) + ubq->bpf_ops->release_io_cmd(io); + } + + if (test_bit(UBLK_BPF_IO_COMPLETED, &io->flags)) { + smp_rmb(); + __clear_bit(UBLK_BPF_IO_PREP, &io->flags); + __ublk_complete_rq_with_res(req, io->res); + } + } +} + +static inline void ublk_bpf_complete_io_cmd(struct ublk_bpf_io *io, int res) +{ + io->res = res; + smp_wmb(); + set_bit(UBLK_BPF_IO_COMPLETED, &io->flags); + ublk_bpf_io_dec_ref(io); +} + + +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req, + queue_io_cmd_t cb); + +/* + * Return true if bpf prog handled this io command, otherwise return false + * so that this io command will be forwarded to userspace + */ +static inline bool ublk_run_bpf_prog(struct ublk_queue *ubq, + struct request *req, + queue_io_cmd_t cb, + bool fail_on_null) +{ + if (likely(cb)) + return ublk_run_bpf_handler(ubq, req, cb); + + /* bpf prog is un-registered */ + if (fail_on_null && !ubq->bpf_ops) { + __ublk_complete_rq_with_res(req, -EOPNOTSUPP); + return true; + } + + return false; +} + +static inline queue_io_cmd_t ublk_get_bpf_io_cb(struct ublk_queue *ubq) +{ + return ubq->bpf_ops ? ubq->bpf_ops->queue_io_cmd : NULL; +} + +static inline queue_io_cmd_t ublk_get_bpf_io_cb_daemon(struct ublk_queue *ubq) +{ + return ubq->bpf_ops ? 
ubq->bpf_ops->queue_io_cmd_daemon : NULL; +} + +static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) +{ + if (ublk_get_bpf_io_cb(ubq)) + return ublk_get_bpf_io_cb(ubq); + + return ublk_get_bpf_io_cb_daemon(ubq); +} + +int ublk_bpf_struct_ops_init(void); + +#else + +static inline bool ublk_run_bpf_prog(struct ublk_queue *ubq, + struct request *req, + queue_io_cmd_t cb, + bool fail_on_null) +{ + return false; +} + +static inline queue_io_cmd_t ublk_get_bpf_io_cb(struct ublk_queue *ubq) +{ + return NULL; +} + +static inline queue_io_cmd_t ublk_get_bpf_io_cb_daemon(struct ublk_queue *ubq) +{ + return NULL; +} + +static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) +{ + return NULL; +} + +static inline int ublk_bpf_struct_ops_init(void) +{ + return 0; +} +#endif +#endif diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c new file mode 100644 index 000000000000..6ac2aebd477e --- /dev/null +++ b/drivers/block/ublk/bpf_ops.c @@ -0,0 +1,261 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Red Hat */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ublk.h" +#include "bpf.h" + +static DEFINE_XARRAY(ublk_ops); +static DEFINE_MUTEX(ublk_bpf_ops_lock); + +static bool ublk_bpf_ops_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int ublk_bpf_ops_btf_struct_access(struct bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + /* ublk prog can change nothing */ + if (size > 0) + return -EACCES; + + return NOT_INIT; +} + +static const struct bpf_verifier_ops ublk_bpf_verifier_ops = { + .get_func_proto = bpf_base_func_proto, + .is_valid_access = ublk_bpf_ops_is_valid_access, + .btf_struct_access = ublk_bpf_ops_btf_struct_access, +}; + +static int ublk_bpf_ops_init(struct btf *btf) +{ + return 0; +} + +static int ublk_bpf_ops_check_member(const struct btf_type *t, + const struct btf_member *member, + const struct bpf_prog *prog) +{ + u32 moff = __btf_member_bit_offset(t, member) / 8; + + switch (moff) { + case offsetof(struct ublk_bpf_ops, queue_io_cmd): + case offsetof(struct ublk_bpf_ops, release_io_cmd): + if (prog->sleepable) + return -EINVAL; + case offsetof(struct ublk_bpf_ops, queue_io_cmd_daemon): + break; + default: + if (prog->sleepable) + return -EINVAL; + } + + return 0; +} + +static int ublk_bpf_ops_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + const struct ublk_bpf_ops *uops; + struct ublk_bpf_ops *kops; + u32 moff; + + uops = (const struct ublk_bpf_ops *)udata; + kops = (struct ublk_bpf_ops *)kdata; + + moff = __btf_member_bit_offset(t, member) / 8; + + switch (moff) { + case offsetof(struct ublk_bpf_ops, id): + /* For dev_id, this function has to copy it and return 1 to + * indicate that the data has been handled by the struct_ops + * type, or the verifier will reject the map if the value of + * those fields is not zero. 
+ */ + kops->id = uops->id; + return 1; + } + return 0; +} + +static int ublk_bpf_reg(void *kdata, struct bpf_link *link) +{ + struct ublk_bpf_ops *ops = kdata; + struct ublk_bpf_ops *curr; + int ret = -EBUSY; + + mutex_lock(&ublk_bpf_ops_lock); + if (!xa_load(&ublk_ops, ops->id)) { + curr = kmalloc(sizeof(*curr), GFP_KERNEL); + if (curr) { + *curr = *ops; + bpf_prog_provider_init(&curr->provider); + ret = xa_err(xa_store(&ublk_ops, ops->id, curr, GFP_KERNEL)); + } else { + ret = -ENOMEM; + } + } + mutex_unlock(&ublk_bpf_ops_lock); + + return ret; +} + +static void ublk_bpf_unreg(void *kdata, struct bpf_link *link) +{ + struct ublk_bpf_ops *ops = kdata; + struct ublk_bpf_ops *curr; + LIST_HEAD(consumer_list); + struct bpf_prog_consumer *consumer, *tmp; + + mutex_lock(&ublk_bpf_ops_lock); + curr = xa_erase(&ublk_ops, ops->id); + if (curr) + list_splice_init(&curr->provider.list, &consumer_list); + mutex_unlock(&ublk_bpf_ops_lock); + + list_for_each_entry_safe(consumer, tmp, &consumer_list, node) + bpf_prog_consumer_detach(consumer, true); + kfree(curr); +} + +static void ublk_bpf_prep_io(struct ublk_bpf_io *io, + const struct ublksrv_io_desc *iod) +{ + io->flags = 0; + io->res = 0; + io->iod = iod; + __set_bit(UBLK_BPF_IO_PREP, &io->flags); + /* one is for submission, another is for completion */ + refcount_set(&io->ref, 2); +} + +/* Return true if io cmd is queued, otherwise forward it to userspace */ +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req, + queue_io_cmd_t cb) +{ + ublk_bpf_return_t ret; + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag); + struct ublk_bpf_io *bpf_io = &data->bpf_data; + const unsigned long total = iod->nr_sectors << 9; + unsigned int done = 0; + bool res = true; + int err; + + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags)) + ublk_bpf_prep_io(bpf_io, iod); + + do { + enum ublk_bpf_disposition rc; + unsigned int bytes; + + ret = cb(bpf_io, done); + rc = ublk_bpf_get_disposition(ret); + + if (rc == UBLK_BPF_IO_QUEUED) + goto exit; + + if (rc == UBLK_BPF_IO_REDIRECT) + break; + + if (unlikely(rc != UBLK_BPF_IO_CONTINUE)) { + printk_ratelimited(KERN_ERR "%s: unknown rc code %d\n", + __func__, rc); + err = -EINVAL; + goto fail; + } + + bytes = ublk_bpf_get_return_bytes(ret); + if (unlikely((bytes & 511) || !bytes)) { + err = -EREMOTEIO; + goto fail; + } else if (unlikely(bytes > total - done)) { + err = -ENOSPC; + goto fail; + } else { + done += bytes; + } + } while (done < total); + + /* + * If any bytes are queued, we can't forward to userspace + * immediately because it is too complicated to support two side + * completion. + * + * But the request will be updated and retried after the queued + * part is completed, then it can be forwarded to userspace too. 
+ */ + res = done > 0; + if (!res) { + /* will redirect to userspace, so forget bpf handling */ + __clear_bit(UBLK_BPF_IO_PREP, &bpf_io->flags); + refcount_dec(&bpf_io->ref); + } + goto exit; +fail: + res = true; + ublk_bpf_complete_io_cmd(bpf_io, err); +exit: + ublk_bpf_io_dec_ref(bpf_io); + return res; +} + +static ublk_bpf_return_t ublk_bpf_run_io_task(struct ublk_bpf_io *io, + unsigned int offset) +{ + return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0); +} + +static ublk_bpf_return_t ublk_bpf_queue_io_cmd(struct ublk_bpf_io *io, + unsigned int offset) +{ + return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0); +} + +static void ublk_bpf_release_io_cmd(struct ublk_bpf_io *io) +{ +} + +static struct ublk_bpf_ops __bpf_ublk_bpf_ops = { + .queue_io_cmd = ublk_bpf_queue_io_cmd, + .queue_io_cmd_daemon = ublk_bpf_run_io_task, + .release_io_cmd = ublk_bpf_release_io_cmd, +}; + +static struct bpf_struct_ops bpf_ublk_bpf_ops = { + .verifier_ops = &ublk_bpf_verifier_ops, + .init = ublk_bpf_ops_init, + .check_member = ublk_bpf_ops_check_member, + .init_member = ublk_bpf_ops_init_member, + .reg = ublk_bpf_reg, + .unreg = ublk_bpf_unreg, + .name = "ublk_bpf_ops", + .cfi_stubs = &__bpf_ublk_bpf_ops, + .owner = THIS_MODULE, +}; + +int __init ublk_bpf_struct_ops_init(void) +{ + int err; + + err = register_bpf_struct_ops(&bpf_ublk_bpf_ops, ublk_bpf_ops); + if (err) + pr_warn("error while registering ublk bpf struct ops: %d", err); + + return 0; +} diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c index aefb414ebf6c..29d3e7f656a7 100644 --- a/drivers/block/ublk/main.c +++ b/drivers/block/ublk/main.c @@ -43,6 +43,7 @@ #include #include "ublk.h" +#include "bpf.h" static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq); @@ -1061,6 +1062,10 @@ static inline void __ublk_rq_task_work(struct request *req, mapped_bytes >> 9; } + if (ublk_support_bpf(ubq) && ublk_run_bpf_prog(ubq, req, + ublk_get_bpf_io_cb_daemon(ubq), true)) + return; + ublk_init_req_ref(ubq, req); ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags); } @@ -1088,6 +1093,10 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq) { struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + if (ublk_support_bpf(ubq) && ublk_run_bpf_prog(ubq, rq, + ublk_get_bpf_io_cb(ubq), false)) + return; + if (llist_add(&data->node, &ubq->io_cmds)) { struct ublk_io *io = &ubq->ios[rq->tag]; @@ -1265,8 +1274,24 @@ static void ublk_commit_completion(struct ublk_device *ub, if (req_op(req) == REQ_OP_ZONE_APPEND) req->__sector = ub_cmd->zone_append_lba; - if (likely(!blk_should_fake_timeout(req->q))) - ublk_put_req_ref(ubq, req); + if (likely(!blk_should_fake_timeout(req->q))) { + /* + * userspace may have setup everything, but still let bpf + * prog to handle io by returning -EAGAIN, this way provides + * single bpf io handle fast path, and should simplify things + * a lot. 
+ */ + if (ublk_support_bpf(ubq) && io->res == -EAGAIN) { + if(!ublk_run_bpf_prog(ubq, req, + ublk_get_bpf_any_io_cb(ubq), true)) { + /* give up now */ + io->res = -EIO; + ublk_put_req_ref(ubq, req); + } + } else { + ublk_put_req_ref(ubq, req); + } + } } /* diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index 76aee4225c78..e9ceadbc616d 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -33,10 +33,26 @@ (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \ UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED) +enum { + UBLK_BPF_IO_PREP = 0, + UBLK_BPF_IO_COMPLETED = 1, +}; + +struct ublk_bpf_io { + const struct ublksrv_io_desc *iod; + unsigned long flags; + refcount_t ref; + int res; +}; + struct ublk_rq_data { struct llist_node node; struct kref ref; + +#ifdef CONFIG_UBLK_BPF + struct ublk_bpf_io bpf_data; +#endif }; struct ublk_uring_cmd_pdu { @@ -104,6 +120,10 @@ struct ublk_queue { struct llist_head io_cmds; +#ifdef CONFIG_UBLK_BPF + struct ublk_bpf_ops *bpf_ops; +#endif + unsigned short force_abort:1; unsigned short timeout:1; unsigned short canceling:1; @@ -161,8 +181,21 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq, &(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]); } +static inline bool ublk_support_bpf(const struct ublk_queue *ubq) +{ + return false; +} + struct ublk_device *ublk_get_device_from_id(int idx); void ublk_put_device(struct ublk_device *ub); void __ublk_complete_rq(struct request *req); +static inline void __ublk_complete_rq_with_res(struct request *req, int res) +{ + struct ublk_queue *ubq = req->mq_hctx->driver_data; + struct ublk_io *io = &ubq->ios[req->tag]; + + io->res = res; + __ublk_complete_rq(req); +} #endif From patchwork Tue Jan 7 12:04:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928779 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBB941E04A1; Tue, 7 Jan 2025 12:08:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251718; cv=none; b=T5t3dKPycd6lVhmaLSud68I7oL7KymXWZtlWNn9MdA2XbbySN+1ZIoOTS4xTAWBQ6/6syp3+A0BeKrA2Nr36Y9RWHoosx2LRs3x27MTiVfVR7QQ4/NEYOZq5Lc0UUQ06OwBo8wNrg55Qk1AdZe02o7tOOiw77/ghAnAOdnnW8+k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251718; c=relaxed/simple; bh=cnPLy76INlaLrwkt7MAjvY9YxgcT87cLRYGRQL14RRU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=enbgR3iCOcqTxctk/9FkAwp9bseRe0+FrPe+cvCxBd0XvuxjWxyM2xJ1ADE5jP9v9WiliTa+P9Lxd3IpoyXW4Yb1JBK9bIztoVX9HeOtuZthB5i4xjBVX/IUP1TfMzoP/msIxC0ebayQXQchXPBeBjnZ1+NKHWpAXYVNQQkc8Ks= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=I76fVNy0; arc=none smtp.client-ip=209.85.216.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
header.b="I76fVNy0" Received: by mail-pj1-f43.google.com with SMTP id 98e67ed59e1d1-2f13acbe29bso20001542a91.1; Tue, 07 Jan 2025 04:08:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251703; x=1736856503; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QaTnWdIWLR1sE2ww6RDndLTTgfPGGW5SN20fYrYrxaE=; b=I76fVNy06dBCVeQJFL9okoRmLTo4686eUyc1PxNKBHf+/0vyFjEVpI+mREu48y/fBD YpTbFhxHHQR/L9oMftpdnLCJkYoS0vV+SYfZmsZuOjH0tCCcuyhXonuclxYhCxnBu/m+ /HSTFZeeS0s4IitningAfkQCh8MjC5vlvXAmcWtkkDERZoTT0hTCVU1AiiQUoGKXQu2h 8kqOOutWmipuQpNZXUKpETLnWhHI+mlArOs2/GvgHOm/FGjULAfdjeuVA4Lw7W4D6tkw oKKwvQpVS2VmEzxAQcnrhQIfgPTS+DfLjyamMi3vqadQBrSeTTfpAB+XI1delcv6J2Lt wIkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251703; x=1736856503; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QaTnWdIWLR1sE2ww6RDndLTTgfPGGW5SN20fYrYrxaE=; b=Jyc0/uPu0MVO6OGh12lE/SDlzE12Topqk9vrbkUkm6tscbbOpK4gZeTWFaHu+/glMN oIeoJcSqvH0QPNhLDMA4nvYRXZ7+PK2ebREsq32O6L7ZE206LtifYhlGCaX9S4UCYnuS K5BlmzXurkQckSd8ADsYtTGY2Df00l9ZCCvnhprttA4iWx7xBgIjfu+TH7/e7vddlfrM AeWY643Ne5zeXjdQGjaxDZXPaF9s1UVIsu2DvgvvZe2W/4HifX/hTiCu9qqsHr+NnKlv tqsU5CSsgcy99IcpJ9U+WXhg1WNbMQKktTgTsJGGHUtZbeeAxeH3UHjtiOShVEkSdjLW Me9Q== X-Forwarded-Encrypted: i=1; AJvYcCVvQstFNLk3CaWzn2pbUmV+IFqiPY1K9gI9dyu0vUkkLk3H6kPKG+4qxSdNQsPlJkk16oe3lHo3JdynBA==@vger.kernel.org X-Gm-Message-State: AOJu0YxmuOQMaRsKOFDyQ04V6awAOl5/SHaaXHcLBUNfGr6CzGy5zXpo HiKA5oc9klaEEgfyJXLEfCEUZXK1lHfuEouiAf7cFhmYvzw1Y0ZYXVMLZiiFcz4= X-Gm-Gg: ASbGncuLbAD6MvbqUu+l5Hi7NOxiwZc31QVASADClYSMLbeqUBq7Lzq0rX4P7wgL6FT 3nwZE75+Lx8AzGoEte9k7qLtJt8jeLcogK+zricjjY/SOEWdywWIUff+dTpu+R4+c3u4cOmmgif VftennpDPC7l2FYrjst4RODI93xHeruCF+PYTuIaLsuYlqu9Eo6POowobqXItOYrSX6Mf8URDLo Cqgbx22Cs48jLJRs5ioxRAUioOPQRjV5QLpzmz3I4R5VrqwVDZQh1ItCpK8fIZNt89t X-Google-Smtp-Source: AGHT+IE/IbyVPD6/Pj2zLNT+fW1Fsx1kN0ii1GFHjth+jWZh0cZqXDm/340Ft45v7O8hlWH1wn4XIg== X-Received: by 2002:a05:6a00:1c92:b0:725:4915:c0f with SMTP id d2e1a72fcca58-72d103dc480mr4429129b3a.11.1736251702842; Tue, 07 Jan 2025 04:08:22 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:22 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 09/22] ublk: bpf: attach bpf prog to ublk device Date: Tue, 7 Jan 2025 20:04:00 +0800 Message-ID: <20250107120417.1237392-10-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Attach bpf program to ublk device before adding ublk disk, and detach it after the disk is removed. ublk device needs to provide the struct_ops ID for attaching the specific prog, and each ublk device has to attach to only single bpf prog. So that we can use the attached bpf prog for handling ublk IO command. 
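As a sketch (not part of this patch) of how the earlier bpf_reg.h consumer/provider helpers are meant to be wired up on the device side: the device fills in the struct_ops id and its consumer ops before attaching. example_attach_cb() and example_detach_cb() are hypothetical callbacks standing in for the real ones added below:

static const struct bpf_prog_consumer_ops example_consumer_ops = {
        .attach_fn      = example_attach_cb,    /* hypothetical */
        .detach_fn      = example_detach_cb,    /* hypothetical */
};

static int example_attach(struct ublk_device *ub, unsigned int ops_id)
{
        /* the struct_ops id is provided by the ublk device */
        ub->prog.prog_id = ops_id;
        ub->prog.ops = &example_consumer_ops;

        /* fails unless a struct_ops map with this id has been registered */
        return ublk_bpf_prog_attach(&ub->prog);
}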
Meantime add two ublk bpf callbacks for prog to attach & detach ublk device. Signed-off-by: Ming Lei --- drivers/block/ublk/Makefile | 2 +- drivers/block/ublk/bpf.c | 99 ++++++++++++++++++++++++++++++++++++ drivers/block/ublk/bpf.h | 33 ++++++++++++ drivers/block/ublk/bpf_ops.c | 34 +++++++++++++ drivers/block/ublk/main.c | 25 ++++++--- drivers/block/ublk/ublk.h | 16 ++++++ 6 files changed, 200 insertions(+), 9 deletions(-) create mode 100644 drivers/block/ublk/bpf.c diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile index 7058b0fc13bf..f843a9005cdb 100644 --- a/drivers/block/ublk/Makefile +++ b/drivers/block/ublk/Makefile @@ -5,6 +5,6 @@ ccflags-y += -I$(src) ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o ifeq ($(CONFIG_UBLK_BPF), y) -ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o +ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o endif obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c new file mode 100644 index 000000000000..479045a5f0d9 --- /dev/null +++ b/drivers/block/ublk/bpf.c @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Red Hat */ + +#include "ublk.h" +#include "bpf.h" + +static int ublk_set_bpf_ops(struct ublk_device *ub, + struct ublk_bpf_ops *ops) +{ + int i; + + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { + if (ops && ublk_get_queue(ub, i)->bpf_ops) { + ublk_set_bpf_ops(ub, NULL); + return -EBUSY; + } + ublk_get_queue(ub, i)->bpf_ops = ops; + } + return 0; +} + +static int ublk_bpf_prog_attach_cb(struct bpf_prog_consumer *consumer, + struct bpf_prog_provider *provider) +{ + struct ublk_device *ub = container_of(consumer, struct ublk_device, + prog); + struct ublk_bpf_ops *ops = container_of(provider, + struct ublk_bpf_ops, provider); + int ret; + + if (!ublk_get_device(ub)) + return -ENODEV; + + ret = ublk_set_bpf_ops(ub, ops); + if (ret) + goto fail_put_dev; + + if (ops->attach_dev) { + ret = ops->attach_dev(ub->dev_info.dev_id); + if (ret) + goto fail_reset_ops; + } + return 0; + +fail_reset_ops: + ublk_set_bpf_ops(ub, NULL); +fail_put_dev: + ublk_put_device(ub); + return ret; +} + +static void ublk_bpf_prog_detach_cb(struct bpf_prog_consumer *consumer, + bool unreg) +{ + struct ublk_device *ub = container_of(consumer, struct ublk_device, + prog); + struct ublk_bpf_ops *ops = container_of(consumer->provider, + struct ublk_bpf_ops, provider); + + if (unreg) { + blk_mq_freeze_queue(ub->ub_disk->queue); + ublk_set_bpf_ops(ub, NULL); + blk_mq_unfreeze_queue(ub->ub_disk->queue); + } else { + ublk_set_bpf_ops(ub, NULL); + } + if (ops->detach_dev) + ops->detach_dev(ub->dev_info.dev_id); + ublk_put_device(ub); +} + +static const struct bpf_prog_consumer_ops ublk_prog_consumer_ops = { + .attach_fn = ublk_bpf_prog_attach_cb, + .detach_fn = ublk_bpf_prog_detach_cb, +}; + +int ublk_bpf_attach(struct ublk_device *ub) +{ + if (!ublk_dev_support_bpf(ub)) + return 0; + + /* todo: ublk device need to provide struct_ops prog id */ + ub->prog.prog_id = 0; + ub->prog.ops = &ublk_prog_consumer_ops; + + return ublk_bpf_prog_attach(&ub->prog); +} + +void ublk_bpf_detach(struct ublk_device *ub) +{ + if (!ublk_dev_support_bpf(ub)) + return; + ublk_bpf_prog_detach(&ub->prog); +} + +int __init ublk_bpf_init(void) +{ + return ublk_bpf_struct_ops_init(); +} diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h index e3505c9ab86a..4e178cbecb74 100644 --- a/drivers/block/ublk/bpf.h +++ b/drivers/block/ublk/bpf.h @@ -7,6 +7,8 @@ typedef unsigned long ublk_bpf_return_t; typedef 
ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int); typedef void (*release_io_cmd_t)(struct ublk_bpf_io *io); +typedef int (*attach_dev_t)(int dev_id); +typedef void (*detach_dev_t)(int dev_id); #ifdef CONFIG_UBLK_BPF #include @@ -47,6 +49,12 @@ struct ublk_bpf_ops { /* called when the io command reference drops to zero, can't be sleepable */ release_io_cmd_t release_io_cmd; + /* called when attaching bpf prog to this ublk dev */ + attach_dev_t attach_dev; + + /* called when detaching bpf prog from this ublk dev */ + detach_dev_t detach_dev; + /* private: don't show in doc, must be the last field */ struct bpf_prog_provider provider; }; @@ -149,7 +157,12 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) return ublk_get_bpf_io_cb_daemon(ubq); } +int ublk_bpf_init(void); int ublk_bpf_struct_ops_init(void); +int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer); +void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer); +int ublk_bpf_attach(struct ublk_device *ub); +void ublk_bpf_detach(struct ublk_device *ub); #else @@ -176,9 +189,29 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) return NULL; } +static inline int ublk_bpf_init(void) +{ + return 0; +} + static inline int ublk_bpf_struct_ops_init(void) { return 0; } + +static inline int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer) +{ + return 0; +} +static inline void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer) +{ +} +static inline int ublk_bpf_attach(struct ublk_device *ub) +{ + return 0; +} +static inline void ublk_bpf_detach(struct ublk_device *ub) +{ +} #endif #endif diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c index 6ac2aebd477e..05d8d415b30d 100644 --- a/drivers/block/ublk/bpf_ops.c +++ b/drivers/block/ublk/bpf_ops.c @@ -133,6 +133,29 @@ static void ublk_bpf_unreg(void *kdata, struct bpf_link *link) kfree(curr); } +int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer) +{ + unsigned id = consumer->prog_id; + struct ublk_bpf_ops *ops; + int ret = -EINVAL; + + mutex_lock(&ublk_bpf_ops_lock); + ops = xa_load(&ublk_ops, id); + if (ops && ops->id == id) + ret = bpf_prog_consumer_attach(consumer, &ops->provider); + mutex_unlock(&ublk_bpf_ops_lock); + + return ret; +} + +void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer) +{ + mutex_lock(&ublk_bpf_ops_lock); + bpf_prog_consumer_detach(consumer, false); + mutex_unlock(&ublk_bpf_ops_lock); +} + + static void ublk_bpf_prep_io(struct ublk_bpf_io *io, const struct ublksrv_io_desc *iod) { @@ -231,10 +254,21 @@ static void ublk_bpf_release_io_cmd(struct ublk_bpf_io *io) { } +static int ublk_bpf_attach_dev(int dev_id) +{ + return 0; +} + +static void ublk_bpf_detach_dev(int dev_id) +{ +} + static struct ublk_bpf_ops __bpf_ublk_bpf_ops = { .queue_io_cmd = ublk_bpf_queue_io_cmd, .queue_io_cmd_daemon = ublk_bpf_run_io_task, .release_io_cmd = ublk_bpf_release_io_cmd, + .attach_dev = ublk_bpf_attach_dev, + .detach_dev = ublk_bpf_detach_dev, }; static struct bpf_struct_ops bpf_ublk_bpf_ops = { diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c index 29d3e7f656a7..0b136bc5247f 100644 --- a/drivers/block/ublk/main.c +++ b/drivers/block/ublk/main.c @@ -486,7 +486,7 @@ static inline bool ublk_need_get_data(const struct ublk_queue *ubq) } /* Called in slow path only, keep it noinline for trace purpose */ -static noinline struct ublk_device *ublk_get_device(struct ublk_device *ub) +struct ublk_device *ublk_get_device(struct ublk_device 
*ub) { if (kobject_get_unless_zero(&ub->cdev_dev.kobj)) return ub; @@ -499,12 +499,6 @@ void ublk_put_device(struct ublk_device *ub) put_device(&ub->cdev_dev); } -static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev, - int qid) -{ - return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]); -} - static inline bool ublk_rq_has_data(const struct request *rq) { return bio_has_data(rq->bio); @@ -1492,6 +1486,8 @@ static struct gendisk *ublk_detach_disk(struct ublk_device *ub) { struct gendisk *disk; + ublk_bpf_detach(ub); + /* Sync with ublk_abort_queue() by holding the lock */ spin_lock(&ub->lock); disk = ub->ub_disk; @@ -2206,12 +2202,19 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd) goto out_put_cdev; } - ret = add_disk(disk); + ret = ublk_bpf_attach(ub); if (ret) goto out_put_cdev; + ret = add_disk(disk); + if (ret) + goto out_put_bpf; + set_bit(UB_STATE_USED, &ub->state); +out_put_bpf: + if (ret) + ublk_bpf_detach(ub); out_put_cdev: if (ret) { ublk_detach_disk(ub); @@ -2967,8 +2970,14 @@ static int __init ublk_init(void) if (ret) goto free_chrdev_region; + ret = ublk_bpf_init(); + if (ret) + goto unregister_class; + return 0; +unregister_class: + class_unregister(&ublk_chr_class); free_chrdev_region: unregister_chrdev_region(ublk_chr_devt, UBLK_MINORS); unregister_mis: diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index e9ceadbc616d..7579b0032a3c 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -7,6 +7,8 @@ #include #include +#include "bpf_reg.h" + #define UBLK_MINORS (1U << MINORBITS) /* private ioctl command mirror */ @@ -153,6 +155,9 @@ struct ublk_device { unsigned long state; int ub_number; +#ifdef CONFIG_UBLK_BPF + struct bpf_prog_consumer prog; +#endif struct mutex mutex; spinlock_t lock; @@ -173,6 +178,11 @@ struct ublk_params_header { __u32 types; }; +static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev, + int qid) +{ + return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]); +} static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq, int tag) @@ -186,6 +196,12 @@ static inline bool ublk_support_bpf(const struct ublk_queue *ubq) return false; } +static inline bool ublk_dev_support_bpf(const struct ublk_device *ub) +{ + return false; +} + +struct ublk_device *ublk_get_device(struct ublk_device *ub); struct ublk_device *ublk_get_device_from_id(int idx); void ublk_put_device(struct ublk_device *ub); void __ublk_complete_rq(struct request *req); From patchwork Tue Jan 7 12:04:01 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928781 Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 332271F03F0; Tue, 7 Jan 2025 12:08:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251730; cv=none; b=N8GM/SO+/3VwY8MSrmytBN2lmml4yRG6noQHA5pepbS72Rcdec58AtdAydHgNMhgJKzYxxILsr9ECUkyCpu+TzQINU62zS3Q6Ece7ERmBeHey+H3xQDB/oenajNnJNOPi++1voYLxxIy39/1CXYNGX0/C3WtG0gSet3DpLv4b6M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251730; c=relaxed/simple; bh=dFowweZmqoHTDjnE3Q4Oi/G8Ma5JlTL4czltXfRmVTw=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=saKiN9x6+WCIRTTmCj4qF4dmzUZM0gMe3UkKjBsSLuREVBg3GEy3Gs8Vy18mRLHtpp0KE/ui+SdvV1Dv77eoyFWXSF4/Cbgu0gZ5Wuw3PsEZLnTU4oIW2a8gmArPyh6TumZo09Mlr3XrwKXHSDUD77douAaMpVMSMjUkbdbwm2Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ImuqVbzE; arc=none smtp.client-ip=209.85.214.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ImuqVbzE" Received: by mail-pl1-f176.google.com with SMTP id d9443c01a7336-2163dc5155fso221725395ad.0; Tue, 07 Jan 2025 04:08:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251706; x=1736856506; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=UkIoWa3+df1IoXEzOIp55BOynbHpat50uMkhx/SzFZk=; b=ImuqVbzENgAB66lGmTvEe6iakhlljRBtxuYYiXAywWDDP/CghfVY04lYeUpQAW3jJM +damsxvkiZIAO8qg8NMBnf1I+pB8uoCOIXYdVc8CRL28QnlthP6rZ4xlbU1vHC904yFw tCafDms+qFXiJlPHbjBvzy5iXDTKyUaEwcu3KsbqoRDyx1p1TQl/jN6sCDob9Tg8DdsW j09+Gu02oyd7b6IslZwGAOybT7WdadMZfUWyPG4xPNHYaXcMhunBp4GY+bJCegCqTMHL usjQ8X5/B095ugVy/l8duznbmHAamJ31udCiI9lk95Ua+NzpYrFC9SJiL2k453j1sjYu w+1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251706; x=1736856506; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UkIoWa3+df1IoXEzOIp55BOynbHpat50uMkhx/SzFZk=; b=fNannKtnRTA93DPLvxbVgROOPAKAr8CkDfTxGzI0JJzYPHj45NpmdAFhauZFGuTFTg ejzcfq4FTRbhmXbZemqZ+FJeAesMkK/WjKLvyUyevD0+Tu9HxjUE0q/Trt7izWIpGwdf q0di5bqvhJPFtVd3eT4ST9nSjw2VdCWCW20jh+Bh8cLi3eunZtcijYq/4erP9vuHaYcd KfYYXkMhISwgwjxBUQSiPZKLhYEmnbsDJp0wLdM9ZT4lefPDUMXMuAU1i++SIWT+anef mpPe2in32rcM3mtGGaHST0jFdgEe3er0S/1Do7xhwX7GQjYQ7ii2dgD5zcm93ZeaHjYn og+g== X-Forwarded-Encrypted: i=1; AJvYcCVZpCrglCwqVCunySQhsDqdBO7H1TxE76GKckJ5jeU+sj5ATCrNQf7+YxakKhSmgQReqAr8VkpDQQIHoA==@vger.kernel.org X-Gm-Message-State: AOJu0Yy85xyE15dub8UyV5jYF97j3jePOd6lba4Qkr9kwyjIJ5p3Pv0c Q1TphJzKdRztkj5yDc74dYJ9cSoESMOkhe4xakP9mUYtWPUcijqo X-Gm-Gg: ASbGncurz5Z3SHREiMwpMMhi55evzls0iTuLxu2OqMABI60YSDUNF4GOmg5SWqY37Vy uHZ7HLfrbDDX7WP+NPxdrc6coGAwdopxmmglowp6UZdoPPH+Ir5BcUhAnFLwFXnhq/MuLbmB3f3 Nfpui+myRROh7seHI1DxECcygTtwBDq3WQtgpZpfTBDbUJaeHpGLS5Uv3SueUXvwR8EdiZPWzfZ rLhp1eVsD0AohR3s01aMSiNHdJhDz+JYzcBhXzKDtuTMdpGjSVeG3xkTarnlkxpN1zk X-Google-Smtp-Source: AGHT+IHfwMTDN49X3zErgeAJH9i8tWN9t9kmUkMAdL68ZKIQBO5Mc6BAe/sERHnvYxZpHRxGXc83AA== X-Received: by 2002:a05:6a00:6c89:b0:728:e52b:1cc9 with SMTP id d2e1a72fcca58-72abdeaad08mr87614041b3a.18.1736251706027; Tue, 07 Jan 2025 04:08:26 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:25 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong 
Song , Ming Lei Subject: [RFC PATCH 10/22] ublk: bpf: add kfunc for ublk bpf prog Date: Tue, 7 Jan 2025 20:04:01 +0800 Message-ID: <20250107120417.1237392-11-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Define some kfunc for ublk bpf prog for handling ublk IO command in application code. Signed-off-by: Ming Lei --- drivers/block/ublk/bpf.c | 78 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c index 479045a5f0d9..4179b7f61e92 100644 --- a/drivers/block/ublk/bpf.c +++ b/drivers/block/ublk/bpf.c @@ -93,7 +93,85 @@ void ublk_bpf_detach(struct ublk_device *ub) ublk_bpf_prog_detach(&ub->prog); } + +__bpf_kfunc_start_defs(); +__bpf_kfunc const struct ublksrv_io_desc * +ublk_bpf_get_iod(const struct ublk_bpf_io *io) +{ + if (io) + return io->iod; + return NULL; +} + +__bpf_kfunc unsigned int +ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) +{ + if (io) { + const struct request *req = ublk_bpf_get_req(io); + + return req->tag; + } + return -1; +} + +__bpf_kfunc unsigned int +ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) +{ + if (io) { + const struct request *req = ublk_bpf_get_req(io); + + if (req->mq_hctx) { + const struct ublk_queue *ubq = req->mq_hctx->driver_data; + + return ubq->q_id; + } + } + return -1; +} + +__bpf_kfunc unsigned int +ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) +{ + if (io) { + const struct request *req = ublk_bpf_get_req(io); + + if (req->mq_hctx) { + const struct ublk_queue *ubq = req->mq_hctx->driver_data; + + return ubq->dev->dev_info.dev_id; + } + } + return -1; +} + +__bpf_kfunc void +ublk_bpf_complete_io(struct ublk_bpf_io *io, int res) +{ + ublk_bpf_complete_io_cmd(io, res); +} + +BTF_KFUNCS_START(ublk_bpf_kfunc_ids) +BTF_ID_FLAGS(func, ublk_bpf_complete_io, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL) +BTF_ID_FLAGS(func, ublk_bpf_get_io_tag, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, ublk_bpf_get_queue_id, KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, ublk_bpf_get_dev_id, KF_TRUSTED_ARGS) +BTF_KFUNCS_END(ublk_bpf_kfunc_ids) + +static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = { + .owner = THIS_MODULE, + .set = &ublk_bpf_kfunc_ids, +}; + int __init ublk_bpf_init(void) { + int err; + + err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, + &ublk_bpf_kfunc_set); + if (err) { + pr_warn("error while setting UBLK BPF tracing kfuncs: %d", err); + return err; + } return ublk_bpf_struct_ops_init(); } From patchwork Tue Jan 7 12:04:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928782 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9F9F91DFE0A; Tue, 7 Jan 2025 12:08:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251735; cv=none; 
b=cS9L2LQbJcCcnf7MERy2Ik5J1UBMdcu5komXdzFqIZrgjfnFCHhQypClJ18VwGcoAX16lfpp4+yVf487ucmsJXl84tBwT8zCtQvd0zJH4r+NJ/mS3nPAXkWUgTwsaXUPe5qbk3o3mpOl9wxUcP0lG982pZWIuZc80mWEcpe/bVM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251735; c=relaxed/simple; bh=eHznBwop2mDpz2vdgcSlQkIZPviDdCxuRAT4iPGpYio=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IOIbznOK9ukyo/mu5IRG2dOYqWCDJp7FQXnUrLjGbLjTJI2XrG3UUBG1a5qfkR4OVehzDyIIkiNmhqnP3LE13HI8JmEDW2toEpquAbJgZgXyXsC6tGv0OPjKDOm8nJ1m5Zw19EecI8AVkdLD5tGoUb80JADQH4FQ8+ih+yYUq4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=hsPB2/1O; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="hsPB2/1O" Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-21628b3fe7dso218297525ad.3; Tue, 07 Jan 2025 04:08:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251709; x=1736856509; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cuVs/Dhtg/rqdT/P+jUJ6cUXUeXkBAdWAoqGqwWwJHc=; b=hsPB2/1OGz1kaBhQx1Zr9wcaSOKqCYA1pN/a7wx3oTUB/vDPZlZnd+YON7Yo/gKNj3 ZdkysY5ZjKi0EsqTlnDKmFs/Gwd81Sy1h6ybzLDuf4pxfLIuc6nc6/mptL7fIapZNCxn hGGEP2txmyCRmONeUbwpD78JQGo/o5NIsspHcXkKrdHKAstR/gp/fdRd7RF6BuR0nB2K 1x68L0gotUn5oGadVm8/0S4P6t7PzgbSI0nVv26E6VMySsLwzzOC5l6aSW8HsYhZ6j+T GXrGqo02NxvOhaIpiUTRCUo7HcPpj3l6YPES5TlZ0G8kjoaHBFAbtUkjZTwtkbK63UdQ K9PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251709; x=1736856509; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cuVs/Dhtg/rqdT/P+jUJ6cUXUeXkBAdWAoqGqwWwJHc=; b=gjlO6bQC+Y5z1iKV0ZGhgkWBywPBomUw/3ehy9Id89vMHWisvv7VRQZRZKvncvKPoN ubvQR8b0ZothuItjhPGA1sURvpmVLtBN/vgYB/ehBRvfiXU71j6O3797YnWwR6mZZMll gRvTKbOImOAZClFfwK7eaRd6MhJ0z4nqOVzGgWm9KxR74Jc04/SoIl+Pt9LV2jUkAvH5 umlcDj9vivlK5c4l1kmaY22Ht0DfXQkEg5BGBzjmWA4/XuKjeh3lCPvXqluf8gR8PIsN 0J4y1EnNpWrCtZ7Nt2OudpWEcLQxrdRVH6HfLthNNNcuszoUZa7djKRXpWXIYPdi+iks zorA== X-Forwarded-Encrypted: i=1; AJvYcCXWtLoSuzMp/xMNBAC5C9tFGv/2tvPFqVr5jCmyfydlr0wVseH6Y65w9QA92xcroQHXI+89G9+SsM2O/w==@vger.kernel.org X-Gm-Message-State: AOJu0YyryX9SZSqcYoe7RszDhuBeAACPe3qe9/dZyJR9dgiVBhEaWrOD 6u6Mzo2zPOCQAE7uiyYLm42deA6Texx/Xx7sa6I1REXe6EJgY1sL X-Gm-Gg: ASbGncvUiItIfhjCbK62bed13+WhJ5ToaUIC1sIXVtMIMNXpSar0fx/EcoaIfza/LCL z3StCdkNbL4Q7yI/rQQ2DkXvwFUt86qAdlCN9ZLlkNt7OtN9cONyXTmGdVQCr1uNdGu+01R9my6 pc9DDoeKRezsSL2qXXpBhr3Abr9Erev2LVafU6baUFWykl5NwkBGv/BABIhO/hQZ7B1vbdNquyi CLS3YOxCOKL+fHUdes9BF81QIXTrhsP8sfaQDnGmiCIT5kox+1ZP6wVCtzEXnOnuP92 X-Google-Smtp-Source: AGHT+IG8JRG66VyxKo7OHp6oFe7EHUQ/km+xwQfaSQllDMuBjxQ3wkKk6QpPMBY9BsTb6wzQs7g4rg== X-Received: by 2002:a05:6a21:3991:b0:1db:ed8a:a607 with SMTP id adf61e73a8af0-1e5e047b457mr100622648637.11.1736251709154; Tue, 07 Jan 2025 04:08:29 -0800 (PST) Received: from fedora.redhat.com 
([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:28 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 11/22] ublk: bpf: enable ublk-bpf Date: Tue, 7 Jan 2025 20:04:02 +0800 Message-ID: <20250107120417.1237392-12-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Add feature flag of UBLK_F_BPF, meantime pass bpf struct_ops prog id via ublk parameter from userspace. ublk-bpf needs to copy data between ublk request pages and userspace buffer any more, so let ublk_need_map_io() return false for UBLK_F_BPF too. Signed-off-by: Ming Lei --- drivers/block/ublk/bpf.c | 3 +-- drivers/block/ublk/main.c | 15 ++++++++++++++- drivers/block/ublk/ublk.h | 10 ++++++---- include/uapi/linux/ublk_cmd.h | 14 +++++++++++++- 4 files changed, 34 insertions(+), 8 deletions(-) diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c index 4179b7f61e92..ef1546a7ccda 100644 --- a/drivers/block/ublk/bpf.c +++ b/drivers/block/ublk/bpf.c @@ -79,8 +79,7 @@ int ublk_bpf_attach(struct ublk_device *ub) if (!ublk_dev_support_bpf(ub)) return 0; - /* todo: ublk device need to provide struct_ops prog id */ - ub->prog.prog_id = 0; + ub->prog.prog_id = ub->params.bpf.ops_id; ub->prog.ops = &ublk_prog_consumer_ops; return ublk_bpf_prog_attach(&ub->prog); diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c index 0b136bc5247f..3c2ed9bf924d 100644 --- a/drivers/block/ublk/main.c +++ b/drivers/block/ublk/main.c @@ -416,6 +416,19 @@ static int ublk_validate_params(const struct ublk_device *ub) else if (ublk_dev_is_zoned(ub)) return -EINVAL; + if (ub->params.types & UBLK_PARAM_TYPE_BPF) { + const struct ublk_param_bpf *p = &ub->params.bpf; + + if (!ublk_dev_support_bpf(ub)) + return -EINVAL; + + if (!(p->flags & UBLK_BPF_HAS_OPS_ID)) + return -EINVAL; + } else { + if (ublk_dev_support_bpf(ub)) + return -EINVAL; + } + return 0; } @@ -434,7 +447,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq) static inline bool ublk_need_map_io(const struct ublk_queue *ubq) { - return !ublk_support_user_copy(ubq); + return !(ublk_support_user_copy(ubq) || ublk_support_bpf(ubq)); } static inline bool ublk_need_req_ref(const struct ublk_queue *ubq) diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index 7579b0032a3c..8343e70bd723 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -24,7 +24,8 @@ | UBLK_F_CMD_IOCTL_ENCODE \ | UBLK_F_USER_COPY \ | UBLK_F_ZONED \ - | UBLK_F_USER_RECOVERY_FAIL_IO) + | UBLK_F_USER_RECOVERY_FAIL_IO \ + | UBLK_F_BPF) #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \ | UBLK_F_USER_RECOVERY_REISSUE \ @@ -33,7 +34,8 @@ /* All UBLK_PARAM_TYPE_* should be included here */ #define UBLK_PARAM_TYPE_ALL \ (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \ - UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED) + UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED | \ + UBLK_PARAM_TYPE_BPF) enum { UBLK_BPF_IO_PREP = 0, @@ -193,12 +195,12 @@ static inline struct ublksrv_io_desc 
*ublk_get_iod(struct ublk_queue *ubq, static inline bool ublk_support_bpf(const struct ublk_queue *ubq) { - return false; + return ubq->flags & UBLK_F_BPF; } static inline bool ublk_dev_support_bpf(const struct ublk_device *ub) { - return false; + return ub->dev_info.flags & UBLK_F_BPF; } struct ublk_device *ublk_get_device(struct ublk_device *ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index a8bc98bb69fc..27cf14e65cbc 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -207,6 +207,9 @@ */ #define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9) +/* ublk IO is handled by bpf prog */ +#define UBLK_F_BPF (1ULL << 10) + /* device state */ #define UBLK_S_DEV_DEAD 0 #define UBLK_S_DEV_LIVE 1 @@ -401,6 +404,13 @@ struct ublk_param_zoned { __u8 reserved[20]; }; +struct ublk_param_bpf { +#define UBLK_BPF_HAS_OPS_ID (1 << 0) + __u8 flags; + __u8 ops_id; + __u8 reserved[6]; +}; + struct ublk_params { /* * Total length of parameters, userspace has to set 'len' for both @@ -413,12 +423,14 @@ struct ublk_params { #define UBLK_PARAM_TYPE_DISCARD (1 << 1) #define UBLK_PARAM_TYPE_DEVT (1 << 2) #define UBLK_PARAM_TYPE_ZONED (1 << 3) +#define UBLK_PARAM_TYPE_BPF (1 << 4) __u32 types; /* types of parameter included */ struct ublk_param_basic basic; struct ublk_param_discard discard; struct ublk_param_devt devt; - struct ublk_param_zoned zoned; + struct ublk_param_zoned zoned; + struct ublk_param_bpf bpf; }; #endif From patchwork Tue Jan 7 12:04:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928802 Received: from mail-pl1-f174.google.com (mail-pl1-f174.google.com [209.85.214.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3D1E1F0E59; Tue, 7 Jan 2025 12:08:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251747; cv=none; b=XaaoKtJGdyWJYMBJVphzAfJTyn1XvDWyxaqK7lCUi+qrPDre3bKazGpGeZcWDysP17C0KrRJJCR3uy4DklrBogfWSCU6Z06/cO7XRY3J9UTpsYPwwkcEmJctfph628lyJLXa6agwzp6PrdieZQQOpi8sIi74nUGRWsURkylDZ0k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251747; c=relaxed/simple; bh=osXqyXtH97CQoBSxxuv95CbBgzFwd3GgTgbhpFmavPM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HKzBQpk9jpMbnxDgzd+Gt3wegAbVnvC4DyHgV1XKd/q40AFlV2rE3gms0lLRzuqDXqxkPdE2zI8rRCBrWwW1N3OzTBPPWaMor7amRn53ZKTjtrrWke4J0qm2inFDXpYD9K7NpT/33SVG6rlZmYfJ7VEG8oKIrjBkZULjP5j3iI8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=XblYy8Aq; arc=none smtp.client-ip=209.85.214.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="XblYy8Aq" Received: by mail-pl1-f174.google.com with SMTP id d9443c01a7336-21628b3fe7dso218298485ad.3; Tue, 07 Jan 2025 04:08:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251713; x=1736856513; 
darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=i4nYY5xKQLCD4XShiwpYWdqCAPwF8iOC/Na9Gt3GaZA=; b=XblYy8Aq3v2E0lH4WL4mO2OMCIFXNpQR9dtxM/RQ9zB3KpIoGjmDLfUPRapWeU7d7A YE2YtNBuGPOy6KF1+2xb/hUPzno6UqLTg7XPIDkyXWu6Wt8wPIWcvSxy2WavX0CkGEar E5E3YTEivsz6qhegKbEfJ4wZSmcvM6Y7V4podKuSo/09YMeUjYMa65G4af44xXLTJGtQ pAIZE0qW5u6Kwhnpu7MD/MT+/GLfuiWu8Mm4n8djUUpdPku/ffVaXMv1lPeTUQrwClsI JqiDKRvdmj5k7triPWnB8tui9NOFk40ZPCS27p33H8YsMOK5Qw+BDaoX9125BizR655q 94uQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251713; x=1736856513; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=i4nYY5xKQLCD4XShiwpYWdqCAPwF8iOC/Na9Gt3GaZA=; b=eX9wBCtZCoPt0AShLGs/MAQuaBnMR91T+rzcaCstkgNeESvQtlrZ6W6RYo9pl1o9AJ 4s8WwjHhDb6O6SjTnSCOTz4xQYczHa39qqCXDdptsNo73jHPK7S0OIYeNlYZHLBuR0E1 lNOdyIQj7+azPoEMa6cKPyURFjsegf/oDJxlax9z1vQsh+QG1c/pGXIVvB1UOTPE7spK uksU7FxbEjgka/XEhe2YlfHDy1K29WBs0VQPFf3hSPtKUExqZtVOsM83YDb6gBSTf9ir y26b1Y/pV5sMhZsEujkBiXAjEvDFA77Q++mhcvMYaxI6aWZGzFW/wrsne/MVS0hmtVV3 NpuA== X-Forwarded-Encrypted: i=1; AJvYcCWiR5u80Ie7NsgYdNVtt7DkcK04G2loVthwsbxxawqrZoTdyXnm+8XcX02uz6/daAzib8uxuR7FuwF3Yw==@vger.kernel.org X-Gm-Message-State: AOJu0YzbZeD8+cGn2JW1sZwfgymNC+An86KzGVHi5c3qmsUcm5rjraWP M9g1TGL/Gl5KArcfIvLfTt1cXQkrfNn3fnfdVXegoX21kaF5WyXe X-Gm-Gg: ASbGncs8vU4/oNJ9ySKQUi2qXE6OvkXgeYmTqz2OeTfrg3jcrJEces4K6SJJZX+vz15 8t1bSkNGBG7yMqr5Lp8ndZyj99I3bJHfen1VBoqEmCYXUBuQJ3HDrVVehn8rEipYw36DAbKfsh2 7OZZccx7rp9Jr3QFUIhQGhIZkzGDwJIL6ANJVVRLIXjDHfxIE2son0mU9JOLKTFTSl0aqybUhi0 sGX0zGLG9kyTrQYaByQ4NE9q+cdpEKateZe3Jx5w/wzaS2G4HtIuNDWxhbu4GWsZI+5 X-Google-Smtp-Source: AGHT+IHZlURjqM3y/sGFhkSGNyXbWo2j4kS0GhxooVVRsPDjTzPI7x/tvW7VJh703YmygTU8PIWRZw== X-Received: by 2002:a05:6a20:431d:b0:1e1:bf3d:a190 with SMTP id adf61e73a8af0-1e5e080c83fmr92989330637.30.1736251712681; Tue, 07 Jan 2025 04:08:32 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:32 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 12/22] selftests: ublk: add tests for the ublk-bpf initial implementation Date: Tue, 7 Jan 2025 20:04:03 +0800 Message-ID: <20250107120417.1237392-13-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Create one ublk null target over ublk-bpf, in which every block IO is handled by the `ublk_null` bpf prog. And the whole ublk implementation requires liburing. Meantime add basic read/write IO test over this ublk null disk, and make sure basic IO function works as expected. 
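For illustration, the way the test tool ties a ublk device to a registered bpf prog is roughly the following: the id passed via `--bpf_prog` is handed back to the kernel through the BPF device parameter added in the previous patch, and it has to match the `.id` declared in the pinned struct_ops map. A minimal sketch (not part of the patch; the helper name is made up, the field names are the ones from ublk_cmd.h, and the value is what the null test uses):

	/* sketch only: how --bpf_prog <id> reaches the kernel */
	#include <linux/ublk_cmd.h>

	static void fill_bpf_params(struct ublk_params *p, int ops_id)
	{
		p->types     |= UBLK_PARAM_TYPE_BPF;
		p->bpf.flags  = UBLK_BPF_HAS_OPS_ID;
		/* must match the struct_ops map .id, e.g. 0 for null_ublk_bpf_ops */
		p->bpf.ops_id = ops_id;
	}

The real code doing this is ublk_null_tgt_init() in ublk_bpf.c below.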
ublk/Makefile is stolen from tools/testing/selftests/hid/Makefile Signed-off-by: Ming Lei --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/ublk/.gitignore | 4 + tools/testing/selftests/ublk/Makefile | 228 +++ tools/testing/selftests/ublk/config | 2 + tools/testing/selftests/ublk/progs/ublk_bpf.h | 13 + .../selftests/ublk/progs/ublk_bpf_kfunc.h | 23 + .../testing/selftests/ublk/progs/ublk_null.c | 63 + tools/testing/selftests/ublk/test_common.sh | 72 + tools/testing/selftests/ublk/test_null_01.sh | 19 + tools/testing/selftests/ublk/test_null_02.sh | 23 + tools/testing/selftests/ublk/ublk_bpf.c | 1429 +++++++++++++++++ 12 files changed, 1878 insertions(+) create mode 100644 tools/testing/selftests/ublk/.gitignore create mode 100644 tools/testing/selftests/ublk/Makefile create mode 100644 tools/testing/selftests/ublk/config create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf.h create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h create mode 100644 tools/testing/selftests/ublk/progs/ublk_null.c create mode 100755 tools/testing/selftests/ublk/test_common.sh create mode 100755 tools/testing/selftests/ublk/test_null_01.sh create mode 100755 tools/testing/selftests/ublk/test_null_02.sh create mode 100644 tools/testing/selftests/ublk/ublk_bpf.c diff --git a/MAINTAINERS b/MAINTAINERS index 890f6195d03f..8ff8773377c4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23984,6 +23984,7 @@ S: Maintained F: Documentation/block/ublk.rst F: drivers/block/ublk/ F: include/uapi/linux/ublk_cmd.h +F: tools/testing/selftests/ublk/ UBSAN M: Kees Cook diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 2401e973c359..1c20256e662b 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -111,6 +111,7 @@ endif TARGETS += tmpfs TARGETS += tpm2 TARGETS += tty +TARGETS += ublk TARGETS += uevent TARGETS += user_events TARGETS += vDSO diff --git a/tools/testing/selftests/ublk/.gitignore b/tools/testing/selftests/ublk/.gitignore new file mode 100644 index 000000000000..865dca93cf75 --- /dev/null +++ b/tools/testing/selftests/ublk/.gitignore @@ -0,0 +1,4 @@ +ublk_bpf +*.skel.h +/tools +*-verify.state diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile new file mode 100644 index 000000000000..a95f317211e7 --- /dev/null +++ b/tools/testing/selftests/ublk/Makefile @@ -0,0 +1,228 @@ +# SPDX-License-Identifier: GPL-2.0 + +# based on tools/testing/selftest/bpf/Makefile +include ../../../build/Build.include +include ../../../scripts/Makefile.arch +include ../../../scripts/Makefile.include + +CXX ?= $(CROSS_COMPILE)g++ + +HOSTPKG_CONFIG := pkg-config + +CFLAGS += -g -O0 -rdynamic -Wall -Werror -I$(OUTPUT) +CFLAGS += -I$(OUTPUT)/tools/include + +LDLIBS += -lelf -lz -lrt -lpthread -luring + +# Silence some warnings when compiled with clang +ifneq ($(LLVM),) +CFLAGS += -Wno-unused-command-line-argument +endif + +TEST_PROGS := test_null_01.sh +TEST_PROGS += test_null_02.sh + +# Order correspond to 'make run_tests' order +TEST_GEN_PROGS_EXTENDED = ublk_bpf + +# Emit succinct information message describing current building step +# $1 - generic step name (e.g., CC, LINK, etc); +# $2 - optional "flavor" specifier; if provided, will be emitted as [flavor]; +# $3 - target (assumed to be file); only file name will be emitted; +# $4 - optional extra arg, emitted as-is, if provided. 
+ifeq ($(V),1) +Q = +msg = +else +Q = @ +msg = @printf ' %-8s%s %s%s\n' "$(1)" "$(if $(2), [$(2)])" "$(notdir $(3))" "$(if $(4), $(4))"; +MAKEFLAGS += --no-print-directory +submake_extras := feature_display=0 +endif + +# override lib.mk's default rules +OVERRIDE_TARGETS := 1 +override define CLEAN + $(call msg,CLEAN) + $(Q)$(RM) -r $(TEST_GEN_PROGS) + $(Q)$(RM) -r $(EXTRA_CLEAN) +endef + +include ../lib.mk + +TOOLSDIR := $(top_srcdir)/tools +LIBDIR := $(TOOLSDIR)/lib +BPFDIR := $(LIBDIR)/bpf +TOOLSINCDIR := $(TOOLSDIR)/include +BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool +SCRATCH_DIR := $(OUTPUT)/tools +BUILD_DIR := $(SCRATCH_DIR)/build +INCLUDE_DIR := $(SCRATCH_DIR)/include +BPFOBJ := $(BUILD_DIR)/libbpf/libbpf.a +ifneq ($(CROSS_COMPILE),) +HOST_BUILD_DIR := $(BUILD_DIR)/host +HOST_SCRATCH_DIR := $(OUTPUT)/host-tools +HOST_INCLUDE_DIR := $(HOST_SCRATCH_DIR)/include +else +HOST_BUILD_DIR := $(BUILD_DIR) +HOST_SCRATCH_DIR := $(SCRATCH_DIR) +HOST_INCLUDE_DIR := $(INCLUDE_DIR) +endif +HOST_BPFOBJ := $(HOST_BUILD_DIR)/libbpf/libbpf.a +RESOLVE_BTFIDS := $(HOST_BUILD_DIR)/resolve_btfids/resolve_btfids + +VMLINUX_BTF_PATHS ?= /sys/kernel/btf/ublk_drv +VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS)))) +ifeq ($(VMLINUX_BTF),) +$(error Cannot find a vmlinux for VMLINUX_BTF at any of "$(VMLINUX_BTF_PATHS)") +endif + +# Define simple and short `make test_progs`, `make test_sysctl`, etc targets +# to build individual tests. +# NOTE: Semicolon at the end is critical to override lib.mk's default static +# rule for binaries. +$(notdir $(TEST_GEN_PROGS)): %: $(OUTPUT)/% ; + +# sort removes libbpf duplicates when not cross-building +MAKE_DIRS := $(sort $(BUILD_DIR)/libbpf $(HOST_BUILD_DIR)/libbpf \ + $(HOST_BUILD_DIR)/bpftool $(HOST_BUILD_DIR)/resolve_btfids \ + $(INCLUDE_DIR)) +$(MAKE_DIRS): + $(call msg,MKDIR,,$@) + $(Q)mkdir -p $@ + +# LLVM's ld.lld doesn't support all the architectures, so use it only on x86 +ifeq ($(SRCARCH),x86) +LLD := lld +else +LLD := ld +endif + +DEFAULT_BPFTOOL := $(HOST_SCRATCH_DIR)/sbin/bpftool + +TEST_GEN_PROGS_EXTENDED += $(DEFAULT_BPFTOOL) + +$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(BPFOBJ) + +BPFTOOL ?= $(DEFAULT_BPFTOOL) +$(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \ + $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/bpftool + $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \ + ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD) \ + EXTRA_CFLAGS='-g -O0' \ + OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \ + LIBBPF_OUTPUT=$(HOST_BUILD_DIR)/libbpf/ \ + LIBBPF_DESTDIR=$(HOST_SCRATCH_DIR)/ \ + prefix= DESTDIR=$(HOST_SCRATCH_DIR)/ install-bin + +$(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \ + | $(BUILD_DIR)/libbpf + $(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(BUILD_DIR)/libbpf/ \ + EXTRA_CFLAGS='-g -O0' \ + DESTDIR=$(SCRATCH_DIR) prefix= all install_headers + +ifneq ($(BPFOBJ),$(HOST_BPFOBJ)) +$(HOST_BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \ + | $(HOST_BUILD_DIR)/libbpf + $(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) \ + EXTRA_CFLAGS='-g -O0' ARCH= CROSS_COMPILE= \ + OUTPUT=$(HOST_BUILD_DIR)/libbpf/ CC=$(HOSTCC) LD=$(HOSTLD) \ + DESTDIR=$(HOST_SCRATCH_DIR)/ prefix= all install_headers +endif + +$(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR) +ifeq ($(VMLINUX_H),) + $(call msg,GEN,,$@) + $(Q)$(BPFTOOL) btf dump file $(VMLINUX_BTF) format c > $@ +else + $(call msg,CP,,$@) + $(Q)cp "$(VMLINUX_H)" $@ +endif + +$(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids \ + 
$(TOOLSDIR)/bpf/resolve_btfids/main.c \ + $(TOOLSDIR)/lib/rbtree.c \ + $(TOOLSDIR)/lib/zalloc.c \ + $(TOOLSDIR)/lib/string.c \ + $(TOOLSDIR)/lib/ctype.c \ + $(TOOLSDIR)/lib/str_error_r.c + $(Q)$(MAKE) $(submake_extras) -C $(TOOLSDIR)/bpf/resolve_btfids \ + CC=$(HOSTCC) LD=$(HOSTLD) AR=$(HOSTAR) \ + LIBBPF_INCLUDE=$(HOST_INCLUDE_DIR) \ + OUTPUT=$(HOST_BUILD_DIR)/resolve_btfids/ BPFOBJ=$(HOST_BPFOBJ) + +# Get Clang's default includes on this system, as opposed to those seen by +# '--target=bpf'. This fixes "missing" files on some architectures/distros, +# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc. +# +# Use '-idirafter': Don't interfere with include mechanics except where the +# build would have failed anyways. +define get_sys_includes +$(shell $(1) -v -E - &1 \ + | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \ +$(shell $(1) -dM -E - $@ + +$(OUTPUT)/%.o: %.c $(BPF_SKELS) + $(call msg,CC,,$@) + $(Q)$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@ + +$(OUTPUT)/%: $(OUTPUT)/%.o + $(call msg,BINARY,,$@) + $(Q)$(LINK.c) $^ $(LDLIBS) -o $@ + +EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) feature bpftool \ + $(addprefix $(OUTPUT)/,*.o *.skel.h no_alu32) diff --git a/tools/testing/selftests/ublk/config b/tools/testing/selftests/ublk/config new file mode 100644 index 000000000000..295b1f5c6c6c --- /dev/null +++ b/tools/testing/selftests/ublk/config @@ -0,0 +1,2 @@ +CONFIG_BLK_DEV_UBLK=m +CONFIG_UBLK_BPF=y diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf.h b/tools/testing/selftests/ublk/progs/ublk_bpf.h new file mode 100644 index 000000000000..a302a645b096 --- /dev/null +++ b/tools/testing/selftests/ublk/progs/ublk_bpf.h @@ -0,0 +1,13 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef UBLK_BPF_GEN_H +#define UBLK_BPF_GEN_H + +#include "ublk_bpf_kfunc.h" + +#ifdef DEBUG +#define BPF_DBG(...) bpf_printk(__VA_ARGS__) +#else +#define BPF_DBG(...) 
+#endif + +#endif diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h new file mode 100644 index 000000000000..acab490d933c --- /dev/null +++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef UBLK_BPF_INTERNAL_H +#define UBLK_BPF_INTERNAL_H + +#ifndef BITS_PER_LONG +#define BITS_PER_LONG (sizeof(unsigned long) * 8) +#endif + +#define UBLK_BPF_DISPOSITION_BITS (4) +#define UBLK_BPF_DISPOSITION_SHIFT (BITS_PER_LONG - UBLK_BPF_DISPOSITION_BITS) + +static inline ublk_bpf_return_t ublk_bpf_return_val(enum ublk_bpf_disposition rc, + unsigned int bytes) +{ + return (ublk_bpf_return_t) ((unsigned long)rc << UBLK_BPF_DISPOSITION_SHIFT) | bytes; +} + +extern const struct ublksrv_io_desc *ublk_bpf_get_iod(const struct ublk_bpf_io *io) __ksym; +extern void ublk_bpf_complete_io(const struct ublk_bpf_io *io, int res) __ksym; +extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym; +extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym; +extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym; +#endif diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c new file mode 100644 index 000000000000..3225b52dcd24 --- /dev/null +++ b/tools/testing/selftests/ublk/progs/ublk_null.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include +#include +#include + +//#define DEBUG +#include "ublk_bpf.h" + +/* libbpf v1.4.5 is required for struct_ops to work */ + +static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off) +{ + unsigned long off = -1, sects = -1; + const struct ublksrv_io_desc *iod; + int res; + + iod = ublk_bpf_get_iod(io); + if (iod) { + res = iod->nr_sectors << 9; + off = iod->start_sector; + sects = iod->nr_sectors; + } else + res = -EINVAL; + + BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d", + ublk_bpf_get_dev_id(io), + ublk_bpf_get_queue_id(io), + ublk_bpf_get_io_tag(io), + off, sects, res); + ublk_bpf_complete_io(io, res); + + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); +} + +SEC("struct_ops/ublk_bpf_queue_io_cmd") +ublk_bpf_return_t BPF_PROG(ublk_null_handle_io, struct ublk_bpf_io *io, unsigned int off) +{ + return __ublk_null_handle_io(io, off); +} + +SEC("struct_ops/ublk_bpf_attach_dev") +int BPF_PROG(ublk_null_attach_dev, int dev_id) +{ + return 0; +} + +SEC("struct_ops/ublk_bpf_detach_dev") +void BPF_PROG(ublk_null_detach_dev, int dev_id) +{ +} + +SEC(".struct_ops.link") +struct ublk_bpf_ops null_ublk_bpf_ops = { + .id = 0, + .queue_io_cmd = (void *)ublk_null_handle_io, + .attach_dev = (void *)ublk_null_attach_dev, + .detach_dev = (void *)ublk_null_detach_dev, +}; + +char LICENSE[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh new file mode 100755 index 000000000000..466b82e77860 --- /dev/null +++ b/tools/testing/selftests/ublk/test_common.sh @@ -0,0 +1,72 @@ +#!/bin/bash + +_check_root() { + local ksft_skip=4 + + if [ $UID != 0 ]; then + echo please run this as root >&2 + exit $ksft_skip + fi +} + +_remove_ublk_devices() { + ${UBLK_PROG} del -a +} + +_get_ublk_dev_state() { + ${UBLK_PROG} list -n "$1" | grep "state" | awk '{print $11}' +} + +_get_ublk_daemon_pid() { + ${UBLK_PROG} list -n "$1" | grep "pid" | awk '{print $7}' +} + +_prep_test() { + _check_root + export 
UBLK_PROG=$(pwd)/ublk_bpf + _remove_ublk_devices +} + +_prep_bpf_test() { + _prep_test + _reg_bpf_prog $@ +} + +_show_result() +{ + if [ $2 -ne 0 ]; then + echo "$1 : [FAIL]" + else + echo "$1 : [PASS]" + fi +} + +_cleanup_test() { + _remove_ublk_devices +} + +_cleanup_bpf_test() { + _cleanup_test + _unreg_bpf_prog $@ +} + +_reg_bpf_prog() { + ${UBLK_PROG} reg -t $1 $2 + if [ $? -ne 0 ]; then + echo "fail to register bpf prog $1 $2" + exit -1 + fi +} + +_unreg_bpf_prog() { + ${UBLK_PROG} unreg -t $1 +} + +_add_ublk_dev() { + ${UBLK_PROG} add $@ + if [ $? -ne 0 ]; then + echo "fail to add ublk dev $@" + exit -1 + fi + udevadm settle +} diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh new file mode 100755 index 000000000000..eecb4278e894 --- /dev/null +++ b/tools/testing/selftests/ublk/test_null_01.sh @@ -0,0 +1,19 @@ +#!/bin/bash + +. test_common.sh + +TID="null_01" +ERR_CODE=0 + +_prep_test + +# add single ublk null disk without bpf prog +_add_ublk_dev -t null -n 0 --quiet + +# run fio over the two disks +fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1 +ERR_CODE=$? + +_cleanup_test + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh new file mode 100755 index 000000000000..eb0da89f3461 --- /dev/null +++ b/tools/testing/selftests/ublk/test_null_02.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +. test_common.sh + +TID="null_02" +ERR_CODE=0 + +# prepare & register and pin bpf prog +_prep_bpf_test "null" ublk_null.bpf.o + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t null -n 0 --bpf_prog 0 --quiet +_add_ublk_dev -t null -n 1 --bpf_prog 0 --quiet + +# run fio over the two disks +fio --name=job1 --filename=/dev/ublkb0 --rw=readwrite --size=256M \ + --name=job2 --filename=/dev/ublkb1 --rw=readwrite --size=256M > /dev/null 2>&1 +ERR_CODE=$? + +# cleanup & unregister and unpin the bpf prog +_cleanup_bpf_test "null" + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c new file mode 100644 index 000000000000..2d923e42845d --- /dev/null +++ b/tools/testing/selftests/ublk/ublk_bpf.c @@ -0,0 +1,1429 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Description: uring_cmd based ublk + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#define __maybe_unused __attribute__((unused)) +#define MAX_BACK_FILES 4 +#ifndef min +#define min(a, b) ((a) < (b) ? 
(a) : (b)) +#endif +#define UBLK_BPF_PIN_PATH "ublk" + +/****************** part 1: libublk ********************/ + +#define CTRL_DEV "/dev/ublk-control" +#define UBLKC_DEV "/dev/ublkc" +#define UBLKB_DEV "/dev/ublkb" +#define UBLK_CTRL_RING_DEPTH 32 + +/* queue idle timeout */ +#define UBLKSRV_IO_IDLE_SECS 20 + +#define UBLK_IO_MAX_BYTES 65536 +#define UBLK_MAX_QUEUES 4 +#define UBLK_QUEUE_DEPTH 128 + +#define UBLK_DBG_DEV (1U << 0) +#define UBLK_DBG_QUEUE (1U << 1) +#define UBLK_DBG_IO_CMD (1U << 2) +#define UBLK_DBG_IO (1U << 3) +#define UBLK_DBG_CTRL_CMD (1U << 4) +#define UBLK_LOG (1U << 5) + +struct ublk_dev; +struct ublk_queue; + +struct dev_ctx { + char tgt_type[16]; + unsigned long flags; + unsigned nr_hw_queues; + unsigned queue_depth; + int dev_id; + int nr_files; + char *files[MAX_BACK_FILES]; + int bpf_prog_id; + unsigned int logging:1; + unsigned int all:1; +}; + +struct ublk_ctrl_cmd_data { + __u32 cmd_op; +#define CTRL_CMD_HAS_DATA 1 +#define CTRL_CMD_HAS_BUF 2 + __u32 flags; + + __u64 data[2]; + __u64 addr; + __u32 len; +}; + +struct ublk_io { + char *buf_addr; + +#define UBLKSRV_NEED_FETCH_RQ (1UL << 0) +#define UBLKSRV_NEED_COMMIT_RQ_COMP (1UL << 1) +#define UBLKSRV_IO_FREE (1UL << 2) + unsigned short flags; + unsigned short refs; /* used by target code only */ + + int result; +}; + +struct ublk_tgt_ops { + const char *name; + int (*init_tgt)(struct ublk_dev *); + void (*deinit_tgt)(struct ublk_dev *); + + int (*queue_io)(struct ublk_queue *, int tag); + void (*tgt_io_done)(struct ublk_queue *, + int tag, const struct io_uring_cqe *); +}; + +struct ublk_tgt { + unsigned long dev_size; + unsigned int sq_depth; + unsigned int cq_depth; + const struct ublk_tgt_ops *ops; + struct ublk_params params; + char backing_file[1024 - 8 - sizeof(struct ublk_params)]; +}; + +struct ublk_queue { + int q_id; + int q_depth; + unsigned int cmd_inflight; + unsigned int io_inflight; + struct ublk_dev *dev; + const struct ublk_tgt_ops *tgt_ops; + char *io_cmd_buf; + struct io_uring ring; + struct ublk_io ios[UBLK_QUEUE_DEPTH]; +#define UBLKSRV_QUEUE_STOPPING (1U << 0) +#define UBLKSRV_QUEUE_IDLE (1U << 1) +#define UBLKSRV_NO_BUF (1U << 2) + unsigned state; + pid_t tid; + pthread_t thread; +}; + +struct ublk_dev { + struct ublk_tgt tgt; + struct ublksrv_ctrl_dev_info dev_info; + struct ublk_queue q[UBLK_MAX_QUEUES]; + + int fds[2]; /* fds[0] points to /dev/ublkcN */ + int nr_fds; + int ctrl_fd; + struct io_uring ring; + + int bpf_prog_id; +}; + +#ifndef offsetof +#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) +#endif + +#ifndef container_of +#define container_of(ptr, type, member) ({ \ + unsigned long __mptr = (unsigned long)(ptr); \ + ((type *)(__mptr - offsetof(type, member))); }) +#endif + +#define round_up(val, rnd) \ + (((val) + ((rnd) - 1)) & ~((rnd) - 1)) + +static unsigned int ublk_dbg_mask = UBLK_LOG; + +static const struct ublk_tgt_ops *ublk_find_tgt(const char *name); + +static inline int is_target_io(__u64 user_data) +{ + return (user_data & (1ULL << 63)) != 0; +} + +static inline __u64 build_user_data(unsigned tag, unsigned op, + unsigned tgt_data, unsigned is_target_io) +{ + assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16)); + + return tag | (op << 16) | (tgt_data << 24) | (__u64)is_target_io << 63; +} + +static inline unsigned int user_data_to_tag(__u64 user_data) +{ + return user_data & 0xffff; +} + +static inline unsigned int user_data_to_op(__u64 user_data) +{ + return (user_data >> 16) & 0xff; +} + +static void ublk_err(const char *fmt, ...) 
+{ + va_list ap; + + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); +} + +static void ublk_log(const char *fmt, ...) +{ + if (ublk_dbg_mask & UBLK_LOG) { + va_list ap; + + va_start(ap, fmt); + vfprintf(stdout, fmt, ap); + } +} + +static void ublk_dbg(int level, const char *fmt, ...) +{ + if (level & ublk_dbg_mask) { + va_list ap; + va_start(ap, fmt); + vfprintf(stdout, fmt, ap); + } +} + +static inline void *ublk_get_sqe_cmd(const struct io_uring_sqe *sqe) +{ + return (void *)&sqe->cmd; +} + +static inline void ublk_mark_io_done(struct ublk_io *io, int res) +{ + io->flags |= (UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_IO_FREE); + io->result = res; +} + +static inline const struct ublksrv_io_desc *ublk_get_iod( + const struct ublk_queue *q, int tag) +{ + return (struct ublksrv_io_desc *) + &(q->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]); +} + +static inline void ublk_set_sqe_cmd_op(struct io_uring_sqe *sqe, + __u32 cmd_op) +{ + __u32 *addr = (__u32 *)&sqe->off; + + addr[0] = cmd_op; + addr[1] = 0; +} + +static inline int ublk_setup_ring(struct io_uring *r, int depth, + int cq_depth, unsigned flags) +{ + struct io_uring_params p; + + memset(&p, 0, sizeof(p)); + p.flags = flags | IORING_SETUP_CQSIZE; + p.cq_entries = cq_depth; + + return io_uring_queue_init_params(depth, r, &p); +} + +static void ublk_ctrl_init_cmd(struct ublk_dev *dev, + struct io_uring_sqe *sqe, + struct ublk_ctrl_cmd_data *data) +{ + struct ublksrv_ctrl_dev_info *info = &dev->dev_info; + struct ublksrv_ctrl_cmd *cmd = (struct ublksrv_ctrl_cmd *)ublk_get_sqe_cmd(sqe); + + sqe->fd = dev->ctrl_fd; + sqe->opcode = IORING_OP_URING_CMD; + sqe->ioprio = 0; + + if (data->flags & CTRL_CMD_HAS_BUF) { + cmd->addr = data->addr; + cmd->len = data->len; + } + + if (data->flags & CTRL_CMD_HAS_DATA) + cmd->data[0] = data->data[0]; + + cmd->dev_id = info->dev_id; + cmd->queue_id = -1; + + ublk_set_sqe_cmd_op(sqe, data->cmd_op); + + io_uring_sqe_set_data(sqe, cmd); +} + +static int __ublk_ctrl_cmd(struct ublk_dev *dev, + struct ublk_ctrl_cmd_data *data) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + int ret = -EINVAL; + + sqe = io_uring_get_sqe(&dev->ring); + if (!sqe) { + ublk_err("%s: can't get sqe ret %d\n", __func__, ret); + return ret; + } + + ublk_ctrl_init_cmd(dev, sqe, data); + + ret = io_uring_submit(&dev->ring); + if (ret < 0) { + ublk_err("uring submit ret %d\n", ret); + return ret; + } + + ret = io_uring_wait_cqe(&dev->ring, &cqe); + if (ret < 0) { + ublk_err("wait cqe: %s\n", strerror(-ret)); + return ret; + } + io_uring_cqe_seen(&dev->ring, cqe); + + return cqe->res; +} + +static int ublk_ctrl_stop_dev(struct ublk_dev *dev) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_CMD_STOP_DEV, + }; + + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_start_dev(struct ublk_dev *dev, + int daemon_pid) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_START_DEV, + .flags = CTRL_CMD_HAS_DATA, + }; + + dev->dev_info.ublksrv_pid = data.data[0] = daemon_pid; + + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_add_dev(struct ublk_dev *dev) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_ADD_DEV, + .flags = CTRL_CMD_HAS_BUF, + .addr = (__u64) (uintptr_t) &dev->dev_info, + .len = sizeof(struct ublksrv_ctrl_dev_info), + }; + + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_del_dev(struct ublk_dev *dev) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_DEL_DEV, + .flags = 0, + }; + + return __ublk_ctrl_cmd(dev, &data); +} + +static int 
ublk_ctrl_get_info(struct ublk_dev *dev) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_GET_DEV_INFO, + .flags = CTRL_CMD_HAS_BUF, + .addr = (__u64) (uintptr_t) &dev->dev_info, + .len = sizeof(struct ublksrv_ctrl_dev_info), + }; + + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_set_params(struct ublk_dev *dev, + struct ublk_params *params) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_SET_PARAMS, + .flags = CTRL_CMD_HAS_BUF, + .addr = (__u64) (uintptr_t) params, + .len = sizeof(*params), + }; + params->len = sizeof(*params); + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_get_params(struct ublk_dev *dev, + struct ublk_params *params) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_CMD_GET_PARAMS, + .flags = CTRL_CMD_HAS_BUF, + .addr = (__u64)params, + .len = sizeof(*params), + }; + + params->len = sizeof(*params); + + return __ublk_ctrl_cmd(dev, &data); +} + +static int ublk_ctrl_get_features(struct ublk_dev *dev, + __u64 *features) +{ + struct ublk_ctrl_cmd_data data = { + .cmd_op = UBLK_U_CMD_GET_FEATURES, + .flags = CTRL_CMD_HAS_BUF, + .addr = (__u64) (uintptr_t) features, + .len = sizeof(*features), + }; + + return __ublk_ctrl_cmd(dev, &data); +} + +static const char *ublk_dev_state_desc(struct ublk_dev *dev) +{ + switch (dev->dev_info.state) { + case UBLK_S_DEV_DEAD: + return "DEAD"; + case UBLK_S_DEV_LIVE: + return "LIVE"; + case UBLK_S_DEV_QUIESCED: + return "QUIESCED"; + default: + return "UNKNOWN"; + }; +} + +static void ublk_ctrl_dump(struct ublk_dev *dev, bool show_queue) +{ + struct ublksrv_ctrl_dev_info *info = &dev->dev_info; + int ret; + struct ublk_params p; + + ret = ublk_ctrl_get_params(dev, &p); + if (ret < 0) { + ublk_err("failed to get params %m\n"); + return; + } + + ublk_log("dev id %d: nr_hw_queues %d queue_depth %d block size %d dev_capacity %lld\n", + info->dev_id, + info->nr_hw_queues, info->queue_depth, + 1 << p.basic.logical_bs_shift, p.basic.dev_sectors); + ublk_log("\tmax rq size %d daemon pid %d flags 0x%llx state %s\n", + info->max_io_buf_bytes, + info->ublksrv_pid, info->flags, + ublk_dev_state_desc(dev)); + if (show_queue) { + int i; + + for (i = 0; i < dev->dev_info.nr_hw_queues; i++) + ublk_log("\tqueue 0 tid: %d\n", dev->q[i].tid); + } + fflush(stdout); +} + +static void ublk_ctrl_deinit(struct ublk_dev *dev) +{ + close(dev->ctrl_fd); + free(dev); +} + +static struct ublk_dev *ublk_ctrl_init(void) +{ + struct ublk_dev *dev = (struct ublk_dev *)calloc(1, sizeof(*dev)); + struct ublksrv_ctrl_dev_info *info = &dev->dev_info; + int ret; + + dev->ctrl_fd = open(CTRL_DEV, O_RDWR); + if (dev->ctrl_fd < 0) { + free(dev); + return NULL; + } + + info->max_io_buf_bytes = UBLK_IO_MAX_BYTES; + + ret = ublk_setup_ring(&dev->ring, UBLK_CTRL_RING_DEPTH, + UBLK_CTRL_RING_DEPTH, IORING_SETUP_SQE128); + if (ret < 0) { + ublk_err("queue_init: %s\n", strerror(-ret)); + free(dev); + return NULL; + } + dev->nr_fds = 1; + + return dev; +} + +static int __ublk_queue_cmd_buf_sz(unsigned depth) +{ + int size = depth * sizeof(struct ublksrv_io_desc); + unsigned int page_sz = getpagesize(); + + return round_up(size, page_sz); +} + +static int ublk_queue_max_cmd_buf_sz(void) +{ + return __ublk_queue_cmd_buf_sz(UBLK_MAX_QUEUE_DEPTH); +} + +static int ublk_queue_cmd_buf_sz(struct ublk_queue *q) +{ + return __ublk_queue_cmd_buf_sz(q->q_depth); +} + +static void ublk_queue_deinit(struct ublk_queue *q) +{ + int i; + int nr_ios = q->q_depth; + + io_uring_unregister_ring_fd(&q->ring); + + if (q->ring.ring_fd > 0) 
{ + io_uring_unregister_files(&q->ring); + close(q->ring.ring_fd); + q->ring.ring_fd = -1; + } + + if (q->io_cmd_buf) + munmap(q->io_cmd_buf, ublk_queue_cmd_buf_sz(q)); + + for (i = 0; i < nr_ios; i++) + free(q->ios[i].buf_addr); +} + +static int ublk_queue_init(struct ublk_queue *q) +{ + struct ublk_dev *dev = q->dev; + int depth = dev->dev_info.queue_depth; + int i, ret = -1; + int cmd_buf_size, io_buf_size; + unsigned long off; + int ring_depth = dev->tgt.sq_depth, cq_depth = dev->tgt.cq_depth; + + q->tgt_ops = dev->tgt.ops; + q->state = 0; + q->q_depth = depth; + q->cmd_inflight = 0; + q->tid = gettid(); + if (dev->dev_info.flags & UBLK_F_BPF) + q->state |= UBLKSRV_NO_BUF; + + cmd_buf_size = ublk_queue_cmd_buf_sz(q); + off = UBLKSRV_CMD_BUF_OFFSET + q->q_id * ublk_queue_max_cmd_buf_sz(); + q->io_cmd_buf = (char *)mmap(0, cmd_buf_size, PROT_READ, + MAP_SHARED | MAP_POPULATE, dev->fds[0], off); + if (q->io_cmd_buf == MAP_FAILED) { + ublk_err("ublk dev %d queue %d map io_cmd_buf failed %m\n", + q->dev->dev_info.dev_id, q->q_id); + goto fail; + } + + io_buf_size = dev->dev_info.max_io_buf_bytes; + for (i = 0; i < q->q_depth; i++) { + q->ios[i].buf_addr = NULL; + q->ios[i].flags = UBLKSRV_NEED_FETCH_RQ | UBLKSRV_IO_FREE; + + if (q->state & UBLKSRV_NO_BUF) + continue; + + if (posix_memalign((void **)&q->ios[i].buf_addr, + getpagesize(), io_buf_size)) { + ublk_err("ublk dev %d queue %d io %d posix_memalign failed %m\n", + dev->dev_info.dev_id, q->q_id, i); + goto fail; + } + } + + ret = ublk_setup_ring(&q->ring, ring_depth, cq_depth, + IORING_SETUP_COOP_TASKRUN); + if (ret < 0) { + ublk_err("ublk dev %d queue %d setup io_uring failed %d\n", + q->dev->dev_info.dev_id, q->q_id, ret); + goto fail; + } + + io_uring_register_ring_fd(&q->ring); + + ret = io_uring_register_files(&q->ring, dev->fds, dev->nr_fds); + if (ret) { + ublk_err("ublk dev %d queue %d register files failed %d\n", + q->dev->dev_info.dev_id, q->q_id, ret); + goto fail; + } + + return 0; + fail: + ublk_queue_deinit(q); + ublk_err("ublk dev %d queue %d failed\n", + dev->dev_info.dev_id, q->q_id); + return -ENOMEM; +} + +static int ublk_dev_prep(struct ublk_dev *dev) +{ + int dev_id = dev->dev_info.dev_id; + char buf[64]; + int ret = 0; + + snprintf(buf, 64, "%s%d", UBLKC_DEV, dev_id); + dev->fds[0] = open(buf, O_RDWR); + if (dev->fds[0] < 0) { + ret = -EBADF; + ublk_err("can't open %s, ret %d\n", buf, dev->fds[0]); + goto fail; + } + + if (dev->tgt.ops->init_tgt) + ret = dev->tgt.ops->init_tgt(dev); + + return ret; +fail: + close(dev->fds[0]); + return ret; +} + +static void ublk_dev_unprep(struct ublk_dev *dev) +{ + if (dev->tgt.ops->deinit_tgt) + dev->tgt.ops->deinit_tgt(dev); + close(dev->fds[0]); +} + +static int ublk_queue_io_cmd(struct ublk_queue *q, + struct ublk_io *io, unsigned tag) +{ + struct ublksrv_io_cmd *cmd; + struct io_uring_sqe *sqe; + unsigned int cmd_op = 0; + __u64 user_data; + + /* only freed io can be issued */ + if (!(io->flags & UBLKSRV_IO_FREE)) + return 0; + + /* we issue because we need either fetching or committing */ + if (!(io->flags & + (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP))) + return 0; + + if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP) + cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ; + else if (io->flags & UBLKSRV_NEED_FETCH_RQ) + cmd_op = UBLK_U_IO_FETCH_REQ; + + sqe = io_uring_get_sqe(&q->ring); + if (!sqe) { + ublk_err("%s: run out of sqe %d, tag %d\n", + __func__, q->q_id, tag); + return -1; + } + + cmd = (struct ublksrv_io_cmd *)ublk_get_sqe_cmd(sqe); + + if (cmd_op == 
UBLK_U_IO_COMMIT_AND_FETCH_REQ) + cmd->result = io->result; + + /* These fields should be written once, never change */ + ublk_set_sqe_cmd_op(sqe, cmd_op); + sqe->fd = 0; /* dev->fds[0] */ + sqe->opcode = IORING_OP_URING_CMD; + sqe->flags = IOSQE_FIXED_FILE; + sqe->rw_flags = 0; + cmd->tag = tag; + cmd->q_id = q->q_id; + if (!(q->state & UBLKSRV_NO_BUF)) + cmd->addr = (__u64) (uintptr_t) io->buf_addr; + else + cmd->addr = 0; + + user_data = build_user_data(tag, _IOC_NR(cmd_op), 0, 0); + io_uring_sqe_set_data64(sqe, user_data); + + io->flags = 0; + + q->cmd_inflight += 1; + + ublk_dbg(UBLK_DBG_IO_CMD, "%s: (qid %d tag %u cmd_op %u) iof %x stopping %d\n", + __func__, q->q_id, tag, cmd_op, + io->flags, !!(q->state & UBLKSRV_QUEUE_STOPPING)); + return 1; +} + +__maybe_unused static int ublk_complete_io(struct ublk_queue *q, + unsigned tag, int res) +{ + struct ublk_io *io = &q->ios[tag]; + + ublk_mark_io_done(io, res); + + return ublk_queue_io_cmd(q, io, tag); +} + +static void ublk_submit_fetch_commands(struct ublk_queue *q) +{ + int i = 0; + + for (i = 0; i < q->q_depth; i++) + ublk_queue_io_cmd(q, &q->ios[i], i); +} + +static int ublk_queue_is_idle(struct ublk_queue *q) +{ + return !io_uring_sq_ready(&q->ring) && !q->io_inflight; +} + +static int ublk_queue_is_done(struct ublk_queue *q) +{ + return (q->state & UBLKSRV_QUEUE_STOPPING) && ublk_queue_is_idle(q); +} + +static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q, + struct io_uring_cqe *cqe) +{ + unsigned tag = user_data_to_tag(cqe->user_data); + + if (cqe->res < 0 && cqe->res != -EAGAIN) + ublk_err("%s: failed tgt io: res %d qid %u tag %u, cmd_op %u\n", + __func__, cqe->res, q->q_id, + user_data_to_tag(cqe->user_data), + user_data_to_op(cqe->user_data)); + + if (q->tgt_ops->tgt_io_done) + q->tgt_ops->tgt_io_done(q, tag, cqe); +} + +static void ublk_handle_cqe(struct io_uring *r, + struct io_uring_cqe *cqe, void *data) +{ + struct ublk_queue *q = container_of(r, struct ublk_queue, ring); + unsigned tag = user_data_to_tag(cqe->user_data); + unsigned cmd_op = user_data_to_op(cqe->user_data); + int fetch = (cqe->res != UBLK_IO_RES_ABORT) && + !(q->state & UBLKSRV_QUEUE_STOPPING); + struct ublk_io *io; + + ublk_dbg(UBLK_DBG_IO_CMD, "%s: res %d (qid %d tag %u cmd_op %u target %d) stopping %d\n", + __func__, cqe->res, q->q_id, tag, cmd_op, + is_target_io(cqe->user_data), + (q->state & UBLKSRV_QUEUE_STOPPING)); + + /* Don't retrieve io in case of target io */ + if (is_target_io(cqe->user_data)) { + ublksrv_handle_tgt_cqe(q, cqe); + return; + } + + io = &q->ios[tag]; + q->cmd_inflight--; + + if (!fetch) { + q->state |= UBLKSRV_QUEUE_STOPPING; + io->flags &= ~UBLKSRV_NEED_FETCH_RQ; + } + + if (cqe->res == UBLK_IO_RES_OK) { + assert(tag < q->q_depth); + if (q->tgt_ops->queue_io) + q->tgt_ops->queue_io(q, tag); + } else { + /* + * COMMIT_REQ will be completed immediately since no fetching + * piggyback is required. 
+ * + * Marking IO_FREE only, then this io won't be issued since + * we only issue io with (UBLKSRV_IO_FREE | UBLKSRV_NEED_*) + * + * */ + io->flags = UBLKSRV_IO_FREE; + } +} + +static int ublk_reap_events_uring(struct io_uring *r) +{ + struct io_uring_cqe *cqe; + unsigned head; + int count = 0; + + io_uring_for_each_cqe(r, head, cqe) { + ublk_handle_cqe(r, cqe, NULL); + count += 1; + } + io_uring_cq_advance(r, count); + + return count; +} + +static int ublk_process_io(struct ublk_queue *q) +{ + int ret, reapped; + + ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: to_submit %d inflight cmd %u stopping %d\n", + q->dev->dev_info.dev_id, + q->q_id, io_uring_sq_ready(&q->ring), + q->cmd_inflight, + (q->state & UBLKSRV_QUEUE_STOPPING)); + + if (ublk_queue_is_done(q)) + return -ENODEV; + + ret = io_uring_submit_and_wait(&q->ring, 1); + reapped = ublk_reap_events_uring(&q->ring); + + ublk_dbg(UBLK_DBG_QUEUE, "submit result %d, reapped %d stop %d idle %d\n", + ret, reapped, (q->state & UBLKSRV_QUEUE_STOPPING), + (q->state & UBLKSRV_QUEUE_IDLE)); + + return reapped; +} + +static void *ublk_io_handler_fn(void *data) +{ + struct ublk_queue *q = data; + int dev_id = q->dev->dev_info.dev_id; + int ret; + + ret = ublk_queue_init(q); + if (ret) { + ublk_err("ublk dev %d queue %d init queue failed\n", + dev_id, q->q_id); + return NULL; + } + ublk_dbg(UBLK_DBG_QUEUE, "tid %d: ublk dev %d queue %d started\n", + q->tid, dev_id, q->q_id); + + /* submit all io commands to ublk driver */ + ublk_submit_fetch_commands(q); + do { + if (ublk_process_io(q) < 0) + break; + } while (1); + + ublk_dbg(UBLK_DBG_QUEUE, "ublk dev %d queue %d exited\n", dev_id, q->q_id); + ublk_queue_deinit(q); + return NULL; +} + +static void ublk_set_parameters(struct ublk_dev *dev) +{ + int ret; + + ret = ublk_ctrl_set_params(dev, &dev->tgt.params); + if (ret) + ublk_err("dev %d set basic parameter failed %d\n", + dev->dev_info.dev_id, ret); +} + +static int ublk_start_daemon(struct ublk_dev *dev) +{ + int ret, i; + void *thread_ret; + const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info; + + if (daemon(1, 1) < 0) + return -errno; + + ublk_dbg(UBLK_DBG_DEV, "%s enter\n", __func__); + + ret = ublk_dev_prep(dev); + if (ret) + return ret; + + for (i = 0; i < dinfo->nr_hw_queues; i++) { + dev->q[i].dev = dev; + dev->q[i].q_id = i; + pthread_create(&dev->q[i].thread, NULL, + ublk_io_handler_fn, + &dev->q[i]); + } + + /* everything is fine now, start us */ + ublk_set_parameters(dev); + ret = ublk_ctrl_start_dev(dev, getpid()); + if (ret < 0) { + ublk_err("%s: ublk_ctrl_start_dev failed: %d\n", __func__, ret); + goto fail; + } + + ublk_ctrl_get_info(dev); + ublk_ctrl_dump(dev, true); + + /* wait until we are terminated */ + for (i = 0; i < dinfo->nr_hw_queues; i++) + pthread_join(dev->q[i].thread, &thread_ret); + fail: + ublk_dev_unprep(dev); + ublk_dbg(UBLK_DBG_DEV, "%s exit\n", __func__); + + return ret; +} + +static int wait_ublk_dev(char *dev_name, int evt_mask, unsigned timeout) +{ +#define EV_SIZE (sizeof(struct inotify_event)) +#define EV_BUF_LEN (128 * (EV_SIZE + 16)) + struct pollfd pfd; + int fd, wd; + int ret = -EINVAL; + + fd = inotify_init(); + if (fd < 0) { + ublk_dbg(UBLK_DBG_DEV, "%s: inotify init failed\n", __func__); + return fd; + } + + wd = inotify_add_watch(fd, "/dev", evt_mask); + if (wd == -1) { + ublk_dbg(UBLK_DBG_DEV, "%s: add watch for /dev failed\n", __func__); + goto fail; + } + + pfd.fd = fd; + pfd.events = POLL_IN; + while (1) { + int i = 0; + char buffer[EV_BUF_LEN]; + ret = poll(&pfd, 1, 1000 * timeout); + + if (ret 
== -1) { + ublk_err("%s: poll inotify failed: %d\n", __func__, ret); + goto rm_watch; + } else if (ret == 0) { + ublk_err("%s: poll inotify timeout\n", __func__); + ret = -ETIMEDOUT; + goto rm_watch; + } + + ret = read(fd, buffer, EV_BUF_LEN); + if (ret < 0) { + ublk_err("%s: read inotify fd failed\n", __func__); + goto rm_watch; + } + + while (i < ret) { + struct inotify_event *event = (struct inotify_event *)&buffer[i]; + + ublk_dbg(UBLK_DBG_DEV, "%s: inotify event %x %s\n", + __func__, event->mask, event->name); + if (event->mask & evt_mask) { + if (!strcmp(event->name, dev_name)) { + ret = 0; + goto rm_watch; + } + } + i += EV_SIZE + event->len; + } + } +rm_watch: + inotify_rm_watch(fd, wd); +fail: + close(fd); + return ret; +} + +static int ublk_stop_io_daemon(const struct ublk_dev *dev) +{ + int daemon_pid = dev->dev_info.ublksrv_pid; + int dev_id = dev->dev_info.dev_id; + char ublkc[64]; + int ret; + + /* daemon may be dead already */ + if (kill(daemon_pid, 0) < 0) + goto wait; + + /* + * Wait until ublk char device is closed, when our daemon is shutdown + */ + snprintf(ublkc, sizeof(ublkc), "%s%d", "ublkc", dev_id); + ret = wait_ublk_dev(ublkc, IN_CLOSE_WRITE, 3); + /* double check and inotify may not be 100% reliable */ + if (ret == -ETIMEDOUT) + /* the daemon doesn't exist now if kill(0) fails */ + ret = kill(daemon_pid, 0) < 0; +wait: + waitpid(daemon_pid, NULL, 0); + ublk_dbg(UBLK_DBG_DEV, "%s: pid %d dev_id %d ret %d\n", + __func__, daemon_pid, dev_id, ret); + + return ret; +} + +static int cmd_dev_add(struct dev_ctx *ctx) +{ + char *tgt_type = ctx->tgt_type; + unsigned depth = ctx->queue_depth; + unsigned nr_queues = ctx->nr_hw_queues; + __u64 features; + const struct ublk_tgt_ops *ops; + struct ublksrv_ctrl_dev_info *info; + struct ublk_dev *dev; + int dev_id = ctx->dev_id; + char ublkb[64]; + int ret; + + ops = ublk_find_tgt(tgt_type); + if (!ops) { + ublk_err("%s: no such tgt type, type %s\n", + __func__, tgt_type); + return -ENODEV; + } + + if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) { + ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n", + __func__, nr_queues, depth); + return -EINVAL; + } + + dev = ublk_ctrl_init(); + if (!dev) { + ublk_err("%s: can't alloc dev id %d, type %s\n", + __func__, dev_id, tgt_type); + return -ENOMEM; + } + + /* kernel doesn't support get_features */ + ret = ublk_ctrl_get_features(dev, &features); + if (ret < 0) + return -EINVAL; + + if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) + return -ENOTSUP; + + info = &dev->dev_info; + info->dev_id = ctx->dev_id; + info->nr_hw_queues =nr_queues; + info->queue_depth = depth; + info->flags = ctx->flags; + dev->tgt.ops = ops; + dev->tgt.sq_depth = depth; + dev->tgt.cq_depth = depth; + dev->bpf_prog_id = ctx->bpf_prog_id; + + ret = ublk_ctrl_add_dev(dev); + if (ret < 0) { + ublk_err("%s: can't add dev id %d, type %s ret %d\n", + __func__, dev_id, tgt_type, ret); + goto fail; + } + + ret = -EINVAL; + switch (fork()) { + case -1: + goto fail; + case 0: + ublk_start_daemon(dev); + ublk_dbg(UBLK_DBG_DEV, "%s: daemon is started in children"); + exit(EXIT_SUCCESS); + } + + /* + * Wait until ublk disk is added, when our daemon is started + * successfully + */ + snprintf(ublkb, sizeof(ublkb), "%s%u", "ublkb", dev->dev_info.dev_id); + ret = wait_ublk_dev(ublkb, IN_CREATE, 3); + if (ret < 0) { + ublk_err("%s: can't start daemon id %d, type %s\n", + __func__, dev_id, tgt_type); + ublk_ctrl_del_dev(dev); + } else { + ctx->dev_id = dev->dev_info.dev_id; + } + ublk_dbg(UBLK_DBG_DEV, "%s: start 
daemon id %d, type %s\n", + __func__, ctx->dev_id, tgt_type); +fail: + ublk_ctrl_deinit(dev); + return ret; +} + +static int __cmd_dev_del(struct dev_ctx *ctx) +{ + int number = ctx->dev_id; + struct ublk_dev *dev; + int ret; + + dev = ublk_ctrl_init(); + dev->dev_info.dev_id = number; + + ret = ublk_ctrl_get_info(dev); + if (ret < 0) + goto fail; + + ret = ublk_ctrl_stop_dev(dev); + if (ret < 0) + ublk_err("%s: stop dev %d failed ret %d\n", __func__, number, ret); + + ret = ublk_stop_io_daemon(dev); + if (ret < 0) + ublk_err("%s: stop daemon id %d dev %d, ret %d\n", + __func__, dev->dev_info.ublksrv_pid, number, ret); + ublk_ctrl_del_dev(dev); +fail: + if (ret >= 0) + ret = ublk_ctrl_get_info(dev); + ublk_ctrl_deinit(dev); + + return (ret >= 0) ? 0 : ret; +} + +static int cmd_dev_del(struct dev_ctx *ctx) +{ + int i; + + if (ctx->dev_id >= 0 || !ctx->all) + return __cmd_dev_del(ctx); + + for (i = 0; i < 255; i++) { + ctx->dev_id = i; + __cmd_dev_del(ctx); + } + return 0; +} + +static int __cmd_dev_list(struct dev_ctx *ctx) +{ + struct ublk_dev *dev = ublk_ctrl_init(); + int ret; + + if (!dev) + return -ENODEV; + + dev->dev_info.dev_id = ctx->dev_id; + + ret = ublk_ctrl_get_info(dev); + if (ret < 0) { + if (ctx->logging) + ublk_err("%s: can't get dev info from %d: %d\n", + __func__, ctx->dev_id, ret); + } else { + ublk_ctrl_dump(dev, false); + } + + ublk_ctrl_deinit(dev); + + return ret; +} + +static int cmd_dev_list(struct dev_ctx *ctx) +{ + int i; + + if (ctx->dev_id >= 0 || !ctx->all) + return __cmd_dev_list(ctx); + + ctx->logging = false; + for (i = 0; i < 255; i++) { + ctx->dev_id = i; + __cmd_dev_list(ctx); + } + return 0; +} + +static int cmd_dev_unreg_bpf(struct dev_ctx *ctx) +{ + char path[PATH_MAX]; + char cmd[PATH_MAX + 16]; + struct stat st; + + snprintf(path, PATH_MAX, "/sys/fs/bpf/%s/%s", UBLK_BPF_PIN_PATH, ctx->tgt_type); + if (stat(path, &st) != 0) { + ublk_err("bpf prog %s isn't registered on %s\n", ctx->tgt_type, path); + return -ENOENT; + } + + sprintf(cmd, "rm -r %s", path); + if (system(cmd)) { + ublk_err("fail to run %s\n", cmd); + return -ENOENT; + } + + return 0; +} + +static int pathname_concat(char *buf, int buf_sz, const char *path, + const char *name) +{ + int len; + + len = snprintf(buf, buf_sz, "%s/%s", path, name); + if (len < 0) + return -EINVAL; + if (len >= buf_sz) + return -ENAMETOOLONG; + + return 0; +} + +static int pin_map(struct bpf_map *map, const char *pindir, + const char *name) +{ + char pinfile[PATH_MAX]; + int err; + + err = pathname_concat(pinfile, sizeof(pinfile), pindir, name); + if (err) + return -1; + + return bpf_map__pin(map, pinfile); +} + +static int pin_link(struct bpf_link *link, const char *pindir, + const char *name) +{ + char pinfile[PATH_MAX]; + int err; + + err = pathname_concat(pinfile, sizeof(pinfile), pindir, name); + if (err) + return -1; + + return bpf_link__pin(link, pinfile); +} + +static int cmd_dev_reg_bpf(struct dev_ctx *ctx) +{ + LIBBPF_OPTS(bpf_object_open_opts, open_opts); + struct bpf_object *obj; + struct bpf_map *map; + char path[PATH_MAX]; + struct stat st; + + assert(ctx->nr_files == 1); + + snprintf(path, PATH_MAX, "/sys/fs/bpf"); + if (stat(path, &st) != 0) { + ublk_err("bpf fs isn't mounted on %s\n", path); + return -ENOENT; + } + + snprintf(path, PATH_MAX, "/sys/fs/bpf/%s", UBLK_BPF_PIN_PATH); + if (stat(path, &st) != 0) { + if (mkdir(path, 0700) != 0) { + ublk_err("fail to create ublk bpf on %s\n", path); + return -ENOENT; + } + } + + snprintf(path, PATH_MAX, "/sys/fs/bpf/%s/%s", UBLK_BPF_PIN_PATH, 
ctx->tgt_type); + if (stat(path, &st) == 0) { + ublk_err("fail to pin ublk bpf on %s\n", path); + return -EEXIST; + } + + obj = bpf_object__open_file(ctx->files[0], &open_opts); + if (!obj) + return -1; + + if (bpf_object__load(obj)) { + ublk_err("fail to load bpf obj from %s\n", ctx->files[0]); + bpf_object__close(obj); + return -1; + } + + bpf_object__for_each_map(map, obj) { + struct bpf_link *link; + + if (bpf_map__type(map) != BPF_MAP_TYPE_STRUCT_OPS) { + if (!bpf_map__is_internal(map)) + pin_map(map, path, bpf_map__name(map)); + continue; + } + + link = bpf_map__attach_struct_ops(map); + if (!link) { + ublk_err("can't register struct_ops %s: %s", + bpf_map__name(map), strerror(errno)); + continue; + } + pin_link(link, path, bpf_map__name(map)); + + bpf_link__disconnect(link); + bpf_link__destroy(link); + } + + bpf_object__close(obj); + return 0; +} + +static int cmd_dev_help(char *exe) +{ + printf("%s add -t [null] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [backfile1] [backfile2] ...\n", exe); + printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n"); + printf("%s del [-n dev_id] -a \n", exe); + printf("\t -a delete all devices -n delete specified device\n"); + printf("%s list [-n dev_id] -a \n", exe); + printf("\t -a list all devices, -n list specified device, default -a \n"); + printf("%s reg -t [null] bpf_prog_obj_path \n", exe); + printf("%s unreg -t [null]\n", exe); + return 0; +} + +/****************** part 2: target implementation ********************/ + +static int ublk_null_tgt_init(struct ublk_dev *dev) +{ + const struct ublksrv_ctrl_dev_info *info = &dev->dev_info; + unsigned long dev_size = 250UL << 30; + bool use_bpf = info->flags & UBLK_F_BPF; + + dev->tgt.dev_size = dev_size; + dev->tgt.params = (struct ublk_params) { + .types = UBLK_PARAM_TYPE_BASIC | + (use_bpf ? 
UBLK_PARAM_TYPE_BPF : 0), + .basic = { + .logical_bs_shift = 9, + .physical_bs_shift = 12, + .io_opt_shift = 12, + .io_min_shift = 9, + .max_sectors = info->max_io_buf_bytes >> 9, + .dev_sectors = dev_size >> 9, + }, + .bpf = { + .flags = UBLK_BPF_HAS_OPS_ID, + .ops_id = dev->bpf_prog_id, + }, + }; + + return 0; +} + +static int ublk_null_queue_io(struct ublk_queue *q, int tag) +{ + const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag); + + /* won't be called for UBLK_F_BPF */ + assert(!(q->dev->dev_info.flags & UBLK_F_BPF)); + + ublk_complete_io(q, tag, iod->nr_sectors << 9); + + return 0; +} + +static const struct ublk_tgt_ops tgt_ops_list[] = { + { + .name = "null", + .init_tgt = ublk_null_tgt_init, + .queue_io = ublk_null_queue_io, + }, +}; + +static const struct ublk_tgt_ops *ublk_find_tgt(const char *name) +{ + const struct ublk_tgt_ops *ops; + int i; + + if (name == NULL) + return NULL; + + for (i = 0; sizeof(tgt_ops_list) / sizeof(*ops); i++) + if (strcmp(tgt_ops_list[i].name, name) == 0) + return &tgt_ops_list[i]; + return NULL; +} + +int main(int argc, char *argv[]) +{ + static const struct option longopts[] = { + { "all", 0, NULL, 'a' }, + { "type", 1, NULL, 't' }, + { "number", 1, NULL, 'n' }, + { "queues", 1, NULL, 'q' }, + { "depth", 1, NULL, 'd' }, + { "debug_mask", 1, NULL, 0 }, + { "quiet", 0, NULL, 0 }, + { "bpf_prog", 1, NULL, 0 }, + { 0, 0, 0, 0 } + }; + int option_idx, opt; + const char *cmd = argv[1]; + struct dev_ctx ctx = { + .queue_depth = 128, + .nr_hw_queues = 2, + .dev_id = -1, + .bpf_prog_id = -1, + }; + int ret = -EINVAL, i; + + if (argc == 1) + return ret; + + optind = 2; + while ((opt = getopt_long(argc, argv, "t:n:d:q:a", + longopts, &option_idx)) != -1) { + switch (opt) { + case 'a': + ctx.all = 1; + break; + case 'n': + ctx.dev_id = strtol(optarg, NULL, 10); + break; + case 't': + strncpy(ctx.tgt_type, optarg, + min(sizeof(ctx.tgt_type), strlen(optarg))); + break; + case 'q': + ctx.nr_hw_queues = strtol(optarg, NULL, 10); + break; + case 'd': + ctx.queue_depth = strtol(optarg, NULL, 10); + break; + case 0: + if (!strcmp(longopts[option_idx].name, "debug_mask")) + ublk_dbg_mask = strtol(optarg, NULL, 16); + if (!strcmp(longopts[option_idx].name, "quiet")) + ublk_dbg_mask = 0; + if (!strcmp(longopts[option_idx].name, "bpf_prog")) { + ctx.bpf_prog_id = strtol(optarg, NULL, 10); + ctx.flags |= UBLK_F_BPF; + } + break; + } + } + + i = optind; + while (i < argc && ctx.nr_files < MAX_BACK_FILES) { + ctx.files[ctx.nr_files++] = argv[i++]; + } + + if (!strcmp(cmd, "add")) + ret = cmd_dev_add(&ctx); + else if (!strcmp(cmd, "del")) + ret = cmd_dev_del(&ctx); + else if (!strcmp(cmd, "list")) { + ctx.all = 1; + ret = cmd_dev_list(&ctx); + } else if (!strcmp(cmd, "reg")) + ret = cmd_dev_reg_bpf(&ctx); + else if (!strcmp(cmd, "unreg")) + ret = cmd_dev_unreg_bpf(&ctx); + else if (!strcmp(cmd, "help")) + ret = cmd_dev_help(argv[0]); + else + cmd_dev_help(argv[0]); + + return ret; +} From patchwork Tue Jan 7 12:04:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928820 Received: from mail-il1-f174.google.com (mail-il1-f174.google.com [209.85.166.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BEE5D1EF080; Tue, 7 Jan 2025 12:16:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.174 ARC-Seal: i=1; a=rsa-sha256; 
(PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:35 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 13/22] selftests: ublk: add tests for covering io split Date: Tue, 7 Jan 2025 20:04:04 +0800 Message-ID: <20250107120417.1237392-14-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC One io command can be queued in split way, add test case for covering this way: - split the io command into two sub-io if the io size is bigger than 512 - the 1st sub-io size is 512byte, and the 2nd sub-io is the remained bytes Complete the whole io command until the two sub-io are queued. Signed-off-by: Ming Lei --- tools/testing/selftests/ublk/Makefile | 1 + .../testing/selftests/ublk/progs/ublk_null.c | 46 +++++++++++++++++++ tools/testing/selftests/ublk/test_null_03.sh | 21 +++++++++ 3 files changed, 68 insertions(+) create mode 100755 tools/testing/selftests/ublk/test_null_03.sh diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile index a95f317211e7..5a940bae9cbb 100644 --- a/tools/testing/selftests/ublk/Makefile +++ b/tools/testing/selftests/ublk/Makefile @@ -21,6 +21,7 @@ endif TEST_PROGS := test_null_01.sh TEST_PROGS += test_null_02.sh +TEST_PROGS += test_null_03.sh # Order correspond to 'make run_tests' order TEST_GEN_PROGS_EXTENDED = ublk_bpf diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c index 3225b52dcd24..523bf8ff3ef8 100644 --- a/tools/testing/selftests/ublk/progs/ublk_null.c +++ b/tools/testing/selftests/ublk/progs/ublk_null.c @@ -11,6 +11,40 @@ /* libbpf v1.4.5 is required for struct_ops to work */ +static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bpf_io *io, unsigned int _off) +{ + unsigned long off = -1, sects = -1; + const struct ublksrv_io_desc *iod; + int res; + + iod = ublk_bpf_get_iod(io); + if (iod) { + res = iod->nr_sectors << 9; + off = iod->start_sector; + sects = iod->nr_sectors; + } else + res = -EINVAL; + + BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d", + ublk_bpf_get_dev_id(io), + ublk_bpf_get_queue_id(io), + ublk_bpf_get_io_tag(io), + off, sects, res); + if (res < 0) { + ublk_bpf_complete_io(io, res); + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + } + + /* split this io to one 512bytes sub-io and the remainder */ + if (_off < 512 && res > 512) + return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, 512); + + /* complete the whole io command after the 2nd sub-io is queued */ + ublk_bpf_complete_io(io, res); + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); +} + + static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off) { unsigned long off = -1, sects = -1; @@ -60,4 +94,16 @@ struct ublk_bpf_ops null_ublk_bpf_ops = { .detach_dev = (void *)ublk_null_detach_dev, }; +SEC("struct_ops/ublk_bpf_queue_io_cmd") +ublk_bpf_return_t BPF_PROG(ublk_null_handle_io_split, struct ublk_bpf_io *io, 
unsigned int off) +{ + return __ublk_null_handle_io_split(io, off); +} + +SEC(".struct_ops.link") +struct ublk_bpf_ops null_ublk_bpf_ops_split = { + .id = 1, + .queue_io_cmd = (void *)ublk_null_handle_io_split, +}; + char LICENSE[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/ublk/test_null_03.sh b/tools/testing/selftests/ublk/test_null_03.sh new file mode 100755 index 000000000000..c0b3a4d941c9 --- /dev/null +++ b/tools/testing/selftests/ublk/test_null_03.sh @@ -0,0 +1,21 @@ +#!/bin/bash + +. test_common.sh + +TID="null_03" +ERR_CODE=0 + +# prepare and register & pin bpf prog +_prep_bpf_test "null" ublk_null.bpf.o + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t null -n 0 --bpf_prog 1 --quiet + +# run fio over the ublk disk +fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1 +ERR_CODE=$? + +# clean and unregister & unpin the bpf prog +_cleanup_bpf_test "null" + +_show_result $TID $ERR_CODE From patchwork Tue Jan 7 12:04:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928821 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 300381EC01E; Tue, 7 Jan 2025 12:16:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.177 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736252200; cv=none; b=EJDAfXsPotgWtYGVIrn5Y+CpPH54hPT2d+vLMi35SI4aX0XOPzIN8kc01W1TQa4pAMHn88kFmlDhx95zqIR3nfYsn/fRMduKzufEYUmU3KgcjsmlF0pMk8HzdFsU/HAKWl5GmPQWQOhWg1PDGfMzq8yy8eoNTJ5trYQHiwcgrUQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736252200; c=relaxed/simple; bh=2qD7JRXYOvlfkmYR5r6JnWWjk3xDVrsZmCrn+a3Pj5U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=n7jHzqwNmOubXF2rLV0Fv4gj0buerxLIEvUj11wXVLMBzzEdouVcLNV1AOr6UNUz41KYYWL0rCxBrHtlB2oq20BSFCwLwOLKsKJeuFdXd+b3OTJp2e1GjemmAv7fQHfaFMJpHO1ayZyO8Qb8jDT6J6iHsTy8grbutK0++QH4/ZA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Vlltfxr+; arc=none smtp.client-ip=209.85.214.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Vlltfxr+" Received: by mail-pl1-f177.google.com with SMTP id d9443c01a7336-21644aca3a0so47107155ad.3; Tue, 07 Jan 2025 04:16:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736252196; x=1736856996; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=l1IzBXLoCjFub0ZAHnwNc3v4ROMZgYPRLWuwGG7QubQ=; b=Vlltfxr+7UJ+7f+szEvT9OLKwDYbKLtevr6Z6OCcJmUJ3FBkYfTrcWtot+EIXWlW3p aIS3iE/csKOQwSGUyterhBS62iF4rJQVwLDifO6z+KtKO7hmy/9w5BH85s8ya19aRfwD sOjHTyxPUW14s/W6lQgWmz+uTF2hYf6K6aSsAGBddrIXxL3cBHpm9vEX3jNFVscEPG1g 
From: Ming Lei
To: Jens Axboe , linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei
Subject: [RFC PATCH 14/22] selftests: ublk: add tests for covering redirecting to userspace
Date: Tue, 7 Jan 2025 20:04:05 +0800
Message-ID: <20250107120417.1237392-15-tom.leiming@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Reuse ublk-null for testing UBLK_BPF_IO_REDIRECT:

- queue & complete IOs with an odd tag number directly in the bpf prog
- redirect IOs with an even tag number and let userspace handle their queueing & completion
- for some of the redirected IOs, have userspace return -EAGAIN and mark them as ready for the bpf prog, so they are finally completed by the bpf prog on the second pass

This covers the UBLK_BPF_IO_REDIRECT code path.
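The per-tag decision described above boils down to the condensed sketch below; the function name is illustrative, and the full handler added to progs/ublk_null.c in this patch additionally reads the iod, logs it, and handles the error path:

/* sketch of the redirect decision; res is the byte count taken from the iod */
static inline ublk_bpf_return_t handle_io_redirect_sketch(const struct ublk_bpf_io *io, int res)
{
	unsigned int tag = ublk_bpf_get_io_tag(io);
	unsigned long long key = build_io_key(io);
	int *ready, zero = 0;

	if (tag & 0x1) {
		/* odd tag: queue & complete entirely inside the bpf prog */
		ublk_bpf_complete_io(io, res);
		return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
	}

	/* even tag: complete only once userspace has marked it ready in io_map */
	ready = bpf_map_lookup_elem(&io_map, &key);
	if (ready && *ready) {
		ublk_bpf_complete_io(io, res);
		return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
	}

	/* first pass: record the key and hand the command to the userspace server */
	bpf_map_update_elem(&io_map, &key, &zero, BPF_ANY);
	return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);
}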
Signed-off-by: Ming Lei --- tools/testing/selftests/ublk/Makefile | 1 + .../selftests/ublk/progs/ublk_bpf_kfunc.h | 10 +++ .../testing/selftests/ublk/progs/ublk_null.c | 68 +++++++++++++++++++ tools/testing/selftests/ublk/test_null_04.sh | 21 ++++++ tools/testing/selftests/ublk/ublk_bpf.c | 39 ++++++++++- 5 files changed, 136 insertions(+), 3 deletions(-) create mode 100755 tools/testing/selftests/ublk/test_null_04.sh diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile index 5a940bae9cbb..38903f05d99d 100644 --- a/tools/testing/selftests/ublk/Makefile +++ b/tools/testing/selftests/ublk/Makefile @@ -22,6 +22,7 @@ endif TEST_PROGS := test_null_01.sh TEST_PROGS += test_null_02.sh TEST_PROGS += test_null_03.sh +TEST_PROGS += test_null_04.sh # Order correspond to 'make run_tests' order TEST_GEN_PROGS_EXTENDED = ublk_bpf diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h index acab490d933c..1db8870b57d6 100644 --- a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h +++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h @@ -20,4 +20,14 @@ extern void ublk_bpf_complete_io(const struct ublk_bpf_io *io, int res) __ksym; extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym; extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym; extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym; + +static inline unsigned long long build_io_key(const struct ublk_bpf_io *io) +{ + unsigned long long dev_id = (unsigned short)ublk_bpf_get_dev_id(io); + unsigned long long q_id = (unsigned short)ublk_bpf_get_queue_id(io); + unsigned long long tag = ublk_bpf_get_io_tag(io); + + return (dev_id << 32) | (q_id << 16) | tag; +} + #endif diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c index 523bf8ff3ef8..cebdc8a2a214 100644 --- a/tools/testing/selftests/ublk/progs/ublk_null.c +++ b/tools/testing/selftests/ublk/progs/ublk_null.c @@ -9,6 +9,14 @@ //#define DEBUG #include "ublk_bpf.h" +/* todo: make it writable payload of ublk_bpf_io */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, unsigned long long); /* dev_id + q_id + tag */ + __type(value, int); +} io_map SEC(".maps"); + /* libbpf v1.4.5 is required for struct_ops to work */ static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bpf_io *io, unsigned int _off) @@ -44,6 +52,54 @@ static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bp return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); } +static inline ublk_bpf_return_t __ublk_null_handle_io_redirect(const struct ublk_bpf_io *io, unsigned int _off) +{ + unsigned int tag = ublk_bpf_get_io_tag(io); + unsigned long off = -1, sects = -1; + const struct ublksrv_io_desc *iod; + int res; + + iod = ublk_bpf_get_iod(io); + if (iod) { + res = iod->nr_sectors << 9; + off = iod->start_sector; + sects = iod->nr_sectors; + } else + res = -EINVAL; + + BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d", + ublk_bpf_get_dev_id(io), + ublk_bpf_get_queue_id(io), + ublk_bpf_get_io_tag(io), + off, sects, res); + if (res < 0) { + ublk_bpf_complete_io(io, res); + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + } + + if (tag & 0x1) { + /* complete the whole io command after the 2nd sub-io is queued */ + ublk_bpf_complete_io(io, res); + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + } else { + unsigned long long key = 
build_io_key(io); + int *pv; + + /* stored value means if it is ready to complete IO */ + pv = bpf_map_lookup_elem(&io_map, &key); + if (pv && *pv) { + ublk_bpf_complete_io(io, res); + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + } else { + int v = 0; + res = bpf_map_update_elem(&io_map, &key, &v, BPF_ANY); + if (res) + bpf_printk("update io map element failed %d key %llx\n", res, key); + return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0); + } + } +} + static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off) { @@ -106,4 +162,16 @@ struct ublk_bpf_ops null_ublk_bpf_ops_split = { .queue_io_cmd = (void *)ublk_null_handle_io_split, }; +SEC("struct_ops/ublk_bpf_queue_io_cmd") +ublk_bpf_return_t BPF_PROG(ublk_null_handle_io_redirect, struct ublk_bpf_io *io, unsigned int off) +{ + return __ublk_null_handle_io_redirect(io, off); +} + +SEC(".struct_ops.link") +struct ublk_bpf_ops null_ublk_bpf_ops_redirect = { + .id = 2, + .queue_io_cmd = (void *)ublk_null_handle_io_redirect, +}; + char LICENSE[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/ublk/test_null_04.sh b/tools/testing/selftests/ublk/test_null_04.sh new file mode 100755 index 000000000000..f175e2ddb5cd --- /dev/null +++ b/tools/testing/selftests/ublk/test_null_04.sh @@ -0,0 +1,21 @@ +#!/bin/bash + +. test_common.sh + +TID="null_04" +ERR_CODE=0 + +# prepare and register & pin bpf prog +_prep_bpf_test "null" ublk_null.bpf.o + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t null -n 0 --bpf_prog 2 --quiet + +# run fio over the ublk disk +fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1 +ERR_CODE=$? + +# clean and unregister & unpin the bpf prog +_cleanup_bpf_test "null" + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c index 2d923e42845d..e2c2e92268e1 100644 --- a/tools/testing/selftests/ublk/ublk_bpf.c +++ b/tools/testing/selftests/ublk/ublk_bpf.c @@ -1283,6 +1283,16 @@ static int cmd_dev_help(char *exe) } /****************** part 2: target implementation ********************/ +//extern int bpf_map_update_elem(int fd, const void *key, const void *value, +// __u64 flags); + +static inline unsigned long long build_io_key(struct ublk_queue *q, int tag) +{ + unsigned long long dev_id = (unsigned short)q->dev->dev_info.dev_id; + unsigned long long q_id = (unsigned short)q->q_id; + + return (dev_id << 32) | (q_id << 16) | tag; +} static int ublk_null_tgt_init(struct ublk_dev *dev) { @@ -1314,12 +1324,35 @@ static int ublk_null_tgt_init(struct ublk_dev *dev) static int ublk_null_queue_io(struct ublk_queue *q, int tag) { const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag); + bool bpf = q->dev->dev_info.flags & UBLK_F_BPF; - /* won't be called for UBLK_F_BPF */ - assert(!(q->dev->dev_info.flags & UBLK_F_BPF)); + /* either !UBLK_F_BPF or UBLK_F_BPF with redirect */ + assert(!bpf || (bpf && !(tag & 0x1))); - ublk_complete_io(q, tag, iod->nr_sectors << 9); + if (bpf && (tag % 4)) { + unsigned long long key = build_io_key(q, tag); + int map_fd; + int err; + int val = 1; + + map_fd = bpf_obj_get("/sys/fs/bpf/ublk/null/io_map"); + if (map_fd < 0) { + ublk_err("Error finding BPF map fd from pinned path\n"); + goto exit; + } + + /* make this io ready for bpf prog to handle */ + err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY); + if (err) { + ublk_err("Error updating map element: %d\n", errno); + goto exit; + } + 
ublk_complete_io(q, tag, -EAGAIN); + return 0; + } +exit: + ublk_complete_io(q, tag, iod->nr_sectors << 9); return 0; } From patchwork Tue Jan 7 12:04:06 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928807 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A96611F0E38; Tue, 7 Jan 2025 12:08:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251782; cv=none; b=oXnkkWaO0EATK7vjhNIfxpUCPGxmtE2ccIMy/DQ4QZ9eAuQ4QhcEXkbWugc8kgWckmavsUdSqxJ6TCHwsy3rAA8hh44L3La/8dHmcCi6cNgiPFPpDb+5x3ddwmuJTe4TUhJkRlxCIM9e3PZ+brfr4g4EFoi102R3DOIVlvvVzOU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251782; c=relaxed/simple; bh=RA3agkgGPoh3lll3WbU1VW0805xDHkJnUwLXxAsotgw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cwlVUz11WhlzFjXl8sEKVs6PweujnKKs7tgU5vAG8KOWyzb9MweJFM1DHf5eko8GJg2gJK8PlxiD1S+BaQ/Rd5c3uFBW3mCXSSr6YQm9Extpd1O04YmMwBRNeUTqHZ3vgMhRg6cKlK8ora6h7/8H+kySrj1DQLamuHzx2v2K0Ng= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=P2hsUFGZ; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="P2hsUFGZ" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-21669fd5c7cso228823805ad.3; Tue, 07 Jan 2025 04:08:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251722; x=1736856522; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3G38mp2BMfe/0ANq/zJ1yKW8K0dgj9kh1DASCV+yEbE=; b=P2hsUFGZSNz3dAciWLz9svBy6DYas8hGLVQ6HcKM5PujCZaGu84v6uk78opVv1UPoo rpgcGRcRQQtQ4qyzBgx90+UUO5ET4rD146nf/OGBslKpdOrG+v8ZX/4syagKOkQxjwQM 2oKNlJTaVleFJt26djlqNu9CPQO1J/GM3ZVVsFhiEzacCEdmyMesuDN44JlxqSuFRjI5 FvrHtoaKcbwvN1m1pwyT/YKDYzSc4VncGouGRs0oK18BmL96nO3tLqseYvyD8TlE3koM Sz/1lT9w3U6SW3lF5rBU2orUXwKt+ZfuenKrsMnDtnbqPwErbmbXmYCgUE/kuU7sYhOt 27VQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251722; x=1736856522; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3G38mp2BMfe/0ANq/zJ1yKW8K0dgj9kh1DASCV+yEbE=; b=OTkobXi1LAc/psstd+CphLowhxhSqATFr8AG3vHVtz0RPgg0oKnT+DjJX9VrnAflSy 0/DE/J8yHXHqoypzOBDS4grdnB2riNHr1SueHm9Z2OHYeWPSHfzFiqBoSS9vsIjUHNzF JHKo0yN7kk6/z047W7PhZ5dNBWXrnbfyPerDR0xyRtQb5QwiETO7BXUm7X2YEa5AqBNs FeI+JG7OmDrghr5na+jrUYCUVcAQSdV47sEFnBDntM6ZaUx2sBVwPX+/DZ1PdcoE7QSr X58zGmBatUCkSNvigxv6QlHGhDpF1VV8yzRmFakTA/CTlUgfYod6ZaO3+wdG4JuFDTeo MRgg== X-Forwarded-Encrypted: i=1; 
From: Ming Lei
To: Jens Axboe , linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei
Subject: [RFC PATCH 15/22] ublk: bpf: add bpf aio kfunc
Date: Tue, 7 Jan 2025 20:04:06 +0800
Message-ID: <20250107120417.1237392-16-tom.leiming@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Define bpf aio kfuncs that let a bpf prog submit AIO. For now they cover filesystem IO only; they may be extended to network IO in the future.

Only a bvec buffer is supported for FS IO so far, but UBUF support would be easy to add on top of the generic iov_iter. With bpf aio, both the user-kernel context switch and the user-kernel buffer copy are avoided, much like loop's direct-IO implementation.

These kfuncs could be used by other subsystems and arguably belong in lib/, but start from ublk first; once the interface matures or gains more users, it can be moved to lib/.

Also define the bpf_aio_complete_ops struct_ops, which the caller implements in a bpf prog to complete bpf aio; this is done in the following patches.
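For orientation, a minimal sketch of how a struct_ops prog is expected to drive these kfuncs for a backing-file read is shown below. The function name and the backing_fd/pos/bytes parameters are illustrative, and the kfunc that attaches the completion ops and the bvec buffer to the aio only appears later in this series, so this is a sketch of the intended flow rather than code from this patch:

extern struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags) __ksym;
extern int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes,
			  unsigned io_flags) __ksym;
extern void bpf_aio_release(struct bpf_aio *aio) __ksym;

static int submit_backing_read(int backing_fd, loff_t pos, unsigned bytes)
{
	struct bpf_aio *aio;
	int ret;

	/* atomic-context variant; bpf_aio_alloc_sleepable() exists as well */
	aio = bpf_aio_alloc(BPF_AIO_OP_FS_READ, 0);
	if (!aio)
		return -ENOMEM;

	/* ->ops and the bvec buffer must be attached by the consumer
	 * subsystem (ublk) before submitting; see the following patches
	 */
	ret = bpf_aio_submit(aio, backing_fd, pos, bytes, 0);
	if (ret)
		bpf_aio_release(aio);

	/* on success the aio is completed via ops->bpf_aio_complete_cb */
	return ret;
}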
Signed-off-by: Ming Lei --- drivers/block/ublk/Makefile | 2 +- drivers/block/ublk/bpf.c | 40 +++++- drivers/block/ublk/bpf.h | 1 + drivers/block/ublk/bpf_aio.c | 251 +++++++++++++++++++++++++++++++++++ drivers/block/ublk/bpf_aio.h | 66 +++++++++ 5 files changed, 358 insertions(+), 2 deletions(-) create mode 100644 drivers/block/ublk/bpf_aio.c create mode 100644 drivers/block/ublk/bpf_aio.h diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile index f843a9005cdb..7094607c040d 100644 --- a/drivers/block/ublk/Makefile +++ b/drivers/block/ublk/Makefile @@ -5,6 +5,6 @@ ccflags-y += -I$(src) ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o ifeq ($(CONFIG_UBLK_BPF), y) -ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o +ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o endif obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c index ef1546a7ccda..d5880d61abe5 100644 --- a/drivers/block/ublk/bpf.c +++ b/drivers/block/ublk/bpf.c @@ -155,8 +155,23 @@ BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL) BTF_ID_FLAGS(func, ublk_bpf_get_io_tag, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, ublk_bpf_get_queue_id, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, ublk_bpf_get_dev_id, KF_TRUSTED_ARGS) + +/* bpf aio kfunc */ +BTF_ID_FLAGS(func, bpf_aio_alloc, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_aio_alloc_sleepable, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_aio_release) +BTF_ID_FLAGS(func, bpf_aio_submit) BTF_KFUNCS_END(ublk_bpf_kfunc_ids) +__bpf_kfunc void bpf_aio_release_dtor(void *aio) +{ + bpf_aio_release(aio); +} +CFI_NOSEAL(bpf_aio_release_dtor); +BTF_ID_LIST(bpf_aio_dtor_ids) +BTF_ID(struct, bpf_aio) +BTF_ID(func, bpf_aio_release_dtor) + static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = { .owner = THIS_MODULE, .set = &ublk_bpf_kfunc_ids, @@ -164,6 +179,12 @@ static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = { int __init ublk_bpf_init(void) { + const struct btf_id_dtor_kfunc aio_dtors[] = { + { + .btf_id = bpf_aio_dtor_ids[0], + .kfunc_btf_id = bpf_aio_dtor_ids[1] + }, + }; int err; err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, @@ -172,5 +193,22 @@ int __init ublk_bpf_init(void) pr_warn("error while setting UBLK BPF tracing kfuncs: %d", err); return err; } - return ublk_bpf_struct_ops_init(); + + err = ublk_bpf_struct_ops_init(); + if (err) { + pr_warn("error while initializing ublk bpf struct_ops: %d", err); + return err; + } + + err = register_btf_id_dtor_kfuncs(aio_dtors, ARRAY_SIZE(aio_dtors), + THIS_MODULE); + if (err) { + pr_warn("error while registering aio destructor: %d", err); + return err; + } + + err = bpf_aio_init(); + if (err) + pr_warn("error while initializing bpf aio kfunc: %d", err); + return err; } diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h index 4e178cbecb74..0ab25743ae7d 100644 --- a/drivers/block/ublk/bpf.h +++ b/drivers/block/ublk/bpf.h @@ -3,6 +3,7 @@ #define UBLK_INT_BPF_HEADER #include "bpf_reg.h" +#include "bpf_aio.h" typedef unsigned long ublk_bpf_return_t; typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int); diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c new file mode 100644 index 000000000000..65013fe8054f --- /dev/null +++ b/drivers/block/ublk/bpf_aio.c @@ -0,0 +1,251 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Red Hat */ + +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_aio.h" + +static int __bpf_aio_submit(struct bpf_aio *aio); + +static struct 
kmem_cache *bpf_aio_cachep; +static struct kmem_cache *bpf_aio_work_cachep; +static struct workqueue_struct *bpf_aio_wq; + +static inline bool bpf_aio_is_rw(int op) +{ + return op == BPF_AIO_OP_FS_READ || op == BPF_AIO_OP_FS_WRITE; +} + +/* check if it is short read */ +static bool bpf_aio_is_short_read(const struct bpf_aio *aio, long ret) +{ + return ret >= 0 && ret < aio->bytes && + bpf_aio_get_op(aio) == BPF_AIO_OP_FS_READ; +} + +/* zeroing the remained bytes starting from `off` to end */ +static void bpf_aio_zero_remained(const struct bpf_aio *aio, long off) +{ + struct iov_iter iter; + + iov_iter_bvec(&iter, ITER_DEST, aio->buf.bvec, aio->buf.nr_bvec, aio->bytes); + iter.iov_offset = aio->buf.bvec_off; + + iov_iter_advance(&iter, off); + iov_iter_zero(aio->bytes - off, &iter); +} + +static void bpf_aio_do_completion(struct bpf_aio *aio) +{ + if (aio->iocb.ki_filp) + fput(aio->iocb.ki_filp); + if (aio->work) + kmem_cache_free(bpf_aio_work_cachep, aio->work); +} + +/* ->ki_complete callback */ +static void bpf_aio_complete(struct kiocb *iocb, long ret) +{ + struct bpf_aio *aio = container_of(iocb, struct bpf_aio, iocb); + + if (unlikely(ret == -EAGAIN)) { + aio->opf |= BPF_AIO_FORCE_WQ; + ret = __bpf_aio_submit(aio); + if (!ret) + return; + } + + /* zero the remained bytes in case of short read */ + if (bpf_aio_is_short_read(aio, ret)) + bpf_aio_zero_remained(aio, ret); + + bpf_aio_do_completion(aio); + aio->ops->bpf_aio_complete_cb(aio, ret); +} + +static void bpf_aio_prep_rw(struct bpf_aio *aio, unsigned int rw, + struct iov_iter *iter) +{ + iov_iter_bvec(iter, rw, aio->buf.bvec, aio->buf.nr_bvec, aio->bytes); + iter->iov_offset = aio->buf.bvec_off; + + if (unlikely(aio->opf & BPF_AIO_FORCE_WQ)) { + aio->iocb.ki_flags &= ~IOCB_NOWAIT; + aio->iocb.ki_complete = NULL; + } else { + aio->iocb.ki_flags |= IOCB_NOWAIT; + aio->iocb.ki_complete = bpf_aio_complete; + } +} + +static int bpf_aio_do_submit(struct bpf_aio *aio) +{ + int op = bpf_aio_get_op(aio); + struct iov_iter iter; + struct file *file = aio->iocb.ki_filp; + int ret; + + switch (op) { + case BPF_AIO_OP_FS_READ: + bpf_aio_prep_rw(aio, ITER_DEST, &iter); + if (file->f_op->read_iter) + ret = file->f_op->read_iter(&aio->iocb, &iter); + else + ret = -EOPNOTSUPP; + break; + case BPF_AIO_OP_FS_WRITE: + bpf_aio_prep_rw(aio, ITER_SOURCE, &iter); + if (file->f_op->write_iter) + ret = file->f_op->write_iter(&aio->iocb, &iter); + else + ret = -EOPNOTSUPP; + break; + case BPF_AIO_OP_FS_FSYNC: + ret = vfs_fsync_range(aio->iocb.ki_filp, aio->iocb.ki_pos, + aio->iocb.ki_pos + aio->bytes - 1, 0); + if (unlikely(ret && ret != -EINVAL)) + ret = -EIO; + break; + case BPF_AIO_OP_FS_FALLOCATE: + ret = vfs_fallocate(aio->iocb.ki_filp, aio->iocb.ki_flags, + aio->iocb.ki_pos, aio->bytes); + break; + default: + ret = -EINVAL; + } + + if (ret == -EIOCBQUEUED) { + ret = 0; + } else if (ret != -EAGAIN) { + bpf_aio_complete(&aio->iocb, ret); + ret = 0; + } + + return ret; +} + +static void bpf_aio_submit_work(struct work_struct *work) +{ + struct bpf_aio_work *aio_work = container_of(work, struct bpf_aio_work, work); + + bpf_aio_do_submit(aio_work->aio); +} + +static int __bpf_aio_submit(struct bpf_aio *aio) +{ + struct work_struct *work; + +do_submit: + if (likely(!(aio->opf & BPF_AIO_FORCE_WQ))) { + int ret = bpf_aio_do_submit(aio); + + /* retry via workqueue in case of -EAGAIN */ + if (ret != -EAGAIN) + return ret; + aio->opf |= BPF_AIO_FORCE_WQ; + } + + if (!aio->work) { + bool in_irq = in_interrupt(); + gfp_t gfpflags = in_irq ? 
GFP_ATOMIC : GFP_NOIO; + + aio->work = kmem_cache_alloc(bpf_aio_work_cachep, gfpflags); + if (unlikely(!aio->work)) { + if (in_irq) + return -ENOMEM; + aio->opf &= ~BPF_AIO_FORCE_WQ; + goto do_submit; + } + } + + aio->work->aio = aio; + work = &aio->work->work; + INIT_WORK(work, bpf_aio_submit_work); + queue_work(bpf_aio_wq, work); + + return 0; +} + +static struct bpf_aio *__bpf_aio_alloc(gfp_t gfpflags, unsigned op, + enum bpf_aio_flag aio_flags) +{ + struct bpf_aio *aio; + + if (op >= BPF_AIO_OP_LAST) + return NULL; + + if (aio_flags & BPF_AIO_OP_MASK) + return NULL; + + aio = kmem_cache_alloc(bpf_aio_cachep, gfpflags); + if (!aio) + return NULL; + + memset(aio, 0, sizeof(*aio)); + aio->opf = op | (unsigned int)aio_flags; + return aio; +} + +__bpf_kfunc struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags) +{ + return __bpf_aio_alloc(GFP_ATOMIC, op, aio_flags); +} + +__bpf_kfunc struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags) +{ + return __bpf_aio_alloc(GFP_NOIO, op, aio_flags); +} + +__bpf_kfunc void bpf_aio_release(struct bpf_aio *aio) +{ + kmem_cache_free(bpf_aio_cachep, aio); +} + +/* Submit AIO from bpf prog */ +__bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, + unsigned bytes, unsigned io_flags) +{ + struct file *file; + + if (!aio->ops) + return -EINVAL; + + file = fget(fd); + if (!file) + return -EINVAL; + + /* we could be called from io completion handler */ + if (in_interrupt()) + aio->opf |= BPF_AIO_FORCE_WQ; + + aio->iocb.ki_pos = pos; + aio->iocb.ki_filp = file; + aio->iocb.ki_flags = io_flags; + aio->bytes = bytes; + if (bpf_aio_is_rw(bpf_aio_get_op(aio))) { + if (file->f_flags & O_DIRECT) + aio->iocb.ki_flags |= IOCB_DIRECT; + else + aio->opf |= BPF_AIO_FORCE_WQ; + aio->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0); + } else { + aio->opf |= BPF_AIO_FORCE_WQ; + } + + return __bpf_aio_submit(aio); +} + +int __init bpf_aio_init(void) +{ + bpf_aio_cachep = KMEM_CACHE(bpf_aio, SLAB_PANIC); + bpf_aio_work_cachep = KMEM_CACHE(bpf_aio_work, SLAB_PANIC); + bpf_aio_wq = alloc_workqueue("bpf_aio", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); + + return 0; +} diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h new file mode 100644 index 000000000000..625737965c90 --- /dev/null +++ b/drivers/block/ublk/bpf_aio.h @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Copyright (c) 2024 Red Hat */ +#ifndef UBLK_BPF_AIO_HEADER +#define UBLK_BPF_AIO_HEADER + +#define BPF_AIO_OP_BITS 8 +#define BPF_AIO_OP_MASK ((1 << BPF_AIO_OP_BITS) - 1) + +enum bpf_aio_op { + BPF_AIO_OP_FS_READ = 0, + BPF_AIO_OP_FS_WRITE, + BPF_AIO_OP_FS_FSYNC, + BPF_AIO_OP_FS_FALLOCATE, + BPF_AIO_OP_LAST, +}; + +enum bpf_aio_flag_bits { + /* force to submit io from wq */ + __BPF_AIO_FORCE_WQ = BPF_AIO_OP_BITS, + __BPF_AIO_NR_BITS, /* stops here */ +}; + +enum bpf_aio_flag { + BPF_AIO_FORCE_WQ = (1 << __BPF_AIO_FORCE_WQ), +}; + +struct bpf_aio_work { + struct bpf_aio *aio; + struct work_struct work; +}; + +/* todo: support ubuf & iovec in future */ +struct bpf_aio_buf { + unsigned int bvec_off; + int nr_bvec; + const struct bio_vec *bvec; +}; + +struct bpf_aio { + unsigned int opf; + unsigned int bytes; + struct bpf_aio_buf buf; + struct bpf_aio_work *work; + const struct bpf_aio_complete_ops *ops; + struct kiocb iocb; +}; + +typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret); + +struct bpf_aio_complete_ops { + unsigned int id; + bpf_aio_complete_t bpf_aio_complete_cb; +}; + +static inline 
unsigned int bpf_aio_get_op(const struct bpf_aio *aio) +{ + return aio->opf & BPF_AIO_OP_MASK; +} + +int bpf_aio_init(void); +struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags); +struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags); +void bpf_aio_release(struct bpf_aio *aio); +int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes, + unsigned io_flags); +#endif From patchwork Tue Jan 7 12:04:07 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928804 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4206E1EF0BD; Tue, 7 Jan 2025 12:08:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251760; cv=none; b=B4yOL77K1oJkCXeKMSzvqQ/rbcl30/oNVQjWhcZUfDWNoOtkC+GzyOnrdzDkcMIoOWXXLjuwl6s8zXAUfZeagmqiba4duijYsPtszDqLvBloeefu+Vy3CiRv4Z/vu7SfVR1df68zYOAMeWCArEg77SAK8LfssfUS5yXyAzXZ194= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251760; c=relaxed/simple; bh=KOPDlz0ubcytn137RK5uLmqeFSDFDGCRERP7Ce2gcWI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=j3zQQ4t/SXPzKPs7y+7IweNmmCkRukY5aG+CKcooughC/XRxcVgiMeEqYi4+VinMMW1OFW+jH/HSf0mkVZR1JcfocRysayJxEajukU9K5pLyD2dgrO4qHFXvuCr7PkKyvAmFKtwbzAx7xluSTCJyYP+EKbaEsuDT+MyLFCkvSoE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=e5U81iPc; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="e5U81iPc" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-21631789fcdso165274375ad.1; Tue, 07 Jan 2025 04:08:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251725; x=1736856525; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r7mDzi2V2kDe56cR1ad6kssWnrtOL5M6WFmdeiINw/g=; b=e5U81iPcshfXEq/iXBJz9O1V52t0Z720NL7KoZLcfm/WyQEeF7cklQ3Ulqh6ZaZrAr V/oso9sr0aIg31CTjHTd87ic5J75ELgJoVW2AotFpKOHtTRDWNqMhkxT0Kja0cFcyCLu E/+WPhKR/KhbiAq485w9RjjU5Ti/T5IdjYOuqn1Gnn+nLCqWKRPjj9kdpZ1ZKHsfsXXx Zd+gC8+KmX/34BHQ5L6/g/EKc2hCyC6Yc9GDxHJR+yOIv+6Sdv6+rvNUSaB3iUwjHTzQ PoDHsIZHEdbKMS4naCkh3LCFs36TuC7aTw4egNk/qt3eFL7Xu76EejIwP5EqpUu9ng6T hqIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251725; x=1736856525; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r7mDzi2V2kDe56cR1ad6kssWnrtOL5M6WFmdeiINw/g=; b=pXedTlYzdDJmeqrpEOT6vbnUVRUHPdR4Ca6njSyrJLievX8nvAAFbB/m8MhDsXfGA9 RjnwVzZW9s4NliWNNj0AXZFgVjltR4XlULEkSPg6nphFwL2onVXIMsYJUGh3fcGPQcev 
From: Ming Lei
To: Jens Axboe , linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei
Subject: [RFC PATCH 16/22] ublk: bpf: add bpf aio struct_ops
Date: Tue, 7 Jan 2025 20:04:07 +0800
Message-ID: <20250107120417.1237392-17-tom.leiming@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Add a bpf aio struct_ops so that application code can provide the bpf aio completion callback in a struct_ops prog; with that in place, bpf aio can be supported.
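For illustration, the BPF side of such a struct_ops could look roughly like the sketch below, following the SEC(".struct_ops.link") convention used by the ublk selftests earlier in this series; the program and variable names, the id value, and the exact struct_ops section name are assumptions rather than code from this patch:

SEC("struct_ops/bpf_aio_complete_cb")
void BPF_PROG(backing_aio_complete, struct bpf_aio *aio, long ret)
{
	/* called once a submitted aio finishes; this is typically where the
	 * ublk io that issued the aio would be completed
	 */
}

SEC(".struct_ops.link")
struct bpf_aio_complete_ops backing_aio_ops = {
	.id = 1,	/* consumers attach to this struct_ops by id */
	.bpf_aio_complete_cb = (void *)backing_aio_complete,
};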
Signed-off-by: Ming Lei --- drivers/block/ublk/Makefile | 2 +- drivers/block/ublk/bpf_aio.c | 7 ++ drivers/block/ublk/bpf_aio.h | 12 +++ drivers/block/ublk/bpf_aio_ops.c | 152 +++++++++++++++++++++++++++++++ 4 files changed, 172 insertions(+), 1 deletion(-) create mode 100644 drivers/block/ublk/bpf_aio_ops.c diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile index 7094607c040d..a47f65eb97f8 100644 --- a/drivers/block/ublk/Makefile +++ b/drivers/block/ublk/Makefile @@ -5,6 +5,6 @@ ccflags-y += -I$(src) ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o ifeq ($(CONFIG_UBLK_BPF), y) -ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o +ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o bpf_aio_ops.o endif obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c index 65013fe8054f..6e93f28f389b 100644 --- a/drivers/block/ublk/bpf_aio.c +++ b/drivers/block/ublk/bpf_aio.c @@ -243,9 +243,16 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, int __init bpf_aio_init(void) { + int err; + bpf_aio_cachep = KMEM_CACHE(bpf_aio, SLAB_PANIC); bpf_aio_work_cachep = KMEM_CACHE(bpf_aio_work, SLAB_PANIC); bpf_aio_wq = alloc_workqueue("bpf_aio", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); + err = bpf_aio_struct_ops_init(); + if (err) { + pr_warn("error while initializing bpf aio struct_ops: %d", err); + return err; + } return 0; } diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h index 625737965c90..07fcd43fd2ac 100644 --- a/drivers/block/ublk/bpf_aio.h +++ b/drivers/block/ublk/bpf_aio.h @@ -3,6 +3,8 @@ #ifndef UBLK_BPF_AIO_HEADER #define UBLK_BPF_AIO_HEADER +#include "bpf_reg.h" + #define BPF_AIO_OP_BITS 8 #define BPF_AIO_OP_MASK ((1 << BPF_AIO_OP_BITS) - 1) @@ -47,9 +49,18 @@ struct bpf_aio { typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret); +/** + * struct bpf_aio_complete_ops - A BPF struct_ops of callbacks allowing to + * complete `bpf_aio` submitted by `bpf_aio_submit()` + * @id: id used by bpf aio consumer, defined by globally + * @bpf_aio_complete_cb: callback for completing submitted `bpf_aio` + * @provider: holding all consumers of this struct_ops prog, used by + * kernel only + */ struct bpf_aio_complete_ops { unsigned int id; bpf_aio_complete_t bpf_aio_complete_cb; + struct bpf_prog_provider provider; }; static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio) @@ -58,6 +69,7 @@ static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio) } int bpf_aio_init(void); +int bpf_aio_struct_ops_init(void); struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags); struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags); void bpf_aio_release(struct bpf_aio *aio); diff --git a/drivers/block/ublk/bpf_aio_ops.c b/drivers/block/ublk/bpf_aio_ops.c new file mode 100644 index 000000000000..12757f634dbd --- /dev/null +++ b/drivers/block/ublk/bpf_aio_ops.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Red Hat */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_aio.h" + +static DEFINE_XARRAY(bpf_aio_all_ops); +static DEFINE_MUTEX(bpf_aio_ops_lock); + +static bool bpf_aio_ops_is_valid_access(int off, int size, + enum bpf_access_type type, const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_aio_ops_btf_struct_access(struct 
bpf_verifier_log *log, + const struct bpf_reg_state *reg, + int off, int size) +{ + /* bpf_aio prog can change nothing */ + if (size > 0) + return -EACCES; + + return NOT_INIT; +} + +static const struct bpf_verifier_ops bpf_aio_verifier_ops = { + .get_func_proto = bpf_base_func_proto, + .is_valid_access = bpf_aio_ops_is_valid_access, + .btf_struct_access = bpf_aio_ops_btf_struct_access, +}; + +static int bpf_aio_ops_init(struct btf *btf) +{ + return 0; +} + +static int bpf_aio_ops_check_member(const struct btf_type *t, + const struct btf_member *member, + const struct bpf_prog *prog) +{ + if (prog->sleepable) + return -EINVAL; + return 0; +} + +static int bpf_aio_ops_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + const struct bpf_aio_complete_ops *uops; + struct bpf_aio_complete_ops *kops; + u32 moff; + + uops = (const struct bpf_aio_complete_ops *)udata; + kops = (struct bpf_aio_complete_ops*)kdata; + + moff = __btf_member_bit_offset(t, member) / 8; + + switch (moff) { + case offsetof(struct bpf_aio_complete_ops, id): + /* For dev_id, this function has to copy it and return 1 to + * indicate that the data has been handled by the struct_ops + * type, or the verifier will reject the map if the value of + * those fields is not zero. + */ + kops->id = uops->id; + return 1; + } + return 0; +} + +static int bpf_aio_reg(void *kdata, struct bpf_link *link) +{ + struct bpf_aio_complete_ops *ops = kdata; + struct bpf_aio_complete_ops *curr; + int ret = -EBUSY; + + mutex_lock(&bpf_aio_ops_lock); + if (!xa_load(&bpf_aio_all_ops, ops->id)) { + curr = kmalloc(sizeof(*curr), GFP_KERNEL); + if (curr) { + *curr = *ops; + bpf_prog_provider_init(&curr->provider); + ret = xa_err(xa_store(&bpf_aio_all_ops, ops->id, + curr, GFP_KERNEL)); + } else { + ret = -ENOMEM; + } + } + mutex_unlock(&bpf_aio_ops_lock); + + return ret; +} + +static void bpf_aio_unreg(void *kdata, struct bpf_link *link) +{ + struct bpf_aio_complete_ops *ops = kdata; + struct bpf_prog_consumer *consumer, *tmp; + struct bpf_aio_complete_ops *curr; + LIST_HEAD(consumer_list); + + mutex_lock(&bpf_aio_ops_lock); + curr = xa_erase(&bpf_aio_all_ops, ops->id); + if (curr) + list_splice_init(&curr->provider.list, &consumer_list); + mutex_unlock(&bpf_aio_ops_lock); + + list_for_each_entry_safe(consumer, tmp, &consumer_list, node) + bpf_prog_consumer_detach(consumer, true); + kfree(curr); +} + +static void bpf_aio_cb(struct bpf_aio *io, long ret) +{ +} + +static struct bpf_aio_complete_ops __bpf_aio_ops = { + .bpf_aio_complete_cb = bpf_aio_cb, +}; + +static struct bpf_struct_ops bpf_aio_ops = { + .verifier_ops = &bpf_aio_verifier_ops, + .init = bpf_aio_ops_init, + .check_member = bpf_aio_ops_check_member, + .init_member = bpf_aio_ops_init_member, + .reg = bpf_aio_reg, + .unreg = bpf_aio_unreg, + .name = "bpf_aio_complete_ops", + .cfi_stubs = &__bpf_aio_ops, + .owner = THIS_MODULE, +}; + +int __init bpf_aio_struct_ops_init(void) +{ + int err; + + err = register_bpf_struct_ops(&bpf_aio_ops, bpf_aio_complete_ops); + if (err) + pr_warn("error while registering bpf aio struct ops: %d", err); + + return 0; +} From patchwork Tue Jan 7 12:04:08 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928801 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by 
AGHT+IFQU222wEqut6YxLqWX3vKxdPmjpCZ6ZVIHKaFAl4WbT1+D6QKlfX1lHoQuqt0tkt47p2vjKA== X-Received: by 2002:a05:6a20:430e:b0:1e6:b2d7:4cf0 with SMTP id adf61e73a8af0-1e6b2d74d7amr11044762637.41.1736251728035; Tue, 07 Jan 2025 04:08:48 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:47 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 17/22] ublk: bpf: attach bpf aio prog to ublk device Date: Tue, 7 Jan 2025 20:04:08 +0800 Message-ID: <20250107120417.1237392-18-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Attach bpf aio program to ublk device before adding ublk disk, and detach it after the disk is removed. And when the bpf aio prog is unregistered, all devices will detach from the prog automatically. ublk device needs to provide the bpf aio struct_ops ID for attaching the specific prog, and each ublk device has to attach to only single bpf prog. So that we can use the attached bpf aio prog to submit bpf aio for handling ublk IO. Given bpf aio prog is attached to ublk device, ublk bpf prog has to provide one kfunc to assign 'bpf_aio_complete_ops *' to 'struct bpf_aio' instance. Signed-off-by: Ming Lei --- drivers/block/ublk/bpf.c | 81 +++++++++++++++++++++++++++++++- drivers/block/ublk/bpf_aio.c | 4 ++ drivers/block/ublk/bpf_aio.h | 4 ++ drivers/block/ublk/bpf_aio_ops.c | 22 +++++++++ drivers/block/ublk/ublk.h | 10 ++++ include/uapi/linux/ublk_cmd.h | 4 +- 6 files changed, 123 insertions(+), 2 deletions(-) diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c index d5880d61abe5..921bbbcf4d9e 100644 --- a/drivers/block/ublk/bpf.c +++ b/drivers/block/ublk/bpf.c @@ -19,6 +19,79 @@ static int ublk_set_bpf_ops(struct ublk_device *ub, return 0; } +static int ublk_set_bpf_aio_op(struct ublk_device *ub, + struct bpf_aio_complete_ops *ops) +{ + int i; + + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) { + if (ops && ublk_get_queue(ub, i)->bpf_aio_ops) { + ublk_set_bpf_aio_op(ub, NULL); + return -EBUSY; + } + ublk_get_queue(ub, i)->bpf_aio_ops = ops; + } + return 0; +} + +static int ublk_bpf_aio_prog_attach_cb(struct bpf_prog_consumer *consumer, + struct bpf_prog_provider *provider) +{ + struct ublk_device *ub = container_of(consumer, struct ublk_device, + aio_prog); + struct bpf_aio_complete_ops *ops = container_of(provider, + struct bpf_aio_complete_ops, provider); + int ret = -ENODEV; + + if (ublk_get_device(ub)) { + ret = ublk_set_bpf_aio_op(ub, ops); + if (ret) + ublk_put_device(ub); + } + + return ret; +} + +static void ublk_bpf_aio_prog_detach_cb(struct bpf_prog_consumer *consumer, + bool unreg) +{ + struct ublk_device *ub = container_of(consumer, struct ublk_device, + aio_prog); + + if (unreg) { + blk_mq_freeze_queue(ub->ub_disk->queue); + ublk_set_bpf_aio_op(ub, NULL); + blk_mq_unfreeze_queue(ub->ub_disk->queue); + } else { + ublk_set_bpf_aio_op(ub, NULL); + } + ublk_put_device(ub); +} + +static const struct bpf_prog_consumer_ops ublk_aio_prog_consumer_ops = { + .attach_fn 
= ublk_bpf_aio_prog_attach_cb, + .detach_fn = ublk_bpf_aio_prog_detach_cb, +}; + +static int ublk_bpf_aio_attach(struct ublk_device *ub) +{ + if (!ublk_dev_support_bpf_aio(ub)) + return 0; + + ub->aio_prog.prog_id = ub->params.bpf.aio_ops_id; + ub->aio_prog.ops = &ublk_aio_prog_consumer_ops; + + return bpf_aio_prog_attach(&ub->aio_prog); +} + +static void ublk_bpf_aio_detach(struct ublk_device *ub) +{ + if (!ublk_dev_support_bpf_aio(ub)) + return; + bpf_aio_prog_detach(&ub->aio_prog); +} + + static int ublk_bpf_prog_attach_cb(struct bpf_prog_consumer *consumer, struct bpf_prog_provider *provider) { @@ -76,19 +149,25 @@ static const struct bpf_prog_consumer_ops ublk_prog_consumer_ops = { int ublk_bpf_attach(struct ublk_device *ub) { + int ret; + if (!ublk_dev_support_bpf(ub)) return 0; ub->prog.prog_id = ub->params.bpf.ops_id; ub->prog.ops = &ublk_prog_consumer_ops; - return ublk_bpf_prog_attach(&ub->prog); + ret = ublk_bpf_prog_attach(&ub->prog); + if (ret) + return ret; + return ublk_bpf_aio_attach(ub); } void ublk_bpf_detach(struct ublk_device *ub) { if (!ublk_dev_support_bpf(ub)) return; + ublk_bpf_aio_detach(ub); ublk_bpf_prog_detach(&ub->prog); } diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c index 6e93f28f389b..da050be4b710 100644 --- a/drivers/block/ublk/bpf_aio.c +++ b/drivers/block/ublk/bpf_aio.c @@ -213,6 +213,10 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, { struct file *file; + /* + * ->ops has to assigned by kfunc of consumer subsystem because + * bpf prog lifetime is aligned with the consumer subsystem + */ if (!aio->ops) return -EINVAL; diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h index 07fcd43fd2ac..d144c5e20dcb 100644 --- a/drivers/block/ublk/bpf_aio.h +++ b/drivers/block/ublk/bpf_aio.h @@ -75,4 +75,8 @@ struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_f void bpf_aio_release(struct bpf_aio *aio); int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes, unsigned io_flags); + +int bpf_aio_prog_attach(struct bpf_prog_consumer *consumer); +void bpf_aio_prog_detach(struct bpf_prog_consumer *consumer); + #endif diff --git a/drivers/block/ublk/bpf_aio_ops.c b/drivers/block/ublk/bpf_aio_ops.c index 12757f634dbd..04ad45fd24e6 100644 --- a/drivers/block/ublk/bpf_aio_ops.c +++ b/drivers/block/ublk/bpf_aio_ops.c @@ -120,6 +120,28 @@ static void bpf_aio_unreg(void *kdata, struct bpf_link *link) kfree(curr); } +int bpf_aio_prog_attach(struct bpf_prog_consumer *consumer) +{ + unsigned id = consumer->prog_id; + struct bpf_aio_complete_ops *ops; + int ret = -EINVAL; + + mutex_lock(&bpf_aio_ops_lock); + ops = xa_load(&bpf_aio_all_ops, id); + if (ops && ops->id == id) + ret = bpf_prog_consumer_attach(consumer, &ops->provider); + mutex_unlock(&bpf_aio_ops_lock); + + return ret; +} + +void bpf_aio_prog_detach(struct bpf_prog_consumer *consumer) +{ + mutex_lock(&bpf_aio_ops_lock); + bpf_prog_consumer_detach(consumer, false); + mutex_unlock(&bpf_aio_ops_lock); +} + static void bpf_aio_cb(struct bpf_aio *io, long ret) { } diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index 8343e70bd723..2c33f6a94bf2 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -126,6 +126,7 @@ struct ublk_queue { #ifdef CONFIG_UBLK_BPF struct ublk_bpf_ops *bpf_ops; + struct bpf_aio_complete_ops *bpf_aio_ops; #endif unsigned short force_abort:1; @@ -159,6 +160,7 @@ struct ublk_device { #ifdef CONFIG_UBLK_BPF struct bpf_prog_consumer prog; + struct 
bpf_prog_consumer aio_prog; #endif struct mutex mutex; @@ -203,6 +205,14 @@ static inline bool ublk_dev_support_bpf(const struct ublk_device *ub) return ub->dev_info.flags & UBLK_F_BPF; } +static inline bool ublk_dev_support_bpf_aio(const struct ublk_device *ub) +{ + if (!ublk_dev_support_bpf(ub)) + return false; + + return ub->params.bpf.flags & UBLK_BPF_HAS_AIO_OPS_ID; +} + struct ublk_device *ublk_get_device(struct ublk_device *ub); struct ublk_device *ublk_get_device_from_id(int idx); void ublk_put_device(struct ublk_device *ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index 27cf14e65cbc..ed6df4d61e89 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -406,9 +406,11 @@ struct ublk_param_zoned { struct ublk_param_bpf { #define UBLK_BPF_HAS_OPS_ID (1 << 0) +#define UBLK_BPF_HAS_AIO_OPS_ID (1 << 1) __u8 flags; __u8 ops_id; - __u8 reserved[6]; + __u16 aio_ops_id; + __u8 reserved[4]; }; struct ublk_params { From patchwork Tue Jan 7 12:04:09 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928805 Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B74231EE7CD; Tue, 7 Jan 2025 12:08:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251765; cv=none; b=VDJQf+RGfh1ySRYZWOdSd3WxJgvBIhnETBjOP2YByS3EdcO7c8nzqwMI5BBlDjU7zyaM31GE1FxRUEEESGRIqPbL1hRYjENBOb+RldNx+poUcJyx9ialx5Qm4tuSREO+9g6eVLKsfGCwixBQlZi9bkYgZUfX5kq+PChie66u1Jg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251765; c=relaxed/simple; bh=DuYK349QWocmvKBH2q5HjOCLoOLJyazu8j8fZf0/Y0E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=X+X5jm2l9gW0HbmPPX2qlscmL6QUHLD+eRbTzzg8dHTRuW9R60IwJIXciN0CYZmCdCvslRFrry1g/nylCl0k8RFh2AnMu7NLzECcaQTUrhM7shYg2CVVw3N2nU4t4F2SGmsRbEmLrknwL9kWvcYfzlCR7gglnqgvqi6px8Afoqw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NOg80+ou; arc=none smtp.client-ip=209.85.214.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NOg80+ou" Received: by mail-pl1-f179.google.com with SMTP id d9443c01a7336-21661be2c2dso205225545ad.1; Tue, 07 Jan 2025 04:08:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251731; x=1736856531; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0H88QF/Rk74Ebj0cgESBiJW8Kg50jsqod4brl+UMFwI=; b=NOg80+ou1Dt9rU5DTA9QKOmQQHbhU5xU4oV4O9x+CG/7ZolhjUcR4GMG7IudBMPYrq jVKRc6m9wZxO5iaiv4RkQpvNJx0VoNSHI8jYD5zwe1lDBYyOgxgwveg9JKFrJzk43/ON ymniBFj8J/GHhthS6UUtaAiWEbqE+JDvswnOYDncRZX0cB/UgYnPJWVLSc40CSj6TXEm 7wOid8z27oyfBTX6lSkUE5GQBC1SjozVMFWoh8Gf2xdRkTuoGjBx6f+m/Dmi1puW0tZK 
RIRPCd9a9cJ65u1M48iqfnJAU/SR6XmDjRwqt3j4i+qXbTZM84GumveBDoiRaTjZrT8D luUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251731; x=1736856531; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0H88QF/Rk74Ebj0cgESBiJW8Kg50jsqod4brl+UMFwI=; b=vAfrts3tVPfQht17pVkETX1ojKllMROGy7/Y0FTPiCgAyUskMSbYl/n4n7VszLUL/A MTc9N0oURCMN1NGE6FMeJ8h+zA3O0vOzy5GXNIAIDYFKOFySIwzqN8Mcy6HLmTk0DsV6 XH+H3WHftud3vMWJ80KwQkdrO5E3HPihQL2vp9sL11SgRqSVhuw5WGwk4mP0fR0eKlCH EbMsfQrUkICxytJHEOX+9suHzj2pDh8C5hUShO8+a2NU+0wricKcFuy4zeXlVNGozSv8 yf5ypeZ6Y6S2yOqVkPHyt3Ij/lhWP4kWi62Z6yGiO6eAJj87r1DSgDDKvELRdP77NnKH V/Zg== X-Forwarded-Encrypted: i=1; AJvYcCV6piKXSDZ90aSSmHvxQZEwffQfrmorwEeuMxnTTamvsC0axbAO4zke5+geFcYKGS3xTyqr7bDDS78sGA==@vger.kernel.org X-Gm-Message-State: AOJu0Yy4H9oWYf9q/HqcC0Z1XMLGkO+ibSCQJI6vKzjmUmh31okgopJs m8lYry/3sv6MutljWRzmIwVYwge7OqJEAuXrL/+hLggiZkST+AQG X-Gm-Gg: ASbGncs5zWuoRlaTkPkFyFIjDvXRqKX7WZp9k1UKYo5r06qrgIschD3JwKN6GHBpOfr bepz/j5lxtyHgZNiw5AzvktqugnKcKNLzkRV9DqganCsWLEQShq8peev8b1UIwSeaJuvCDMPLRK /0L4wDSdr7KWGd9rydMnXdl/h8hwPEXTbp1rTH28TY+hN2bZV3JHEOK1BkCqhm2P51F4wYPcT9q bNkGg5vYul8eIVOSgIPc5SGyVO5lzAn6u7nKeM4Z/R2WLEySjHwf59ipU8IPxqJeadC X-Google-Smtp-Source: AGHT+IGnRh9cdtFUwPt/oUrHUR6JYIEVNO9kbawsgfW4Ycr3KcLPP2EX7S6p3lJzxu6HTSlwyhZMRw== X-Received: by 2002:a05:6a00:8d8c:b0:71e:a3:935b with SMTP id d2e1a72fcca58-72abdee2117mr94750158b3a.25.1736251731560; Tue, 07 Jan 2025 04:08:51 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:50 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 18/22] ublk: bpf: add several ublk bpf aio kfuncs Date: Tue, 7 Jan 2025 20:04:09 +0800 Message-ID: <20250107120417.1237392-19-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Add ublk bpf aio kfuncs for bpf prog to do: - prepare buffer - assign bpf aio struct_ops - submit bpf aios for handle ublk io command - deal with ublk io and bpf aio lifetime, and make sure that ublk io won't be completed until all bpf aios are completed Signed-off-by: Ming Lei --- drivers/block/ublk/bpf.c | 77 ++++++++++++++++++++++++++++++++++++ drivers/block/ublk/bpf_aio.c | 6 ++- drivers/block/ublk/bpf_aio.h | 38 +++++++++++++++++- drivers/block/ublk/ublk.h | 2 + 4 files changed, 121 insertions(+), 2 deletions(-) diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c index 921bbbcf4d9e..c0babf6d5868 100644 --- a/drivers/block/ublk/bpf.c +++ b/drivers/block/ublk/bpf.c @@ -228,6 +228,77 @@ ublk_bpf_complete_io(struct ublk_bpf_io *io, int res) ublk_bpf_complete_io_cmd(io, res); } +/* + * Called before submitting one bpf aio in prog, and this ublk IO's + * reference is increased. 
+ * + * Grab reference of `io` for this `aio`, and the reference will be dropped + * by ublk_bpf_dettach_and_complete_aio() + */ +__bpf_kfunc int +ublk_bpf_attach_and_prep_aio(const struct ublk_bpf_io *_io, unsigned off, + unsigned bytes, struct bpf_aio *aio) +{ + struct ublk_bpf_io *io = (struct ublk_bpf_io *)_io; + const struct request *req; + const struct ublk_rq_data *data; + const struct ublk_bpf_io *bpf_io; + + if (!io || !aio) + return -EINVAL; + + req = ublk_bpf_get_req(io); + if (!req) + return -EINVAL; + + if (off + bytes > blk_rq_bytes(req)) + return -EINVAL; + + if (req->mq_hctx) { + const struct ublk_queue *ubq = req->mq_hctx->driver_data; + + bpf_aio_assign_cb(aio, ubq->bpf_aio_ops); + } + + data = blk_mq_rq_to_pdu((struct request *)req); + bpf_io = &data->bpf_data; + bpf_aio_assign_buf(aio, &bpf_io->buf, off, bytes); + + refcount_inc(&io->ref); + aio->private_data = (void *)io; + + return 0; +} + +/* + * Called after this attached aio is completed, and the associated ublk IO's + * reference is decreased, and if the reference is dropped to zero, complete + * this ublk IO. + * + * Return -EIOCBQUEUED if this `io` is being handled, and 0 is returned + * if it can be completed now. + */ +__bpf_kfunc void +ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio) +{ + struct ublk_bpf_io *io = aio->private_data; + + if (io) { + ublk_bpf_io_dec_ref(io); + aio->private_data = NULL; + } +} + +__bpf_kfunc struct ublk_bpf_io *ublk_bpf_acquire_io_from_aio(struct bpf_aio *aio) +{ + return aio->private_data; +} + +__bpf_kfunc void ublk_bpf_release_io_from_aio(struct ublk_bpf_io *io) +{ +} + + BTF_KFUNCS_START(ublk_bpf_kfunc_ids) BTF_ID_FLAGS(func, ublk_bpf_complete_io, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL) @@ -240,6 +311,12 @@ BTF_ID_FLAGS(func, bpf_aio_alloc, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_aio_alloc_sleepable, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_aio_release) BTF_ID_FLAGS(func, bpf_aio_submit) + +/* ublk bpf aio kfuncs */ +BTF_ID_FLAGS(func, ublk_bpf_attach_and_prep_aio) +BTF_ID_FLAGS(func, ublk_bpf_dettach_and_complete_aio) +BTF_ID_FLAGS(func, ublk_bpf_acquire_io_from_aio, KF_ACQUIRE) +BTF_ID_FLAGS(func, ublk_bpf_release_io_from_aio, KF_RELEASE) BTF_KFUNCS_END(ublk_bpf_kfunc_ids) __bpf_kfunc void bpf_aio_release_dtor(void *aio) diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c index da050be4b710..06a6cc8f38b1 100644 --- a/drivers/block/ublk/bpf_aio.c +++ b/drivers/block/ublk/bpf_aio.c @@ -211,6 +211,7 @@ __bpf_kfunc void bpf_aio_release(struct bpf_aio *aio) __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes, unsigned io_flags) { + unsigned op = bpf_aio_get_op(aio); struct file *file; /* @@ -220,6 +221,9 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, if (!aio->ops) return -EINVAL; + if (unlikely((bytes > aio->buf_size) && bpf_aio_is_rw(op))) + return -EINVAL; + file = fget(fd); if (!file) return -EINVAL; @@ -232,7 +236,7 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, aio->iocb.ki_filp = file; aio->iocb.ki_flags = io_flags; aio->bytes = bytes; - if (bpf_aio_is_rw(bpf_aio_get_op(aio))) { + if (bpf_aio_is_rw(op)) { if (file->f_flags & O_DIRECT) aio->iocb.ki_flags |= IOCB_DIRECT; else diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h index d144c5e20dcb..0683139f5354 100644 --- a/drivers/block/ublk/bpf_aio.h +++ b/drivers/block/ublk/bpf_aio.h @@ -40,11 +40,15 @@ struct bpf_aio_buf { struct bpf_aio { unsigned 
int opf; - unsigned int bytes; + union { + unsigned int bytes; + unsigned int buf_size; + }; struct bpf_aio_buf buf; struct bpf_aio_work *work; const struct bpf_aio_complete_ops *ops; struct kiocb iocb; + void *private_data; }; typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret); @@ -68,6 +72,38 @@ static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio) return aio->opf & BPF_AIO_OP_MASK; } +/* Must be called from kfunc defined in consumer subsystem */ +static inline void bpf_aio_assign_cb(struct bpf_aio *aio, + const struct bpf_aio_complete_ops *ops) +{ + aio->ops = ops; +} + +/* + * Skip `skip` bytes and assign the advanced source buffer for `aio`, so + * we can cover this part of source buffer by this `aio` + */ +static inline void bpf_aio_assign_buf(struct bpf_aio *aio, + const struct bpf_aio_buf *src, unsigned skip, + unsigned bytes) +{ + const struct bio_vec *bvec, *end; + struct bpf_aio_buf *abuf = &aio->buf; + + skip += src->bvec_off; + for (bvec = src->bvec, end = bvec + src->nr_bvec; bvec < end; bvec++) { + if (likely(skip < bvec->bv_len)) + break; + skip -= bvec->bv_len; + } + + aio->buf_size = bytes; + abuf->bvec_off = skip; + abuf->nr_bvec = src->nr_bvec - (bvec - src->bvec); + abuf->bvec = bvec; +} + + int bpf_aio_init(void); int bpf_aio_struct_ops_init(void); struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags); diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index 2c33f6a94bf2..4bd04512c894 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -8,6 +8,7 @@ #include #include "bpf_reg.h" +#include "bpf_aio.h" #define UBLK_MINORS (1U << MINORBITS) @@ -47,6 +48,7 @@ struct ublk_bpf_io { unsigned long flags; refcount_t ref; int res; + struct bpf_aio_buf buf; }; struct ublk_rq_data { From patchwork Tue Jan 7 12:04:10 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928803 Received: from mail-pl1-f178.google.com (mail-pl1-f178.google.com [209.85.214.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D79041EE7D5; Tue, 7 Jan 2025 12:09:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251749; cv=none; b=bmZdrwMD5f8RrQcUUo21rgAKtYqoX9M7B2uTGBfQb1LoAmS1Ee1ynDWgJXhL1C8RTiiQcZMqRs5dz6JITDP2te826Mr/Oab/r4kEdSgFPaIIvX9X1G1scWdq7PEIVUxVmNXpTe0f5HxbXs67QwKu3Q0joiSH5ixck735HJ85l+U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251749; c=relaxed/simple; bh=UOVyCpofd4EXdvIUpGT4UjWdrswzHfdF2v4vMcmK59o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ue7UZ8PmrZcSSSYoRbplkS9IUQjVUwj08nSKf6b5lQezdMNs/trN40sIcge/dYxqiMNXe0kuLDybx07bMcvT3nVsduwHbSjmSr/7mDj7pe0Kz+3QOQyaIqlebg95ZgwdOEseIc6yaIdHnytJTp3B5ZabSLS7m8UPKlg1ssZUZyE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=QU/VBZ4z; arc=none smtp.client-ip=209.85.214.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="QU/VBZ4z" Received: by mail-pl1-f178.google.com with SMTP id d9443c01a7336-21628b3fe7dso218304075ad.3; Tue, 07 Jan 2025 04:09:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251735; x=1736856535; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Z28FLukQqDxI3mqoQ1xUq6UUA9jJA+giSyvGS4e6c8c=; b=QU/VBZ4zRuSCSSKtW/3Ijsvn65PD5UT0+bOdueWFmrlLskamg9cMFD2QQo5f5FsVcY s6oTQb5QzBwMexaLeEypTQqWFeGrx/xW+vNGPm5vaZ++VzDOBE6/w4QdNrT8dOnvwvaE h7EPVGUfg3iS5LsyVJ1Dmu1+5ouMVGl+qPMvI0W8LXVPGpdnqFwE72LzUFESTvINBbID dr01++fjdgGOSxlURTsj/xsr60sTRa/aqEWMCAtXbqQ8UHhQ7ymH9+/TPBrNJQchJOOZ dfkN0RVZPP54hpDGLxaQPGhSfu0/Wg9sL3tqrXsVmFsKLGaLr22vjWrNnsDeZK+rV5G4 HzZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251735; x=1736856535; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Z28FLukQqDxI3mqoQ1xUq6UUA9jJA+giSyvGS4e6c8c=; b=gzTBh6wAKA/hkVt4GdiyXLs8aFKj5pgdc7VWOLkvTFjneHmHuwclHxpcsq6PNjJONk a/phmLlP1JuZ34nzRDzx530uuL2YLuaSB/eF9uKE+NI0o7HXhVJ89HPTImbLO4i8r4T3 xo9AmM9JLwDQXLZrotSDcd1kCbuGBLO3q+yJVnww9U3RmkTBD+LFs1XEorTBOVfMFgKD ebnWpFCHWvU+lPEv9HLadHWawP+7DTkfVU/PMuWIqJcxOgKH2h2LV0una87lKe+na+kK HDbIT9rRJACNxfEM00DSWuF43recn2ytypl/5wL+UDeu2NcborpEW6MfZ4UorkVPeXuZ 5UWw== X-Forwarded-Encrypted: i=1; AJvYcCUDNxI1TrLRqVTnzPkmf7Qb9jXiaN/hXxua8y+w+MzAPHKdaRfiNnwYo6rKI12LJp4P91FnNlfAU47ojg==@vger.kernel.org X-Gm-Message-State: AOJu0Yzr494e6c2CSJGt5QKtJMzXtemeKg6W8xdkNjYGYim9VhfpheBQ OQQ1peEFNI1cuAs8B0rnHYss2fO47B6+ZbTrji8+aWwKPO5vsfehlmslUxwdXKc= X-Gm-Gg: ASbGncuCUDdyv+bA8D2ujEu28v6jRAeFxJIYMoIxHT/waEREdbCRgnGQ8EyFhHOYcB4 3n07XHfmmsZ5vQFOri3wh4GwgzD2Qslss5FUn2/6GZTO+nsAg7uXJUL5gox/coiqYhuZ1I8FhJA /jcxxOQVl41GH8Jf3vKp8aUKdwtQ9yAts4uakiPqQzedHZb9jfq8vHq9kCo9KQBkqETNq0escO/ b0U37VIHGqymuYpG1LxBfVdAZ33EVv37mFhqCfKqT1RIdWWZ0n/Zg1gnhJV0sHlDVoG X-Google-Smtp-Source: AGHT+IFQeLqa9/UEKemi5zYwifY340nv63/+KSbC7bfcikGN77+wT5sopQrYD4I1aL1vEIj3sWfsng== X-Received: by 2002:a05:6a00:244c:b0:724:bf30:3030 with SMTP id d2e1a72fcca58-72abdbe59f6mr82188396b3a.0.1736251735279; Tue, 07 Jan 2025 04:08:55 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:54 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 19/22] ublk: bpf: wire bpf aio with ublk io handling Date: Tue, 7 Jan 2025 20:04:10 +0800 Message-ID: <20250107120417.1237392-20-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Add ublk_bpf_aio_prep_io_buf() and call it before running ublk bpf prog, so wire everything together. 
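To make the ordering concrete: the buffer has to be prepared before the prog runs because the prog's first step is usually ublk_bpf_attach_and_prep_aio(), which maps a byte range of the prepared io->buf onto a bpf_aio. Below is a minimal, illustrative consumer-side sketch only (modelled on the loop target added later in this series and assuming its selftest headers; the struct_ops registration map is omitted, and the backing-file fd is hard-coded here instead of being looked up from a bpf map as the real prog does):

	SEC("struct_ops.s/ublk_bpf_queue_io_cmd_daemon")
	ublk_bpf_return_t BPF_PROG(sketch_queue_io_cmd, struct ublk_bpf_io *io,
				   unsigned int off)
	{
		const struct ublksrv_io_desc *iod = ublk_bpf_get_iod(io);
		int backing_fd = 3;	/* stand-in for a bpf_map_lookup_elem() result */
		struct bpf_aio *aio;
		int res;

		if (!iod) {
			ublk_bpf_complete_io(io, -EINVAL);
			return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
		}

		/* READ only, to keep the sketch short */
		aio = bpf_aio_alloc(BPF_AIO_OP_FS_READ, 0);
		if (!aio) {
			ublk_bpf_complete_io(io, -ENOMEM);
			return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
		}

		/* borrow the buffer prepared by ublk_bpf_aio_prep_io_buf() and
		 * take one reference on `io`
		 */
		res = ublk_bpf_attach_and_prep_aio(io, 0, iod->nr_sectors << 9, aio);
		if (res < 0) {
			bpf_aio_release(aio);	/* free the unattached aio */
			ublk_bpf_complete_io(io, res);
			return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
		}

		res = bpf_aio_submit(aio, backing_fd, iod->start_sector << 9,
				     iod->nr_sectors << 9, 0);
		if (res < 0) {
			/* undo: complete the io, drop its reference, free the aio */
			ublk_bpf_complete_io(io, res);
			ublk_bpf_dettach_and_complete_aio(aio);
			bpf_aio_release(aio);
		}
		return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
	}

The ublk io itself is completed later from the device's bpf_aio completion callback once every attached aio has dropped its reference; see the loop target in the following patch for the full version.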
Signed-off-by: Ming Lei --- drivers/block/ublk/bpf.h | 13 +++++++++ drivers/block/ublk/bpf_ops.c | 51 +++++++++++++++++++++++++++++++++++- drivers/block/ublk/main.c | 5 ---- drivers/block/ublk/ublk.h | 6 +++++ 4 files changed, 69 insertions(+), 6 deletions(-) diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h index 0ab25743ae7d..a3d238bc707d 100644 --- a/drivers/block/ublk/bpf.h +++ b/drivers/block/ublk/bpf.h @@ -99,6 +99,9 @@ static inline void ublk_bpf_io_dec_ref(struct ublk_bpf_io *io) ubq->bpf_ops->release_io_cmd(io); } + if (test_bit(UBLK_BPF_BVEC_ALLOCATED, &io->flags)) + kvfree(io->buf.bvec); + if (test_bit(UBLK_BPF_IO_COMPLETED, &io->flags)) { smp_rmb(); __clear_bit(UBLK_BPF_IO_PREP, &io->flags); @@ -158,6 +161,11 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) return ublk_get_bpf_io_cb_daemon(ubq); } +static inline bool ublk_support_bpf_aio(const struct ublk_queue *ubq) +{ + return ublk_support_bpf(ubq) && ubq->bpf_aio_ops; +} + int ublk_bpf_init(void); int ublk_bpf_struct_ops_init(void); int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer); @@ -190,6 +198,11 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq) return NULL; } +static inline bool ublk_support_bpf_aio(const struct ublk_queue *ubq) +{ + return false; +} + static inline int ublk_bpf_init(void) { return 0; diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c index 05d8d415b30d..7085eab5e99b 100644 --- a/drivers/block/ublk/bpf_ops.c +++ b/drivers/block/ublk/bpf_ops.c @@ -155,6 +155,49 @@ void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer) mutex_unlock(&ublk_bpf_ops_lock); } +static int ublk_bpf_aio_prep_io_buf(const struct request *req) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu((struct request *)req); + struct ublk_bpf_io *io = &data->bpf_data; + struct req_iterator rq_iter; + struct bio_vec *bvec; + struct bio_vec bv; + unsigned offset; + + io->buf.bvec = NULL; + io->buf.nr_bvec = 0; + + if (!ublk_rq_has_data(req)) + return 0; + + rq_for_each_bvec(bv, req, rq_iter) + io->buf.nr_bvec++; + + if (!io->buf.nr_bvec) + return 0; + + if (req->bio != req->biotail) { + int idx = 0; + + bvec = kvmalloc_array(io->buf.nr_bvec, sizeof(struct bio_vec), + GFP_NOIO); + if (!bvec) + return -ENOMEM; + + offset = 0; + rq_for_each_bvec(bv, req, rq_iter) + bvec[idx++] = bv; + __set_bit(UBLK_BPF_BVEC_ALLOCATED, &io->flags); + } else { + struct bio *bio = req->bio; + + offset = bio->bi_iter.bi_bvec_done; + bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); + } + io->buf.bvec = bvec; + io->buf.bvec_off = offset; + return 0; +} static void ublk_bpf_prep_io(struct ublk_bpf_io *io, const struct ublksrv_io_desc *iod) @@ -180,8 +223,14 @@ bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req, bool res = true; int err; - if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags)) + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags)) { ublk_bpf_prep_io(bpf_io, iod); + if (ublk_support_bpf_aio(ubq)) { + err = ublk_bpf_aio_prep_io_buf(req); + if (err) + goto fail; + } + } do { enum ublk_bpf_disposition rc; diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c index 3c2ed9bf924d..1974ebd33ce0 100644 --- a/drivers/block/ublk/main.c +++ b/drivers/block/ublk/main.c @@ -512,11 +512,6 @@ void ublk_put_device(struct ublk_device *ub) put_device(&ub->cdev_dev); } -static inline bool ublk_rq_has_data(const struct request *rq) -{ - return bio_has_data(rq->bio); -} - static inline char *ublk_queue_cmd_buf(struct ublk_device *ub, 
int q_id) { return ublk_get_queue(ub, q_id)->io_cmd_buf; diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h index 4bd04512c894..00b09589d95c 100644 --- a/drivers/block/ublk/ublk.h +++ b/drivers/block/ublk/ublk.h @@ -41,6 +41,7 @@ enum { UBLK_BPF_IO_PREP = 0, UBLK_BPF_IO_COMPLETED = 1, + UBLK_BPF_BVEC_ALLOCATED = 2, }; struct ublk_bpf_io { @@ -215,6 +216,11 @@ static inline bool ublk_dev_support_bpf_aio(const struct ublk_device *ub) return ub->params.bpf.flags & UBLK_BPF_HAS_AIO_OPS_ID; } +static inline bool ublk_rq_has_data(const struct request *rq) +{ + return bio_has_data(rq->bio); +} + struct ublk_device *ublk_get_device(struct ublk_device *ub); struct ublk_device *ublk_get_device_from_id(int idx); void ublk_put_device(struct ublk_device *ub); From patchwork Tue Jan 7 12:04:11 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928808 Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 933631EE7B4; Tue, 7 Jan 2025 12:09:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251792; cv=none; b=ZGSXuQqILfRBYVKf7JEpl8lmwYdLGAbboME+PFVK1KNMA49/KA07wOFHK3OJZPQpguO1IbbW9lp4+pPFXjuzk2tX2R275hkdlkPoyKSbrL6nwvJ6XS7ubinC9qsMj0YIug/OCvaFA90PD0Wt79kepTflUkNElbC6e8xmwDMLZCQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251792; c=relaxed/simple; bh=f5L3NOtDgrIxKxz+MIuEtL6HHvK1MQWt487MDa2bWI8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=o9Pt17trSgt+gTZdZxvBfE1iCSjW9eznuaGhs7NLtWceJNpaS5sxsl2LHFdRGiCrG5JHc4CVIdyD8ozBqb6u7e5N/BbkmQwLRZ6PiC2Zo10jPEhInlMkiXKCxSA+2+ezMwTuEdxm2UqL/3qCZ+81IWmBzp8cOzbZD5dTYp6Y2o8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mc/8B4EL; arc=none smtp.client-ip=209.85.214.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mc/8B4EL" Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-21636268e43so45719895ad.2; Tue, 07 Jan 2025 04:09:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251738; x=1736856538; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3BuNFZWGqYgSgLMfecofkQzEbOmOZM/5htkp9yJ0Ubs=; b=mc/8B4ELowCq+NBsdTQwy1ysL3c0Zg6F1FIQj2dpXw3+jzD7cvU+w1F0W27mDKRA+2 DJq5X5mUid0fVVLiCuiH3YNUC2k+1zi5QHgZ/zFTpC/t2+M/LOywbKc9PmcnFIJgJbT2 BfNilZCZAlM6tzpq1/vGtl3XOVD31TWK86Wd2JMCGmFub7E0NdCetkYo/dN49QAThbvn VTJuUkd10RySjn0xjKqApSwuqwCI40rS9GW0s/mUkQy5iHtSCuAJNAFQywTHZi7KFTEB C+ARXRHOgdMO8o+neiqGFqnPAvKcp5WxGp23HVkIee6mKd8XnetL6H10Pof33STcH9AH CS1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251738; x=1736856538; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3BuNFZWGqYgSgLMfecofkQzEbOmOZM/5htkp9yJ0Ubs=; b=TS0rTCitAfwMCswxxOlBZrVbkvYM9jckxoiMrnDtQoZmJ+W2Gmge7ETnJLhuapyfgi pzDoyeW2CDeV3v766i1ah2Fba8lTKPsIjEEGs5WG8tD0+S8ze5gBDwU+fMxetkYU5w2V o/fm75goyiZb/BX8Q+Scq+c1w7ps73Qh2MRObmjH08lE5JnhabVuBlSmwSjl7HIVByaA pvkHYvfcm+CqZt4lGZ5LyawT/VkhjoM+GivKmk/ipCqgi+uF1Kx0MCYJYlQUcHXyK+a5 xmFvpDFcH7U/Ibx8qwpr1ob7WBiiwZsRxdGTToDC9i5PMnfoYC54pqlj35eiqnGF+1cY CUdA== X-Forwarded-Encrypted: i=1; AJvYcCX50qiIkWN8WzhCRaIknJXyRP1YKhOH7gdyzkWiy+B63wtaAdbmmPddISEOvF/CO292XulUqSm/Z9iJjQ==@vger.kernel.org X-Gm-Message-State: AOJu0YzNUFXoflpv/tkl09VoxQjaIsQbmf7/x0JiHGmDMhr7+B8aZhde 555YMxXPlkE1/hrIycU+0erchJOkdp5ZxNAQVyZEfWiG3lSTmpHI X-Gm-Gg: ASbGnctNkmCAiMCGO15MXPHbNaylPQdlwL9P4VyTRvUzYdhZwsuM3WhW50OgW9SXOkj 1gNvPOvFY8Df87oG4ZluA8PSJwDTzJdZ8NaDvCq9EKv+VcvSa8NoMf9VBhfL0++Q4hhRC1FHoq+ iR58j+T0imDfEo855yox5nECnghxHjjhVMktTcjBp6BMekab9sC/evFbJ8uoYouWnLfN5cxmrgA xqyOX7NkK3GFJ8ny5iZtzZ4MoGTUWR95jefY0uTnQN2be2pZXAwlV5Vj6zf+2NbActF X-Google-Smtp-Source: AGHT+IGENnGhBxrC+8M9hZHMO9JRu9R3U8nGFcolu2U0Lqgy068q45LLaSI4e7pHEXjT8WUH7Mdzsw== X-Received: by 2002:a05:6a21:6d86:b0:1e0:ae58:2945 with SMTP id adf61e73a8af0-1e5e081179bmr115955173637.31.1736251738479; Tue, 07 Jan 2025 04:08:58 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:08:57 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 20/22] selftests: add tests for ublk bpf aio Date: Tue, 7 Jan 2025 20:04:11 +0800 Message-ID: <20250107120417.1237392-21-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Create ublk loop target which uses bpf aio to submit & complete FS IO, then run write & read & verify on the ublk loop disk for making sure ublk bpf aio works as expected. 
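For reference, the added scripts exercise the target roughly as sketched below (hand-condensed from test_loop_01.sh; the value 16 passed to --bpf_prog/--bpf_aio_prog must match the .id fields of loop_ublk_bpf_ops and loop_ublk_bpf_aio_ops in ublk_loop.c, and ends up in the device's bpf params as ops_id/aio_ops_id):

	# register & pin the "loop" struct_ops progs, then create the ublk disk
	_prep_bpf_test "loop" ublk_loop.bpf.o
	backfile=$(_create_backfile 256M)
	_add_ublk_dev -t loop -n 0 --bpf_prog 16 --bpf_aio_prog 16 --quiet "$backfile"

	# write the whole disk with O_DIRECT and verify it with crc32c
	fio --name=write_and_verify --filename=/dev/ublkb0 \
		--ioengine=libaio --iodepth=4 --rw=write --size=256M \
		--direct=1 --verify=crc32c --do_verify=1 --bs=4k

	# unpin/unregister the progs and drop the backing file
	_cleanup_bpf_test "loop"
	_remove_backfile "$backfile"

test_loop_02.sh follows the same pattern but replaces the fio run with a mkfs.ext4 + mount smoke test.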
Signed-off-by: Ming Lei --- tools/testing/selftests/ublk/Makefile | 3 + .../selftests/ublk/progs/ublk_bpf_kfunc.h | 11 ++ .../testing/selftests/ublk/progs/ublk_loop.c | 166 ++++++++++++++++++ tools/testing/selftests/ublk/test_common.sh | 47 +++++ tools/testing/selftests/ublk/test_loop_01.sh | 33 ++++ tools/testing/selftests/ublk/test_loop_02.sh | 24 +++ tools/testing/selftests/ublk/ublk_bpf.c | 141 ++++++++++++++- 7 files changed, 419 insertions(+), 6 deletions(-) create mode 100644 tools/testing/selftests/ublk/progs/ublk_loop.c create mode 100755 tools/testing/selftests/ublk/test_loop_01.sh create mode 100755 tools/testing/selftests/ublk/test_loop_02.sh diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile index 38903f05d99d..2540ae7a75a3 100644 --- a/tools/testing/selftests/ublk/Makefile +++ b/tools/testing/selftests/ublk/Makefile @@ -24,6 +24,9 @@ TEST_PROGS += test_null_02.sh TEST_PROGS += test_null_03.sh TEST_PROGS += test_null_04.sh +TEST_PROGS += test_loop_01.sh +TEST_PROGS += test_loop_02.sh + # Order correspond to 'make run_tests' order TEST_GEN_PROGS_EXTENDED = ublk_bpf diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h index 1db8870b57d6..9fb134e40d49 100644 --- a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h +++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h @@ -21,6 +21,17 @@ extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym; extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym; extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym; +extern void ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio) __ksym; +extern int ublk_bpf_attach_and_prep_aio(const struct ublk_bpf_io *_io, unsigned off, unsigned bytes, struct bpf_aio *aio) __ksym; +extern struct ublk_bpf_io *ublk_bpf_acquire_io_from_aio(struct bpf_aio *aio) __ksym; +extern void ublk_bpf_release_io_from_aio(struct ublk_bpf_io *io) __ksym; + +extern struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag flags) __ksym; +extern struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag flags) __ksym; +extern void bpf_aio_release(struct bpf_aio *aio) __ksym; +extern int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, + unsigned bytes, unsigned io_flags) __ksym; + static inline unsigned long long build_io_key(const struct ublk_bpf_io *io) { unsigned long long dev_id = (unsigned short)ublk_bpf_get_dev_id(io); diff --git a/tools/testing/selftests/ublk/progs/ublk_loop.c b/tools/testing/selftests/ublk/progs/ublk_loop.c new file mode 100644 index 000000000000..952caf7b7399 --- /dev/null +++ b/tools/testing/selftests/ublk/progs/ublk_loop.c @@ -0,0 +1,166 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include +#include +#include +#include + +//#define DEBUG +#include "ublk_bpf.h" + +/* libbpf v1.4.5 is required for struct_ops to work */ + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 128); + __type(key, unsigned int); /* dev id */ + __type(value, int); /* backing file fd */ +} fd_map SEC(".maps"); + +static inline void ublk_loop_comp_and_release_aio(struct bpf_aio *aio, int ret) +{ + struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio); + + ublk_bpf_complete_io(io, ret); + ublk_bpf_release_io_from_aio(io); + + ublk_bpf_dettach_and_complete_aio(aio); + bpf_aio_release(aio); +} + +SEC("struct_ops/bpf_aio_complete_cb") +void BPF_PROG(ublk_loop_comp_cb, struct bpf_aio *aio, long ret) +{ 
+ BPF_DBG("aio result %d, back_file %s pos %llx", ret, + aio->iocb.ki_filp->f_path.dentry->d_name.name, + aio->iocb.ki_pos); + ublk_loop_comp_and_release_aio(aio, ret); +} + +SEC(".struct_ops.link") +struct bpf_aio_complete_ops loop_ublk_bpf_aio_ops = { + .id = 16, + .bpf_aio_complete_cb = (void *)ublk_loop_comp_cb, +}; + +static inline int ublk_loop_submit_backing_io(const struct ublk_bpf_io *io, + const struct ublksrv_io_desc *iod, int backing_fd) +{ + unsigned int op_flags = 0; + struct bpf_aio *aio; + int res = -EINVAL; + int op; + + /* translate ublk opcode into backing file's */ + switch (iod->op_flags & 0xff) { + case 0 /*UBLK_IO_OP_READ*/: + op = BPF_AIO_OP_FS_READ; + break; + case 1 /*UBLK_IO_OP_WRITE*/: + op = BPF_AIO_OP_FS_WRITE; + break; + case 2 /*UBLK_IO_OP_FLUSH*/: + op = BPF_AIO_OP_FS_FSYNC; + break; + case 3 /*UBLK_IO_OP_DISCARD*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; + break; + case 4 /*UBLK_IO_OP_WRITE_SAME*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; + break; + case 5 /*UBLK_IO_OP_WRITE_ZEROES*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE; + break; + default: + return -EINVAL; + } + + res = -ENOMEM; + aio = bpf_aio_alloc(op, 0); + if (!aio) + goto fail; + + /* attach aio into the specified range of this io command */ + res = ublk_bpf_attach_and_prep_aio(io, 0, iod->nr_sectors << 9, aio); + if (res < 0) { + bpf_printk("bpf aio attaching failed %d\n", res); + goto fail; + } + + /* submit this aio onto the backing file */ + res = bpf_aio_submit(aio, backing_fd, iod->start_sector << 9, + iod->nr_sectors << 9, op_flags); + if (res < 0) { + bpf_printk("aio submit failed %d\n", res); + ublk_loop_comp_and_release_aio(aio, res); + } + return 0; +fail: + return res; +} + +static inline ublk_bpf_return_t __ublk_loop_handle_io_cmd(const struct ublk_bpf_io *io, unsigned int off) +{ + const struct ublksrv_io_desc *iod; + int res = -EINVAL; + int fd_key = ublk_bpf_get_dev_id(io); + int *fd; + ublk_bpf_return_t ret = ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + + iod = ublk_bpf_get_iod(io); + if (!iod) { + ublk_bpf_complete_io(io, res); + return ret; + } + + BPF_DBG("ublk dev %u qid %u: handle io cmd tag %u op %u %lx-%d off %u", + ublk_bpf_get_dev_id(io), + ublk_bpf_get_queue_id(io), + ublk_bpf_get_io_tag(io), + iod->op_flags & 0xff, + iod->start_sector << 9, + iod->nr_sectors << 9, off); + + /* retrieve backing file descriptor */ + fd = bpf_map_lookup_elem(&fd_map, &fd_key); + if (!fd) { + bpf_printk("can't get FD from %d\n", fd_key); + return ret; + } + + /* handle this io command by submitting IOs on backing file */ + res = ublk_loop_submit_backing_io(io, iod, *fd); + +exit: + /* io cmd can't be completes until this reference is dropped */ + if (res < 0) + ublk_bpf_complete_io(io, io->res); + + return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); +} + +SEC("struct_ops/ublk_bpf_release_io_cmd") +void BPF_PROG(ublk_loop_release_io_cmd, struct ublk_bpf_io *io) +{ + BPF_DBG("%s: released io command %d", __func__, io->res); +} + +SEC("struct_ops.s/ublk_bpf_queue_io_cmd_daemon") +ublk_bpf_return_t BPF_PROG(ublk_loop_handle_io_cmd, struct ublk_bpf_io *io, unsigned int off) +{ + return __ublk_loop_handle_io_cmd(io, off); +} + +SEC(".struct_ops.link") +struct ublk_bpf_ops loop_ublk_bpf_ops = { + .id = 16, + .queue_io_cmd_daemon = (void *)ublk_loop_handle_io_cmd, + .release_io_cmd = (void *)ublk_loop_release_io_cmd, +}; + +char LICENSE[] SEC("license") 
= "GPL"; diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh index 466b82e77860..4727a6ec9734 100755 --- a/tools/testing/selftests/ublk/test_common.sh +++ b/tools/testing/selftests/ublk/test_common.sh @@ -70,3 +70,50 @@ _add_ublk_dev() { fi udevadm settle } + +_create_backfile() { + local my_size=$1 + local my_file=`mktemp ublk_bpf_${my_size}_XXXXX` + + truncate -s ${my_size} ${my_file} + echo $my_file +} + +_remove_backfile() { + local file=$1 + + [ -f "$file" ] && rm -f $file +} + +_create_tmp_dir() { + local my_file=`mktemp -d ublk_bpf_dir_XXXXX` + + echo $my_file +} + +_remove_tmp_dir() { + local dir=$1 + + [ -d "$dir" ] && rmdir $dir +} + +_mkfs_mount_test() +{ + local dev=$1 + local err_code=0 + local mnt_dir=`_create_tmp_dir` + + mkfs.ext4 -F $dev > /dev/null 2>&1 + err_code=$? + if [ $err_code -ne 0 ]; then + return $err_code + fi + + mount -t ext4 $dev $mnt_dir > /dev/null 2>&1 + umount $dev + err_code=$? + _remove_tmp_dir $mnt_dir + if [ $err_code -ne 0 ]; then + return $err_code + fi +} diff --git a/tools/testing/selftests/ublk/test_loop_01.sh b/tools/testing/selftests/ublk/test_loop_01.sh new file mode 100755 index 000000000000..10c73ec0a01a --- /dev/null +++ b/tools/testing/selftests/ublk/test_loop_01.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +. test_common.sh + +TID="loop_01" +ERR_CODE=0 + +# prepare & register and pin bpf prog +_prep_bpf_test "loop" ublk_loop.bpf.o + +backfile_0=`_create_backfile 256M` + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t loop -n 0 --bpf_prog 16 --bpf_aio_prog 16 --quiet $backfile_0 + +# run fio over the ublk disk +fio --name=write_and_verify \ + --filename=/dev/ublkb0 \ + --ioengine=libaio --iodepth=4 \ + --rw=write \ + --size=256M \ + --direct=1 \ + --verify=crc32c \ + --do_verify=1 \ + --bs=4k > /dev/null 2>&1 +ERR_CODE=$? + +# cleanup & unregister and unpin the bpf prog +_cleanup_bpf_test "loop" + +_remove_backfile $backfile_0 + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/test_loop_02.sh b/tools/testing/selftests/ublk/test_loop_02.sh new file mode 100755 index 000000000000..05c3a863f517 --- /dev/null +++ b/tools/testing/selftests/ublk/test_loop_02.sh @@ -0,0 +1,24 @@ +#!/bin/bash + +. test_common.sh + +TID="loop_02" +ERR_CODE=0 + +# prepare & register and pin bpf prog +_prep_bpf_test "loop" ublk_loop.bpf.o + +backfile_0=`_create_backfile 256M` + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t loop -n 0 --bpf_prog 16 --bpf_aio_prog 16 --quiet $backfile_0 + +_mkfs_mount_test /dev/ublkb0 +ERR_CODE=$? 
+ +# cleanup & unregister and unpin the bpf prog +_cleanup_bpf_test "loop" + +_remove_backfile $backfile_0 + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c index e2c2e92268e1..c24d5e18a1b1 100644 --- a/tools/testing/selftests/ublk/ublk_bpf.c +++ b/tools/testing/selftests/ublk/ublk_bpf.c @@ -64,6 +64,7 @@ struct dev_ctx { int nr_files; char *files[MAX_BACK_FILES]; int bpf_prog_id; + int bpf_aio_prog_id; unsigned int logging:1; unsigned int all:1; }; @@ -107,7 +108,10 @@ struct ublk_tgt { unsigned int cq_depth; const struct ublk_tgt_ops *ops; struct ublk_params params; - char backing_file[1024 - 8 - sizeof(struct ublk_params)]; + + int nr_backing_files; + unsigned long backing_file_size[MAX_BACK_FILES]; + char backing_file[MAX_BACK_FILES][PATH_MAX]; }; struct ublk_queue { @@ -133,12 +137,13 @@ struct ublk_dev { struct ublksrv_ctrl_dev_info dev_info; struct ublk_queue q[UBLK_MAX_QUEUES]; - int fds[2]; /* fds[0] points to /dev/ublkcN */ + int fds[MAX_BACK_FILES + 1]; /* fds[0] points to /dev/ublkcN */ int nr_fds; int ctrl_fd; struct io_uring ring; int bpf_prog_id; + int bpf_aio_prog_id; }; #ifndef offsetof @@ -983,7 +988,7 @@ static int cmd_dev_add(struct dev_ctx *ctx) struct ublk_dev *dev; int dev_id = ctx->dev_id; char ublkb[64]; - int ret; + int ret, i; ops = ublk_find_tgt(tgt_type); if (!ops) { @@ -1022,6 +1027,13 @@ static int cmd_dev_add(struct dev_ctx *ctx) dev->tgt.sq_depth = depth; dev->tgt.cq_depth = depth; dev->bpf_prog_id = ctx->bpf_prog_id; + dev->bpf_aio_prog_id = ctx->bpf_aio_prog_id; + for (i = 0; i < MAX_BACK_FILES; i++) { + if (ctx->files[i]) { + strcpy(dev->tgt.backing_file[i], ctx->files[i]); + dev->tgt.nr_backing_files++; + } + } ret = ublk_ctrl_add_dev(dev); if (ret < 0) { @@ -1271,14 +1283,14 @@ static int cmd_dev_reg_bpf(struct dev_ctx *ctx) static int cmd_dev_help(char *exe) { - printf("%s add -t [null] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [backfile1] [backfile2] ...\n", exe); + printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe); printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n"); printf("%s del [-n dev_id] -a \n", exe); printf("\t -a delete all devices -n delete specified device\n"); printf("%s list [-n dev_id] -a \n", exe); printf("\t -a list all devices, -n list specified device, default -a \n"); - printf("%s reg -t [null] bpf_prog_obj_path \n", exe); - printf("%s unreg -t [null]\n", exe); + printf("%s reg -t [null|loop] bpf_prog_obj_path \n", exe); + printf("%s unreg -t [null|loop]\n", exe); return 0; } @@ -1356,12 +1368,125 @@ static int ublk_null_queue_io(struct ublk_queue *q, int tag) return 0; } +static void backing_file_tgt_deinit(struct ublk_dev *dev) +{ + int i; + + for (i = 1; i < dev->nr_fds; i++) { + fsync(dev->fds[i]); + close(dev->fds[i]); + } +} + +static int backing_file_tgt_init(struct ublk_dev *dev) +{ + int fd, i; + + assert(dev->nr_fds == 1); + + for (i = 0; i < dev->tgt.nr_backing_files; i++) { + char *file = dev->tgt.backing_file[i]; + unsigned long bytes; + struct stat st; + + ublk_dbg(UBLK_DBG_DEV, "%s: file %d: %s\n", __func__, i, file); + + fd = open(file, O_RDWR | O_DIRECT); + if (fd < 0) { + ublk_err("%s: backing file %s can't be opened: %s\n", + __func__, file, strerror(errno)); + return -EBADF; + } + + if (fstat(fd, &st) < 0) { + close(fd); + return -EBADF; + } + + if 
(S_ISREG(st.st_mode)) + bytes = st.st_size; + else if (S_ISBLK(st.st_mode)) { + if (ioctl(fd, BLKGETSIZE64, &bytes) != 0) + return -1; + } else { + return -EINVAL; + } + + dev->tgt.backing_file_size[i] = bytes; + dev->fds[dev->nr_fds] = fd; + dev->nr_fds += 1; + } + + return 0; +} + +static int loop_bpf_setup_fd(unsigned dev_id, int fd) +{ + int map_fd; + int err; + + map_fd = bpf_obj_get("/sys/fs/bpf/ublk/loop/fd_map"); + if (map_fd < 0) { + ublk_err("Error getting map file descriptor from pinned map\n"); + return -EINVAL; + } + + err = bpf_map_update_elem(map_fd, &dev_id, &fd, BPF_ANY); + if (err) { + ublk_err("Error updating map element: %d\n", errno); + return -EINVAL; + } + + return 0; +} + +static int ublk_loop_tgt_init(struct ublk_dev *dev) +{ + unsigned long long bytes; + int ret; + struct ublk_params p = { + .types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF, + .basic = { + .logical_bs_shift = 9, + .physical_bs_shift = 12, + .io_opt_shift = 12, + .io_min_shift = 9, + .max_sectors = dev->dev_info.max_io_buf_bytes >> 9, + }, + .bpf = { + .flags = UBLK_BPF_HAS_OPS_ID | UBLK_BPF_HAS_AIO_OPS_ID, + .ops_id = dev->bpf_prog_id, + .aio_ops_id = dev->bpf_aio_prog_id, + }, + }; + + assert(dev->tgt.nr_backing_files == 1); + ret = backing_file_tgt_init(dev); + if (ret) + return ret; + + assert(loop_bpf_setup_fd(dev->dev_info.dev_id, dev->fds[1]) == 0); + + bytes = dev->tgt.backing_file_size[0]; + dev->tgt.dev_size = bytes; + p.basic.dev_sectors = bytes >> 9; + dev->tgt.params = p; + + return 0; +} + + static const struct ublk_tgt_ops tgt_ops_list[] = { { .name = "null", .init_tgt = ublk_null_tgt_init, .queue_io = ublk_null_queue_io, }, + { + .name = "loop", + .init_tgt = ublk_loop_tgt_init, + .deinit_tgt = backing_file_tgt_deinit, + }, }; static const struct ublk_tgt_ops *ublk_find_tgt(const char *name) @@ -1389,6 +1514,7 @@ int main(int argc, char *argv[]) { "debug_mask", 1, NULL, 0 }, { "quiet", 0, NULL, 0 }, { "bpf_prog", 1, NULL, 0 }, + { "bpf_aio_prog", 1, NULL, 0 }, { 0, 0, 0, 0 } }; int option_idx, opt; @@ -1398,6 +1524,7 @@ int main(int argc, char *argv[]) .nr_hw_queues = 2, .dev_id = -1, .bpf_prog_id = -1, + .bpf_aio_prog_id = -1, }; int ret = -EINVAL, i; @@ -1433,6 +1560,8 @@ int main(int argc, char *argv[]) ctx.bpf_prog_id = strtol(optarg, NULL, 10); ctx.flags |= UBLK_F_BPF; } + if (!strcmp(longopts[option_idx].name, "bpf_aio_prog")) + ctx.bpf_aio_prog_id = strtol(optarg, NULL, 10); break; } } From patchwork Tue Jan 7 12:04:12 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 13928809 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 341BE1EE7C6; Tue, 7 Jan 2025 12:09:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251796; cv=none; b=Wfo03m9hqv64xAcy1dI4dg+2Jmq/DXd0KRN8R8jvgM8GAKRHQCuZP0PCt4LCtfptYBRIHWI6LOleUkEtBRHJHdP5hDx/PP2gUrYS89dUfO4azYwPxwCw0ssnH0TqT5TGGpKlvt7YvtAXbqloHe3JOKjzZk73ErAQs6V8UfYkMf4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736251796; c=relaxed/simple; bh=0z1zGJsu3WN74tcxO3hAJKjwCJ3Cp5632Mlb8ahdQXQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=ZraiM++qQ9vxUtms0tCwO5ODeuG4UeR7d2INnZpzrb7cLA3c64pXVKZ3sPTkiiz1KFr/6owKS578tSs1Wpzfaqf2hNHttWdJhWemydQScfZy0UMudGX5Sj9OeJ5Q2gHh1RYA/Au6RbJReqzzv3jxO7Cre58zGHWXZbKxXnNNgkU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=AuijT5Qc; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="AuijT5Qc" Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-2166651f752so29428145ad.3; Tue, 07 Jan 2025 04:09:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1736251742; x=1736856542; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9huKsVEbf1v8YAcptwSkkhP1blRdS46Sxfnig2SU0AQ=; b=AuijT5Qc9TRDsZ+w4WQcQgDUMUX8paPBGVV1VKI7RX+7DQAwghytFnVEhbHuHdavT2 vbVFjkXlDyuD5UmW52kwomacyH19oThrmzS5MlWtJnZdxmUmYz8gql5hIDHJgf4SSWDY B4csB7u6TsGZbDYMicU8E557cTMvcQrlsuuigfYraqUf0+L8TFAjvVe3pVnUsGjBtIMv PmfsPNZm6z3dHI+Glmfq96Wtg+VmgDEQMBur+29k/sxx9jXQvSmLgwV9jlFufAP3MbBs vhbDvsp0qmRI+/JxmtmX1QUK7g+lrkhytBjNaM3it90+8N3V+5T3KrvY6X0F8dybki1l BnZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736251742; x=1736856542; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9huKsVEbf1v8YAcptwSkkhP1blRdS46Sxfnig2SU0AQ=; b=eKmdDKKz/GLe/vlqkFw0kti75X613oQlyATxK91QMVr6cIFku+H9B75xNZS8uPKPAA 4GALun9posrkYM/uRFsf4j6L3Ucgbx2UwucYAwvZhq6EecV/PlSR7zYoEbHT2SdfshS8 46pDMc/e8LafQcx1lICb8oFK8hSDK3hDuHJHaV7zWYbWCIDyjgplx72g3dQ3k/3ATzpF 3FomXfMf6jpzL2ZH7+OVNNn2vr48VI74wPCnfufqTYucwxVyGRUBIBH/1Cv25c0TuUix 8pfNJQHZ1b4hOzPbVZktxeeziIwtccv53QZvRrHGsygCRQCtglvLOIgEcFN8V+VatIka kkiw== X-Forwarded-Encrypted: i=1; AJvYcCV6NSxZhpmyWmeI9my/+HcRjZry4xb/4Oc5wOsk83sCrXlOsdOcxV74KsoFN2lF/DwYvJSsrkHDtj7HEg==@vger.kernel.org X-Gm-Message-State: AOJu0YydPffWgZwVUle7yczKn0vKehrNancOwDxc94AnJfyFBoTiccH/ kog9QATXaY1DoBQWt/rY9n+JK2eFve4QX9W0PTMG5WJmBBr6A2E6 X-Gm-Gg: ASbGnctO6Af/xYRJsk6sb7ANSxOwJD8672dZDZT7LM74Ca8v7UX4zmqY1Q/G8T0V6gn 7tWaOBujUG7hyAiKywx8CqeT4fugbhrw8fG9aFsRBJwkpviQk/xElJYDRin5OnEo13X/XEmYhKP QaWZsSmtQeF1k0yoddyLa0XciARGvRveRtgiXxsUs3RsOpAzUwYWcsI+Egi0hnTAwFkzFiYB6km bC2as/rtknp0E5t7nVhD8t4kGZw8OU8wq/gU/Jo7GOgQNkCm0i+YwxMTf3vY/ZksLdw X-Google-Smtp-Source: AGHT+IGvaP49H23xLy+bsto3vxdfgvz+Gvvf70cc/Eo51fyAsqd6bFhIe7C0yzv+9ObH3BhuUjMIsw== X-Received: by 2002:a05:6a20:6a20:b0:1e0:d99f:7ad3 with SMTP id adf61e73a8af0-1e5e084b3f5mr95001497637.44.1736251741628; Tue, 07 Jan 2025 04:09:01 -0800 (PST) Received: from fedora.redhat.com ([2001:250:3c1e:503:ffff:ffff:ffea:4903]) by smtp.googlemail.com with ESMTPSA id d2e1a72fcca58-72aad835b8dsm34245118b3a.63.2025.01.07.04.08.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Jan 2025 04:09:01 -0800 (PST) From: Ming Lei To: Jens Axboe , linux-block@vger.kernel.org Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau , Yonghong Song , Ming Lei Subject: [RFC PATCH 21/22] selftests: add tests for 
covering both bpf aio and split Date: Tue, 7 Jan 2025 20:04:12 +0800 Message-ID: <20250107120417.1237392-22-tom.leiming@gmail.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com> References: <20250107120417.1237392-1-tom.leiming@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC Add ublk-stripe for covering both bpf aio and io split features. Signed-off-by: Ming Lei --- tools/testing/selftests/ublk/Makefile | 3 + .../selftests/ublk/progs/ublk_stripe.c | 319 ++++++++++++++++++ .../testing/selftests/ublk/test_stripe_01.sh | 35 ++ .../testing/selftests/ublk/test_stripe_02.sh | 26 ++ tools/testing/selftests/ublk/ublk_bpf.c | 88 ++++- 5 files changed, 468 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/ublk/progs/ublk_stripe.c create mode 100755 tools/testing/selftests/ublk/test_stripe_01.sh create mode 100755 tools/testing/selftests/ublk/test_stripe_02.sh diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile index 2540ae7a75a3..7c30c5728694 100644 --- a/tools/testing/selftests/ublk/Makefile +++ b/tools/testing/selftests/ublk/Makefile @@ -27,6 +27,9 @@ TEST_PROGS += test_null_04.sh TEST_PROGS += test_loop_01.sh TEST_PROGS += test_loop_02.sh +TEST_PROGS += test_stripe_01.sh +TEST_PROGS += test_stripe_02.sh + # Order correspond to 'make run_tests' order TEST_GEN_PROGS_EXTENDED = ublk_bpf diff --git a/tools/testing/selftests/ublk/progs/ublk_stripe.c b/tools/testing/selftests/ublk/progs/ublk_stripe.c new file mode 100644 index 000000000000..98a59239047c --- /dev/null +++ b/tools/testing/selftests/ublk/progs/ublk_stripe.c @@ -0,0 +1,319 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "vmlinux.h" +#include +#include +#include +#include +#include + +//#define DEBUG +#include "ublk_bpf.h" + +/* libbpf v1.4.5 is required for struct_ops to work */ + +struct ublk_stripe { +#define MAX_BACKFILES 4 + unsigned char chunk_shift; + unsigned char nr_backfiles; + int fds[MAX_BACKFILES]; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 128); + __type(key, unsigned int); /* dev id */ + __type(value, struct ublk_stripe); /* stripe setting */ +} stripe_map SEC(".maps"); + +/* todo: make it writable payload of ublk_bpf_io */ +struct ublk_io_payload { + unsigned int ref; + int res; +}; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, unsigned long long); /* dev_id + q_id + tag */ + __type(value, struct ublk_io_payload); /* io payload */ +} io_map SEC(".maps"); + +static inline void dec_stripe_io_ref(const struct ublk_bpf_io *io, struct ublk_io_payload *pv, int ret) +{ + if (!pv) + return; + + if (pv->res >= 0) + pv->res = ret; + + if (!__sync_sub_and_fetch(&pv->ref, 1)) { + unsigned rw = (io->iod->op_flags & 0xff); + + if (pv->res >= 0 && (rw <= 1)) + pv->res = io->iod->nr_sectors << 9; + ublk_bpf_complete_io(io, pv->res); + } +} + +static inline void ublk_stripe_comp_and_release_aio(struct bpf_aio *aio, int ret) +{ + struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio); + struct ublk_io_payload *pv = NULL; + unsigned long long io_key = build_io_key(io); + + if (!io) + return; + + io_key = build_io_key(io); + pv = bpf_map_lookup_elem(&io_map, &io_key); + + /* drop reference for each underlying aio */ + dec_stripe_io_ref(io, pv, ret); + ublk_bpf_release_io_from_aio(io); + + ublk_bpf_dettach_and_complete_aio(aio); + bpf_aio_release(aio); +} + 
+SEC("struct_ops/bpf_aio_complete_cb") +void BPF_PROG(ublk_stripe_comp_cb, struct bpf_aio *aio, long ret) +{ + BPF_DBG("aio result %d, back_file %s pos %llx", ret, + aio->iocb.ki_filp->f_path.dentry->d_name.name, + aio->iocb.ki_pos); + ublk_stripe_comp_and_release_aio(aio, ret); +} + +SEC(".struct_ops.link") +struct bpf_aio_complete_ops stripe_ublk_bpf_aio_ops = { + .id = 32, + .bpf_aio_complete_cb = (void *)ublk_stripe_comp_cb, +}; + +static inline int ublk_stripe_submit_backing_io(const struct ublk_bpf_io *io, + int backfile_fd, unsigned long backfile_off, + unsigned int backfile_bytes, + unsigned int buf_off) +{ + const struct ublksrv_io_desc *iod = io->iod; + unsigned int op_flags = 0; + struct bpf_aio *aio; + int res = -EINVAL; + int op; + + /* translate ublk opcode into backing file's */ + switch (iod->op_flags & 0xff) { + case 0 /*UBLK_IO_OP_READ*/: + op = BPF_AIO_OP_FS_READ; + break; + case 1 /*UBLK_IO_OP_WRITE*/: + op = BPF_AIO_OP_FS_WRITE; + break; + case 2 /*UBLK_IO_OP_FLUSH*/: + op = BPF_AIO_OP_FS_FSYNC; + break; + case 3 /*UBLK_IO_OP_DISCARD*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; + break; + case 4 /*UBLK_IO_OP_WRITE_SAME*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; + break; + case 5 /*UBLK_IO_OP_WRITE_ZEROES*/: + op = BPF_AIO_OP_FS_FALLOCATE; + op_flags = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE; + break; + default: + return -EINVAL; + } + + res = -ENOMEM; + aio = bpf_aio_alloc(op, 0); + if (!aio) + goto fail; + + /* attach aio into the specified range of this io command */ + res = ublk_bpf_attach_and_prep_aio(io, buf_off, backfile_bytes, aio); + if (res < 0) { + bpf_printk("bpf aio attaching failed %d\n", res); + goto fail; + } + + /* submit this aio onto the backing file */ + res = bpf_aio_submit(aio, backfile_fd, backfile_off, backfile_bytes, op_flags); + if (res < 0) { + bpf_printk("aio submit failed %d\n", res); + ublk_stripe_comp_and_release_aio(aio, res); + } + return 0; +fail: + return res; +} + +static int calculate_backfile_off_bytes(const struct ublk_stripe *stripe, + unsigned long stripe_off, unsigned int stripe_bytes, + unsigned long *backfile_off, + unsigned int *backfile_bytes) +{ + unsigned long chunk_size = 1U << stripe->chunk_shift; + unsigned int nr_bf = stripe->nr_backfiles; + unsigned long unit_chunk_size = nr_bf << stripe->chunk_shift; + unsigned long start_off = stripe_off & ~(chunk_size - 1); + unsigned long unit_start_off = stripe_off & ~(unit_chunk_size - 1); + unsigned int idx = (start_off - unit_start_off) >> stripe->chunk_shift; + + *backfile_bytes = stripe_bytes; + *backfile_off = (unit_start_off / nr_bf) + (idx << stripe->chunk_shift) + (stripe_off - start_off); + + return stripe->fds[idx % MAX_BACKFILES]; +} + +static unsigned int calculate_stripe_off_bytes(const struct ublk_stripe *stripe, + const struct ublksrv_io_desc *iod, unsigned int this_off, + unsigned long *stripe_off) +{ + unsigned long off, next_off; + unsigned int chunk_size = 1U << stripe->chunk_shift; + unsigned int max_size = (iod->nr_sectors << 9) - this_off; + + off = (iod->start_sector << 9) + this_off; + next_off = (off & ~(chunk_size - 1)) + chunk_size;; + + *stripe_off = off; + + if (max_size < next_off - off) + return max_size; + return next_off - off; +} + +static inline ublk_bpf_return_t __ublk_stripe_handle_io_cmd(const struct ublk_bpf_io *io, unsigned int off) +{ + ublk_bpf_return_t ret = ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0); + unsigned long stripe_off, 
backfile_off; + unsigned int stripe_bytes, backfile_bytes; + int dev_id = ublk_bpf_get_dev_id(io); + const struct ublksrv_io_desc *iod; + const struct ublk_stripe *stripe; + int res = -EINVAL; + int backfile_fd; + unsigned long long io_key = build_io_key(io); + struct ublk_io_payload pl = { + .ref = 2, + .res = 0, + }; + struct ublk_io_payload *pv = NULL; + + iod = ublk_bpf_get_iod(io); + if (!iod) { + ublk_bpf_complete_io(io, res); + return ret; + } + + BPF_DBG("ublk dev %u qid %u: handle io cmd tag %u op %u %lx-%d off %u", + ublk_bpf_get_dev_id(io), + ublk_bpf_get_queue_id(io), + ublk_bpf_get_io_tag(io), + iod->op_flags & 0xff, + iod->start_sector << 9, + iod->nr_sectors << 9, off); + + /* retrieve backing file descriptor */ + stripe = bpf_map_lookup_elem(&stripe_map, &dev_id); + if (!stripe) { + bpf_printk("can't get FD from %d\n", dev_id); + return ret; + } + + /* todo: build as big chunk as possible for each underlying files/disks */ + stripe_bytes = calculate_stripe_off_bytes(stripe, iod, off, &stripe_off); + backfile_fd = calculate_backfile_off_bytes(stripe, stripe_off, stripe_bytes, + &backfile_off, &backfile_bytes); + BPF_DBG("\t stripe(%lx %lu) backfile(%d %lx %lu)", + stripe->chunk_shift, stripe->nr_backfiles, + stripe_off, stripe_bytes, + backfile_fd, backfile_off, backfile_bytes); + + if (!stripe_bytes) { + bpf_printk("submit bpf aio failed %d\n", res); + res = -EINVAL; + goto exit; + } + + /* grab one submission reference, and one extra for the whole batch */ + if (!off) { + res = bpf_map_update_elem(&io_map, &io_key, &pl, BPF_ANY); + if (res) { + bpf_printk("update io map element failed %d key %llx\n", res, io_key); + goto exit; + } + } else { + pv = bpf_map_lookup_elem(&io_map, &io_key); + if (pv) + __sync_fetch_and_add(&pv->ref, 1); + } + + /* handle this io command by submitting IOs on backing file */ + res = ublk_stripe_submit_backing_io(io, backfile_fd, backfile_off, backfile_bytes, off); + +exit: + /* io cmd can't be completes until this reference is dropped */ + if (res < 0) { + bpf_printk("submit bpf aio failed %d\n", res); + ublk_bpf_complete_io(io, res); + return ret; + } + + /* drop the extra reference for the whole batch */ + if (off + stripe_bytes == iod->nr_sectors << 9) { + if (!pv) + pv = bpf_map_lookup_elem(&io_map, &io_key); + dec_stripe_io_ref(io, pv, pv ? 
pv->res : 0); + } + + return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, stripe_bytes); +} + +SEC("struct_ops/ublk_bpf_release_io_cmd") +void BPF_PROG(ublk_stripe_release_io_cmd, struct ublk_bpf_io *io) +{ + BPF_DBG("%s: complete io command %d", __func__, io->res); +} + +SEC("struct_ops.s/ublk_bpf_queue_io_cmd_daemon") +ublk_bpf_return_t BPF_PROG(ublk_stripe_handle_io_cmd, struct ublk_bpf_io *io, unsigned int off) +{ + return __ublk_stripe_handle_io_cmd(io, off); +} + +SEC("struct_ops/ublk_bpf_attach_dev") +int BPF_PROG(ublk_stripe_attach_dev, int dev_id) +{ + const struct ublk_stripe *stripe; + + /* retrieve backing file descriptor */ + stripe = bpf_map_lookup_elem(&stripe_map, &dev_id); + if (!stripe) { + bpf_printk("can't get FD from %d\n", dev_id); + return -EINVAL; + } + + if (stripe->nr_backfiles >= MAX_BACKFILES) + return -EINVAL; + + if (stripe->chunk_shift < 12) + return -EINVAL; + + return 0; +} + +SEC(".struct_ops.link") +struct ublk_bpf_ops stripe_ublk_bpf_ops = { + .id = 32, + .attach_dev = (void *)ublk_stripe_attach_dev, + .queue_io_cmd_daemon = (void *)ublk_stripe_handle_io_cmd, + .release_io_cmd = (void *)ublk_stripe_release_io_cmd, +}; + +char LICENSE[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/ublk/test_stripe_01.sh b/tools/testing/selftests/ublk/test_stripe_01.sh new file mode 100755 index 000000000000..3c21f7db495a --- /dev/null +++ b/tools/testing/selftests/ublk/test_stripe_01.sh @@ -0,0 +1,35 @@ +#!/bin/bash + +. test_common.sh + +TID="stripe_01" +ERR_CODE=0 + +# prepare & register and pin bpf prog +_prep_bpf_test "stripe" ublk_stripe.bpf.o + +backfile_0=`_create_backfile 256M` +backfile_1=`_create_backfile 256M` + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t stripe -n 0 --bpf_prog 32 --bpf_aio_prog 32 --quiet $backfile_0 $backfile_1 + +# run fio over the ublk disk +fio --name=write_and_verify \ + --filename=/dev/ublkb0 \ + --ioengine=libaio --iodepth=4 \ + --rw=write \ + --size=256M \ + --direct=1 \ + --verify=crc32c \ + --do_verify=1 \ + --bs=4k > /dev/null 2>&1 +ERR_CODE=$? + +# cleanup & unregister and unpin the bpf prog +_cleanup_bpf_test "stripe" + +_remove_backfile $backfile_0 +_remove_backfile $backfile_1 + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/test_stripe_02.sh b/tools/testing/selftests/ublk/test_stripe_02.sh new file mode 100755 index 000000000000..fdbb81dc53d8 --- /dev/null +++ b/tools/testing/selftests/ublk/test_stripe_02.sh @@ -0,0 +1,26 @@ +#!/bin/bash + +. test_common.sh + +TID="stripe_02" +ERR_CODE=0 + +# prepare & register and pin bpf prog +_prep_bpf_test "stripe" ublk_stripe.bpf.o + +backfile_0=`_create_backfile 256M` +backfile_1=`_create_backfile 256M` + +# add two ublk null disks with the pinned bpf prog +_add_ublk_dev -t stripe -n 0 --bpf_prog 32 --bpf_aio_prog 32 --quiet $backfile_0 $backfile_1 + +_mkfs_mount_test /dev/ublkb0 +ERR_CODE=$? 
+ +# cleanup & unregister and unpin the bpf prog +_cleanup_bpf_test "stripe" + +_remove_backfile $backfile_0 +_remove_backfile $backfile_1 + +_show_result $TID $ERR_CODE diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c index c24d5e18a1b1..85b2b4a09e05 100644 --- a/tools/testing/selftests/ublk/ublk_bpf.c +++ b/tools/testing/selftests/ublk/ublk_bpf.c @@ -1283,14 +1283,14 @@ static int cmd_dev_reg_bpf(struct dev_ctx *ctx) static int cmd_dev_help(char *exe) { - printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe); + printf("%s add -t [null|loop|stripe] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe); printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n"); printf("%s del [-n dev_id] -a \n", exe); printf("\t -a delete all devices -n delete specified device\n"); printf("%s list [-n dev_id] -a \n", exe); printf("\t -a list all devices, -n list specified device, default -a \n"); - printf("%s reg -t [null|loop] bpf_prog_obj_path \n", exe); - printf("%s unreg -t [null|loop]\n", exe); + printf("%s reg -t [null|loop|stripe] bpf_prog_obj_path \n", exe); + printf("%s unreg -t [null|loop|stripe]\n", exe); return 0; } @@ -1475,6 +1475,83 @@ static int ublk_loop_tgt_init(struct ublk_dev *dev) return 0; } +struct ublk_stripe_params { + unsigned char chunk_shift; + unsigned char nr_backfiles; + int fds[MAX_BACK_FILES]; +}; + +static int stripe_bpf_setup_parameters(struct ublk_dev *dev, unsigned int chunk_shift) +{ + int dev_id = dev->dev_info.dev_id; + struct ublk_stripe_params stripe = { + .chunk_shift = chunk_shift, + .nr_backfiles = dev->nr_fds - 1, + }; + int map_fd; + int err, i; + + for (i = 0; i < stripe.nr_backfiles; i++) + stripe.fds[i] = dev->fds[i + 1]; + + map_fd = bpf_obj_get("/sys/fs/bpf/ublk/stripe/stripe_map"); + if (map_fd < 0) { + ublk_err("Error getting map file descriptor\n"); + return -EINVAL; + } + + err = bpf_map_update_elem(map_fd, &dev_id, &stripe, BPF_ANY); + if (err) { + ublk_err("Error updating map element: %d\n", errno); + return -EINVAL; + } + + return 0; +} + +static int ublk_stripe_tgt_init(struct ublk_dev *dev) +{ + unsigned long long bytes = 0; + unsigned chunk_shift = 12; + int ret, i; + struct ublk_params p = { + .types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF, + .basic = { + .logical_bs_shift = 9, + .physical_bs_shift = 12, + .io_opt_shift = 12, + .io_min_shift = 9, + .max_sectors = dev->dev_info.max_io_buf_bytes >> 9, + }, + .bpf = { + .flags = UBLK_BPF_HAS_OPS_ID | UBLK_BPF_HAS_AIO_OPS_ID, + .ops_id = dev->bpf_prog_id, + .aio_ops_id = dev->bpf_aio_prog_id, + }, + }; + + ret = backing_file_tgt_init(dev); + if (ret) + return ret; + + assert(stripe_bpf_setup_parameters(dev, chunk_shift) == 0); + + for (i = 0; i < dev->nr_fds - 1; i++) { + unsigned long size = dev->tgt.backing_file_size[i]; + + if (size != dev->tgt.backing_file_size[0]) + return -EINVAL; + if (size & ((1 << chunk_shift) - 1)) + return -EINVAL; + bytes += size; + } + + dev->tgt.dev_size = bytes; + p.basic.dev_sectors = bytes >> 9; + dev->tgt.params = p; + + return 0; +} static const struct ublk_tgt_ops tgt_ops_list[] = { { @@ -1487,6 +1564,11 @@ static const struct ublk_tgt_ops tgt_ops_list[] = { .init_tgt = ublk_loop_tgt_init, .deinit_tgt = backing_file_tgt_deinit, }, + { + .name = "stripe", + .init_tgt = ublk_stripe_tgt_init, 
+		.deinit_tgt = backing_file_tgt_deinit,
+	},
 };
 
 static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)

From patchwork Tue Jan 7 12:04:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 13928806
From: Ming Lei
To: Jens Axboe , linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov , Martin KaFai Lau ,
 Yonghong Song , Ming Lei
Subject: [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio
Date: Tue, 7 Jan 2025 20:04:13 +0800
Message-ID: <20250107120417.1237392-23-tom.leiming@gmail.com>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
References: <20250107120417.1237392-1-tom.leiming@gmail.com>
X-Patchwork-State: RFC

Document ublk-bpf motivation and implementation.
Document bpf-aio implementation.
Document ublk-bpf selftests.

Signed-off-by: Ming Lei
---
 Documentation/block/ublk.rst | 170 +++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 51665a3e6a50..bf7a3df48036 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -309,6 +309,176 @@ with specified IO tag in the command data: ``UBLK_IO_COMMIT_AND_FETCH_REQ``
 to the server, ublkdrv needs to copy the server buffer (pages) read to the IO
 request pages.
 
+
+UBLK-BPF support
+================
+
+Motivation
+----------
+
+- support stacking ublk
+
+  There are many 3rd party volume managers, and a ublk device may be built
+  on top of another ublk device to simplify their implementation. However,
+  the multiple userspace-kernel context switches needed to handle a single
+  IO are not acceptable from a performance point of view.
+
+  ublk-bpf can avoid the user-kernel context switch in most of the fast IO
+  path, so ublk over ublk becomes practical.
+
+- complicated virtual block devices
+
+  Many complicated virtual block devices have an admin & metadata code path
+  and a normal IO fast path. Admin & metadata IO handling is usually
+  complicated, so it can be moved to the ublk server to relieve the
+  development burden, while the IO fast path is kept in kernel space for
+  high performance.
+
+  BPF provides rich map types, which help a lot for communication between
+  userspace and a prog, or between progs.
+
+  One typical example is qcow2: its metadata IO handling can be kept in the
+  ublk server, while the fast IO path is moved to a bpf prog. An efficient
+  bpf map is looked up first to see whether the virtual LBA to host LBA
+  mapping is already present. If it is, the IO is handled by ublk-bpf
+  directly; otherwise it is forwarded to the ublk server to populate the
+  mapping first.
+
+- some simple high performance virtual devices
+
+  For devices such as null & loop, the whole implementation can be moved
+  into a bpf prog completely.
+
+- provides a chance to reach performance similar to a kernel driver
+
+  One round of kernel/user context switch is avoided, and one extra copy of
+  the IO data is saved.
+
+bpf aio
+-------
+
+bpf aio exports kfuncs for a bpf prog to submit & complete IO asynchronously.
+The IO completion handler is provided by the bpf aio user, and it is still
+defined in a bpf prog (such as the ublk bpf prog) as the
+`struct bpf_aio_complete_ops` bpf struct_ops.
+
+bpf aio is designed as a generic interface which, in theory, can be used by
+any bpf prog, and it may be moved to `/lib/` in the future once the
+interface becomes mature and stable enough.
+
+- bpf_aio_alloc()
+
+  Allocate one bpf aio instance of `struct bpf_aio`.
+
+- bpf_aio_release()
+
+  Free one bpf aio instance of `struct bpf_aio`.
+
+- bpf_aio_submit()
+
+  Submit one bpf aio instance of `struct bpf_aio` asynchronously.
+
+- `struct bpf_aio_complete_ops`
+
+  Defines the bpf aio completion callback, implemented as bpf struct_ops;
+  it is called when the submitted bpf aio completes.
+
+
+ublk bpf implementation
+-----------------------
+
+`struct ublk_bpf_ops` is exported as bpf struct_ops, so that a ublk IO
+command can be queued or handled in the callbacks defined in this ublk bpf
+struct_ops; see the whole logic in `ublk_run_bpf_handler`:
+
+- `UBLK_BPF_IO_QUEUED`
+
+  If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_QUEUED`,
+  the IO command has been queued by the bpf prog, so it won't be forwarded
+  to the ublk server.
+
+- `UBLK_BPF_IO_REDIRECT`
+
+  If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_REDIRECT`,
+  the IO command will be forwarded to the ublk server.
+
+- `UBLK_BPF_IO_CONTINUE`
+
+  If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_CONTINUE`,
+  part of the IO command has been queued, and `ublk_bpf_return_t` carries how
+  many bytes were queued, so the ublk driver keeps calling the callback to
+  queue the remaining bytes of the IO command. This is helpful for
+  implementing stacking devices because it allows an IO command to be split.
+
+ublk bpf provides kfuncs for the ublk bpf prog to queue and handle ublk IO
+commands:
+
+- ublk_bpf_complete_io()
+
+  Complete this ublk IO command.
+
+- ublk_bpf_get_io_tag()
+
+  Get the tag of this ublk IO command.
+
+- ublk_bpf_get_queue_id()
+
+  Get the queue id of this ublk IO command.
+
+- ublk_bpf_get_dev_id()
+
+  Get the device id of this ublk IO command.
+
+- ublk_bpf_attach_and_prep_aio()
+
+  Attach & prepare a bpf aio for this ublk IO command: the bpf aio buffer is
+  prepared and the aio's completion callback is set up, so the user prog gets
+  notified when the bpf aio completes.
+
+- ublk_bpf_dettach_and_complete_aio()
+
+  Detach the bpf aio from this IO command; it is usually called from the bpf
+  aio's completion callback, as illustrated in the sketch below.
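+
+  As an illustration of how the aio kfuncs fit together, a completion
+  callback might look like the following sketch. This is only a sketch: the
+  includes, the kfunc prototypes, the `complete` member name, the section
+  names and the callback arguments are assumptions made for the example and
+  are not the in-tree definitions; the acquire/release helpers it uses are
+  described right after this entry::
+
+    /*
+     * Illustrative sketch only: kfunc prototypes, the struct_ops member
+     * name and the callback arguments are assumed, not copied from the
+     * real ublk-bpf / bpf-aio headers.
+     */
+    #include "vmlinux.h"
+    #include <bpf/bpf_helpers.h>
+    #include <bpf/bpf_tracing.h>
+
+    /* assumed kfunc prototypes; only the names come from this document */
+    extern struct ublk_bpf_io *ublk_bpf_acquire_io_from_aio(struct bpf_aio *aio) __ksym;
+    extern void ublk_bpf_release_io_from_aio(struct ublk_bpf_io *io) __ksym;
+    extern void ublk_bpf_complete_io(struct ublk_bpf_io *io, int res) __ksym;
+    extern void ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio) __ksym;
+
+    SEC("struct_ops")
+    void BPF_PROG(loop_aio_complete, struct bpf_aio *aio, long ret)
+    {
+        /* map the finished aio back to the ublk IO command it serves */
+        struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio);
+
+        if (io) {
+            /* complete the ublk IO command with the aio result */
+            ublk_bpf_complete_io(io, ret);
+            ublk_bpf_release_io_from_aio(io);
+        }
+
+        /* detach the aio from the ublk IO command */
+        ublk_bpf_dettach_and_complete_aio(aio);
+    }
+
+    /* register the callback via the (assumed) struct_ops member */
+    SEC(".struct_ops.link")
+    struct bpf_aio_complete_ops loop_aio_ops = {
+        .complete = (void *)loop_aio_complete,
+    };
+
+    char LICENSE[] SEC("license") = "GPL";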
+
+- ublk_bpf_acquire_io_from_aio()
+
+  Acquire the ublk IO command from the aio; one typical use is to call
+  ublk_bpf_complete_io() to complete the ublk IO command.
+
+- ublk_bpf_release_io_from_aio()
+
+  Release the ublk IO command acquired via `ublk_bpf_acquire_io_from_aio()`.
+
+
+Test
+----
+
+- Build the kernel & install the kernel headers & reboot & test
+
+  enable CONFIG_BLK_DEV_UBLK & CONFIG_UBLK_BPF
+
+  make
+
+  make headers_install INSTALL_HDR_PATH=/usr
+
+  reboot
+
+  make -C tools/testing/selftests TARGETS=ublk run_test
+
+The ublk selftests implement null, loop and stripe targets to cover all bpf
+features:
+
+- complete bpf IO handling
+
+- complete ublk server IO handling
+
+- mixed bpf prog and ublk server IO handling
+
+- bpf aio for loop & stripe
+
+- IO split via `UBLK_BPF_IO_CONTINUE` for implementing ublk-stripe
+
+Write & read verification, plus mkfs.ext4 & mount & umount, are run in the
+selftests.
+
+
 Future development
 ==================