From patchwork Thu Jul 12 15:08:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10521795 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 7156A603D7 for ; Thu, 12 Jul 2018 15:09:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5969F29A10 for ; Thu, 12 Jul 2018 15:09:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4C02229A1C; Thu, 12 Jul 2018 15:09:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4BF7D29A10 for ; Thu, 12 Jul 2018 15:09:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732572AbeGLPS6 (ORCPT ); Thu, 12 Jul 2018 11:18:58 -0400 Received: from mail-io0-f196.google.com ([209.85.223.196]:33026 "EHLO mail-io0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732373AbeGLPS6 (ORCPT ); Thu, 12 Jul 2018 11:18:58 -0400 Received: by mail-io0-f196.google.com with SMTP id z20-v6so28532386iol.0 for ; Thu, 12 Jul 2018 08:09:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=XoGdb+9VDyfpOg7C8Wg8vLqF8ipKXhaO4OiAgWa7qE8=; b=1a8n0YPDVLRRYasM7JTGDcHza3yZJLemBOiA9t3DjYk9m7xHuXO5ZHSi9+kAyQg7Ge vkLGFWz7rdmRd5VLWz2iRQ3u6+t8kO266XbGHj07uEKKNXvRhoc67G6t3DIcc3r3fQny jSZcf4kKk9nVanK0hKETUo3MT/FVZLIp+c0tYwPxsxfSKP/hjdvOAI4fwn2xXpN378KK sj8ppME/8aIsQuTD17SW8jyGGkQ+T5maPAyvEEO7EEAGkSrrB6reuTnryIadBGno8Jcf orehoImSsy+PBTLdeE+ZHiYsI37U2zcx2jxoigOvqisWs/Pdv7l9EFEyGdm1VN/bPHrl w4WA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=XoGdb+9VDyfpOg7C8Wg8vLqF8ipKXhaO4OiAgWa7qE8=; b=YJPKrjtnux6YbmuD7mCqNnanjAFGnxWbOXFXWG19CABP99wG185lnyazDAeW0M3LMf /qd3mQEwIU98jlFmGHCFKCfEhbyZ95kWPX7y99p+wPZQW3llFdgS8A/mUp0BuzPjtW+p hW8Ou6x/Yt1uLRZIk6f3e4Wk8U/rKxWsFsXs+rOcPtHJ9wbZbLm5Xl588eBh/tuEf9U0 fFdcsre3Zqyi/WjyvWDpbSylWu56eTNWmIxK37Y4CgOBmXc/NwIYZbmvbr4FUc7+/GI2 b/H0Y6ouq1zJ7/NpNdHWOye9gv3jSIlLHF3ihbtfFAFCnD4u2rtIucYctViKzoV2pXsG XyLQ== X-Gm-Message-State: APt69E3xzcdI4jZboZh3Dm9D9eWtO8soX2Zfvx6kH57dc2Y4GyJ9tHHv fuOiIvrbJMSRs25wd8ZL2R25Vg== X-Google-Smtp-Source: AAOMgpfIB9b6/98HvEpLUqskd2wAj1eGUWBVpuG21EN9YigDv13y9k6rkObGOCXNg15oKwBIuDE42g== X-Received: by 2002:a6b:1f0e:: with SMTP id f14-v6mr25574968iof.236.1531408139600; Thu, 12 Jul 2018 08:08:59 -0700 (PDT) Received: from [192.168.1.167] ([216.160.245.98]) by smtp.gmail.com with ESMTPSA id f15-v6sm9068631ioh.64.2018.07.12.08.08.57 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 12 Jul 2018 08:08:58 -0700 (PDT) Subject: Re: Silent data corruption in blkdev_direct_IO() To: Hannes Reinecke Cc: Christoph Hellwig , "linux-block@vger.kernel.org" , Martin Wilck References: From: Jens Axboe Message-ID: <3419a3ae-da82-9c20-26e1-7c9ed14ff8ed@kernel.dk> Date: Thu, 12 Jul 2018 09:08:57 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 7/12/18 8:36 AM, Hannes Reinecke wrote: > Hi Jens, Christoph, > > we're currently hunting down a silent data corruption occurring due to > commit 72ecad22d9f1 ("block: support a full bio worth of IO for > simplified bdev direct-io"). > > While the whole thing is still hazy on the details, the one thing we've > found is that reverting that patch fixes the data corruption. > > And looking closer, I've found this: > > static ssize_t > blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > { > int nr_pages; > > nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1); > if (!nr_pages) > return 0; > if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES) > return __blkdev_direct_IO_simple(iocb, iter, nr_pages); > > return __blkdev_direct_IO(iocb, iter, min(nr_pages, BIO_MAX_PAGES)); > } > > When checking the call path > __blkdev_direct_IO()->bio_alloc_bioset()->bvec_alloc() > I found that bvec_alloc() will fail if nr_pages > BIO_MAX_PAGES. > > So why is there the check for 'nr_pages <= BIO_MAX_PAGES' ? > It's not that we can handle it in __blkdev_direct_IO() ... The logic could be cleaned up like below, the sync part is really all we care about. What is the test case for this? async or sync? I also don't remember why it's BIO_MAX_PAGES + 1... diff --git a/fs/block_dev.c b/fs/block_dev.c index 0dd87aaeb39a..14ef3d71b55f 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -424,13 +424,13 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { int nr_pages; - nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1); + nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES); if (!nr_pages) return 0; - if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES) + if (is_sync_kiocb(iocb)) return __blkdev_direct_IO_simple(iocb, iter, nr_pages); - return __blkdev_direct_IO(iocb, iter, min(nr_pages, BIO_MAX_PAGES)); + return __blkdev_direct_IO(iocb, iter, nr_pages); } static __init int blkdev_init(void)