From patchwork Thu Aug 31 03:09:31 2017
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 9931067
From: Nicolas Pitre
To: Alexander Viro
Cc: linux-fsdevel@vger.kernel.org, linux-embedded@vger.kernel.org,
 linux-kernel@vger.kernel.org, Chris Brandt
Subject: [PATCH v3 4/5] cramfs: add mmap support
Date: Wed, 30 Aug 2017 23:09:31 -0400
Message-Id: <20170831030932.26979-5-nicolas.pitre@linaro.org>
In-Reply-To: <20170831030932.26979-1-nicolas.pitre@linaro.org>
References: <20170831030932.26979-1-nicolas.pitre@linaro.org>

When cramfs_physmem is used then we have the opportunity to map files
directly from ROM, directly into user space, saving on RAM usage.
This gives us Execute-In-Place (XIP) support.

For a file to be mmap()-able, the map area has to correspond to a range
of uncompressed and contiguous blocks, and in the MMU case it also has
to be page aligned. A version of mkcramfs with appropriate support is
necessary to create such a filesystem image.

In the MMU case it may happen for a vma structure to extend beyond the
actual file size. This is notably the case in binfmt_elf.c:elf_map().
Or the file's last block is shared with other files and cannot be mapped
as is.
Rather than refusing to mmap it, we do a partial map and set up a special
vm_ops fault handler that splits the vma in two: the direct mapping vma and
the memory-backed vma populated by the readpage method. In practice the
unmapped area is seldom accessed so the split might never occur before
this area is discarded.

In the non-MMU case it is the get_unmapped_area method that is responsible
for providing the address where the actual data can be found. No mapping
is necessary of course.

Signed-off-by: Nicolas Pitre
Tested-by: Chris Brandt
---
 fs/cramfs/inode.c | 295 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 295 insertions(+)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 2fc886092b..1d7d61354b 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -15,7 +15,9 @@
 #include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -49,6 +51,7 @@ static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
 static const struct super_operations cramfs_ops;
 static const struct inode_operations cramfs_dir_inode_operations;
 static const struct file_operations cramfs_directory_operations;
+static const struct file_operations cramfs_physmem_fops;
 static const struct address_space_operations cramfs_aops;
 
 static DEFINE_MUTEX(read_mutex);
@@ -96,6 +99,10 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
 	case S_IFREG:
 		inode->i_fop = &generic_ro_fops;
 		inode->i_data.a_ops = &cramfs_aops;
+		if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+		    CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS &&
+		    CRAMFS_SB(sb)->linear_phys_addr)
+			inode->i_fop = &cramfs_physmem_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &cramfs_dir_inode_operations;
@@ -277,6 +284,294 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset,
 		return NULL;
 }
 
+/*
+ * For a mapping to be possible, we need a range of uncompressed and
+ * contiguous blocks. Return the offset for the first block and number of
+ * valid blocks for which that is true, or zero otherwise.
+ */
+static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages)
+{
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	int i;
+	u32 *blockptrs, blockaddr;
+
+	/*
+	 * We can dereference memory directly here as this code may be
+	 * reached only when there is a direct filesystem image mapping
+	 * available in memory.
+	 */
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff*4);
+	blockaddr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
+	i = 0;
+	do {
+		u32 expect = blockaddr + i * (PAGE_SIZE >> 2);
+		expect |= CRAMFS_BLK_FLAG_DIRECT_PTR|CRAMFS_BLK_FLAG_UNCOMPRESSED;
+		if (blockptrs[i] != expect) {
+			pr_debug("range: block %d/%d got %#x expects %#x\n",
+				 pgoff+i, pgoff+*pages-1, blockptrs[i], expect);
+			if (i == 0)
+				return 0;
+			break;
+		}
+	} while (++i < *pages);
+
+	*pages = i;
+
+	/* stored "direct" block ptrs are shifted down by 2 bits */
+	return blockaddr << 2;
+}
+
+/*
+ * It is possible for cramfs_physmem_mmap() to partially populate the mapping
+ * causing page faults in the unmapped area. When that happens, we need to
+ * split the vma so that the unmapped area gets its own vma that can be backed
+ * with actual memory pages and loaded normally. This is necessary because
+ * remap_pfn_range() overwrites vma->vm_pgoff with the pfn and filemap_fault()
+ * no longer works with it. Furthermore this makes /proc/x/maps right.
+ * Q: is there a way to do split vma at mmap() time?
+ */
+static const struct vm_operations_struct cramfs_vmasplit_ops;
+static int cramfs_vmasplit_fault(struct vm_fault *vmf)
+{
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct vm_area_struct *vma, *new_vma;
+	struct file *vma_file = get_file(vmf->vma->vm_file);
+	unsigned long split_val, split_addr;
+	unsigned int split_pgoff;
+	int ret;
+
+	/* We have some vma surgery to do and need the write lock. */
+	up_read(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		fput(vma_file);
+		return VM_FAULT_RETRY;
+	}
+
+	/* Make sure the vma didn't change between the locks */
+	ret = VM_FAULT_SIGSEGV;
+	vma = find_vma(mm, vmf->address);
+	if (!vma)
+		goto out_fput;
+
+	/*
+	 * Someone else might have raced with us and handled the fault,
+	 * changed the vma, etc. If so let it go back to user space and
+	 * fault again if necessary.
+	 */
+	ret = VM_FAULT_NOPAGE;
+	if (vma->vm_ops != &cramfs_vmasplit_ops || vma->vm_file != vma_file)
+		goto out_fput;
+	fput(vma_file);
+
+	/* Retrieve the vma split address and validate it */
+	split_val = (unsigned long)vma->vm_private_data;
+	split_pgoff = split_val & 0xfff;
+	split_addr = (split_val >> 12) << PAGE_SHIFT;
+	if (split_addr < vma->vm_start) {
+		/* bottom of vma was unmapped */
+		split_pgoff += (vma->vm_start - split_addr) >> PAGE_SHIFT;
+		split_addr = vma->vm_start;
+	}
+	pr_debug("fault: addr=%#lx vma=%#lx-%#lx split=%#lx\n",
+		 vmf->address, vma->vm_start, vma->vm_end, split_addr);
+	ret = VM_FAULT_SIGSEGV;
+	if (!split_val || split_addr > vmf->address || vma->vm_end <= vmf->address)
+		goto out;
+
+	if (unlikely(vma->vm_start == split_addr)) {
+		/* nothing to split */
+		new_vma = vma;
+	} else {
+		/* Split away the directly mapped area */
+		ret = VM_FAULT_OOM;
+		if (split_vma(mm, vma, split_addr, 0) != 0)
+			goto out;
+
+		/* The direct vma should no longer ever fault */
+		vma->vm_ops = NULL;
+
+		/* Retrieve the new vma covering the unmapped area */
+		new_vma = find_vma(mm, split_addr);
+		BUG_ON(new_vma == vma);
+		ret = VM_FAULT_SIGSEGV;
+		if (!new_vma)
+			goto out;
+	}
+
+	/*
+	 * Readjust the new vma with the actual file based pgoff and
+	 * process the fault normally on it.
+	 */
+	new_vma->vm_pgoff = split_pgoff;
+	new_vma->vm_ops = &generic_file_vm_ops;
+	new_vma->vm_flags &= ~(VM_IO | VM_PFNMAP | VM_DONTEXPAND);
+	vmf->vma = new_vma;
+	vmf->pgoff = split_pgoff;
+	vmf->pgoff += (vmf->address - new_vma->vm_start) >> PAGE_SHIFT;
+	downgrade_write(&mm->mmap_sem);
+	return filemap_fault(vmf);
+
+out_fput:
+	fput(vma_file);
+out:
+	downgrade_write(&mm->mmap_sem);
+	return ret;
+}
+
+static const struct vm_operations_struct cramfs_vmasplit_ops = {
+	.fault = cramfs_vmasplit_fault,
+};
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, vma_pages, max_pages, offset;
+	unsigned long address;
+	char *fail_reason;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_MMU))
+		return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
+
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
+	/* Could COW work here? */
+	fail_reason = "vma is writable";
+	if (vma->vm_flags & VM_WRITE)
+		goto fail;
+
+	vma_pages = (vma->vm_end - vma->vm_start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	fail_reason = "beyond file limit";
+	if (vma->vm_pgoff >= max_pages)
+		goto fail;
+	pages = vma_pages;
+	if (pages > max_pages - vma->vm_pgoff)
+		pages = max_pages - vma->vm_pgoff;
+
+	offset = cramfs_get_block_range(inode, vma->vm_pgoff, &pages);
+	fail_reason = "unsuitable block layout";
+	if (!offset)
+		goto fail;
+	address = sbi->linear_phys_addr + offset;
+	fail_reason = "data is not page aligned";
+	if (!PAGE_ALIGNED(address))
+		goto fail;
+
+	/* Don't map the last page if it contains some other data */
+	if (unlikely(vma->vm_pgoff + pages == max_pages)) {
+		unsigned int partial = offset_in_page(inode->i_size);
+		if (partial) {
+			char *data = sbi->linear_virt_addr + offset;
+			data += (max_pages - 1) * PAGE_SIZE + partial;
+			while ((unsigned long)data & 7)
+				if (*data++ != 0)
+					goto nonzero;
+			while (offset_in_page(data)) {
+				if (*(u64 *)data != 0) {
+ nonzero:
+					pr_debug("mmap: %s: last page is shared\n",
+						 file_dentry(file)->d_name.name);
+					pages--;
+					break;
+				}
+				data += 8;
+			}
+		}
+	}
+
+	if (pages) {
+		/*
+		 * If we can't map it all, page faults will occur if the
+		 * unmapped area is accessed. Let's handle them to split the
+		 * vma and let the normal paging machinery take care of the
+		 * rest through cramfs_readpage(). Because remap_pfn_range()
+		 * repurposes vma->vm_pgoff, we have to save it somewhere.
+		 * Let's use vma->vm_private_data to hold both the pgoff and
+		 * the actual address split point. Maximum file size is 16MB
+		 * (12 bits pgoff) and max 20 bits pfn where a long is 32 bits
+		 * so we can pack both together.
+		 */
+		if (pages != vma_pages) {
+			unsigned int split_pgoff = vma->vm_pgoff + pages;
+			unsigned long split_pfn = (vma->vm_start >> PAGE_SHIFT) + pages;
+			unsigned long split_val = split_pgoff | (split_pfn << 12);
+			vma->vm_private_data = (void *)split_val;
+			vma->vm_ops = &cramfs_vmasplit_ops;
+			/* to keep remap_pfn_range() happy */
+			vma->vm_end = vma->vm_start + pages * PAGE_SIZE;
+		}
+
+		ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+				      pages * PAGE_SIZE, vma->vm_page_prot);
+		/* restore vm_end in case we cheated it above */
+		vma->vm_end = vma->vm_start + vma_pages * PAGE_SIZE;
+		if (ret)
+			return ret;
+
+		pr_debug("mapped %s at 0x%08lx (%u/%u pages) to vma 0x%08lx, "
+			 "page_prot 0x%llx\n", file_dentry(file)->d_name.name,
+			 address, pages, vma_pages, vma->vm_start,
+			 (unsigned long long)pgprot_val(vma->vm_page_prot));
+		return 0;
+	}
+	fail_reason = "no suitable block remaining";
+
+fail:
+	pr_debug("%s: direct mmap failed: %s\n",
+		 file_dentry(file)->d_name.name, fail_reason);
+
+	/* We failed to do a direct map, but normal paging will do it */
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+#ifndef CONFIG_MMU
+
+static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, block_pages, max_pages, offset;
+
+	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= max_pages || pages > max_pages - pgoff)
+		return -EINVAL;
+	block_pages = pages;
+	offset = cramfs_get_block_range(inode, pgoff, &block_pages);
+	if (!offset || block_pages != pages)
+		return -ENOSYS;
+	addr = sbi->linear_phys_addr + offset;
+	pr_debug("get_unmapped for %s ofs %#lx siz %lu at 0x%08lx\n",
+		 file_dentry(file)->d_name.name, pgoff*PAGE_SIZE, len, addr);
+	return addr;
+}
+
+static unsigned cramfs_physmem_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC;
+}
+#endif
+
+static const struct file_operations cramfs_physmem_fops = {
+	.llseek = generic_file_llseek,
+	.read_iter = generic_file_read_iter,
+	.splice_read = generic_file_splice_read,
+	.mmap = cramfs_physmem_mmap,
+#ifndef CONFIG_MMU
+	.get_unmapped_area = cramfs_physmem_get_unmapped_area,
+	.mmap_capabilities = cramfs_physmem_mmap_capabilities,
+#endif
+};
+
 static void cramfs_blkdev_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);