From patchwork Fri Feb 17 19:19:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Daniel_M=C3=BCller?= X-Patchwork-Id: 13145172 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 227E0C636D6 for ; Fri, 17 Feb 2023 19:19:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229566AbjBQTTY (ORCPT ); Fri, 17 Feb 2023 14:19:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229436AbjBQTTX (ORCPT ); Fri, 17 Feb 2023 14:19:23 -0500 Received: from mout02.posteo.de (mout02.posteo.de [185.67.36.66]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 46DA8498BF for ; Fri, 17 Feb 2023 11:19:21 -0800 (PST) Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id C19062403E0 for ; Fri, 17 Feb 2023 20:19:19 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1676661559; bh=uR1etxSmCAteS8np0qhgLmpJSlRGC9c301Pgqk+71+w=; h=From:To:Subject:Date:From; b=hjIi5AvJcXb8JSNcQYYji22kWVEkmlrBX2riAepUPxqQGxTJarjRaRCJstx6F0Fw9 /ggUB7pgFjFnmsN14L9SJISRdANpKYKrbSaKq5OC6gMqPR3awiohyNhjcKSFfBZfVG viWnTvDUr5NLqQNBZb/Sv0TOqLFsWPGW4Zgzx1lmvXRL0hLm5L1ytV4qXSX32SKorP j9UNmqbcFuD6AgA12x9Ylu6NuOYBjvhKX76tybqBz6W6zQjusSglV7ZLsTgCM7VhYL VMIZY8oFLbFMTVjr5zqPD64kXmXj5XsklkNFXaMPul7XKQnz0Zljr60mRTgjRKg+iX Sw6PQzhq0/BoA== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4PJM7V5Cz1z9rxF; Fri, 17 Feb 2023 20:19:18 +0100 (CET) From: =?utf-8?q?Daniel_M=C3=BCller?= To: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net, 
kafai@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next 1/3] libbpf: Implement basic zip archive parsing support Date: Fri, 17 Feb 2023 19:19:06 +0000 Message-Id: <20230217191908.1000004-2-deso@posteo.net> In-Reply-To: <20230217191908.1000004-1-deso@posteo.net> References: <20230217191908.1000004-1-deso@posteo.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net This change implements support for reading zip archives, including opening an archive, finding an entry based on its path and name in it, and closing it. The code was copied from https://github.com/iovisor/bcc/pull/4440, which implements similar functionality for bcc. The author confirmed that he is fine with this usage and the corresponding relicensing. I adjusted it to adhere to libbpf coding standards. Signed-off-by: Daniel Müller Acked-by: Michał Gregorczyk --- tools/lib/bpf/Build | 2 +- tools/lib/bpf/zip.c | 378 ++++++++++++++++++++++++++++++++++++++++++++ tools/lib/bpf/zip.h | 47 ++++++ 3 files changed, 426 insertions(+), 1 deletion(-) create mode 100644 tools/lib/bpf/zip.c create mode 100644 tools/lib/bpf/zip.h diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build index 5a3dfb..b8b0a63 100644 --- a/tools/lib/bpf/Build +++ b/tools/lib/bpf/Build @@ -1,4 +1,4 @@ libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \ netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \ btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \ - usdt.o + usdt.o zip.o diff --git a/tools/lib/bpf/zip.c b/tools/lib/bpf/zip.c new file mode 100644 index 0000000..59ec79 --- /dev/null +++ b/tools/lib/bpf/zip.c @@ -0,0 +1,378 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +/* + * Routines for dealing with .zip archives. + * + * Copyright (c) Meta Platforms, Inc. and affiliates. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include "zip.h" + +/* Specification of ZIP file format can be found here: + * https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT + * For a high level overview of the structure of a ZIP file see + * sections 4.3.1 - 4.3.6. + * + * Data structures appearing in ZIP files do not contain any + * padding and they might be misaligned. To allow us to safely + * operate on pointers to such structures and their members, without + * worrying about platform specific alignment issues, we define + * unaligned_uint16_t and unaligned_uint32_t types with no alignment + * requirements. + */ +typedef struct { + uint8_t raw[2]; +} unaligned_uint16_t; + +static uint16_t unaligned_uint16_read(unaligned_uint16_t value) +{ + uint16_t return_value; + + memcpy(&return_value, value.raw, sizeof(return_value)); + return return_value; +} + +typedef struct { + uint8_t raw[4]; +} unaligned_uint32_t; + +static uint32_t unaligned_uint32_read(unaligned_uint32_t value) +{ + uint32_t return_value; + + memcpy(&return_value, value.raw, sizeof(return_value)); + return return_value; +} + +#define END_OF_CD_RECORD_MAGIC 0x06054b50 + +/* See section 4.3.16 of the spec. */ +struct end_of_central_directory_record { + /* Magic value equal to END_OF_CD_RECORD_MAGIC */ + unaligned_uint32_t magic; + + /* Number of the file containing this structure or 0xFFFF if ZIP64 archive. + * Zip archive might span multiple files (disks). + */ + unaligned_uint16_t this_disk; + + /* Number of the file containing the beginning of the central directory or + * 0xFFFF if ZIP64 archive. + */ + unaligned_uint16_t cd_disk; + + /* Number of central directory records on this disk or 0xFFFF if ZIP64 + * archive. + */ + unaligned_uint16_t cd_records; + + /* Number of central directory records on all disks or 0xFFFF if ZIP64 + * archive. + */ + unaligned_uint16_t cd_records_total; + + /* Size of the central directory record or 0xFFFFFFFF if ZIP64 archive. 
*/ + unaligned_uint32_t cd_size; + + /* Offset of the central directory from the beginning of the archive or + * 0xFFFFFFFF if ZIP64 archive. + */ + unaligned_uint32_t cd_offset; + + /* Length of comment data following end of central directory record. */ + unaligned_uint16_t comment_length; + + /* Up to 64k of arbitrary bytes. */ + /* uint8_t comment[comment_length] */ +}; + +#define CD_FILE_HEADER_MAGIC 0x02014b50 +#define FLAG_ENCRYPTED (1 << 0) +#define FLAG_HAS_DATA_DESCRIPTOR (1 << 3) + +/* See section 4.3.12 of the spec. */ +struct central_directory_file_header { + /* Magic value equal to CD_FILE_HEADER_MAGIC. */ + unaligned_uint32_t magic; + unaligned_uint16_t version; + /* Minimum zip version needed to extract the file. */ + unaligned_uint16_t min_version; + unaligned_uint16_t flags; + unaligned_uint16_t compression; + unaligned_uint16_t last_modified_time; + unaligned_uint16_t last_modified_date; + unaligned_uint32_t crc; + unaligned_uint32_t compressed_size; + unaligned_uint32_t uncompressed_size; + unaligned_uint16_t file_name_length; + unaligned_uint16_t extra_field_length; + unaligned_uint16_t file_comment_length; + /* Number of the disk where the file starts or 0xFFFF if ZIP64 archive. */ + unaligned_uint16_t disk; + unaligned_uint16_t internal_attributes; + unaligned_uint32_t external_attributes; + /* Offset from the start of the disk containing the local file header to the + * start of the local file header. + */ + unaligned_uint32_t offset; +}; + +#define LOCAL_FILE_HEADER_MAGIC 0x04034b50 + +/* See section 4.3.7 of the spec. */ +struct local_file_header { + /* Magic value equal to LOCAL_FILE_HEADER_MAGIC. */ + unaligned_uint32_t magic; + /* Minimum zip version needed to extract the file. 
*/ + unaligned_uint16_t min_version; + unaligned_uint16_t flags; + unaligned_uint16_t compression; + unaligned_uint16_t last_modified_time; + unaligned_uint16_t last_modified_date; + unaligned_uint32_t crc; + unaligned_uint32_t compressed_size; + unaligned_uint32_t uncompressed_size; + unaligned_uint16_t file_name_length; + unaligned_uint16_t extra_field_length; +}; + +struct zip_archive { + void *data; + uint32_t size; + uint32_t cd_offset; + uint32_t cd_records; +}; + +static void *check_access(struct zip_archive *archive, uint32_t offset, uint32_t size) +{ + if (offset + size > archive->size || offset > offset + size) { + return NULL; + } + return archive->data + offset; +} + +/* Returns 0 on success, -1 on error and -2 if the eocd indicates + * the archive uses features which are not supported. + */ +static int try_parse_end_of_central_directory(struct zip_archive *archive, uint32_t offset) +{ + struct end_of_central_directory_record *eocd = + check_access(archive, offset, sizeof(struct end_of_central_directory_record)); + uint16_t comment_length, cd_records; + uint32_t cd_offset, cd_size; + + if (!eocd || unaligned_uint32_read(eocd->magic) != END_OF_CD_RECORD_MAGIC) { + return -1; + } + + comment_length = unaligned_uint16_read(eocd->comment_length); + if (offset + sizeof(struct end_of_central_directory_record) + comment_length != + archive->size) { + return -1; + } + + cd_records = unaligned_uint16_read(eocd->cd_records); + if (unaligned_uint16_read(eocd->this_disk) != 0 || + unaligned_uint16_read(eocd->cd_disk) != 0 || + unaligned_uint16_read(eocd->cd_records_total) != cd_records) { + /* This is a valid eocd, but we only support single-file non-ZIP64 archives. 
*/ + return -2; + } + + cd_offset = unaligned_uint32_read(eocd->cd_offset); + cd_size = unaligned_uint32_read(eocd->cd_size); + if (!check_access(archive, cd_offset, cd_size)) { + return -1; + } + + archive->cd_offset = cd_offset; + archive->cd_records = cd_records; + return 0; +} + +static int find_central_directory(struct zip_archive *archive) +{ + int64_t offset; + int64_t limit; + int rc = -1; + + if (archive->size <= sizeof(struct end_of_central_directory_record)) { + return -1; + } + + /* Because the end of central directory ends with a variable length array of + * up to 0xFFFF bytes we can't know exactly where it starts and need to + * search for it at the end of the file, scanning the (limit, offset] range. + */ + offset = archive->size - sizeof(struct end_of_central_directory_record); + limit = (int64_t)offset - (1 << 16); + + for (; offset >= 0 && offset > limit && rc == -1; offset--) { + rc = try_parse_end_of_central_directory(archive, offset); + } + + return rc; +} + +struct zip_archive *zip_archive_open(const char *path) +{ + struct zip_archive *archive; + int fd = open(path, O_RDONLY); + off_t size; + void *data; + + if (fd < 0) { + return NULL; + } + + size = lseek(fd, 0, SEEK_END); + if (size == (off_t)-1 || size > UINT32_MAX) { + close(fd); + return NULL; + } + + data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); + + if (data == MAP_FAILED) { + return NULL; + } + + archive = malloc(sizeof(struct zip_archive)); + if (!archive) { + munmap(data, size); + return NULL; + } + + archive->data = data; + archive->size = size; + if (find_central_directory(archive)) { + munmap(data, size); + free(archive); + archive = NULL; + } + + return archive; +} + +void zip_archive_close(struct zip_archive *archive) +{ + munmap(archive->data, archive->size); + free(archive); +} + +static struct local_file_header *local_file_header_at_offset(struct zip_archive *archive, + uint32_t offset) +{ + struct local_file_header *lfh = + check_access(archive, 
offset, sizeof(struct local_file_header)); + if (!lfh || unaligned_uint32_read(lfh->magic) != LOCAL_FILE_HEADER_MAGIC) { + return NULL; + } + return lfh; +} + +static int get_entry_at_offset(struct zip_archive *archive, uint32_t offset, struct zip_entry *out) +{ + struct local_file_header *lfh = local_file_header_at_offset(archive, offset); + uint16_t flags, name_length, extra_field_length; + uint32_t compressed_size; + const char *name; + void *data; + + offset += sizeof(struct local_file_header); + if (!lfh) { + return -1; + }; + + flags = unaligned_uint16_read(lfh->flags); + if ((flags & FLAG_ENCRYPTED) || (flags & FLAG_HAS_DATA_DESCRIPTOR)) { + return -1; + } + + name_length = unaligned_uint16_read(lfh->file_name_length); + name = check_access(archive, offset, name_length); + offset += name_length; + if (!name) { + return -1; + } + + extra_field_length = unaligned_uint16_read(lfh->extra_field_length); + if (!check_access(archive, offset, extra_field_length)) { + return -1; + } + offset += extra_field_length; + + compressed_size = unaligned_uint32_read(lfh->compressed_size); + data = check_access(archive, offset, compressed_size); + if (!data) { + return -1; + } + + out->compression = unaligned_uint16_read(lfh->compression); + out->name_length = name_length; + out->name = name; + out->data = data; + out->data_length = compressed_size; + out->data_offset = offset; + + return 0; +} + +static struct central_directory_file_header *cd_file_header_at_offset(struct zip_archive *archive, + uint32_t offset) +{ + struct central_directory_file_header *cdfh = + check_access(archive, offset, sizeof(struct central_directory_file_header)); + if (!cdfh || unaligned_uint32_read(cdfh->magic) != CD_FILE_HEADER_MAGIC) { + return NULL; + } + return cdfh; +} + +int zip_archive_find_entry(struct zip_archive *archive, const char *file_name, + struct zip_entry *out) +{ + size_t file_name_length = strlen(file_name); + + uint32_t i, offset = archive->cd_offset; + + for (i = 0; i < 
archive->cd_records; ++i) { + struct central_directory_file_header *cdfh = + cd_file_header_at_offset(archive, offset); + uint16_t cdfh_name_length, cdfh_flags; + const char *cdfh_name; + + offset += sizeof(struct central_directory_file_header); + if (!cdfh) { + return -1; + } + + cdfh_name_length = unaligned_uint16_read(cdfh->file_name_length); + cdfh_name = check_access(archive, offset, cdfh_name_length); + if (!cdfh_name) { + return -1; + } + + cdfh_flags = unaligned_uint16_read(cdfh->flags); + if ((cdfh_flags & FLAG_ENCRYPTED) == 0 && + (cdfh_flags & FLAG_HAS_DATA_DESCRIPTOR) == 0 && + file_name_length == cdfh_name_length && + memcmp(file_name, archive->data + offset, file_name_length) == 0) { + return get_entry_at_offset(archive, unaligned_uint32_read(cdfh->offset), + out); + } + + offset += cdfh_name_length; + offset += unaligned_uint16_read(cdfh->extra_field_length); + offset += unaligned_uint16_read(cdfh->file_comment_length); + } + + return -1; +} diff --git a/tools/lib/bpf/zip.h b/tools/lib/bpf/zip.h new file mode 100644 index 0000000..a9083f --- /dev/null +++ b/tools/lib/bpf/zip.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ + +#ifndef __LIBBPF_ZIP_H +#define __LIBBPF_ZIP_H + +#include + +/* Represents an opened zip archive. + * Only basic ZIP files are supported, in particular the following are not + * supported: + * - encryption + * - streaming + * - multi-part ZIP files + * - ZIP64 + */ +struct zip_archive; + +/* Carries information on name, compression method, and data corresponding to a + * file in a zip archive. + */ +struct zip_entry { + /* Compression method as defined in pkzip spec. 0 means data is uncompressed. */ + uint16_t compression; + + /* Non-null terminated name of the file. */ + const char *name; + /* Length of the file name. */ + uint16_t name_length; + + /* Pointer to the file data. */ + const void *data; + /* Length of the file data. 
*/ + uint32_t data_length; + /* Offset of the file data within the archive. */ + uint32_t data_offset; +}; + +/* Open a zip archive. Returns NULL in case of an error. */ +struct zip_archive *zip_archive_open(const char *path); + +/* Close a zip archive and release resources. */ +void zip_archive_close(struct zip_archive *archive); + +/* Look up an entry corresponding to a file in given zip archive. */ +int zip_archive_find_entry(struct zip_archive *archive, const char *name, struct zip_entry *out); + +#endif
From patchwork Fri Feb 17 19:19:07 2023 X-Patchwork-Submitter: =?utf-8?q?Daniel_M=C3=BCller?= X-Patchwork-Id: 13145173 From: =?utf-8?q?Daniel_M=C3=BCller?= To: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next 2/3] libbpf: Introduce elf_find_func_offset_from_elf_file() function Date: Fri, 17 Feb 2023 19:19:07 +0000 Message-Id: <20230217191908.1000004-3-deso@posteo.net> In-Reply-To: <20230217191908.1000004-1-deso@posteo.net> References: <20230217191908.1000004-1-deso@posteo.net> This change splits the elf_find_func_offset() function in two: elf_find_func_offset(), which now accepts an already opened Elf object instead of a path to a file that is to be opened, as well as elf_find_func_offset_from_elf_file(), which opens a binary based on a path and then invokes elf_find_func_offset() on the Elf object. Having this split in responsibilities will allow us to call elf_find_func_offset() from other code paths on Elf objects that did not necessarily come from a file on disk. Signed-off-by: Daniel Müller --- tools/lib/bpf/libbpf.c | 55 +++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 20 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 05c4db3..a474f49 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -10531,32 +10531,19 @@ static Elf_Scn *elf_find_next_scn_by_type(Elf *elf, int sh_type, Elf_Scn *scn) return NULL; } -/* Find offset of function name in object specified by path. 
"name" matches - * symbol name or name@@LIB for library functions. +/* Find offset of function name in the provided ELF object. "binary_path" is + * the path to the ELF binary represented by "elf", and only used for error + * reporting matters. "name" matches symbol name or name@@LIB for library + * functions. */ -static long elf_find_func_offset(const char *binary_path, const char *name) +static long elf_find_func_offset(Elf *elf, const char *binary_path, const char *name) { - int fd, i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB }; + int i, sh_types[2] = { SHT_DYNSYM, SHT_SYMTAB }; bool is_shared_lib, is_name_qualified; - char errmsg[STRERR_BUFSIZE]; long ret = -ENOENT; size_t name_len; GElf_Ehdr ehdr; - Elf *elf; - fd = open(binary_path, O_RDONLY | O_CLOEXEC); - if (fd < 0) { - ret = -errno; - pr_warn("failed to open %s: %s\n", binary_path, - libbpf_strerror_r(ret, errmsg, sizeof(errmsg))); - return ret; - } - elf = elf_begin(fd, ELF_C_READ_MMAP, NULL); - if (!elf) { - pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1)); - close(fd); - return -LIBBPF_ERRNO__FORMAT; - } if (!gelf_getehdr(elf, &ehdr)) { pr_warn("elf: failed to get ehdr from %s: %s\n", binary_path, elf_errmsg(-1)); ret = -LIBBPF_ERRNO__FORMAT; @@ -10682,6 +10669,34 @@ static long elf_find_func_offset(const char *binary_path, const char *name) } } out: + return ret; +} + +/* Find offset of function name in ELF object specified by path. "name" matches + * symbol name or name@@LIB for library functions. 
+ */ +static long elf_find_func_offset_from_elf_file(const char *binary_path, const char *name) +{ + char errmsg[STRERR_BUFSIZE]; + long ret = -ENOENT; + Elf *elf; + int fd; + + fd = open(binary_path, O_RDONLY | O_CLOEXEC); + if (fd < 0) { + ret = -errno; + pr_warn("failed to open %s: %s\n", binary_path, + libbpf_strerror_r(ret, errmsg, sizeof(errmsg))); + return ret; + } + elf = elf_begin(fd, ELF_C_READ_MMAP, NULL); + if (!elf) { + pr_warn("elf: could not read elf from %s: %s\n", binary_path, elf_errmsg(-1)); + close(fd); + return -LIBBPF_ERRNO__FORMAT; + } + + ret = elf_find_func_offset(elf, binary_path, name); elf_end(elf); close(fd); return ret; @@ -10805,7 +10820,7 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid, if (func_name) { long sym_off; - sym_off = elf_find_func_offset(binary_path, func_name); + sym_off = elf_find_func_offset_from_elf_file(binary_path, func_name); if (sym_off < 0) return libbpf_err_ptr(sym_off); func_offset += sym_off;
From patchwork Fri Feb 17 19:19:08 2023 X-Patchwork-Submitter: =?utf-8?q?Daniel_M=C3=BCller?= X-Patchwork-Id: 13145174 From: =?utf-8?q?Daniel_M=C3=BCller?= To: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org, daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com Subject: [PATCH bpf-next 3/3] libbpf: Add support for attaching uprobes to shared objects in APKs Date: Fri, 17 Feb 2023 19:19:08 +0000 Message-Id: <20230217191908.1000004-4-deso@posteo.net> In-Reply-To: <20230217191908.1000004-1-deso@posteo.net> References: <20230217191908.1000004-1-deso@posteo.net> This change adds support for attaching uprobes to shared objects located in APKs, which is relevant for Android systems where various libraries may reside in APKs. To make that happen, we extend the syntax for the "binary path" argument to attach to with that supported by various Android tools: !/ For example: /system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so APKs need to be specified via full path, i.e., we do not attempt to resolve mere file names by searching system directories. 
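To illustrate the path syntax described above, here is a minimal standalone sketch of splitting a binary path at the first "!/" separator into an archive path and a member path. It is not the code added by this patch: the helper name split_archive_path, its return convention, and the caller-provided buffer sizes are assumptions made for the example. Unlike a plain strcpy(), it rejects inputs that do not fit either buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of this patch): split "archive!/member" at
 * the first "!/" separator.
 *
 * Returns 1 if the path was split, 0 if it contains no separator, and -1 if
 * either part does not fit into the provided buffer.
 */
static int split_archive_path(const char *path, char *archive, size_t archive_sz,
			      char *member, size_t member_sz)
{
	const char *sep = strstr(path, "!/");
	size_t archive_len, member_len;

	if (!sep)
		return 0;

	archive_len = (size_t)(sep - path);
	member_len = strlen(sep + 2);
	/* Both parts need room for their terminating NUL byte. */
	if (archive_len >= archive_sz || member_len >= member_sz)
		return -1;

	memcpy(archive, path, archive_len);
	archive[archive_len] = '\0';
	/* Copy the member name including its terminating NUL byte. */
	memcpy(member, sep + 2, member_len + 1);
	return 1;
}
```

Called with the example path above, such a helper would yield "/system/app/test-app/test-app.apk" as the archive and "lib/arm64-v8a/libc++_shared.so" as the member; a path without "!/" is left for the regular resolution logic.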
We cannot currently test this functionality end-to-end in an automated fashion, because it relies on an Android system being present, but there is no support for that in CI. I have tested the functionality manually, by creating a libbpf program containing a uretprobe, attaching it to a function inside a shared object inside an APK, and verifying the sanity of the returned values. Signed-off-by: Daniel Müller --- tools/lib/bpf/libbpf.c | 84 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 80 insertions(+), 4 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index a474f49..79ab85f 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -53,6 +53,7 @@ #include "libbpf_internal.h" #include "hashmap.h" #include "bpf_gen_internal.h" +#include "zip.h" #ifndef BPF_FS_MAGIC #define BPF_FS_MAGIC 0xcafe4a11 @@ -10702,6 +10703,60 @@ static long elf_find_func_offset_from_elf_file(const char *binary_path, const ch return ret; } +/* Find offset of function name in archive specified by path. Currently + * supported are .zip files that do not compress their contents (as used on + * Android in the form of APKs, for example). "file_name" is the name of the + * ELF file inside the archive. "func_name" matches symbol name or name@@LIB + * for library functions. 
+ */ +static long elf_find_func_offset_from_archive(const char *archive_path, const char *file_name, + const char *func_name) +{ + struct zip_archive *archive; + struct zip_entry entry; + long ret = -ENOENT; + Elf *elf; + + archive = zip_archive_open(archive_path); + if (!archive) { + pr_warn("failed to open %s\n", archive_path); + return -LIBBPF_ERRNO__FORMAT; + } + + if (zip_archive_find_entry(archive, file_name, &entry)) { + pr_warn("zip: could not find archive member %s in %s\n", file_name, archive_path); + ret = -LIBBPF_ERRNO__FORMAT; + goto out; + } + + if (entry.compression) { + pr_warn("zip: entry %s of %s is compressed and cannot be handled\n", file_name, + archive_path); + ret = -LIBBPF_ERRNO__FORMAT; + goto out; + } + + elf = elf_memory((void *)entry.data, entry.data_length); + if (!elf) { + pr_warn("elf: could not read elf file %s from %s: %s\n", file_name, archive_path, + elf_errmsg(-1)); + ret = -LIBBPF_ERRNO__FORMAT; + goto out; + } + + ret = elf_find_func_offset(elf, file_name, func_name); + if (ret > 0) { + ret += entry.data_offset; + pr_debug("elf: symbol address match for '%s' in '%s': 0x%lx\n", func_name, + archive_path, ret); + } + elf_end(elf); + +out: + zip_archive_close(archive); + return ret; +} + static const char *arch_specific_lib_paths(void) { /* @@ -10789,6 +10844,9 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid, { DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, pe_opts); char errmsg[STRERR_BUFSIZE], *legacy_probe = NULL; + const char *archive_path = NULL; + const char *archive_sep = NULL; + char full_archive_path[PATH_MAX]; char full_binary_path[PATH_MAX]; struct bpf_link *link; size_t ref_ctr_off; @@ -10806,9 +10864,21 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid, if (!binary_path) return libbpf_err_ptr(-EINVAL); - if (!strchr(binary_path, '/')) { - err = resolve_full_path(binary_path, full_binary_path, - sizeof(full_binary_path)); + /* Check if "binary_path" refers to an 
archive. */ + archive_sep = strstr(binary_path, "!/"); + if (archive_sep) { + if (archive_sep - binary_path >= sizeof(full_archive_path)) { + return libbpf_err_ptr(-EINVAL); + } + + strncpy(full_archive_path, binary_path, archive_sep - binary_path); + full_archive_path[archive_sep - binary_path] = 0; + archive_path = full_archive_path; + + strcpy(full_binary_path, archive_sep + 2); + binary_path = full_binary_path; + } else if (!strchr(binary_path, '/')) { + err = resolve_full_path(binary_path, full_binary_path, sizeof(full_binary_path)); if (err) { pr_warn("prog '%s': failed to resolve full path for '%s': %d\n", prog->name, binary_path, err); @@ -10820,7 +10890,13 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid, if (func_name) { long sym_off; - sym_off = elf_find_func_offset_from_elf_file(binary_path, func_name); + if (archive_path) { + sym_off = elf_find_func_offset_from_archive(archive_path, binary_path, + func_name); + binary_path = archive_path; + } else { + sym_off = elf_find_func_offset_from_elf_file(binary_path, func_name); + } if (sym_off < 0) return libbpf_err_ptr(sym_off); func_offset += sym_off;
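For reference, the backward scan for the end of central directory record that patch 1 performs in find_central_directory() can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code from the series: it works on a plain in-memory buffer, reads fields explicitly as little-endian instead of via memcpy(), and the names find_eocd, read_le16, read_le32, EOCD_MAGIC and EOCD_FIXED_SIZE are made up for the example. The offset is kept in a signed 64-bit variable so that the "offset >= 0" condition can actually stop the loop at the start of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define EOCD_MAGIC 0x06054b50u
/* Size of the fixed part of the record, i.e. without the trailing comment. */
#define EOCD_FIXED_SIZE 22

/* Read little-endian values from a possibly unaligned location. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan backwards over at most 64 KiB (the maximum comment length) for a
 * record whose magic matches and whose comment ends exactly at the end of
 * the buffer. Returns the record's offset or -1 if none is found.
 */
static int64_t find_eocd(const uint8_t *data, size_t size)
{
	int64_t offset, limit;

	if (size < EOCD_FIXED_SIZE)
		return -1;

	offset = (int64_t)size - EOCD_FIXED_SIZE;
	limit = offset - (1 << 16);

	for (; offset >= 0 && offset > limit; offset--) {
		const uint8_t *p = data + offset;

		if (read_le32(p) != EOCD_MAGIC)
			continue;
		/* comment_length lives at byte 20 of the record. */
		if ((uint64_t)offset + EOCD_FIXED_SIZE + read_le16(p + 20) == size)
			return offset;
	}

	return -1;
}
```

The sketch only locates the record; the disk-number and ZIP64 checks and the bounds check on the central directory itself, as done in try_parse_end_of_central_directory(), would still have to follow.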