From patchwork Sun Oct 10 17:02:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04D0FC433EF for ; Sun, 10 Oct 2021 17:03:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D583760F4B for ; Sun, 10 Oct 2021 17:03:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232020AbhJJRFI (ORCPT ); Sun, 10 Oct 2021 13:05:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39684 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231927AbhJJRFH (ORCPT ); Sun, 10 Oct 2021 13:05:07 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1CACEC06161C for ; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id r7so47816906wrc.10 for ; Sun, 10 Oct 2021 10:03:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=yYJ/b49QpJ4fkZtCd4pL6lc0TKalwVrQgk6O0OkTjwk=; b=YRrV8sC4Fd3F6VdjoE44LH2cXmSsMrUmgcfo3jc/p4Pc6kdc0IZG9BrQHJt+PZd4lU iUQ49NWLkmFJbCnKZG4QbcQxm98pjhoQKelmbRole8kN2IfKOAjm5kR0fXgAqrvt+Cr7 K2w+lppnGuVHRkz0YQ6GKIyEkWhMoDQDsX2Qt9RskRZhgAmveQ7FXGnrIXV7xq/plS4l DMAxlqg3RZ37hsUSXlbUAesmZlPAGFf1Fs3R/7lTNeXNALskyEcOyprrMMkZtwjXRAlm +H0HwY0geo11ggzdW9hI4ERl6czkBxYc/b05VtLRSwg35xB99ohoXwSySgyKSVVEDElu ML5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=yYJ/b49QpJ4fkZtCd4pL6lc0TKalwVrQgk6O0OkTjwk=; b=E55teyt2U8B6rdKoKv0BUFN9zQV7lk8SE/YNaupqHtwMDNn6LIF7gTAvg5PU9o7HGB JH+7wf+DqmTr/vx7rutdccPwqJqCx3tmq7tAy/lv0j13QscuDx/6r0avUT0N5HXVKNkQ 40a7XxB+iE0rLjYq57dp8W3hOwdE2uXI7hKGIAf6s8MSIXDnLpEtqgGloYjMxruCMtqY YdBAuwR5la8srf0zWhE5/U/Q106t3E49e1vJZcpT2YQmrYrnCd7StZ0pSHR9HFP0rVB2 alGQwMT0CKIK92Ek6qs28kNGsU5wbD2WUpFt2bs0hRbxD6KF4Fy+A9khyp2mBYBjewTP kSnA== X-Gm-Message-State: AOAM530JYI5zsGsIeTvo2XPAEHMTEVQwTeOeexDDAYiaYFS+Mdai8W2F ojD7+9efRzrUuT+nYEmn2embE0+3Sr8= X-Google-Smtp-Source: ABdhPJy2+cqhlRbhJ9IyVII6wQ0/K3cbgk+FgtWk8CrZYHzi7XWhO/37J9xLDlTsCK32Tc2LP4wSsQ== X-Received: by 2002:a5d:6c62:: with SMTP id r2mr18986021wrz.412.1633885386020; Sun, 10 Oct 2021 10:03:06 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c204sm18447328wme.11.2021.10.10.10.03.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:05 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 10 Oct 2021 17:02:59 +0000 Subject: [PATCH v3 1/6] t4034/cpp: actually test that operator tokens are not split Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt 8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18) added many tests with the intent to verify that operators consisting of more than one symbol are kept together. These are tested by probing a transition from, e.g., a!=b to x!=y, which results in the word-diff [-a-]{+x+}!=[-b-]{+y+} But that proves only that the letters and operators are separate tokens. To prove that != is an inseparable token, we have to probe a transition from, e.g., a=b to a!=b having a word-diff a[-=-]{+!=+}b that proves that the !
is not separate from the =. In the post-image, add to or remove from operators a character that turns it into another valid operator. Change the identifiers used around operators such that the diff algorithm does not have an incentive to match, e.g., a --- t/t4034/cpp/expect | 45 +++++++++++++++------------------------------ t/t4034/cpp/post | 29 +++++++++++++---------------- t/t4034/cpp/pre | 25 +++++++++++-------------- 3 files changed, 39 insertions(+), 60 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index 37d1ea25870..41976971b93 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -1,36 +1,21 @@ diff --git a/pre b/post -index 23d5c8a..7e8c026 100644 +index c5672a2..4229868 100644 --- a/pre +++ b/post -@@ -1,19 +1,19 @@ +@@ -1,16 +1,16 @@ Foo() : x(0&&1&42) { bar(x); } cout<<"Hello World!?\n"<(1) (-1e10) (0xabcdef) 'xy' -[ax] ax->b ay x.by -!ax ~a ax x++ ax-- ax*b ay x&b -ay -x*b ay x/b ay x%b -ay -x+b ay x-b -ay -x<<b ay x>>b -ay -x<b ay x<=b ay x>b ay x>=b -ay -x==b ay x!=b -ay -x&b -ay -x^b -ay -x|b -ay -x&&b -ay -x||b -ay -x?by:z -ax=b ay x+=b ay x-=b ay x*=b ay x/=b ay x%=b ay x<<=b ay x>>=b ay x&=b ay x^=b ay x|=b -ay -x,y -ax::by +[a] b->->*v d.e.*e +~!a !~b c+++ d--- e**f g&&&h +a**=b c//=d e%%=f +a+++b c---d +a<<<<=b c>>>>=d +a<<=b c<=<d e>>=f g>=>h +a==!=b c!==d +a^^=b c||=d e&&&=f +a|||b +a?:b +a===b c+=+d e-=fe-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v +a,b +a:::b diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post index 7e8c026cefb..4229868ae62 100644 --- a/t/t4034/cpp/post +++ b/t/t4034/cpp/post @@ -1,19 +1,16 @@ Foo() : x(0&42) { bar(x); } cout<<"Hello World?\n"<y x.y -!x ~x x++ x-- x*y x&y -x*y x/y x%y -x+y x-y -x<>y -xy x>=y -x==y x!=y -x&y -x^y -x|y -x&&y -x||y -x?y:z -x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y -x,y -x::y +[a] b->*v d.*e +~!a !~b c+ d- e**f g&&h +a*=b c/=d e%=f +a++b c--d +a<<=b c>>=d +a<=b c=f g>h +a!=b c=d +a^=b c|=d e&=f +a|b +a?:b +a==b c+d e-f g*h i/j k%l m<>p q&r s^t u|v 
+a,b +a:b diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre index 23d5c8adf54..c5672a24cfc 100644 --- a/t/t4034/cpp/pre +++ b/t/t4034/cpp/pre @@ -1,19 +1,16 @@ Foo():x(0&&1){} cout<<"Hello World!\n"<b a.b -!a ~a a++ a-- a*b a&b -a*b a/b a%b -a+b a-b -a<>b -ab a>=b -a==b a!=b -a&b -a^b -a|b -a&&b +[a] b->v d.e +!a ~b c++ d-- e*f g&h +a*b c/d e%f +a+b c-d +a<>d +af g>=h +a==b c!=d +a^b c|d e&&f a||b -a?b:z -a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b -a,y +a?b +a=b c+=d e-=f g*=h i/=j k%=l m<<=n o>>=p q&=r s^=t u|=v +a,b a::b From patchwork Sun Oct 10 17:03:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06F12C4332F for ; Sun, 10 Oct 2021 17:03:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E446C60F58 for ; Sun, 10 Oct 2021 17:03:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232089AbhJJRFJ (ORCPT ); Sun, 10 Oct 2021 13:05:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231928AbhJJRFH (ORCPT ); Sun, 10 Oct 2021 13:05:07 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30CB9C061745 for ; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id e3so14266837wrc.11 for ; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc 
:content-transfer-encoding:mime-version:to:cc; bh=bmuO/jMfblS6AyGnS/YriWnEGzghNWl6SYQCUtVvYAM=; b=gP8xOFhZ9/BfRn4RKHF0RA4j9jabYwcJHMqRM5bItXHpXQkl13BWsjKqeHY412ypWW 57H49D/puResBjsh2qV7UZkLXMf7QFdFrTNrHcp3/EY7VDJFZWOrb8LCkcvICK0Lv4OV ogjAZQkqqa2LvgVoUjZi8t5w02kjAMc0/rdzQ+VuXumQkMUjE3y8B0rI1BHdeutM9s69 u4RM+lSOwlhZvwreUUlu/+uYJVs+Mc2kSibxcu5AYwRHgeIg27tRUCy13AjSLEIz+wTb E9d2nhKYegvv3urlDzl7/EKY0dIFXzml0lm2Gfn4Dm3gCnYnWDg5XK3M8E268ZQpvhFd Oryg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=bmuO/jMfblS6AyGnS/YriWnEGzghNWl6SYQCUtVvYAM=; b=RPZ/wIromNv8AJASXaKq4tgpoLEUj4lYi4QpiomI0hU4iThM1TmnApq9tXZSK4ajWG gN95OZsXIjJmY50aYKYhlu2BNKVNe5M4aHHOdVk8eHyK005yqeuQgBI9BkaAgca2AFek hTu7PmTSoREXcMkPCboErCefMxi329Meg1eqP3EgeQZrklTfAq0yK+P5XWeUbaRwMhUy SiDwgdeUNjIPWTHXvKa8YRSA3lV+ZMN3l2UnbOu1WV9my0hj9JwgQI93Qbcf25nhyxxn eOPxivp+b+OZFmE2m6c/L6xlNL4YMnnWZMaskIbzh7cV4XyKrBp45vN12rvbIUA0rlQ2 Icqw== X-Gm-Message-State: AOAM531dWvdLYQiw/FgTu7yfKI+5Im+rhv2kKLjcmli6Ct9T60g8JGre pOgpziyXj3/nD0haMRKhwH9s/NiVSKY= X-Google-Smtp-Source: ABdhPJyh5iBMuJIwzZjf5pJdfyKxgJKb2HBqfJghgrfSpp0rlSbEzcdbjKzJO3++dWs+msJE1mH5fA== X-Received: by 2002:a1c:a401:: with SMTP id n1mr16469255wme.162.1633885386724; Sun, 10 Oct 2021 10:03:06 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id t11sm5345651wrz.65.2021.10.10.10.03.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:06 -0700 (PDT) Message-Id: <5a84fc9cf715aec258d9cda2dd7d2e8eff2dc66c.1633885384.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 10 Oct 2021 17:03:00 +0000 Subject: [PATCH v3 2/6] t4034: add tests showing problematic cpp tokenizations Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk 
List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt The word regex is too loose and matches long streaks of characters that should actually be separate tokens. Add these problematic test cases. Separate the lines with text that will remain identical in the pre- and post-image so that the diff algorithm will not lump removals and additions of consecutive lines together. This makes the expected output easier to read. Signed-off-by: Johannes Sixt --- t/t4034/cpp/expect | 22 ++++++++++++++++++---- t/t4034/cpp/post | 18 ++++++++++++++++-- t/t4034/cpp/pre | 16 +++++++++++++++- 3 files changed, 49 insertions(+), 7 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index 41976971b93..63e53a61e62 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -1,11 +1,25 @@ diff --git a/pre b/post -index c5672a2..4229868 100644 +index 1229cdb..3feae6f 100644 --- a/pre +++ b/post -@@ -1,16 +1,16 @@ -Foo() : x(0&&1&42) { bar(x); } +@@ -1,30 +1,30 @@ +Foo() : x(0&&1&42) { foo0bar(x.f.Find); } cout<<"Hello World!?\n"<(1) (-1e10) (0xabcdef) 'xy' +(1 -1e10+1e10 0xabcdef) 'xy' +// long double +3.141592653e-10l3.141592654e+10l +// float +120E5fE6f +// hex +0xdeadbeaf+80xdeadBeaf+7ULL +// octal +0123456701234560 +// binary +0b10000b1100+e1 +// expression +1.5-e+2+f1.5-e+3+f +// another one +str.e+65.e+75 [a] b->->*v d.e.*e ~!a !~b c+++ d--- e**f g&&&h a**=b c//=d e%%=f diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post index 4229868ae62..3feae6f430f 100644 --- a/t/t4034/cpp/post +++ b/t/t4034/cpp/post @@ -1,6 +1,20 @@ -Foo() : x(0&42) { bar(x); } +Foo() : x(0&42) { bar(x.Find); } cout<<"Hello World?\n"<*v d.*e ~!a !~b c+ d- e**f g&&h a*=b c/=d e%=f diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre index c5672a24cfc..1229cdb59d1 100644 --- a/t/t4034/cpp/pre +++ b/t/t4034/cpp/pre @@ -1,6 +1,20 @@ -Foo():x(0&&1){} +Foo():x(0&&1){ foo0( x.find); } cout<<"Hello World!\n"<v d.e !a ~b c++ d-- e*f g&h a*b c/d e%f From patchwork Sun Oct 10 17:03:01 2021 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A482C433EF for ; Sun, 10 Oct 2021 17:03:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EFC860F4B for ; Sun, 10 Oct 2021 17:03:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232094AbhJJRFK (ORCPT ); Sun, 10 Oct 2021 13:05:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231948AbhJJRFH (ORCPT ); Sun, 10 Oct 2021 13:05:07 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C34CAC061762 for ; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id r18so47890206wrg.6 for ; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=ngZ0ar27WWg0VwBar7AyiNjC3RMv1OoTrNDR7TYg3MA=; b=Hpo32RVrHK4A+dmGWzn029ldCrAZm7mXtNNbwSviEuo2AUFRWjp1BxZ2kYJgfmbYTd LJX6tX6Qi/P3nXpSeSqtc8K6Lw74qyVB6DRXO1iI3iANoIoVuMUel00Ou1UP4yFVGg6W p7yAXqv3xQdinOEQWfIqUMwTRhVJB3JNcG5NziY9bQxOeIhXzrLFI6Ecui0EQxp+5/kz hRIhKot8c2cmWpOcbnVhLVvs2h1JkJA3v/cFekAAs7sGt4fnMv2fIRVfHAqyNPcxqRCO WAoxShzaq0j0B8T2++impyp49MzGvEeMIK1GkBQ0vbnLERLJGfiCTArd2cKbDbWo1xR8 kpuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=ngZ0ar27WWg0VwBar7AyiNjC3RMv1OoTrNDR7TYg3MA=; b=hKrjrQk6A6+Wa9wppjaHdqZb6FuBACzclbHupjI1ZR2bUVpQniwUyS7CTA4SnjU6Hu 8nEwIxtUQsiR5sOgTjuWUhu+DRc3i/SUVD1CZdDIHmPm+AfzCZk5x0tmmjslrq8Vej6F /adjkaN9pl5RMBN7+q2r4WJUQ0N3Y4yDztHcOc2VYwTPCFLfTZtFW4j28tbd+OmLuwpT yFxGAPDn4o/iWsFVbpg+4wFKm0lqDtSNfP2A56Rxg+jvFN3nIKh3VNBHm8K9WB+AzYq1 N1CVVMte4NTc5VI5OuAIkBMsoYzUwMRWl90noaMWaSzZ2+k9SukiNgiI4N87eFoHfxCW 96rQ== X-Gm-Message-State: AOAM532/juB1EfKL9IuS+8F21F5KDftgMz6B3suLpTGn3djQLdLF2OLn 5+JQFnBzpaYXYuXmrIEv0hwnY8x9ryk= X-Google-Smtp-Source: ABdhPJwGWAsjA6bx4yOT84xBaCz4s2zB1zkSwVOAUiwDctNWYLE6PgpJ8MchCDz7EKdtiYfSXM/Vsg== X-Received: by 2002:adf:a390:: with SMTP id l16mr18845948wrb.104.1633885387409; Sun, 10 Oct 2021 10:03:07 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l11sm6501372wms.45.2021.10.10.10.03.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:07 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 10 Oct 2021 17:03:01 +0000 Subject: [PATCH v3 3/6] userdiff-cpp: tighten word regex Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt Generally, word regexes can be written such that they match tokens liberally and need not model the actual syntax because it can be assumed that the regex will only be applied to syntactically correct text. The regex for cpp (C/C++) is too liberal, though. It regards these sequences as single tokens: 1+2 1.5-e+2+f and the following amalgams as one token: .l as in str.length .f as in str.find .e as in str.erase Tighten the regex in the following way: - Accept + and - only in one position in the exponent.
+ and - are no longer regarded as the sign of a number and are treated by the catch-all that is not visible in the driver's regex. - Accept a leading decimal point only when it is followed by a digit. For readability, factor hexadecimal and binary numbers into a term of their own. As a drive-by, this also fixes floating-point numbers such as 12E5 (with upper-case E) being split into two tokens. Signed-off-by: Johannes Sixt --- t/t4034/cpp/expect | 16 ++++++++-------- userdiff.c | 8 +++++++- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index 63e53a61e62..46c9460a968 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -3,24 +3,24 @@ --- a/pre +++ b/post @@ -1,30 +1,30 @@ -Foo() : x(0&&1&42) { foo0bar(x.f.Find); } +Foo() : x(0&&1&42) { foo0bar(x.findFind); } cout<<"Hello World!?\n"<(1 -1e10+1e10 0xabcdef) 'xy' +(1 -+1e10 0xabcdef) 'xy' // long double 3.141592653e-10l3.141592654e+10l // float -120E5fE6f +120E5f120E6f // hex -0xdeadbeaf+80xdeadBeaf+7ULL +0xdeadbeaf0xdeadBeaf+8ULL7ULL // octal 0123456701234560 // binary 0b10000b1100+e1 // expression -1.5-e+2+f1.5-e+3+f +1.5-e+23+f // another one -str.e+65.e+75 -[a] b->->*v d.e.*e +str.e+6575 +[a] b->->*v d..*e ~!a !~b c+++ d--- e**f g&&&h a**=b c//=d e%%=f a+++b c---d @@ -30,6 +30,6 @@ a==!=b c!==d a^^=b c||=d e&&&=f a|||b a?:b -a===b c+=+d e-=fe-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v +a===b c+=+d e-=-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v a,b a:::b diff --git a/userdiff.c b/userdiff.c index d9b2ba752f0..ce2a9230703 100644 --- a/userdiff.c +++ b/userdiff.c @@ -54,8 +54,14 @@ PATTERNS("cpp", /* functions/methods, variables, and compounds at top level */ "^((::[[:space:]]*)?[A-Za-z_].*)$", /* -- */ + /* identifiers and keywords */ "[a-zA-Z_][a-zA-Z0-9_]*" - "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*" + /* decimal and octal integers as well as floatingpoint numbers */ + "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*" + /* hexadecimal and
binary integers */ + "|0[xXbB][0-9a-fA-F]+[lLuU]*" + /* floatingpoint numbers that begin with a decimal point */ + "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?" "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"), PATTERNS("csharp", /* Keywords */ From patchwork Sun Oct 10 17:03:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78641C433F5 for ; Sun, 10 Oct 2021 17:03:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5F77160F58 for ; Sun, 10 Oct 2021 17:03:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232102AbhJJRFM (ORCPT ); Sun, 10 Oct 2021 13:05:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231972AbhJJRFI (ORCPT ); Sun, 10 Oct 2021 13:05:08 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6461CC061570 for ; Sun, 10 Oct 2021 10:03:09 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id y3so14727579wrl.1 for ; Sun, 10 Oct 2021 10:03:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=L/JFOu18bI6ZDnCyvpp97l+zqeYnrFVFqOP4uPer1+s=; b=kF1GLObII+JSxbpYxJQEuDL3XpLzA9AfVlrkm1OzsX2RiJ6/Tu+moxavuVuFCPPcGz Bs3XO135TOWhU8xLfO9bjyr2xCqIeow/wAOkpgj035Gb52bQHvl7MWVZkKBiAvXGjuRG 
FTxVqGdxIU7Lf7JqKO9EogFOW2FM8UdVI61+H/qst/bWwxQhzr4nl4YyGu4r2qJIX28v XRZ3HTUcIs4WaCmEwekpgHk3mxOc+k3V9C8z3MqbL6OqQyo4unewZSNG53L4e8PrF9pj J732dLFUpEHrIwvSpwtQQfmN6keALZMZeyX/M7l4hdSE+aB5Sbi6MD6zwBNHKgefHRv2 4PgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=L/JFOu18bI6ZDnCyvpp97l+zqeYnrFVFqOP4uPer1+s=; b=e0194nrw7eMfLp0Ev7VsILU0smt9j2l5fBiUvK62mf1eCLT0+dFFI8AU/Ot7DMBTUO uR2V+uvH1JRkDDszXbyU1NrYwg75ArfDFqUH/Rmf4EV0pslv2jpagbC7UKKcgfrYOUNJ TleEd6z7QE0ZocGuG74qUNmdfStDNoJdZBFUtThET/PwUQu8wovjKmcm3yA/1ZkjGoOw EXSM/k3oUtKqRhVdsfg0xTkdVzSiJfIS22y0AqMBMhn4cyOoFoyNbxr2+XAIOKRfpiRO 4AMaoz4HGlC0Yrn2300rWUDx/YaheLEroJeWfZwRXsallCWi+7v3KdnaN0XYusb22qPs lsiA== X-Gm-Message-State: AOAM5305ZhQsOmq6PTDebtiR4u/XWjCXZ1Zp5YHMQJW32rnJQfmyNExO oPtaX4yvIdjxYkJr3Rg81hpN1ge6Iys= X-Google-Smtp-Source: ABdhPJxnVTNo96umpu5oufxBSZHuApd2AvjXYKpRpfzAQB9WYBsUSRKvEOHu58MaZEYDpVROza3PAA== X-Received: by 2002:a1c:a443:: with SMTP id n64mr16451733wme.32.1633885387998; Sun, 10 Oct 2021 10:03:07 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c9sm18116423wmb.41.2021.10.10.10.03.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:07 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 10 Oct 2021 17:03:02 +0000 Subject: [PATCH v3 4/6] userdiff-cpp: prepare test cases with yet unsupported features Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt We are going to add support for C++'s digit-separating single-quote and the spaceship operator. 
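The tokenization that these test cases expose can be reproduced outside of Git. The following is a minimal sketch, not Git's implementation: it applies the word regex from patch 3/6 with POSIX regexec(), which reports the longest match at the leftmost position. The names tokenize and MAX_TOKEN_LEN are hypothetical, and the trailing [^[:space:]] alternative is an assumption standing in for the catch-all that the driver adds on its own.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_TOKEN_LEN 64	/* hypothetical limit, for illustration only */

/* Word regex of the cpp driver as of patch 3/6, copied from the diff. */
static const char *cpp_word_regex =
	/* identifiers and keywords */
	"[a-zA-Z_][a-zA-Z0-9_]*"
	/* decimal and octal integers as well as floating-point numbers */
	"|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
	/* hexadecimal and binary integers */
	"|0[xXbB][0-9a-fA-F]+[lLuU]*"
	/* floating-point numbers that begin with a decimal point */
	"|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
	"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"
	/* Assumption: stand-in for the catch-all appended by the driver. */
	"|[^[:space:]]";

/*
 * Split text into tokens and store up to max of them in out.
 * Returns the number of tokens stored, or -1 if the regex
 * does not compile.
 */
int tokenize(const char *text, char out[][MAX_TOKEN_LEN], int max)
{
	regex_t re;
	regmatch_t m;
	int n = 0;

	assert(text && out);
	if (regcomp(&re, cpp_word_regex, REG_EXTENDED))
		return -1;

	/* Each regexec() call finds the next token in the remaining text. */
	while (n < max && !regexec(&re, text, 1, &m, 0)) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (!len)
			break;	/* defensive: never loop on an empty match */
		if (len >= MAX_TOKEN_LEN)
			len = MAX_TOKEN_LEN - 1;	/* truncate overlong tokens */
		memcpy(out[n], text + m.rm_so, len);
		out[n][len] = '\0';
		n++;
		text += m.rm_eo;	/* continue after the matched token */
	}
	regfree(&re);
	return n;
}
```

With this regex, tokenize() splits i<=>j into the four tokens i, <=, > and j, and splits 1'000 into 1, ' and 000. That is the behavior the new test cases are meant to make visible before the features are implemented.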
By adding the test cases in this separate commit, the effect on the word highlighting will become more obvious as the features are implemented and the file cpp/expect is updated. Signed-off-by: Johannes Sixt --- t/t4034/cpp/expect | 14 +++++++------- t/t4034/cpp/post | 12 ++++++------ t/t4034/cpp/pre | 10 +++++----- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index 46c9460a968..3d37ddac42c 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -1,21 +1,21 @@ diff --git a/pre b/post -index 1229cdb..3feae6f 100644 +index 144cd98..64e78af 100644 --- a/pre +++ b/post @@ -1,30 +1,30 @@ Foo() : x(0&&1&42) { foo0bar(x.findFind); } cout<<"Hello World!?\n"<(1 -+1e10 0xabcdef) 'xy' +(1 -+1e10 0xabcdef) 'x.' // long double -3.141592653e-10l3.141592654e+10l +3.141'592'653e-10l654e+10l // float 120E5f120E6f // hex -0xdeadbeaf0xdeadBeaf+8ULL7ULL +0xdead'beafBeaf+8ULL7ULL // octal -0123456701234560 +0123'45674560 // binary -0b10000b1100+e1 +0b100b11'00+e1 // expression 1.5-e+23+f // another one @@ -25,7 +25,7 @@ str.e+6575 a**=b c//=d e%%=f a+++b c---d a<<<<=b c>>>>=d -a<<=b c<=<d e>>=f g>=>h +a<<=b c<=<d e>>=f g>=>h i<=>j a==!=b c!==d a^^=b c||=d e&&&=f a|||b diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post index 3feae6f430f..64e78afbfb5 100644 --- a/t/t4034/cpp/post +++ b/t/t4034/cpp/post @@ -1,16 +1,16 @@ Foo() : x(0&42) { bar(x.Find); } cout<<"Hello World?\n"<>=d -a<=b c=f g>h +a<=b c=f g>h i<=>j a!=b c=d a^=b c|=d e&=f a|b diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre index 1229cdb59d1..144cd980d6b 100644 --- a/t/t4034/cpp/pre +++ b/t/t4034/cpp/pre @@ -2,15 +2,15 @@ Foo():x(0&&1){ foo0( x.find); } cout<<"Hello World!\n"<>d -af g>=h +af g>=h i<=j a==b c!=d a^b c|d e&&f a||b From patchwork Sun Oct 10 17:03:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548671 Return-Path: X-Spam-Checker-Version: 
SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63849C4332F for ; Sun, 10 Oct 2021 17:03:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4EF68610C7 for ; Sun, 10 Oct 2021 17:03:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232132AbhJJRFN (ORCPT ); Sun, 10 Oct 2021 13:05:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232018AbhJJRFI (ORCPT ); Sun, 10 Oct 2021 13:05:08 -0400 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F36A0C06161C for ; Sun, 10 Oct 2021 10:03:09 -0700 (PDT) Received: by mail-wr1-x432.google.com with SMTP id g25so1348692wrb.2 for ; Sun, 10 Oct 2021 10:03:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=v5JL2AenaR8RqTcWSZSPuR0Nrlr+9pVk/9sW+zxWyI8=; b=Dy+F4hmZD0jk+A8vRcQnqdLGIynR0seJ/ycK5iVTEMJsMTW6OTtK1DKv1mhLVTNoiT oBeV2vrQcSZSXBEGH2lvEuVyDm4sl7X26S/QZKHhNcf5YB22SriFivlcmRc23426dY7N kTcYE5mKUsfRnep8i3qqa7WFPH+V774IvxYwIqtlC1F9ymhFQYJX5houqDla0D+M6c3j +D++sRf5kER7G4jlRZn0JbfUeYZvmqxrh2uXfjOjnk40PtaXMRJyAsOzV3q+nmYx/b1D WctPaLKz9wTiIhJ0jPCQn05DVvwBoaBcJN/IP+8T8QhqHsthbcLam+A/Hcfp+0gj13Ne xPUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=v5JL2AenaR8RqTcWSZSPuR0Nrlr+9pVk/9sW+zxWyI8=; b=aJZ6dxxP+A2vtMmDkI27hh9PBnu5mcyEF0W1xOTsXLb4JJAHcsVJv9I/IMlRFuKTSq 
80S2FrDn3ntNT2kTB3aS9UEfd9X2gOFHdu8JOLlyRsVrjM0JYF2u335SFHpZodtVpiDx gjAnCsLHWDcD/aUFMTsXzbG/aKO3ZIPmJw3B9tDMYZoJf2eudK2VGMIeJ2T9PgQoLDI9 yrhgk5bXBlOHb4TIY7aeMhYA4AFVjQGKww6ZhfXkkMEG48vF/YS1ro+FJXvTuWTTfcDa j5Nn76K8y7gF8mFRXqdTaPYKn6Di6hCwi1Ydw/PpA8oRjwhhB4x1KJWwX4PLzCDkn3zB w5Aw== X-Gm-Message-State: AOAM5314nTKORYfn/6bdlbFAFaW3qpmVytILlV06NOombpmGM3XnYCmH 6uNJXtsv6XLwDZRQ5U0FdNAhWaeTAYY= X-Google-Smtp-Source: ABdhPJzGLPz/TWFVA2SJPthE9FvZjk/TtcnBWdRDgfnLkAW6sJTnF0fVkvvkQzBmj1PvPMzsZfnM8Q== X-Received: by 2002:a05:600c:19cf:: with SMTP id u15mr16114950wmq.45.1633885388595; Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id g1sm18420376wmk.2.2021.10.10.10.03.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Message-Id: <037c743d9e317ced040bf76bb979c6e17583de8a.1633885384.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Sun, 10 Oct 2021 17:03:03 +0000 Subject: [PATCH v3 5/6] userdiff-cpp: permit the digit-separating single-quote in numbers Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt Since C++14, the single-quote can be used as a digit separator: 3.141'592'654 1'000'000 0xdead'beaf Make it known to the word regex of the cpp driver, so that numbers are not split into separate tokens at the single-quotes. Signed-off-by: Johannes Sixt --- t/t4034/cpp/expect | 8 ++++---- userdiff.c | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index 3d37ddac42c..b90b3f207bf 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -7,15 +7,15 @@ Foo() : x(0&&1&42) { foo0bar cout<<"Hello World!?\n"<(1 -+1e10 0xabcdef) 'x.'
// long double -3.141'592'653e-10l654e+10l +3.141'592'653e-10l3.141'592'654e+10l // float 120E5f120E6f // hex -0xdead'beafBeaf+8ULL7ULL +0xdead'beaf0xdead'Beaf+8ULL7ULL // octal -0123'45674560 +0123'45670123'4560 // binary -0b100b11'00+e1 +0b10'000b11'00+e1 // expression 1.5-e+23+f // another one diff --git a/userdiff.c b/userdiff.c index ce2a9230703..5072d12e51e 100644 --- a/userdiff.c +++ b/userdiff.c @@ -57,11 +57,11 @@ PATTERNS("cpp", /* identifiers and keywords */ "[a-zA-Z_][a-zA-Z0-9_]*" /* decimal and octal integers as well as floatingpoint numbers */ - "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*" + "|[0-9][0-9.']*([Ee][-+]?[0-9]+)?[fFlLuU]*" /* hexadecimal and binary integers */ - "|0[xXbB][0-9a-fA-F]+[lLuU]*" + "|0[xXbB][0-9a-fA-F']+[lLuU]*" /* floatingpoint numbers that begin with a decimal point */ - "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?" + "|\\.[0-9][0-9']*([Ee][-+]?[0-9]+)?[fFlL]?" "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"), PATTERNS("csharp", /* Keywords */ From patchwork Sun Oct 10 17:03:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Johannes Sixt X-Patchwork-Id: 12548673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6FDFC433EF for ; Sun, 10 Oct 2021 17:03:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 999B660F58 for ; Sun, 10 Oct 2021 17:03:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232169AbhJJRFO (ORCPT ); Sun, 10 Oct 2021 13:05:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232031AbhJJRFJ (ORCPT ); Sun, 10 Oct 2021 13:05:09 -0400 
Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 902C6C061570 for ; Sun, 10 Oct 2021 10:03:10 -0700 (PDT) Received: by mail-wr1-x434.google.com with SMTP id m22so47962666wrb.0 for ; Sun, 10 Oct 2021 10:03:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:in-reply-to:references:from:date:subject:fcc :content-transfer-encoding:mime-version:to:cc; bh=NWiHaj6BkTEYrLmidKAYbWVxJGxhNIUtxiV1PggnK0k=; b=i+09nfann7YY4DYBISPM+y2Ru6YePzd+oyBBh59UeeaTr38AyfnSv7PynyB4EbACbd MQONHdcDygTTwGYoE52yq3eqWAnZWLiEcp134DH49EWduQraCMcKLYg1KtNqNw2Bnd1E DjihZzn7oXi/FR4OiVyiidFA37eB7zsnsY/4d3806SDzcce0oUSWJwHYrVZ1HJsa/EuU AV+27WfNTiU2nlx7hcIxjpZAP0wxQAXC3jiQv3xo2rGusLvJHxVfyupqMouWv+6j2XdU Af9ng1SIzDGgVTE6sF/UZsnPkgeKkua5ngxNtyGgz8E7SyTe5VL8Z7YPWHMQ99JN7qTl ychQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:in-reply-to:references:from:date :subject:fcc:content-transfer-encoding:mime-version:to:cc; bh=NWiHaj6BkTEYrLmidKAYbWVxJGxhNIUtxiV1PggnK0k=; b=R5ZPXeI2uPMR0rm+urrPjeP9yteUJd1k+qw+2+0kgcm8JXDIuvD7OMYpv+qva4NSYi hc3aWw+kzABcLD49t3GbP69MaTGzRTaVgwgIIxG3YscyG0sw+P91mPIJEN6MogGMoiQB l4PMz86/NhMCD56R3H12DU+yHBK+wsbpzrM23lyzCaDPdb3Pp7BkOAegzlSuT5aHwND+ ILuFEV0ZZDfARDf/GSZnXg06YZ1UcaUdIhvoZL7hbTMJW9o8hSphpkuPDKjsI43nKgaF gxfS26eTkA+BLp8ECqoPh7g+KV0iAiJMQhT7R78z0I0BYnbuGBu+bVCJx6Suf5yQTEH+ oTvQ== X-Gm-Message-State: AOAM532gR3rrmFTqzbf6KR7AwE09ViUwXpSEAQkPRwE+83iYTRCFVbCZ WnaBKOqxlc2IIT030PTlzgVlVAG3wh4= X-Google-Smtp-Source: ABdhPJzj7MWE/LF0bwX1h5K5O2I63MX7ipfs8u9k+cyMLTvEfpfpzLxpnwAvUuumEf5C5xCWCZx4hQ== X-Received: by 2002:a7b:cb4b:: with SMTP id v11mr16701897wmj.155.1633885389212; Sun, 10 Oct 2021 10:03:09 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a17sm5520355wrx.33.2021.10.10.10.03.08 (version=TLS1_3 
cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 10 Oct 2021 10:03:08 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Sun, 10 Oct 2021 17:03:04 +0000 Subject: [PATCH v3 6/6] userdiff-cpp: learn the C++ spaceship operator Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Johannes Sixt , Johannes Sixt Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Johannes Sixt From: Johannes Sixt Since C++20, the language has a generalized comparison operator <=>. Teach the cpp driver not to separate it into <= and > tokens. Signed-off-by: Johannes Sixt --- t/t4034/cpp/expect | 2 +- userdiff.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect index b90b3f207bf..5ff4ce477b4 100644 --- a/t/t4034/cpp/expect +++ b/t/t4034/cpp/expect @@ -25,7 +25,7 @@ str.e+6575 a**=b c//=d e%%=f a+++b c---d a<<<<=b c>>>>=d -a<<=b c<=<d e>>=f g>=>h i<=>j +a<<=b c<=<d e>>=f g>=>h i<=<=>j a==!=b c!==d a^^=b c||=d e&&&=f a|||b diff --git a/userdiff.c b/userdiff.c index 5072d12e51e..96adddd6f9a 100644 --- a/userdiff.c +++ b/userdiff.c @@ -62,7 +62,7 @@ PATTERNS("cpp", "|0[xXbB][0-9a-fA-F']+[lLuU]*" /* floatingpoint numbers that begin with a decimal point */ "|\\.[0-9][0-9']*([Ee][-+]?[0-9]+)?[fFlL]?" 
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */
 	 "!^[ \t]*(do|while|for|if|else|instanceof|new|return|switch|case|throw|catch|using)\n"

From patchwork Sun Oct 24 09:56:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12580157
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 320C7C433EF
        for ; Sun, 24 Oct 2021 09:56:49 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by mail.kernel.org (Postfix) with ESMTP id F3F9460F11
        for ; Sun, 24 Oct 2021 09:56:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230300AbhJXJ7H (ORCPT ); Sun, 24 Oct 2021 05:59:07 -0400
Received: from bsmtp.bon.at ([213.33.87.14]:22874 "EHLO bsmtp.bon.at"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S229868AbhJXJ7G (ORCPT ); Sun, 24 Oct 2021 05:59:06 -0400
Received: from [192.168.0.98] (unknown [93.83.142.38])
        by bsmtp.bon.at (Postfix) with ESMTPSA id 4HcYQN2BlPz5tlD;
        Sun, 24 Oct 2021 11:56:44 +0200 (CEST)
Subject: [PATCH 7/6] userdiff-cpp: back out the digit-separators in numbers
To: git@vger.kernel.org
Cc: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= , Johannes Sixt via GitGitGadget
References:
From: Johannes Sixt
Message-ID:
Date: Sun, 24 Oct 2021 11:56:43 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
        Thunderbird/78.12.0
MIME-Version: 1.0
In-Reply-To:
Content-Language: en-US
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

The implementation of digit-separating single-quotes introduced a
noteworthy regression: the change of a character literal with a digit would
splice the digit and the closing single-quote. For example, the change
from 'a' to '2' is now tokenized as '[-a'-]{+2'+} instead of
'[-a-]{+2+}'.

The options to fix the regression are:

- Tighten the regular expression such that the single-quote can only
  occur between digits (that would match the official syntax).

- Remove support for digit separators.

I chose to remove support, because

- I have not seen a lot of code make use of digit separators.

- If code does use digit separators, then the numbers are typically
  long. If a change in one of the segments occurs, it is actually
  easier to see if only that segment is highlighted as the word
  that changed instead of the whole long number.

This choice does introduce another minor regression, though, which is
highlighted in the test case: when a change occurs in the second or
later segment of a hexadecimal number where the segment begins with a
digit, but also has letters, the segment is mistaken as consisting of
a number and an identifier. I can live with that.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 12 ++++++------
 t/t4034/cpp/post   | 10 +++++-----
 t/t4034/cpp/pre    |  8 ++++----
 userdiff.c         |  6 +++---
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 5ff4ce477b..dc500ae092 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,21 +1,21 @@
 diff --git a/pre b/post
-index 144cd98..64e78af 100644
+index a1a09b7..f1b6f3c 100644
 --- a/pre
 +++ b/post
 @@ -1,30 +1,30 @@
 Foo() : x(0&&1&42) { foo0bar(x.findFind); }
 cout<<"Hello World!?\n"<(1 -+1e10 0xabcdef) 'x.'
+(1 -+1e10 0xabcdef) 'x2'
 // long double
-3.141'592'653e-10l3.141'592'654e+10l
+3.141592653e-10l3.141592654e+10l
 // float
 120E5f120E6f
 // hex
-0xdead'beaf0xdead'Beaf+8ULL7ULL
+0xdead0xdeaf'1eaFeaf+8ULL7ULL
 // octal
-0123'45670123'4560
+0123456701234560
 // binary
-0b10'000b11'00+e1
+0b10000b1100+e1
 // expression
 1.5-e+23+f
 // another one
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 64e78afbfb..f1b6f3c228 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -1,16 +1,16 @@
 Foo() : x(0&42) { bar(x.Find); }
 cout<<"Hello World?\n"<%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */