From patchwork Fri Oct 8 19:09:53 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12546225
Date: Fri, 08 Oct 2021 19:09:53 +0000
Subject: [PATCH v2 1/5] t4034/cpp: actually test that operator tokens are not split
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt
X-Mailing-List: git@vger.kernel.org
From: Johannes Sixt

From: Johannes Sixt

8d96e7288f2b (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff

    [-a-]{+x+}!=[-b-]{+y+}

But that proves only that the letters and operators are separate
tokens. To prove that != is an inseparable token, we have to probe a
transition from, e.g., a=b to a!=b having a word-diff

    a[-=-]{+!=+}b

that proves that the !
is not separate from the =.

In the post-image, add to or remove from operators a character that
turns it into another valid operator. Change the identifiers used
around operators such that the diff algorithm does not have an
incentive to match, e.g., a
---
 t/t4034/cpp/expect | 45 +++++++++++++++------------------------------
 t/t4034/cpp/post   | 29 +++++++++++++----------------
 t/t4034/cpp/pre    | 25 +++++++++++--------------
 3 files changed, 39 insertions(+), 60 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 37d1ea25870..41976971b93 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,36 +1,21 @@
 diff --git a/pre b/post
-index 23d5c8a..7e8c026 100644
+index c5672a2..4229868 100644
 --- a/pre
 +++ b/post
-@@ -1,19 +1,19 @@
+@@ -1,16 +1,16 @@
 Foo() : x(0&&1&42) { bar(x); }
 cout<<"Hello World!?\n"<(1) (-1e10) (0xabcdef) 'xy'
-[ax] ax->b ay x.by
-!ax ~a ax x++ ax-- ax*b ay x&b
-ay
-x*b ay x/b ay x%b
-ay
-x+b ay x-b
-ay
-x<<b ay x>>b
-ay
-x<b ay x<=b ay x>b ay x>=b
-ay
-x==b ay x!=b
-ay
-x&b
-ay
-x^b
-ay
-x|b
-ay
-x&&b
-ay
-x||b
-ay
-x?by:z
-ax=b ay x+=b ay x-=b ay x*=b ay x/=b ay x%=b ay x<<=b ay x>>=b ay x&=b ay x^=b ay x|=b
-ay
-x,y
-ax::by
+[a] b->->*v d.e.*e
+~!a !~b c+++ d--- e**f g&&&h
+a**=b c//=d e%%=f
+a+++b c---d
+a<<<<=b c>>>>=d
+a<<=b c<=<d e>>=f g>=>h
+a==!=b c!==d
+a^^=b c||=d e&&&=f
+a|||b
+a?:b
+a===b c+=+d e-=fe-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v
+a,b
+a:::b
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 7e8c026cefb..4229868ae62 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -1,19 +1,16 @@
 Foo() : x(0&42) { bar(x); }
 cout<<"Hello World?\n"<y x.y
-!x ~x x++ x-- x*y x&y
-x*y x/y x%y
-x+y x-y
-x<>y
-xy x>=y
-x==y x!=y
-x&y
-x^y
-x|y
-x&&y
-x||y
-x?y:z
-x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
-x,y
-x::y
+[a] b->*v d.*e
+~!a !~b c+ d- e**f g&&h
+a*=b c/=d e%=f
+a++b c--d
+a<<=b c>>=d
+a<=b c=f g>h
+a!=b c=d
+a^=b c|=d e&=f
+a|b
+a?:b
+a==b c+d e-f g*h i/j k%l m<>p q&r s^t u|v
+a,b
+a:b
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
index 23d5c8adf54..c5672a24cfc 100644
--- a/t/t4034/cpp/pre
+++ b/t/t4034/cpp/pre
@@ -1,19 +1,16 @@
 Foo():x(0&&1){}
 cout<<"Hello World!\n"<b a.b
-!a ~a a++ a-- a*b a&b
-a*b a/b a%b
-a+b a-b
-a<>b
-ab a>=b
-a==b a!=b
-a&b
-a^b
-a|b
-a&&b
+[a] b->v d.e
+!a ~b c++ d-- e*f g&h
+a*b c/d e%f
+a+b c-d
+a<>d
+af g>=h
+a==b c!=d
+a^b c|d e&&f
 a||b
-a?b:z
-a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
-a,y
+a?b
+a=b c+=d e-=f g*=h i/=j k%=l m<<=n o>>=p q&=r s^=t u|=v
+a,b
 a::b

From patchwork Fri Oct 8 19:09:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12546227
Message-Id: <5a84fc9cf715aec258d9cda2dd7d2e8eff2dc66c.1633720197.git.gitgitgadget@gmail.com>
Date: Fri, 08 Oct 2021 19:09:54 +0000
Subject: [PATCH v2 2/5] t4034: add tests showing problematic cpp tokenizations
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt
X-Mailing-List: git@vger.kernel.org
From: Johannes Sixt

From: Johannes Sixt

The word regex is too loose and matches long streaks of characters
that should actually be separate tokens. Add these problematic test
cases. Separate the lines with text that will remain identical in the
pre- and post-image so that the diff algorithm will not lump removals
and additions of consecutive lines together. This makes the expected
output easier to read.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 22 ++++++++++++++++++----
 t/t4034/cpp/post   | 18 ++++++++++++++++--
 t/t4034/cpp/pre    | 16 +++++++++++++++-
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 41976971b93..63e53a61e62 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,11 +1,25 @@
 diff --git a/pre b/post
-index c5672a2..4229868 100644
+index 1229cdb..3feae6f 100644
 --- a/pre
 +++ b/post
-@@ -1,16 +1,16 @@
-Foo() : x(0&&1&42) { bar(x); }
+@@ -1,30 +1,30 @@
+Foo() : x(0&&1&42) { foo0bar(x.f.Find); }
 cout<<"Hello World!?\n"<(1) (-1e10) (0xabcdef) 'xy'
+(1 -1e10+1e10 0xabcdef) 'xy'
+// long double
+3.141592653e-10l3.141592654e+10l
+// float
+120E5fE6f
+// hex
+0xdeadbeaf+80xdeadBeaf+7ULL
+// octal
+0123456701234560
+// binary
+0b10000b1100+e1
+// expression
+1.5-e+2+f1.5-e+3+f
+// another one
+str.e+65.e+75
 [a] b->->*v d.e.*e
 ~!a !~b c+++ d--- e**f g&&&h
 a**=b c//=d e%%=f
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 4229868ae62..3feae6f430f 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -1,6 +1,20 @@
-Foo() : x(0&42) { bar(x); }
+Foo() : x(0&42) { bar(x.Find); }
 cout<<"Hello World?\n"<*v d.*e
 ~!a !~b c+ d- e**f g&&h
 a*=b c/=d e%=f
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
index c5672a24cfc..1229cdb59d1 100644
--- a/t/t4034/cpp/pre
+++ b/t/t4034/cpp/pre
@@ -1,6 +1,20 @@
-Foo():x(0&&1){}
+Foo():x(0&&1){ foo0( x.find); }
 cout<<"Hello World!\n"<v d.e
 !a ~b c++ d-- e*f g&h
 a*b c/d e%f

From patchwork Fri Oct 8 19:09:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12546229
Date: Fri, 08 Oct 2021 19:09:55 +0000
Subject: [PATCH v2 3/5] userdiff-cpp: tighten word regex
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt
X-Mailing-List: git@vger.kernel.org
From: Johannes Sixt

From: Johannes Sixt

Generally, word regexes can be written such that they match tokens
liberally and need not model the actual syntax because it can be
assumed that the regex will only be applied to syntactically correct
text.

The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:

   1+2
   1.5-e+2+f

and the following amalgams as one token:

   .l      as in str.length
   .f      as in str.find
   .e      as in str.erase

Tighten the regex in the following way:

- Accept + and - only in one position in the exponent.
  + and - are no longer regarded as the sign of a number and are
  treated by the catch-all that is not visible in the driver's regex.

- Accept a leading decimal point only when it is followed by a digit.

For readability, factor hexadecimal and binary numbers into a term of
their own. As a drive-by, this also fixes floating-point numbers such
as 12E5 (with upper-case E) being split into two tokens.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 16 ++++++++--------
 userdiff.c         |  8 +++++++-
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 63e53a61e62..46c9460a968 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -3,24 +3,24 @@
 --- a/pre
 +++ b/post
 @@ -1,30 +1,30 @@
-Foo() : x(0&&1&42) { foo0bar(x.f.Find); }
+Foo() : x(0&&1&42) { foo0bar(x.findFind); }
 cout<<"Hello World!?\n"<(1 -1e10+1e10 0xabcdef) 'xy'
+(1 -+1e10 0xabcdef) 'xy'
 // long double
 3.141592653e-10l3.141592654e+10l
 // float
-120E5fE6f
+120E5f120E6f
 // hex
-0xdeadbeaf+80xdeadBeaf+7ULL
+0xdeadbeaf0xdeadBeaf+8ULL7ULL
 // octal
 0123456701234560
 // binary
 0b10000b1100+e1
 // expression
-1.5-e+2+f1.5-e+3+f
+1.5-e+23+f
 // another one
-str.e+65.e+75
-[a] b->->*v d.e.*e
+str.e+6575
+[a] b->->*v d..*e
 ~!a !~b c+++ d--- e**f g&&&h
 a**=b c//=d e%%=f
 a+++b c---d
@@ -30,6 +30,6 @@ a==!=b c!==d
 a^^=b c||=d e&&&=f
 a|||b
 a?:b
-a===b c+=+d e-=fe-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v
+a===b c+=+d e-=-f g*=*h i/=/j k%=%l m<<=<<n o>>=>>p q&=&r s^=^t u|=|v
 a,b
 a:::b
diff --git a/userdiff.c b/userdiff.c
index d9b2ba752f0..ce2a9230703 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -54,8 +54,14 @@ PATTERNS("cpp",
 	 /* functions/methods, variables, and compounds at top level */
 	 "^((::[[:space:]]*)?[A-Za-z_].*)$",
 	 /* -- */
+	 /* identifiers and keywords */
 	 "[a-zA-Z_][a-zA-Z0-9_]*"
-	 "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lLuU]*"
+	 /* decimal and octal integers as well as floatingpoint numbers */
+	 "|[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*"
+	 /* hexadecimal and binary integers */
+	 "|0[xXbB][0-9a-fA-F]+[lLuU]*"
+	 /* floatingpoint numbers that begin with a decimal point */
+	 "|\\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?"
 	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */

From patchwork Fri Oct 8 19:09:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12546231
Date: Fri, 08 Oct 2021 19:09:56 +0000
Subject: [PATCH v2 4/5] userdiff-cpp: permit the digit-separating single-quote in numbers
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt
X-Mailing-List: git@vger.kernel.org
From: Johannes Sixt

From: Johannes Sixt

Since C++14, the single-quote can be used as a digit separator:

   3.141'592'654
   1'000'000
   0xdead'beaf

Make it known to the word regex of the cpp driver, so that numbers are
not split into separate tokens at the single-quotes.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 10 +++++-----
 t/t4034/cpp/post   |  8 ++++----
 t/t4034/cpp/pre    |  8 ++++----
 userdiff.c         |  6 +++---
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index 46c9460a968..a3a234f5461 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,5 +1,5 @@
 diff --git a/pre b/post
-index 1229cdb..3feae6f 100644
+index 60f3640..f6fbf7b 100644
 --- a/pre
 +++ b/post
 @@ -1,30 +1,30 @@
@@ -7,15 +7,15 @@ Foo() : x(0&&1&42) { foo0bar
 cout<<"Hello World!?\n"<(1 -+1e10 0xabcdef) 'xy'
 // long double
-3.141592653e-10l3.141592654e+10l
+3.141'592'653e-10l3.141'592'654e+10l
 // float
 120E5f120E6f
 // hex
-0xdeadbeaf0xdeadBeaf+8ULL7ULL
+0xdead'beaf0xdead'Beaf+8ULL7ULL
 // octal
-0123456701234560
+0123'45670123'4560
 // binary
-0b10000b1100+e1
+0b10'000b11'00+e1
 // expression
 1.5-e+23+f
 // another one
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index 3feae6f430f..f6fbf7bc04c 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -2,15 +2,15 @@ Foo() : x(0&42) { bar(x.Find); }
 cout<<"Hello World?\n"<%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
 PATTERNS("csharp",
 	 /* Keywords */

From patchwork Fri Oct 8 19:09:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Johannes Sixt
X-Patchwork-Id: 12546233
Message-Id: <43a701f5ffd899ae56b2db0fb865e37dd2bb4e07.1633720197.git.gitgitgadget@gmail.com>
Date: Fri, 08 Oct 2021 19:09:57 +0000
Subject: [PATCH v2 5/5] userdiff-cpp: learn the C++ spaceship operator
To: git@vger.kernel.org
Cc: Ævar Arnfjörð Bjarmason, Johannes Sixt
X-Mailing-List: git@vger.kernel.org
From: Johannes Sixt

From: Johannes Sixt

Since C++20, the language has a generalized comparison operator <=>.
Teach the cpp driver not to separate it into <= and > tokens.

Signed-off-by: Johannes Sixt
---
 t/t4034/cpp/expect | 4 ++--
 t/t4034/cpp/post   | 2 +-
 t/t4034/cpp/pre    | 2 +-
 userdiff.c         | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t4034/cpp/expect b/t/t4034/cpp/expect
index a3a234f5461..bf3cd2abc74 100644
--- a/t/t4034/cpp/expect
+++ b/t/t4034/cpp/expect
@@ -1,5 +1,5 @@
 diff --git a/pre b/post
-index 60f3640..f6fbf7b 100644
+index 144cd98..244f79c 100644
 --- a/pre
 +++ b/post
 @@ -1,30 +1,30 @@
@@ -25,7 +25,7 @@ str.e+6575
 a**=b c//=d e%%=f
 a+++b c---d
 a<<<<=b c>>>>=d
-a<<=b c<=<d e>>=f g>=>h
+a<<=b c<=<d e>>=f g>=>h i<=<=>j
 a==!=b c!==d
 a^^=b c||=d e&&&=f
 a|||b
diff --git a/t/t4034/cpp/post b/t/t4034/cpp/post
index f6fbf7bc04c..244f79c9900 100644
--- a/t/t4034/cpp/post
+++ b/t/t4034/cpp/post
@@ -20,7 +20,7 @@ str.e+75
 a*=b c/=d e%=f
 a++b c--d
 a<<=b c>>=d
-a<=b c=f g>h
+a<=b c=f g>h i<=>j
 a!=b c=d
 a^=b c|=d e&=f
 a|b
diff --git a/t/t4034/cpp/pre b/t/t4034/cpp/pre
index 60f3640d773..144cd980d6b 100644
--- a/t/t4034/cpp/pre
+++ b/t/t4034/cpp/pre
@@ -20,7 +20,7 @@ str.e+65
 a*b c/d e%f
 a+b c-d
 a<>d
-af g>=h
+af g>=h i<=j
 a==b c!=d
 a^b c|d e&&f
 a||b
diff --git a/userdiff.c b/userdiff.c
index 1b640c7df79..13cec0b48db 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -62,7 +62,7 @@ PATTERNS("cpp",
 	 "|0[xXbB][0-9a-fA-F']+[lLuU]*"
 	 /* floatingpoint numbers that begin with a decimal point */
 	 "|\\.[0-9']+([Ee][-+]?[0-9]+)?[fFlL]?"
-	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*"),
+	 "|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->\\*?|\\.\\*|<=>"),
 PATTERNS("csharp",
 	 /* Keywords */
 	 "!^[ \t]*(do|while|for|if|else|instanceof|new|return|switch|case|throw|catch|using)\n"
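
For readers who want to see the effect of the tightening in patch 3/5
without building git, here is a rough sketch in Python. It is not how
git tokenizes: the helper tokenize() and the names OLD, NEW, OPERATORS
and CATCH_ALL are made up for illustration. It assumes that the POSIX
regex matcher picks the longest match among the alternatives (emulated
here by trying each alternative and keeping the longest), and it
approximates the catch-all that git appends to the driver's regex with
a single non-whitespace character.

```python
import re

# Operator alternatives shared by both versions of the cpp word regex.
OPERATORS = [
    r"[-+*/<>%&^|=!]=", r"--", r"\+\+", r"<<=?", r">>=?",
    r"&&", r"\|\|", r"::", r"->\*?", r"\.\*",
]

# Word regex before patch 3/5, split at its top-level "|".
OLD = [
    r"[a-zA-Z_][a-zA-Z0-9_]*",
    r"[-+0-9.e]+[fFlL]?",
    r"0[xXbB]?[0-9a-fA-F]+[lLuU]*",
] + OPERATORS

# Word regex after patch 3/5, split the same way.
NEW = [
    r"[a-zA-Z_][a-zA-Z0-9_]*",
    r"[0-9][0-9.]*([Ee][-+]?[0-9]+)?[fFlLuU]*",
    r"0[xXbB][0-9a-fA-F]+[lLuU]*",
    r"\.[0-9]+([Ee][-+]?[0-9]+)?[fFlL]?",
] + OPERATORS

# Assumption: stands in for the catch-all that is not visible in the
# driver's regex; any single non-whitespace character is a token.
CATCH_ALL = r"\S"


def tokenize(text, alternatives):
    """Split text into tokens, taking the longest alternative each time."""
    patterns = [re.compile(p) for p in alternatives + [CATCH_ALL]]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        matches = [m for m in (p.match(text, pos) for p in patterns) if m]
        longest = max(matches, key=lambda m: m.end())
        tokens.append(longest.group())
        pos = longest.end()
    return tokens


if __name__ == "__main__":
    for sample in ["1.5-e+2+f", "str.length", "12E5", "0xdeadbeaf"]:
        print(sample, tokenize(sample, OLD), tokenize(sample, NEW))
```

With the old alternatives, 1.5-e+2+f comes out as one token and
str.length contains the amalgam .l; with the new ones, the expression
is split at the operators and 12E5 stays in one piece.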