GAEN Protocol Metadata Deanonymization and Risk-score Inflation Issues (CVE-2020-24722)

Summary

The TX Power value in the metadata of the beacon of the GAEN protocol
used by the corona/contact tracing app allows attackers to influence
risk-score calculations in their favor. The same metadata can also be
used to deanonymize diagnosed users based on the type of phone they
are using.

Intro: GAEN Metadata in a nutshell

The beacon sent out by the protocol contains some metadata. This
metadata is encrypted with the Associated Encrypted Metadata Key
(AEMK), which is derived from the current TEK as follows:

        AEMK = HKDF(TEK, NULL, "EN-AEMK", 16)

The metadata is only 4 bytes in length and is currently specified as follows:

  Byte 0: Versioning.
      - Bits 7:6: Major version (01).
      - Bits 5:4: Minor version (00).
      - Bits 3:0: Reserved for future use.
  Byte 1: Transmit power level.
      - This is the measured radiated transmit power of Bluetooth
        Advertisement packets, and is used to improve distance
        approximation. The range of this field shall be -127 to
        +127 dBm.
  Bytes 2-3: Reserved for future use.
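
The layout above can be sketched in C as follows. This is only an
illustration, not code from any GAEN implementation: the helper names
(aem_pack, aem_major, aem_minor, aem_tx_power) are hypothetical, and
storing the signed TX power as a two's complement byte is an
assumption that matches how the appendix code treats the value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AEM_LEN 4

/* Build the 4-byte plaintext metadata for version 1.0 (hypothetical helper). */
void aem_pack(int tx_power_dbm, uint8_t out[AEM_LEN]) {
    assert(tx_power_dbm >= -127 && tx_power_dbm <= 127);
    memset(out, 0, AEM_LEN);              /* reserved bits and bytes 2-3 stay zero */
    out[0] = (uint8_t)((1u << 6) | (0u << 4)); /* major 01 in bits 7:6, minor 00 in bits 5:4 */
    out[1] = (uint8_t)tx_power_dbm;       /* assumption: two's complement encoding */
}

/* Extract the major version from bits 7:6 of byte 0. */
unsigned aem_major(const uint8_t m[AEM_LEN]) { return (m[0] >> 6) & 0x3u; }

/* Extract the minor version from bits 5:4 of byte 0. */
unsigned aem_minor(const uint8_t m[AEM_LEN]) { return (m[0] >> 4) & 0x3u; }

/* Read byte 1 back as a signed dBm value. */
int aem_tx_power(const uint8_t m[AEM_LEN]) {
    return m[1] >= 128 ? (int)m[1] - 256 : (int)m[1];
}
```

For a TX power of -30 this yields the bytes 40 E2 00 00 (hex).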

The four bytes of metadata are encrypted like this:

    AEM=aes128ctr(AEMK,RPI,Metadata)

Here, AEMK is used as the key, RPI as the initialization vector (IV),
and Metadata as the plaintext input to AES-128 in a Counter Mode-like
construction. Counter Mode acts as a stream cipher in this
construction: only the first 4 bytes of its keystream are XORed with
the metadata, and thus no padding is needed. However, AES in Counter
Mode does not provide authenticity of the encrypted content. This
means that a re(p)laying attacker can flip bits in a controlled
way[1].

[1] https://lasec.epfl.ch/people/vaudenay/swisscovid/swisscovid-ana.pdf p 12, section 3.5
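
The malleability can be illustrated without any real cryptography,
because in Counter Mode the ciphertext is simply the plaintext XORed
with a keystream. In the sketch below the keystream is an arbitrary
stand-in for the first 4 bytes of AES-128-CTR output, and the function
names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define AEM_LEN 4

/* Counter Mode as a stream cipher: out = in XOR keystream.
 * The same routine encrypts and decrypts. */
void xor_keystream(const uint8_t ks[AEM_LEN], const uint8_t in[AEM_LEN],
                   uint8_t out[AEM_LEN]) {
    for (size_t i = 0; i < AEM_LEN; i++)
        out[i] = in[i] ^ ks[i];
}

/* What a relaying attacker can do: XOR a mask into the TX power byte
 * (byte 1) of the ciphertext, knowing neither key nor keystream. */
void flip_tx_bits(uint8_t aem[AEM_LEN], uint8_t mask) {
    aem[1] ^= mask;
}
```

If the plaintext TX power byte is 0xE8 (-24) and the attacker XORs
0x0C into the ciphertext, the receiver decrypts 0xE4 (-28), whatever
the keystream was. The other three bytes are unaffected.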

Issue 1: Increasing risk scores by bit-flipping TX Power in the Metadata

The problem is that the TX power value is not uniformly distributed.
Furthermore, it can be known for a given transmitting device and can
thus be manipulated through bit-flipping to make re(p)layed beacons
appear closer than they really are, increasing the risk-score of the
victim.
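
Why a lower TX power makes a beacon appear closer can be seen from the
attenuation computation. The sketch below assumes the formula as we
understand it from Google's attenuation documentation[2], namely
attenuation = TX power - (measured RSSI + RSSI correction). The
function name and the sample values are illustrative only.

```c
/* Attenuation in dB as the receiver would compute it (assumed formula,
 * see the paragraph above). A lower result means "closer". */
int attenuation_db(int tx_power_dbm, int rssi_measured_dbm, int rssi_correction_db) {
    return tx_power_dbm - (rssi_measured_dbm + rssi_correction_db);
}
```

With a measured RSSI of -80 and no correction, a claimed TX power of
-24 gives an attenuation of 56 dB, while a manipulated TX power of -36
gives 44 dB: every dB removed from the TX field removes one dB of
apparent attenuation.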

There are multiple variants of this attack based on the knowledge of
the attacker about the re(p)layed beacon sending device:

  1/ If the attacker knows the exact type of the sending device, the TX
  power value is known and can thus be safely manipulated to the
  lowest known valid value, which is -46dB. With the bulk of
  Android devices having a TX Power value of more than -30dB, this is
  a significant change.

  2/ If the attacker knows some probability distribution of possible
  sending devices it is possible to calculate the odds of bitflips
  which have a high chance of flipping the TX Power value to a lower
  but still valid value.

  3/ If the attacker has no knowledge of the sending device, it is
  still possible to calculate the most promising combination of bitflips
  leading to an increase in risk-scoring. An example of such a
  calculation is appended in C source code to this report. In this
  case there is still a ~65% chance (by flipping bits 2 and 3) of making
  the received beacons appear closer than they really are, while the
  manipulated TX power value flips to a value that is valid for
  some other device.

  4/ If the attacker does not care about TX Power values that do not
  correspond to existing devices, it is also possible to flip bit 6 of
  the TX power and thus generate a value that is not associated with
  any device, but which will with very high probability register as a
  very close contact.
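
The variants above differ only in whether the flipped value must still
be a listed TX power. A minimal sketch of that check for an arbitrary
bit mask follows. The type and function names are hypothetical, the
table is passed in by the caller, and two's complement storage of the
TX power is assumed as in the appendix code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int8_t tx;   /* calibrated TX power */
    int    cnt;  /* number of calibration entries with that value */
} tx_count;

/* XOR a bit mask into a TX power value, as bit-flipping the AEM would. */
int8_t flip_tx(int8_t tx, uint8_t mask) {
    uint8_t raw = (uint8_t)((uint8_t)tx ^ mask);
    return (int8_t)(raw >= 128 ? (int)raw - 256 : (int)raw);
}

/* Is this TX power listed for at least one entry in the table? */
int tx_is_listed(const tx_count *tab, size_t n, int8_t tx) {
    for (size_t i = 0; i < n; i++)
        if (tab[i].tx == tx) return 1;
    return 0;
}

/* Sum the counts of all entries for which the mask yields a lower TX
 * power. With require_listed set, the flipped value must also appear
 * in the table (variants 1-3); otherwise any lower value counts
 * (variant 4). */
int lowered_weight(const tx_count *tab, size_t n, uint8_t mask, int require_listed) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int8_t flipped = flip_tx(tab[i].tx, mask);
        if (flipped < tab[i].tx && (!require_listed || tx_is_listed(tab, n, flipped)))
            sum += tab[i].cnt;
    }
    return sum;
}
```

For example, flip_tx(-24, 0x40) gives -88, a value far below anything
in the calibration data, while flip_tx(-24, 0x0C) gives -28, which is
a listed value.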

An attacker may also send multiple attempts flipping various bits; the
validation during checking might weed out the invalid ones, and keep
the valid but stronger values. This is speculation and depends on the
implementation handling multiple Beacons with the same RPI but
different AEM values and the validation of these in case they are
decrypted with the Diagnosis key.

Impact: A re(p)laying attacker can improve their attack by
influencing the risk-score calculation.

Issue 2: Deanonymizing Diagnosed users based on devices with unique or uncommon TX Power values

The value of the Transmit power level is a device-dependent value[2];
values for Android phones are published by Google[3]. Some of these
values are unique to a specific device, as in the case of the Pixel
3a, SM-A510M, SM-G610F, and SM-J510F. This means that diagnosed
victims with these devices are easier to deanonymize, if they are
known to be in possession of such a device. This attack can be
extended to a probabilistic one when combined with statistical
information on which devices are in common use in a certain geographic
location (for example based on regional sales statistics or local
fashion trends).

The GAEN system sets the TX power based on the device and includes
this information in the AEM metadata[4][5]. By combining the collected
beacons of an infected person with that person's diagnosis key, it is
possible to decrypt the metadata and infer which device they were
using.

In the bar graph in the attached image[6], the X-axis shows the TX
power value sent and the Y-axis shows how many different devices set
this value. TX values with low counts on the Y-axis should be easy to
identify. Notably, the Pixel 3a, SM-A510M, SM-G610F and SM-J510F all
have unique values. If these values are seen in the Metadata, the
sending device can be unambiguously identified.
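
The size of the "anonymity set" for an observed TX power can be looked
up directly from the calibration counts. The sketch below reuses the
counts from the table in Appendix I. The type and function names are
hypothetical, and the table does not say which device maps to which
value.

```c
#include <stddef.h>

typedef struct {
    int tx;   /* TX power value found in the decrypted metadata */
    int cnt;  /* number of calibration entries using that value */
} tx_models;

/* Counts copied from the calibration table in Appendix I. */
static const tx_models calib_counts[] = {
    {-46, 3}, {-45, 2}, {-41, 32}, {-40, 5}, {-39, 1}, {-38, 4}, {-37, 11},
    {-36, 15}, {-35, 75}, {-34, 11}, {-33, 223}, {-32, 22}, {-31, 83},
    {-30, 851}, {-29, 76}, {-28, 262}, {-27, 522}, {-26, 1025}, {-25, 1414},
    {-24, 2075}, {-23, 35}, {-22, 245}, {-21, 67}, {-20, 1037}, {-19, 1371},
    {-18, 255}, {-17, 14}, {-16, 32}, {-15, 10}, {-14, 2}, {-13, 1}, {-12, 16},
    {-9, 16}, {-7, 6}, {-6, 27}, {-5, 4}, {-3, 12}, {-2, 5}, {-1, 1}, {0, 27},
    {2, 1},
};

/* Number of listed entries sharing this TX power; 0 if the value is not listed. */
int anonymity_set_size(int tx) {
    size_t n = sizeof calib_counts / sizeof calib_counts[0];
    for (size_t i = 0; i < n; i++)
        if (calib_counts[i].tx == tx)
            return calib_counts[i].cnt;
    return 0;
}
```

A result of 1 (for example for -39) means the TX power alone
identifies the entry, whereas -24 is shared by 2075 entries and
reveals far less.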

Impact: Infected persons can be more easily (sometimes trivially)
deanonymized by entities that have collected their beacons, based on
the TX Power value in the Metadata.

[2] https://developers.google.com/android/exposure-notifications/ble-attenuation-computation
[3] https://developers.google.com/android/exposure-notifications/files/en-calibration-2020-06-13.csv
[4] https://developers.google.com/android/exposure-notifications/ble-attenuation-overview
[5] https://blog.google/documents/70/Exposure_Notification_-_Bluetooth_Specification_v1.2.2.pdf
[6] https://blog.radicallyopensecurity.com/txpowerdistribution.png

Timeline

2020 Jun  5 - Vaudenay, Vuagnoux report mentions malleability of Metadata
2020 Aug 27 - Request by client to disclose these issues to Google,
              Apple and the Dutch National Cyber-Security Center (NCSC)
2020 Aug 28 - Full Report shared with Dutch Ministry of Health
2020 Aug 28 - Submitted to the Google Android Security Bugtracker
2020 Aug 28 - Apple acknowledges receipt of report
2020 Aug 31 - NCSC confirmation of receipt
2020 Sep  1 - Google assigns issue to the Android engineering team.
2020 Sep  7 - Google responds to Issue 1 - see below
2020 Sep 29 - Full Report gets published
2020 Oct  5 - Detailed write-up of CVE-2020-24722 gets published

Google Response (2020 Sep 7)

> Thanks again for your report. We understand and have analyzed the
> concern that Associated Encrypted Metadata (AEM) TX (transmit) power
> could be altered as part of a relay attack. We do not believe that
> TX power authentication would be a useful defense against relay
> attacks for the following reasons:
>
> * TX power authentication doesn't protect against the use of a
>   high-power antenna positioned to cover a large area with a signal
>   strength indicative of proximity. This is because the system is
>   designed to be highly tolerant of differences between TX-power and
>   received signal strength indicator (RSSI) caused, for example, by
>   both devices being in-pocket. The Android ecosystem also has a
>   diverse range of hardware: some Android devices transmit signals
>   as much as 30dB weaker than others. The required system tolerance
>   to differences in transmit power means that relaying a packet from
>   such a weak device on a higher transmit power device would already
>   allow the higher power device to appear nearby without altering
>   the TX field.
>
> * TX power authentication would not prevent the deployment of
>   malware on phones to perform relay packets; app-store and OS
>   policy are more effective to prevent collection of EN packets by
>   third party apps.
>
> * In this context, encrypting the AEM prevents an adversary from
>   joining between RPIs coming from the same device (using
>   device-specific TX power as a source of entropy).
>
> Please see
> https://github.com/google/exposure-notifications-internals/blob/main/en-risks-and-mitigations-faq.md#additional-considerations
> for additional information.

Appendix I - example calculation of best bitflips

```
#include <stdio.h>

typedef struct {
  signed char tx; // explicitly signed: plain char is unsigned on some platforms (e.g. ARM)
  int cnt;
} TX_Count;

// source: https://developers.google.com/android/exposure-notifications/files/en-calibration-2020-06-13.csv
// cnt is count of all items with the given tx value
TX_Count calib[] = {{-46, 3}, {-45, 2}, {-41, 32}, {-40, 5}, {-39, 1}, {-38, 4}, {-37, 11},
                  {-36, 15}, {-35, 75}, {-34, 11}, {-33, 223}, {-32, 22}, {-31, 83},
                  {-30, 851}, {-29, 76}, {-28, 262}, {-27, 522}, {-26, 1025}, {-25, 1414},
                  {-24, 2075}, {-23, 35}, {-22, 245}, {-21, 67}, {-20, 1037}, {-19, 1371},
                  {-18, 255}, {-17, 14}, {-16, 32}, {-15, 10}, {-14, 2}, {-13, 1}, {-12, 16},
                  {-9, 16}, {-7, 6}, {-6, 27}, {-5, 4}, {-3, 12}, {-2, 5}, {-1, 1}, {0, 27}, {2, 1},
                  {0,0}};

int main(void) {
  int bits[7] = {0};
  int i, j, k, total = 0;
  // calculate total of all cnt values
  for(i=0;calib[i].cnt!=0;i++)
    total+=calib[i].cnt;
  // test each tx_count in calib
  for(i=0;calib[i].cnt!=0;i++) {
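    // iterate over the 7 tracked bit positions (sizeof bits alone is the size in bytes)
    for(j=0;j<(int)(sizeof bits / sizeof bits[0]);j++) {
      signed char flipped = calib[i].tx ^ (1 << j);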
#ifdef noisy
      if(flipped < calib[i].tx) bits[j]+=calib[i].cnt;
#else
      for(k=0;calib[k].cnt!=0;k++) {
        if(calib[k].tx == flipped) {
          if(calib[k].tx < calib[i].tx) bits[j]+=calib[i].cnt;
          break;
        }
      }
#endif
    }
  }
  for(i=0;i<7;i++) {
    printf("bit %d: %d (%2.3f%%)\n", i, bits[i], (bits[i]/(float)total)*100);
  }
  return 0;
}

// outputs
// bit 0: 3910 (39.511%)
// bit 1: 4205 (42.492%)
// bit 2: 6306 (63.723%)
// bit 3: 5375 (54.315%)
// bit 4: 132 (1.334%)
// bit 5: 74 (0.748%)
// bit 6: 0 (0.000%)
// flipping bits 2+3 from 00 to 11 has a chance of 34.611% of increasing risk by 12db (-12db)
//               2+3 from 10 to 01 has a chance of 29.112% of increasing risk by 4db (-4db)
//               2+3 from 01 to 10 has a chance of 19.704% of decreasing risk by 4db (+4db)
//               2+3 from 11 to 00 has a chance of 16.573% of decreasing risk by 12db (+12db)

// noisy variant
// bit 0: 3976 (40.178%)
// bit 1: 4290 (43.351%)
// bit 2: 6306 (63.723%)
// bit 3: 5499 (55.568%)
// bit 4: 514 (5.194%)
// bit 5: 9486 (95.857%)
// bit 6: 9868 (99.717%)
```