Independent Submission                                      A. Mayrhofer
Request for Comments: 8771                                   nic.at GmbH
Category: Experimental                                          J. Hague
ISSN: 2070-1721                                                  Sinodun
                                                            1 April 2020

The Internationalized Deliberately Unreadable Network NOtation (I-DUNNO)
  
'''Abstract'''

Domain Names were designed for humans, IP addresses were not.  But
more than 30 years after the introduction of the DNS, a minority of
mankind persists in invading the realm of machine-to-machine
communication by reading, writing, misspelling, memorizing,
permuting, and confusing IP addresses.  This memo describes the
Internationalized Deliberately Unreadable Network NOtation
("I-DUNNO"), a notation designed to replace current textual
representations of IP addresses with something that is not only more
concise but will also discourage this small, but obviously important,
subset of human activity.
  
'''Status of This Memo'''

This document is not an Internet Standards Track specification; it is
published for examination, experimental implementation, and
evaluation.

This document defines an Experimental Protocol for the Internet
community.  This is a contribution to the RFC Series, independently
of any other RFC stream.  The RFC Editor has chosen to publish this
document at its discretion and makes no statement about its value for
implementation or deployment.  Documents approved for publication by
the RFC Editor are not candidates for any level of Internet Standard;
see Section 2 of [[RFC7841|RFC 7841]].

Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8771.
  
'''Copyright Notice'''

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to [[BCP78|BCP 78]] and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
  
'''Table of Contents'''

1.  Introduction
2.  Terminology
3.  The Notation
  3.1.  Forming I-DUNNO
  3.2.  Deforming I-DUNNO
4.  I-DUNNO Confusion Level Requirements
  4.1.  Minimum Confusion Level
  4.2.  Satisfactory Confusion Level
  4.3.  Delightful Confusion Level
5.  Example
6.  IANA Considerations
7.  Security Considerations
8.  References
  8.1.  Normative References
  8.2.  Informative References
Authors' Addresses

== Introduction ==
 
  
In Section 2.3 of [[RFC0791]], the original designers of the Internet
Protocol carefully defined names and addresses as separate
quantities.  While they did not explicitly reserve names for human
consumption and addresses for machine use, they did consider the
matter indirectly in their philosophical communal statement: "A name
indicates what we seek."  This clearly indicates that names rather
than addresses should be of concern to humans.

The specification of domain names in [[RFC1034]], and indeed the
continuing enormous effort put into the Domain Name System,
reinforces the view that humans should use names and leave worrying
about addresses to the machines.  [[RFC1034|RFC 1034]] mentions "users" several
times, and even includes the word "humans", even though it is
positioned slightly unfortunately, though perfectly understandably,
in a context of "annoying" and "can wreak havoc" (see Section 5.2.3
of [[RFC1034]]).  Nevertheless, this is another clear indication that
domain names are made for human use, while IP addresses are for
machine use.

Given this, and a long error-strewn history of human attempts to
utilize addresses directly, it is obviously desirable that humans
should not meddle with IP addresses.  For that reason, it appears
quite logical that a human-readable (textual) representation of IP
addresses was just very vaguely specified in Section 2.1 of
[[RFC1123]].  Subsequently, a directed effort to further discourage
human use by making IP addresses more confusing was introduced in
[[RFC1883]] (which was obsoleted by [[RFC8200]]), and additional options
for human puzzlement were offered in Section 2.2 of [[RFC4291]].  These
noble early attempts to hamper efforts by humans to read, understand,
or even spell IP addressing schemes were unfortunately severely
compromised in [[RFC5952]].

In order to prevent further damage from human meddling with IP
addresses, there is a clear urgent need for an address notation that
replaces these "Legacy Notations", and efficiently discourages humans
from reading, modifying, or otherwise manipulating IP addresses.
Research in this area long ago recognized the potential in
ab^H^Hperusing the intricacies, inaccuracies, and chaotic disorder of
what humans are pleased to call a "Cultural Technique" (also known as
"Script"), and with a certain inexorable inevitability has focused of
late on the admirable confusion (and thus discouragement) potential
of [UNICODE] as an address notation.  In Section 4, we introduce a
framework of Confusion Levels as an aid to the evaluation of the
effectiveness of any Unicode-based scheme in producing notation in a
form designed to be resistant to ready comprehension or, heaven
forfend, mutation of the address, and so effecting the desired
confusion and discouragement.

The authors welcome [[RFC8369]] as a major step in the right direction.
However, we have some reservations about the scheme proposed therein:

*  Our analysis of the proposed scheme indicates that, while
   impressively concise, it fails to attain more than at best a
   Minimum Confusion Level in our classification.

*  Humans, especially younger ones, are becoming skilled at handling
   emoji.  Over time, this will negatively impact the discouragement
   factor.

*  The proposed scheme is specific to IPv6; if a solution to this
   problem is to be in any way timely, it must, as a matter of the
   highest priority, address IPv4.  After all, even taking the
   regrettable effects of [[RFC5952|RFC 5952]] into account, IPv6 does at least
   remain inherently significantly more confusing and discouraging
   than IPv4.

This document therefore specifies an alternative Unicode-based
notation, the Internationalized Deliberately Unreadable Network
NOtation (I-DUNNO).  This notation addresses each of the concerns
outlined above:

*  I-DUNNO can generate Minimum, Satisfactory, or Delightful levels
   of confusion.

*  As well as emoji, it takes advantage of other areas of Unicode
   confusion.

*  It can be used with IPv4 and IPv6 addresses.

We concede that I-DUNNO notation is markedly less concise than that
of [[RFC8369|RFC 8369]].  However, by permitting multiple code points in the
representation of a single address, I-DUNNO opens up the full
spectrum of Unicode-adjacent code point interaction.  This is a
significant factor in allowing I-DUNNO to achieve higher levels of
confusion.  I-DUNNO also requires no change to the current size of
Unicode code points, and so its chances of adoption and
implementation are (slightly) higher.

Note that the use of I-DUNNO in the reverse DNS system is currently
out of scope.  The occasional human-induced absence of the magical
one-character sequence U+002E is believed to cause sufficient
disorder there.

Media Access Control (MAC) addresses are totally out of the question.

== Terminology ==

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[[BCP14|BCP 14]] [[RFC2119]] [[RFC8174]] when, and only when, they appear in all
capitals, as shown here.

Additional terminology from [[RFC6919]] MIGHT apply.

== The Notation ==

I-DUNNO leverages UTF-8 [[RFC3629]] to obfuscate IP addresses for
humans.  UTF-8 uses sequences between 1 and 4 octets to represent
code points as follows:

  +-----------------------+-------------------------------------+
  | Char. number range    | UTF-8 octet sequence                |
  +-----------------------+-------------------------------------+
  | (hexadecimal)         | (binary)                            |
  +=======================+=====================================+
  | 0000 0000 - 0000 007F | 0xxxxxxx                            |
  +-----------------------+-------------------------------------+
  | 0000 0080 - 0000 07FF | 110xxxxx 10xxxxxx                   |
  +-----------------------+-------------------------------------+
  | 0000 0800 - 0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx          |
  +-----------------------+-------------------------------------+
  | 0001 0000 - 0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
  +-----------------------+-------------------------------------+

                              Table 1
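
As an illustration of Table 1 (not part of the specification), the following Python sketch fills the 'x' positions by hand for a single code point. The function name `utf8_encode` is hypothetical, and the rejection of surrogates (D800-DFFF) follows the usual UTF-8 rules rather than anything stated in this memo.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by filling the 'x' positions of Table 1."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])
```

The result can be compared against Python's built-in encoder, for example `utf8_encode(0x04A4) == chr(0x04A4).encode("utf-8")`.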
  
I-DUNNO uses that structure to convey addressing information as
follows:
  
=== Forming I-DUNNO ===

In order to form an I-DUNNO based on the Legacy Notation of an IP
address, the following steps are performed:

1.  The octets of the IP address are written as a bitstring in
    network byte order.

2.  Working from left to right, the bitstring (32 bits for IPv4; 128
    bits for IPv6) is used to generate a list of valid UTF-8 octet
    sequences.  To allocate a single UTF-8 sequence:

    a.  Choose whether to generate a UTF-8 sequence of 1, 2, 3, or 4
        octets.  The choice OUGHT TO be guided by the requirement to
        generate a satisfactory Minimum Confusion Level (Section 4.1)
        (not to be confused with the minimum Satisfactory Confusion
        Level (Section 4.2)).  Refer to the character number range in
        Table 1 in order to identify which octet sequence lengths are
        valid for a given bitstring.  For example, a 2-octet UTF-8
        sequence requires the next 11 bits to have a value in the
        range 0080-07ff.

    b.  Allocate bits from the bitstring to fill the vacant positions
        'x' in the UTF-8 sequence (see Table 1) from left to right.

    c.  UTF-8 sequences of 1, 2, 3, and 4 octets require 7, 11, 16,
        and 21 bits, respectively, from the bitstring.  Since the
        number of combinations of UTF-8 sequences accommodating
        exactly 32 or 128 bits is limited, in sequences where the
        number of bits required does not exactly match the number of
        available bits, the final UTF-8 sequence MUST be padded with
        additional bits once the available address bits are
        exhausted.  The sequence may therefore require up to 20 bits
        of padding.  The content of the padding SHOULD be chosen to
        maximize the resulting Confusion Level.

3.  Once the bits in the bitstring are exhausted, the conversion is
    complete.  The I-DUNNO representation of the address consists of
    the Unicode code points described by the list of generated UTF-8
    sequences, and it MAY now be presented to unsuspecting humans.
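
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the name `form_i_dunno`, the caller-supplied list of sequence lengths, and the `pad_bit` parameter are all hypothetical, and padding with a constant bit makes no attempt to maximize the Confusion Level. The sketch also does not check for unallocated code points or evaluate Confusion Levels.

```python
import ipaddress

# Payload bits and valid code point range per UTF-8 sequence length (Table 1).
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21}
VALID_RANGE = {1: (0x00, 0x7F), 2: (0x80, 0x7FF),
               3: (0x800, 0xFFFF), 4: (0x10000, 0x10FFFF)}


def form_i_dunno(address: str, lengths: list[int], pad_bit: str = "0") -> str:
    """Form an I-DUNNO from an address and a chosen list of sequence lengths."""
    # Step 1: the address octets as a bitstring in network byte order.
    packed = ipaddress.ip_address(address).packed
    bits = "".join(f"{octet:08b}" for octet in packed)

    chars = []
    pos = 0
    for length in lengths:
        if pos >= len(bits):
            raise ValueError("more sequences requested than address bits")
        width = PAYLOAD_BITS[length]
        # Steps 2b/2c: take the next bits; only the final sequence can
        # come up short, and it is then padded (assumption: constant bit).
        chunk = bits[pos:pos + width].ljust(width, pad_bit)
        pos += width
        cp = int(chunk, 2)
        # Step 2a: the value must fall in the range for this length.
        low, high = VALID_RANGE[length]
        if not low <= cp <= high or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"{length}-octet sequence invalid for bits {chunk}")
        chars.append(chr(cp))
    if pos < len(bits):
        raise ValueError("address bits left over; add more sequences")
    # Step 3: the I-DUNNO is the resulting string of code points.
    return "".join(chars)
```

For instance, `form_i_dunno("198.51.100.164", [1, 1, 1, 2])` consumes 7 + 7 + 7 + 11 = 32 bits and needs no padding.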
  
=== Deforming I-DUNNO ===

This section is intentionally omitted.  The machines will know how to
do it, and by definition humans SHOULD NOT attempt the process.
  
== I-DUNNO Confusion Level Requirements ==

A sequence of characters is considered I-DUNNO only when there's
enough potential to confuse humans.

Unallocated code points MUST be avoided.  While they might appear to
have great confusion power at the moment, there's a minor chance that
a future allocation to a useful, legible character will reduce this
capacity significantly.  Worse, in the (unlikely, but not impossible
-- see Section 3.1.3 of [[RFC5894]]) event of a code point losing its
DISALLOWED property per IDNA2008 [[RFC5894]], existing I-DUNNOs could
be rendered less than minimally confusing, with disastrous
consequences.

The following Confusion Levels are defined:
  
=== Minimum Confusion Level ===

As a minimum, a valid I-DUNNO MUST:

*  Contain at least one UTF-8 octet sequence with a length greater
   than one octet.

*  Contain at least one character that is DISALLOWED in IDNA2008.  No
   code point left behind!  Note that this allows machines to
   distinguish I-DUNNO from Internationalized Domain Name labels.

I-DUNNOs on this level will at least puzzle most human users with
knowledge of the Legacy Notation.
  
=== Satisfactory Confusion Level ===

An I-DUNNO with Satisfactory Confusion Level MUST adhere to the
Minimum Confusion Level, and additionally contain two of the
following:

*  At least one non-printable character.

*  Characters from at least two different Scripts.

*  A character from the "Symbol" category.

The Satisfactory Confusion Level will make many human-machine
interfaces beep, blink, silently fail, or any combination thereof.
This is considered sufficient to discourage most humans from
deforming I-DUNNO.
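
The three additional criteria can be approximated with Python's standard `unicodedata` module, as in the rough sketch below. Several assumptions apply: the function names are hypothetical; "non-printable" is taken to mean general categories Cc and Cf; "two of the following" is read as "at least two"; and, because the standard library does not expose the Unicode Script property, the script is guessed from the first word of each letter's character name, which is only a heuristic. The Minimum Confusion Level (in particular the IDNA2008 DISALLOWED check) is not evaluated here.

```python
import unicodedata


def satisfactory_criteria(text: str) -> dict[str, bool]:
    """Evaluate the three additional Satisfactory-level criteria."""
    categories = [unicodedata.category(ch) for ch in text]
    # Heuristic script guess: first word of the character name, letters only.
    scripts = {unicodedata.name(ch, "").split(" ")[0]
               for ch, cat in zip(text, categories) if cat.startswith("L")}
    scripts.discard("")
    return {
        # Assumption: control (Cc) and format (Cf) count as non-printable.
        "non_printable": any(cat in ("Cc", "Cf") for cat in categories),
        "two_scripts": len(scripts) >= 2,
        # General categories Sm, Sc, Sk, So make up the "Symbol" group.
        "symbol": any(cat.startswith("S") for cat in categories),
    }


def meets_extra_satisfactory_rules(text: str) -> bool:
    """True when at least two of the three criteria hold."""
    return sum(satisfactory_criteria(text).values()) >= 2
```

A production check would need a real Script property lookup and an IDNA2008 table; both are outside what this sketch attempts.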
  
=== Delightful Confusion Level ===

An I-DUNNO with Delightful Confusion Level MUST adhere to the
Satisfactory Confusion Level, and additionally contain at least two
of the following:

*  Characters from scripts with different directionalities.

*  Characters classified as "Confusables".

*  One or more emoji.

An I-DUNNO conforming to this level will cause almost all humans to
U+1F926, with the exception of those subscribed to the idna-update
mailing list.

(We have also considered a further, higher Confusion Level,
tentatively entitled "BReak EXaminatIon or Twiddling" or "BREXIT"
Level Confusion, but currently we have no idea how to go about
actually implementing it.)
  
== Example ==

An I-DUNNO based on the Legacy Notation IPv4 address "198.51.100.164"
is formed and validated as follows: First, the Legacy Notation is
written as a string of 32 bits in network byte order:

                  11000110001100110110010010100100

Since I-DUNNO requires at least one UTF-8 octet sequence with a
length greater than one octet, we allocate bits in the following
form:

                seq1  |  seq2   |  seq3   |  seq4
              --------+---------+---------+------------
              1100011 | 0001100 | 1101100 | 10010100100

This translates into the following code points:

    +-------------+-------------------------------------------+
    | Bit Seq.    | Character Number (Character Name)         |
    +=============+===========================================+
    | 1100011     | U+0063 (LATIN SMALL LETTER C)             |
    +-------------+-------------------------------------------+
    | 0001100     | U+000C (FORM FEED (FF))                   |
    +-------------+-------------------------------------------+
    | 1101100     | U+006C (LATIN SMALL LETTER L)             |
    +-------------+-------------------------------------------+
    | 10010100100 | U+04A4 (CYRILLIC CAPITAL LIGATURE EN GHE) |
    +-------------+-------------------------------------------+

                              Table 2

The resulting string MUST be evaluated against the Confusion Level
Requirements before I-DUNNO can be declared.  Given the example
above:

*  There is at least one UTF-8 octet sequence with a length greater
   than 1 (U+04A4).

*  There are two IDNA2008 DISALLOWED characters: U+000C (for good
   reason!) and U+04A4.

*  There is one non-printable character (U+000C).

*  There are characters from two different Scripts (Latin and
   Cyrillic).

Therefore, the example above constitutes valid I-DUNNO with a
Satisfactory Confusion Level.  U+000C in particular has great
potential in environments where I-DUNNOs would be sent to printers.
  
6.  IANA Considerations
+
== IANA Considerations ==
  
  If this work is standardized, IANA is kindly requested to revoke all
+
If this work is standardized, IANA is kindly requested to revoke all
  IPv4 and IPv6 address range allocations that do not allow for at
+
IPv4 and IPv6 address range allocations that do not allow for at
  least one I-DUNNO of Delightful Confusion Level.  IPv4 prefixes are
+
least one I-DUNNO of Delightful Confusion Level.  IPv4 prefixes are
  more likely to be affected, hence this can easily be marketed as an
+
more likely to be affected, hence this can easily be marketed as an
  effort to foster IPv6 deployment.
+
effort to foster IPv6 deployment.
  
  Furthermore, IANA is urged to expand the Internet TLA Registry
+
Furthermore, IANA is urged to expand the Internet TLA Registry
  [RFC5513] to accommodate Seven-Letter Acronyms (SLA) for obvious
+
[[RFC5513]] to accommodate Seven-Letter Acronyms (SLA) for obvious
  reasons, and register 'I-DUNNO'.  For that purpose, U+002D ("-",
+
reasons, and register 'I-DUNNO'.  For that purpose, U+002D ("-",
  HYPHEN-MINUS) SHALL be declared a Letter.
+
HYPHEN-MINUS) SHALL be declared a Letter.
  
7.  Security Considerations
+
== Security Considerations ==
  
  I-DUNNO is not a security algorithm.  Quite the contrary -- many
+
I-DUNNO is not a security algorithm.  Quite the contrary -- many
  humans are known to develop a strong feeling of insecurity when
+
humans are known to develop a strong feeling of insecurity when
  confronted with I-DUNNO.
+
confronted with I-DUNNO.
  
  In the tradition of many other RFCs, the evaluation of other security
+
In the tradition of many other RFCs, the evaluation of other security
  aspects of I-DUNNO is left as an exercise for the reader.
+
aspects of I-DUNNO is left as an exercise for the reader.
  
8.  References
+
== References ==
  
8.1.  Normative References
+
=== Normative References ===
  
  [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+
[[RFC2119]]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
+
          Requirement Levels", [[BCP14|BCP 14]], [[RFC2119|RFC 2119]],
              DOI 10.17487/RFC2119, March 1997,
+
          DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.
+
          <https://www.rfc-editor.org/info/rfc2119>.
  
  [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
+
[[RFC3629]]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
+
          10646", [[STD63|STD 63]], [[RFC3629|RFC 3629]], DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/info/rfc3629>.
+
          2003, <https://www.rfc-editor.org/info/rfc3629>.
  
  [RFC5894]  Klensin, J., "Internationalized Domain Names for
+
[[RFC5894]]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
+
          Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
+
          Rationale", [[RFC5894|RFC 5894]], DOI 10.17487/RFC5894, August 2010,
              <https://www.rfc-editor.org/info/rfc5894>.
+
          <https://www.rfc-editor.org/info/rfc5894>.
  
  [RFC6919]  Barnes, R., Kent, S., and E. Rescorla, "Further Key Words
+
[[RFC6919]]  Barnes, R., Kent, S., and E. Rescorla, "Further Key Words
              for Use in RFCs to Indicate Requirement Levels", RFC 6919,
+
          for Use in RFCs to Indicate Requirement Levels", [[RFC6919|RFC 6919]],
              DOI 10.17487/RFC6919, April 2013,
+
          DOI 10.17487/RFC6919, April 2013,
              <https://www.rfc-editor.org/info/rfc6919>.
+
          <https://www.rfc-editor.org/info/rfc6919>.
  
  [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+
[[RFC8174]]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+
          2119 Key Words", [[BCP14|BCP 14]], [[RFC8174|RFC 8174]], DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
          May 2017, <https://www.rfc-editor.org/info/rfc8174>.
  
8.2.  Informative References
+
=== Informative References ===
  
  [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
+
[[RFC0791]]  Postel, J., "Internet Protocol", [[STD5|STD 5]], [[RFC791|RFC 791]],
              DOI 10.17487/RFC0791, September 1981,
+
          DOI 10.17487/RFC0791, September 1981,
              <https://www.rfc-editor.org/info/rfc791>.
+
          <https://www.rfc-editor.org/info/rfc791>.
  
  [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
+
[[RFC1034]]  Mockapetris, P., "Domain names - concepts and facilities",
              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
+
          [[STD13|STD 13]], [[RFC1034|RFC 1034]], DOI 10.17487/RFC1034, November 1987,
              <https://www.rfc-editor.org/info/rfc1034>.
+
          <https://www.rfc-editor.org/info/rfc1034>.
  
  [RFC1123]  Braden, R., Ed., "Requirements for Internet Hosts -
+
[[RFC1123]]  Braden, R., Ed., "Requirements for Internet Hosts -
              Application and Support", STD 3, RFC 1123,
+
          Application and Support", [[STD3|STD 3]], [[RFC1123|RFC 1123]],
              DOI 10.17487/RFC1123, October 1989,
+
          DOI 10.17487/RFC1123, October 1989,
              <https://www.rfc-editor.org/info/rfc1123>.
+
          <https://www.rfc-editor.org/info/rfc1123>.
  
  [RFC1883]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
+
[[RFC1883]]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 1883, DOI 10.17487/RFC1883,
+
          (IPv6) Specification", [[RFC1883|RFC 1883]], DOI 10.17487/RFC1883,
              December 1995, <https://www.rfc-editor.org/info/rfc1883>.
+
          December 1995, <https://www.rfc-editor.org/info/rfc1883>.
  
  [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
+
[[RFC4291]]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
+
          Architecture", [[RFC4291|RFC 4291]], DOI 10.17487/RFC4291, February
              2006, <https://www.rfc-editor.org/info/rfc4291>.
+
          2006, <https://www.rfc-editor.org/info/rfc4291>.
  
  [RFC5513]  Farrel, A., "IANA Considerations for Three Letter
+
[[RFC5513]]  Farrel, A., "IANA Considerations for Three Letter
              Acronyms", RFC 5513, DOI 10.17487/RFC5513, April 2009,
+
          Acronyms", [[RFC5513|RFC 5513]], DOI 10.17487/RFC5513, April 2009,
              <https://www.rfc-editor.org/info/rfc5513>.
+
          <https://www.rfc-editor.org/info/rfc5513>.
  
  [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
+
[[RFC5952]]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952,
+
          Address Text Representation", [[RFC5952|RFC 5952]],
              DOI 10.17487/RFC5952, August 2010,
+
          DOI 10.17487/RFC5952, August 2010,
              <https://www.rfc-editor.org/info/rfc5952>.
+
          <https://www.rfc-editor.org/info/rfc5952>.
  
  [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
+
[[RFC8200]]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
+
          (IPv6) Specification", [[STD86|STD 86]], [[RFC8200|RFC 8200]],
              DOI 10.17487/RFC8200, July 2017,
+
          DOI 10.17487/RFC8200, July 2017,
              <https://www.rfc-editor.org/info/rfc8200>.
+
          <https://www.rfc-editor.org/info/rfc8200>.
  
  [RFC8369]  Kaplan, H., "Internationalizing IPv6 Using 128-Bit
+
[[RFC8369]]  Kaplan, H., "Internationalizing IPv6 Using 128-Bit
              Unicode", RFC 8369, DOI 10.17487/RFC8369, April 2018,
+
          Unicode", [[RFC8369|RFC 8369]], DOI 10.17487/RFC8369, April 2018,
              <https://www.rfc-editor.org/info/rfc8369>.
+
          <https://www.rfc-editor.org/info/rfc8369>.
  
  [UNICODE]  The Unicode Consortium, "The Unicode Standard (Current
+
[UNICODE]  The Unicode Consortium, "The Unicode Standard (Current
              Version)", 2019,
+
          Version)", 2019,
              <http://www.unicode.org/versions/latest/>.
+
          <http://www.unicode.org/versions/latest/>.
  
 
Authors' Addresses
 
Authors' Addresses
  
  Alexander Mayrhofer
+
Alexander Mayrhofer
  nic.at GmbH
+
nic.at GmbH
  
+
  URI:  https://i-dunno.at/
+
URI:  https://i-dunno.at/
  
 +
Jim Hague
 +
Sinodun
  
  Jim Hague
+
  Sinodun
+
URI:  https://www.sinodun.com/
  
+
[[Category:Experimental]]
  URI:  https://www.sinodun.com/
 

Latest revision as of 11:11, 30 October 2020



Independent Submission                                      A. Mayrhofer
Request for Comments: 8771                                   nic.at GmbH
Category: Experimental                                          J. Hague
ISSN: 2070-1721                                                  Sinodun
                                                            1 April 2020

The Internationalized Deliberately Unreadable Network NOtation (I-DUNNO)

Abstract

Domain Names were designed for humans, IP addresses were not. But more than 30 years after the introduction of the DNS, a minority of mankind persists in invading the realm of machine-to-machine communication by reading, writing, misspelling, memorizing, permuting, and confusing IP addresses. This memo describes the Internationalized Deliberately Unreadable Network NOtation ("I-DUNNO"), a notation designed to replace current textual representations of IP addresses with something that is not only more concise but will also discourage this small, but obviously important, subset of human activity.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for examination, experimental implementation, and evaluation.

This document defines an Experimental Protocol for the Internet community. This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not candidates for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8771.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

1. Introduction
2. Terminology
3. The Notation
 3.1.  Forming I-DUNNO
 3.2.  Deforming I-DUNNO
4. I-DUNNO Confusion Level Requirements
 4.1.  Minimum Confusion Level
 4.2.  Satisfactory Confusion Level
 4.3.  Delightful Confusion Level
5. Example
6. IANA Considerations
7. Security Considerations
8. References
 8.1.  Normative References
 8.2.  Informative References
Authors' Addresses

Introduction

In Section 2.3 of RFC0791, the original designers of the Internet Protocol carefully defined names and addresses as separate quantities. While they did not explicitly reserve names for human consumption and addresses for machine use, they did consider the matter indirectly in their philosophical communal statement: "A name indicates what we seek." This clearly indicates that names rather than addresses should be of concern to humans.

The specification of domain names in RFC1034, and indeed the continuing enormous effort put into the Domain Name System, reinforces the view that humans should use names and leave worrying about addresses to the machines. RFC 1034 mentions "users" several times, and even includes the word "humans", even though it is positioned slightly unfortunately, though perfectly understandably, in a context of "annoying" and "can wreak havoc" (see Section 5.2.3 of RFC1034). Nevertheless, this is another clear indication that domain names are made for human use, while IP addresses are for machine use.

Given this, and a long error-strewn history of human attempts to utilize addresses directly, it is obviously desirable that humans should not meddle with IP addresses. For that reason, it appears quite logical that a human-readable (textual) representation of IP addresses was just very vaguely specified in Section 2.1 of RFC1123. Subsequently, a directed effort to further discourage human use by making IP addresses more confusing was introduced in RFC1883 (which was obsoleted by RFC8200), and additional options for human puzzlement were offered in Section 2.2 of RFC4291. These noble early attempts to hamper efforts by humans to read, understand, or even spell IP addressing schemes were unfortunately severely compromised in RFC5952.

In order to prevent further damage from human meddling with IP addresses, there is a clear urgent need for an address notation that replaces these "Legacy Notations", and efficiently discourages humans from reading, modifying, or otherwise manipulating IP addresses. Research in this area long ago recognized the potential in ab^H^Hperusing the intricacies, inaccuracies, and chaotic disorder of what humans are pleased to call a "Cultural Technique" (also known as "Script"), and with a certain inexorable inevitability has focused of late on the admirable confusion (and thus discouragement) potential of [UNICODE] as an address notation. In Section 4, we introduce a framework of Confusion Levels as an aid to the evaluation of the effectiveness of any Unicode-based scheme in producing notation in a form designed to be resistant to ready comprehension or, heaven forfend, mutation of the address, and so effecting the desired confusion and discouragement.

The authors welcome RFC8369 as a major step in the right direction. However, we have some reservations about the scheme proposed therein:

  • Our analysis of the proposed scheme indicates that, while
  impressively concise, it fails to attain more than at best a
  Minimum Confusion Level in our classification.
  • Humans, especially younger ones, are becoming skilled at handling
  emoji.  Over time, this will negatively impact the discouragement
  factor.
  • The proposed scheme is specific to IPv6; if a solution to this
  problem is to be in any way timely, it must, as a matter of the
  highest priority, address IPv4.  After all, even taking the
  regrettable effects of RFC 5952 into account, IPv6 does at least
  remain inherently significantly more confusing and discouraging
  than IPv4.

This document therefore specifies an alternative Unicode-based notation, the Internationalized Deliberately Unreadable Network NOtation (I-DUNNO). This notation addresses each of the concerns outlined above:

  • I-DUNNO can generate Minimum, Satisfactory, or Delightful levels
  of confusion.
  • As well as emoji, it takes advantage of other areas of Unicode
  confusion.
  • It can be used with IPv4 and IPv6 addresses.

We concede that I-DUNNO notation is markedly less concise than that of RFC 8369. However, by permitting multiple code points in the representation of a single address, I-DUNNO opens up the full spectrum of Unicode-adjacent code point interaction. This is a significant factor in allowing I-DUNNO to achieve higher levels of confusion. I-DUNNO also requires no change to the current size of Unicode code points, and so its chances of adoption and implementation are (slightly) higher.

Note that the use of I-DUNNO in the reverse DNS system is currently out of scope. The occasional human-induced absence of the magical one-character sequence U+002E is believed to cause sufficient disorder there.

Media Access Control (MAC) addresses are totally out of the question.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Additional terminology from RFC6919 MIGHT apply.

The Notation

I-DUNNO leverages UTF-8 [RFC3629] to obfuscate IP addresses for humans. UTF-8 uses sequences of 1 to 4 octets to represent code points as follows:

  +-----------------------+-------------------------------------+
  | Char. number range    | UTF-8 octet sequence                |
  +-----------------------+-------------------------------------+
  | (hexadecimal)         | (binary)                            |
  +=======================+=====================================+
  | 0000 0000 - 0000 007F | 0xxxxxxx                            |
  +-----------------------+-------------------------------------+
  | 0000 0080 - 0000 07FF | 110xxxxx 10xxxxxx                   |
  +-----------------------+-------------------------------------+
  | 0000 0800 - 0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx          |
  +-----------------------+-------------------------------------+
  | 0001 0000 - 0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
  +-----------------------+-------------------------------------+
                              Table 1
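
The rows of Table 1 can be observed directly with a short Python sketch. The helper name utf8_bits and the four sample code points are chosen here for illustration only; they are not part of the notation.

```python
def utf8_bits(code_point: int) -> list[str]:
    """Return the UTF-8 octets of a code point as 8-bit binary strings."""
    return [format(octet, "08b") for octet in chr(code_point).encode("utf-8")]


# One sample code point from each row of Table 1.
for cp in (0x63, 0x04A4, 0x20AC, 0x1F926):
    print(f"U+{cp:04X}", " ".join(utf8_bits(cp)))
```

Each printed line shows the fixed prefix bits of its row (0, 110, 1110, or 11110 on the first octet, 10 on every following octet), with the remaining 'x' positions carrying the code point value.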

I-DUNNO uses that structure to convey addressing information as follows:

Forming I-DUNNO

In order to form an I-DUNNO based on the Legacy Notation of an IP address, the following steps are performed:

1. The octets of the IP address are written as a bitstring in
   network byte order.

2. Working from left to right, the bitstring (32 bits for IPv4; 128
   bits for IPv6) is used to generate a list of valid UTF-8 octet
   sequences.  To allocate a single UTF-8 sequence:
   a.  Choose whether to generate a UTF-8 sequence of 1, 2, 3, or 4
       octets.  The choice OUGHT TO be guided by the requirement to
       generate a satisfactory Minimum Confusion Level (Section 4.1)
       (not to be confused with the minimum Satisfactory Confusion
       Level (Section 4.2)).  Refer to the character number range in
       Table 1 in order to identify which octet sequence lengths are
       valid for a given bitstring.  For example, a 2-octet UTF-8
       sequence requires the next 11 bits to have a value in the
       range 0080-07ff.
   b.  Allocate bits from the bitstring to fill the vacant positions
       'x' in the UTF-8 sequence (see Table 1) from left to right.
   c.  UTF-8 sequences of 1, 2, 3, and 4 octets require 7, 11, 16,
       and 21 bits, respectively, from the bitstring.  Since the
       number of combinations of UTF-8 sequences accommodating
       exactly 32 or 128 bits is limited, in sequences where the
       number of bits required does not exactly match the number of
       available bits, the final UTF-8 sequence MUST be padded with
       additional bits once the available address bits are
       exhausted.  The sequence may therefore require up to 20 bits
       of padding.  The content of the padding SHOULD be chosen to
       maximize the resulting Confusion Level.

3. Once the bits in the bitstring are exhausted, the conversion is
   complete.  The I-DUNNO representation of the address consists of
   the Unicode code points described by the list of generated UTF-8
   sequences, and it MAY now be presented to unsuspecting humans.
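
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name form_i_dunno, its parameters, and the choice to pad with a single repeated bit are all hypothetical, the caller picks the sequence lengths, and the sketch neither rejects unallocated code points nor evaluates any Confusion Level.

```python
import ipaddress

# Payload bits carried by UTF-8 sequences of 1-4 octets, with the
# smallest code point each length may legally encode (Table 1).
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21}
MIN_CODE_POINT = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def form_i_dunno(address: str, lengths: list[int], padding: str = "0") -> str:
    """Turn a Legacy Notation address into a string of code points.

    `lengths` lists the UTF-8 sequence length (1-4) chosen for each
    step; `padding` is the single bit repeated to fill the final
    sequence (an assumption of this sketch).
    """
    packed = ipaddress.ip_address(address).packed  # network byte order
    bits = "".join(format(octet, "08b") for octet in packed)
    chars = []
    pos = 0
    for length in lengths:
        if pos >= len(bits):
            raise ValueError("more sequences requested than address bits")
        width = PAYLOAD_BITS[length]
        # Only the final sequence can come up short and need padding.
        chunk = bits[pos:pos + width].ljust(width, padding)
        pos += width
        cp = int(chunk, 2)
        # Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if cp < MIN_CODE_POINT[length] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"{chunk} is not valid as a {length}-octet sequence")
        chars.append(chr(cp))
    if pos < len(bits):
        raise ValueError("address bits left over; choose more sequences")
    return "".join(chars)


print(ascii(form_i_dunno("198.51.100.164", [1, 1, 1, 2])))  # prints 'c\x0cl\u04a4'
```

The lengths 1, 1, 1, 2 consume 7 + 7 + 7 + 11 = 32 bits, so this IPv4 address needs no padding.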

Deforming I-DUNNO

This section is intentionally omitted. The machines will know how to do it, and by definition humans SHOULD NOT attempt the process.

I-DUNNO Confusion Level Requirements

A sequence of characters is considered I-DUNNO only when there's enough potential to confuse humans.

Unallocated code points MUST be avoided. While they might appear to have great confusion power at the moment, there's a minor chance that a future allocation to a useful, legible character will reduce this capacity significantly. Worse, in the (unlikely, but not impossible -- see Section 3.1.3 of RFC5894) event of a code point losing its DISALLOWED property per IDNA2008 [RFC5894], existing I-DUNNOs could be rendered less than minimally confusing, with disastrous consequences.

The following Confusion Levels are defined:

Minimum Confusion Level

As a minimum, a valid I-DUNNO MUST:

  • Contain at least one UTF-8 octet sequence with a length greater
  than one octet.
  • Contain at least one character that is DISALLOWED in IDNA2008. No
  code point left behind!  Note that this allows machines to
  distinguish I-DUNNO from Internationalized Domain Name labels.

I-DUNNOs on this level will at least puzzle most human users with knowledge of the Legacy Notation.
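
A minimal Python sketch of the two requirements follows. The standard library carries no IDNA2008 derived-property table, so the check takes a caller-supplied predicate; the names meets_minimum_level, is_disallowed, and EXAMPLE_DISALLOWED are hypothetical, and the stand-in set covers only the two characters that Section 5 identifies as DISALLOWED.

```python
from typing import Callable


def meets_minimum_level(text: str, is_disallowed: Callable[[str], bool]) -> bool:
    """Check both Minimum Confusion Level requirements for `text`."""
    # Code points above U+007F need more than one UTF-8 octet.
    has_multi_octet = any(len(ch.encode("utf-8")) > 1 for ch in text)
    has_disallowed = any(is_disallowed(ch) for ch in text)
    return has_multi_octet and has_disallowed


# Hypothetical stand-in for a real IDNA2008 lookup: it knows only the
# two characters that the Section 5 example identifies as DISALLOWED.
EXAMPLE_DISALLOWED = {"\u000c", "\u04a4"}
print(meets_minimum_level("c\x0cl\u04a4", EXAMPLE_DISALLOWED.__contains__))
```

A real deployment would presumably replace the stand-in set with a lookup against the full IDNA2008 tables.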

Satisfactory Confusion Level

An I-DUNNO with Satisfactory Confusion Level MUST adhere to the Minimum Confusion Level, and additionally contain two of the following:

  • At least one non-printable character.
  • Characters from at least two different Scripts.
  • A character from the "Symbol" category.

The Satisfactory Confusion Level will make many human-machine interfaces beep, blink, silently fail, or any combination thereof. This is considered sufficient to discourage most humans from deforming I-DUNNO.
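
The three additional criteria can be approximated with Python's unicodedata module, as sketched below. Two assumptions are made loudly: the module exposes no Script property, so the first word of a letter's character name stands in for its script, and "two of the following" is read as "at least two". The function names are hypothetical, and the Minimum Confusion Level is not re-checked here.

```python
import unicodedata


def satisfactory_criteria(text: str) -> dict[str, bool]:
    """Evaluate the three Satisfactory-level criteria for `text`."""
    # Approximation: treat the first word of a letter's Unicode name
    # as its script (unicodedata has no Script property).
    scripts = {
        unicodedata.name(ch, "").split(" ")[0]
        for ch in text
        if unicodedata.category(ch).startswith("L")
    }
    scripts.discard("")
    return {
        "non_printable": any(not ch.isprintable() for ch in text),
        "two_scripts": len(scripts) >= 2,
        # General categories Sm, Sc, Sk, and So all start with "S".
        "symbol": any(unicodedata.category(ch).startswith("S") for ch in text),
    }


def meets_satisfactory_extras(text: str) -> bool:
    """True when at least two of the three criteria hold."""
    return sum(satisfactory_criteria(text).values()) >= 2


print(satisfactory_criteria("c\x0cl\u04a4"))
```

For the string formed in Section 5, the non-printable and two-scripts criteria hold and the symbol criterion does not.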

Delightful Confusion Level

An I-DUNNO with Delightful Confusion Level MUST adhere to the Satisfactory Confusion Level, and additionally contain at least two of the following:

  • Characters from scripts with different directionalities.
  • Characters classified as "Confusables".
  • One or more emoji.
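
Of these three criteria, only directionality can be tested with Python's standard library alone; confusable and emoji detection need additional Unicode data that unicodedata does not provide. The sketch below therefore covers the first criterion only, and it assumes that "different directionalities" means a mix of strong left-to-right (bidirectional class L) and strong right-to-left (classes R and AL) characters. The function name is hypothetical.

```python
import unicodedata


def has_mixed_directionality(text: str) -> bool:
    """True if `text` has both left-to-right and right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})


# Latin "a" (class L) next to Hebrew alef (class R).
print(has_mixed_directionality("a\u05d0"))
```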

An I-DUNNO conforming to this level will cause almost all humans to U+1F926, with the exception of those subscribed to the idna-update mailing list.

(We have also considered a further, higher Confusion Level, tentatively entitled "BReak EXaminatIon or Twiddling" or "BREXIT" Level Confusion, but currently we have no idea how to go about actually implementing it.)

Example

An I-DUNNO based on the Legacy Notation IPv4 address "198.51.100.164" is formed and validated as follows: First, the Legacy Notation is written as a string of 32 bits in network byte order:

                 11000110001100110110010010100100

Since I-DUNNO requires at least one UTF-8 octet sequence with a length greater than one octet, we allocate bits in the following form:

               seq1  |   seq2  |   seq3  |   seq4
             --------+---------+---------+------------
             1100011 | 0001100 | 1101100 | 10010100100

This translates into the following code points:

    +-------------+-------------------------------------------+
    | Bit Seq.    | Character Number (Character Name)         |
    +=============+===========================================+
    | 1100011     | U+0063 (LATIN SMALL LETTER C)             |
    +-------------+-------------------------------------------+
    | 0001100     | U+000C (FORM FEED (FF))                   |
    +-------------+-------------------------------------------+
    | 1101100     | U+006C (LATIN SMALL LETTER L)             |
    +-------------+-------------------------------------------+
    | 10010100100 | U+04A4 (CYRILLIC CAPITAL LIGATURE EN GHE) |
    +-------------+-------------------------------------------+
                              Table 2
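
The bitstring and the rows of Table 2 can be reproduced with a few lines of Python. This is only a check of the arithmetic above; the variable names are illustrative, and the "<control>" fallback is an assumption of the sketch, needed because control characters such as U+000C have no formal character name in unicodedata.

```python
import ipaddress
import unicodedata

packed = ipaddress.ip_address("198.51.100.164").packed
bits = "".join(format(octet, "08b") for octet in packed)
print(bits)

rows = []
pos = 0
for width in (7, 7, 7, 11):  # three 1-octet sequences, one 2-octet sequence
    chunk = bits[pos:pos + width]
    pos += width
    cp = int(chunk, 2)
    # Control characters have no formal name, hence the fallback.
    name = unicodedata.name(chr(cp), "<control>")
    rows.append((chunk, f"U+{cp:04X}", name))
    print(chunk, f"U+{cp:04X}", name)
```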

The resulting string MUST be evaluated against the Confusion Level Requirements before I-DUNNO can be declared. Given the example above:

  • There is at least one UTF-8 octet sequence with a length greater
  than 1 (U+04A4).
  • There are two IDNA2008 DISALLOWED characters: U+000C (for good
  reason!) and U+04A4.
  • There is one non-printable character (U+000C).
  • There are characters from two different Scripts (Latin and
  Cyrillic).

Therefore, the example above constitutes valid I-DUNNO with a Satisfactory Confusion Level. U+000C in particular has great potential in environments where I-DUNNOs would be sent to printers.

IANA Considerations

If this work is standardized, IANA is kindly requested to revoke all IPv4 and IPv6 address range allocations that do not allow for at least one I-DUNNO of Delightful Confusion Level. IPv4 prefixes are more likely to be affected, hence this can easily be marketed as an effort to foster IPv6 deployment.

Furthermore, IANA is urged to expand the Internet TLA Registry [RFC5513] to accommodate Seven-Letter Acronyms (SLA) for obvious reasons, and register 'I-DUNNO'. For that purpose, U+002D ("-", HYPHEN-MINUS) SHALL be declared a Letter.

Security Considerations

I-DUNNO is not a security algorithm. Quite the contrary -- many humans are known to develop a strong feeling of insecurity when confronted with I-DUNNO.

In the tradition of many other RFCs, the evaluation of other security aspects of I-DUNNO is left as an exercise for the reader.

References

Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997,
           <https://www.rfc-editor.org/info/rfc2119>.

[RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
           10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
           2003, <https://www.rfc-editor.org/info/rfc3629>.

[RFC5894]  Klensin, J., "Internationalized Domain Names for
           Applications (IDNA): Background, Explanation, and
           Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010,
           <https://www.rfc-editor.org/info/rfc5894>.

[RFC6919]  Barnes, R., Kent, S., and E. Rescorla, "Further Key Words
           for Use in RFCs to Indicate Requirement Levels", RFC 6919,
           DOI 10.17487/RFC6919, April 2013,
           <https://www.rfc-editor.org/info/rfc6919>.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Informative References

[RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
           DOI 10.17487/RFC0791, September 1981,
           <https://www.rfc-editor.org/info/rfc791>.

[RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
           STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
           <https://www.rfc-editor.org/info/rfc1034>.

[RFC1123]  Braden, R., Ed., "Requirements for Internet Hosts -
           Application and Support", STD 3, RFC 1123,
           DOI 10.17487/RFC1123, October 1989,
           <https://www.rfc-editor.org/info/rfc1123>.

[RFC1883]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
           (IPv6) Specification", RFC 1883, DOI 10.17487/RFC1883,
           December 1995, <https://www.rfc-editor.org/info/rfc1883>.

[RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
           Architecture", RFC 4291, DOI 10.17487/RFC4291, February
           2006, <https://www.rfc-editor.org/info/rfc4291>.

[RFC5513]  Farrel, A., "IANA Considerations for Three Letter
           Acronyms", RFC 5513, DOI 10.17487/RFC5513, April 2009,
           <https://www.rfc-editor.org/info/rfc5513>.

[RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
           Address Text Representation", RFC 5952,
           DOI 10.17487/RFC5952, August 2010,
           <https://www.rfc-editor.org/info/rfc5952>.

[RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
           (IPv6) Specification", STD 86, RFC 8200,
           DOI 10.17487/RFC8200, July 2017,
           <https://www.rfc-editor.org/info/rfc8200>.

[RFC8369]  Kaplan, H., "Internationalizing IPv6 Using 128-Bit
           Unicode", RFC 8369, DOI 10.17487/RFC8369, April 2018,
           <https://www.rfc-editor.org/info/rfc8369>.

[UNICODE]  The Unicode Consortium, "The Unicode Standard (Current
           Version)", 2019,
           <http://www.unicode.org/versions/latest/>.

Authors' Addresses

Alexander Mayrhofer
nic.at GmbH

Email: [email protected]
URI:   https://i-dunno.at/

Jim Hague
Sinodun

Email: [email protected]
URI:   https://www.sinodun.com/