Friend Codes in the Context of Social Networks

Background: What are friend codes? What inspired this post?

Friend codes were introduced by Nintendo Co. Ltd. as part of the Nintendo Wi-Fi Connection. They allow people to add each other as friends without needing usernames. A friend code is a unique numeric identifier. Given two players A and B, a friend connection is established when both players input the other player's friend code into their respective consoles.
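To make the mutual-entry rule concrete, here is a minimal sketch of how such a handshake could be modelled. The class and method names are my own invention for illustration and say nothing about how Nintendo actually implements it.

```python
class FriendRegistry:
    """Hypothetical model: players are identified by their friend codes."""

    def __init__(self):
        self.pending = set()   # (who_entered, whose_code) pairs awaiting the other side
        self.friends = set()   # frozenset({a, b}) for every confirmed pair

    def enter_code(self, player, other):
        """Record that `player` entered `other`'s code; return True once both sides have."""
        if player == other:
            return False  # assumption: entering your own code does nothing
        pair = frozenset((player, other))
        if pair in self.friends:
            return True
        if (other, player) in self.pending:
            # The other side already entered our code, so the connection is mutual.
            self.pending.discard((other, player))
            self.friends.add(pair)
            return True
        self.pending.add((player, other))
        return False
```

The first call for a pair only records intent; the connection exists once the second player has entered the first player's code as well.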

In this post, I will take a look at the suitability of this design for social networks. In part, these words are inspired by the recent changes that Discord Inc. announced. To summarize the system Discord currently uses: Every user has a username and a discriminator. The username is chosen by the user and may contain effectively the entire printable character set of Unicode. The discriminator, however, is a four-digit number randomly assigned by the system. If two users choose the same username (case sensitively!), the system ensures that their discriminators differ. Users can pay for a Discord feature called Nitro, which, among other things, allows changing the discriminator to a value of their choice; this requires that the chosen discriminator is not currently in use by another user with the same username. A username is blocked from use once all possible discriminator values for it have been used up.
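The allocation rule described above can be sketched in a few lines. This is my own toy model, not Discord's code: the class name, the `size` parameter and the assumption that discriminators run from 0000 upwards are all illustrative.

```python
import random


class DiscriminatorPool:
    """Toy model of per-username discriminator allocation (illustrative only)."""

    def __init__(self, size=10_000, rng=None):
        self.size = size                  # assumption: four digits, so 10,000 values
        self.taken = {}                   # username (case-sensitive) -> used discriminators
        self.rng = rng or random.Random()

    def register(self, username):
        used = self.taken.setdefault(username, set())
        free = [d for d in range(self.size) if d not in used]
        if not free:
            # Every discriminator for this exact username is in use.
            raise ValueError(f"username {username!r} is exhausted")
        choice = self.rng.choice(free)
        used.add(choice)
        return f"{username}#{choice:04d}"
```

Because the dictionary key is the exact string, "Bob" and "bob" draw from separate pools, which is the case-sensitivity problem in a nutshell.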

Social networks like Discord allow users to find each other not only by means of the platform itself, but also through side channels. For example, two users may add each other as friends on Discord even though they share no guilds (servers), and then chat each other up.

In the aforementioned blog post, Discord Inc. reasoned that their current username system suffers from four systemic weaknesses:

usernames are difficult to share because the legal character set is so large that users can choose impractical ones making full use of what Unicode allows;
the discriminators are difficult to remember;
pronouncing usernames does not account for case-sensitive username matching;
common names (such as common given names) have exhausted their entire four-digit pool of discriminators and have thus turned into a finite resource.

People reading their blog post carefully may realize that their list of weaknesses is in part redundant and in part irrelevant. Quoting the original post (emphasis added):

You try to share your username outside of Discord. Unfortunately, you either can't remember the discriminator, have to explain which letters are uppercase and lowercase, or have to try to specify which special characters your name uses.

[…]

You like to change your username a lot and get rate limited.

Your friend says they changed their name to vernacular but actually it’s 𝖛𝖊𝖗𝖓𝖆𝖈𝖚𝖑𝖆𝖗 and you have trouble finding them.

The final part of the first argument is redundant with their last argument: special characters are difficult to convey off-platform (by all means, go ahead and try to post their vernacular example on Hacker News and see how much of that makes it through). Their second-to-last argument has nothing to do with username discovery directly. Perhaps they rate-limit username changes in the vague hope that others can add a frequent name-changer before the next change; it is quite unclear to me how this point relates to the others.

The solution they present is to introduce a new, unique username without discriminators, which consists only of lower-case letters, digits, periods and underscores.

An alternative solution

I argue, however, that while this change may be a solution, friend codes are an alternative worth considering. The rest of this post explores how friend codes apply to the problem Discord is facing right now: side-channel account discovery.

Friend codes act as a unique identifier generated by the system, rather than the user; entering it starts the flow to establish a friend relationship. Though patented by Nintendo (US9931571B2, US11083971B2, US8568239B2), obvious prior art exists (ICQ and phone numbers). I will therefore continue talking about this concept, but the patents may be an issue that could have discouraged Discord from pursuing this angle.

By having a unique identifier that is out of the user's control entirely, all of the issues Discord mentioned in their post go away. This can still be coupled with a display name. Login occurs by using the e-mail address as primary identifier anyway.

A good friend code system needs to:

avoid generating identifiers close to each other;
be easily shared through more limited means of communication, such as human speech or websites with overzealous Unicode stripping.

Friend codes that are too close to each other are hard to tell apart: it is difficult to distinguish 3843-3829-1892-3282 from 3843-3829-1882-3282 at a glance. Techniques for avoiding close identifiers include drawing friend codes from a larger space (e.g. an alphanumeric space like base 32), or requiring that a minimum number of digits differ between any two codes while treating similar-looking digits as identical for the purpose of this comparison. The U.S. National Institute of Standards and Technology (NIST) provides tools and a systematic approach to character similarity for computing the visual similarity of top-level domains, which also apply to other strings. They also employ Levenshtein distances (and suggest exploring Damerau–Levenshtein, Jaro–Winkler and cosine distance, among others). According to the documentation included with their code, it is not encumbered by copyright under 17 U.S.C. § 105. This kind of approach likely works well for both numeric and alphanumeric friend codes.
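As a rough sketch of the "minimum number of differing digits" idea, the following folds look-alike digits together and then counts differing positions (a Hamming distance). This is not the NIST tooling; the look-alike pairs, the function names and the threshold of three are assumptions I made up for illustration.

```python
import secrets

DIGITS = "0123456789"
# Assumption: an illustrative set of look-alike digits, folded onto one representative.
LOOKALIKES = str.maketrans({"8": "3", "7": "1", "6": "5"})


def fold(code):
    """Drop separators and collapse look-alike digits."""
    return code.replace("-", "").translate(LOOKALIKES)


def distance(a, b):
    """Number of positions that still differ after folding."""
    fa, fb = fold(a), fold(b)
    if len(fa) != len(fb):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(fa, fb))


def generate(existing, length=16, min_distance=3, max_tries=1000):
    """Draw random codes until one is far enough from every existing code."""
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(DIGITS) for _ in range(length))
        if all(distance(candidate, e) >= min_distance for e in existing):
            return candidate
    raise RuntimeError("no sufficiently distinct code found")
```

Under this measure the two codes from the paragraph above are at distance 1 and would never both be issued. The linear scan over all existing codes is only meant to show the rule; a real system would need an index that makes this check cheap.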

If alphanumeric friend codes are generated by the system, care must be taken not to generate any sequences that are similar or identical to terms that may cause offense, such as medical conditions, crimes, slurs and swear words. The difficulty lies in doing this for dozens if not hundreds of languages at the same time. This does not imply that digits are inherently safe. For example, in Japanese it is possible to express entire sentences using only digits through numeric substitution (goroawase). As a sequence of digits can be read in different ways, readings of numbers can be chosen to form words or sequences of words that may or may not be offensive. There is likely no winning here.

Visual distinctiveness can also be added by the use of color, although this is of little use to people who can only see a limited color space. Pokémon Showdown! has an elaborate system [Warning: code linked is licensed under AGPLv3] that takes the luminosity of the generated color into account to produce visually pleasant, yet distinct colors for different strings, for the purpose of visually disambiguating usernames.
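The general idea of deriving a stable color from a string can be sketched as follows. To be clear, this is not Pokémon Showdown!'s algorithm, which I have deliberately not reproduced; the hash choice and the fixed lightness and saturation values are my own assumptions.

```python
import colorsys
import hashlib


def code_color(text):
    """Map a string to a stable hex color; only the hue depends on the input."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 65536  # in [0, 1)
    # Assumption: fixed lightness and saturation keep every color readable.
    red, green, blue = colorsys.hls_to_rgb(hue, 0.45, 0.65)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

The same string always yields the same color, so the color can act as a secondary visual cue next to the friend code itself.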

Since friend codes must be able to be shared through very limited side channels, it is also necessary to choose a fairly limited space. There is no room for using both lower-case and upper-case. These would not be distinct in human speech; adding the necessary disambiguations would be excessively cumbersome.
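One existing alphabet that fits these constraints is Crockford's Base32, which is case-insensitive and leaves out letters that are easily confused with digits. Using it here is my suggestion rather than something Discord or Nintendo does; the function name is hypothetical.

```python
# Crockford's Base32 symbols: no I, L, O or U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# On input, O is read as 0, and I and L are read as 1.
FOLD = str.maketrans("OIL", "011")


def normalize(raw):
    """Turn whatever the user typed into the canonical form of a code."""
    cleaned = "".join(ch for ch in raw.upper() if ch not in "- ")
    cleaned = cleaned.translate(FOLD)
    invalid = set(cleaned) - set(ALPHABET)
    if invalid:
        raise ValueError(f"invalid characters: {sorted(invalid)}")
    return cleaned
```

Since upper and lower case collapse to the same code, reading a code aloud never requires spelling out the case of a letter.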

Ideally, friend codes would also be memorable. This is not a strict requirement: you can expect people to copy and paste their identifiers on the Internet or to have their smartphones with them, and people have memorized phone numbers and ICQ numbers in the past. Sequences of words are easier to memorize than a sequence of digits or a random alphanumeric string. Made popular by xkcd 936 (correct horse battery staple), the concept of a word sequence to encode data is essentially the same as a base n encoding, just with a relatively large n and words instead of digits; this has been done at least as early as 1995 with the S/KEY system described in RFC 1760. This requires a fixed word list, which ideally consists of words that are both visually and phonetically distinct. This, however, is problematic from an internationalization and accessibility standpoint. The word list necessarily has to be either localized or kept the same for everyone. When localizing, users who opt for a Swedish locale would, for example, get Swedish words. This not only re-opens the issue of being unable to share the words on platforms that aggressively remove Unicode characters, but also makes it difficult to share friend codes in international settings, e.g. at an international conference; this issue is exacerbated by the fact that not every keyboard layout can input every character. Keeping the same word list for everyone presents other problems. A person who speaks little or no English cannot be expected to have an easy time remembering a sequence of English words; for them, you might as well have generated an unfathomably long random string. Additionally, this provides ample opportunity for bad PR in the form of accusations of Eurocentrism/Western-centrism. While seemingly an elegant solution on paper, it seems impractical.
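The "base n with words as digits" view can be shown with a toy example. The eight-word list below is made up for illustration (so n = 8); a practical list would be far larger and would need the vetting discussed above.

```python
# Hypothetical word list; each word acts as one digit in base len(WORDS).
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "honey"]


def encode(number, count):
    """Write `number` as exactly `count` words, most significant word first."""
    base = len(WORDS)
    if not 0 <= number < base ** count:
        raise ValueError("number does not fit into the requested word count")
    words = []
    for _ in range(count):
        number, digit = divmod(number, base)
        words.append(WORDS[digit])
    return " ".join(reversed(words))


def decode(phrase):
    """Inverse of encode; raises ValueError for words outside the list."""
    value = 0
    for word in phrase.lower().split():
        value = value * len(WORDS) + WORDS.index(word)
    return value
```

For instance, encode(83, 3) gives "brick cloud delta", because 83 = 1·64 + 2·8 + 3.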

Ideally, friend codes would also be self-checking. A friend code should have a check digit (for example, as described in Chen, Y., Niemenmaa, M., and A. Vinck, "A general check digit system based on finite groups", Designs, Codes and Cryptography Vol. 80, pp. 149–163, DOI 10.1007/s10623-015-0072-8, 2015). This way, the error types "almost certainly a human entry error" and "friend code genuinely does not exist" become distinct error codes that can have distinct messages in the user interface. It also eases the read load on the database, as an inexpensive check can be performed first in the usual case, possibly even on the client side. Additionally, error correction codes may be worth exploring.
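To illustrate how a check digit separates the two error types, here is a sketch using the well-known Luhn algorithm. Note that this is a simpler stand-in and not the finite-group scheme from the paper cited above; the `lookup` function and its return values are hypothetical.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for index, char in enumerate(reversed(payload)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def lookup(code, database):
    """Classify an entered code before (and instead of) always hitting storage."""
    digits = code.replace("-", "")
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return "malformed"
    if luhn_check_digit(digits[:-1]) != digits[-1]:
        return "typo"      # almost certainly a human entry error
    if digits not in database:
        return "unknown"   # well-formed, but no such friend code exists
    return "found"
```

Only codes that pass the check ever reach the database, and the cheap part of `lookup` could run on the client as well.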

Summary

The issue Discord presents with usernames can be solved with friend codes as well. The patents in this space have prior art dating back to at least the era of ICQ if not the introduction of the telephone, but may nonetheless be a hindrance.

A friend code system must find a balance between the allowed set of characters, memorability and causing no offense.