Colorizing Strings

Metadata:
Published: 2023-11-12
Last modified: 2023-11-12 (typo fixes)


Context

In my previous post Friend Codes in the Context of Social Networks: A Concrete Scheme, I proposed a scheme for friend codes as primary user identifiers on social networks. Part thereof was a suggestion of coloring the friend codes as a way of obvious typo identification. This post explores, after having implemented such a thing, various alternatives and collects ideas for useful implementation. This post will assume at points that you are familiar with the concept of the modulus operator and with common techniques of modular reduction.

Hash function

The obvious idea to colorize strings is to use a hash function to turn it into a uniformly distributed number. The property of uniform distribution is desirable here, such that every color has roughly same chance of coming up. What hash function to use for strings depends on your threat model.

For example, my proposed scheme for friend codes involved friend codes chosen at random. No attacker could possibly interfere with the string generation, so I had no reason to account for adversarial hash collisions and thus I thought I also no reason to expect adversarial color collisions. This hinges on the idea that no two visually similar strings will generate visually similar colors. Ex post, I now realize that I should probably have verified this property. Because of my lax threat model, I chose a trivial polynomial hash of the form: i = 0 n 200105 x i ( mod 2 35 31 )

The multiplier and modulus were chosen in accordance with L’Ecuyer’s tables, which address essentially the same issue. The modulus was chosen such that the largest multiplication result for the hash is still representable in a JavaScript number.

In hindsight, this was a suboptimal choice. The Bundesamt für Sicherheit in der Informationstechnik (BSI) recommends in BSI TR-03111 to generate a number at least 64 bits greater than the number of bits required to reduce modulo bias to a level it is no longer exploitable. Given the threat model, this is unlikely to become a problem, but it is somewhat of a slight.

In a more adversarial context (e.g. colorizing usernames chosen by people directly), colors also aid in disambiguating identity. It is therefore desirable that the hash values for two people with similar names are distinct. There are hash functions specifically designed for avoiding hash collisions, namely SipHash, but this depends on having a secret key that no attacker can learn. In my toy implementation on my static website, this would obviously have been impractical because I could not have any server state. However, in an actual scheme, this would probably be an ideal choice.

Revisiting Pokémon Showdown!

Now that I have a number by means of a hash function, how does one turn it into color? I essentially reimplemented the Pokémon Showdown! code I’d linked in the previous post, which I will summarize here.

  1. Obtain three hash values. Pokémon Showdown! hashes the string with MD5 and then takes 16 bits for each of the three hash values.
  2. Reduce the first hash modulo 360 to get a hue in the HSL (hue, saturation, lightness) color model.
  3. Reduce the second hash modulo 50 and add 40 to it to get the saturation in the HSL color model.
  4. Reduce the third hash modulo 20 and add 30 to it to get the lightness in the HSL color model.
  5. Based on luminance (computed in a way I could not find documented anywhere), darken or brighten by as much as appropriately 25 %; the greatest darkening is 26.67 % for the brightest yellow at HSL (60, 89, 49); the greatest brightening is 25 % for the darkest blue at HSL (240, 40, 30).
  6. Apply extra brightening to any hues within 15 steps of hues 180 and 240, by dividing the distance from these colors by three and adding that to the lightness.

The modulus operations have clear modulo bias with reducing three 16-bit integers down to a 9-bit, 6-bit and 5-bit integers, though it probably does not matter in practice; it is merely inelegant. More importantly, it should be observed how much effort is spent on adjusting perceived lightness in two steps. This issue stems from the fact that the HSL color model has large variances in perceived lightness, even at constant values for saturation and lightness.

Alternative color models have existed for years, particularly uniform color spaces, which try to address this exact issue. In the meantime since the Pokémon Showdown! code was originally written, in particular the Oklab color space has emerged to address this issue, while being reasonably simple to implement. Not all values of Oklab can be mapped back to standard RGB (sRGB) colors due to limitations of RGB. In recent years, all major browsers except the defunct Internet Explorers have implemented it.

If I were to greenfield an implementation today, I would probably just use OkLCh and keep the lightness as well as chroma values constant or with some very small amount of variation. That way, the colors will be easily largely uniformly bright with no need for guesswork how to fix the HSL space. Manual computation to convert back to sRGB for browsers that don’t support it is also possible.

Guessing color names

The last feature I proposed in my previous post was providing a readable color name as a trivially visible indicator whether the friend code is actually the correct one. I ended up implementing a hilarious method of Euclidean distance of sRGB values between ones that I pre-generated effectively at random and that seemed to me like yeah, that’s pretty red, good enough to represent the color red. Interestingly, doing this in CIELAB (which is another color space that was meant to be perceptually uniform, but ended up having shortcomings in its uniformity) or Oklab color spaces ended up with worse detection rates.

Incidentally, over the course of implementing this in CIELAB, I found that the formulæ on Wikipedia for converting from CIEXYZ (which is yet another color space) are slightly misleading. The page gives specific values for Xn, Yn and Zn. However, they need to be either divided by 100 or require CIEXYZ coordinates on a 0–100 scale instead of a 0–1 scale. Figuring this out ended up costing me a significant amount of time. Yet I ended up not fixing the Wikipedia page.

However, as several people after the publication of my last post told me, in the context of friend codes generated by a trusted server, it would have been equally easy to just use a fixed list of e. g. 16 colors and pick between them. This is obviously the more reasonable approach, but I just wanted to have an excuse to reimplement the Pokémon Showdown! random name generation code. In the course of which, I was forced to learn way more about color spaces than I ever wanted to. I hope that I have to learn no more about them.

Conclusion

Hash functions are a fuck. Colors are a fuck.