Meditations on identity

Who are you?

Of all the things one deals with in technology, I find identity to be particularly difficult for many people. Given this, I thought I would write down some reflections on identity in general and some historical and future developments in the space.

Technically speaking, an identity is little more than some group of claims we make about some entity, also called a principal. Put differently, an identity is not the principal itself, but rather the things one can claim about the principal. These claims allow us to distinguish one principal from another. A user ID or driver's license number may count as a claim, along with other attributes such as name, date of birth, or height. Identity in a computing context extends to running programs and machines as well; in these cases, attributes like hostname or process ID might be part of what identifies a principal.

As an example, consider a room full of people where everyone inside is named Alice. "This person's first name is Alice" is a perfectly valid claim, but does not do much to distinguish one Alice from another. To address this, we might add other claims to the mix. We might have an Alice with brown hair, or an Alice who wears glasses, or an Alice who was born on a particular day. We might even go so far as to give every Alice in the room a badge with a number and use that to refer to them. Regardless of the claims we choose, the goal remains the same: to uniquely identify an individual.
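To make the room of Alices concrete, here is a minimal Python sketch that treats each identity as a set of claims and checks whether a chosen subset of claims tells every principal apart. The attribute names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical principals: each identity is just a bundle of claims.
room = [
    {"first_name": "Alice", "hair": "brown", "glasses": False, "badge": 1},
    {"first_name": "Alice", "hair": "brown", "glasses": True, "badge": 2},
    {"first_name": "Alice", "hair": "black", "glasses": True, "badge": 3},
]

def distinguishes(principals, claim_names):
    """Return True if the chosen claims pick out every principal uniquely."""
    seen = Counter(tuple(p[name] for name in claim_names) for p in principals)
    return all(count == 1 for count in seen.values())

name_only = distinguishes(room, ["first_name"])               # every Alice looks the same
hair_and_glasses = distinguishes(room, ["hair", "glasses"])   # enough in this room
badge_only = distinguishes(room, ["badge"])                   # an issued identifier
```

The badge number works for the same reason a user ID does: it is a claim issued specifically so that no two principals share it.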

The primary purpose of an identity is to facilitate communication by enabling a participant in a system to trust, or know with some confidence, who the other participants in the system are. Communication, after all, can only work if one knows who is communicating with them. For this to work, an identity belonging to one participant must be something that another participant can verify to ensure it in fact belongs to whoever possesses it. Examples of this in regular life are easy to come by. Credit card processors generally require a CVV along with a card number when making a payment; office buildings have badge cards for each employee; and, in the US at least, a driver's license is a reliable photo ID for almost any occasion. (It is worth asking why driving is bound up with evidence of one's existence at all, but this is not the post for that.) Each of these follows the same basic flow where a person wishes to interact with another party on their own behalf to achieve some end, the party requests some evidence that the person is who they claim to be, and the person provides that evidence. This process, also known as authentication, has its counterparts in computing systems. Password logins and biometrics are not very different in principle from a driver's license.

Authentication schemes vary in the guarantees they offer, both when working with humans and computers. A photo ID may provide evidence of name and residence, backed by a state-issued ID number and photograph, but does not provide evidence of affiliation with some employer. A password may provide evidence that some party knows a secret corresponding to a username, but it makes no claims about whether the holder of the username and password is actually the legitimate principal to which the username refers.
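The limits of a password are easy to see in code. The sketch below is an assumption-laden illustration rather than a recommended implementation: the function names, the iteration count, and the in-memory record store are all hypothetical. It stores a salted PBKDF2 digest and checks a submitted password against it.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative value, not a tuned recommendation

def enroll(password):
    """Return (salt, digest) to store in place of the password itself."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify(password, salt, digest):
    """Check only that the caller knows the secret behind the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

# Hypothetical credential store keyed by username.
records = {"alice": enroll("correct horse battery staple")}

legit = verify("correct horse battery staple", *records["alice"])  # the real Alice
thief = verify("correct horse battery staple", *records["alice"])  # someone who stole it
wrong = verify("a bad guess", *records["alice"])
```

The first two calls are indistinguishable to the verifier, which is the point: the check establishes knowledge of a secret and nothing about who holds it.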

Because of the contingent nature of trust, the participants in a communication system must agree on what kinds of guarantees are sufficient for their purposes, and must also communicate whatever is necessary to establish trust. Trust in communication thus forms a system within a larger system that we can treat more or less independently from whatever parties communicate once trust is established. As I wrote previously, situations where many parties must agree on how to communicate often result in consolidation; it is easier to agree on a small set of practices than to endlessly renegotiate new ones.

On the Web, to use a proper W3C term, this has historically meant settling on usernames and passwords for authentication of users, with occasional extra factors (i.e., evidence) such as one-time codes and biometrics added as needed. As the number of Web services that require authentication grows - apps, websites, and so on - this forces users to track identities and credentials for each service they use. It's a bit like adding keys to a keychain (itself a popular metaphor in identity management systems).

If there is some law of consolidation in communication, it applies here as well. Users frequently reuse a small number of passwords across the Web to minimize complexity in authentication. Web service providers by convention use a broadly consistent set of factors when authenticating users. The most interesting means of managing identity to me, however, is delegating the whole process to another party entirely.

Storing credentials that bind a user to an identity is generally risky for Web service providers. It makes them a valuable target for attackers, and users' frequent reuse of passwords on the Web means a credential stolen from one service is likely usable in other services. This is where we enter the world of the identity provider (IdP), a third party that acts as an intermediary between users and Web services for distributing evidence of one's identity.

Identity providers are nothing new in networked systems. As early as 1984, the International Telecommunication Union (ITU) began work on what would become the X.500 series of specifications for management of directory entries, analogous to a phone book for computer networks. The year 2005 brought SAML 2.0, the now-prevalent version of the Security Assertion Markup Language (first standardized in 2002), as a more Web-centric standard for an IdP to provide evidence of a user's identity to a Web service provider. OpenID Connect, standardized in 2014, is a comparatively new framework for distributing identity across the Web.

The general flow of authentication when using an intermediary identity provider works roughly like so. A service provider, wishing to authenticate a user, directs that user to an identity provider for some evidence of the user's identity. The identity provider authenticates the user through its own means - an existing session, username and password, and so on - and sends the user back to the service provider either with evidence of their identity directly or with enough information for the service provider to retrieve it. From here, the user and service provider communicate directly without the identity provider getting involved.
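The flow above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not SAML or OpenID Connect: real deployments typically use signed XML assertions or JWTs, usually with asymmetric keys, whereas this sketch uses an HMAC over a key shared by the two parties. Every name here (the functions, the key, the user table, the `photos.example` audience) is hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SHARED_KEY = b"demo-key-known-to-idp-and-service"  # hypothetical shared secret
IDP_USERS = {"alice": "hunter2"}  # toy table; a real IdP would never store plaintext

def idp_issue_assertion(username, password, audience):
    """IdP side: authenticate by its own means, then sign a short-lived assertion."""
    if IDP_USERS.get(username) != password:
        return None
    claims = {"sub": username, "aud": audience, "exp": int(time.time()) + 300}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature

def sp_accept_assertion(token, audience):
    """Service provider side: trust the claims only if the signature checks out."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["aud"] != audience or claims["exp"] < time.time():
        return None
    return claims["sub"]

# The user is sent to the IdP, logs in there, and returns carrying the assertion.
token = idp_issue_assertion("alice", "hunter2", "photos.example")
user = sp_accept_assertion(token, "photos.example")

# Flipping one character of the signature makes the service reject the assertion.
last = token[-1]
tampered = token[:-1] + ("0" if last != "0" else "1")
rejected = sp_accept_assertion(tampered, "photos.example")
```

Note that the service provider never sees the user's password. It sees only what the identity provider is willing to vouch for.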

It is important to note that a user who accesses a Web service using an external identity provider effectively cedes the power to prove their own legitimacy to that identity provider. This is true regardless of whether one is required to use such a provider for professional purposes or chooses to do so for convenience. In the former case, such a situation is generally expected, at least in places like the US. In the latter case, however, this means a user is in less control of their own ability to access the Web than they might think. A large platform like Google or Apple can disable or delete a user's identity for any reason they choose. Giving them the authority to do so is a choice, and one that should not be taken lightly.

Is this a good arrangement for users? I'm not sure there's one clear answer here. It is probably better to say that user identities living under the control of a few large parties is simply a likely outcome given the nature of communication in general. Memorizing passwords or maintaining a password manager is hard, but clicking a button is easy, especially if the button behaves consistently across the Web. It is even easier if that button is somehow tied into things one uses every day; it is likely not a coincidence that Apple and Google provide both identity providers and smartphone operating systems.

More recent developments may change how identity on the Web looks. Passkeys, for example, offer a standard for authenticating users based on public-key cryptography and external authenticators, including physical devices. Such a framework allows a user to hold physical proof of who they are, or to delegate the management of that proof (distinct from one's identity) to some credential manager. The rollout of passkeys shows some promise in managing the complexity of authentication without forcing users to be subordinate to some external identity provider.
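The core idea behind passkeys is a challenge and response over a key pair: the service stores only a public key, and the authenticator proves possession of the private key by signing a fresh random challenge. The sketch below illustrates that shape using only Python's standard library. Since the standard library has no ECDSA or RSA signing, it swaps in a Lamport one-time signature, which is not what WebAuthn uses and which is safe for only a single signature per key. All names are hypothetical.

```python
import hashlib
import secrets

def keygen():
    """Authenticator side: 256 pairs of secrets; the public key is their hashes."""
    private = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    public = [[hashlib.sha256(s).digest() for s in pair] for pair in private]
    return private, public

def message_bits(message):
    """The 256 bits of SHA-256(message), one list entry per bit."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]

def sign(private, message):
    """Reveal one secret per bit of the message hash (usable once per key)."""
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(public, message, signature):
    """Service side: needs only the public key registered earlier."""
    bits = message_bits(message)
    return len(signature) == 256 and all(
        hashlib.sha256(sig).digest() == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, bits))
    )

# Registration: the private key stays on the authenticator.
private_key, registered_public_key = keygen()

# Login: the service issues a fresh challenge and checks the signed response.
challenge = secrets.token_bytes(32)
response = sign(private_key, challenge)
accepted = verify(registered_public_key, challenge, response)

# A captured response is useless against a different challenge.
replayed = verify(registered_public_key, secrets.token_bytes(32), response)
```

Unlike a password, nothing the service stores here is enough to log in as the user, which is why a breach of the service's database is far less damaging.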

Still, the same rules of consolidation apply. The WebAuthn standard is largely authored by Google, which owns a major identity provider and distributes a popular operating system; Microsoft, which does the same; and Yubico, which manufactures hardware security tokens. The net effect of passkeys will likely be to tie most users more closely to devices either operated or manufactured by large providers such as these.

All of this development continues against the background of actual states growing increasingly interested in the ability of these quasi-state entities to determine who is a legitimate actor on the Web. Recent legislation in the UK and US - ostensibly aimed at social media platforms - requires Web service providers to verify the ages of their users with a variety of methods, including state-issued identity documents. Laws like the UK Online Safety Act 2023 or Mississippi HB 1126 are in effect today and represent a rapid closing of the gap between one's digital and legal identities.

It is possible we are headed towards a future where large providers control the lives of people on the Web by tying their identities to devices those providers manage and render those same identities subordinate to the states in which their users reside. It is also possible that such an outcome does not come to pass for a number of reasons. Much of the legislation mentioned above has faced successful legal challenges, for one. Passkey standards are also not intrinsically bound to any one provider; anyone can manage their own keys. Finally, there is simply the changing reality of the Web itself. The large and growing share of Web traffic coming from automated agents (in the general sense, not the AI sense) may make state-backed or hardware-backed authentication infeasible to implement. In the meantime, a YubiKey 5C is about $55, which is a worthwhile price for me to have some sense of control over who I am online.