Which data type would be best suited to a field that lists customer email addresses?

I've always used varchar(320), but really it should probably be varchar(319). Here's why. The standard dictates the following limitations:

  • 64 characters for the "local part" (username).
  • 1 character for the @ symbol.
  • 254 characters for the domain name (I always believed 255, but I was wrong according to this errata - it's actually 256 minus the potential < surrounding angle brackets >).
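
As a rough illustration of how those three limits add up to 319, here is a minimal Python sketch. The constant and function names are hypothetical, and it checks lengths only, not e-mail syntax.

```python
# Limits as listed above; these names are made up for this sketch.
MAX_LOCAL = 64    # "local part" (username)
MAX_DOMAIN = 254  # domain name, per the errata mentioned above
MAX_TOTAL = MAX_LOCAL + 1 + MAX_DOMAIN  # 64 + 1 (the @ symbol) + 254 = 319


def fits_length_limits(address: str) -> bool:
    """Return True if both parts of the address respect the length limits.

    Lengths only: this does not validate e-mail syntax in any way.
    """
    # Split on the last @, since a quoted local part may itself contain @.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return len(local) <= MAX_LOCAL and len(domain) <= MAX_DOMAIN
```

An address that passes this check can never exceed `MAX_TOTAL` characters, which is where the varchar(319) figure comes from.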

Now, some folks will say you need to support more than that. Some folks will also say that you need to support Unicode for domain names (meaning you have to switch to nvarchar). While the standard may change in the meantime (it's been a while since I've had skin in the game), I am quite confident that at this time most servers in the world will not accept Unicode e-mail addresses, and I am sure many servers will have issues creating and/or accepting addresses with > 319 characters (and possibly > 254/255/256).

That said, you can prepare for the worst now, if you like (and if you are using Data Compression in SQL Server 2008 R2 or better, you will benefit from Unicode compression, meaning you only pay the 2 byte penalty for characters that actually need it). This way you can make your column as wide as you want, and you can let people stuff any too-long junk in there that they want - they won't receive an e-mail if they give you junk just like they won't receive an e-mail if the insert fails. The problem is if you let invalid junk in, you have to deal with it. And no matter what size you make it - if someone will try to stuff 400 characters into a 319-character column, someone will try to stuff 1025 characters into a 1024-character column. There is no reason any sensible person should have an e-mail address > 319 characters unless they are using it to explicitly test system boundaries.

But I think we need to stop asking for opinions on this - and stop looking at other implementations for guidance (it just so happens in this case that the ones you referenced did not bother to do their own homework and just picked numbers out of their, well, you know). You have direct access to the standard - make sure you consult the most current version, support that as a minimum, and stay on top of the standard so you can adapt to changes in specs.


EDIT thanks to @ypercube for the ping in chat.

As an aside, perhaps you don't want to dump the whole address into a single column in the first place. Normalization might suggest that you don't want to store @hotmail.com 15 million times when a much skinnier FK int would work just fine and not have the additional overhead of variable length columns. You could also normalize the username, since two addresses at different domains can share a common username - the owners don't know each other, but your database doesn't care about that.
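
A minimal sketch of the domain half of that idea, using Python's built-in sqlite3 module purely so it runs anywhere. The table names, column names, helper function, and sample addresses are all made up for illustration; a SQL Server version would use an int key and varchar columns with the limits discussed above. Normalizing the username would follow the same pattern and is not shown.

```python
import sqlite3

# In-memory SQLite database, used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EmailDomains (
        DomainID   INTEGER PRIMARY KEY,
        DomainName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE EmailAddresses (
        LocalPart TEXT NOT NULL,
        DomainID  INTEGER NOT NULL REFERENCES EmailDomains (DomainID),
        PRIMARY KEY (LocalPart, DomainID)
    );
""")


def add_address(address: str) -> None:
    """Store an address as (local part, domain key); hypothetical helper."""
    local, _, domain = address.rpartition("@")
    # Each distinct domain is stored once; later rows reuse its key.
    conn.execute(
        "INSERT OR IGNORE INTO EmailDomains (DomainName) VALUES (?)", (domain,)
    )
    (domain_id,) = conn.execute(
        "SELECT DomainID FROM EmailDomains WHERE DomainName = ?", (domain,)
    ).fetchone()
    conn.execute(
        "INSERT INTO EmailAddresses (LocalPart, DomainID) VALUES (?, ?)",
        (local, domain_id),
    )


for addr in ("alice@example.com", "bob@example.com", "alice@example.org"):
    add_address(addr)

domain_count = conn.execute("SELECT COUNT(*) FROM EmailDomains").fetchone()[0]
address_count = conn.execute("SELECT COUNT(*) FROM EmailAddresses").fetchone()[0]
```

Three addresses end up as three narrow rows pointing at two domain rows, so a domain shared by millions of customers is stored only once.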

I talked about some of this here:

  • Storing E-mail addresses more efficiently in SQL Server
  • Storing E-mail addresses more efficiently in SQL Server - Part 2

This introduces challenges, however, to the 254-character limit above, since there doesn't seem to be consensus about what happens when a valid 255-character domain is combined with a valid 1-character localpart. This should be accepted by most servers around the world but seems to violate this 254-character limit. So do you create a Domains table that has an artificially lower restriction on length for e-mail addresses, when the domain could be re-used as a valid 255-character URL?


EDIT There was a comment:

wanted to add this for postgres, dont use varchar(n) wiki.postgresql.org/wiki/…

perhaps you can use a check constraint on the email column instead

While I agree there are use cases for "unlimited" string columns, this isn't one of them. When you know the data domain from well-established standards, you should use them. The link talks about how if you choose poorly this could lead to errors for end users. So what? There is no reason to let people insert values outside the domain (e.g. an e-mail address that is 600 million characters long) just so they don't get an error for doing so. In fact I would argue that e-mail is precisely the kind of counter-example the link talks about.

Defining the column properly, in SQL Server at least, means you won't suffer from the documented performance penalties of max types or from memory wasted on varchar/nvarchar declarations that are too wide. While employing a check constraint to limit the length makes it easier to adjust the max length in either direction later, this doesn't seem to have any other benefit over proper column definition (the user gets an error either way).
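
To illustrate the "error either way" point, here is a small sketch of the check-constraint variant, again using Python's sqlite3 module as a stand-in. SQLite does not enforce the n in varchar(n), so only the constraint approach can be shown this way; the table, column, and helper names are hypothetical, and 319 is the figure from earlier in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores the length in varchar(n), so the limit is expressed
# here as a CHECK constraint instead of a column width.
conn.execute("""
    CREATE TABLE Customers (
        Email TEXT NOT NULL CHECK (length(Email) <= 319)
    )
""")


def try_insert(email: str) -> bool:
    """Return True if the row was accepted, False if the constraint fired."""
    try:
        conn.execute("INSERT INTO Customers (Email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


ok_319 = try_insert("a" * 64 + "@" + "d" * 254)  # exactly 319 characters
ok_320 = try_insert("a" * 65 + "@" + "d" * 254)  # 320 characters
```

The 320-character value is refused with an error, which is the same outcome a too-narrow column gives in engines that enforce declared lengths.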

What data type would be best suited to a field that lists customer email addresses?

Names and email addresses are always stored as a string type, while numbers can be stored either as a numeric type or as a string, since a string is a sequence of characters that can include digits.

Do some nonrelational databases support SQL based languages?

The term NoSQL refers to data stores that do not use SQL for queries. Instead, the data stores use other programming languages and constructs to query the data. In practice, "NoSQL" means "non-relational database," even though many of these databases do support SQL-compatible queries.

Which programming language supports relational databases?

SQL, or Structured Query Language, is the primary interface used to communicate with relational databases.

Which are the key aspects of database security according to the CIA triad security model?

Confidentiality, integrity and availability together are considered the three most important concepts within information security. Considering these three principles together within the framework of the "triad" can help guide the development of security policies for organizations.