From Issue #158,
June 2007
The Internet Engineering Task Force (IETF) document, RFC 3696, “Application
Techniques for Checking and Transformation of Names” by John
Klensin,
gives several valid e-mail addresses that are rejected by many PHP
validation routines. The addresses:
Abc\@def@example.com,
customer/department=shipping@example.com and
!def!xyz%abc@example.com
are all valid. One of the more popular regular expressions found in the
literature rejects all of them:
"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
This regular expression allows only the underscore (_) and hyphen
(-) characters, numbers and lowercase alphabetic characters. Even
assuming a preprocessing step that converts uppercase alphabetic
characters to lowercase, the expression rejects addresses with
legitimate characters, such as the slash (/), equal sign (=), exclamation
point (!) and percent (%). The expression also requires that the
top-level domain component has only two or three characters, thus
rejecting valid domains, such as .museum.
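The rejections described above are easy to reproduce. The following is a minimal sketch, not code from any listing: it assumes the pattern is run through preg_match() with PCRE delimiters added (an assumption, since the POSIX ereg functions were removed in PHP 7), and the .museum address is a hypothetical example.

```php
<?php
// The pattern quoted above, wrapped in "/" delimiters for preg_match().
$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/';

$addresses = [
    'Abc\@def@example.com',                     // RFC 3696 example
    'customer/department=shipping@example.com', // RFC 3696 example
    '!def!xyz%abc@example.com',                 // RFC 3696 example
    'curator@example.museum',                   // hypothetical address with a long TLD
    'user@example.com',                         // plain address, for contrast
];

$results = [];
foreach ($addresses as $address) {
    // Lowercase first, mirroring the preprocessing step assumed in the text.
    $results[$address] = preg_match($pattern, strtolower($address)) === 1;
}

foreach ($results as $address => $accepted) {
    printf("%-45s %s\n", $address, $accepted ? 'accepted' : 'rejected');
}
```

Only the last address should be reported as accepted; the three RFC 3696 examples and the .museum address are all rejected.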
Another favorite regular expression solution is the following:
"^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+$"
This regular expression rejects all the valid examples in the preceding paragraph.
It does have the grace to allow uppercase alphabetic characters, and
it doesn't make the error of assuming a top-level domain name has only
two or three characters. However, it allows invalid domain names, such as
example..com.
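A quick check of the second expression shows both behaviors. This is a sketch under two assumptions: the pattern is run through preg_match() with PCRE delimiters added, and the hyphen in the final character class has been moved to the end of the class, which matches the same characters but avoids any ambiguity about ranges. The .museum address is hypothetical.

```php
<?php
// Second pattern from the text; only the hyphen position in the last class differs.
$pattern = '/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9.-]+$/';

$addresses = [
    'Abc\@def@example.com',                     // valid per RFC 3696
    'customer/department=shipping@example.com', // valid per RFC 3696
    '!def!xyz%abc@example.com',                 // valid per RFC 3696
    'User@Example.museum',                      // hypothetical: uppercase and long TLD
    'user@example..com',                        // invalid: empty domain label
];

$results = [];
foreach ($addresses as $address) {
    $results[$address] = preg_match($pattern, $address) === 1;
}

foreach ($results as $address => $accepted) {
    printf("%-45s %s\n", $address, $accepted ? 'accepted' : 'rejected');
}
```

The three RFC 3696 addresses are rejected, while both User@Example.museum and the invalid user@example..com are accepted.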
Listing 1 shows an example from PHP Dev Shed (www.devshed.com/c/a/PHP/Email-Address-Verification-with-PHP/2).
The code contains (at least) three errors. First, it fails to recognize
several legitimate e-mail address characters, such as percent (%). Second, it
splits the e-mail address into user name and domain parts at the at sign
(@). E-mail addresses that contain a quoted at sign, such as
Abc\@def@example.com, will break this code. Third, it fails to check
for host address DNS records. Hosts with a type A DNS entry will accept
e-mail and may not necessarily publish a type MX entry. I'm not
picking on the author at PHP Dev Shed. More than 100 reviewers gave
this a four-out-of-five-star rating.
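The second and third errors suggest two simple remedies, sketched below under stated assumptions. The function names splitAddress() and domainAcceptsMail() are hypothetical and are not taken from the Dev Shed listing. The sketch splits at the last at sign, so an at sign inside the local part stays there, and it treats a domain as able to receive mail if it publishes either an MX or an A record. checkdnsrr() is a standard PHP function, but it needs a working DNS resolver, so it is defined here without being called.

```php
<?php
// Split an address at the LAST "@" so that a quoted or escaped "@"
// remains part of the local part. Returns null if there is no "@".
function splitAddress(string $email): ?array
{
    $atIndex = strrpos($email, '@');
    if ($atIndex === false) {
        return null;
    }
    return [
        'local'  => substr($email, 0, $atIndex),
        'domain' => substr($email, $atIndex + 1),
    ];
}

// Treat a domain as able to receive mail if it has an MX record or,
// failing that, an A record. Performs live DNS lookups.
function domainAcceptsMail(string $domain): bool
{
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}

$parts = splitAddress('Abc\@def@example.com');
echo $parts['local'], ' | ', $parts['domain'], PHP_EOL;
```

For the RFC 3696 example, the local part comes out as Abc\@def and the domain as example.com. A complete validator would still need to check the characters and lengths of each part.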