I came across a situation where I had to parse a list of e-mail addresses. E-mail clients these days take e-mail addresses in two forms: one showing the name of the individual as well as their e-mail address, and one with only the e-mail address.
When multiple e-mail addresses are listed, they are separated by commas, whether they're of the full form or of the simple form.
When I first had to extract the list of e-mail addresses, I assumed I could simply split them on commas. That works for a list such as the following.
"Joshua Go" <firstname.lastname@example.org>, email@example.com
It would capture two e-mail addresses: "Joshua Go" <firstname.lastname@example.org> and email@example.com.
A problem arose when I came across a form of the full e-mail address that threw off this simple technique: addresses such as "Go, Joshua" <firstname.lastname@example.org>, where the recipient's name itself contains a comma.
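To make the failure concrete, here is a small Java sketch of the naive approach. It assumes the splitting was done with `String.split` on a comma pattern (the exact original code isn't shown), and the class and method names are hypothetical.

```java
import java.util.Arrays;

class NaiveSplit {
    // Hypothetical helper: split on commas, trimming any whitespace around them.
    static String[] naiveSplit(String list) {
        return list.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String simple = "\"Joshua Go\" <firstname.lastname@example.org>, email@example.com";
        String tricky = "\"Go, Joshua\" <firstname.lastname@example.org>, email@example.com";
        // Two pieces, as intended.
        System.out.println(Arrays.toString(naiveSplit(simple)));
        // Three pieces: the quoted name "Go, Joshua" is cut in half.
        System.out.println(Arrays.toString(naiveSplit(tricky)));
    }
}
```

The split has no notion of quoting, so any comma inside a recipient's name is treated as a separator.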
Since I am no master of regular expressions, and working with them in Java has been somewhat painful for me, I decided to revisit EBNF and write a small parser instead.
The following is the EBNF grammar, as best I know it.
EmailAddressList = GeneralEmailAddress [ ',' EmailAddressList ] ;
GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>'
                    | EmailAddressOnly ;
EmailAddressOnly = Username '@' Domain ;
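For illustration, here is a minimal recursive-descent sketch in Java of how this grammar might be turned into a parser; it is not the parser this post refers to, which isn't shown. Because RecipientName, Username, and Domain are left undefined, it assumes a name is either a double-quoted string (which may contain commas) or a run of letters, digits, and spaces, and that Username and Domain may contain '.', '-', and '_' in addition to letters and digits, so that firstname.lastname@example.org parses. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EmailListParser {
    private final String input;
    private int pos = 0;

    private EmailListParser(String input) { this.input = input; }

    // EmailAddressList = GeneralEmailAddress [ ',' EmailAddressList ] ;
    // The right recursion is written as a loop. Returns each address as written.
    static List<String> parse(String input) {
        EmailListParser p = new EmailListParser(input);
        List<String> result = new ArrayList<>();
        do {
            p.skipSpaces();
            int start = p.pos;
            p.generalEmailAddress();
            result.add(input.substring(start, p.pos).trim());
            p.skipSpaces();
        } while (p.accept(','));
        if (p.pos != input.length()) {
            throw new IllegalArgumentException("Unexpected character at " + p.pos);
        }
        return result;
    }

    // GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>' | EmailAddressOnly ;
    private void generalEmailAddress() {
        int save = pos;
        if (peek() != '<' && peek() != '"') {
            // Try the bare form; keep it only if followed by ',' or end of input.
            try {
                emailAddressOnly();
                skipSpaces();
                if (pos == input.length() || peek() == ',') return;
            } catch (IllegalArgumentException e) {
                // Not a bare address; fall back to the name-and-brackets form.
            }
            pos = save;
        }
        if (peek() != '<') recipientName();
        skipSpaces();
        expect('<');
        emailAddressOnly();
        expect('>');
    }

    // Assumed RecipientName: a quoted string (commas allowed) or letters, digits, spaces.
    private void recipientName() {
        if (accept('"')) {
            while (pos < input.length() && input.charAt(pos) != '"') pos++;
            expect('"');
        } else {
            int start = pos;
            while (pos < input.length()
                    && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == ' ')) pos++;
            if (pos == start) throw new IllegalArgumentException("Expected name at " + pos);
        }
    }

    // EmailAddressOnly = Username '@' Domain ;
    private void emailAddressOnly() {
        atom("username");
        expect('@');
        atom("domain");
    }

    // Assumed Username/Domain: letters, digits, '.', '-', '_'.
    private void atom(String what) {
        int start = pos;
        while (pos < input.length() && isAtomChar(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalArgumentException("Expected " + what + " at " + pos);
    }

    private static boolean isAtomChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_';
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private boolean accept(char c) {
        if (peek() == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        String list = "\"Go, Joshua\" <firstname.lastname@example.org>, email@example.com";
        for (String address : parse(list)) System.out.println(address);
    }
}
```

Running main should print the two addresses on separate lines: "Go, Joshua" <firstname.lastname@example.org> and email@example.com. The comma inside the quotes never reaches the list-level loop, because recipientName() consumes the whole quoted string before the parser looks for a separator.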
My co-worker, Wilson, pointed out that I hadn't defined RecipientName, Username, or Domain. For that, I cite the practical demands of industry as my explanation for not adhering to strict academic formality; I also omit them for clarity. Basically, assume that they're alphanumeric (letters and numbers).
Perhaps in a later post I'll put up the source code to the parser. As it stands, it's still a test program that I have yet to integrate with the rest of our product.