Parsing lists of e-mail addresses
I came across a situation where I had to parse a list of e-mail addresses. E-mail clients these days take e-mail addresses in two forms: one showing the name of the individual as well as their e-mail address, and one with only the e-mail address.
When multiple e-mail addresses are listed, they are separated by commas, whether they're of the full form or of the simple form.
When I first had to extract the addresses, I simply assumed I could split the list on commas. That works for a list such as the following:
"Joshua Go" <joshua.go@playpure.com>, joshuago@gmail.com
It would capture two e-mail addresses: "Joshua Go" <joshua.go@playpure.com> and joshuago@gmail.com.
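A minimal Java sketch of that first, naive approach might look like the following. It assumes the recipient list arrives as one string. The class and method names are hypothetical, not taken from the original test program.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveSplit {
    // Splits a recipient list on every comma and trims each piece.
    // This is the naive approach: it assumes a comma can only appear
    // between addresses, never inside one.
    static List<String> splitOnCommas(String list) {
        List<String> result = new ArrayList<>();
        for (String part : list.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String list = "\"Joshua Go\" <joshua.go@playpure.com>, joshuago@gmail.com";
        for (String address : splitOnCommas(list)) {
            System.out.println(address);
        }
        // Prints:
        // "Joshua Go" <joshua.go@playpure.com>
        // joshuago@gmail.com
    }
}
```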
A problem arose when I came across a full-form address that threw off this simple technique: one whose display name itself contains a comma, such as "Go, Joshua" <go.joshua@yahoo.com>. Splitting on commas cuts that single address into two meaningless fragments.
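The failure is easy to reproduce in Java: in the sketch below, a naive split turns a two-address list into three fragments. For comparison, it also shows one common workaround, splitting only on commas that fall outside double quotes. That workaround is not the route taken in this post, and it assumes display names use balanced, unescaped double quotes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Splits on commas that are outside double quotes, so a display name
    // such as "Go, Joshua" stays attached to its address.
    // Assumption: quotes are balanced and never escaped inside a name.
    static List<String> splitOutsideQuotes(String list) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : list.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // entering or leaving a quoted name
            }
            if (c == ',' && !inQuotes) {
                addTrimmed(result, current);   // a real separator: finish this address
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addTrimmed(result, current);           // the last address has no trailing comma
        return result;
    }

    private static void addTrimmed(List<String> out, StringBuilder piece) {
        String trimmed = piece.toString().trim();
        if (!trimmed.isEmpty()) {
            out.add(trimmed);
        }
    }

    public static void main(String[] args) {
        String list = "\"Go, Joshua\" <go.joshua@yahoo.com>, joshuago@gmail.com";
        // Naive split: the comma inside the name produces three fragments.
        System.out.println(list.split(",").length);           // prints 3
        // Quote-aware split: two addresses, as intended.
        for (String address : splitOutsideQuotes(list)) {
            System.out.println(address);
        }
    }
}
```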
Since I am no master of regular expressions, and working with them in Java has been somewhat painful for me, I decided to revisit EBNF and write a small parser instead.
The following is the grammar in EBNF, as far as I understand the notation.
EmailAddressList = GeneralEmailAddress [ ',' EmailAddressList ] ;
GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>'
                    | EmailAddressOnly ;
EmailAddressOnly = Username '@' Domain ;
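One way to turn this grammar into code is a hand-written recursive-descent parser with one method per rule. The sketch below is a hypothetical illustration, not the parser from my test program. Because the grammar leaves RecipientName, Username and Domain undefined, it makes two assumptions: a RecipientName is a double-quoted string (which may contain commas), and Username and Domain consist of letters, digits, '.', '-' and '_' so that joshua.go and playpure.com parse.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailListParser {
    // One parsed address; name is null for the simple (address-only) form.
    static final class Address {
        final String name, email;
        Address(String name, String email) { this.name = name; this.email = email; }
    }

    private final String input;
    private int pos = 0;
    private EmailListParser(String input) { this.input = input; }

    // EmailAddressList = GeneralEmailAddress [ ',' EmailAddressList ] ;
    // The optional recursive tail is written as an equivalent loop.
    static List<Address> parse(String input) {
        EmailListParser p = new EmailListParser(input);
        List<Address> result = new ArrayList<>();
        result.add(p.generalEmailAddress());
        p.skipSpaces();
        while (p.peek() == ',') {
            p.pos++;
            result.add(p.generalEmailAddress());
            p.skipSpaces();
        }
        if (p.pos != input.length()) throw p.error("unexpected character");
        return result;
    }

    // GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>'
    //                     | EmailAddressOnly ;
    private Address generalEmailAddress() {
        skipSpaces();
        String name = null;
        if (peek() == '"') {
            name = quotedName();                           // full form with a name
            skipSpaces();
        } else if (peek() != '<') {
            return new Address(null, emailAddressOnly());  // simple form
        }
        expect('<');
        String email = emailAddressOnly();
        expect('>');
        return new Address(name, email);
    }

    // Assumed RecipientName: a double-quoted string, which may contain commas.
    private String quotedName() {
        expect('"');
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '"') pos++;
        if (pos == input.length()) throw error("unterminated quoted name");
        pos++; // skip the closing quote
        return input.substring(start, pos - 1);
    }

    // EmailAddressOnly = Username '@' Domain ;
    private String emailAddressOnly() {
        skipSpaces();
        String user = word();
        expect('@');
        return user + "@" + word();
    }

    // Assumed Username/Domain characters: letters, digits, '.', '-' and '_'.
    private String word() {
        int start = pos;
        while (pos < input.length() && isWordChar(input.charAt(pos))) pos++;
        if (pos == start) throw error("expected a username or domain");
        return input.substring(start, pos);
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_';
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        // Prints "Go, Joshua -> go.joshua@yahoo.com" then "null -> joshuago@gmail.com".
        for (Address a : parse("\"Go, Joshua\" <go.joshua@yahoo.com>, joshuago@gmail.com")) {
            System.out.println(a.name + " -> " + a.email);
        }
    }
}
```

Because a quoted name is consumed as a single unit, the comma in "Go, Joshua" never reaches the separator check in parse. Under these assumptions, an unquoted display name such as Joshua Go <joshua.go@playpure.com> is rejected with an IllegalArgumentException; supporting it would need a little lookahead (for example, checking whether a '<' appears before the next comma).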
My co-worker, Wilson, pointed out that I hadn't defined RecipientName, Username, or Domain. I cite the practical demands of industry as my excuse for not adhering to strict academic formality, and I also left them out for clarity. Basically, assume that they're just alphanumeric (letters and numbers).
Perhaps in a later post I'll put up the parser's source code. For now, it is still a test program that I haven't yet integrated with the rest of our product.