Thursday, February 22, 2007

Parsing lists of e-mail addresses

I came across a situation where I had to parse a list of e-mail addresses. E-mail clients these days take e-mail addresses in two forms: one showing the name of the individual as well as their e-mail address, and one with only the e-mail address.

When multiple e-mail addresses are listed, they are separated by commas, whether they're of the full form or of the simple form.

When I had to extract the list of e-mail addresses initially, I assumed only that I could separate them using commas. This would capture a list such as the following.

"Joshua Go" <joshua.go@playpure.com>, joshuago@gmail.com

It would capture two e-mail addresses: "Joshua Go" <joshua.go@playpure.com> and joshuago@gmail.com.

A problem arose when I came across one form of the full e-mail address that threw off my simple parsing technique: addresses such as "Go, Joshua" <go.joshua@yahoo.com>, where the recipient's name itself contains a comma.
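To make the failure concrete, here is a minimal Java sketch of the comma-splitting approach. The class and method names (NaiveSplit, split) are made up for illustration and are not taken from the original program.

```java
import java.util.Arrays;
import java.util.List;

public class NaiveSplit {
    // Splits on every comma, which is the simple assumption described above.
    static List<String> split(String list) {
        return Arrays.asList(list.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // Two addresses, correctly split into two pieces.
        System.out.println(split("\"Joshua Go\" <joshua.go@playpure.com>, joshuago@gmail.com").size());
        // One address, wrongly split into two pieces at the comma inside the quoted name.
        System.out.println(split("\"Go, Joshua\" <go.joshua@yahoo.com>").size());
    }
}
```

Both calls report two pieces, even though the second input holds only one address.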

Since I am no master of regular expressions, and working with regular expressions in Java has been somewhat painful for me, I decided to review my EBNF parsing.

The following is the syntax in EBNF, as far as I know it.
EmailAddressList    = GeneralEmailAddress [ ',' EmailAddressList ] ;
GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>'
                    | EmailAddressOnly ;
EmailAddressOnly    = Username '@' Domain ;

My co-worker, Wilson, pointed out that I did not define RecipientName, Username, or Domain. For that, I cite the practical demands of industry as my explanation for not adhering to strict academic formality. I also omit them for clarity. Basically, assume that they'll just be alphanumeric (letters and numbers).
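For illustration only, a recursive-descent parser for this grammar could be sketched in Java as follows. This is not the parser from the product. The class name EmailListParser and its helper methods are hypothetical, and the undefined tokens are filled in by assumption: RecipientName is either a double-quoted string (which may contain commas) or any unquoted text before '<', and Username and Domain are runs of letters, digits, '.', '-', '_' and '+'.

```java
import java.util.ArrayList;
import java.util.List;

public class EmailListParser {
    private final String in;
    private int pos = 0;

    private EmailListParser(String in) { this.in = in; }

    /** Returns the bare addresses (Username@Domain) found in a comma-separated list. */
    public static List<String> parse(String input) {
        EmailListParser p = new EmailListParser(input);
        List<String> out = new ArrayList<String>();
        p.emailAddressList(out);
        p.skipSpaces();
        if (p.pos != input.length()) throw p.error("unexpected trailing text");
        return out;
    }

    // EmailAddressList = GeneralEmailAddress [ ',' EmailAddressList ] ;
    private void emailAddressList(List<String> out) {
        out.add(generalEmailAddress());
        skipSpaces();
        while (peek() == ',') {              // a loop stands in for the right recursion
            pos++;
            out.add(generalEmailAddress());
            skipSpaces();
        }
    }

    // GeneralEmailAddress = [ RecipientName ] '<' EmailAddressOnly '>' | EmailAddressOnly ;
    private String generalEmailAddress() {
        skipSpaces();
        boolean named = false;
        if (peek() == '"') {                 // assumed: quoted RecipientName, commas allowed inside
            pos++;
            while (pos < in.length() && in.charAt(pos) != '"') pos++;
            expect('"');
            named = true;
        } else {                             // assumed: unquoted RecipientName ends at the next '<'
            int angle = in.indexOf('<', pos);
            int comma = in.indexOf(',', pos);
            if (angle > pos && (comma < 0 || angle < comma)) {
                pos = angle;
                named = true;
            }
        }
        skipSpaces();
        if (named || peek() == '<') {
            expect('<');
            String address = emailAddressOnly();
            expect('>');
            return address;
        }
        return emailAddressOnly();
    }

    // EmailAddressOnly = Username '@' Domain ;
    private String emailAddressOnly() {
        String user = word();
        expect('@');
        return user + "@" + word();
    }

    // Assumed token shared by Username and Domain.
    private String word() {
        int start = pos;
        while (pos < in.length()
                && (Character.isLetterOrDigit(in.charAt(pos)) || "._-+".indexOf(in.charAt(pos)) >= 0)) {
            pos++;
        }
        if (pos == start) throw error("expected a name");
        return in.substring(start, pos);
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private void skipSpaces() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("\"Go, Joshua\" <go.joshua@yahoo.com>, joshuago@gmail.com"));
    }
}
```

Each grammar rule maps to one method, so the comma inside a quoted name is consumed by generalEmailAddress() before the list rule ever looks for a separator.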

Perhaps in a later post, I'll put up the source code to the parser. As it stands, I've yet to move it over from being a test program to being integrated with the rest of our product.

Friday, February 9, 2007

Strained relationships and great undertakings

Last night, I picked up a book and read the preface, and the "Special Thanks" section caught my attention. The author thanked his wife and daughter for "putting up" with the authoring process. This isn't the first time I've seen that kind of thing written in a preface.

Does writing a book necessarily have to put a strain on the author's family?

Thursday, February 8, 2007

Arrays in Visual Basic and classic ASP

For the programmer who is used to C-like syntax, working with arrays in Visual Basic or classic ASP can be aggravating.

In this post, I will briefly go over declaring single- and multi-dimensional arrays, then iterating through them — the basic operations that make arrays useful.

One-dimensional arrays


Let's declare an array with six elements.
Dim OneDimArray(5)

Yes, that says "5", but it has six elements. When we're going through the elements of this array, we'll start counting from zero and end at five.

Iterating through one-dimensional arrays


For i = 0 To UBound(OneDimArray)
    Response.Write(i)
Next

The loop visits six elements, with i running from 0 to 5.

General notes about arrays in Visual Basic


So far, we're left with the impression that Visual Basic is a strange language. When we declare arrays in VB, the number in the declaration is the upper bound (the highest valid index), so the real size is that number plus 1.

If you're used to programming in a C-like programming language such as C++ or Java, it's the declared array size, period, although you still start counting at zero. The following would give you a five-element array of integers in C++; the Java equivalent is int[] one_dim_array = new int[5];.
int one_dim_array[5];

You would access the elements of this C++/Java array with one_dim_array[i] where i = 0,1,...,4. Accessing it with index 5 would take you outside the bounds of the array.

In Visual Basic, however, you get a six-element array when you declare the following.
Dim OneDimArray(5)

You can access OneDimArray(i) with i = 0,1,...,5.
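The off-by-one difference can be checked from the Java side with a small sketch. The helper dimLikeVisualBasic is a made-up name that mimics the Visual Basic convention of declaring by upper bound; it is not part of any library.

```java
public class ArraySizes {
    // Mimics Visual Basic's Dim Name(upperBound): the argument is the highest valid index.
    static int[] dimLikeVisualBasic(int upperBound) {
        return new int[upperBound + 1];
    }

    public static void main(String[] args) {
        int[] javaStyle = new int[5];            // valid indices 0..4
        int[] vbStyle = dimLikeVisualBasic(5);   // valid indices 0..5
        System.out.println(javaStyle.length);    // prints 5
        System.out.println(vbStyle.length);      // prints 6
    }
}
```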

Multi-dimensional arrays


The following is a declaration of a two-dimensional array.
Dim TwoDimArray(4,2)

This declaration would give us an array with five rows and three columns.

Here's how a three-dimensional array is declared.
Dim ThreeDimArray(5,6,7)

This gives us a 6 by 7 by 8 array. That's (5+1) by (6+1) by (7+1) because of Visual Basic's array syntax.

Finally, we'll generalize to the case of an n-dimensional array, which has (x_1 + 1) by (x_2 + 1) by ... by (x_n + 1) elements.
Dim EnDimArray(x_1, x_2, ..., x_n)

Iterating through multi-dimensional arrays


Here's how you'd iterate through a two-dimensional array.
For i = 0 To UBound(TwoDimArray)
    For j = 0 To UBound(TwoDimArray, 2)
        Response.Write(i & "," & j)
    Next
Next

The major difference between iterating through this and iterating through a one-dimensional array is that we called UBound() with two arguments instead of one. This is so that UBound() knows which dimension to look up the upper bound for. In this case, our inner loop stops at the upper bound of the second dimension — that's why we specified the "2" there.

For three dimensions, the logic is similar.
For i = 0 To UBound(ThreeDimArray)
    For j = 0 To UBound(ThreeDimArray, 2)
        For k = 0 To UBound(ThreeDimArray, 3)
            Response.Write(i & "," & j & "," & k)
        Next
    Next
Next

In the innermost loop, we specified that we needed to look up the upper bound for the third dimension.

Finally, here is the general form for iterating through all n dimensions of an n-dimensional array. It is a schematic outline, not runnable code.
For i_1 = 0 To UBound(EnDimArray)
    For i_2 = 0 To UBound(EnDimArray, 2)
        ...
            For i_n = 0 To UBound(EnDimArray, n)
                Response.Write(i_1 & "," & ... & "," & i_n)
            Next
        ...
    Next
Next
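Since n nested loops can't be written out when n is only known at run time, one common alternative is to keep the indices in an array and advance them like an odometer. The following Java sketch shows that idea. The class NDimIndices and the method allIndices are hypothetical names, and the upperBounds array plays the role of the values UBound() would return for each dimension.

```java
import java.util.ArrayList;
import java.util.List;

public class NDimIndices {
    /** Lists every index tuple for an array whose dimension d has upper bound upperBounds[d]. */
    static List<String> allIndices(int[] upperBounds) {
        List<String> out = new ArrayList<String>();
        int n = upperBounds.length;
        int[] index = new int[n];            // starts at (0, 0, ..., 0)
        while (true) {
            // Record the current tuple as "i_1,i_2,...,i_n".
            StringBuilder sb = new StringBuilder();
            for (int d = 0; d < n; d++) {
                if (d > 0) sb.append(',');
                sb.append(index[d]);
            }
            out.add(sb.toString());

            // Advance like an odometer: reset dimensions already at their
            // upper bound, moving leftward, then bump the next one.
            int d = n - 1;
            while (d >= 0 && index[d] == upperBounds[d]) {
                index[d] = 0;
                d--;
            }
            if (d < 0) return out;           // every dimension wrapped around: done
            index[d]++;
        }
    }

    public static void main(String[] args) {
        // Same bounds as Dim TwoDimArray(4,2): five rows by three columns.
        System.out.println(allIndices(new int[] {4, 2}).size());   // prints 15
    }
}
```

The last dimension changes fastest, which matches the order in which the innermost loop of the nested version runs.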

Why we need UBound()


The UBound() function can be called with either one argument (just the array name), or with two arguments: the array name, and a number representing which dimension we want the upper bound of. With one argument, it returns the upper bound of the first dimension.

Without UBound(), we would have to hard-code the upper bound of each dimension in our loops and keep those numbers in sync with the declaration.

Wednesday, February 7, 2007

Accented characters with a US keyboard in X11

I've always been too busy to figure out how to map the useless Windows flag keys on my keyboard to do something useful in Linux/X11.

On traditional Unix systems, there's a Compose key. According to Wikipedia, "On some computer systems, a compose key is a key which is designated to signal the software to interpret the next keystrokes as a combination in order to produce a character not found on the keyboard."

To see what the Windows flag and menu keys are mapped to, I ran the following.

xmodmap -pk | grep -E '^[[:space:]]*11[5-7][[:space:]]'


This resulted in the following output:


115 0xff20 (Super_L)
116 0xff20 (Super_R)
117 0xffcc (Menu)


This output told me that the flag keys (left and right) were free to map to Multi_key. I figured I would leave the Menu key alone.

Next, I had to perform the remapping. The mappings go in a file named .Xmodmap in the user's home directory. Sometimes it is already there; for me, it wasn't, so I created it with these contents:

keycode 115 = Multi_key
keycode 116 = Multi_key


I then refreshed my keyboard mapping by running:

xmodmap ~/.Xmodmap


To see if the re-mapping worked, I held the flag key down while pressing the n key, and typed in a tilde (~). The resulting character was the ñ character. Presto!

Friday, February 2, 2007

Goals are good for people who tinker

I like to tinker so that I know how every little bit works. I also like the feeling of knowing I've achieved something. I'm sure it's already apparent to the reader that these tendencies sometimes come into conflict with each other.

My tinkering leads to more extensive knowledge. Undoubtedly, it has helped me many times in the past, and I continue to reap the benefits of my past fiddling.

Still, I hold myself to strict standards of productivity, and I become frustrated when I can't do enough in one day. Setting clear and reasonable goals for myself allows me to satisfy both my hankering to tamper and my drive to have something to show.