Personality plug-in for Eudora

The Personality plug-in for Eudora adds three stereotypical “personalities” to your Message Plug-ins menu in Eudora 3.x and 4.x for Windows. It translates English messages into mock Swedish (like the Swedish Chef on The Muppet Show), mock Jive (a Black dialect of the 1970s), or B1FF-speak (the dialect of a fictitious newcomer to the ’Net who spells poorly and types in all capital letters).

Warning: you may find these personalities offensive. They may use profanity and other offensive slang. Use at your own risk.

To use the Personality plug-in for Eudora, you’ll need:

  • Windows 95, 98, or NT (sorry, no Windows 3.x version is available)
  • Eudora Light 3 or Eudora Pro 3 or 4
  • WinZIP (or some other program that will extract .zip files)

Download persplugin.zip for Windows 95, 98 and NT (25 kbytes). To install:

  • extract person32.dll from persplugin.zip into your \Program Files\Eudora\Plugins folder
  • restart Eudora
  • with a message open, choose Edit – Message Plug-ins – and then Swedish Chef, Jive, or B1FF to add personality.

Here is an example of an original message without any personality:

Dear Kerri,

How are you?  This was an everyday boring email message
until I decided to apply the Personality plug-in for Eudora.
Now my email is cooler and more fun to read than ever.  You
can even convert incoming email as well!  Just imagine how
much fun you'll have when you apply the Jive, the Swedish
Chef, or B1FF personality to those boring memos you get in
your Inbox.

-Michael

… and the same message with the Jive personality:

Dear Kerri,

How you is?  Dis wuz an everyday bo'in' email message until
ah' decided t'apply de Personality plug-in fo' Eudo'a.
Sheeeiit.  Now mah' email be coola' and mo'e fun t'read dan
ever.  Ah be baaad...  You's kin even convert incomin' email
as sheeit.  Right On!  Just imagine how much fun ya''ll
gots' when ya' apply de Jive, de Swedish Chef, o' B1FF
sucka'ality t'dose bo'in' memos ya' git in yo' Inbox.

-Michael

… and the same message with the Swedish Chef personality:

Deer Kerree,

Hoo ere-a yuoo?  Thees ves un iferydey bureeng imeeel
messege-a unteel I deceeded tu epply zee Persuneleety
ploog-in fur Ioodura.  Noo my imeeel is cuuler und mure-a
foon tu reed thun ifer.  Yuoo cun ifee cunfert incumeeng
imeeel es vell!  Joost imegeene-a hoo mooch foon yuoo'll
hefe-a vhee yuoo epply zee Jeefe-a, zee Svedeesh Cheff, oor
B1FF persuneleety tu thuse-a bureeng memus yuoo get in yuoor
Inbux.

-Meecheel

… and the same message with the B1FF personality:

DEER KERR1,

H0W R U?!  THI5 WA5 AN EVERYDAY BOR1NG EMAIL ME55AG3 UNTIL I
DECIDED 2 APPLY THE PERS0NALITY PLUG-1N 4 EUDORA.  N0W MY
EMAIL 1S C00L!!!ER & MERE FUN 2 REED THAN EVER!!  Y0U
CAN EVEN C0NVERT 1NC0M1NG EMA1L A5 WELLL!1!  JUST 1MAG1NE
H0W MUCH FUN U"L HAVE WHEN U APPLY THE J1V3, TH3 SWEDI5H
CHEF, 0R B1FF PERS0NALITY 2 TH05E BER1NG MEM0S U GET 1N
YOU'RE 1NB0X.

-MICHAEL

The Personality plug-in for Eudora borrows heavily from some good old Unix standbys. Credit should be given to:

Jive
jive.l lexer – (unknown origin)
Swedish Chef
chef.x lexer – convert English on stdin to Mock Swedish on stdout. Apr 15, 1992; John Hagerman.
B1FF
b1ff.x lexer – by Matt Welsh (mdw@tc.cornell.edu) +1 607 253 2737 modified and improved by David Whitten. 92/11/03.

Further information about B1FF, courtesy of Matt Welsh:

For the information of some:

B1FF is a fictitious person who is "new" on the 'net.

You can recognize his posts by:

  He always SHOUTS as he types.

  He sometimes spells FONET1KLEE (phonetically); the rest of the time, he
    just misspells words or punctuates them incorrectly.

  He shortens some words (presumably for ease in typing)
   like '4' for 'for' or 'four'
   like '2' for 'two' or 'to' or 'too'
   like '&' for 'and'
   like 'U' for 'you' or 'ewe'
   like 'R' for 'are'
   like 'C' for 'sea' or 'see'

  He can't see the difference between certain letters
     like '1' and 'i'
     or like '0' and 'o'
     or like '5' and 's'
     or like '2' and 'z'

  His speech is peppered with profanity.

  He never uses only one '!' or '?' when '!!!!!!!!' or '????!!!!' will do.

  He never uses the apostrophe (') and always uses the double quote (")
     instead.

  I have some vague memory of B1FF being mentioned in the New Hackers
dictionary, but I can't verify the validity of this.

  I can't even validate that his full name is NELSON 0TB1FF, but it sounds
correct.
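
Purely for illustration of the rules Matt describes (this is not the b1ff.x lexer or the plug-in's actual code, and the function name is made up), a few of the substitutions might be sketched in Python like so:

  import re

  def b1ff(text):
      # Shorten some words.
      text = re.sub(r'\bfor\b', '4', text, flags=re.IGNORECASE)
      text = re.sub(r'\b(?:to|too|two)\b', '2', text, flags=re.IGNORECASE)
      text = re.sub(r'\band\b', '&', text, flags=re.IGNORECASE)
      text = re.sub(r'\byou\b', 'U', text, flags=re.IGNORECASE)
      text = re.sub(r'\bare\b', 'R', text, flags=re.IGNORECASE)
      # SHOUT, then confuse the look-alike characters.
      text = text.upper()
      text = text.translate(str.maketrans({'I': '1', 'O': '0', 'S': '5'}))
      # Apostrophes become double quotes; pile on the punctuation.
      text = text.replace("'", '"')
      return re.sub(r'[.!?]+$', '!!!1!', text)

  print(b1ff("Are you new to the net?"))   # prints: R U NEW 2 THE NET!!!1!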

Developers: download source code perspluginsrc.zip for MS Visual C++ 5.0 (75 kbytes).

Everyone: find other Eudora plug-ins at The FEL-X Eudora PlugIns & Add-Ins Site.

You might also check out John B. Chambers’ Chef/Jive/ValSpeak/Pig-Latin CGI form.

Java Network File System

The Java Network File System was my Honors Thesis project at Brown University and the second-place winner in the 1997 ACM Quest for Java student programming contest.

README

See my contest README for an overview of the project and instructions for running the server and client.

Paper

We introduce and discuss the Java Network File System (JNFS), a network file system for Network Computers (NCs). JNFS runs on all NC-compliant devices, provides authentication and authorization support, works with other file systems such as NFS and NTFS, and offers reasonable performance. 16 pages.

Source code

  • jnfs_src.tar.gz — source code, test suites, sample client applet, etc. (49K)
  • jnfs.jar — signed Java archive of class files.
  • jnfs.x509 — X.509 certificate for the jnfs entity.

API Documentation (generated with javadoc)

Related work

What I learned from this project

The greatest disservice one can do to Java is to perpetuate the belief that it is merely a programming language for the Web. Once we begin to think of Java as a general-purpose programming language, people will use it to build real applications and systems. Only then will we really see that object-orientation, garbage collection, and run-time safety make code easier to develop, debug, maintain, and understand.

— Michael J. Radwin, self-proclaimed Java evangelist


Nearest-Neighbor Machine Learning Bakeoff

Introduction

This document presents the results from my Topics in Machine Learning final project at Brown University, written in December 1996. The project was a “bakeoff” for machine learning algorithms that learned classification problems (categories instead of continuous functions). I discuss my implementation and results for Nearest-Neighbor and k-Nearest-Neighbor, two well-known machine learning algorithms, on three classification datasets. In addition, I introduce and discuss the Agglomerative-Nearest-Neighbor algorithm, a variation of nearest-neighbor that clusters points together.

Datasets

Three classification datasets from the UC Irvine Machine Learning Repository [3] were chosen for testing:

  • Wisconsin Breast Cancer from Dr. William H. Wolberg at the University of Wisconsin Hospitals. Records in the dataset represent the results of breast cytology tests and a diagnosis of benign or malignant. Attributes include clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. All values were integers in the range of 1-10.
  • BUPA Liver Disorders from Richard S. Forsyth at BUPA Medical Research Ltd. Each record in the dataset constitutes the record of a single male individual. The six attributes in each record are results from blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption, as well as the number of drinks per day.
  • Wine Recognition Data from Forina, M. et al at the Institute of Pharmaceutical and Food Analysis and Technologies. Each record represents results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

Nearest-Neighbor

The nearest-neighbor algorithm is extremely simple to implement and lends itself to a wide variety of variations. In brief, the training portion of nearest-neighbor does little more than store the data points presented to it. When asked to make a prediction about an unknown point, the nearest-neighbor classifier finds the training point closest (according to some distance metric) to the unknown point and predicts the category of that training point.

Put more formally, given a collection of points S and a query point q, what is the point x closest to q in S? The nearest-neighbor classifier predicts that the category of q is the same as the category of x.

Although nearest-neighbor can be implemented efficiently with a Voronoi diagram, I took the brute-force approach, which computes the distance from q to every point xi in S.

Distance Metric

I chose to use plain Euclidean distance for the distance metric in all three algorithms. The distance from point x to q (where both x and q are d-dimensional vectors) is:

d(x, q) = sqrt((x - q)^T (x - q))
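
The real implementation is the Common Lisp code in learning.lsp (below); as a rough Python sketch only, with illustrative names, the brute-force classifier using this distance metric looks something like:

  import math

  # Euclidean distance between two d-dimensional points,
  # d(x, q) = sqrt((x - q)^T (x - q)).
  def euclidean(x, q):
      return math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, q)))

  # Brute-force nearest-neighbor: scan every stored (point, category) pair
  # and predict the category of the closest one.
  def nn_predict(training, q):
      point, category = min(training, key=lambda pc: euclidean(pc[0], q))
      return category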

K-Nearest-Neighbor

The k-nearest-neighbor algorithm is a variation of nearest-neighbor. Instead of looking only at the point x that is closest to q, we look at the k points {x1, x2, … , xk} in S that are closest to q. Since we know the categories of each of the k nearest points, we find the majority category in this subset. If one exists, the query point q is predicted to be in that category. If no majority exists, the result is considered inconclusive.

K-nearest-neighbor is more general than nearest-neighbor. Put another way, nearest-neighbor is a special case of k-nearest-neighbor, where k = 1. For the tests in this project, I chose k = 3, k = 5, and k = 7.
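
A corresponding Python sketch of the k-nearest-neighbor rule, again for illustration only (here “majority” is taken to mean a strict majority of the k votes):

  import math
  from collections import Counter

  # k-nearest-neighbor with a strict majority vote over the k closest
  # training points; returns None when the vote is inconclusive.
  def knn_predict(training, q, k=3):
      nearest = sorted(training, key=lambda pc: math.dist(pc[0], q))[:k]
      votes = Counter(category for _, category in nearest)
      category, count = votes.most_common(1)[0]
      return category if count > k / 2 else None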

Agglomerative-Nearest-Neighbor

The agglomerative-nearest-neighbor algorithm is a local learning algorithm for classification. The idea is to capture the essence of nearest-neighbor without having to store all of the points. As the classifier sees new training points, it clusters together points that predict the same category. The approach is named agglomerative-nearest-neighbor because points are put together in clusters as the learning process goes on. Once the clusters are formed, the prediction for unknown points (discussed below) is very similar to nearest-neighbor.

Training

A cluster xi = (ci, ri, gi), where ci is the center of a hypersphere with radius ri, predicting category gi. During the training stage, a new cluster (p, rnew, g) is formed for each training example (p, g), where rnew is a default radius for new clusters. For my tests, I arbitrarily chose rnew = 1.0.

Once initialized, the new cluster is then compared to every cluster in the system. Two clusters that are close together (within some epsilon of each other according to the distance metric d) and classify the same category are combined to build a new aggregate cluster (a hypersphere of a larger radius). If the new cluster is not close to any other clusters, it remains by itself. For my tests, I chose epsilon = 2.15.

When two clusters are merged together, the center and the radius are adjusted to be the average of the two clusters. If the center of cluster xa is inside another cluster xb, the inner cluster xa is removed from the system and the outer cluster xb remains unchanged.
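
A rough Python sketch of a single training step, for illustration only (it shows just the basic create-or-merge behavior; repeated merging and removal of enclosed clusters are omitted):

  import math

  R_NEW = 1.0      # default radius rnew for brand-new clusters
  EPSILON = 2.15   # merge threshold

  # One training step: turn the example (p, g) into a cluster of radius R_NEW,
  # merge it (averaging centers and radii) with the first existing cluster of
  # the same category within EPSILON, or keep it as a cluster of its own.
  def train_one(clusters, p, g):
      center, radius = tuple(p), R_NEW
      for i, (c, r, cat) in enumerate(clusters):
          if cat == g and math.dist(c, center) <= EPSILON:
              merged = tuple((a + b) / 2 for a, b in zip(c, center))
              clusters[i] = (merged, (r + radius) / 2, g)
              return clusters
      clusters.append((center, radius, g))
      return clusters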

Pre-processing and normalization

Since the Euclidean distance function treats every dimension equally, it may be necessary to normalize the data. By analyzing all of the training data in a pre-processing stage, we can determine the range of every attribute and transform the entire dataset appropriately [1]. Normalizing every point involves scaling every attribute into the range 1 to n. I chose n = 10.0 to facilitate merging of clusters. For epsilon = 2.15, n << 10.0 would result in overlapping of conflicting classifiers (discussed below) and n >> 10.0 would result in degeneration to nearest-neighbor (also discussed below).
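
As an illustration of this pre-processing step (a sketch, not the project's actual Lisp code), the rescaling might be done like this:

  N = 10.0   # upper end of the target range [1, N]

  # Rescale every attribute linearly into [1, N] using the per-attribute
  # minimum and maximum observed in the dataset.
  def normalize(dataset):
      dims = range(len(dataset[0]))
      lo = [min(row[d] for row in dataset) for d in dims]
      hi = [max(row[d] for row in dataset) for d in dims]
      def scale(v, d):
          if hi[d] == lo[d]:
              return 1.0                     # constant attribute
          return 1.0 + (N - 1.0) * (v - lo[d]) / (hi[d] - lo[d])
      return [[scale(row[d], d) for d in dims] for row in dataset]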

For some of the test datasets, normalization greatly decreased the misclassification rate and for others it actually contributed to more misclassified points. The wine recognition dataset, for example, had a wide variation among the ranges of its thirteen attributes. One attribute had values in the range of 0.1 – 1.7, while another ranged from 278 – 1403. Without pre-processing, the large-valued attribute dominated all distance comparisons, so performance was poor.

Normalization of the bupa liver disorders dataset, on the other hand, increased the misclassification rate. In this case, one or more attributes with large values probably contributed more to the classification of the data than the other attributes. By scaling all attributes equally, the relative importance of these attributes was diminished and other attributes were given equal weight in the clustering.

Advantages: space-efficiency

The goal of agglomerative-nearest-neighbor was to take advantage of the redundancy of data and save memory over nearest-neighbor. This goal was achieved with varying success depending on the dataset.

For the breast-cancer-wisconsin dataset, agglomerative-nn stored an average of only 45% of the points needed by nearest-neighbor and k-nearest-neighbor. For the normalized bupa and wine datasets, agglomerative-nn stored an average of 30% and 92%, respectively, of the points needed by nn and k-nn. The lesser memory savings for the normalized wine dataset are likely due to the compactness of that data; there was little redundancy to be removed by clustering data points.

Space efficiency suffered on the datasets that were not pre-processed. For bupa, 99% of the data was needed, and for wine 100% of the data was needed. As discussed below, agglomerative-nn degenerated into nearest-neighbor in these cases because the fixed epsilon of 2.15 was far too small relative to the scale of the unnormalized data (clusters never merged). Looking at the error rates for these sets shows that agglomerative-nn behaved quite similarly to nearest-neighbor.

Problems with the approach

Space-efficiency and accuracy tradeoff

Since the agglomerative-nn algorithm throws away some data, performance cannot be any better than nearest-neighbor. In other words, the algorithm trades some accuracy for space efficiency. This effect is most notable in the normalized bupa dataset: as noted above, agglomerative-nn used only 30% of the data points needed by nearest-neighbor, but misclassified over 10% more query points. See the results section for specifics.

Overlap of clusters

In the current implementation, clusters occasionally grow even when doing so makes them overlap nearby clusters that classify differently. Suppose two nearby clusters xm and xn classify into categories Cm and Cn. Clearly agglomerative-nn wouldn’t merge them. However, what happens when we introduce new training data xnew for category Cm that falls right between xm and xn and is within epsilon of both? Since xm predicts the same category as the new data, the radius of xm’s hypersphere grows a little to accommodate it. Eventually, with enough xnew’s of this type, either xm or xn could grow so that it overlaps the other nearby cluster. Then, query points in the area of the overlap could easily be misclassified.

One solution to the overlap problem might be to avoid merging clusters that are close to other clusters that classify differently. That is, if xnew causes xm to grow, check whether xm now overlaps xn. If so, leave xnew as a separate cluster. This avoids overlap and the blurring of each cluster’s area of expertise.
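
A small sketch of that check, for illustration only: two hypersphere clusters overlap exactly when the distance between their centers is less than the sum of their radii, so the grown cluster would be accepted only if it stays clear of clusters that classify differently.

  import math

  # Two hypersphere clusters (center, radius, category) overlap when the
  # distance between their centers is less than the sum of their radii.
  def overlaps(a, b):
      return math.dist(a[0], b[0]) < a[1] + b[1]

  # Accept a grown cluster only if it stays clear of every cluster that
  # classifies into a different category.
  def safe_to_grow(grown, clusters):
      return not any(overlaps(grown, other)
                     for other in clusters if other[2] != grown[2])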

Degeneration

Another interesting problem is that agglomerative-nn degenerates into nearest-neighbor when all of the training points are far away from each other. That is, if no two points are within epsilon of each other, they stay as separate clusters instead of being merged together. For every dataset there’s probably an ideal epsilon, but if it is chosen too small relative to the spread of the data, clusters will never merge and we’re left with nearest-neighbor.

If epsilon is chosen too large, then every point of a given class is merged with every other point of that class; the system degenerates into c clusters, where c is the number of categories in the dataset. Even for a linearly separable dataset, overlap of clusters is bound to occur because clusters are hyperspheres, not ellipsoids or rectangular prisms.

Test results summary

Eight training and test runs were performed for each algorithm, using a randomly chosen 90% of the data for training and the remaining 10% for testing. For the bupa and wine datasets, tests were run both on the plain data as it appeared in the dataset and on normalized data.
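
For illustration, the test harness amounts to something like the following sketch (not the actual test.lsp code; names are illustrative):

  import random

  # Average misclassification rate over several runs, each using a random
  # 90%/10% train/test split of the (point, category) pairs in data.
  def average_error(data, predict, runs=8, test_frac=0.10):
      rates = []
      for _ in range(runs):
          shuffled = random.sample(data, len(data))   # shuffled copy
          cut = int(round(len(shuffled) * test_frac))
          test, train = shuffled[:cut], shuffled[cut:]
          errors = sum(1 for q, g in test if predict(train, q) != g)
          rates.append(errors / len(test))
      return sum(rates) / len(rates)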

The average misclassification rates for each dataset on each of the three algorithms are reported below. As predicted, the agglomerative-nn approach had a higher misclassification rate than nearest-neighbor on every dataset except the unweighted bupa data.

  • breast-cancer-wisconsin, breast cancer data
    • nearest-neighbor: 3.13%
    • 3-nearest-neighbor: 2.02%
    • 5-nearest-neighbor: 1.84%
    • 7-nearest-neighbor: 1.84%
    • agglomerative-nearest-neighbor: 5.7%
  • bupa, liver disorders
    • nearest-neighbor: unweighted 40.81%, normalized 37.5%
    • 3-nearest-neighbor: unweighted 37.5%, normalized 39.34%
    • 5-nearest-neighbor: unweighted 39.34%, normalized 38.24%
    • 7-nearest-neighbor: unweighted 35.29%, normalized 40.07%
    • agglomerative-nearest-neighbor: unweighted 38.6%, normalized 48.53%
  • wine, wine recognition
    • nearest-neighbor: unweighted 24.26%, normalized 2.21%
    • 3-nearest-neighbor: unweighted 27.21%, normalized 2.94%
    • 5-nearest-neighbor: unweighted 24.26%, normalized 1.47%
    • 7-nearest-neighbor: unweighted 24.26%, normalized 1.47%
    • agglomerative-nearest-neighbor: unweighted 33.82%, normalized 2.94%

Conclusions

The agglomerative-nearest-neighbor algorithm worked quite well on very redundant datasets. Although the average misclassification rates were higher with agglomerative-nn than with nearest-neighbor for the breast cancer and wine recognition datasets, the memory savings were quite noticeable. Other redundant datasets would probably see similar improvements.

Unfortunately, agglomerative-nn is substantially slower than nearest-neighbor in the training stage, since nearest-neighbor merely stores points during training. Because agglomerative-nn compares each new cluster to every other cluster in the system, its worst-case time complexity is O(N^2). For large datasets (more than 1000 records), agglomerative-nn is too slow to produce real-time results. With nonredundant datasets, an efficient nearest-neighbor implementation would be preferred over agglomerative-nn for both speed and accuracy.

Choice of parameters is an area that needs more exploration. Although the choice for epsilon = 2.15 was a function of the values used for rnew and the normalization factor (1.0 and 10.0, respectively), these values were chosen somewhat arbitrarily. Varying rnew and the normalization factor with the number of attributes in a dataset might lead to better accuracy because as dimensionality grows, a radius of 1.0 becomes less and less significant.

Some more careful weighting of data in the pre-processing stage could also lead to better accuracy in all three algorithms. Heavily weighting those attributes that more greatly contribute to the classification of points would skew the distances more in the dimension that matters most.

References

  1. Bishop, C. 1995. Neural Networks for Pattern Recognition, pp. 6-7. Oxford: Clarendon Press.
  2. Schaal, S., and Atkeson, C. 1996. From Isolation to Cooperation: An Alternative View of a System of Experts. In: Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., eds., Advances in Neural Information Processing Systems 8, pp. 605-611. Cambridge, Massachusetts: MIT Press. ftp://ftp.cc.gatech.edu/pub/people/sschaal/schaal-NIPS95.html.
  3. Merz, C. J., and Murphy, P. M. 1998. UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Acknowledgements

Thanks to Prof. Leslie Pack Kaelbling for her suggestions on the paper and guidance in the machine learning seminar. Thanks also to Fred Piccioto and Jason Lango for algorithmic suggestions, and to Mike Cafarella for coining the name “agglomerative-nearest-neighbor”.

 


Source code (in Common Lisp)

  • learning.lsp, the nearest-neighbor, k-nn, and agglomerative-nn learning algorithms.
  • test.lsp, automated tests for the three algorithms.
  • datapoint.lsp, definitions for the data structures used in learning.
  • fileio.lsp, code to snarf up text files.

Entire project

You can grab all of the code, Perl scripts for munging data into Lisp-readable format, datasets, and results in a gzip’ed tarfile:

 

wine results (Nearest-Neighbor Machine Learning Bakeoff)

Wine Recognition Data from Forina, M. et al at the Institute of Pharmaceutical and Food Analysis and Technologies. Each record represents results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

nearest-neighbor

unweighted data

results for nearest-neighbor on wine.

 

normalized data

results for nearest-neighbor on wine-norm.

 

k-nearest-neighbor

unweighted data

results for k-nearest-neighbor on wine.

 

normalized data

results for k-nearest-neighbor on wine-norm.

 

agglomerative-nearest-neighbor

unweighted data

results for agglomerative-nearest-neighbor on wine.

 

normalized data

results for agglomerative-nearest-neighbor on wine-norm.

 

bupa results (Nearest-Neighbor Machine Learning Bakeoff)

BUPA Liver Disorders from Richard S. Forsyth at BUPA Medical Research Ltd. Each record in the dataset constitutes the record of a single male individual. The six attributes in each record are results from blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption, as well as the number of drinks per day.

nearest-neighbor

unweighted data

results for nearest-neighbor on bupa.

 

normalized data

results for nearest-neighbor on bupa-norm.

 

k-nearest-neighbor

unweighted data

results for k-nearest-neighbor on bupa.

 

normalized data

results for k-nearest-neighbor on bupa-norm.

 

agglomerative-nearest-neighbor

unweighted data

results for agglomerative-nearest-neighbor on bupa.

 

normalized data

results for agglomerative-nearest-neighbor on bupa-norm.

 

breast-cancer-wisconsin results (Nearest-Neighbor Machine Learning Bakeoff)

Wisconsin Breast Cancer from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison. Records in the dataset represent the results of breast cytology tests and a diagnosis of benign or malignant. Attributes include clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. All values were integers in the range of 1-10.

nearest-neighbor

results for nearest-neighbor on breast-cancer-wisconsin.

 

k-nearest-neighbor

results for k-nearest-neighbor on breast-cancer-wisconsin.

 

agglomerative-nearest-neighbor

results for agglomerative-nearest-neighbor on breast-cancer-wisconsin.