October 2002
I got POPFile working. At another user's suggestion, I exported some mail folders as .csv text files instead of individual .msg binary files. So far I've put 5,500 messages into the corpus. About 5% of them are known spam, and half of the remainder come from 16 mailing lists.
I'm keeping all of my Outlook rules in place until I'm confident in POPFile's classifications. I've added two rules for POPFile: one for spam and one for a mailing list I just joined. The new list is a good test of how quickly POPFile can be taught. Its initial corpus was just 15 messages, and so far POPFile has correctly classified 3 of the 5 new messages.
One problem I see with teaching POPFile is that the web interface only allows negative reinforcement, i.e., "this message was classified wrong; it should be in this category." For a small corpus, my gut feeling is that positive reinforcement (confirming the classifications it got right) would be more beneficial. There's probably a tipping point where that sort of feedback loop would have a negative effect on accuracy, but that is something for a math genius to figure out.
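POPFile is built on a naive Bayes classifier, so the difference between the two kinds of feedback can be sketched with a toy multinomial naive Bayes classifier. This is a minimal illustration in Python under my own assumptions, not POPFile's code (POPFile itself is written in Perl), and the names `NaiveBayes`, `train_on_error` and `train_everything` are hypothetical. `train_on_error` only updates the word counts when the guess was wrong, which is roughly what a reclassify-only interface allows; `train_everything` also learns from messages that were already classified correctly.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> messages trained
        self.vocab = set()                       # every word seen in training

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus Laplace-smoothed log likelihood of each word.
            score = math.log(self.doc_counts[category] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


def train_on_error(clf, text, true_category):
    """Negative reinforcement only: learn just when the guess was wrong."""
    if clf.classify(text) != true_category:
        clf.train(text, true_category)


def train_everything(clf, text, true_category):
    """Positive reinforcement too: learn from every message."""
    clf.train(text, true_category)
```

With a small corpus, `train_everything` grows the counts for a category much faster than `train_on_error`, since correct guesses also add evidence. Whether that extra weight eventually skews accuracy is the open question raised in the paragraph above.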
POPFile's author will be on TechTV today at 19:00 Eastern.
I wish I'd known about these kinds of filtering programs years ago. From 1998 until 2001 I received 10,000 messages on a normal day, and several times that on really bad days. I needed over 100 Outlook rules to manage the chaos and focus my attention on the 5% that mattered to me. Yet ifile, one such filter, was first released in late 1996.