Bryce's Radio Experiments
The Intersection of PDAs, Wireless, Radio, and CSS.

Monday, November 04, 2002

Moving

I'm finally taking the plunge and switching to Movable Type. I'm not going to bother importing this weblog, at least not initially. Too much work for too little benefit. My archives can stay here indefinitely.

My new home page and weblog. Feeds are available in RSS 0.91, RSS 1.0, and RSS 2.0 flavors. I'm not going to set up RSS redirects. I don't like Userland's solution because any aggregator that doesn't understand the format will barf on it. HTTP 301 redirects are better supported, but I don't feel like reconfiguring Apache to allow .htaccess files.
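For reference, the 301 approach does not strictly need Apache. A minimal sketch using Python's standard library, assuming a made-up old feed path and a placeholder new URL (neither comes from this weblog):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of old feed paths to new feed URLs (placeholders only).
FEED_MOVES = {
    "/rss.xml": "http://example.com/index.xml",
}

def redirect_target(path):
    """Return the new URL for a moved feed path, or None if it has not moved."""
    return FEED_MOVES.get(path)

class FeedRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        # 301 marks the move as permanent; Location carries the new address.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

# To try it locally (blocks until interrupted):
# HTTPServer(("127.0.0.1", 8080), FeedRedirectHandler).serve_forever()
```

An aggregator that follows redirects would pick up the new feed without needing to understand any special feed-level redirect format.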

For the couple of people that subscribe to my category feeds, I'll get around to re-creating those eventually. Stay subscribed to the current feeds and wait for an update.

4:15:07 PM | Comments: | Topics: movable_type radio 

Saturday, November 02, 2002

Distributed TiVo cracking

"With the newest version of the TiVo software (Version 3.2), TiVo has once again changed the secret password to enter "backdoor" mode, which lets advanced users enable hidden features. Unlike last time, people were not able to quickly find the new code, so a distributed computing project was started to find the backdoor codes. You can read about it Here, grab the Linux or Windows clients and pitch in some CPU time for a good cause." [Slashdot]

11:02:59 AM | Comments: | Topics: pvr 

Friday, November 01, 2002

Not moving Radio

Lots of things blew up while I was trying to move Radio to another PC. I'll give it another shot once I figure out why Radio is crashing whenever I try to compress weblogData.root.

11:55:10 PM | Comments: | Topics: radio 

Peeve: Large Screenshots

Why am I seeing so many screenshots lately that are so large that I have to scroll my maximized browser window on a 1280x1024 display? Is there any application that truly requires 750,000+ pixels to demonstrate a single screen or task? Give my mouse hand a friggin' break.

800x600 ought to be enough for anything.

11:52:06 PM | Comments: | Topics: annoyances rants 

Moving Radio

I'm about to try moving my Radio installation to another computer, including moving Radio.root.

Crossing my fingers that this doesn't blow up...

1:52:50 PM | Comments: | Topics: radio 

More spam stuff

Microsoft Research published A Bayesian Approach to Filtering Junk E-mail way back in 1998. The Adaptive Systems and Interaction group has done quite a bit of research on filtering and classification.

Amy Wohl is ranting about Spam and Irresponsible ISPs. I find it interesting that a web hosting provider is using a blacklist to reject incoming mail. Is this a transparent feature that nobody knows about, or do customers have to opt-in? In my experience, web hosting customers are an unforgiving lot when it comes to anything that interferes with email.

I've always been anti-blacklist because blacklists are subject to so much abuse. They've mostly been run by zealots demanding that commercial messages not be transferred across the Internet unless they were explicitly asked for, twice (double opt-in). Stopping spam isn't enough for them; they want to rid the world of all unwanted email. These are not the same problem.

Collaborative filtering falls into the same trap. SpamNet lets users "vote" for spam and against false positives. Supposedly a trust metric is employed to determine which votes are meaningful. The whole thing boils down to a popularity contest and unwanted non-spam messages are unpopular. The risk of a damaging false positive seems low, but I still have to waste time examining the messages that SpamNet captures.

In my view, both approaches fail because the end-user isn't empowered. Blacklists put the power with the maintainers and collaborative techniques shift it to the masses, but neither gives the power to me. Bayesian filtering and other methods that adapt to my mailbox are the ticket.

1:49:39 PM | Comments: | Topics: bayesian spam 

POPFile, Part III

The mail parser has been updated to handle Outlook .MSG files.

There's a thread on corpus drifting that covers my thoughts on using positive reinforcement to help POPFile to learn. On the mailing list I am training POPFile on, it has missed 3 of 22 messages today. I'm thinking that POPFile needs about 100 messages in the corpus to get accuracy into the high 90s for mailing lists.

On the spam front, I seem to be in the middle of a drought. POPFile has missed 1 of 5 messages since yesterday.

I've found another bug: POPFile seems to top out at 8 simultaneous connections. I have 10 POP accounts in three of Outlook's "Send/Receive Groups." They have staggered times for checking mail, but every so often they all overlap...

11:48:27 AM | Comments: | Topics: bayesian spam 

Run D.M.C.

Anil Dash:

You can't miss the music. Check out the current Top 5 on the Billboard charts, you'll find Missy Elliot's "Work It". The last minute of the song is a straight lift from Run-DMC's "Peter Piper". The incredible breakdown to Bob James' Mardi Gras, which Jay cut up for the song, is still so purely grooving and ass-moving that it can top the charts a decade and a half later. Chuck D said it best years ago in one of his rhymes, "Run-DMC first said a DJ could be a band." The "band" behind Run-DMC is still echoing out of people's rides in Queens today. There's no better legacy.

I came across a Run D.M.C. timeline. In spite of being the first rap artists to go gold, platinum, and multi-platinum, they have no Grammy awards. Eminem has five.

11:20:22 AM | Comments:

Thursday, October 31, 2002

Jam Master Jay, RIP

I'm blacking out for the rest of the day in remembrance of Jason Mizell, aka Jam Master Jay.

Jam Master Jay, the D.J. who provided beats and scratches to the rap group Run-DMC's groundbreaking records, was shot and killed in a recording studio in Queens on Wednesday night. [...]

For most of its history, rap has been criticized for promoting violence, and several rappers who sang the praises of the gangster life, including Tupac Shakur and the Notorious B.I.G., were murdered. But Run-DMC and Jam Master Jay, all middle-class natives of Hollis, Queens, a mile or so from where Mr. Mizell was shot, created rap with a social conscience, urging listeners (between boasts) to stay in school, fight prejudice and respect one another. [NYTimes via Adam Curry]

I saw Run D.M.C. in '93, part of the 19-Naughty-3 tour. One act cancelled at the last second and both of the headline acts stank, with Naughty by Nature practically booed off the stage. Run D.M.C. took up the slack, coming back for a second high-energy performance. They "rocked the house" for over two hours.

Run D.M.C. turned an otherwise miserable concert into one that I've always remembered fondly.

3:14:26 PM | Comments:

Airport security leads to topless checkpoint

A French tourist got so fed up with having her chest wanded by airport security in the USA that she took off her shirt and bra to demonstrate her bomb-and-boxcutter-free chestular region. The airport was closed for 10 minutes. Under the USAPATRIOT Act, she faces up to three years in jail. Link (German-English translation here: Link) [via Boing Boing Blog]

This would be funny if it weren't so friggin' sad. America has become such a fearful place that even a half-naked woman threatens us.

When I lived in Europe, I made a conscious effort not to blend in. I was a proud American and rarely wanted to conceal that fact.

These days I dream of blending in somewhere and never looking back.

2:36:05 PM | Comments:

POPFile, Part II

I got POPFile working. At another user's suggestion, I exported some mail folders as .csv text files instead of individual .msg binary files. So far I've put 5500 messages into the corpus, with about 5% being known spam and half of the remainder coming from 16 mailing lists.

I'm keeping all of my Outlook rules in place until I am confident in POPFile's classifications. I've added two rules for POPFile, for spam and a mailing list that I just joined. The new list is a good test of how quickly POPFile can be taught. The initial corpus was just 15 messages; so far it has correctly classified 3 out of 5 new messages.

One problem I see with teaching POPFile is that the web interface only allows for negative reinforcement, i.e., this message is classified wrong, it should be this. For a small corpus, my gut feeling is that positive reinforcement would be more beneficial. There's probably a tipping point where that sort of feedback loop would have a negative effect on accuracy, but that is something for a math genius to figure out.
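The difference between the two feedback styles can be sketched with a toy word-count classifier. This is an illustration only, not POPFile's code: the `Buckets` class, the `feed` function, and the sample messages are all made up for the example.

```python
from collections import Counter, defaultdict

class Buckets:
    """Toy stand-in for a corpus: counts words per bucket."""
    def __init__(self):
        self.words = defaultdict(Counter)  # bucket -> word frequencies
        self.trained = 0                   # messages added to the corpus

    def train(self, bucket, text):
        self.words[bucket].update(text.lower().split())
        self.trained += 1

    def classify(self, text):
        # Score each bucket by how often it has already seen the message's words.
        tokens = text.lower().split()
        scores = {b: sum(c[t] for t in tokens) for b, c in self.words.items()}
        return max(scores, key=scores.get) if scores else None

def feed(corpus, stream, reinforce_correct):
    """Classify each (bucket, text) pair, then train according to the policy."""
    for bucket, text in stream:
        guess = corpus.classify(text)
        # Corrections always train; correct guesses train only when reinforcing.
        if guess != bucket or reinforce_correct:
            corpus.train(bucket, text)

def seeded():
    corpus = Buckets()
    corpus.train("list", "patch review for the parser")
    corpus.train("spam", "cheap pills buy now")
    return corpus

stream = [
    ("list", "parser patch attached"),
    ("spam", "buy cheap pills"),
    ("list", "review the patch"),
    ("list", "buy cheap hosting now"),  # looks like spam to the toy classifier
]

corrections_only = seeded()
feed(corrections_only, stream, reinforce_correct=False)

with_reinforcement = seeded()
feed(with_reinforcement, stream, reinforce_correct=True)

print(corrections_only.trained, with_reinforcement.trained)
```

With corrections only, the corpus grows just when the classifier is wrong, so a small corpus stays small. Reinforcing correct guesses grows it with every message, which is where the possible tipping point would come from.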

POPFile's author will be on TechTV today at 19:00 Eastern.

I wish I'd known about these types of filtering programs years ago. From 1998 until 2001, I would receive 10,000 messages on a normal day and several times that on really bad days. I needed over 100 Outlook rules to manage the chaos and focus my attention on the 5% that mattered to me. ifile was first released in late 1996.

1:16:30 PM | Comments: | Topics: bayesian spam 

POPFile

Decided to give POPFile a whirl tonight. Exported a few thousand messages from Outlook 2002, upgraded to the latest version of Perl, uninstalled SpamNet.

The program that builds the corpus doesn't seem to like me. It acts like it is importing my messages but the corpus file winds up empty. I suspect that it has problems with Outlook's binary message format, in spite of the documentation saying otherwise. Tried the latest version from CVS, no difference.

I filed a bug report, we'll see what happens...

1:24:00 AM | Comments: | Topics: bayesian spam 

Wednesday, October 30, 2002

I Have a (Backup) Dream

A few months ago I wrote about the pains of backing up large drives. I use a 60GB drive for backups of important files from my main 120 gigger, but I think that I'll outgrow this solution in 6 months. Fortunately I have a pair of 30 giggers lying around...

Looking at the files I am backing up, well over 90% of the space used is static -- changes are rare, additions are infrequent. I need a long-term archiving solution. Burning those files to CD isn't very appealing: I would need about 100 discs (I'd want two copies of everything because I have little faith in CD-Rs for long-term storage). DVDs would be more practical; I could probably find a Firewire burner to borrow...

What I'd really like is a hybrid online backup service. My upstream bandwidth is about 8KB/s on a good day, so doing an initial backup of this data over the Internet would take an insane amount of time. NetFlix has the right idea for moving large quantities of data around: the US Postal Service. Send me a Firewire/USB drive for that initial backup, use the Internet for incrementals. Archive my static data to tape and warehouse it somewhere -- if my system crashes I won't mind it taking some time to retrieve that data, so long as I get it back eventually. Keep my last incremental online and recent ones near-line; that's the stuff that I'll want back quickly.
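To put a rough number on "insane," here is a back-of-the-envelope calculation. It assumes the worst case of a completely full 60 GB backup drive and decimal units (1 GB = 1,000,000,000 bytes, 1 KB = 1,000 bytes); the function name is just for illustration.

```python
def upload_days(size_gb, rate_kb_per_s):
    """Days needed to upload size_gb gigabytes at rate_kb_per_s kilobytes per second."""
    size_bytes = size_gb * 1_000_000_000   # decimal gigabytes
    rate_bytes = rate_kb_per_s * 1_000     # decimal kilobytes
    seconds = size_bytes / rate_bytes
    return seconds / 86_400                # seconds per day

# Assumption: a full 60 GB drive; 8 KB/s is the upstream rate quoted above.
initial_backup_days = upload_days(60, 8)
print(f"{initial_backup_days:.0f} days")   # about 87 days of continuous uploading
```

Even at half that volume, the initial upload would tie up the connection for over a month, which is why shipping a drive for the first pass looks attractive.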

I've got no idea if such a service could be made affordable for consumers, but it would certainly be more useful than a purely Internet-based backup service.

11:46:05 AM | Comments: | Topics: storage 

Client-side Spam Filtering

I've always wondered why client-side spam filters for Windows are designed to work only with certain mail clients. SpamNet and SpamAssassin Pro only work with Outlook 2000+, SpamNix with Eudora 3+, etc. These tools could reach a wider audience if they were built as generic POP/IMAP proxies.

Open Source to the rescue. POPFile is a POP3 proxy that uses "Naive Bayes" for classification, written in Perl but geared for Windows users. Pop3proxy and IMAPAssassin use the SpamAssassin engine.
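For readers unfamiliar with the technique, here is a minimal Naive Bayes text classifier in Python. It is a sketch of the general idea, not POPFile's Perl implementation: the class name, the whitespace tokenizer, and the sample messages are assumptions made for the example, and a real filter would also parse headers and MIME parts.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word frequencies
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        vocabulary = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        scores = {}
        for bucket, counts in self.word_counts.items():
            # Log prior: how common this bucket is among trained messages.
            score = math.log(self.message_counts[bucket] / total_messages)
            bucket_total = sum(counts.values())
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (bucket_total + len(vocabulary)))
            scores[bucket] = score
        return max(scores, key=scores.get) if scores else None

mail_filter = NaiveBayes()
mail_filter.train("spam", "cheap pills buy now")
mail_filter.train("list", "patch review for the parser")

print(mail_filter.classify("buy cheap pills now"))             # spam
print(mail_filter.classify("please review the parser patch"))  # list
```

Because the word counts come entirely from the user's own mail, the same code sorts mailing lists as readily as it flags spam.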

10:43:39 AM | Comments: | Topics: bayesian spam 


© Copyright 2003 T Bryce Yehl
Last update: 6/29/2003; 9:31:53 PM.