OSCON 2003 registration

It looks like the O’Reilly folks have finally posted the abstract for my One Year of PHP at Yahoo! talk I’ll be giving this summer in Portland, Oregon.

I filled out the speaker registration page today and picked some tutorials to attend. Here’s what I’ll be going to:


- Tutorial: Introduction to XSLT (Session ID 3959)
  07/07/2003, 8:45am to 12:15pm, Columbia

- Tutorial: Designing and Creating Great Shared Libraries (Session ID 4149)
  07/07/2003, 1:45pm to 5:15pm, Willamette

- Tutorial: Building Data Warehouses with MySQL (Session ID 3982)
  07/08/2003, 8:45am to 12:15pm, Salon H

On Monday afternoon I’ll probably bounce back and forth between Theodore Ts’o’s “Designing and Creating Great Shared Libraries” and Bradley M. Kuhn’s “The GNU General Public License for Developers and Businesspeople.”

Instead of registering for something on Tuesday afternoon, I think I’ll explore Portland. I’ve never been there before.

Early Bird registration is now open (through May 23rd) at http://conferences.oreillynet.com/os2003/

Upgrade my servers? Yeah, right.

In software engineering, laziness is a positive attribute. If one can accomplish the same task in 3 lines of code instead of 30, a good engineer opts for the 3-line version. That’s why libraries of code are so popular.

Engineers are also risk-averse. Every change you make to a system can destabilize it, so engineers like to leave a running system alone. Fred Brooks writes in The Mythical Man-Month that fixing one defect has a substantial (20 to 50 percent) chance of introducing another bug. Two steps forward, one step back.

But laziness and risk-aversion can also be real liabilities. How can you ever make any progress if you never touch the system? What if WordPerfect 5.1 were still the state of the art in 2003? We’d be missing out on a decade of improvements like WYSIWYG.

Consider the hypothetical case of the guy who’s trying to get the other 599 engineers at the company to upgrade their web servers to version N, when the vast majority of folks are still running version M.

If I’m happily running version M, what’s my incentive to upgrade? Sure, the guy who maintains the web server says it’s got some great new features, is faster, gives you some better management tools, and fixes a couple of bugs. But I don’t have time to skim the README to see if any of those features would be useful to me. Version M seems just fine to me, and something could go wrong if I go to version N.

Most importantly, senior management does not require that I pay any attention to the guy who maintains the web server. Even if I procmail all of the web server guy’s messages into /dev/null, I can still get a good review at the end of the year just for keeping my crappy property up and running.

The bummer for the guy who works on the web server is that he also happens to be one of the folks who spent the past 2 years trying to improve the development process at the company. He helped build a software package-management tool that can tell you in near-real-time which versions of which software are installed on which servers. And when he checks the stats, he finds out that a lot of folks are running really old versions of the web server: versions J, K, and L. Getting people to upgrade to version N is going to be even more difficult.

Maybe this explains why most of his co-workers are still running Netscape 4.08.

I am a grad-school dropout

This makes it official. Today is the first day of the Winter 2003 quarter at UCLA, and I’m not enrolled in any classes. My short career as a part-time graduate student has come to an end.

I enrolled in the MSCS program at UCLA last year in part because I was hoping to round out my undergraduate education. I actually even considered doing a PhD, but I couldn’t really make up my mind as to whether I was more interested in artificial intelligence or computational theory. (I figured that if I was going to throw myself into a 5- or 6-year program, I should have a much stronger sense of what I wanted to research.)

Instead of rounding out my education, it felt more like I was rehashing the same stuff I learned as an undergrad. Don’t get me wrong; UCLA’s Computer Science faculty is superb, and the department and university have some really good resources. It’s just that after working for 5 years in the industry, academia seemed to be tackling rather marginal problems.

Perhaps I didn’t give it my best effort. I was only in the program part-time (I was too chicken to give up my full-time job), and maybe if I had taken more classes and devoted more energy to the program I would’ve gotten more out of it.

Maybe doing a PhD would’ve been a better choice. A Masters degree wouldn’t have gotten me a significantly higher salary or qualified me to do more innovative research. The best I could’ve gotten out of it was the ability to teach CS at the community college level.

Or, perhaps I got such a fantastic education at Brown that I don’t need me no mo’ learnin’. 😉

It’s hard to say why it didn’t work out. Apparently, I’m feeling a little melancholy about the whole thing.

Udi Manber: The First 10 Years on the Web

Introduction to Algorithms: A Creative Approach

Udi Manber gave the first talk of this year’s Jon Postel Distinguished Lecture Series today at UCLA.

It seems fitting that I should have a link to Udi’s book on Amazon.com at the beginning of my review of his talk; he started working for Amazon just about a month ago.

While a handful of professors and grad students scrambled around trying to get the laptop to work correctly with the LCD projector, Udi spoke a bit about his personal history as the Web developed. He mentioned his contributions to the field, including suffix arrays (1989), agrep (1991), glimpse (1992), and even the web’s first screen scraper (1996).

What makes the web so fundamentally new and exciting

When Udi returned from a sabbatical in 1993, he was very excited about how the web was going to change everything. His colleagues cautioned him, “But there’s nothing new in the Web. We’ve done it all before. The web is just databases, networks and information retrieval all over again.” He acknowledged that his peers were correct in some respects, but scale is what makes the web fundamentally new: the sheer number of users, and the amount of content. He also compared the ubiquity of the web to the advent of television:

  • TV didn’t invent storytelling
  • TV didn’t invent motion pictures
  • TV didn’t invent actors
  • It wasn’t even in color
  • But it’s in everyone’s home!

Because everything on the web is traceable, Udi feels that the data available to websites lets companies create a fundamentally different experience:

  1. More data == better experience. For example, an Amazon.com product detail page shows not only the price of a product, but also related items based on what other customers bought, editorial and user-generated reviews, and sometimes even scanned excerpts from a book.
  2. Instant data == instant QA. Companies get instant feedback from users, both in the form of emails and in what customers do and don’t click on or buy. Problems with the software get noticed and fixed quickly, so the company loses less money.
  3. Flexible data == better business decisions. By running controlled experiments (Amazon calls them A/B tests), the company can decide whether a new feature should be placed on the left or right side of the page, or whether the color should be blue or green. Almost every new feature is first shown to a random sampling of the user base to see how they react to it. After a small amount of testing, it’s easy to see whether something is going to make more money or improve the user experience. (A sketch of this kind of bucketing follows below.)
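
Udi didn’t describe Amazon’s implementation, but the core mechanic of an A/B test is easy to sketch: deterministically hash each user into a bucket so the same user always sees the same variant. Here’s a minimal, hypothetical version in Python; the function name, experiment names, and percentages are my own illustration, not Amazon’s actual system:

```python
import hashlib

def ab_bucket(user_id: str, experiment: str, treatment_pct: float = 50.0) -> str:
    """Deterministically assign a user to 'A' (control) or 'B' (treatment).

    Hashing (experiment, user_id) together means each user sees the same
    variant on every visit, and different experiments split the user base
    independently of one another.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # roughly uniform in 0..9999
    return "B" if bucket < treatment_pct * 100 else "A"

# e.g. show the new feature on the left side of the page only to bucket 'B'
variant = ab_bucket("customer-42", "feature-placement-test")
```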

Udi gave an example of a feature that the company tested. When you’re about to purchase a product, Amazon looks through all of your past purchases to see if you’ve already bought that item. If you have, it pops up a big red warning telling you that you might be buying a duplicate. There are some legit reasons why you’d want a duplicate; maybe you lost the item, or maybe it’s going to be a gift. But many times it turns out that people put a CD in their shopping cart simply because they forgot they already own it. So Amazon built the feature and tested it.

Sure enough, it decreased sales, because much of the time the customer didn’t need a duplicate. But Amazon decided to adopt the feature anyway! Even though it meant less revenue in the short term, the better experience of not having to return an item (hopefully) translates into increased customer loyalty, and therefore more long-term revenue.
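
The check itself is conceptually simple: intersect the shopping cart with the customer’s purchase history and flag any overlap. Here’s a minimal sketch in Python of the idea as I understood it; the data shapes and names are invented, not Amazon’s code:

```python
def duplicate_warnings(cart_items, past_orders):
    """Return the cart items the customer appears to have bought before."""
    already_bought = {item for order in past_orders for item in order["items"]}
    return [item for item in cart_items if item in already_bought]

# usage: flag the CD the customer forgot they already own
warnings = duplicate_warnings(
    cart_items=["cd-123", "book-456"],
    past_orders=[{"order_id": 1, "items": ["cd-123"]}],
)
# -> ["cd-123"]: show the big red warning before checkout
```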

Search

Udi spent a bit of time talking about the importance of Search. He described what he sees as 4 generations of web search:

  • 1992-1993: index data from selected sites (Harvest, Archie)
  • 1995: collect data from the entire web (Lycos, AltaVista, InfoSeek, Inktomi)
  • 1998: it’s all about relevancy, stupid! (Google)
  • 2001: it’s all about monetization, stupid! (Overture)
  • and the next generation of web search is yet to come

What is missing from Search today? Udi pointed out a bunch of problems waiting to be solved:

  • Understanding the query (these days we’re still treating search queries as strings of characters)
  • Understanding the users
  • Personalization (instead of today’s “democratic” search engines, which show everyone the same results for a particular query, should we customize the search results based on what we know about the user?)
  • Helping the user with query refinement
  • Better visualization of search results (something better than pages and pages of text, but also something easy enough for people to understand)
  • Anti-spam (there are hundreds of companies in the Search Engine Optimization business who are essentially spamming Google to improve the rankings of particular sites)

E-Commerce

Udi prefaced his comments about e-commerce by pointing out that “business” and merchants are hated in almost all cultures, yet somehow commerce and trading started as early as 4000 BC. Why? Because the alternative way of acquiring goods is war, and that doesn’t scale well.

He spoke a bit about the beginnings of Amazon.com (Jeff Bezos’ garage) and showed the audience a screen shot of what Amazon’s home page looked like in 1995 complete with LOTS OF TEXT IN SMALL CAPS. We’ve come a long way, baby.

Udi then moved on to discuss in broad terms some of the problems involved in order fulfillment. Deciding which products to ship from which distribution centers, and what to order from publishers or distributors, involves all sorts of combinatorics and traveling salesman problems. He gave a particularly hairy example of a stochastic linear program used to optimize the shipping of an order of just 2 books. Most of these problems are exponential in complexity, and the site has only 500 milliseconds to make an intelligent decision so it can tell the user how much shipping is going to cost for their order.
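
He didn’t share the actual formulation, but even a toy version shows why the problem explodes: with c distribution centers and n items, there are c^n possible assignments. Here’s a hypothetical brute-force sketch in Python (all names, costs, and the per-box charge are invented for illustration):

```python
from itertools import product

def cheapest_assignment(items, centers, stock, ship_cost):
    """Try every way of assigning items to distribution centers.

    Exhaustive search over c**n assignments is hopeless at scale, which
    is why real fulfillment systems have to approximate; this just
    illustrates the combinatorial structure of the decision.
    """
    best_cost, best_plan = float("inf"), None
    for plan in product(centers, repeat=len(items)):
        if any(items[i] not in stock[c] for i, c in enumerate(plan)):
            continue  # skip plans that ship an item a center doesn't stock
        # per-item shipping cost plus a fixed charge for each box used
        cost = sum(ship_cost[c] for c in plan) + 3.00 * len(set(plan))
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan

# two books, two warehouses: ship in one box, or split the order?
cost, plan = cheapest_assignment(
    items=["book-a", "book-b"],
    centers=["reno", "lexington"],
    stock={"reno": {"book-a"}, "lexington": {"book-a", "book-b"}},
    ship_cost={"reno": 1.50, "lexington": 2.10},
)
```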

Udi was hoping to talk about Security, too, but he ran short on time. Instead he took some Q&A from the audience. Many questions had to do with specifics about Amazon’s business and development culture, which Udi couldn’t really answer because he’s only been there a month. When asked what he would change about academia given his experience in both worlds, he said he wanted to see more of a focus on solving real problems. Too many toy problems are given to students just for the sake of learning. As a result, academics often don’t understand the problems of real users. To help remedy this, he would be interested in providing academic institutions with some of Amazon’s real data to use for teaching algorithms and modeling.

Lastly, Udi announced that he would be available on Friday morning at UCLA to speak to students about jobs at Amazon.com. I’m guessing that he’s building up a kewl R&D team and wants a crop of freshly minted PhDs.

Joe Andrieu: Carpe Diem or Caveat Emptor?

I’m off to UCLA to hear a lecture for my CS239 class. Here’s the abstract:

For the prepared and alert entrepreneur, “Opportunity knocks far more than once.” Indeed, as the subtitle implies, the challenge is to recognize the right opportunity and then stay focused on it. Many factors can lure one into taking the wrong direction or scuttle seizing the right opportunity when it arises. Strong emotional attachments to effort already expended and the associated dream of the end game can be seductive, substituting wishful thinking for sober analysis. Pressure from investors anxious to cash out or founders coveting the image of an IPO may overcome a less sexy but correct private sale alternative. Distinguishing reality from subtly masked fantasy may well be the keystone of leadership talent. Having been through multiple ventures during the last 10 years, today’s speaker will illuminate the desiderata and pitfalls attendant to deciding when to act and which course to take among competing alternatives.

Sounds like a cool lecture. Tomorrow, I’m going to the CS201 talk entitled The First Ten Years on the Web: A Personal Perspective by Udi Manber, Chief Scientist, Amazon.com.

Prime numbers

Last night on the plane ride home I was reading a copy of Dr. Dobb’s Journal, a magazine for programmers (I think I got a free subscription to this when I registered for PHPCon). I came across Michael Swaine’s column and read about a polynomial-time algorithm for testing primes that was discovered this summer.

I know this is old news (blogs are supposed to be up-to-the-minute, right?) but it’s still totally fascinating for anyone who understands how public-key cryptography works (I learned about it in Math 42 at Brown).

Although this algorithm is considered super-fast for what it does, it is actually slower than the ones RSA and PGP use when generating new public/private key pairs. The difference is that the new algorithm has the distinct advantage of telling you definitively whether or not a number is prime. The more common approach is to run a probabilistic algorithm, so there still exists a possibility that the number you’re evaluating is not prime; it’s just more likely that you’ll be struck by lightning. The traditional probabilistic approach is faster than this new technique, but it doesn’t give you the 100% confidence that some people (bankers?) would rather have.
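
For the curious, here’s a minimal sketch in Python of the probabilistic approach (Miller-Rabin, the kind of test key-generation code typically uses; this is my illustration, not RSA’s or PGP’s actual code). Each passing round leaves at most a 1-in-4 chance that a composite slipped through, so around 27 rounds pushes the error below the one-in-10^16 figure quoted below:

```python
import random

def is_probable_prime(n: int, rounds: int = 27) -> bool:
    """Miller-Rabin probabilistic primality test.

    A composite n fools any single round with probability at most 1/4,
    so 27 rounds gives an error bound of 4**-27 (about 5.6e-17).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # 'a' is a witness that n is composite
    return True  # very probably prime
```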

From a practical standpoint, 99.99999999999999% confidence ought to be enough for anyone, because insurance policies aren’t that expensive when you’re talking about covering something that has a 0.00000000000001% likelihood of happening.

But from a theoretical computer science standpoint, this discovery is just plain cool. If they could only prove that P != NP, I’d be really psyched.