Michael J. Radwin's blog

Tales of a software engineer who keeps kosher and hates the web.


Computer Science

June 15, 2007

Software Engineer, Java - Click Fraud Prevention

Want to build something that hunts down the bad guys and puts 'em out of business? Got experience building complex systems in Java? Fraudwall Technologies has the job for you.

We're looking for engineers at all experience levels who want to help build a massive data processing and modeling pipeline, using cutting-edge machine learning and network forensics. You'll be writing code that will make real-time decisions to prevent click fraud, and there's going to be a fire hose of data coming at you.

This particular job comes with as much responsibility as you can handle. You won't just be writing code; you'll be doing design, architecture, implementation, testing, support, and more. Passion, talent, and raw brains are more important than tons of industry experience.

Required experience:

* 3-5 years of software development in Java (top-notch C++ and C# engineers can apply, too)
* Superb understanding of data structures and algorithms
* Effective communication skills: you'll have to be able to fluently communicate with modelers/analysts, business people, and other coders
* Experience with Unix/Linux, and relational databases such as MySQL or Oracle
* BS or MS in Computer Science or equivalent

Desirable experience:

* Machine learning, information retrieval, TCP/IP internals
* Java frameworks: Hibernate, Servlets, Jakarta Commons
* Proficiency with scripting languages such as Python or Perl

About the company:

Fraudwall Technologies provides advertising networks and advertisers with a pioneering solution for identifying click fraud. Fraudwall combines cutting edge science with the aggregation of data and characteristics from networks, search engines, and advertisers into one complete scalable solution.

Fraudwall values honesty and integrity in dealing with each other and with our partners and customers. We offer competitive salaries, 401K, stock options, and health, dental, and vision plans. And of course, we provide an opportunity to work with world-class fraudfighters, systems builders, and serial entrepreneurs.

All positions are for our office in Palo Alto, California.

Send your resume to michael.radwin@fraudwall.net

Posted by mradwin at 03:08 PM

June 09, 2004

Senior C++ Windows hackers wanted in LA

A colleague of mine is looking to hire a couple of hard-core C++ hackers for two jobs in Santa Monica. Unlike most Y! jobs that want folks with lots of Unix experience, these ones are all about Win32 development.

If that's you, send me a resume and I'll pass it along.

Posted by mradwin at 05:35 PM | Comments (0)

January 14, 2004

Threads considered harmful

In the past month I've seen at least 3 messages on the development email lists at work asking questions about developing multi-threaded applications. From a software engineering standpoint, this troubles me.

I've always thought that multi-threaded apps in C/C++ are simply too difficult for most engineers to understand. There's too much non-determinism, too many race conditions, and too few language-level constructs to keep yourself from screwing up.

This isn't to say that some engineers can't figure it out, it's just that most engineers can't. I'll borrow a diagram from Ousterhout to illustrate this point:

What's Wrong With Threads?
John Ousterhout, Why Threads Are a Bad Idea (for most purposes), 1996. PDF slides from USENIX 1996 talk (local mirror).

I've been reading The Art of UNIX Programming by Eric Raymond over the past few weeks and it appears that he agrees with me. He avoids the Dijkstra-esque pun on threads being harmful and instead prefers the equally-provoking title Threads -- Threat or Menace?

My attitude about threads in Java is different because the language has supported the concept of threads since day one. It's still tricky to do threads correctly in Java, but not as painful as it is in C++.
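To make the race-condition worry concrete, here's a minimal sketch. It's in Python purely for brevity (not C++ or Java), and the names `Counter` and `run_workers` are made up for illustration. Several threads bump a shared counter, and the read-modify-write is wrapped in a lock, which is exactly the step that's easy to forget.

```python
import threading


class Counter:
    """A shared counter whose increment is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "value += 1" is a read-modify-write. Without the lock, two threads
        # can read the same old value and one of the updates gets lost.
        with self._lock:
            self.value += 1


def run_workers(num_threads, increments_per_thread):
    """Start num_threads threads that each increment one shared counter."""
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the total is always `num_threads * increments_per_thread`. Take the `with self._lock:` line out and the code still looks right, which is the whole problem.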

Posted by mradwin at 03:16 PM | Comments (3)

August 12, 2003

XML for Makefiles?

XML hasn't cured our ills or saved the world, but people keep using it for absurd purposes anyways. I finally took a quick look at Apache Ant today to see what all the fuss is about. Apparently with some additional components you can actually get Ant to build C/C++ code.

However, compare this build.xml for Ant:

<?xml version="1.0"?>
<project name="Hello" default="hello" basedir=".">
 <taskdef resource="cpptasks.tasks"/>
 <taskdef resource="cpptasks.types"/>
 <target name="hello">
  <cc name="gcc" outfile="hello">
   <fileset dir="." includes="hello.c"/>
   <compilerarg value="-O2"/>
  </cc>
 </target>
</project>

with this Makefile for gmake:

hello: hello.c
	gcc -O2 $< -o $@

I think I'll stick with gmake for now.

Posted by mradwin at 09:43 AM | Comments (4)

April 28, 2003

How to Be a Programmer

I stumbled across How to Be a Programmer, a 40-page paper by Robert L. Read, a principal engineer at Hire.com.

It's a relatively good paper so I'd recommend it to anyone who's new to the field or is a college student considering a career in Software Engineering. The distinction between Computer Science and Software Engineering, while subtle, is an important one. This paper focuses more on the Software Engineering side of things, spending a good 50% of the time discussing interpersonal skills and how to be effective working with your team.

The paper does need some polishing, however. A simple grammar checker would catch a bunch of the mistakes that interrupt the flow.

This reminds me a little bit of a great lecture I heard by Leslie Pack Kaelbling back in 1996 about why she loves programming. Like Read, Kaelbling believes that debugging is the most important part of programming, but she spins it slightly differently.

In short, debugging is like detective work. You've got a problem that you need to solve, but it's not obvious what the solution is. There are little hints here and there, and you begin to investigate each one. Each clue brings you closer and closer to the solution, but sometimes you realize that you just spent the last 6 hours going down a path that led nowhere, and you need to start over again. But at each moment, you always feel like you're making forward progress.

As a consequence, debugging becomes an all-engrossing activity. It's impossible to walk away from your desk when you're just 5 minutes away from solving the mystery and fixing the bug! Of course, 20 minutes later, you still feel like you'll get it nailed in another five.

Posted by mradwin at 05:10 PM | Comments (0)

April 10, 2003

MySQL Users Conference 2003

The MySQL Users Conference 2003 is running from April 10 - 12 in San Jose, CA. I was nearby in Sunnyvale for work on Tuesday & Wednesday this week, so I stuck around a day longer than my usual LAX-SJC travel schedule to catch the beginning of the conference.

Thanks to Zak for all of his hard work organizing the show. The first day was great; I'm sorry I'll be missing the rest of it.

P4100101.JPG

The State of the Dolphin Address

David Axmark and Monty Widenius, creators of MySQL (and co-founders of MySQL AB), kicked off the event with "The State of the Dolphin Address."

The first 15 minutes of the presentation was all bragging -- they listed off some big customers (such as Yahoo! and Slashdot), awards they had won, and some notable events in the lifetime of the product and company. Axmark takes great pride in the fact that Oracle introduced a MySQL migration kit in 2001.

Speaking a little bit about MySQL AB, Axmark indicated that they now have 12 full time engineers working on the server, and dozens of customer support folks. They've been making money via commercial licenses (for companies that don't want to GPL their code), and also from selling support, training, certification and consulting. The recently-introduced MySQL Certification program costs $200 (with a $50 discount until this fall).

As a product, MySQL has a variety of features. Aside from supporting "an extended subset" of the ANSI SQL89 standard, they support ACID transactions, User Defined Functions (unfortunately not the same thing as Stored Procedures), and a handful of SQL extensions (such as SELECT ... LIMIT). Client interfaces are available in over a dozen programming languages and operating systems.

It also provides about 5 different storage engines (MyISAM, InnoDB, Hash/InMemory, BerkeleyDB, etc.) which allow different tradeoffs depending on the application needs. For example, if you need fast row-level locking, you should pick InnoDB, and recognize that there will be some extra overhead on inserts.

Axmark also bragged a bit about the eWeek benchmarking tests which compared MySQL, Oracle9i, and a handful of other relational databases using JDBC drivers in a web server environment on Microsoft Windows. The MySQL performance curve (in terms of web pages per second and latency) matched Oracle's and outperformed all others.

Lastly, the two co-founders gave a high-level overview of the various server versions (3.23, 4.0, 4.1, 5.0) and some new interesting features coming soon.

P4100100.JPG

Schmoozing

After the keynote, I grabbed coffee and a pastry and chatted a bit in the hallway with Rasmus and Zak. Zak introduced me to Sascha (the one from Utah) and Monty. No business cards, just a few handshakes.

Someone (not the person in the picture) asked Rasmus a question about using the PHP mail() function to send hundreds of thousands of messages.

I was tickled to see Brad from Zend; I saw him in Israel just a couple of weeks earlier.

P4100106.JPG

Using MySQL Replication in Large Scales

I stepped into Jeremy's standing-room only talk on "Using MySQL Replication in Large Scales."

Being a MySQL novice, I didn't understand much of the talk. It's always a neat experience to surround yourself with a technical environment where everyone around you knows more than you do. It's a good way to pick up a bunch of ideas. There were a ton of questions posed by the audience during the talk; it's rare to see this high a level of interaction with an audience this large.

Aesthetic note: Jeremy finally switched his slide colors from white-on-blue to the more boring (but easy to read) black-on-white.

Lunch

Lunch was pretty good. Lots of vegetarian options. I sat at a table full of Yahoos and Brian Aker. It started to rain, so we all scrambled inside. We went to hear the talk about Lufthansa Systems porting MySQL to NetWare. Novell is desperate to remain relevant, and it looks like they're trying to embrace Open Source as a way to stay alive.

P4100099.JPG

A Guided Tour of the MySQL Source Code

Monty and Zak's talk on A Guided Tour of the MySQL Source Code was a great introduction to a codebase I've never read before. The 5.0 source code became available via BitKeeper just a few days ago.

Unfortunately, the talk was plagued by technical difficulties. The LCD projector just wouldn't cooperate with the laptop. Zak had a copy of the presentation on a floppy disk, but nobody else in the room had a laptop that could read it. Bummer.

Posted by mradwin at 10:21 PM | Comments (5)

April 03, 2003

OSCON 2003 registration

It looks like the O'Reilly folks have finally posted the abstract for my One Year of PHP at Yahoo! talk I'll be giving this summer in Portland, Oregon.

I filled out the speaker registration page today and picked some tutorials to attend. Here's what I'll be going to:

- Tutorial
  Session ID: 3959
  Title: Introduction to XSLT
  Date: 07/07/2003
  Time: 8:45am to 12:15pm
  Location: Columbia

- Tutorial
  Session ID: 4149
  Title: Designing and Creating Great Shared Libraries
  Date: 07/07/2003
  Time: 1:45pm to 5:15pm
  Location: Willamette

- Tutorial
  Session ID: 3982
  Title: Building Data Warehouses with MySQL
  Date: 07/08/2003
  Time: 8:45am to 12:15pm
  Location: Salon H

On Monday afternoon I'll probably bounce back and forth between Theodore Ts'o's "Designing and Creating Great Shared Libraries" and Bradley M. Kuhn's "The GNU General Public License for Developers and Businesspeople."

Instead of registering for something on Tuesday afternoon, I think I'll explore Portland. I've never been there before.

Early Bird registration is now open (through May 23rd) at http://conferences.oreillynet.com/os2003/

Posted by mradwin at 04:03 PM | Comments (0)

January 27, 2003

Upgrade my servers? Yeah, right.

In software engineering, laziness is a positive attribute. If one can accomplish the same task in 3 lines of code instead of 30, a good engineer opts for the 3-line version. That's why libraries of code are so popular.

Engineers are also risk-averse. Every change you make to the system can possibly de-stabilize it, so engineers like to leave a running system alone. Fred Brooks writes in The Mythical Man-Month that fixing a defect has a substantial (20 to 50 percent) chance of introducing another. Two steps forward, one step backwards.

But laziness and risk-aversion can be really negative attributes. How can you ever make any progress if you never touch the system? What if WordPerfect 5.1 was still the state of the art in 2003? We'd be missing out on a decade of improvements like WYSIWYG.

Consider the hypothetical case of the guy who's trying to get the other 599 engineers at the company to upgrade their web servers to version N, when the vast majority of folks are still running version M.

If I'm happily running version M, what's my incentive to upgrade? Sure, the guy who maintains the web server says it's got some great new features, is faster, gives you some better management tools, and fixes a couple of bugs. But I don't have time to skim the README to see if any of those features would be useful to me. Version M seems just fine to me, and something could go wrong if I go to version N.

Most importantly, senior management does not require that I pay any attention to the guy who maintains the web server. Even if I procmail all of the web server guy's messages into /dev/null, I can still get a good review at the end of the year just for keeping my crappy property up and running.

The bummer for the guy who works on the web server is that he also happens to be one of the folks who spent the past 2 years trying to improve development process at the company. He helped build a software package-management tool that can tell you in near-realtime what versions of what software are installed on what servers. And when he checks the stats, he finds out that a lot of folks are running really old versions of the web server: versions J, K, and L. Getting people to upgrade to version N is going to be even more difficult.

Maybe this explains why most of his co-workers are still running Netscape 4.08.

Posted by mradwin at 03:02 PM | Comments (4)

January 15, 2003

Rachel's a hacker

My friend Rachel who likes rabbits, always wears red, and talks about weird diseases has become a hacker. She's sportin' some slick CSS on her blog and kickin' around some phat SAS.

Welcome to the club, Rach. It won't be long before you're coding PHP like the rest of us.

Posted by mradwin at 05:36 PM | Comments (0)

January 06, 2003

I am a grad-school dropout

This makes it official. Today is the first day of the Winter 2003 quarter at UCLA, and I'm not enrolled in any classes. My short career as a part-time graduate student has come to an end.

I enrolled in the MSCS program at UCLA last year in part because I was hoping to round out my undergraduate education. I actually even considered doing a PhD, but I couldn't really make up my mind as to whether I was more interested in artificial intelligence or computational theory. (I figured that if I was going to throw myself into a 5- or 6-year program, I should have a much stronger sense of what I wanted to research.)

Instead of rounding out my education, it felt more like I was re-hashing the same stuff I learned as an undergrad. Don't get me wrong; UCLA's Computer Science faculty is superb, and the department and university have some really good resources. It's just that after working for 5 years in the industry, academia seemed to me like it was dealing with rather marginal problems.

Perhaps I didn't give it my best effort. I was only in the program part time (I was too chicken to give up my full time job) and maybe if I had taken more classes and devoted more energy to the program I would've gotten more out of it.

Maybe doing a PhD would've been a better choice. A Masters degree wouldn't have gotten me a significantly higher salary or qualified me to do more innovative research. The best I could've gotten out of it was the ability to teach CS at the community college level.

Or, perhaps I got such a fantastic education at Brown that I don't need me no mo' learnin'. ;-)

It's hard to say why it didn't work out. Apparently, I'm feeling a little melancholy about the whole thing.

Posted by mradwin at 01:01 PM | Comments (1)

December 11, 2002

Turing Tests for Humans

Article about Udi Manber in New York Times Science Section:

Human or Computer? Take This Test. As chief scientist of the Internet portal Yahoo, Dr. Udi Manber had a problem: how to differentiate human intelligence from that of a machine. By Sara Robinson.

The guy is brilliant. Amazon.com is lucky to have him. Yahoo! was pretty stupid to let him leave.

Posted by mradwin at 10:55 AM | Comments (2)

December 05, 2002

Udi Manber: The First 10 Years on the Web

Introduction to Algorithms: A Creative Approach

Udi Manber gave the first talk of this year's Jon Postel Distinguished Lecture Series today at UCLA.

It seems fitting that I should have a link to Udi's book on Amazon.com at the beginning of my review of his talk; he started working for Amazon just about a month ago.

While a handful of professors and grad students scrambled around trying to get the laptop to work correctly with the LCD projector, Udi spoke a bit about his personal history as the Web developed. He mentioned his contributions to the field, including suffix arrays (1989), agrep (1991), glimpse (1992), and even the web's first screen scraper (1996).

What makes the web so fundamentally new and exciting

When Udi returned from a sabbatical in 1993, he was very excited about how the web was going to change everything. His colleagues cautioned him, "But there's nothing new in the Web. We've done it all before. The web is just databases, networks and information retrieval all over again." He acknowledged that his peers were correct in some respects, but scale is what makes the web fundamentally new: the sheer number of users, and the amount of content. He also compared the ubiquity of the web to the advent of television:
  • TV didn't invent storytelling
  • TV didn't invent motion pictures
  • TV didn't invent actors
  • It wasn't even in color
  • But it's in everyone's home!

Because everything on the web is traceable, Udi feels that the data available to websites also allows companies to create a fundamentally different experience:

  1. More data == better experience. For example, an Amazon.com product detail page shows not only the price of a product, but also related items based on what customers bought, editorial and user-generated reviews, and sometimes even scanned excerpts from a book.
  2. Instant data == instant QA. Companies get instant feedback from users both in the form of emails and also what customers do and don't click on or buy. Any problems with the software are noticed quickly and solved faster, so the company loses less money.
  3. Flexible data == better business decisions. By running controlled experiments (Amazon calls them A/B tests) the company can decide whether a new feature should be placed on the left or right side of the page, or whether the color should be blue or green. Almost every new feature is first tested by showing it to a random sampling of the user base to see how they react to it. It's really easy to see after some small amount of testing if something is going to make more money or improve the user experience.
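One common way to pick that random sampling is to hash each user ID into a bucket, so the same user always lands in the same group. Here's a rough sketch of that idea. It is not a description of Amazon's system; `assign_variant`, the experiment name, and the 10% treatment share are all assumptions for illustration.

```python
import hashlib


def assign_variant(user_id, experiment, treatment_share=0.10):
    """Deterministically place a user in 'treatment' or 'control'."""
    # Salting with the experiment name keeps buckets independent
    # from one experiment to the next.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 8 bytes of the hash onto buckets 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "treatment" if bucket < treatment_share * 10_000 else "control"
```

Calling `assign_variant("user-42", "button-color")` twice gives the same answer both times, so a returning customer doesn't see the button flip between blue and green.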

Udi gave an example of a feature that the company tested. When you're about to purchase a product, they look through all of your past purchases to see if you've already bought that item from Amazon.com. If you have, they pop up a big red warning telling you that you might be buying a duplicate item. There are some legit reasons why you'd want a duplicate; maybe you lost the item, or maybe it's going to be a gift. But many times it turns out that people put a CD in their shopping cart simply because they forgot that they already own it. So Amazon developed this feature and tested it out.

Sure enough, it decreased sales, because much of the time the consumer didn't need a duplicate. But Amazon decided to adopt the feature anyways! Even though it meant less revenue in the short term, the better user experience by not having to return an item (hopefully) translates to increased customer loyalty and therefore more long-term revenue.

Search

Udi spent a bit of time talking about the importance of Search. He described what he sees as 4 generations of web search:
  • 1992-1993: index data from selected sites (Harvest, archie)
  • 1995: collect data from the entire web (Lycos, AltaVista, InfoSeek, Inktomi)
  • 1998: it's all about relevancy, stupid! (Google)
  • 2001: it's all about monetization, stupid! (Overture)
  • and the next generation of Web Search is yet to come

What is missing from Search today? Udi pointed out a bunch of problems waiting to be solved:

  • Understanding the query (these days we're still treating search queries as strings of characters)
  • Understanding the users
  • Personalization (instead of today's "democratic" search engines which show everyone the same results for a particular query, should we customize the search results based on what we know about the user?)
  • Helping the user with query refinement
  • Better visualization of search results (something better than pages and pages of text, but also something easy enough for people to understand)
  • Anti-SPAM (there are hundreds of companies in the Search Engine Optimization business who are essentially spamming Google to improve rankings of particular sites.)

E-Commerce

Udi prefaced his comments about e-commerce by pointing out that "business" and merchants are hated in almost all cultures, yet somehow commerce/trading started as early as 4000 BC. Why? Because the alternative for acquiring goods is war and that doesn't scale too well.

He spoke a bit about the beginnings of Amazon.com (Jeff Bezos' garage) and showed the audience a screen shot of what Amazon's home page looked like in 1995 complete with LOTS OF TEXT IN SMALL CAPS. We've come a long way, baby.

Udi then moved on to discuss in broad terms some of the problems involved in order fulfillment. Deciding what products to ship from what distribution centers and what to order from publishers or distributors involves all sorts of combinatorics and traveling salesman problems. He gave a particularly hairy example of a Stochastic Linear Program used to optimize shipping of an order of just 2 books. Most of these problems are exponential in complexity, and the site has got only 500 milliseconds to make an intelligent decision so it can tell the user how much shipping is going to cost for their order.
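To get a feel for why this blows up, here's a brute-force toy. It is not the Stochastic Linear Program from the talk, just an exhaustive search over every way of assigning items to distribution centers. The function name, the flat per-box shipping cost, and the data layout are all made up for illustration.

```python
from itertools import product


def cheapest_assignment(items, centers, stock, ship_cost):
    """Try every assignment of items to centers and return the cheapest.

    stock[center] is the set of items that center has on hand.
    ship_cost[center] is a flat cost for one box shipped from that center.
    Returns (cost, plan), or (None, None) if no assignment is feasible.
    """
    best_cost, best_plan = None, None
    # len(centers) ** len(items) combinations: exponential in the order size.
    for plan in product(centers, repeat=len(items)):
        if any(item not in stock[c] for item, c in zip(items, plan)):
            continue  # some center in this plan doesn't stock its item
        cost = sum(ship_cost[c] for c in set(plan))  # one box per center used
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, dict(zip(items, plan))
    return best_cost, best_plan
```

Two books and two centers means only 4 combinations. Twenty items and ten centers means 10^20 of them, which is not going to finish in 500 milliseconds.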

Udi was hoping to talk about Security, too, but he ran short on time. Instead he took some Q&A from the audience. Many questions had to do with specifics about Amazon's business and development culture, which Udi couldn't really answer because he's only been there a month. When asked about what he would change about academia given his experience in both worlds, he said he wanted to see more of a focus on solving real problems. Too many toy problems are given to students just for the sake of learning. As a result, academics don't often understand the problems of real users. To help remedy this, he would be interested in providing academic institutions some of Amazon's real data to use for teaching algorithms and modeling.

Lastly, Udi announced that he would be available on Friday morning at UCLA to speak to students about jobs at Amazon.com. I'm guessing that he's building up a kewl R&D team and wants a crop of freshly minted PhDs.

Posted by mradwin at 06:32 PM | Comments (0)

December 04, 2002

Joe Andrieu: Carpe Diem or Caveat Emptor?

I'm off to UCLA to hear a lecture for my CS239 class. Here's the abstract:

For the prepared and alert entrepreneur, "Opportunity knocks far more than once." Indeed, as the subtitle implies, the challenge is to recognize the right opportunity and then stay focused on it. Many factors can lure one into taking the wrong direction or scuttle seizing the right opportunity when it arises. Strong emotional attachments to effort already expended and the associated dream of the end game can be seductive, substituting wishful thinking for sober analysis. Pressure from investors anxious to cash out or founders coveting the image of an IPO may overcome a less sexy but correct private sale alternative. Distinguishing reality from subtly masked fantasy may well be the keystone of leadership talent. Having been through multiple ventures during the last 10 years, today's speaker will illuminate the desiderata and pitfalls attendant to deciding when to act and which course to take among competing alternatives.

Sounds like a cool lecture. Tomorrow, I'm going to the CS201 talk entitled The First Ten Years on the Web: A Personal Perspective by Udi Manber, Chief Scientist, Amazon.com.

Posted by mradwin at 03:38 PM | Comments (2)

November 06, 2002

Prime numbers

Last night on the plane ride home I was reading a copy of Dr. Dobb's Journal, a magazine for programmers (I think I got a free subscription to this when I registered for PHPCon). I came across Michael Swaine's column and read about a polynomial-time algorithm for testing primes that was discovered this summer.

I know this is old news (blogs are supposed to be up-to-the-minute, right?) but it's still totally fascinating for anyone who understands how public-key cryptography works (I learned about it in Math 42 at Brown).

Although this algorithm is considered super-fast for what it does, it is actually slower than the ones used by RSA and PGP when generating new public/private key pairs. The difference is that this new algorithm has the distinct advantage of telling you definitively whether or not a number is prime. The more common approach is to run a probabilistic algorithm so that there still exists a possibility that the number you're evaluating is not prime, but it's more likely that you'll be struck by lightning. Although the traditional probabilistic approach is still faster than this new technique, it doesn't give you the 100% confidence that some people (bankers?) would rather have.

From a practical standpoint, 99.99999999999999% confidence ought to be enough for anyone, because insurance policies aren't that expensive when you're talking about covering something that has a 0.00000000000001% likelihood of happening.
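For the curious, here's roughly what a probabilistic test looks like. This sketch uses Miller-Rabin, one widely used test of this kind; I'm assuming it as a stand-in, since nothing above says which test any particular RSA or PGP implementation runs. The function name and the default of 27 rounds are my own choices for illustration.

```python
import random


def is_probable_prime(n, rounds=27, rng=None):
    """Miller-Rabin test: False means composite, True means probably prime."""
    if n < 2:
        return False
    # Cheap trial division by a few small primes first.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    rng = rng or random.Random()
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # base a is a witness that n is composite
    return True
```

A composite number slips through one round with probability at most 1/4, so after k independent rounds the chance is at most 4^-k. With 27 rounds that's about 5.6 x 10^-17, which is below the 0.00000000000001% figure above.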

But from a theoretical computer science standpoint, this discovery is just plain cool. If they could only prove that P != NP, I'd be really psyched.

Posted by mradwin at 12:14 PM | Comments (0)

Copyright © 2007 Michael J. Radwin. Some rights reserved.