I’m back at work after my paternity leave.
Apparently my team found a suitable replacement for me while I was out.
In 2002, Yahoo selected PHP for Web site development and began to phase out its own proprietary server-side scripting language. Three years later, Michael Radwin reflects on how the switch to PHP offered both technical challenges and productivity increases.
The first part of the presentation offers a look inside Yahoo’s decision-making process to adopt an open-source scripting language. Radwin addresses why Yahoo selected PHP over other languages, focusing on the performance and stability required to serve billions of page views a day.
In the second part, Radwin discusses Yahoo’s PHP development methodology, which has enabled its engineers to rapidly implement features while still creating software that is maintainable over long periods of time.
The Burton Group published a report entitled The P-Languages: PHP, Perl, and Python for Enterprise Scripting yesterday. I’m quoted twice in the article.
Page 11 (PHP in Web Development):
Not only is PHP used extensively throughout the Web, it is also used by some of the busiest websites in the world. For example Yahoo!, which serves up 2.85 billion page views a day and supports 345 million visitors a month, uses PHP for all its presentation logic. For Yahoo!, searching and delivering web content quickly is a mission-critical issue, as is the ability to quickly add new features and maintain existing code. According to Michael Radwin, engineering manager in the Infrastructure Group for Yahoo!, “All of our presentation logic is in PHP. We avoid putting presentation logic in C/C++ because of the longer code-compile-debug cycle.” Other busy websites that use PHP include the social networking site Friendster (http://www.friendster.com/), which switched from JSP to PHP in 2004, and Freshmeat.org, an open source resource site that uses PHP to process between 600,000 and 700,000 page views a day.
Page 12 (Perl in System Administration and Integration):
Burton Group found that Perl, more than any other language, is heavily used in UNIX and Linux system administration. Ford Motor Company, for example, has been using Perl with their UNIX systems in this capacity for years. In fact, it would be difficult today to find an organization that has a number of UNIX boxes that do not use Perl in some capacity. Michael Radwin of Yahoo! told Burton Group: “We use Perl all the time here for almost everything that’s not web-related and not super performance-related. It’s a superb general-purpose scripting language. We use it for all of the typical uses (text processing, system administration, algorithmic prototyping, automation, light data crunching, report generation).” Yahoo! owns 90 web properties (Yahoo! Mail, Yahoo! Store, etc.) and supports 345 million visitors per month.
Those Yahoo! statistics (pageviews, visitors per month) are from December 2004.
The 10-year anniversary of Yahoo! Inc’s incorporation is this week. We’re having a party at work on Wednesday to celebrate. They put a big tent up on campus this morning. Apparently Sugar Ray is going to be giving a private concert, and the weather has been threatening rain. Rumor has it that the www.yahoo.com site will have a special look on the anniversary. In the meantime, you can read about the company’s history.
I heard another rumor (so far unsubstantiated) that as part of the 10-year celebration, the company would be offering sabbaticals for long-time employees. SGI, for example, offers a 6-week paid sabbatical every 4 years. Alas, we don’t have that perk (although we do have three espresso bars staffed with full-time baristas).
I’ve been at Yahoo! for more than half of its 10 years. I’ve often dreamed of taking a short break to try something else for a change of pace. If the sabbatical rumor proves to be true, I’d be sure to use mine for a semester at the Jerusalem School of Kosher Culinary Arts.
Yahoo == Yet Another Hierarchical Officious Oracle? I don’t know.
I’m in Beijing and checked into the hotel. Extremely nice place. Complimentary broadband Internet.
Air China flight 984 from LAX ended up being delayed 12 hours due to mechanical problems. It’s a real bummer, since I was supposed to give a “Platform Overview” talk at the conference this morning, but I missed the entire first day. I’ve been rescheduled to speak tomorrow morning.
Ironically, as I cleared immigration, I heard over the loudspeaker that United Airlines flight 889 from SFO had just arrived.
My flight to China was delayed 10 hours due to mechanical trouble with the plane. We’re supposedly leaving at 12 noon today, but who knows if that will really happen.
I’m off to Beijing tonight to visit the Yahoo! China office.
There has been much discussion about open e-mail relays, but very little about open HTTP redirectors. An open redirector is hosted by foo.com, but will unintentionally send you to bar.com. This can have interesting effects on PageRank or can trick users into clicking on something that isn’t what it seems.
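As a rough illustration of what "open" means here, the sketch below shows a naive redirect handler that forwards to whatever address it is handed. The `url` query parameter and the `/redirect` path are hypothetical names chosen for the example, not taken from any real site.

```python
from urllib.parse import parse_qs, urlsplit


def open_redirect(request_url):
    """Return (status, location) for a redirector that trusts its input.

    The "url" query parameter is a hypothetical name used for illustration.
    """
    query = parse_qs(urlsplit(request_url).query)
    target = query.get("url", [None])[0]
    if target is None:
        return 400, None
    # No check on where the target points: this is what makes it "open".
    return 302, target


status, location = open_redirect("http://foo.com/redirect?url=http://bar.com/")
print(status, location)
```

The link starts with foo.com, but the browser ends up wherever the query string says.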
After many months of abuse by spammers, the rd.yahoo.com redirect server is now closed.
Yahoo! has used a redirect server for a long time for tracking clicks from one Yahoo! website to another.
Last year, spammers started using rd.yahoo.com in email messages to trick unsuspecting users into thinking that they were clicking on a Yahoo! website. They started sending out emails with links that looked like this:
Users saw the yahoo.com domain name and figured it must be some official Yahoo! site, not realizing that the server would redirect to another IP address. So we started blocking those types of URLs (easy to do since we’d never use a dotted-quad for anything legit). So the spammers switched to something a little more clever:
The trick here was a misuse of the clear-text “username:password@server” authentication feature. It made it look like you were accessing a yahoo.com URL, but in fact were going somewhere else. These were particularly insidious, since they didn’t even go through our redirect servers, so there was nothing we could do to block them. Microsoft got rid of the problem for us with an update to Internet Explorer 5 and 6 that simply disabled the feature altogether. Mozilla followed suit by displaying a warning dialog box when this type of URL is used:
You are about to log into the site “18.104.22.168” with the username “finance.yahoo.com,” but the website does not require authentication. This may be an attempt to trick you.
Is “22.214.171.124” the site you want to visit?
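Both tricks described above (a dotted-quad destination and a hostname-looking "username" placed before the `@`) can be spotted by parsing the link instead of eyeballing it. The following is a minimal sketch using Python's standard library. The function name `inspect_link` and the example address `192.0.2.7` (a documentation-only IP) are assumptions for illustration, not anything Yahoo!, Microsoft, or Mozilla actually shipped.

```python
import ipaddress
from urllib.parse import urlsplit


def inspect_link(url):
    """Report the real destination host and flag the two tricks above.

    inspect_link is a hypothetical helper written for this example.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Anything before "@" in the authority is userinfo, not the server name.
    has_userinfo = parts.username is not None
    try:
        ipaddress.IPv4Address(host)
        is_dotted_quad = True
    except ValueError:
        is_dotted_quad = False
    return {
        "host": host,
        "has_userinfo": has_userinfo,
        "is_dotted_quad": is_dotted_quad,
    }


# Looks like finance.yahoo.com at a glance, but the host is the IP address.
print(inspect_link("http://finance.yahoo.com@192.0.2.7/page"))
```

A link that sets either flag deserves suspicion, which is roughly the judgment the Mozilla dialog asks the user to make.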
So the spammers went back to abusing Yahoo!, but this time with actual hostnames:
This not only tricks email users, but when used on the web can (in theory) also influence PageRank-type algorithms.
That left us two options: maintain a whitelist (lots of server-side state to manage) or implement a digital signature algorithm. We went with the digital signature approach. So now you can safely click through to partner sites:
But if you try to recycle the same signature with a different URL, you’ll get a 403 Forbidden:
Finally, rd.yahoo.com does what it’s supposed to do and nothing else. Frustrated spammers out there have probably already started to abuse someone else.
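For readers curious how a signed redirect can work without a server-side whitelist, here is a minimal sketch. It is not the scheme rd.yahoo.com uses, which is not described here. It swaps in an HMAC (a keyed hash) rather than a true public-key digital signature, and the key, function names, and example URLs are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one would be kept private.
SECRET_KEY = b"example-secret"


def sign(target_url):
    """Compute a hex signature that is bound to one specific target URL."""
    return hmac.new(SECRET_KEY, target_url.encode("utf-8"), hashlib.sha256).hexdigest()


def check_redirect(target_url, signature):
    """Return (status, location): 302 if the signature matches, else 403."""
    expected = sign(target_url)
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return 302, target_url
    return 403, None


sig = sign("http://partner.example/offer")
print(check_redirect("http://partner.example/offer", sig))  # signature matches
print(check_redirect("http://spam.example/", sig))          # recycled signature
```

Because the signature is computed over the target URL itself, copying it onto a different URL fails the check, and the server needs to store nothing beyond the key.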