Michael J. Radwin's blog

Tales of a software engineer who keeps kosher and hates the web.


Open Source

March 28, 2007

MySQL User Defined Functions for FNV (Fowler/Noll/Vo) Hash

Sometimes you only need a 32- or 64-bit hash function. One of my favorites at Yahoo and something we're using at Fraudwall Technologies is the FNV (Fowler/Noll/Vo) Hash.

If you'd like to use FNV inside of MySQL, you might find our udf_fnv.c useful. For example:

mysql> select FNV1A_64('The quick brown fox jumps over the lazy dog.');
+----------------------------------------------------------+
| FNV1A_64('The quick brown fox jumps over the lazy dog.') |
+----------------------------------------------------------+
| 75c4d4d9092c6c5a                                         | 
+----------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select FNV1A_32('The quick brown fox jumps over the lazy dog.');
+----------------------------------------------------------+
| FNV1A_32('The quick brown fox jumps over the lazy dog.') |
+----------------------------------------------------------+
| ecaf981a                                                 | 
+----------------------------------------------------------+
1 row in set (0.00 sec)

mysql> 

The functions behave similarly to the MySQL built-ins MD5() and SHA1() in the sense that they return hex strings. The module defines 32- and 64-bit versions of all three variants of the FNV hash: FNV-0, FNV-1, and FNV-1a. Enjoy.
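For readers who want to see the algorithm itself, here is a minimal Python sketch of FNV-1a. It is not the code in udf_fnv.c; the function name fnv1a is made up for illustration, and the constants are the published FNV offset basis and prime values. Like the UDFs, it returns a hex string. I have not checked its output against the MySQL examples above.

```python
# Published FNV parameters (offset basis and prime) for each word size.
FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a(data: bytes, bits: int = 64) -> str:
    """Return the FNV-1a hash of data as a zero-padded hex string.

    Hypothetical helper for illustration; not the udf_fnv.c implementation.
    """
    if bits == 32:
        h, prime = FNV32_OFFSET, FNV32_PRIME
    elif bits == 64:
        h, prime = FNV64_OFFSET, FNV64_PRIME
    else:
        raise ValueError("bits must be 32 or 64")
    mask = (1 << bits) - 1
    for byte in data:
        h ^= byte                # FNV-1a: XOR the byte in first...
        h = (h * prime) & mask   # ...then multiply, truncating to the word size
    return format(h, "0{}x".format(bits // 4))


print(fnv1a(b"a", 32))
print(fnv1a(b"a", 64))
```

FNV-1 uses the same constants but multiplies before the XOR, and FNV-0 starts from an offset basis of zero.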

Posted by mradwin at 03:56 PM | Comments (0)

July 26, 2006

php.ini hacks: --with-config-file-scan-dir and ini variable expansion

I whipped up a quick 3-minute presentation entitled php.ini hacks for today's PHP Lightning Talks session at the O'Reilly Open Source Convention. It demonstrates two features:

  1. The --with-config-file-scan-dir option to ./configure
  2. ini variable expansion ("open_basedir = ${open_basedir}:/tmp")

Why? Because George and Laura asked me to, and this is all I could think of with 20 minutes' notice. And because the ini variable expansion feature isn't documented anywhere on the php.net website except for a passing reference in the PHP 5 ChangeLog:

Added possibility to access INI variables from within .ini file. (Andrei)

Tomorrow I'll be giving a talk entitled Hacking Apache HTTP Server at Yahoo! It's a repeat performance of the well-attended presentation I gave at ApacheCon 2005.

Posted by mradwin at 06:12 PM | Comments (1)

March 22, 2006

Migrating MVHS Alumni Directory data from Berkeley DB to MySQL

I recently rewrote large parts of the MVHS Alumni Directory to use MySQL instead of Berkeley DB. I've been on paternity leave from Yahoo! for 7 weeks now, and this is one of the few projects on my todo list that I have actually completed.

I've been maintaining this list of alumni for over 10 years. It began as a bunch of Perl 4 scripts and a single text file (colon-delimited, a la /etc/passwd) back when I was an undergraduate in college, and has morphed over the years as I have moved from ISP to ISP.

I was forced to port it to Perl 5 at one point when one of my ISPs did an OS upgrade, and although I got it to work, there was no way I was going to go through the pain to make it use strict. Later, I rewrote all of the DBM access routines to use DB_File::Lock to avoid race conditions that occasionally corrupted the data.

At the end of last year, my ISP (DreamHost) upgraded their Linux distro from Perl 5.6 to Perl 5.8 and everything broke again. Plus, the Berkeley DB file format on their new distro was incompatible with the old files, so I had to recreate the files from a text dump. I got it working again with a little hackery, but still wasn't ready to spend the time to dump Berkeley DB for MySQL.

Well, it's finally done. The only new functionality is an RSS feed for each graduating class. It was fun to do a little bit of hacking.

The new version is about 7,000 lines of code, and it's still very ugly, largely because I have tried to adhere to the Principle of Least Change, and I wasn't such a great coder back in 1995. Download it if you so desire; it is released under the BSD License. The README needs a little updating, but the Makefile should actually work.

Posted by mradwin at 08:26 PM | Comments (3)

August 04, 2005

OSCON 2005 presentation slides online

My OSCON presentation slides (It's Time to Share: Calendar Data Interchange and HTTP Caching and Cache-busting for Content Publishers) are now online.

Grab the slides from radwin.org/michael/talks.

Posted by mradwin at 10:19 AM | Comments (0)

July 29, 2004

OSCON 2004 Sessions

I haven't attended too many tutorials or sessions this year. Yesterday I saw Jim Winstead's Practical I18N with PHP and MySQL and David Sklar's Cleaning Up SOAP.

Right now I'm sitting in Adam Trachtenberg's PHP 5 + MySQL 5 = A Perfect 10. He quipped that it really should've been called PHP 5 + MySQL 4.1 = A Perfect 9.1, but the O'Reilly folks didn't think the title was sexy enough.

Initially we looked at the mysqli ("MySQL Improved") extension which offers prepared statements, an Object-Oriented interface, and the ability to query the database over SSL.

Next, Adam started speaking about new MySQL 4.1 features. He gave some tips on how to use the new subselect functionality, reminding the audience to think carefully about whether to use = or IN, depending on whether the subselect returns a single row or multiple rows. Then he spoke about MySQL 5.0 features such as Stored Procedures, Cursors and Views.

Posted by mradwin at 02:22 PM | Comments (0)

July 28, 2004

HTTP Caching and Cache-busting for Content Publishers

Slides are now online (HTML, PPT) for today's talk on HTTP Caching and Cache-busting for Content Publishers.

Abstract: A user's web experience can often be improved by the proper use of HTTP caches. Radwin discusses when to use and when to avoid caching, and how to employ cache-busting techniques most effectively. Radwin also explains the top 5 caching and cache-busting techniques for large content publishers.

Posted by mradwin at 02:56 PM | Comments (2)

July 27, 2004

OSCON 2004

I just arrived in Portland, Oregon. I'll be speaking about HTTP caching and cache-busting at the O'Reilly Open Source Convention tomorrow. If the talk goes well, I'll propose it for ApacheCon this fall.

The conference hotel was all booked up by the time I made my travel arrangements, so I'm staying at the closest available hotel (which is about a mile away). Not sure if there's something else going on here in Portland this week or if OSCON's attendance spiked this year.

Posted by mradwin at 04:52 PM | Comments (1)

May 12, 2004

stubgen 2.06

Today I released stubgen 2.06, the first release since 1998.

stubgen is a C++ development tool that keeps code files in sync with their associated headers. When it finds a member function declaration in a header file that doesn't have a corresponding implementation, it creates an empty skeleton with descriptive comment headers.

Last week Raphael Assenat sent me a message suggesting two new command-line flags to customize the output to his liking. He included a very clean patch to implement the feature and his code worked perfectly. He even included manpage updates in his diff! This is exactly the way Open Source is supposed to work.

I took the opportunity to remove copies of getopt() and basename() that were bundled with the distribution since they're found in any modern libc. Doing so also let me change the license from GNU to BSD, since I no longer want to contribute to RMS's zealotry.

stubgen's parser does not conform to the latest C++ standard. It's a gigantic hack that I created when I was teaching myself lex/yacc. Hacking the yacc grammar further probably isn't a good idea, since C++ isn't an LALR(1) language anyways. It really oughta be rewritten to use a real C++ parser library.

Posted by mradwin at 05:13 PM | Comments (0)

July 23, 2003

The Cathedral, The Bazaar, and Apache

A couple of weeks ago I read Eric Raymond's The Cathedral and The Bazaar, a collection of essays about Open Source software. Raymond writes quite well for a techie (either that or he has a superb editor), and the book is coherent. I didn't agree with most of the book, but I think it's important to keep abreast of what other folks are writing about the space.

Despite my general disappointment in the book, Homesteading the Noosphere was quite good. In an essay describing how "ownership" of Open Source projects works, Raymond accurately states the previously unwritten code of behavior. Projects have owners. Contributions are welcome, especially when they're written well. Project ownership can be transferred. Forking is strongly discouraged, although sometimes necessary as a last resort when the owner won't accept changes and refuses to relinquish control of the project.

The Homesteading the Noosphere essay has actually prompted me to think a little bit about what's going to happen with the Apache HTTP Server. The Apache Software Foundation is currently maintaining two separate versions of this product, 1.3.x and 2.0.x (and is also working on 2.1.x). Although the 2.0 server has been stable and "recommended" for over a year now, there are lots of organizations that are still using the 1.3 platform. The ASF would like folks to move to 2.0, but the fact that they're still making 1.3.x releases indicates that they recognize that migrating to 2.0 is no small undertaking. When there are security problems (and sometimes features) these changes are always made in 2.0 first, but need to get "backported" to 1.3.

But what if maintaining two separate products became too cumbersome and the ASF decided to stop making 1.3.x releases? I've wondered privately if any of the organizations that have a substantial investment in Apache/1.3 would want to take over the codebase (i.e. fork it). What would happen to the Apache community if someone decided to make an Apache/1.4 release? If the development was split across two projects, would both lose momentum (and therefore market share)? Would the vast majority of folks stand by the ASF and swallow the complexity of the 2.x server, while a "rogue" bunch of hackers simply caused social turmoil with 1.4 but never really made it successfully as a project? Or vice-versa?

Regardless of technical or social reasons, something called "Apache/1.4" couldn't really happen without the ASF's blessing. Although the code is Open Source so you could re-use it for another project, the Apache License is written in such a way that derivative products aren't allowed to use the name "Apache". But maybe there could be a Hopi/1.4 or a Mohican/1.4 HTTP server...

As Raymond writes in Homesteading the Noosphere, the natural motivation is to avoid forking unless absolutely necessary. In the case of Apache HTTP Server, there are decent technical and social alternatives to this last resort. So I'd hazard to guess that we'll never see Apache/1.4.

Instead, we'll probably see at most two more Apache/1.3 releases before the code is officially declared deprecated (which will probably happen right around the time that Apache/2.1 is released). Folks who have put off the 1.3-to-2.0 migration effort will take a serious look at a 1.3-to-2.1 jump, and the vast majority of them will make the move over the next two years. Sure, there will always be some laggards who are stuck using Apache/1.3.31, but by the end of 2005 their numbers will be so small that they're not worth mentioning.

Posted by mradwin at 11:00 AM | Comments (2)

July 11, 2003

MySQL Scaling Pains

Jeremy Zawodny spoke Friday morning about MySQL Scaling Pains.

I'm still just waking up, so here are some abbreviated notes.

  • Security administration (don't just GRANT ALL PRIVILEGES ON *.* TO someuser, but think seriously about delegating privileges to separate users)
  • Size Limits (MyISAM default 4GB limit can be modified, you just need to know the magic incantation)
  • Lock Contention - consider using InnoDB instead of MyISAM if you have as many readers as writers. MyISAM tends to work fine when you've got 90-95% readers and just a few writers (or vice-versa) but you can run into lock contention when there are lots of both. InnoDB doesn't fix locking problems; it actually introduces some problems of its own.
  • ALTER TABLE is slow. Requires an exclusive write lock on the entire table, all queries will back up until it finishes. Plan ahead.
  • Disks often tend to be the bottleneck. You can add all of the CPU power in the world and it won't matter if it's waiting on a slow disk. Low seek times are more important than high transfer rates. RAID can help. If you have time, benchmark different disk combinations (suggested a tool called Bonnie++).
  • Load balancers. If you use one, choose the correct algorithm. Sometimes the "least connections" algorithm can make things worse. Often a simple "round-robin" algorithm works just great.
  • Handling many connections. Setting wait_timeout to a lower value will force idle connections to disconnect. Sometimes this can improve overall efficiency.
  • Data partitioning by servers (i.e. putting 1/Nth of your data on each of N clusters of servers). Instead of a single "users" table, you have 4 different tables ("users_abcdefg", "users_hijklmn", "users_opqrstu", "users_vwxyz") and the application needs to look at the first letter of the key to figure out which table to query.
  • Full-Text search is neat, but it has its limits. First, be sure to use 4.x, not 3.23. Also, it's not as flexible as other software.
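The data partitioning bullet above boils down to a small routing function in the application. Here is a minimal Python sketch of that idea; the table names come from the example in the notes, while the function name table_for_key and the choice to reject keys that don't start with a letter are my own assumptions.

```python
# Table names from the example above; each owns a range of first letters.
PARTITIONS = {
    "users_abcdefg": "abcdefg",
    "users_hijklmn": "hijklmn",
    "users_opqrstu": "opqrstu",
    "users_vwxyz": "vwxyz",
}

# Reverse lookup built once: first letter -> table name.
LETTER_TO_TABLE = {
    letter: table
    for table, letters in PARTITIONS.items()
    for letter in letters
}


def table_for_key(key: str) -> str:
    """Pick the table that holds this key, based on its first letter."""
    first = key[:1].lower()
    table = LETTER_TO_TABLE.get(first)
    if table is None:
        # Assumption: keys that don't start with a-z are an error here.
        raise ValueError("no partition for key %r" % key)
    return table


print(table_for_key("mradwin"))   # users_hijklmn
print(table_for_key("Zawodny"))   # users_vwxyz
```

A real deployment would also map each table to the cluster of servers that hosts it, which this sketch leaves out.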

Zawodny also inserted a small Yahoo! advertisement in his slides; Yahoo! is hiring engineers. His incentive is twofold. (1) Smart folks tend to go to OSCON, so it's a targeted audience, and (2) if you send him (or me) your resume we can get the employee referral bonus if you end up getting hired.

Posted by mradwin at 11:16 AM | Comments (3)

July 10, 2003

Why XML Hasn't Cured Our Ills or Saved the World

After lunch and a little bit of work-related email, I went to Randy Ray's Why XML Hasn't Cured Our Ills or Saved the World (slides).

The talk centered around five things Ray thinks we do wrong with XML:

  1. People are too quick to use XML. You have to ask yourself if it's really necessary. Is it just for buzzword-compliance?
    • If there is no reason other than the fact that there are XML parsers, then there probably is a simpler solution
    • If only a single consumer, there may be a more economical solution.
  2. People are too slow to use XML.
    • Plan ahead for more than one customer of data?
    • If another part of the system is already using XML for a more "legitimate" task, why not use XML for other things, too? (i.e. configuration data)
    • It isn't always an extra cost. If the data format (and therefore the parser) would be sufficiently complex, maybe using an XML parser would be easier?
  3. Lack of cooperation or sharing.
    • Not often due to malice, perhaps lack of central authority. Who moderates DTD repositories? Registries on xml.com and xml.org contain outdated information, and UDDI is too business-centric.
    • Example: difficult to find schema for recipes. Had to wade through 3 pages of Google results to eventually find RecipeML
    • Intellectual Property issues. For example, Microsoft hasn't opened up the XML formats for Office 2003. Compare to open formats like DocBook
  4. Misunderstanding the application of XML
    • XML is the "NetPBM" of generic data. (NetPBM broke new ground in image file format transformations by reducing an N * M problem to N + M).
    • People think that XML is only for "document" data.
  5. People want to make XML hard.
    • Tough topics make money. How can businesses sell books/tools/software/training/services when customers think that XML is "easy"? Vested interest in making it complicated.

In conclusion, Ray mused that no one technology is (yet) a universal solution and XML is no different when it comes to data formats. His charge to the audience: just think about XML before using (or not using) it. Self-described experts don't necessarily have all the answers.

Posted by mradwin at 05:07 PM | Comments (0)

Ruby for Perl Programmers

I stuck around for local software guy Phil Tomson's Ruby for Perl Programmers talk. This session was more technical, with the first code example showing up on the 4th slide.

Phil's slides are online, so I won't attempt to replicate them here.

Something listed as a "gotcha" actually seems to be a feature to me. Since all variables hold references to objects, you have to explicitly call .dup to clone an object. It's more Java-like than Perl, but it probably ends up being higher performance since you only make copies when you explicitly want them.

Posted by mradwin at 12:04 PM | Comments (0)

The Power and Philosophy of Ruby

Yukihiro Matsumoto spoke about The Power and Philosophy of Ruby on Thursday morning. The talk was all philosophy, no code. Very entertaining.

We started off by discussing natural languages and the Tower of Babel, with a comparison of Japanese and its use of ideograms versus English. Matsumoto said that he was heavily influenced by the science fiction novel Babel-17. In some part, the power of the "super-language" in this book inspired him to create the Ruby programming language.

He spoke about the importance of choosing good names; those that are short and well-chosen usually convey meaning very easily. He also spoke about the importance of the machine making it easier for humans (Moore's Law, evolution of programming languages to higher-level concepts). He feels it's important for programming languages to cause the programmer as little stress as possible, and pointed out that one metric of a good programming language is that the programmer still has time to go out and have fun.

However, Matsumoto made it clear that simplicity is not a goal of Ruby. After all, human thoughts are not simple, and programs are essentially complex things. Rather, the design adheres to the principle of least surprise. If some aspect of the language meets your expectation, then it's achieving its goal. Succinctness is highly valued because Matsumoto believes it leads to productivity and efficiency.

In Ruby, like in Perl, There's More Than One Way To Do It, but the language can encourage one way. For example, Ruby does allow global variables, but you have to put a $ character before globals. Since too many $ are considered ugly, it discourages use of globals. "Dangerous" methods in Ruby have a ! in their name, for example sort and sort!. The "dangerous" methods might be faster, but they have side-effects, and the ! character reminds you to be careful.

Posted by mradwin at 11:27 AM | Comments (5)

July 09, 2003

Perl Lightning Talks

Wandering around after lunch, I stopped by the Perl Lightning Talks (slides) session. I was delighted to hear Autrijus Tang's five-minute rap These are 1% of my favourite CPAN... in Chinese, followed by an English translation sung to the tune of These are a few of my favorite things... from The Sound of Music.
It was incredible. Standing ovation.

Allison Randal's lightning talk was a parody of Arlo Guthrie's Alice's Restaurant. "You can get anything you want / in Perl 6 development." Clever, but Autrijus is a hard act to follow.

Also notable was Dave Rolsky's talk on DateTime. Dave, like my friends Gabriel and Rachel, is from Minnesota.

Posted by mradwin at 03:47 PM | Comments (0)

OSCON Wednesday morning

I bounced around on Wednesday between a bunch of different sessions. In the morning, I did some last-minute touch-ups on my slides, then caught the tail end of John Coggeshall's Interfacing Java / COM with PHP. After my talk on One Year of PHP at Yahoo! (slides), I grabbed some lunch in the speaker's room. Shane asked me to collect some feedback from my co-workers about Komodo since they're starting to think about what might go into their 3.0 release.

I showed up a little bit late for Adam Trachtenberg's Introduction to Web Services in PHP: SOAP versus REST talk, but the room was packed so I couldn't find a seat. So I stuck my head inside Zak and Monty's Guided Tour of the MySQL Source Code to catch an updated version of what had changed since the users conference in April.

I also checked out Shane's Introduction to PEAR talk, but the conference room had run out of seats again. Too bad they didn't pick a bigger room for the PHP talks this year.

Posted by mradwin at 02:53 PM | Comments (0)

Tim O'Reilly: Paradigm Shift

Tim O'Reilly gave this morning's keynote address, "The Open Source Paradigm Shift". The talk was reminiscent of last year's Watching The Alpha Geeks keynote at ApacheCon, although now he is able to say the phrase "paradigm shift" with a straight face.

Largely the talk was trying to make the case that we shouldn't try to think about Open Source software in the traditional commercial software business model. Instead, we should recognize that the software (to some extent) has become a commodity, just like hardware has become a commodity. The true value in Open Source is the businesses that grow up around it. For example, nobody pays for Sendmail and Apache, yet thousands of ISPs make money from providing web/email hosting services for their customers.

His charge to the audience was to embrace the fact that Open Source software has become a commodity, and to start to think of it (and all of the services that have grown up around it) as a platform. If we can develop services that support collaboration and end-user customization, and the data flows freely enough, we'll somehow find a way to feed our families.

Posted by mradwin at 09:32 AM | Comments (3)

July 08, 2003

"One Year" slides now online

Slides for my talk tomorrow, One Year of PHP at Yahoo! are now available online.

The talk will be from 11:30am - 12:15pm in Salon D.

Posted by mradwin at 01:17 PM | Comments (0)

Building Data Warehouses with MySQL

John Ashenfelter spoke about Building Data Warehouses with MySQL. After surveying the audience with some questions about what database technology people use and how much data they store, he described what he felt was the one and only reason to create a data warehouse: to answer business questions.

The first two-thirds of the talk discussed DW in general and made very little reference to MySQL in particular.

One of Ashenfelter's "if you only learn 3 things from this talk" statements was architect for a data warehouse, but build a data mart. Data marts answer "vertical"-type questions. Each is focused on answering questions about one narrow business process. But marts should share a consistent view of the data from the warehouse. You can think of a data warehouse as a collection of standardized data marts.

Getting your definitions consistent is important. What's an order? The salesperson might think of an order as "I sold 59 baseball cards and I got $100" but the shipping department might send it out in 3 different shipments from 2 different order fulfillment centers. How many "orders" is that?

It's also important to standardize on how the DW represents business policies and practices. For example, is revenue booked at sale or collection? How do you define "top customer"? Someone who buys more than half a million dollars a year, or someone who buys more than once a week? Get these questions answered by the business people so when they use the DW they know what they're getting.

An interesting sidebar: never use anything "meaningful" for a key. Product numberings/SKUs will be guaranteed to change, merger or acquisition with another company means that you'll have to do customer id reassignments. Recommendation: use an int (not a varchar) which gives you flexibility for the inevitable change.

Ashenfelter described using a Star schema (not a snowflake schema) for representing the data. The DW should be centered around Facts which have Dimensions, but be sure not to normalize your Dimensions or you'll end up doing joins of 17 different tables for your queries. It may drive traditional relational database engineers crazy, but denormalized data means fewer joins and faster performance. Some extra redundancy is worth that performance boost.
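To make the star schema and the "meaningless key" advice concrete, here is a small Python sketch. It uses the standard library's sqlite3 module as a stand-in for MySQL, and every table name, column name and row is invented for illustration; none of it comes from Ashenfelter's slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables use meaningless integer surrogate keys. The source
    -- system's own id is just another attribute that is free to change.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_id  TEXT,
        name         TEXT,
        city         TEXT,
        region       TEXT   -- kept inline instead of normalized out
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day      TEXT,
        month    TEXT,
        year     INTEGER
    );
    -- The fact table sits in the middle of the star.
    CREATE TABLE fact_order (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL
    );
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?)", [
    (1, "C-100", "Acme", "Portland", "West"),
    (2, "C-200", "Initech", "Boston", "East"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
    (1, "2003-01-15", "2003-01", 2003),
    (2, "2003-02-01", "2003-02", 2003),
])
conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", [
    (1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0),
])

# One join per dimension, no matter how many attributes each dimension has.
rows = conn.execute("""
    SELECT c.region, d.month, SUM(f.amount)
    FROM fact_order f
    JOIN dim_customer c USING (customer_key)
    JOIN dim_date d USING (date_key)
    GROUP BY c.region, d.month
    ORDER BY c.region, d.month
""").fetchall()
print(rows)
```

In a snowflake schema, region would live in its own table and the same query would need an extra join to reach it.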

Next, we went through an example of a DW for Vmeals, a take-out/catering delivery service for businesses. We went through 6 steps for designing the DW:

  1. Plan the data warehouse design
  2. Create corporate metadata standards
  3. Pick a business process
  4. Determine the grain of the fact table
  5. Detail the dimensions of the facts
  6. Find the relevant facts

Speaking about MySQL in particular, Ashenfelter mentioned that MySQL 4.0 has greatly improved the speed of bulk insert, which is important for the E-T-L (Extract-Transform-Load) part of data warehousing. His basic model is to get data in batch from Microsoft SQL Server or Oracle via some sort of dump, do some transformation (for example, to denormalize the data), then load the data into MySQL.

A couple of interesting notes: using a staging environment is a good way to provide efficiency and concurrency (so folks can still query yesterday's data while you're preparing today's data). It also gives you a hook to do validation tests. For example, you could sum all of the January sales and compare whether or not the total matched what the computed total was yesterday. If it's July and the data changed, it indicates that something with your source data is wrong, and it's better to flag it so someone can investigate instead of releasing the data to production and giving the business folks an inconsistent view.
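The validation hook described above could look something like the following Python sketch. The function name, the dictionaries of per-month totals (kept in integer cents to avoid rounding noise) and the notion of a list of "closed" periods are all assumptions made for this example.

```python
def find_restated_periods(staged_totals, published_totals, closed_periods):
    """Return the closed periods whose staged total no longer matches
    what was published yesterday (or that are missing on either side)."""
    restated = []
    for period in closed_periods:
        old = published_totals.get(period)
        new = staged_totals.get(period)
        if old is None or new is None or old != new:
            restated.append(period)
    return restated


# Hypothetical monthly sales totals in cents.
yesterday = {"2003-01": 1250000, "2003-02": 980000}
today = {"2003-01": 1250000, "2003-02": 975000, "2003-07": 10000}

suspect = find_restated_periods(today, yesterday, ["2003-01", "2003-02"])
if suspect:
    # Hold the staged data back and flag it for a human to investigate.
    print("not promoting; totals changed for:", suspect)
else:
    print("staged data is consistent; safe to promote")
```

The current month (July in the example) is left out of the closed periods on purpose, since its total is expected to change every day.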

As the talk started wrapping up, Ashenfelter mentioned several Open Source tools (mostly written in Java) that work with MySQL for data warehousing. For E-T-L, he suggested CloverETL or Enhydra Octopus. For Reporting, he recommended Jasper Reports, jFreeReport, and DataViz. For OLAP tools, he mentioned Mondrian, JPivot, and BEE. For Delivery Frameworks, you could think about using Jetspeed, Webworks, or PHP-Nuke.

Posted by mradwin at 12:39 PM | Comments (0)

July 07, 2003

Designing and Creating Great Shared Libraries

Theodore Ts'o spoke about Designing and Creating Great Shared Libraries. It was a truly geeky talk, sprinkled with interesting historical trivia and packed with really useful guidelines and real-world examples.

He started out by describing his personal history with shared libraries, including his involvement with Kerberos V5 and the Linux Standard Base. As a motivating example, Ted pointed out a flaw in the ELF shared object model (used, for example, by Linux and FreeBSD) which doesn't have the concept of namespaces for the symbols contained in shared objects. You can end up with a real headache if:

  • Shared library "A" uses db2
  • Shared library "B" uses shared libraries "A" and db3
  • Application uses shared libraries "A", "B", and db4

Oftentimes this manifests itself in core dumps, because conflicting symbols from various different libraries collide with each other.

Most people understand API (Application Programming Interface) compatibility (issue: source-level compatibility) but many people don't think about ABI (Application Binary Interface) compatibility (issue: link-time compatibility). In addition to keeping all of your C function signatures around, you've also got to make sure that none of the arguments (or return types) change.

From a portability perspective, Ted recommends that you "avoid global variables in shared libraries at all costs." But in 2003, why care about portability? "There's a disease going around where people think that all the world is Linux. It used to be that people thought that all the world is VAX, then all the world was Solaris, now all the world is Linux."

Tangent: Performance-sensitive PIC (position independent code) libraries have a minor disadvantage on the x86 chip because there aren't many general-purpose registers. Ted has noticed a 5% (or more) performance hit in some cases using -fPIC because the compiler essentially needs to reserve one of those registers for the relocation and can't use it for algorithm-specific storage.

Another tangent: Try to remain bug-for-bug compatible. For example, the Linux libc (back in the version 4 days) changed at one point so that calling fclose() twice would result in a core dump. This was considered a good thing, since calling fclose() twice is considered wrong to begin with, and it would be better for the programmer to realize this sooner and fix the bug than to have some other mysterious bug appear that's harder to track down. Apparently a well-known application (Netscape) incorrectly called fclose() twice, and when users upgraded their libc to the next minor release, it started crashing. Whose fault was it? Netscape's or the libc author's?

After seeing a live demo of how to build a shared library and link an application against it, Ts'o spent quite a bit of time on a feature called ELF Symbol Versioning which allows you to provide multiple implementations of a function that get automatically selected by the application depending on when they linked against a shared library. He spoke about some of the differences between the Solaris and Linux implementations (mapfiles vs. the FSF __asm__(".symver ...") extension).

Ts'o warned the audience that this technique should rarely be used. A couple of examples of when it might be appropriate are when you want to preserve bug-for-bug compatibility, or when a poorly-designed API is so enshrined that you can't change it (e.g. getopt(), stdio functions, or strtok()).

During the break we chatted about whether the ELF Symbol Versioning feature would work on FreeBSD (which has been using ELF since the 3.0 release). Ts'o suggested that it would definitely work if we were using the GNU ld (which I don't think we are) or that it might work if the FreeBSD folks had implemented the same functionality into the linker. Neither of us knew the answer, but a guy sitting nearby tried it out and said that it worked for him.

After the break, Ts'o switched gears to talk about How To Do It Right. In brief, he gave the following high-level guidelines:

  1. Use public and private header files. Only expose the parts of your API that you really need to expose.
  2. Use "namespaces" by prefixing all functions with a common string (such as "ext2fs_")
  3. Avoid exposing data structures. Use opaque pointers and (non-inline) function accessors.
  4. If you must use public data structures, reserve spare data elements for later additions.
    int spare_int[8];
    long spare_long[8];
    void *spare_ptrs[8];

  5. If you must use public data structures, never reorder or delete structure fields. Add new fields to the end or use the reserved space.
  6. Use structure magic numbers. At the beginning of each data structure, store a unique 4-byte magic number. Library can do run-time checking to make sure that the right data structure is passed to the right program.
  7. Don't use static variables.
  8. Be consistent about caller vs. callee memory allocation. Pros and Cons both ways, but Ts'o prefers callee allocation.
  9. Consider doing Object-Oriented programming in C. Simulate data encapsulation via opaque pointers, virtual functions with function pointers, and don't bother with class inheritance (or use void * pointers or unions and type variables if you really need it).
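Guidelines 4 and 6 are really about C structs, but the run-time check is easy to demonstrate in a few lines of Python using the struct module to mimic a fixed binary layout. The record layout, the magic value and the function names below are all invented for this sketch; they are not taken from Ts'o's slides or from any real library.

```python
import struct

# Hypothetical layout: a 4-byte magic number first, two real fields,
# then eight reserved spare ints for later additions.
INODE_MAGIC = 0x494E4F44
INODE_FORMAT = "<IIi8i"   # magic, block_count, flags, spare_int[8]


def pack_inode(block_count, flags):
    """Build a record with the magic number in front and zeroed spares."""
    return struct.pack(INODE_FORMAT, INODE_MAGIC, block_count, flags, *([0] * 8))


def unpack_inode(buf):
    """Refuse to interpret a buffer that doesn't start with our magic."""
    (magic,) = struct.unpack_from("<I", buf, 0)
    if magic != INODE_MAGIC:
        raise ValueError("bad magic 0x%08x: not an inode record" % magic)
    _, block_count, flags, *_spare = struct.unpack(INODE_FORMAT, buf)
    return block_count, flags


record = pack_inode(12, 1)
print(unpack_inode(record))

# A buffer of the right size but the wrong type is caught up front.
wrong = struct.pack(INODE_FORMAT, 0xDEADBEEF, 12, 1, *([0] * 8))
try:
    unpack_inode(wrong)
except ValueError as err:
    print(err)
```

Because the spare slots are already part of the layout, a later version can give one of them a meaning without changing the record size that older callers were built against.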

We also saw some case studies of common APIs that were done wrong, such as gethostbyname() and getopt() and the types of headaches that they cause.

The last part of the talk focused on two topics: plug-ins and the GNU build tool chain. Ts'o gave a bunch of examples of how to use the dlfcn family of functions (dlopen(), dlsym(), and dlclose()) to develop a plug-in model for your application. We also got a high-level overview of autoconf, automake, and libtool which try to make it easier to write portable libraries and applications. It's a good thing we didn't spend too much time on these, as they can be extremely complicated beasts. Ts'o reminded us that these tools are designed with portability in mind; he pointed out that he's seen projects that use these tools, yet only build on Linux!

Posted by mradwin at 04:55 PM | Comments (0)

"Urgent: MacOS X users, please turn off Rendezvous"

As Jeremy pointed out, the wireless network at OSCON was having problems this morning. During the break in the afternoon session, there were little laser-printed signs all around asking people to please disable Rendezvous because it was causing interference. There were even instructions on how to turn it off!

sudo mDNSResponder stop

Perhaps the "Networking, simplified" motto should be renamed "Networking, all screwed up."

Posted by mradwin at 03:25 PM | Comments (2)

Introduction to XSLT

Sitting in a small room with about 20 other folks, I'm hoping to learn something about XSL and XSLT. Our instructor for this half-day tutorial is Mike Fitzgerald of Wy'east Communications (whose website appears to be unavailable right now).

XSLT has been around for 3 or 4 years now, but this is the first time I've had an opportunity to look at it in any detail.

We started simple, with a basic transformation:

<!-- msg.xml -->
<msg/>

<!-- msg.xsl -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
<template match="msg">Found it!</template>
</stylesheet>

On the surface, XSLT looks simple and elegant. But things get complicated very quickly. Over the course of the next 3 hours, Mike built upon the basics, teaching us the syntax and concepts involved.

XSLT uses a language called XPath to access or refer to parts of an XML document. I quickly grew tired of all the magic characters that XPath uses: /, //, @, {}, *, ::, [], |, etc. It seems to me that the designers of XPath had a love affair with braces, brackets, and other operators. Instead of some sort of human-readable query language, you end up with stuff that looks like id("foo")/child::para[position()=5]. Haven't these folks ever heard of something called whitespace?

Even though I tend to think of things procedurally, I really do like the idea of using a declarative language to describe a way of transforming data into presentation. I guess when you're coding XPath every day, the idea is to keep things as terse as possible; XPath excels at that.

However, when you start using XSLT Functions and Variables, things start to look more & more like a scripting language such as PHP or Perl. Apparently you can't do everything with the declarative approach.

XSLT also seems very well integrated with other XML-related concepts. You've gotta be namespace-savvy to get things right in XSLT.

Overall, it was a very good session. The pace was a little slow for me, but he did a couple of things really well:


  1. Almost every single slide was accompanied by an example. Mike stepped through the source code line-by-line, and then ran the examples live to show us how it all worked.
  2. He handed out CD-ROMs of all of the examples (and 3 or 4 XSLT processors) at the beginning of the talk so we could try the examples right then & there on our laptops.

Posted by mradwin at 01:49 PM | Comments (0)

Blogging OSCON 2003

I'm at OSCON 2003 in Portland this week.

I've created a new "Open Source" category in my blog for entries that I type up during this event. Most useful feature: built-in spell-check.

I'm also testing out my first post using the Zempt blogging tool.

Posted by mradwin at 08:56 AM | Comments (1)

February 23, 2003

One Year of PHP at Yahoo!

I will be speaking at the O'Reilly Open Source Software Convention 2003 this summer in Portland, Oregon.

The title of my 45-minute session is One Year of PHP at Yahoo!

The conference runs from July 7-11. Registration begins in April.

[Update 8 July 2003: Slides for the talk are now available online.]

Posted by mradwin at 06:19 PM | Comments (4)

Copyright © 2007 Michael J. Radwin. Some rights reserved.