Designing and Creating Great Shared Libraries

Theodore Ts’o spoke about Designing and Creating Great Shared Libraries. It was a truly geeky talk, sprinkled with interesting historical trivia and packed with really useful guidelines and real-world examples.

He started out by describing his personal history with shared libraries by descibing his involvement with Kerberos V5 and the Linux Standards Base. As a motivating example, Ted pointed out a flaw in the ELF shared object model (used, for example, by Linux and FreeBSD) which doesn’t have the concept of namespaces for the symbols contained in shared objects. You can end up with a real headache if

  • Shared library “A” uses db2

  • Shared library “B” uses shared libraries “A” and db3
  • Application uses shared libraries “A”, “B”, and db4

Oftentimes this manifests itself in core dumps, because conflicting symbols from various different libraries collide with each other.

Most people understand API (Application Programming Interface) compatibility (issue: source-level compatibility) but many people don’t think about ABI (Application Binary Interface) compatibility (issue: link-time compatibility). In addition to keeping all of your C function signatures around, you’ve also got to make sure that none of the arguments (or return types) change.

From a portability perspective, Ted recommends that you “avoid global variables in shared libraries at all costs.” But in 2003, why care about portability? “There’s a disease going around where people think that all the world is Linux. It used to be that people thought that all the world is VAX, then all the world was Solaris, now all the world is Linux.”

Tangent: Performance-sensitive PIC (position independent code) libraries have a minor disadvantage on the x86 chip because there aren’t many general-purpose registers. Ted has noticed a 5% (or more) performance hit in some cases using -fPIC because the compiler essentially needs to reserve one of those registers for the relocation and can’t use it for algorithm-specific storage.

Another tangent: Try to remain bug-for-bug compatible. For example, the Linux libc (back in the version 4 days) changed at one point so that calling fclose() twice would result in a core dump. This was considered a good thing, since calling fclose() twice is considered wrong to begin with, and it would be better for the programmer to realize this sooner and fix the bug than to have some other mysterious bug appear that’s harder to track down. Apparently a well-known application (Netscape) incorrectly called fclose() twice, and when users upgraded their libc to the next minor release, it started crashing. Who’s fault was it? Netscape’s or the libc author?

After seeing a live demo of how to build a shared library and link an application against it, Ts’o spent quite a bit of time on a feature called ELF Symbol Versioning which allows you to provide multiple implementations of a function that get automatically selected by the application depending on when they linked against a shared library. He spoke about some of the differences between the Solaris and Linux implementations (mapfiles vs. the FSF __asm__(".symver ...") extension).

Ts’o warned the audience, that this technique should rarely be used. A couple of examples when it might be appropriate are for when you want to preserve bug-for-bug compatibility, or when a poorly-designed API is so enshrined that you can’t change it (i.e. getopt(), stdio functions, or strtok()).

During the break we chatted about whether the ELF Symbol Versioning feature would work on FreeBSD (which has been using ELF since the 3.0 release). Ts’o suggested that it would definitely work if we were using the GNU ld (which I don’t think we are) or that it might work if the FreeBSD folks had implemented the same functionality into the linker. Neither of us knew the answer, but a guy sitting nearby tried it out and said that it worked for him.

After the break, Ts’o switched gears to talk about How To Do It Right. In brief, he gave the following high-level guidelines:

  1. Use public and private header files. Only expose the parts of your API that you really need to expose.

  2. Use “namespaces” by prefixing all functions with a common string (such as “ext2fs_”)
  3. Avoid exposing data structures. Use opaque pointers and (non-inline) function accessors.
  4. If you must use public data structures, reserve spare data elements for later additions.

    int spare_int[8];

    long spare_long[8];

    void *spare_ptrs[8];

  5. If you must use public data structures, never reorder or delete structure fields. Add new fields to the end or use the reserved space.
  6. Use structure magic numbers. At the beginning of each data structure, store a unique 4-byte magic number. Library can do run-time checking to make sure that the right data structure is passed to the right program.
  7. Don’t use static variables.
  8. Be consistent about caller vs. callee memory allocation. Pros and Cons both ways, but Ts’o prefers callee allocation.
  9. Consider doing Object-Oriented programming in C. Simulate data encapsulation via opaque pointers, virtual functions with function pointers, and don’t bother with class inheritance (or use void * pointers or unions and type variables if you really need it).

We also saw some case studies of common APIs that were done wrong, such as gethostbyname() and getopt() and the types of headaches that they cause.

The last part of the talk focused on two topics: plug-ins and the GNU build tool chain. Ts’o gave a bunch of examples of how to use the dlfcn family of functions (dlopen(), dlsym(), and dlclose()) to develop a plug-in model for your application. We also got a high-level overview of autoconf, automake, and libtool which try to make it easier to write portable libraries and applications. It’s a good thing we didn’t spend too much time on these, as they can be extremely complicated beasts. Ts’o reminded us that these tools are designed with portability in mind; he pointed out that he’s seen projects that use these tools, yet only build on Linux!

“Urgent: MacOS X users, please turn off Rendezvous”

As Jeremy pointed out, the wireless network at OSCON was having problems this morning. During the break in the afternoon session, there were little laser-printed signs all around asking people to please disable Rendezvous as it’s causing interference. There were even instructions on how to turn it off!


sudo mDNSResponder stop

Perhaps the “Networking, simplified” motto should be renamed “Networking, all screwed up.”

Introduction to XSLT

Sitting in a small room with about 20 other folks, I’m hoping to learn something about XSL and XSLT. Our instructor for this half-day tutorial is Mike Fitzgerald of Wy’east Communications (whose website appears to be unavailable right now).

XSLT has been around for 3 or 4 years now, but this is the first time I’ve had an opportunity to look at it in any detail.

We started simple, with a basic transformation:


<!-- msg.xml -->

<msg/>

<!-- msg.xsl -->

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">

<output method="text"/>

<template match="msg">Found it!</template>

</stylesheet>

On the surface, XSLT looks simple and elegant. But things get complicated very quickly. Over the course of the next 3 hours, Mike built upon the basics, teaching us the syntax and concepts involved.

XSLT uses a language called XPath to access or refer to parts of an XML document. I quickly grew tired of all the magic characters that XPath uses: /, //, @, {}, *, ::, [], |, etc. It seems to me that the designers of XPath had love affair with braces, brackets, and other operators. Instead of doing some sort of human-readable query language, you end up with stuff that looks like id("foo")/child::para[position()=5]. Haven’t these folks ever heard of something called whitespace?

Even though I tend to think of things procedurally, I really do like the idea of using a declarative language to describe a way of transforming data into presentation. I guess when you’re coding XPath every day, the idea is to keep things as terse as possible; XPath excels at that.

However, when you start using XSLT Functions and Variables, things start to look more & more like a scripting language like PHP or Perl. Apparently you can’t do everything with the declarative approach.

XSLT also seems very well integrated with other XML-related concepts. You’ve gotta be namespace-savvy to get things right in XSLT.

Overall, it was a very good session. The pace was a little slow for me, but he did a couple of things really well:

  1. Almost every single slide was accompanied by an example. Mike stepped through the source code line-by-line, and then ran the examples live to show us how it all worked.
  2. He handed out CD-ROMs of all of the examples (and 3 or 4 XSLT processors) at the beginning of the talk so we could try the examples right then & there on our laptops.