Designing and Creating Great Shared Libraries

Theodore Ts’o spoke about Designing and Creating Great Shared Libraries. It was a truly geeky talk, sprinkled with interesting historical trivia and packed with really useful guidelines and real-world examples.

He started out by describing his personal history with shared libraries, including his involvement with Kerberos V5 and the Linux Standard Base. As a motivating example, Ted pointed out a flaw in the ELF shared object model (used, for example, by Linux and FreeBSD), which doesn’t have the concept of namespaces for the symbols contained in shared objects. You can end up with a real headache if

  • Shared library “A” uses db2
  • Shared library “B” uses shared libraries “A” and db3
  • Application uses shared libraries “A”, “B”, and db4

Oftentimes this manifests itself in core dumps, because identically named symbols from the different db versions collide with each other.

Most people understand API (Application Programming Interface) compatibility (issue: source-level compatibility) but many people don’t think about ABI (Application Binary Interface) compatibility (issue: link-time compatibility). In addition to keeping all of your C function signatures around, you’ve also got to make sure that none of the argument types (or return types) change.
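To make the API/ABI distinction concrete, here is a minimal sketch of my own (not from the talk). The structure names and the old_client_read_y() helper are hypothetical. It assumes a library that inserts a field into a public structure between releases: callers still compile against the new header, but a binary built against the old header keeps reading the field at the old offset.

```c
#include <stddef.h>
#include <string.h>

/* Release 1 of a hypothetical public structure. */
struct point_v1 {
    int x;
    int y;
};

/* Release 2 inserts a field in the middle. Source code that only uses
 * p->x and p->y still compiles, so the API looks unchanged. */
struct point_v2 {
    int x;
    int flags;
    int y;
};

/* Where each release expects to find `y`. */
size_t offset_y_v1(void) { return offsetof(struct point_v1, y); }
size_t offset_y_v2(void) { return offsetof(struct point_v2, y); }

/* Simulates an already-compiled client: the release 1 offset of `y`
 * is baked into its machine code, whatever the library now hands it. */
int old_client_read_y(const void *point)
{
    int y;
    memcpy(&y, (const char *)point + offsetof(struct point_v1, y), sizeof y);
    return y;
}
```

Handing a struct point_v2 to old_client_read_y() returns the value stored in flags rather than y, which is exactly the kind of silent breakage that source-level checks never catch.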

From a portability perspective, Ted recommends that you “avoid global variables in shared libraries at all costs.” But in 2003, why care about portability? “There’s a disease going around where people think that all the world is Linux. It used to be that people thought that all the world is VAX, then all the world was Solaris, now all the world is Linux.”

Tangent: Performance-sensitive PIC (position independent code) libraries have a minor disadvantage on the x86 chip because there aren’t many general-purpose registers. Ted has noticed a 5% (or more) performance hit in some cases using -fPIC because the compiler essentially needs to reserve one of those registers for the relocation and can’t use it for algorithm-specific storage.

Another tangent: Try to remain bug-for-bug compatible. For example, the Linux libc (back in the version 4 days) changed at one point so that calling fclose() twice would result in a core dump. This was considered a good thing, since calling fclose() twice is considered wrong to begin with, and it would be better for the programmer to realize this sooner and fix the bug than to have some other mysterious bug appear that’s harder to track down. Apparently a well-known application (Netscape) incorrectly called fclose() twice, and when users upgraded their libc to the next minor release, it started crashing. Whose fault was it? Netscape’s or the libc author’s?

After seeing a live demo of how to build a shared library and link an application against it, Ts’o spent quite a bit of time on a feature called ELF Symbol Versioning which allows you to provide multiple implementations of a function that get automatically selected by the application depending on when they linked against a shared library. He spoke about some of the differences between the Solaris and Linux implementations (mapfiles vs. the FSF __asm__(".symver ...") extension).

Ts’o warned the audience that this technique should rarely be used. A couple of examples of when it might be appropriate are when you want to preserve bug-for-bug compatibility, or when a poorly-designed API is so enshrined that you can’t change it (e.g. getopt(), stdio functions, or strtok()).

During the break we chatted about whether the ELF Symbol Versioning feature would work on FreeBSD (which has been using ELF since the 3.0 release). Ts’o suggested that it would definitely work if we were using the GNU ld (which I don’t think we are) or that it might work if the FreeBSD folks had implemented the same functionality into the linker. Neither of us knew the answer, but a guy sitting nearby tried it out and said that it worked for him.

After the break, Ts’o switched gears to talk about How To Do It Right. In brief, he gave the following high-level guidelines:

  1. Use public and private header files. Only expose the parts of your API that you really need to expose.
  2. Use “namespaces” by prefixing all functions with a common string (such as “ext2fs_”)
  3. Avoid exposing data structures. Use opaque pointers and (non-inline) function accessors.
  4. If you must use public data structures, reserve spare data elements for later additions.

    int spare_int[8];
    long spare_long[8];
    void *spare_ptrs[8];

  5. If you must use public data structures, never reorder or delete structure fields. Add new fields to the end or use the reserved space.
  6. Use structure magic numbers. At the beginning of each data structure, store a unique 4-byte magic number. The library can then do run-time checking to make sure that the right data structure is passed to the right function.
  7. Don’t use static variables.
  8. Be consistent about caller vs. callee memory allocation. There are pros and cons both ways, but Ts’o prefers callee allocation.
  9. Consider doing Object-Oriented programming in C. Simulate data encapsulation via opaque pointers and virtual functions via function pointers, and don’t bother with class inheritance (or use void * pointers or unions and type variables if you really need it).
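Several of these guidelines fit together naturally. Below is a rough sketch of my own, not code from the talk, combining a common prefix (2), an opaque pointer with accessor functions (3), a structure magic number (6), and callee allocation (8). The library name "mylib", the counter type, the error codes, and the magic value are all made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* ---- What would go in the public header of the hypothetical "mylib" ---- */
typedef struct mylib_counter mylib_counter;  /* opaque: layout stays private */

#define MYLIB_OK            0
#define MYLIB_ERR_NOMEM     1
#define MYLIB_ERR_BAD_MAGIC 2

int  mylib_counter_open(mylib_counter **out);            /* callee allocates */
int  mylib_counter_add(mylib_counter *c, long n);
int  mylib_counter_get(const mylib_counter *c, long *value);
void mylib_counter_close(mylib_counter *c);

/* ---- What would stay in the private header or .c file ---- */
#define MYLIB_COUNTER_MAGIC 0x4D4C4331u  /* arbitrary 4-byte tag ("MLC1") */

struct mylib_counter {
    uint32_t magic;  /* first field, checked by every entry point */
    long     value;
};

/* Returns nonzero if `c` looks like a live mylib_counter. */
static int mylib_counter_valid(const mylib_counter *c)
{
    return c != NULL && c->magic == MYLIB_COUNTER_MAGIC;
}

int mylib_counter_open(mylib_counter **out)
{
    mylib_counter *c = calloc(1, sizeof *c);
    if (c == NULL)
        return MYLIB_ERR_NOMEM;
    c->magic = MYLIB_COUNTER_MAGIC;
    *out = c;
    return MYLIB_OK;
}

int mylib_counter_add(mylib_counter *c, long n)
{
    if (!mylib_counter_valid(c))
        return MYLIB_ERR_BAD_MAGIC;
    c->value += n;
    return MYLIB_OK;
}

int mylib_counter_get(const mylib_counter *c, long *value)
{
    if (!mylib_counter_valid(c))
        return MYLIB_ERR_BAD_MAGIC;
    *value = c->value;
    return MYLIB_OK;
}

void mylib_counter_close(mylib_counter *c)
{
    if (!mylib_counter_valid(c))
        return;
    c->magic = 0;  /* best effort: make a stale handle fail later checks */
    free(c);
}
```

Because callers only ever hold a mylib_counter pointer, fields can be added to the structure in a later release without changing the ABI, and the magic check turns "wrong pointer passed in" into an error code instead of memory corruption.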

We also saw some case studies of common APIs that were done wrong, such as gethostbyname() and getopt() and the types of headaches that they cause.
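One headache with this style of API is hidden static state: gethostbyname(), for instance, returns a pointer to static data that the next call may overwrite. The sketch below is my own illustration rather than one of the case studies from the talk. Both functions are hypothetical; they contrast a static-buffer interface with one where the storage is passed in explicitly.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical old-style API: the result lives in a static buffer, so
 * every call overwrites what the previous call returned. */
const char *fmt_id_static(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "id-%d", id);
    return buf;
}

/* Hypothetical reentrant variant: no hidden state, the storage is an
 * explicit argument. Returns 0 on success, -1 if the buffer is too small. */
int fmt_id_r(int id, char *buf, size_t len)
{
    int n = snprintf(buf, len, "id-%d", id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With fmt_id_static(), keeping the pointer from a first call and then making a second call leaves both pointers referring to the second result. The reentrant variant here happens to use caller allocation; a callee-allocating version that returns freshly allocated memory would avoid the static buffer just as well.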

The last part of the talk focused on two topics: plug-ins and the GNU build tool chain. Ts’o gave a bunch of examples of how to use the dlfcn family of functions (dlopen(), dlsym(), and dlclose()) to develop a plug-in model for your application. We also got a high-level overview of autoconf, automake, and libtool which try to make it easier to write portable libraries and applications. It’s a good thing we didn’t spend too much time on these, as they can be extremely complicated beasts. Ts’o reminded us that these tools are designed with portability in mind; he pointed out that he’s seen projects that use these tools, yet only build on Linux!
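For readers who haven’t used the dlfcn functions, here is a minimal sketch of the plug-in pattern. It is written from memory rather than taken from Ts’o’s slides. The plugin_invoke() helper and the entry-point signature are my own inventions. Passing NULL as the path asks dlopen() for the running program itself, which makes the sketch easy to try without building a separate plug-in. On older glibc systems you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical plug-in entry point: takes a string, returns a size. */
typedef size_t (*plugin_entry_fn)(const char *);

/*
 * Open `path` (NULL means the running program and the libraries it has
 * already loaded), look up `symbol`, call it with `arg`, and store the
 * answer in `*result`. Returns 0 on success, -1 on failure.
 */
int plugin_invoke(const char *path, const char *symbol,
                  const char *arg, size_t *result)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before calling dlsym() */
    plugin_entry_fn entry;
    /* POSIX idiom for turning dlsym()'s void * into a function pointer. */
    *(void **)(&entry) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || entry == NULL) {
        fprintf(stderr, "dlsym: %s\n", err != NULL ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *result = entry(arg);
    dlclose(handle);
    return 0;
}
```

For example, plugin_invoke(NULL, "strlen", "plugin", &n) should find the C library’s strlen() through the program’s own handle. A real plug-in loader would pass the path of a shared object instead and would normally keep the handle open for as long as the plug-in’s code is in use.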