Technical Interview: movie SQL schema design

When interviewing candidates, sometimes we need to know if they grok SQL. Here’s a simple interview question that gives me an idea that someone has had to create a schema from scratch, vs. always using a schema that someone else created.

Setup

We’re going to design a movies database. Each movie has a title and year, one (and only one) director, and some number of actors. Actors can star in multiple movies. Directors can direct multiple movies. Some movies have the same title such as Ocean’s Eleven (the 2001 version directed by Steven Sodenbergh had George Clooney, Brad Pitt, Matt Damon, Julia Roberts, and many others, but the 1960 version was directed by Lewis Milestone and starred Frank Sinatra, Dean Martin and Sammy Davis Jr).

The schema should be normalized enough to avoid duplicating strings too much, and also to be able to efficiently answer questions like these two:

  • Who acted in Fight Club (1999)?
  • What are the 10 most recent movies that George Clooney starred in?

Questions

Use your judgement about what a good answer is in terms of what to name the tables, what keys they should have. To be consistent, we’ll ask every candidate these 3 questions:

  1. How many tables?
  2. What are the names of the tables?
  3. What are the names of each column in those tables? What are the types of each column?

When designing the schema, remind the candidate that we’re not looking to over-design the perfect, most flexible schema. We want the simplest possible schema that answer questions like the two aforementioned ones. We can mention that if time permits, we’ll ask the candidate to actually write the query for one of them.

Schema design

One possible answer (and there are many others that are acceptable):

movie
--------
id (int PK)
title (varchar)
year (int)
director_id (int)
 
person
----------
id (int pk)
name (varchar)
 
movie_actor
-----------
actor_id (int)
movie_id (int)

Sometimes candidates will get excited by the different roles (actors vs directors) and want to create a role table. Sometimes they’ll go to town on the person attributes (date of birth, splitting first and last names into separate columns). Some movie buff candidates will also point out that in real life, movies can have multiple directors (for example, The Matrix was directed by the Wachowski Brothers).

These are outside of the artificially simplified constraints we specified earlier, but can be fine if the candidate is really good at relational database design and fast at articulating their thoughts. But the candidate can also get distracted, which is a good simulation of what often happens with real-world business problems. Can the candidate focus on only the essentials?

If the candidate gets really distracted, you can invoke the YAGNI principle. This could be an opportunity to ask the candidate to share an anecdote about when they had to apply YAGNI.

Query

If the candidate is doing well, we ask them to write the SQL for either or both of the aforementioned queries.

What are the 10 most recent movies that George Clooney starred in?

SELECT m.title,m.year
FROM movie m, movie_actor ma, person p
WHERE p.name = 'George Clooney'
AND ma.actor_id = p.id
AND ma.movie_id = m.id
ORDER BY m.year DESC
LIMIT 10;

Again, there are many correct solutions (such as using an explicit JOIN instead of a natural join as we’ve done above). And yes, we know that Oracle doesn’t support LIMIT clauses, so you can do a nested query and then use the first ten rownums. The goal here isn’t to get into vendor-specific SQL syntax discussion, but more to see if they can write a query that uses more than one table.

“Unique” Random Numbers technical interview question

Dan Karp used to ask this question at Zimbra. I’ve always loved it.

1. warmup

Given this API from the java.lang standard Math class in the JDK:

/**
* Returns a double value with a positive sign,
* greater than or equal to 0.0 and less than 1.0.
* Returned values are chosen randomly with
* uniform distribution from that range.
*/
public static double Math.random();

Your task is to implement a random integer in the range [0,999]. Don’t overthink it; this is just a warmup.

Solution:

public class Foo {
public static int randomInt1000() {
return (int) (Math.random() * 1000);
}
}

Idea is to truncate a number like 123.99748 to 123 by casting a double to an int. No need to use Math.floor(), that just wastes a function call and does some floating-point arithmetic.

Some candidates will want to avoid hard-coding the 1000 constant, and will create a function signature like this:

public static int randomInt(int N);

Great. Either answer is totally fine.

2. introduce the idea of unique-ness

Things are about to get harder.

If I call your Foo.randomInt1000() function a bunch of times, I’ll get values like this:

316, 675, 22, 671, 724, 394, 31, 177, 997, …, 82, 671, 810, 433, …

If you call the function enough times, of course you’ll see the same number more than once.

Create a new function called uniqueRandomInt1000() that returns a random integer in the range [0,999] inclusive but doesn’t return duplicates. Your function will need to keep some state, and it’s up to you how to manage that state.

We’re looking for (1) a correct solution to the problem, and (2) an efficient solution. Correctness is more important than efficiency. But efficiency matters, too.

Edge case: if the caller calls uniqueRandomInt1000() more than 1000 times, the implementation needs to signal in some way that there are no more unique random numbers that can be returned because every number in the range [0,999] has been returned to the caller. Either return -1 or throw an Exception.

3.0. a naive, correct, but O(unbounded) solution

Keep an array of booleans or a java.util.BitSet, or a Set<Integer> of every number you’ve returned as an instance variable for your class. Every time the method gets called, generate a random integer in the range 0,999 and then check your instance variable to see if that number has been “seen” by the caller. If so, randomly generate another integer in the range 0,999 and repeat until you find an “unseen” number. Mark that number as seen and return to the caller.

3.1. a correct, O(N^2) solution using a List

Key idea here to avoid looping is to generate a random index in an ever-smaller range of numbers, and use that index to select a number from a LinkedList or an ArrayList.

In the ctor you populate a List with 1000 elements – every number between 0,999. That’s O(N) time.

Then, in the function, you generate a random index in the range 0,size()-1. size() is initially 1000, but shrinks over time as you remove() elements from the List.

get() followed by remove() is always O(N). With an ArrayList, you get constant time indexing into the position you want to retrieve (get()), but then you spend linear time copying elements over to compact the list (remove()). With a LinkedList you get linear scanning to move to the node at a particular index (get()), and then you get constant time to tidy up the list to remove that element (remove()).

If you ask for N random numbers from the class you get N * O(N) == O(N^2) cost.

3.2. solution using Java collections O(N)

This solution does O(2N) in the ctor and then O(1) for each call. It’s kinda “cheating” a little bit because all of the hard algorithmic work is done by a single function call Collections.shuffle().

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UniqueRandom {
private List numbers;
private int pos = 0;

public UniqueRandom(int n) {
numbers = new ArrayList(n);
for (int i = 0; i < n; i++) {
numbers.add(i);
}
Collections.shuffle(numbers);
}

public int nextInt() {
if (pos == numbers.size()) {
return -1;
}
return numbers.get(pos++);
}

public static void main(String[] args) {
UniqueRandom ur = new UniqueRandom(35);
for (int i = 0; i < 40; i++) {
System.out.println(ur.nextInt());
}
}
}

3.3. solution using an int[]

This solution does O(1N) in the ctor and then O(1) for each call.

public class UniqueRandom2 {
private int[] numbers;
private int avail;

public UniqueRandom2(int n) {
avail = n;
numbers = new int[avail];
for (int i = 0; i < avail; i++) {
numbers[i] = i;
}
}

public int nextInt() {
if (avail == 0) {
return -1;
}
int idx = (int) (Math.random() * avail);
int retval = numbers[idx];
numbers[idx] = numbers[--avail];
return retval;
}

public static void main(String[] args) {
UniqueRandom2 ur = new UniqueRandom2(1000);
for (int i = 0; i < 40; i++) {
System.out.println(ur.nextInt());
}
}
}

Thanks to Ryan Hamilton for reminding me about this so I had an excuse to dig up my sample code and post it online.

All-day untimed events in RFC 2445 iCalendar

Timely‘s WordPress Calendar plugin doesn’t interoperate well with RFC 2445 iCalendar feeds that contain all-day untimed events specified by both DTSTART and a DURATION:P1D. Instead of interpreting them as a one-day all-day event, it seems to interpret them as a two-day event.

I examined a half-dozen calendar apps to see how they generate these kinds of events and here’s what I found:

Apple Calendar.app (Mac OS X 10.9.2)

BEGIN:VEVENT
DTSTART;VALUE=DATE:20140312
DTEND;VALUE=DATE:20140313
SUMMARY:New Event
END:VEVENT

Google Calendar

BEGIN:VEVENT
DTSTART;VALUE=DATE:20140312
DTEND;VALUE=DATE:20140313
SUMMARY:Untitled event
END:VEVENT

Microsoft Outlook Mac 2011 (version 14.3.9)

BEGIN:VEVENT
DTSTART;TZID="Coordinated Universal Time":20140312T000000
DTEND;TZID="Coordinated Universal Time":20140313T000000
SUMMARY:New Appointment
END:VEVENT

Windows Live Calendar (Outlook.com)

BEGIN:VEVENT
DTSTART;VALUE=DATE:20140312
DTEND;VALUE=DATE:20140313
SUMMARY:Example
END:VEVENT

Yahoo! Calendar

BEGIN:VEVENT
SUMMARY:Example
DTSTART;VALUE=DATE:20140312
DTEND;VALUE=DATE:20140312
END:VEVENT

Hebcal.com

BEGIN:VEVENT
SUMMARY:Purim
DTSTART;VALUE=DATE:20140316
DURATION:P1D
END:VEVENT

AddThisEvent.com

BEGIN:VEVENT
DTSTART:20140312
DTEND:20140313
SUMMARY:Example
END:VEVENT

Since the overwhelming majority seem to use both DTSTART + DTEND, I may change Hebcal’s feeds to follow suit. I’ll need to do some interop testing with older versions of Microsoft Outlook for Windows first.

Farewell, htdig

htdig has been retired from radwin.org.

Screen Shot 2013-08-11 at 9.07.05 PM

Many years ago I set up a site search on radwin.org using a tool called ht://Dig. It was a C/C++ app that would spider/crawl a website and also serve out search results in an HTML-friendly fashion through a CGI.

The rest of the world moved onto Lucene/Solr, but I never bothered to make the move – in large part because htdig was good enough, and in part because DreamHost doesn’t  support Tomcat/J2EE in their shared hosting environment (I don’t blame them).

The last time I successfully compiled htdig was on October 31, 2005. DreamHost has upgraded hardware several times, and the old 32-bit binaries have continued to run. Recently they did another server migration/upgrade, and finally everything broke. I tried recompiling both the stable 3.1.6 (released February 2002) and the most recent release (3.2.0b6, released June 2004) but neither compiles cleanly with gcc 4.4.

And it’s not worth fixing.

 

Paternity leave coding update

Three and a half weeks into my sabbatical, and I’ve actually found some time to code on Hebcal!

Some Hebcal accomplishments in July:

  1. Moved Hebcal for Unix code from SourceForge to GitHub.
  2. Added two new user-requested features and released hebcal-3.13
  3. Added Printable PDF support for Hebcal.com
  4. Forked a copy of Hebcal for Unix especially for hebcal.com
    • Replaced sunset calculation engine with PHP’s datelib
    • Replaced timezone/daylight saving time with PHP’s datelib
    • Updated world city database from geonames.org to include 300+ new cities
  5. Updated the USA ZIP code database

Ariella has been incredibly supportive. I’ve done about half of the work at home, and half of it at Hacker Dojo.

On paternity leave, hope to write some code

I’m taking 3 months off to hang out with the family! Baby Emma is doing great, and the big kids are, well, big. It’s been about 7 years since I’ve had a sabbatical from work, so this seems like a great time to do it.

One of my first projects was to move hebcal for Unix from SourceForge to GitHub.

Hebcal for Unix has been around for 20+ years. Danny Sadinoff wrote 98% of the code, and Michael has been fixing bugs and adding features here and there.

SourceForge had been providing hosting for the GPL code for 14+ years. We even converted from CVS to Mercurial about 3 years ago. However, with the recent changes to SourceForge code hosting, Hebcal got stuck in some sort of limbo-land. Lots of 500 Internal Server Errors.

So… we’ve decided to join the cool kids and make the transition from hg to git. And while making that transition we’ve also moved to GitHub, which is where all of the open source developers are hanging out these days.

Over the coming month we’ll be cleaning up the code and the hebcal.com website, removing references to the old sourceforge.net URL.

And then we’ll get back to fixing bugs and adding new features.

A letter to Emma Margalit on the day of her Simchat Bat

Dearest New Little One,

We are so very happy to greet you, meet you, and welcome you into our immediate family, our big family of friends and neighbors and loved ones, our community, and our big world.

We have chosen for you the name Emma Margalit, named after two phenomenal women in your family tree. Your great-grandmother Emma passed away 7 months ago, on the night of Kol Nidrei. She never knew you, or even about you, but she would have loved you. Your great-uncle Lewis will tell you a bit more about her in just a minute, and we have much more to tell you about her through your whole life but there are just a few things we would like to tell you right now.

Emma was a devoted mother who raised three sons, and took responsibility for teaching her children and grandchildren proper manners, grammar, nutrition, and posture. She was an amazing cook who could turn vegetable scraps into coveted soup, and was a gracious hostess as she served it. She was a great believer in hand-written thank-you notes and letters, and her slashy handwritten correspondance earned her great admiration in certain small corners.

She made sure that we all took care with our “who”s and our “whom”s and thought that an elevated vocabulary was a sign of a well-educated person. On daily walks, she took care to pick up coins from the ground that others had dropped, over a period of years eventually assembling a kitty of over a hundred dollars. That single habit says volumes about her dedication, sense of propriety, humility, and patience. We hope that you can emulate some of those great qualities.

She lived for 91 years and is greatly missed. Her, and your Hebrew name is Nechama, which means comfort, and it is the words used to describe the way we comfort mourners after a death. We hope that you will be some comfort after her death for those who loved her. More importantly, by naming you for Great-Grandma Emma, we hope you carry on some of her graciousness, her intelligence, her beauty, and her modesty.

Your middle name, Margalit, means “pearl” in Hebrew. We have chosen the name for several reasons, but one of them is that the name Margalit reminds us of Grandma Emma’s dear cousin Martha Benenson, who was like a big sister to Emma. As children, they grew up together after Emma’s parents took in Martha and her brothers. As an adult, Martha lived on her own in Washington, D.C. She lived quietly and modestly, never moving out to California despite Emma’s requests. Your mother remembers her annual visits to Studio City, sitting in the winter sun at the breakfast table and doing crosswords puzzles.

By naming you for both Emma and Martha, we not only remember two extraordinary women, but we also pay tribute to the loyalty and devotion of friendship and family. Their relationship was special in both of their lives, and we wish for you the same quality of intimacy and loyalty throughout your life.

The name Margalit, while literally meaning a pearl, can also be used to talk about the Torah, and Torah learning more generally. Great words of wisdom and learning can be called a Margalit, and we wish you a life filled with Torah learning. Pearls are a jewel of great beauty, and your date of birth is on the day of the Omer corresponding to Yesod shebe’Tiferet, the foundation of beauty. It is a miracle in the world that the foundation of beauty of a pearl is the oyster, not typically thought of as a paragon of beauty. We wish for you a life that recognizes that beauty can be found in all places, from all people, and from all walks of life.

On the Gregorian calendar, were born on April 15, which happens to be World Art Day in memory of Leonardo da Vinci’s birthday. Most Americans remember April 15 as Tax Day. Since your father now works at Intuit, he has joked for several months about naming you TurboTax Radwin if you happened to be born on the 15th. Luckily, you have a mother, and you’re welcome. You also share a birthday with the State of Israel; you were born on the 5th day of the Hebrew month of Iyyar — Yom Ha’Atmaut, Israel Independence Day.

But also, sadly, the day you were born will likely be remembered for acts of violence that happened during the Boston Marathon. You are not born into the same world as your great-grandmother Emma. Things feel more fragile, scary, and violent. It’s not to say that it’s a worse world, but the pressures of trying to make it better more urgently fall upon all of us, and upon you. We wish you a better world, and we will work with you as your parents to help you make it better in your own way.

Which brings us to our first charge. There is a line in the prayer for peace that we say on Shabbat: “We have not come into being to hate or to destroy. We have come into being to praise, to labor, and to love.” Little Emma, regardless of the hatred or violence in world at large, you must find a way to fill your life with praise, labor, and love. May you find opportunities to praise God in everyday things like washing your hands or eating foods, and recount the oneness of God in the recitation of the Sh’ma every night before you go to bed. Find fulfillment in your labor – work using your hands, your heart, and your head. And be sure to love. Love your family, your friends. Love deeply and passionately, even if it means your heart may be broken. It will heal. You have not come into being to hate or to destroy. You have come into being to praise, to labor, and to love.

You come into our family as the fourth of four very extraordinary children, each unique in their own many ways, each strong and determined already. Unlike the way that your older siblings may have molded the family around them, you more than they, inherit a family that is already somewhat set in its ways, a family that has already long ago given up on having children sit down while they eat dinner, but that has not yet given up on catching a Leprechaun. You will of course have the chance to make your own mark on our family and our rhythms. There are more conversations to enter and there are more conversationalists with whom to share the microphone. We welcome you to the big party.

Our final charge for you comes from the the naming ceremony. We recite “K’Shem she-nichnasah la-brit, Ken tikanes le-Torah, u-le-chuppah, u-le-ma’asim tovim”. This is usually translated as the congregation expressing its wishes that the newborn child will grow up to study Torah, be married under the chuppah (wedding canopy), and perform good deeds. Our good friend and teacher Dr. Rabbi Aryeh Cohen translates these a little differently: a life filled with Torah study, meaningful relationships, and and acts of justice and kindness. Regardless of whether you choose to study the sciences or humanities or Torah, we tell you today that learning is something you do not just when you’re in school, but something you do during your whole lifetime. And when it comes to “good deeds” – doing the mitzvot is just the beginning. To truly be a mensch, you’ll need to pursue justice and kindness, especially for those less fortunate than you.

We have so much more we want to tell you and teach you. But you’re not yet even one week old, so perhaps this is enough for now.

Our hearts are filled with joy and gratitude. Brucha habah’ah, welcome to the world, little Emma Margalit.
Love, Ema and Abba

Chametz Contract/Bill of Sale for Pesach

BILL OF SALE

We, Ariella and Michael Radwin (“SELLER”), in consideration of $1,500 (one thousand five hundred US Dollars), do hereby sell, transfer and convey to John Doe (“BUYER”), our not-Kosher-for-Passover food (“CHAMETZ”) and Sascha the cat (“CAT”).

Chametz are leavened foods that are forbidden on the Jewish holiday of Passover. According to Jewish law, Jews may not own, eat or benefit from chametz during Passover.

For the purposes of this contract, CHAMETZ consists of the following:

(1) All boxes of canned and dried foods on shelves in the garage, and in the pantry to the right of the refrigerator in the kitchen

(2) Any refrigerated/frozen food items placed in the “Chametz” brown paper bag in the refrigerator in the garage

(3) Several bottles of wine, beer and liquor in the liquor cabinet above the fridge

(4) The CAT’s food, which is kept in the bathroom adjacent to the guest bedroom

(5) Any other Chametz possessed by SELLER, knowingly or unknowingly as defined by the Torah and Rabbinic Law

CAT is a domestic short-hair, commonly known as Sascha. CAT is about 12 1/2 (twelve and one half) years old. Here is a digital photo of said CAT:

http://www.radwin.org/michael/blog/archives/sascha-the-cat.jpg

I, the undersigned BUYER, acknowledge receipt of this Bill of Sale. BUYER does hereby provide SELLER a down payment of $1 (one US Dollar) in the form of PayPal, cash, check, or money order. If BUYER does not provide the remaining $1,499 by Saturday April 2, 2013 at 8:20pm, this Bill of Sale is cancelled and SELLER will refund whatever payment(s) BUYER has made.

BUYER will also lease all places wherein the CHAMETZ owned by SELLER may be found, particularly at the address/es listed below, and elsewhere.

Per BUYER’s instructions, SELLER (or SELLER’s agent) will agree to feed said CAT one half-cup of cat food daily. SELLER also change CAT’s litter box and play with her.

All said above is well and good and is binding and is in conformity with all Torah, Rabbinic and Civil laws.

Dated this 24th day of March, 2013.

 

SELLER’s name: Ariella & Michael Radwin

SELLER’s address: …, Palo Alto, CA

SELLER’s signature:

 

BUYER’s name: John Doe

BUYER’s signature:

gzip encoding (mod_deflate) on DreamHost

It took me all of 10 minutes, and I just sped up Hebcal.com by enabling Apache mod_deflate on DreamHost.

I used the Google PageSpeed Insights tool to measure the performance of Hebcal, and it complained that we weren’t gzip-compressing HTML, CSS, or JavaScript.

Turns out this is not enabled on DreamHost sites by default. What a surprise!

So here’s what I ended up adding to the .htaccess file:

<IfModule mod_deflate.c>
 AddOutputFilterByType DEFLATE text/html text/css text/javascript
</IfModule>

That’s it!