Commercial Graph: A Map of Financial Relationships

I’m speaking today about Intuit’s Commercial Graph at the Strata + Hadoop World Conference. Slides: Commercial Graph: A Map of Financial Relationships (pptx format).

Abstract

Imagine the social graph where personal relationships are replaced by commercial relationships based on real financial data. Imagine the possibilities for small businesses to grow, connect, transact and prosper.

Intuit is uniquely qualified to achieve just this. We are entrusted with the collective data of 50 million consumers and small businesses. It is a unique pool of data that covers the financial spectrum – ranging from individual purchase history to business inventories.

At Intuit, we are building the Commercial Graph with the consumer and small business data from products like Mint.com, Quicken, and QuickBooks.

We take millions of user-entered, and hence unstructured, business descriptions and billions of transactions, then apply Hadoop-based deduplication algorithms for normalization and machine learning for categorization. To better understand the graph, we compute metrics such as connected components, centrality, and commercial PageRank.
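To make the graph metrics concrete, here is a minimal single-machine sketch of two of them, connected components and PageRank, on a toy payer-to-payee graph. This is not Intuit's pipeline, which runs at Hadoop scale. The business names, the `EDGES` list, and the function names are all made up for illustration, and the PageRank here is the plain textbook power iteration rather than whatever weighting "commercial PageRank" actually uses.

```python
from collections import defaultdict

# Hypothetical toy data: (payer, payee) pairs, assumed already deduplicated
# and normalized. None of these names come from real data.
EDGES = [
    ("acme bakery", "flour co"),
    ("acme bakery", "city power"),
    ("corner cafe", "flour co"),
    ("corner cafe", "city power"),
    ("bike shop", "parts depot"),
]


def connected_components(edges):
    """Group nodes into weakly connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over directed edges."""
    out_links = {}
    nodes = set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    for component in connected_components(EDGES):
        print(sorted(component))
    for name, score in sorted(pagerank(EDGES).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In this toy graph the vendors that several businesses pay end up ranked above the businesses paying them, and the bike shop and its supplier form their own separate component.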

We will examine several applications of the commercial graph, including finding more customers like your best customers, optimizing your vendors, and surfacing relevant offers and recommendations to help our customers make and save money.

A deep dive into the technical architecture will discuss the use of Giraph as a Hadoop-based, large-scale graph processing platform and Neo4j as a real-time graph datastore.

Annual tech refresh: move to WordPress

radwin.org got compromised recently due to some sort of server-side vulnerability. Was it a MovableType bug? Some stale version of phpBB or a vulnerability in the ancient copy of PHP4 itself? Who the heck knows. I did a slash and burn and removed all stale PHP/CGI stuff and upgraded to PHP5. Looks like I got rid of it.

As a side effect, I’m saying goodbye to MovableType and taking the leap to WordPress. Maybe that way the blog I actively authored from 2002-2006 will actually avoid bit rot. Certainly this version looks a helluva lot better on lots of different devices, thanks to the whole “responsive” web design movement.

There’s going to be a bunch of broken links. Oh well. It’s a good thing that we’ve got search technology for anyone who really cares to find some ancient content I wrote.