08 October 2009

Geeks vs Suits

There Are No Villains in Financial Crises | Megan McArdle:

Who led us into the financial crisis, and why? Zubin Jelveh writes up some intriguing findings calling into question the notion that securitization was at the heart of the financial crisis:
Instead of a smooth curve, at certain FICO scores there are big jumps in the number of people with mortgages.

The reason? Rules of thumb observed by those in the mortgage industry for judging the chances a borrower will default. In the 1990's, Fannie and Freddie released research showing that about 50% of defaults are associated with borrowers who have FICO scores below 620. That happens to be where the biggest jump in the graph above takes place, suggesting that the industry looks far more kindly on a borrower with a score of 621 than a borrower with a nearly identical score of 619. [...]
The FICO score revolution was valuable, but we took it too far.
I think I can offer an explanation for this threshold that is a little more nuanced than the cheeky "sloppy bankers defaulted to simplistic rules-of-thumb they plucked out of the air." A lot of the work done with FICO scores uses a type of machine learning algorithm called a decision tree.* (See, I dunno, Quinlan '86 for the deep background.) A decision tree works by finding which of the features being used to make a decision has the greatest information gain (or better yet, information gain ratio). All instances are then divided into two sets: those with that feature's value below a certain threshold, and those above it. The algorithm then recurses into each of those subsets and continues. (See the Wikipedia article on Quinlan's C4.5 algorithm for a fuller explanation.) These systems have to create specific thresholds on specific features, in this case FICO == 620, in order to function. It's not a "rule of thumb" or something else informal: it's a requirement of the learning system used, which is surprisingly good for all its simplicity.
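To make the threshold-picking step concrete, here is a minimal sketch of how a single split on one numeric feature might be chosen by information gain. It is not C4.5 itself (no gain ratio, no recursion, no pruning), and the scores and default labels are made-up toy data chosen so the best cut lands at 620; the function names are my own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_threshold(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain.

    Candidate cuts are the midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut fits between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Weighted average entropy of the two subsets after the split.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, (lo + hi) / 2
    return best_cut, best_gain


# Toy data (hypothetical): FICO scores and whether the borrower defaulted.
scores = [580, 600, 610, 619, 621, 640, 660, 700]
defaulted = [True, True, True, True, False, False, False, False]

cut, gain = best_threshold(scores, defaulted)
print(f"split at FICO < {cut}, information gain = {gain:.3f} bits")
```

Real data is never this clean, but the point stands: the algorithm has to commit to some hard cut on some feature, and a borrower one point either side of it lands in a different branch.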

There are, however, better classifiers for credit scoring. Unfortunately they're almost never used. They're either too complex** or incorporate too much stochasticity for the bosses to be happy, or they are black-boxish, which makes the bosses and their lawyers throw fits. As I understand it, if you turn someone down for a loan in America, you have a duty to explain what factors contributed to their denial. If you use something like an artificial neural network to make your decision, this is almost impossible. You can't tell someone "we fed your data into a multilayer feedforward network, and our version of the Cascade-Correlation Algorithm has decided you're a bad risk" even if Cascade-Correlation is really damn good at deciding who's a bad risk.*** The thresholds used in decision trees, like FICO == 620, divide the feature space with a series of hyperplanes orthogonal to the feature axes, creating an easy-to-understand decision boundary. People like this easy-to-understand boundary. It's easy to say, "Oh, if you were on this side of this line, you would have qualified." You can't say that with the more precise but also fuzzier, high-dimensional decision boundaries of things like neural networks.
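Here is a rough sketch of why a tree makes the "which side of the line" explanation so cheap: the path from root to leaf is the explanation. The tree below is hand-built for illustration, not learned from anything. The 620 cut comes from the discussion above, while the `debt_to_income` feature and its 0.43 cut are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    approve: bool


@dataclass
class Split:
    feature: str
    threshold: float
    below: "Node"        # branch taken when value < threshold
    at_or_above: "Node"  # branch taken otherwise


Node = Union[Leaf, Split]


def decide(node, applicant):
    """Walk the tree and return (approved, reasons), one reason per split crossed."""
    reasons = []
    while isinstance(node, Split):
        value = applicant[node.feature]
        if value < node.threshold:
            reasons.append(f"{node.feature} = {value} is below {node.threshold}")
            node = node.below
        else:
            reasons.append(f"{node.feature} = {value} is at or above {node.threshold}")
            node = node.at_or_above
    return node.approve, reasons


# Hypothetical two-level tree: every split is a single axis-aligned threshold.
TREE = Split(
    "fico", 620,
    below=Leaf(False),
    at_or_above=Split("debt_to_income", 0.43, below=Leaf(True), at_or_above=Leaf(False)),
)

approved, reasons = decide(TREE, {"fico": 619, "debt_to_income": 0.30})
print("approved" if approved else "denied", "because:", "; ".join(reasons))
```

A neural network gives you nothing comparable to that `reasons` list: its decision comes out of many weighted interactions at once, with no single line to point at.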

What we're up against is another instance of Arnold Kling's geeks-vs-suits conflict. Maybe the suits are right and it's of more value to society to be able to tell a borrower why they're turned down than it is to do efficient credit scoring. But when the geeks have great methods, and they're held back by the suits, it's not really fair to lay blame on the methods the geeks are allowed to implement. In my interpretation, the FICO score revolution was valuable, but we didn't take it far enough. We half-assed it. We tied one computational arm behind our backs, but convinced ourselves we were ready to take on all comers.

* Well it did five years ago, and not having heard anything to the contrary, I'm assuming it still does.
** Complex in both the colloquial and scientific meanings.
*** C-C is just an example. I've never tried it in particular to create an ANN for credit scoring.
