A custom risk model was built using historical Prosper data to assess the risk of Prosper borrower listings. The output of the model is a Prosper score which is used in conjunction with a credit reporting agency score to estimate expected loss rates on Prosper borrower listings. The base Prosper score was built specifically on the Prosper population, so it incorporates behavior that is unique and inherent to this population. In contrast, the credit score obtained from a credit reporting agency is based on a much broader population, of which Prosper borrowers are just a small subset. As such, the credit reporting agency score should, and does, rank order risk on the Prosper population, but is not as discriminating as a custom score. Prosper uses both the custom score and the credit reporting agency score together to assess the borrower's level of risk and determine estimated loss rates, which is more powerful than using just one score. The loss estimates are based on the historical performance of Prosper loans to borrowers with similar characteristics. They are not a guarantee and actual performance may differ from expected performance.
A logistic regression model was built to predict the probability of a loan going "bad," where "bad" means becoming 61 or more days past due. All loans booked from April, 2007 through June, 2007 were used to build the model, with the performance measured through December, 2008. The score was then validated using all loans booked from July, 2007 through September, 2007, with the performance measured through December, 2008. The output of the model to Prosper users is a Prosper score, which ranges from 1 to 10, with 10 being the best (lowest risk) score and 1 being the worst (highest risk) score.
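For reference, a logistic regression model of this kind generally has the form shown here. The symbols are illustrative assumptions: $x_1, \dots, x_k$ stand for the selected predictor variables and $\beta_0, \dots, \beta_k$ for the fitted coefficients, none of which are published in this description.

$$
P(\text{bad} \mid x_1, \dots, x_k) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right)}
$$

A higher predicted probability corresponds to a lower (riskier) Prosper score.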
All potential variables available at the time of listing, including those from the identification authorization process, the credit report, and the listing details provided by the borrower, were analyzed for potential inclusion in the final model. For example, variables such as authorization score (used during identity verification), income, debt-to-income ratio, total revolving balance, and delinquencies were reviewed. Transformations such as log, square root, and ratios were applied to most of the variables during the development process. Several iterations of stepwise linear regression were used to select significant variables from the pool of customer bureau variables and listing characteristics. Variables were dropped or kept in the final model based on their significance and interaction with other variables. Many model iterations were completed and analyzed in order to determine the final model.
Key variables in the model are:
The model was validated on loans booked from July, 2007 through March, 2008 to ensure that it ranks risk in this more recent population.
The "enhanced" Prosper score estimates the probability of a borrower loan going "bad," where "bad" means becoming more than 60 days past due. The output to Prosper users is a score that ranges from 1 to 5, with 5 being the best (lowest risk) score and 1 being the worst (highest risk) score. The new score is similar to the existing Prosper score, which is now called the "base" Prosper score. The enhanced score was built on a more recent population and thus reflects more recent trends, including the challenging economic environment. Both scores, along with the credit reporting agency score, should be used to make lending decisions.
Loans booked from April, 2008 through July, 2008 were used to build the discrete additive scorecard, with the performance measured for the following 12 months. The scorecard was verified and results validated on two independent samples of loans, booked from August, 2008 through September, 2008 and from April, 2007 through June, 2007, with the performance measured for the following 12 months. Variables available at the time of listing, including those from the identification authorization process, the credit reporting agency and listing details provided by the borrower were analyzed for potential inclusion in the final model. For example, variables such as authorization score (used during identity verification), income, debt-to-income ratio, total revolving balance and delinquencies were reviewed. Transformations to refine the variables were performed during the development process. Variables were dropped or kept in the final scorecard based on their contribution and stability over time. Many scorecard iterations were completed and analyzed in order to determine the final scorecard.
The score is calculated by adding the weights assigned to the range or category into which each predictor in the scorecard falls. Variables in the score include:
The raw score represents a rank order of the likelihood of a Prosper borrower loan with similar characteristics becoming more than 60 days past due. This score is then transformed by mapping it into a probability of bad. The probability of bad is grouped into ranges based on quintile distributions of the development data and those ranges are mapped to values of 1-5. This score is displayed on each borrower listing. The enhanced Prosper score ranges from 1 to 5, with 5 being the best, or lowest risk value. The probability of bad ranges for the enhanced score are as follows:
| Prob(bad) Range | Enhanced Score |
|---|---|
| < 0.0738 | 5 |
| 0.0738 to < 0.1100 | 4 |
| 0.1100 to < 0.1552 | 3 |
| 0.1552 to < 0.2232 | 2 |
| 0.2232 or higher | 1 |
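The two calculation steps described above (summing range weights, then banding a probability of bad into a 1-5 score) can be sketched as follows. This is a minimal illustration, not Prosper's implementation: the cutoffs in `CUTOFFS` come from the table above, but the variables, ranges, and weights in `SCORECARD` are hypothetical placeholders, and the mapping from raw score to probability of bad is not given in the text, so it is not sketched.

```python
from bisect import bisect_right

# Lower bounds of the probability-of-bad bands, taken from the table above.
CUTOFFS = [0.0738, 0.1100, 0.1552, 0.2232]


def enhanced_score(prob_bad: float) -> int:
    """Map a probability of bad to the 1-5 enhanced score (5 = lowest risk)."""
    if not 0.0 <= prob_bad <= 1.0:
        raise ValueError("prob_bad must be between 0 and 1")
    # bisect_right counts how many cutoffs are <= prob_bad, so a value
    # exactly on a boundary (e.g. 0.0738) falls into the riskier band.
    return 5 - bisect_right(CUTOFFS, prob_bad)


# HYPOTHETICAL scorecard: variable names, range bounds, and weights are
# invented for illustration only. Each entry is (inclusive upper bound, weight).
SCORECARD = {
    "debt_to_income": [(0.20, 30), (0.40, 15), (float("inf"), 0)],
    "delinquencies": [(0, 25), (2, 10), (float("inf"), 0)],
}


def raw_score(attributes: dict) -> int:
    """Add up the weight of the range each predictor value falls into."""
    total = 0
    for name, bins in SCORECARD.items():
        value = attributes[name]
        for upper_bound, weight in bins:
            if value <= upper_bound:
                total += weight
                break
    return total


if __name__ == "__main__":
    print(enhanced_score(0.05))
    print(enhanced_score(0.12))
    print(raw_score({"debt_to_income": 0.25, "delinquencies": 0}))
```

Using a sorted list of cutoffs with a binary search keeps the band boundaries in one place, so they can be updated without touching the lookup logic.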
The estimated loss rates, assuming average balances are the same for goods and bads, using Prosper Rating and the enhanced Prosper score are:
| Prosper Rating | Enhanced Score 5 | Enhanced Score 4 | Enhanced Score 3 | Enhanced Score 2 | Enhanced Score 1 | Total |
|---|---|---|---|---|---|---|
| AA | 1.4% | 2.1% | 3.7% | 0.0% | 0.0% | 1.6% |
| A | 2.2% | 4.0% | 10.4% | 13.4% | 8.5% | 4.4% |
| B | 2.5% | 5.6% | 2.5% | 14.2% | 42.5% | 4.3% |
| C | 6.4% | 9.2% | 11.6% | 12.2% | 26.2% | 9.5% |
| D | 7.5% | 8.7% | 12.2% | 17.0% | 15.3% | 10.5% |
| E | 8.3% | 14.3% | 7.5% | 11.3% | 20.0% | 11.2% |
| HR | 14.0% | 16.2% | 19.6% | 24.1% | 28.3% | 21.0% |
| Total | 6.1% | 10.9% | 15.1% | 20.3% | 26.5% | 13.4% |
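As a usage illustration, the grid above can be treated as a simple two-key lookup from Prosper Rating and enhanced score to an estimated loss rate. The sketch below copies the cell values as published (the Total row and column are omitted); the names `LOSS_RATE_PCT` and `estimated_loss_rate_pct` are hypothetical and not part of any Prosper interface.

```python
# Estimated loss rates in percent, keyed by Prosper Rating and then by
# enhanced Prosper score. Values are copied from the table above.
LOSS_RATE_PCT = {
    "AA": {5: 1.4, 4: 2.1, 3: 3.7, 2: 0.0, 1: 0.0},
    "A": {5: 2.2, 4: 4.0, 3: 10.4, 2: 13.4, 1: 8.5},
    "B": {5: 2.5, 4: 5.6, 3: 2.5, 2: 14.2, 1: 42.5},
    "C": {5: 6.4, 4: 9.2, 3: 11.6, 2: 12.2, 1: 26.2},
    "D": {5: 7.5, 4: 8.7, 3: 12.2, 2: 17.0, 1: 15.3},
    "E": {5: 8.3, 4: 14.3, 3: 7.5, 2: 11.3, 1: 20.0},
    "HR": {5: 14.0, 4: 16.2, 3: 19.6, 2: 24.1, 1: 28.3},
}


def estimated_loss_rate_pct(rating: str, score: int) -> float:
    """Return the published estimated loss rate (in percent) for one cell."""
    try:
        return LOSS_RATE_PCT[rating][score]
    except KeyError:
        raise ValueError(f"unknown rating/score: {rating!r}, {score!r}") from None


if __name__ == "__main__":
    print(estimated_loss_rate_pct("C", 4))
```

As stated earlier, these figures are estimates based on historical performance, not guarantees.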