First, the Centizen team set out to anonymize the data deposited in S3 from the client’s operational systems. We used AWS Key Management Service (KMS) together with security policies to guard against unauthorized data access. Our team also used Amazon EMR (Elastic MapReduce) to scrub the data and identify the features in it. To create training data for the machine learning process, the features related to property, geography and peril are extracted and transformed.
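As an illustration of the anonymization and feature extraction step, here is a minimal Python sketch. It assumes identifiers are pseudonymized with a keyed hash and that only property, geography and peril fields are kept for training. The field names (`loan_id`, `borrower_name`, `property_type`, `year_built`, `state`, `zip3`, `peril`) and the choice of HMAC-SHA256 are assumptions made for this example, not details of the actual pipeline, which ran on EMR with keys managed through KMS.

```python
import hashlib
import hmac

# Hypothetical field names: the actual client schema is not described here.
IDENTIFIER_FIELDS = ("loan_id", "borrower_name")
FEATURE_FIELDS = ("property_type", "year_built", "state", "zip3", "peril")


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with its HMAC-SHA256 hex digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with identifier fields pseudonymized."""
    cleaned = dict(record)  # leave the caller's record untouched
    for field in IDENTIFIER_FIELDS:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(str(cleaned[field]), key)
    return cleaned


def extract_features(record: dict) -> dict:
    """Keep only the property, geography and peril fields for training."""
    return {field: record.get(field) for field in FEATURE_FIELDS}
```

A keyed hash keeps the same identifier mapping to the same token across files, so records can still be joined after anonymization, while the original value cannot be recovered without the key.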
Second, the Centizen team focused on machine learning to reduce the complexity of the required data. The Amazon Machine Learning service is used to create nine models that predict the probability of a claim, recovery and peril type. The models are tuned to maximize prediction accuracy across a portfolio while splitting prediction errors evenly between false positives and false negatives.
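One way to read "evenly splitting" the two error types is to choose the score cut-off at which the number of false positives is closest to the number of false negatives. The sketch below shows that idea in plain Python on a labeled validation set. It is an assumption about how the balancing could be done, not the Amazon Machine Learning API or the team's actual tuning procedure, and the function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given cut-off.

    A score at or above the threshold is treated as a predicted claim.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def balanced_threshold(scores, labels):
    """Pick the cut-off where |FP - FN| is smallest.

    Ties are broken by the lowest total number of errors.
    """
    # Every distinct score is a candidate; infinity means "predict no claims".
    candidates = sorted(set(scores)) + [float("inf")]
    best_key, best_threshold = None, None
    for t in candidates:
        fp, fn = confusion_counts(scores, labels, t)
        key = (abs(fp - fn), fp + fn)
        if best_key is None or key < best_key:
            best_key, best_threshold = key, t
    return best_threshold
```

When false positives and false negatives are equal, the predicted number of claims matches the actual number, which is why this balance matters for totals reported across a whole portfolio.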
Finally, our team addressed portfolio scoring. To predict the claims associated with a portfolio of mortgages, the data file is dropped into an S3 bucket. EMR is used to scrub the data, identify features and append geography and peril attributes. The prediction models are then applied to predict claims and recovery for the properties contained in the portfolio. The results are aggregated using EMR to provide a final report for the client investor.
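The aggregation step can be pictured with a small Python sketch that rolls per-property predictions up into a summary per geography. The field names (`state`, `claim_probability`, `recovery_if_claim`) and the formula for expected recovery (claim probability multiplied by the predicted recovery) are assumptions for illustration. The production job ran on EMR, and its report layout is not described here.

```python
from collections import defaultdict


def aggregate_portfolio(predictions):
    """Summarize per-property predictions by state.

    Each prediction is a dict with hypothetical keys:
    'state', 'claim_probability' and 'recovery_if_claim'.
    """
    totals = defaultdict(
        lambda: {"properties": 0, "expected_claims": 0.0, "expected_recovery": 0.0}
    )
    for p in predictions:
        group = totals[p["state"]]
        group["properties"] += 1
        # The expected number of claims is the sum of the claim probabilities.
        group["expected_claims"] += p["claim_probability"]
        # Assumed formula: recovery counts only when a claim occurs.
        group["expected_recovery"] += p["claim_probability"] * p["recovery_if_claim"]
    return dict(totals)
```

Summing probabilities rather than counting thresholded predictions gives the expected claim count directly, which suits a portfolio-level report.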