Project Description

Machine Learning – Amazon 


Amazon EMR (Elastic MapReduce) is an AWS service for big data processing and analysis. MapReduce is a software framework that lets developers write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. AWS Key Management Service (KMS) is an AWS service that lets administrators create, delete, and control the keys used to encrypt data stored in AWS databases and other AWS products.



Our client is the largest provider of specialty insurance and loan administration services to the residential and commercial mortgage industries in the United States. They offer a team of independent, licensed insurance adjusters with the expertise and tech-enabled tools to handle a wide range of insurance analysis, claims processing, and related services.


Our client’s customers typically purchase portfolios of distressed mortgages and contract with our client to process the claims during the sales process. Their historical data provides the raw material needed to predict damage, claims, and recovery. Because the data is complex, a machine learning algorithm was needed to make better sense of the collection of property attributes.

Centizen Solution

First, the Centizen team set out to anonymize the data from the client’s operational systems that is deposited in S3. We used AWS Key Management Service (KMS) along with security policies to guard against unwanted data access. Our team also used Elastic MapReduce (EMR) to scrub the data and identify the features in it. To create training data for the machine learning process, the features related to property, geography, and peril were extracted and transformed.
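The scrubbing step can be illustrated with a minimal sketch. It assumes that identifying fields are replaced with a keyed hash and that only property, geography, and peril features are kept. The field names, the key handling, and the choice of HMAC-SHA-256 are all hypothetical; the case study does not describe the actual schema or anonymization method, and in the real pipeline this logic would run as an EMR job rather than as a local script.

```python
import hashlib
import hmac

# Hypothetical field names; the client's real schema is not described here.
IDENTIFYING_FIELDS = ("borrower_name", "loan_number")
FEATURE_FIELDS = ("property_type", "year_built", "state", "peril")


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, secret_key: bytes) -> dict:
    """Return a training row: a pseudonymous ID plus the selected features."""
    raw_id = "|".join(str(record[field]) for field in IDENTIFYING_FIELDS)
    row = {"record_id": pseudonymize(raw_id, secret_key)}
    for field in FEATURE_FIELDS:
        value = record.get(field)
        # Normalize text features; numbers and missing values pass through unchanged.
        row[field] = value.strip().lower() if isinstance(value, str) else value
    return row


# Example usage with made-up data. In practice the key would come from a
# managed secret (for instance one protected by KMS), never from source code.
example_key = b"example-key-not-for-production"
example_record = {
    "borrower_name": "Jane Doe",
    "loan_number": "12345",
    "property_type": " Single Family ",
    "year_built": 1987,
    "state": "FL",
    "peril": "Wind",
}
print(scrub_record(example_record, example_key))
```

A keyed hash keeps the same property linkable across files without exposing the original identifier, which is one common reason to prefer it over simply dropping the columns.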

Second, the Centizen team applied machine learning to handle the complexity in the data. The Amazon Machine Learning service was used to create nine models that predict the probability of a claim, recovery, and peril type. The models were tuned to maximize prediction accuracy across a portfolio while evenly splitting false positive and false negative prediction errors.
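One way to picture the tuning described above is as a choice of score threshold for a binary model. The sketch below is an illustration under stated assumptions, not the procedure the team actually followed: it takes predicted scores and true labels, then picks the threshold where the false positive and false negative counts are closest, breaking ties by the fewest total errors. The function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given score threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def balanced_threshold(scores, labels):
    """Pick the candidate threshold with the smallest |FP - FN| gap.

    Ties are broken in favor of fewer total errors.
    """
    candidates = sorted(set(scores))

    def cost(threshold):
        fp, fn = confusion_counts(scores, labels, threshold)
        return (abs(fp - fn), fp + fn)

    return min(candidates, key=cost)


# Example usage with made-up scores (1 = a claim occurred, 0 = no claim).
example_scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
example_labels = [0, 0, 1, 0, 1, 1]
best = balanced_threshold(example_scores, example_labels)
print(best, confusion_counts(example_scores, example_labels, best))
```

Balancing the two error types matters at portfolio level because over-predictions and under-predictions then tend to cancel out when claims are summed across many properties.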

Portfolio Scoring

Finally, our team addressed portfolio scoring. To predict the claims associated with a portfolio of mortgages, the data file is dropped into an S3 bucket. EMR is used to scrub the data, identify features, and append geography and peril attributes. The prediction models are then applied to predict claims and recovery for the properties contained in the portfolio. The results are aggregated using EMR to provide a final report for the client investor.
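The aggregation step can be sketched as follows. This is a simplified, single-machine illustration: the models are stand-in callables that return a probability, the field names and the 0.5 flagging threshold are hypothetical, and the real report format is not described in this case study.

```python
def score_portfolio(properties, claim_model, recovery_model, threshold=0.5):
    """Apply per-property models and aggregate the results for a portfolio."""
    rows = [
        {
            "property_id": prop["property_id"],
            "p_claim": claim_model(prop),
            "p_recovery": recovery_model(prop),
        }
        for prop in properties
    ]
    return {
        "properties": len(rows),
        # Summing probabilities gives the expected number of events.
        "expected_claims": sum(r["p_claim"] for r in rows),
        "expected_recoveries": sum(r["p_recovery"] for r in rows),
        "flagged_ids": [r["property_id"] for r in rows if r["p_claim"] >= threshold],
    }


# Example usage with made-up properties and stand-in models.
example_portfolio = [
    {"property_id": "A1", "state": "fl", "peril": "wind"},
    {"property_id": "B2", "state": "oh", "peril": "hail"},
]


def example_claim_model(prop):
    # Stand-in for a trained model: a fixed probability per peril.
    return 0.75 if prop["peril"] == "wind" else 0.25


def example_recovery_model(prop):
    # Stand-in for a trained model: the same probability for every property.
    return 0.5


print(score_portfolio(example_portfolio, example_claim_model, example_recovery_model))
```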

Outcomes / Business Value

  • Over 90% accuracy in predicting existing damage and claims on a portfolio of loans.
  • Helps our client differentiate its service and leaves the client satisfied with ours.
  • Predicted claim performance allows investors to incorporate damage risk and recovery into their disposition decisions.

