Mobile Network Operator (MNO) Credit Scoring and Automated Underwriting
Is it possible to use MNO data on its own to predict customer behavior, credit risk, probability of default, fraud and other attributes? Is MNO data sufficient to do this reliably and accurately?
The following case study shows that MNO data is predictive and can be used in isolation in markets where other data sources, such as traditional credit bureau and related banking data, are limited or absent. The results show a level of accuracy and stability sufficient to support profitable mobile financial services. The project also shows the resources and time required when using Zoral Platform components.
- collect and analyze predictive MNO data (three primary data sources: GSM, network transaction and airtime credit; 12 months of data)
- cleanse, transform, enhance and integrate predictive MNO data into Zoral Analytical Data Workbench
- load to Zoral Analytical Data Warehouse
- combine and tune with existing Zoral Model Library credit risk and other models
- combine with models and data from comparable emerging markets and use a proxy
- blend scores from multiple Zoral ML credit models to create optimized credit/risk, limits management, and fraud scores
- implement using Zoral Decision Engine and Zoral Platform
- benchmark against equivalent industry performance
- execute MFS lending decisioning in pilot mode
- review MFS pilot lending results
- model and test Minimum Viable Product (MVP) based on comparable MFS lending pilot experience
The main goal was to create and benchmark a pilot automated consumer lending platform that integrates with the client’s Mobile Financial Services (MFS) architecture, including:
- intelligent, MFS customer pre-selection and credit scoring based on the MNO customer base
- enable MFS lending, based on MVP (minimum viable product), for a subset of pre-selected and scored MNO customers
- machine learning and subsequent MFS model tuning based on results
- pilot implementation roadmap and MFS platform
- demonstrate the credit performance and viability of the pilot platform.
The diagram below shows a general approach to tuning the Zoral ML models selected:
Tuning process - MFS models for new digital lending products
Minimum Viable Product (MVP) methodology
- the data was uploaded to Zoral’s Big Data, Analytics Environment
- after initial data discovery, the data was compared to the Zoral ML MFS credit data model templates and the data quality was measured using the Zoral 4I (Invalid, Incomplete, Inconsistent, Inaccurate) data quality methodology and toolset
- following data discovery, the data was cleansed and enriched. This step was required for MNO data segments found to have sub-optimal data quality. Additional data domains were added to the raw MNO data to fill the predictive data gaps discovered
- a range of Zoral’s existing emerging market models were used, along with standard tools (Zoral DE, Zoral BDW, Zoral ML and Zoral Analytical Data Workbench (ADW))
- data discovery, data quality, data enrichment and Zoral data model template comparison were conducted iteratively in order to achieve desired predictive data quality
- business metrics and target variables for the MFS were defined, e.g. default rate, fraud rate, customer retention, LTV, customer limits.
- as the client’s digital lending product was not yet in production, the Zoral team used a number of emerging market proxy targets to measure and estimate the underwriting and portfolio profitability results.
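The 4I data quality measurement mentioned in the methodology above can be illustrated with a minimal sketch. The field names (`msisdn`, `topup_amount`, `topup_count`, `activation_date`, `last_topup_date`) and the individual rules are hypothetical and chosen only for illustration; they are not Zoral's templates or toolset. The "Inaccurate" dimension is left out because it usually requires comparison against an external reference source.

```python
from datetime import date

# Hypothetical required fields for one subscriber record.
REQUIRED = ("msisdn", "topup_amount", "topup_count",
            "activation_date", "last_topup_date")


def quality_flags(record):
    """Return the set of data quality flags raised by one record."""
    flags = set()
    # Incomplete: a required field is absent or None.
    if any(record.get(field) is None for field in REQUIRED):
        flags.add("incomplete")
        return flags  # the remaining checks need every field
    # Invalid: a value falls outside its allowed domain.
    if record["topup_amount"] < 0 or record["topup_count"] < 0:
        flags.add("invalid")
    # Inconsistent: fields contradict each other.
    if record["last_topup_date"] < record["activation_date"]:
        flags.add("inconsistent")
    if record["topup_count"] == 0 and record["topup_amount"] > 0:
        flags.add("inconsistent")
    return flags


def quality_report(records):
    """Count how many records raise each flag, plus fully clean records."""
    counts = {"incomplete": 0, "invalid": 0, "inconsistent": 0, "clean": 0}
    for record in records:
        flags = quality_flags(record)
        if not flags:
            counts["clean"] += 1
        for flag in flags:
            counts[flag] += 1
    return counts


# Synthetic records, for illustration only.
records = [
    {"msisdn": "A1", "topup_amount": 10.0, "topup_count": 2,
     "activation_date": date(2019, 1, 1), "last_topup_date": date(2019, 6, 1)},
    {"msisdn": "A2", "topup_amount": None, "topup_count": 1,
     "activation_date": date(2019, 1, 1), "last_topup_date": date(2019, 6, 1)},
    {"msisdn": "A3", "topup_amount": -5.0, "topup_count": 1,
     "activation_date": date(2019, 1, 1), "last_topup_date": date(2019, 6, 1)},
    {"msisdn": "A4", "topup_amount": 10.0, "topup_count": 2,
     "activation_date": date(2019, 6, 1), "last_topup_date": date(2019, 1, 1)},
]
report = quality_report(records)
```

A report of this kind, computed per data segment, is one way to decide which segments need cleansing or enrichment before modelling.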
Zoral software and tools used
- Zoral ML MFS models and templates
- existing, emerging markets Zoral ML knowledge base
- feature engineering - Zoral ML machine learning
- Jupyter, Rstudio - data science environment
- Spark - Big data engine for large-scale processing
- Drill - SQL engine for big data
- ElasticSearch, Kibana analytics, data visualization
- Zoral Decision Engine
- Zoral Loan management system
- Zoral Analytical Data Workbench
A comprehensive set of data domains was provided by the client and loaded into Zoral ADW. Data extraction, cleansing, integration, fuzzy matching, enrichment, feature engineering and Zoral ML model tuning were performed by the Zoral MFS Data Science and Risk Management team, based on an iterative MVP methodology as shown below,
Zoral iterative MVP methodology
Data sharing considerations included,
- legislative - certain data needed to be obfuscated
- accessibility - certain data was not available for the MVP
- time limitations - MVP project timescales limited the time to extract and transfer all predictive, MNO data. MVP deployment was successfully accomplished utilizing only the most predictive MNO data.
A range of data domains were used for decisioning, robo-underwriting workflow development and initial, Zoral ML MFS model tuning. Data volumes included,
- 39m prepaid GSM subscribers with 1 year of monthly and daily aggregates
- 750,000 prepaid GSM subscribers with 6 months monthly and daily aggregates;
- 1.5m Mobile Wallet customers with 6 months detailed transactions and daily aggregates (1m active customers with 15m transactions per month).
- 4 months’ credit payment data.
Data discovery and preprocessing
During predictive data discovery, raw data was uploaded into Zoral’s analytics environment. Numerous Zoral AI/ML tools were used, including enrichment and data transformation queries, data integration scripts and advanced data visualization techniques. These were applied to understand the data, detect data issues and support the feature engineering and Zoral ML MFS model tuning process. Additional data sets were uploaded to help resolve data queries and issues. A range of Zoral data discovery and data mining tools and methodologies were used to understand MNO MFS business processes, calculate MFS business KPIs and discover additional data fields that could be used as inputs for the Zoral ML MFS model tuning phase.
The feature engineering process was started after data discovery, data quality assessment and data cleansing. Features were aggregated to customer level from numerical, categorical, timestamp and string fields. The model tuning phase was started once clean, complete data sets with predictive features and target variables had been obtained.
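As a minimal sketch of aggregating features to customer level, the example below rolls up a few synthetic mobile wallet transactions into one feature row per customer. The field names (`customer_id`, `amount`, `type`, `day`) and the chosen features are assumptions made for illustration; the source does not list the actual features used.

```python
from collections import defaultdict
from datetime import date


def customer_features(transactions, as_of):
    """Aggregate raw transactions into one feature dict per customer."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx["customer_id"]].append(tx)

    features = {}
    for customer_id, txs in grouped.items():
        amounts = [tx["amount"] for tx in txs]
        last_day = max(tx["day"] for tx in txs)
        features[customer_id] = {
            "tx_count": len(txs),                                # numerical
            "total_amount": sum(amounts),                        # numerical
            "mean_amount": sum(amounts) / len(amounts),          # numerical
            "distinct_types": len({tx["type"] for tx in txs}),   # categorical
            "days_since_last_tx": (as_of - last_day).days,       # timestamp
        }
    return features


# Synthetic transactions, for illustration only.
transactions = [
    {"customer_id": "A", "amount": 10.0, "type": "topup", "day": date(2020, 1, 1)},
    {"customer_id": "A", "amount": 30.0, "type": "transfer", "day": date(2020, 1, 10)},
    {"customer_id": "B", "amount": 5.0, "type": "topup", "day": date(2020, 1, 5)},
]
features = customer_features(transactions, as_of=date(2020, 1, 31))
```

At the data volumes described in this case study, the same group-and-aggregate pattern would normally run on a distributed engine rather than in plain Python.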
Zoral ML MFS Model tuning
Prior to MVP implementation, the client’s MFS platform had not been used for lending products. In the absence of actual data on MNO MFS loan performance, Zoral used the following approach:
- used existing Zoral ML MFS models from similar markets
- tuned the models using “proxy” targets
- compared the output against the proxy targets and retuned
- produced initial MVP output results measured against proxy targets
- compared the output of the tuned MVP MFS models with MFS model output statistics from similar markets.
Three primary Zoral ML models were tuned and transferred using available MNO data:
- Airtime credit (bills delinquency) model - predicting bill payment delinquency (a “proxy” target) for postpaid subscribers based on their MNO data usage - tuned and tested on MNO data.
- GSM Usage model - predicting customer default events for a live loan product based on subscriber MNO data.
- Transaction model - predicting customer default event for a live loan product based on mobile wallet financial transactions.
The GSM Usage and Transaction models (the second and third models above) were tuned, tested and compared to a number of Zoral ML emerging market models and data sets. Features were mapped to the MNO’s datasets, and the Zoral ML models were transferred and adapted from similar markets.
After the models were tuned, they were blended into a single credit score to obtain a more robust and stable prediction of a subscriber’s probability of default for digital microloan products.
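The source does not state how the model outputs were blended. As one simple possibility, the sketch below takes a weighted average of each model's predicted probability of default and renormalizes the weights when a model has no score for a subscriber (for example, a subscriber without a mobile wallet). The weights and model keys are hypothetical.

```python
def blend_scores(scores, weights):
    """Weighted average of the available model probabilities of default.

    `scores` maps model name -> probability, or None when unavailable.
    Weights of the available models are renormalized to sum to one.
    """
    available = {m: p for m, p in scores.items() if p is not None}
    if not available:
        raise ValueError("no model score available for this subscriber")
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total_weight


# Hypothetical weights, for illustration only.
weights = {"airtime": 0.5, "gsm": 0.3, "transaction": 0.2}

full = blend_scores(
    {"airtime": 0.10, "gsm": 0.20, "transaction": 0.30}, weights)
partial = blend_scores(
    {"airtime": 0.10, "gsm": 0.20, "transaction": None}, weights)
```

In practice the weights would be fitted on validation data (for example by stacking) rather than set by hand.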
This Zoral ML credit score achieved a good predictive result, sufficient for it to be taken to the next stage for initial pilot use and tuning.
The credit default prediction model was tuned using client and other emerging market data.
The diagram below shows the results of the tuned Zoral ML model. This was predictive for credit default events for all three geographies, thus providing a cross geography credit scorecard.
Transferring MFS default prediction models from similar emerging markets to target MNO
A further round of model retuning was conducted in order to obtain an optimal, blended model, leveraging data from multiple similar geographies.
Such an approach made it possible to have a viable underwriting platform to support a pilot micro-lending product within a short period of time. The Zoral ML MFS models demonstrated good predictive power, automatically rejecting loans in high-default and high-fraud segments.
Blended score vs. customer monthly revenue
Tuned Zoral ML MFS credit model performance - area under the ROC curve (AUC) metrics
- airtime credit model: 0.91 AUC
- GSM model: 0.82 AUC
- transaction model: 0.78 AUC.
(0.5 AUC = no better than a random coin flip; 0.6 AUC = acceptable performance; 0.7 AUC = good performance; 0.8 AUC and higher = excellent performance)
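For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counted as one half. The sketch below computes it directly from that definition on synthetic labels and scores; it is illustrative only and is quadratic in the number of records, so it is not suited to production volumes.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. default), 0 otherwise.
    scores: higher score means the model considers the positive class more likely.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Synthetic example: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auc = auc(labels, scores)

# A model that gives everyone the same score is no better than a coin flip.
constant_auc = auc(labels, [0.5] * len(labels))
```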
The pilot results showed that the airtime credit model performance was strong and stable. The GSM and Transaction models had slightly lower performance, as these models were transferred from other geographies and not all features could be mapped to MNO data (NB: some features were missing from the initial MNO datasets provided, so were excluded from the models).
The Airtime model was tuned with a “proxy” target. The two other models were ensembled and produced a stable, robust model for default event prediction for the MVP phase.
After applying each model to MNO customer data, Zoral divided the MNO customers into three risk bins: High Risk, Medium Risk and Low Risk.
The graphs show performance of each model for the airtime credit “proxy” target.
The three models were blended into a single, combined credit score to provide robust default prediction, credit limit estimation and customer segmentation capabilities.
The blended score was compared with various business KPIs (e.g. revenue split) to conduct a “sanity check” of the customer default prediction score. An example of such a check is described below.
Customers were ranked by blended score and divided into 10 equal parts (bins). The 1st bin represents customers with the lowest scores, i.e. high-risk customers with a high default level. The 10th bin represents customers with the highest scores, i.e. low-risk customers. Diagram 3 shows the correlation of customer risk level with the customer’s monthly revenue: the higher the score, the higher the monthly revenue.
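The binning and revenue comparison described above can be sketched as follows. The scores and revenue figures are synthetic and constructed so that revenue rises with score; they are not the pilot's data, and the function names are illustrative.

```python
def decile_bins(scores):
    """Assign each customer a bin from 1 to 10 by score rank.

    Bin 1 holds the lowest scores (highest risk), bin 10 the highest.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, index in enumerate(order):
        bins[index] = rank * 10 // len(scores) + 1
    return bins


def mean_by_bin(bins, values):
    """Average a business KPI (e.g. monthly revenue) within each bin."""
    totals, counts = {}, {}
    for b, v in zip(bins, values):
        totals[b] = totals.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


# Synthetic data: 20 customers whose revenue increases with score.
scores = [300 + 25 * i for i in range(20)]
revenue = [2.0 + 0.5 * i for i in range(20)]

bins = decile_bins(scores)
revenue_by_bin = mean_by_bin(bins, revenue)

# Sanity check: mean revenue should not decrease from bin 1 to bin 10.
means = list(revenue_by_bin.values())
is_monotonic = all(a <= b for a, b in zip(means, means[1:]))
```

If `is_monotonic` were false on real data, that would prompt a closer look at the score or at the KPI definition before relying on the score for decisioning.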
If we drill down into the revenue structure, we can see that the combined score model still produces a good, predictive curve of score distribution vs. services revenue.
As well as credit default, the following models were also tuned using the MNO data,
- credit limit - based on blended credit score
- fraud prevention - MVP-tuned fraud detection, including fraud blacklist checks and an anomaly detection model based on customer transaction data.
The diagram below shows the result of the transaction anomaly detection model. Each dot on the graph represents a customer’s aggregated mobile wallet usage. The distance between dots shows the similarity of customer transaction behavior, i.e. similar customers are shown close to each other on the plot. Red dots show clusters of anomalous customer behavior that can be treated as suspected fraud.
Mobile wallet transactions anomaly clusters
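The source does not name the anomaly detection algorithm used. As an illustration of the general idea that customers far from all similar customers are suspicious, the sketch below scores each customer by the mean distance to its k nearest neighbours in feature space and flags scores well above the median. The technique, the two features and the `factor` cutoff are assumptions, not the method used in the pilot.

```python
import math
from statistics import median


def knn_anomaly_scores(points, k=2):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


def flag_anomalies(points, k=2, factor=3.0):
    """Flag points whose score exceeds `factor` times the median score."""
    scores = knn_anomaly_scores(points, k)
    cutoff = factor * median(scores)
    return [score > cutoff for score in scores]


# Synthetic aggregated wallet usage per customer, e.g.
# (scaled transactions per day, scaled mean transaction amount).
points = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0),
    (8.0, 9.0),  # behaves very differently from everyone else
]
flags = flag_anomalies(points)
```

Features should be put on comparable scales before distances are computed, and flagged customers would typically be reviewed rather than rejected automatically.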
Decisioning Workflow implementation
Sample Digital Consumer Lending MVP workflow, Zoral DE
Following the data absorption and model tuning, Zoral implemented the combined Zoral DE and analytics toolset as a running, MVP environment. This provided a scalable solution architecture that would allow rapid implementation and ease of customization for full scale production.
Zoral also worked with the MNO team to help generate MFS product definitions and functional system requirements.
Technical assistance was provided to help define, agree and implement the integration communication protocols (APIs) between the MFS solution components.
Zoral DE was an essential part of the MFS solution. Using Zoral DE, MFS decisioning workflows were rapidly created and tuned, and credit model and credit decision outputs were integrated. A pilot MFS loan product decisioning workflow was developed as part of the MVP. The three tuned models, blended score, fraud list, credit limit and business rules were implemented as part of the Zoral DE workflow. The diagram below shows a sample decision workflow with Zoral ML MFS models utilized in the Zoral DE.
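To make the ordering of such a workflow concrete, here is a minimal sketch in plain Python. It is not Zoral DE configuration: the step order, the cutoff, the tenure rule, the field names and the sample identifiers are all hypothetical and only show how a fraud list, a business rule, a blended score and a credit limit can be chained.

```python
def decide(customer, fraud_blacklist, pd_cutoff=0.20, min_tenure_months=6):
    """Run the checks in order; return the first rejection or an approval."""
    # Step 1: fraud list check.
    if customer["msisdn"] in fraud_blacklist:
        return {"approved": False, "reason": "fraud_list", "limit": 0}
    # Step 2: business rule (hypothetical minimum time on the network).
    if customer["tenure_months"] < min_tenure_months:
        return {"approved": False, "reason": "business_rule", "limit": 0}
    # Step 3: blended score, expressed here as a probability of default.
    if customer["blended_pd"] > pd_cutoff:
        return {"approved": False, "reason": "credit_score", "limit": 0}
    # Step 4: approved up to the modelled credit limit.
    return {"approved": True, "reason": "ok", "limit": customer["credit_limit"]}


# Synthetic customers, for illustration only.
fraud_blacklist = {"X-0001"}
good = {"msisdn": "X-0002", "tenure_months": 24,
        "blended_pd": 0.05, "credit_limit": 150}
risky = {"msisdn": "X-0003", "tenure_months": 24,
         "blended_pd": 0.40, "credit_limit": 50}
listed = {"msisdn": "X-0001", "tenure_months": 36,
          "blended_pd": 0.02, "credit_limit": 300}

decisions = [decide(c, fraud_blacklist) for c in (good, risky, listed)]
```

Returning a reason code with every decision keeps each outcome traceable when pilot results are reviewed.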
Prior to MVP launch, Zoral produced back-end prototypes and developed a demonstration app of the MVP from the customer’s point of view (front-end). Given the low smartphone penetration in the market, the requirement was to demonstrate this using USSD. The diagram below shows an illustrative customer journey for the MVP using USSD capabilities.
MVP Customer journey, loan acceptance
Brief summary of the mobile application
On Screen 1, a customer enters the USSD code of a loan product (the digital loan product can be advertised via SMS, the MNO website, multi-channel marketing campaigns, etc.) and requests information about their potential loan.
The customer is asked to enter their national insurance number (or equivalent) at the time of requesting a loan.
The Zoral Platform (Zoral DE plus the models described above) then determines whether the customer is eligible for a loan and calculates the optimal loan amount. The decision is based on a set of prequalified customer loans (scored in batch mode) and is looked up using the customer’s MSISDN (mobile account number or other reference). On Screen 2, the customer is shown the output of the decisioning system with the eligible loan amount. The customer can accept this amount or enter a lower or higher amount. If the customer enters a higher amount, it is checked against the customer’s loan limit and the loan is issued, provided the requested amount is lower than the maximum, dynamically calculated limit.
On Screen 3, the customer enters their MPIN to indicate their consent. An appropriate error message is shown for any unauthorized action or unfulfilled user request.
If the customer accepts the loan terms and conditions, the loan amount is disbursed to the customer’s mobile account using the Zoral Platform. A corresponding record is created in the Zoral LM, including customer details, loan details, loan schedule, decisioning data, verification data, credit scores, model scores, limits utilization, etc.
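The amount selection step on Screen 2 can be sketched as a small function. The function and parameter names are hypothetical. The sketch also assumes that a request exactly equal to the limit is allowed; the text only says the requested amount must be lower than the maximum limit, so that boundary is an assumption.

```python
def resolve_loan_amount(offered, max_limit, requested=None):
    """Return the amount to disburse, or None if the request is over the limit.

    offered:    eligible amount shown to the customer on Screen 2
    max_limit:  maximum, dynamically calculated loan limit
    requested:  amount typed by the customer, or None if the offer is accepted
    """
    if requested is None:
        return offered      # customer accepted the offered amount
    if requested <= 0:
        raise ValueError("requested amount must be positive")
    if requested > max_limit:
        return None         # above the limit: no loan is issued
    return requested        # lower or higher than the offer, within the limit


accepted = resolve_loan_amount(offered=100, max_limit=150)
lower = resolve_loan_amount(offered=100, max_limit=150, requested=80)
higher = resolve_loan_amount(offered=100, max_limit=150, requested=140)
too_high = resolve_loan_amount(offered=100, max_limit=150, requested=200)
```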
A prequalified customer/loan sample, processed in batch mode, was used for the MVP. Real-time loan decisioning was implemented using the Zoral Platform. In addition to real-time functionality, further integration with the mobile/web/chat application was built to enable the MFS digital loan product to be offered across the MNO’s entire customer portfolio and all channels.
The system was integrated with smartphones, feature phones, chat, USSD and web applications.
- Zoral Platform and analytical toolset assembled to capture and enrich required MNO datasets
- data quality mechanism applied and MNO data issues discovered and fixed
- the pilot demonstrated that the MNO had a large variety of data that, once cleaned and prepared, could support an automated lending platform and be used for the creation of new products
- a production-ready blended default prediction score was built from three high-performance models (AUC 0.78-0.91) that were tuned, tested and transferred using data from several similar emerging market geographies. Score results were good and sufficient for MVP deployment
- credit limit model and fraud prevention workflow were built and deployed using Zoral DE
- the blended credit score was shown to be good, of production strength and highly correlated with a range of MNO KPIs, e.g. customer monthly revenue distribution
- production strength credit decisioning workflow was built based on tuned Zoral ML models and MNO MFS architecture and data
- digital consumer lending MVP successfully deployed, using tuned Zoral ML MFS models and Zoral DE based credit decision workflow on MNO MFS platform
- once data was made available, the above tasks were completed by the Zoral team in 8 weeks.
Additional recommendations made
- increasing predictive data capture and range of data, (e.g. enriching data with new sources such as social network data, additional behavioral data, etc.)
- increasing the range of fraud models
- improving collections infrastructure to optimize payments from non-performing loans.