Mobile Network Operator (MNO) Credit Scoring and Automated Underwriting

April 13, 2020

Results

MNO emerging markets project using GSM, Airtime Credit and transaction data
High predictive results for mobile lending functionality (AUC 0.78-0.91)
Project duration, 8 weeks from receipt of data

Is it possible to use MNO data on its own to predict customer behavior, credit risk, probability of default, fraud and other attributes? Is MNO data sufficient to do this reliably and accurately?

Yes.

The following case study shows that MNO data is predictive and can be used in isolation in markets where other data sources, such as traditional credit bureau and related banking data, are limited or absent. The results show a level of accuracy and stability sufficient to support profitable mobile financial services. The project also shows the resources and time required using Zoral Platform components.

Project summary

collect and analyze predictive MNO data (3 primary data sources: GSM, network transaction and airtime credit; 12 months of data)
cleanse, transform, enhance and integrate predictive, MNO data into Zoral Analytical Data Workbench
load to Zoral Analytical Data Warehouse
combine and tune with existing Zoral Model Library credit risk and other models
combine with models and data from comparable emerging markets and use proxy targets
blend scores from multiple Zoral ML credit models to create optimized credit/risk, limits management, and fraud scores
implement using Zoral Decision Engine and Zoral Platform
benchmark against equivalent industry performance
execute MFS lending decisioning in pilot mode
review MFS pilot lending results
model and test Minimum Viable Product (MVP) based on comparable MFS lending pilot experience

Project goals

The main goal was to create and benchmark a pilot, automated, consumer lending platform to integrate with the client’s Mobile Financial Services (MFS) architecture, including,

intelligent, MFS customer pre-selection and credit scoring based on the MNO customer base
enable MFS lending, based on MVP (minimum viable product), for a subset of pre-selected and scored MNO customers
machine learning and subsequent MFS model tuning based on results
pilot implementation roadmap and MFS platform
demonstrate the credit performance and viability of the pilot platform.

Methodology

The diagram below shows a general approach to tuning the Zoral ML models selected:

Tuning process - MFS models for new digital lending products

Minimum Viable Product (MVP) methodology

the data was uploaded to Zoral’s Big Data, Analytics Environment
after initial data discovery, the data was compared to the Zoral ML MFS credit data model templates and the data quality was measured using the Zoral 4I (Invalid, Incomplete, Inconsistent, Inaccurate) data quality methodology and toolset
following data discovery, the data was cleansed and enriched. This step was required for MNO data segments found to have sub-optimal data quality. Additional data domains were added to the raw MNO data to fill the predictive data gaps discovered
a range of Zoral’s existing emerging market models were used, along with standard tools (Zoral DE, Zoral BDW, Zoral ML and Analytical Data Workbench (ADW))
data discovery, data quality, data enrichment and Zoral data model template comparison were conducted iteratively in order to achieve desired predictive data quality
business metrics and target variables for the MFS were defined, e.g. default rate, fraud rate, customer retention, LTV, customer limits.
as the client’s digital lending product was not yet in production, the Zoral team used a number of emerging market proxy targets to measure and estimate the underwriting and portfolio profitability results accurately.
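The 4I data quality dimensions named in the methodology above can be illustrated with a small sketch. The source does not describe how the Zoral toolset implements these checks, so the field names (`msisdn`, `activation_date`, `last_topup_date`, `monthly_topup`) and the rules attached to each dimension are hypothetical and only show the general idea of tagging records by issue type.

```python
from datetime import date

# Hypothetical required fields for a subscriber record (not from the source).
REQUIRED_FIELDS = ("msisdn", "activation_date", "monthly_topup")


def classify_4i(record, today=date(2020, 4, 13)):
    """Return the set of 4I issue labels found in one subscriber record."""
    issues = set()

    # Incomplete: a required field is missing or empty.
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        issues.add("incomplete")

    # Invalid: the value is present but has the wrong format.
    msisdn = record.get("msisdn")
    if msisdn and not (isinstance(msisdn, str) and msisdn.isdigit()):
        issues.add("invalid")

    # Inconsistent: two fields contradict each other.
    activated = record.get("activation_date")
    last_topup = record.get("last_topup_date")
    if activated and last_topup and last_topup < activated:
        issues.add("inconsistent")

    # Inaccurate (assumed rule): well-formed but implausible values.
    topup = record.get("monthly_topup")
    if isinstance(topup, (int, float)) and topup < 0:
        issues.add("inaccurate")
    if activated and activated > today:
        issues.add("inaccurate")

    return issues
```

Counting the labels per data domain gives a simple quality profile that can be re-measured after each cleansing iteration.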

Zoral software and tools used

Zoral ML MFS models and templates
existing, emerging markets Zoral ML knowledge base
feature engineering - Zoral ML machine learning
Jupyter, Rstudio - data science environment
Spark - Big data engine for large-scale processing
Drill - SQL engine for big data
ElasticSearch, Kibana analytics, data visualization
Zoral Decision Engine
Zoral Loan management system
Zoral Analytical Data Workbench

Data transfer

A comprehensive set of data domains was provided by the client and loaded into Zoral ADW. Data extraction, cleansing, integration, fuzzy matching, enrichment, feature engineering and Zoral ML model tuning were performed by the Zoral MFS Data Science and Risk Management team, based on an iterative MVP methodology as shown below,

Zoral iterative MVP methodology

Data sharing

Data sharing considerations included,

legislative - certain data needed to be obfuscated
accessibility - certain data was not available for the MVP
time limitations - MVP project timescales limited the time to extract and transfer all predictive, MNO data. MVP deployment was successfully accomplished utilizing only the most predictive MNO data.

A range of data domains were used for decisioning, robo-underwriting workflow development and initial, Zoral ML MFS model tuning. Data volumes included,

39m prepaid GSM subscribers with 1 year of monthly and daily aggregates
750,000 prepaid GSM subscribers with 6 months of monthly and daily aggregates
1.5m Mobile Wallet customers with 6 months of detailed transactions and daily aggregates (1m active customers with 15m transactions per month)
4 months’ credit payment data.

Data discovery and preprocessing

During predictive data discovery, raw data was uploaded into Zoral’s analytics environment. Numerous Zoral AI/ML tools were used, including enrichment/data transformation queries, data integration scripts and advanced data visualization techniques. These were applied to understand the data, detect data issues and support the feature engineering and Zoral ML MFS model tuning process. Additional data sets were uploaded to assist with data queries/issues. A range of Zoral data discovery and data mining tools and methodologies were used to understand MNO MFS business processes, calculate MFS business KPIs and discover additional data fields that could be used as inputs for the Zoral ML MFS model tuning phase.

The feature engineering process was started after data discovery, data quality assessment and data cleansing. Features were aggregated to customer level using numerical features, categorical features, timestamps, strings, etc. The model tuning phase was started once clean, complete data sets were obtained with predictive features and target variables.
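As an illustration of aggregating raw events to customer level, the following is a minimal sketch using only the Python standard library. The event fields (`customer_id`, `amount`, `channel`, `day`) and the derived feature names are assumptions made for the example; the actual feature set used in the project is not listed in the source.

```python
from collections import defaultdict
from statistics import mean


def aggregate_features(events):
    """Collapse per-event rows into one feature dict per customer.

    events: iterable of dicts with hypothetical keys
    customer_id, amount (number), channel (str), day (int day index).
    """
    grouped = defaultdict(list)
    for event in events:
        grouped[event["customer_id"]].append(event)

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [r["amount"] for r in rows]
        channels = [r["channel"] for r in rows]
        days = [r["day"] for r in rows]
        features[customer_id] = {
            # Numerical aggregates.
            "txn_count": len(rows),
            "amount_mean": mean(amounts),
            "amount_max": max(amounts),
            # Categorical aggregate: most frequent channel
            # (ties resolved alphabetically).
            "top_channel": max(sorted(set(channels)), key=channels.count),
            # Timestamp-derived aggregates.
            "active_days": len(set(days)),
            "days_span": max(days) - min(days),
        }
    return features
```

Each customer ends up with a single flat row, which is the shape a scoring model expects as input.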

Zoral ML MFS Model tuning

Prior to MVP implementation, the client’s MFS platform had not been used for lending products. In the absence of actual data on MNO MFS loan performance, Zoral used the following approach,

used existing Zoral ML MFS models from similar markets
tuned models using “proxy” targets
compared output against proxy targets and retuned
produced initial MVP output results measured against proxy targets
compared the tuned MVP MFS model output with MFS model output statistics from similar markets.

Three primary Zoral ML models were tuned and transferred using available MNO data:

Airtime credit (bills delinquency) model - predicting bill payment delinquency (a “proxy” target) for postpaid subscribers based on their MNO data usage - tuned and tested on MNO data.
GSM Usage model - predicting customer default events for a live loan product based on subscriber MNO data.
Transaction model - predicting customer default event for a live loan product based on mobile wallet financial transactions.

The GSM Usage and Transaction models (models 2 and 3 in the list above) were tuned, tested and compared to a number of Zoral ML emerging market models and data. Features were mapped to the MNO’s datasets and Zoral ML models were transferred and adapted from similar markets.

After the models were tuned, they were blended into a single credit score to obtain a more robust and stable prediction of subscriber’s probability of default for digital, microloan products.
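The source does not state how the individual model outputs were blended. One common option, shown here purely as an assumed sketch, is a weighted average of the per-model default probabilities in log-odds space, skipping any model that has no data for a given subscriber (for example, a subscriber without a mobile wallet). The model names and weights are hypothetical.

```python
import math


def blend_pd(pds, weights):
    """Blend per-model default probabilities via a weighted mean of log-odds.

    pds: dict of model name -> probability of default, or None if that
         model could not score the customer.
    weights: dict of model name -> non-negative weight (hypothetical values).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, p in pds.items():
        if p is None:
            continue  # model not applicable to this customer
        p = min(max(p, 1e-6), 1 - 1e-6)  # clip to keep the logit finite
        weighted_sum += weights[name] * math.log(p / (1 - p))
        total_weight += weights[name]
    if total_weight == 0:
        raise ValueError("no model score available for this customer")
    # Convert the averaged log-odds back to a probability.
    return 1 / (1 + math.exp(-weighted_sum / total_weight))
```

For example, `blend_pd({"airtime": 0.1, "gsm": 0.5, "txn": None}, {"airtime": 1, "gsm": 1, "txn": 1})` averages only the two available models.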

This Zoral ML credit score achieved a good predictive result, sufficient to be taken to the next stage for initial pilot use and tuning.

The credit default prediction model was tuned using client and other emerging market data.

The diagram below shows the results of the tuned Zoral ML model. This was predictive for credit default events for all three geographies, thus providing a cross geography credit scorecard.

Cross-geography score vs initial feedback tuning, by scorebands, Emerging markets

Transferring MFS default prediction models from similar emerging markets to target MNO

A further round of model retuning was conducted in order to obtain an optimal, blended model, leveraging data from multiple similar geographies.

This approach made it possible to have a viable underwriting platform to support a pilot micro-lending product within a short period of time. The Zoral ML MFS models demonstrated good predictive power, automatically rejecting loans in high-default and high-fraud segments.

Blended score vs. customer monthly revenue

Tuned Zoral ML MFS credit model performance - area under the ROC curve (AUC) metrics

airtime credit model: 0.91 AUC
GSM model: 0.82 AUC
transaction model: 0.78 AUC.

(0.5 AUC = bad (or random coin flip) prediction; 0.6 AUC = acceptable performance; 0.7 = good performance; 0.8 AUC and higher = excellent performance)
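For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen defaulting customer receives a higher risk score than a randomly chosen non-defaulting customer, with ties counted as half. The sketch below computes it directly from that definition; it is a plain pairwise illustration, not the tooling used in the project, and it is too slow for large portfolios.

```python
def auc(labels, scores):
    """Area under the ROC curve from its pairwise-ranking definition.

    labels: 1 for the positive class (e.g. default), 0 otherwise.
    scores: model outputs where a higher value means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

A model that scores every customer identically gets 0.5, matching the "random coin flip" reading above.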

The pilot results showed that the airtime credit model performance was strong and stable. The GSM and Transaction models had slightly lower performance, as these models were transferred from other geographies and not all features could be mapped to MNO data (some features were missing from the initial MNO datasets provided and so were excluded from the models).

The Airtime model was tuned with a “proxy” target. The two other models were ensembled and produced a stable, robust model for default event prediction for the MVP phase.

After applying each model to MNO customer data, Zoral divided the MNO customers into three risk bins: High Risk, Medium Risk and Low Risk.
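A risk-bin assignment of this kind can be sketched as a pair of cut-offs on the predicted probability of default. The thresholds below (5% and 20%) are placeholders chosen for the example; the source does not give the cut-offs actually used.

```python
def risk_bin(pd_estimate, low_cut=0.05, high_cut=0.20):
    """Map a predicted probability of default to one of three risk bins.

    low_cut and high_cut are hypothetical thresholds, not project values.
    """
    if not 0.0 <= pd_estimate <= 1.0:
        raise ValueError("probability of default must be between 0 and 1")
    if pd_estimate < low_cut:
        return "Low Risk"
    if pd_estimate < high_cut:
        return "Medium Risk"
    return "High Risk"
```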

The graphs show performance of each model for the airtime credit “proxy” target.

The three models were blended into a single, combined credit score to provide robust default prediction, credit limit estimation and customer segmentation capabilities.

The blended score was compared with various business KPIs (e.g. revenue split) to conduct a “sanity check” of the customer default prediction score. An example of such a check is described below.

Customers were ranked by blended score and divided into 10 equal parts (bins). The 1st bin represents customers with the lowest scores, i.e. high-risk customers with a high default level. The 10th bin represents customers with the highest scores, i.e. low-risk customers. Diagram 3 shows the correlation of customer risk level with the customer’s monthly revenue: the higher the score, the higher the monthly revenue.
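The decile check described here can be sketched as follows: sort customers by score, cut the sorted list into ten near-equal bins, and compare the mean of a business KPI across bins. The function and variable names are illustrative assumptions, and the monotonicity test is one possible way to summarize the "higher score, higher revenue" pattern.

```python
from statistics import mean


def mean_by_score_bin(scores, values, n_bins=10):
    """Mean KPI value per score bin; bin 1 (index 0) holds the lowest scores."""
    if len(scores) != len(values) or len(scores) < n_bins:
        raise ValueError("need matching inputs with at least n_bins customers")
    ranked = sorted(zip(scores, values), key=lambda pair: pair[0])
    n = len(ranked)
    means = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one customer of each other.
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(mean(value for _, value in chunk))
    return means


def is_non_decreasing(xs):
    """True if each element is at least as large as the one before it."""
    return all(a <= b for a, b in zip(xs, xs[1:]))
```

If `is_non_decreasing(mean_by_score_bin(scores, monthly_revenue))` holds, the score passes this particular sanity check.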

Drilling down into the revenue structure, the combined score model still produces a good, predictive curve of score distribution vs. services revenue.

As well as credit default, the following models were also tuned using the MNO data,

credit limit - based on blended credit score
fraud prevention - MVP-tuned fraud detection, including fraud blacklist checks and an anomaly detection model based on customer transaction data.

The diagram below shows the result of the transaction anomaly detection model. Each dot on the graph represents a customer’s aggregated mobile wallet usage. The distance between dots shows the similarity of customer transaction behavior, i.e. similar customers are shown close to each other on the plot. Red dots show clusters of anomalous customer behavior that can be flagged as suspected fraud.

Mobile wallet transactions anomaly clusters
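The source does not name the algorithm behind the anomaly plot. As one simple distance-based illustration of the idea that customers far from their neighbours are suspicious, the sketch below scores each customer by the mean Euclidean distance to its k nearest neighbours in feature space. The choice of technique, the value of `k` and the feature vectors are all assumptions.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Mean distance from each point to its k nearest neighbours.

    points: list of equal-length numeric tuples (aggregated usage features).
    A larger score means the customer sits further from similar customers.
    """
    if len(points) <= k:
        raise ValueError("need more points than neighbours")
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(distances[:k]) / k)
    return scores
```

Customers whose score exceeds a chosen threshold could then be routed to a manual fraud review rather than rejected outright.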

Decisioning Workflow implementation

Sample Digital Consumer Lending MVP workflow, Zoral DE

Following the data absorption and model tuning, Zoral implemented the combined Zoral DE and analytics toolset as a running, MVP environment. This provided a scalable solution architecture that would allow rapid implementation and ease of customization for full scale production.

Zoral also worked with the MNO team to help generate MFS product definitions and functional system requirements.

Technical assistance was provided to help define, agree and implement the integration communication protocols (APIs) between the MFS solution components.

Zoral DE was an essential part of the MFS solution. Using Zoral DE, MFS decisioning workflows were rapidly created and tuned, and the credit model and credit decision outputs were integrated. A pilot MFS loan product decisioning workflow was developed as part of the MVP. The three tuned models, blended score, fraud list, credit limit and business rules were implemented as part of the Zoral DE workflow. The diagram below shows a sample decision workflow with Zoral ML MFS models utilized in the Zoral DE.

Prior to MVP launch, Zoral produced back-end prototypes and developed a demonstration app of the MVP from the customer point of view (front-end). Given the low smartphone penetration in the market, the requirement was to demonstrate this using USSD. The diagram below shows the illustrative customer journey for the MVP using USSD capabilities.

MVP Customer journey, loan acceptance

Brief summary of the mobile application

On Screen 1, the customer enters the USSD code of a loan product (the digital loan product can be advertised via SMS, the MNO website, multi-channel marketing campaigns, etc.) and requests information about their potential loan.

The customer is asked to enter their national insurance number (or equivalent) at the time of requesting a loan.

The Zoral Platform (Zoral DE plus the models described above) then determines whether the customer is eligible for a loan and calculates the optimal loan amount, based on a set of prequalified customer loans (scored in batch mode), using the customer MSISDN (mobile account number or other reference). On Screen 2, the customer is shown the output of the decisioning system with the eligible loan amount. The customer can accept this amount or enter a lower or higher amount. If the customer enters a higher amount, the customer’s loan limit is checked and a loan is issued, provided the requested amount is lower than the maximum, dynamically calculated limit.
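The amount check on Screen 2 can be sketched as a lookup against the batch-scored prequalification table followed by a limit comparison. This is an illustrative outline only, not the Zoral DE workflow: the table layout (`offer`, `limit`), the function name and the return format are hypothetical, and treating a request exactly equal to the limit as acceptable is an assumption.

```python
def decide_loan(msisdn, requested, prequalified):
    """Return (decision, amount) for a USSD loan request.

    msisdn: customer mobile account number used as the lookup key.
    requested: amount entered by the customer, or None to accept the offer.
    prequalified: dict of msisdn -> {"offer": amount, "limit": max amount},
                  a hypothetical layout for the batch-scored customer list.
    """
    entry = prequalified.get(msisdn)
    if entry is None:
        return ("rejected", 0)  # customer was not prequalified

    amount = entry["offer"] if requested is None else requested

    # Assumption: an amount equal to the limit is still acceptable.
    if amount <= 0 or amount > entry["limit"]:
        return ("rejected", 0)
    return ("approved", amount)
```

For a customer offered 50 with a limit of 80, requests of 30 and 70 would both be approved, while a request of 100 would be rejected.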

On Screen 3, the customer enters their MPIN to indicate their consent. An appropriate error message is shown for any unauthorized action or unfulfilled user request.

If the customer accepts the loan terms and conditions, the loan amount is disbursed to the customer’s mobile account using the Zoral Platform. A corresponding record is created in the Zoral LM, including customer details, loan details, loan schedule, decisioning data, verification data, credit scores, model scores, limits utilization, etc.

A prequalified customer/loan sample, processed in batch mode, was used for the MVP. Real-time loan decisioning was implemented using the Zoral Platform. In addition to real-time functionality, further integration with the mobile/web/chat application was built to enable the MFS digital loan product to be provided across the MNO’s entire customer portfolio and all channels.

The system was integrated with smartphones, feature phones, chat, USSD and web applications.

Results summary

Zoral Platform and analytical toolset assembled to capture and enrich required MNO datasets
data quality mechanism applied and MNO data issues discovered and fixed
the pilot demonstrated that the MNO had a large variety of data that, once cleaned and prepared, could support an automated lending platform and be used to create new products
a production-ready, blended default prediction score was built from three high-performance models (AUC 0.78-0.91) that were tuned, tested and transferred using several similar emerging market geographies. Score results were good and sufficient for MVP deployment
credit limit model and fraud prevention workflow were built and deployed using Zoral DE
the blended credit score was shown to be of production strength and to correlate highly with a range of MNO KPIs, e.g. customer monthly revenue distribution
production strength credit decisioning workflow was built based on tuned Zoral ML models and MNO MFS architecture and data
digital consumer lending MVP successfully deployed, using tuned Zoral ML MFS models and Zoral DE based credit decision workflow on MNO MFS platform
once data was made available, the above tasks were completed by the Zoral team in 8 weeks.

Additional recommendations made

increasing predictive data capture and range of data, (e.g. enriching data with new sources such as social network data, additional behavioral data, etc.)
increasing the range of fraud models
improving collections infrastructure to optimize payments from non-performing loans.