Discussion for: Broadening the data base for deepening the focus? The use of big data analytics in transaction banking – Dr. Martin Diehl
Discussant: Adrian Guerin, Bank of Canada*
*Any opinions expressed herein are those of the discussant and do not necessarily represent the views of the Bank of Canada
- Introduction to Big Data Analytics and methods
- Use cases in the financial industry
- Available data in transaction banking
- Overview of existing areas of research
- Potential applications for central banks
Summary: Big Data Analytics
- More data, different types, speed, accuracy, and value.
- Variety of sources: internet, devices, corporate systems, infrastructures.
- BDA: combination of data mining and machine learning methods.
- Methods:
  - Supervised learning
  - Unsupervised learning
  - Reinforcement learning
Summary: Use Cases, Transaction Banking
Existing use cases
- Classification: anomaly detection and fraud detection; default prediction
- Forecasting: market returns, risk indicators
- Sentiment analysis
Potential use cases
- Predict RTGS liquidity needs (participant or system)
- Forecast consumption
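To make the anomaly-detection use case concrete, a toy sketch (not from the presentation; real systems use richer features and ML models) that flags unusually large payments by z-score:

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Flag observations whose z-score exceeds the threshold.

    A deliberately simple stand-in for fraud/anomaly detection.
    Note: on small samples the outlier itself inflates the standard
    deviation, so the threshold must be modest.
    """
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical data: routine payments plus one outsized transfer.
payments = [100, 102, 98, 101, 99, 103, 97, 100, 5000]
print(flag_anomalies(payments))  # → [5000]
```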
Thank you: presentation highlights analytical opportunity
- BDA (data science) is an evolving field; skills are in short supply.
- Applied most notably in the private sector (e.g., Google, Amazon, IBM).
- Proliferation of DS training (university level, certificate).
Content is timely
- DLT or “blockchain” technology: potential for a wealth of data provided to central banks and regulators.
- Possible “regulator node” with full ledger visibility (near real-time data).
- Example initiatives:
  - ASX examining a DLT option with DAH for CHESS replacement
  - DTCC partnered with IBM, Axoni, and R3 to build a DLT solution for derivatives post-trade processing (re-platforming DTCC’s Trade Information Warehouse)
- Overview of primary methods very helpful as an introduction.
- Light touch on data scraping/mining and NLP, but appropriate given it may not be useful in the payments systems context.
- Questions: data assessment (available data)
  - Are there opportunities for data mining to complement transaction banking data and analyses?
  - Can we confidently use ‘unofficial statistics’ acquired through data mining?
- Comment, unsupervised learning:
  - Can be challenging to set up (e.g., k-means: how many clusters are appropriate?) and to interpret (are there “natural” groups?).
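The "how many clusters?" question for k-means is often answered by looking for an "elbow" in within-cluster variance (inertia) as k grows. A minimal 1-D illustration, assuming toy data with two natural groups (not from the presentation):

```python
def kmeans_1d(points, k, iters=50):
    """Minimal 1-D k-means (Lloyd's algorithm), for illustration only."""
    pts = sorted(points)
    # Initialize centroids spread across the sorted data.
    if k == 1:
        centroids = [sum(pts) / len(pts)]
    else:
        centroids = [pts[int(i * (len(pts) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[j].append(p)
        # Recompute centroids; keep the old one if a cluster emptied.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]  # two "natural" groups
for k in (1, 2, 3):
    _, inertia = kmeans_1d(data, k)
    print(k, round(inertia, 2))
# Inertia collapses from k=1 to k=2, then barely improves at k=3:
# the "elbow" suggests two clusters.
```

The interpretability caveat stands: the elbow is a heuristic, and on real transaction data the groups may not be this cleanly separated.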
AI methods as “black box”
- Comments: Agree; we must (continue to) be vigilant about model risk.
  - We can measure the performance of ML algorithms (e.g., confusion matrix for supervised learning), but this may create a false sense of confidence.
  - Are test/train data representative? Do they contain outliers? Where are the false negatives?
- Question/comments: feature selection
  - A key ML objective is to maximize measured model performance.
  - Does this create a risk similar to that of “kitchen sink” models in econometrics and the search for statistical significance?
  - Do the techniques lend themselves to overreliance on such measures? Is this a risk for feature selection (might we improperly leave something out)?
  - ML methods are “smart” but continue to rely on data inputs; feature selection and lookback period / sample data considerations are important.
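The “false sense of confidence” point can be shown with a small hypothetical: on imbalanced data such as fraud detection, a classifier that never flags anything scores high accuracy while missing every positive case. A sketch (example data and labels are assumptions, not from the presentation):

```python
def confusion_counts(actual, predicted):
    """2x2 confusion counts for a binary classifier (1 = fraud, 0 = legitimate)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, fn, tn

# Highly imbalanced sample: 1 fraud case in 10 transactions.
actual    = [0] * 9 + [1]
predicted = [0] * 10          # a classifier that never flags fraud
tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
print(f"accuracy={accuracy:.0%}, false negatives={fn}")
# → accuracy=90%, false negatives=1  (90% accurate, yet misses every fraud)
```

This is why the discussion asks “where are the false negatives?”: headline accuracy alone can hide exactly the errors that matter most.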