Big Data Driven Innovation and Next Generation AI
Raj Reddy
Carnegie Mellon University
Pittsburgh, PA 15213
Nov 8, 2017
AI 1.0: Use of Knowledge in Problem Solving
An Intelligent System must
  Learn from Experience
  Use Vast Amounts of Knowledge
  Tolerate Error and Ambiguity
  Respond in Real Time
  Communicate with Humans using Natural Language
Search Compensates for Lack of Knowledge
  Puzzles
Knowledge Compensates for Lack of Search
  F = ma; E = mc^2
Traditional Sources of Knowledge
  Formal Knowledge: Learnt in Schools and Universities
    Books, Manuals
  Informal Knowledge: Heuristics from People
Human Encoding of Knowledge
  Expert Systems
  Knowledge Based Systems
  Rule Based Systems
Major Breakthroughs in AI of the 20th Century
Enabled by Brute-force, Heuristics, Human Coding of Rules and Knowledge, and Simple Machine Learning (Pattern Recognition)
World Champion Chess Machine
  IBM Deep Blue
Mathematical Discovery
  Proof Checkers
Accident Avoiding Car
  CMU: No Hands Across America
Robotics
  Manufacturing Automation
  Disaster Rescue Robots
Speech Recognition Systems
  Dictation Machine
Computer Vision and Image Processing
  Medical Image Processing
Expert Systems
  Rule Based Systems
  Knowledge Based Systems
AI 2.0: Extract and Use Knowledge from New Data Driven Knowledge Sources
Paradigm Shift in Science
  First 3 Paradigms: Experiment, Theory, Simulation
    Rutherford, Bohr, Oppenheimer
  4th Paradigm: Data Driven Science
Create Next Generation AI systems
  Data Driven AI systems
  To Solve Previously Unsolved Problems
Previously Unavailable Sources of Data
  Knowledge from Big Data:
    Data Driven Learning of Models and Algorithms
  Knowledge from Multiple (Cross) Media:
    Social Media Intelligence Gathering
    From All Language Sources
    From All the Media: Text, Speech, Image and Video
  Knowledge from Crowd Intelligence:
    Global Brain: from Individual Intelligence to Collective Intelligence
  Knowledge from Augmented Intelligence:
    Human-Machine Hybrid Intelligence for Collaborative Problem Solving
  Knowledge from Unmanned Autonomous Vehicles:
    Intelligence from Collaborating Teams of Robots
Automatic Discovery of New Knowledge
  Machine Learning using Big Data
  Deep Learning
Major Breakthroughs in AI in the 21st Century
Enabled by Big Data and Machine Learning
Language Translation
  Google Translate: Any Language to Any Language
Speech to Speech Dialog
  Siri, Cortana, Alexa
Autonomous Vehicles
  CMU, Stanford, Google, Tesla
Deep Question Answering
  IBM’s Watson
RoboSoccer
World Champion Poker
  CMU Libratus: No Limit Texas Hold’em Poker
AI 2.0 Enables COGs and GATs
Cognition Amplifiers and Guardian Angels
Cognition Amplifiers (COGs) and Guardian Angels (GATs) are Examples of Use of AI 2.0 Technologies. They Need
  Big Data Intelligence
  Cross Media Intelligence
  Crowd Intelligence
  Augmented Intelligence
  UAV Intelligence
Big Data: Drowning in Data
Sources of Big Data for AI 2.0
Personal Data: Data from People
  Health
  Education
  Food Security
  Energy Security
  Water Security
  Other: Medicine/Pharmaceuticals, Shelter/Housing, Sanitation
National/Local Data: Data from Places and Things
  Transportation
  Telecom/Smartphone/Communication/WiFi
  Banking
  Entertainment
  Shopping
Global Data: Data from Unplanned Events (Black Swans)
  Earthquakes/Tsunami
  Typhoons/Hurricanes/Cyclones
  Fire
  Flooding
What to Collect?
Data Relevance: Big Data Parameters
Volume: Size and Scale
  Up to 40,000 sensors in the Airbus A380
  7 TB per day
Velocity: Data Rate and Streaming Data
  Sensor data collected in msec
  Type and No. of Sensors
Variety: Cross Media Data
  Sensors
  Images and videos
  Text data
  Relational business data
Validity: Reliability
  Poor data quality
  Missing data
  Data collected doesn't suit targeted use cases
Value: Usefulness and Importance
  Data per se is not valuable
  How to extract real value from data?
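The volume figures on this slide (up to 40,000 sensors, 7 TB per day) can be turned into a rough data rate. The sketch below is back-of-envelope arithmetic only. It assumes that the 7 TB per day refers to those 40,000 sensors, that 1 TB is 10**12 bytes, and that the load is spread evenly across sensors; none of these assumptions is stated in the slides, and the function names are illustrative.

```python
# Rough data-rate arithmetic for the volume figures on the slide.
# Assumptions (not from the source): 1 TB = 10**12 bytes, and the daily
# volume is spread evenly over all sensors.

SENSOR_COUNT = 40_000            # "up to 40,000 sensors"
BYTES_PER_DAY = 7 * 10**12       # "7 TB per day", decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60


def aggregate_bytes_per_second(bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Average aggregate rate over a full day, in bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


def per_sensor_bytes_per_second(
    bytes_per_day: float = BYTES_PER_DAY, sensors: int = SENSOR_COUNT
) -> float:
    """Average rate per sensor, assuming an even split across sensors."""
    return aggregate_bytes_per_second(bytes_per_day) / sensors


if __name__ == "__main__":
    print(f"aggregate:  {aggregate_bytes_per_second() / 1e6:.1f} MB/s")
    print(f"per sensor: {per_sensor_bytes_per_second():.0f} B/s")
```

Under these assumptions the aggregate rate is roughly 81 MB/s, or about 2 KB/s per sensor on average.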
Necessary Conditions for Collection and Use of Big Data
Infrastructure
  Instrument Data Sources
    People
    Places and Things
  Computing Power: Processor, Memory and Bandwidth
    Multi Farm Cloud Computing
    Super Computers for Processing
    Zettabyte (10^21 Bytes) Storage Farms
    Million Gigabit bandwidth
Machine Learning and Analytics
Big Data from Health
Monitor Life Support Parameters
  Using Improvised Smart Phone
  Using a Low Cost Version of Apple Watch and Fitbit
  Body Media Implant or Capsule in the Future?
Analyze the Data for Tell Tale Signals
Notify and/or Warn the User of Impending Problems
Guardian Angel Service Providers For Health
  Devices, Tools and Gat Apps
Big Data from Education
Keystroke Activity Monitoring Exposes Student Learning Behavior
  Attendance
  Paying Attention or Playing Games?
  Learning Speed
  Problem Solving Speed
Valuable Tool for Student, Parents and Educators
  Timely Notifications and Warnings
Guardian Angel Service Providers For Education
  Devices, Tools and Gat Apps
Big Data from Emergencies
Provide the Right Information to the Right People
Right Information
  To the Right People
  At the Right Time
  In the Right Language
  In the Right Medium: Voice, Video and/or Text
  In the Right Level of Detail
Data to Knowledge: Machine Learning
Machine Learning is the Key to Unlocking Big Data
Data Science and Machine Learning
Data Science: interdisciplinary field about scientific methods … to extract knowledge or insights from data ... uses techniques from many fields … mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning.
Artificial Intelligence: … "Intelligent Amplifiers": Use of AI Technology to augment human intelligence
Machine Learning: Is a subfield of AI, it gives computers the ability to learn without being explicitly programmed
Adapted from: SAP and Wikipedia
© 2017 SAP SE or an SAP affiliate company. All rights reserved.
Machine Learning: Complexity and Automation Levels
Source: Gartner 2014
Data Analytics
  Monitor: What Happened?
  Diagnosis: Why did it Happen?
  Prediction: What will Happen?
  Prescription: What is to be Done?
Role of Machine Learning in Big Data
Anomaly Detection
  Healthy Individuals vs. Persons with Potential Problems
Classification
  Clustering into Groups of Similar Populations
Failure Prediction
  Sensor-based Prediction of Future Health Problems
Prescriptive Analytics and Optimization
  Recommend Actions and Optimize Activities for Preventive Services like Flu Shots and Screening Tests
Correlations
  Support for Root Cause Analysis like DNA Inheritance
Forecasting
  Predict Life Expectancy for Insurance
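As an illustration of the anomaly detection role listed on this slide, here is a minimal sketch that flags unusual readings in a stream of health measurements. It is not a method given in the slides: it uses the modified z-score based on the median absolute deviation (MAD), a common robust outlier test, with the commonly cited constant 0.6745 and cutoff 3.5. The heart-rate values and the function name are made up for illustration.

```python
from statistics import median


def find_anomalies(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Unlike a mean/standard-deviation test,
    it is not inflated by the very outliers it is trying to find.
    """
    if not readings:
        return []
    center = median(readings)
    mad = median([abs(x - center) for x in readings])
    if mad == 0:
        # At least half the readings are identical; the score is undefined,
        # so this sketch reports nothing rather than guessing.
        return []
    return [
        i
        for i, x in enumerate(readings)
        if abs(0.6745 * (x - center) / mad) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical resting heart-rate samples (beats per minute).
    heart_rate = [72, 75, 71, 74, 73, 70, 76, 140]
    for i in find_anomalies(heart_rate):
        print(f"reading {i} looks unusual: {heart_rate[i]} bpm")
```

A real Guardian Angel service would need per-person baselines and clinically validated thresholds; the sketch only shows the shape of the computation.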
Problems with Big Data Machine Learning
High dimensional data: Find the relevant features?
  Needs the involvement of domain experts and/or automatic feature selection techniques
Data Quality is poor:
  Data is not collected to be used for Machine Learning. There are no standards with regards to sensor data. Difficult to integrate different data sources.
Rare event problem:
  Standard Machine Learning algorithms achieve poor results. Needs special algorithms for unbalanced classes.
No labels:
  Use unsupervised learning algorithms (e.g. anomaly detection)
Use-case specific algorithms and data models:
  Needs flexibility and extensibility
Deployment challenge:
  Need for an automatic system for model deployment and management
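One simple way to approach the rare event problem named on this slide is to weight each class inversely to its frequency, so that a handful of failure examples counts as much in training as the many normal ones. The sketch below computes such weights with the heuristic n_samples / (n_classes * class_count). This is one common choice, not a technique prescribed by the slides, and the labels are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights and frequent classes get small ones,
    so every class contributes the same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical sensor log: 95 normal windows, 5 failure windows.
    labels = ["ok"] * 95 + ["failure"] * 5
    for label, weight in balanced_class_weights(labels).items():
        print(f"{label}: weight {weight:.3f}")
```

The resulting weights would typically be passed to a learner that accepts per-class or per-sample weights; resampling the data is an alternative with a similar effect.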
Machine Learning Steps in Big Data
Sensor Data Acquisition
  Pre-process and explore data and detect patterns and outliers. 80% of the time of data scientists is spent with pre-processing
Learning
  Use domain user annotations as labels and sensor data as well as business data to learn machine learning models
Prediction
  Use the learned model and apply to new data
Feedback
  Ask domain users to annotate patterns and anomalies
Recommendation
  Recommend steps that should be done by the domain user
Action
  Assess recommendations and act accordingly if appropriate
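The steps on this slide can be sketched end to end in a few lines. The example below is a toy under stated assumptions, not the pipeline the slides describe: the "model" is a nearest-centroid classifier over a single sensor value, pre-processing only drops missing readings, and all function names, labels, and recommendation strings are hypothetical.

```python
from statistics import mean


def preprocess(samples):
    """Pre-processing: drop samples whose sensor value is missing."""
    return [(value, label) for value, label in samples if value is not None]


def learn(samples):
    """Learning: one centroid (mean sensor value) per annotated label."""
    groups = {}
    for value, label in samples:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}


def predict(model, value):
    """Prediction: pick the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


def add_feedback(samples, value, label):
    """Feedback: a domain user annotates a new reading; keep it for retraining."""
    return samples + [(value, label)]


def recommend(label):
    """Recommendation: map a predicted label to a suggested step."""
    return "inspect component" if label == "fault" else "no action needed"


if __name__ == "__main__":
    # Hypothetical temperature readings annotated by a domain user.
    samples = [(20.0, "normal"), (21.0, "normal"), (None, "normal"),
               (80.0, "fault"), (82.0, "fault")]
    model = learn(preprocess(samples))
    label = predict(model, 75.0)
    print(label, "->", recommend(label))
    # The user corrects the system, and the model is relearned.
    samples = add_feedback(samples, 75.0, "normal")
    model = learn(preprocess(samples))
```

The final Action step is left to the domain user, who assesses the recommendation before acting on it.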
Integrating Machine Learning into COGs and GATs
Data Preprocessing
Domain Expert Feedback
Adaptive Learning
Anomaly Detection
Prescriptive Analytics
Problem and Failure Prediction
Recommended Recovery Actions
Model Management and Updating
Knowledge to Action: Intelligent Agents
Big Data Enables Cognition Amplifiers and Guardian Angels
COGs and GATs for Everyone on the Planet
Cognition Amplifiers and Guardian Angels are two families of Intelligent Agents that help with the scarcity of attention problem
A Cognition Amplifier (COG) is a Personal Enduring Autonomic Intelligent Agent that anticipates what you want to do and helps you to do it with less effort
  Buying and selling: Transact with multiple providers
  Email: Filter spam, understand and respond to actionable email
  News: Based on topic preferences, novelty, collaborative filtering
  Banking: Monitor bank account, Credit Cards, Pay Bills
  Travel: Flights, hotel, schedule disruptions, cancellations
A Guardian Angel (GAT) is a Personal Enduring Autonomic Intelligent Agent that Discovers And Warns You About Unanticipated Events That Could Impact Your Safety, Security, and Happiness
  Just-in-Time Warnings: Hurricanes, Earthquakes, Extreme Weather
  Accident Alerts and Rerouting; Transport Strikes
  Scarcity of Essential Resources: Food, Energy, Water etc.
Each Person has Thousands of Cogs and Gats as Personal Assistants
Architecture of Cogs and Gats
Cogs and Gats Publish and Subscribe
Cogs and Gats are Mobile Apps (like a shadow) for Each Person on the Planet
  Unlike Apps of today, Cogs are Mass-customized to Each Individual
  Designed to be Non-intrusive, Autonomic, and Device Independent
  Always On, Always Present and Always Working
  Always Learning
  Enduring (life-long)
Cogs and Gats Monitor, Analyze and Learn From Experience
  Learn From Own Experience And Experience of Others
  And share knowledge with a community of Cogs and Gats
  Automated Discovery of Data and Information Sources
Gats and Cogs Publish and Subscribe Anonymized Data of User Activities and Experiences
  Data, suitably anonymized, can be used to learn appropriate responses for every possible situation by
    Learning preferences by observing user choices,
    Learning by task similarity and user similarity,
    Learning by error correction and
    Simply learning through clarification dialog (Does that mean yes? Would you care to define it?)
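The publish and subscribe idea on this slide can be illustrated with a minimal in-process sketch. Everything here is an assumption made for illustration: the Broker class, the topic name, and the event fields are hypothetical, and dropping a fixed set of identifying fields is only a stand-in for whatever "suitably anonymized" would require in a real deployment.

```python
from collections import defaultdict

# Fields treated as identifying in this sketch (hypothetical choice).
IDENTIFYING_FIELDS = {"user_id", "name"}


def anonymize(event):
    """Return a copy of the event without the identifying fields."""
    return {k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS}


class Broker:
    """Tiny topic-based publish/subscribe hub, kept in memory."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be called for every event on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Anonymize the event, then deliver it to the topic's subscribers."""
        shared = anonymize(event)
        for callback in self._subscribers.get(topic, []):
            callback(shared)


if __name__ == "__main__":
    broker = Broker()
    # Another person's Guardian Angel subscribes to traffic experiences.
    broker.subscribe("traffic", lambda e: print("Gat received:", e))
    # One user's Cog publishes an experience; identity is stripped first.
    broker.publish("traffic", {"user_id": "u42", "route": "I-376", "delay_min": 25})
```

Anonymizing inside publish means a subscriber never sees the raw event, which matches the slide's emphasis on sharing only anonymized data.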
Technologies of Cogs and Gats
Cogs and Gats Publish and Subscribe
Service Agents are created by service providers using agent templates included in the platform
  Download and Pay just as for Apps and Personalization
Guardian Angels can request and manage services on behalf of their masters.
  Every organization that wants to enable access to their services by Guardian Angels will create a Service Agent for this purpose.
Opt-in “Waze” like Models
Privacy
  Individual
  Other participants
Legal
  Can be subpoenaed
Security
  Information falling into wrong hands
Diagram labels: User, Cloud Based, User Infrastructure, Platform
What Cogs and Gats Are Not
One System Fits All
Designed for Mass Use
User Activated Apps
Assume Human in the loop always
Intentional Activation
Require Laptops or Smart Phones
Consume Human Attention
Texting instead of Voice and Vision
Context Insensitive
Fixed algorithms (non-changing)
Non Temporal
Economic Impact of AI 2.0 and Big Data
Guardian Angels and Cognition Amplifiers
Every person on the planet will be able to perform many daily habits more effectively using Gats and Cogs
  Daily habits (routines, activities) include a wide spectrum from routine tasks (such as banking and travel planning) to tasks too difficult for the user
  Over 80% of all human activity will be done by Cogs by 2020
7 Billion People Market vs. 2 Billion Today
Ultimately Humans Could be 10 Times More Efficient and Effective
  Global GDP is $100 Trillion
  Even 10% improvement will lead to $10T additional wealth creation
Gat and Cog Eco System Requires Platform Providers, App Providers, and Agents for Personal Customization
  All of Whom Would Benefit from the Additional Wealth Creation
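The economic figures on this slide follow from simple arithmetic, which the short sketch below reproduces. The inputs ($100 trillion global GDP, a 10% improvement, a market of 7 billion people against 2 billion today) are taken from the slide; the function names are illustrative, and the calculation says nothing about whether the projections themselves are realistic.

```python
def additional_wealth_trillions(gdp_trillions, improvement_percent):
    """Extra output, in trillions of dollars, from a percentage improvement."""
    return gdp_trillions * improvement_percent / 100


def market_growth_factor(future_users_billions, current_users_billions):
    """How many times larger the future market is than the current one."""
    return future_users_billions / current_users_billions


if __name__ == "__main__":
    print(f"additional wealth: ${additional_wealth_trillions(100, 10):.0f}T")
    print(f"market growth: {market_growth_factor(7, 2):.1f}x")
```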