Philly – CNTK – TF - Azure

DL (Deep Learning) Workspace
Engineering Practice and Lessons Learnt
Hongzhi Li, Jin Li, Sanjeev Mehrotra
Key Engineering Practice
- Separate Configuration & code
- Microservice Architecture with Mostly Stateless Modules
- High Quality Modules
Separate Configuration & code
- Benefit: Significant code reuse and stability of the code base
- A default configuration (in deploy.py), checked into repo
- Cluster specific configuration (in config.yaml) is not checked in
- Backup/restore operation:
  - Backup/restore cluster related configuration + keys from/to a blob
- Code + configuration will be rendered to a location that is gitignored, and executed there
- State of the cluster managed by database (SQL Azure)
  - User database (who is authenticated, who is admin, etc.)
  - Scheduling database (what job is being scheduled, deleted, etc.)
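One way the default/cluster-specific split could work is sketched below. This is a minimal illustration, not the actual deploy.py: the keys in DEFAULT_CONFIG, the merge_config and render helpers, and the template string are all hypothetical, and the cluster override is written as a plain dict standing in for whatever would be parsed from config.yaml.

```python
import copy

# Hypothetical defaults, standing in for the default configuration in deploy.py.
DEFAULT_CONFIG = {
    "cluster_name": "default",
    "storage": {"type": "nfs", "mount": "/mnt/share"},
    "database": {"backend": "sql-azure"},
}


def merge_config(default, override):
    """Return a new dict in which override values win and nested dicts merge."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render(template, config):
    """Fill str.format-style placeholders in a template from the merged config."""
    return template.format(**config)


# Hypothetical cluster-specific values (what config.yaml would contribute).
cluster_config = {"cluster_name": "gpu-east", "storage": {"type": "glusterfs"}}

config = merge_config(DEFAULT_CONFIG, cluster_config)
rendered = render(
    "cluster={cluster_name} storage={storage[type]} mount={storage[mount]}",
    config,
)
print(rendered)
```

The defaults are never mutated, so the same checked-in code can serve many clusters; only the override dict differs per cluster.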
Microservice Architecture With mostly Stateless Modules
- Build DL workspace as a collection of microservices
- Minimum dependency among services (so that cost of switching a module is low)
  - OpenID authentication, secured etcd/Kubernetes clusters
  - MySQL vs SQL
  - CoreOS vs Ubuntu
  - ASP.NET Core vs Flask API service
  - File share: NFS, HDFS, GlusterFS, CIFS (Azure File Share)
- Minimal/no change to other modules when a module needs to be changed/updated
- Stateless microservice is strongly preferred
  - States are preserved in either SQL server or etcd servers
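The stateless pattern can be sketched roughly as follows. Everything here is an assumption made for illustration: StateStore, InMemoryStore, and JobService are hypothetical names, and the in-memory store merely stands in for a SQL or etcd backend so the example runs on its own.

```python
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    """Hypothetical interface that a SQL- or etcd-backed store would satisfy."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Stand-in backend for this sketch; a real one would talk to SQL or etcd."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class JobService:
    """Keeps no job state of its own; every call reads or writes the store."""

    def __init__(self, store: StateStore) -> None:
        self._store = store

    def submit(self, job_id: str) -> None:
        self._store.put(f"job/{job_id}", "scheduled")

    def delete(self, job_id: str) -> None:
        self._store.put(f"job/{job_id}", "deleted")

    def status(self, job_id: str) -> Optional[str]:
        return self._store.get(f"job/{job_id}")


store = InMemoryStore()
JobService(store).submit("train-42")

# A fresh service instance (for example after a restart) sees the same state.
restarted = JobService(store)
print(restarted.status("train-42"))  # scheduled
```

Because the service depends only on the small store interface, swapping the backend touches one class and leaves the service untouched, which is the low switching cost the slide argues for.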
High Quality Module
- Evaluate the quality of a module before using it
  - Docker is of good quality, and has been stable
  - Kubernetes is of good quality, and has been stable
  - Nvidia-docker (preferred platform for DL workload)
    - Zombie process (Nvidia driver)
- Try best not to hack a module
  - Use docker/nvidia-docker/kubernetes/glusterfs/hdfs as is
  - Minimal code, and most of the issues we have encountered have also been encountered by the community
Backup
Modern microservice architecture
- Embrace micro-services
  - Hundreds/thousands of independent services form an ecosystem
  - Each microservice should evolve on its own (created/justified/deprecated through usage, not top-down design)
  - Each service is single purpose, with simple and well-defined API, modular and independent
  - Goals of service owner: meet the needs of my clients, at minimum cost and effort
- Standardize communication (network protocols, data formats, schema between services), rather than the services themselves
  - A service becomes standard by being better than its alternatives
- Standardize infrastructure (cluster management, monitoring, diagnostic, alerting, etc.)
- No need to standardize internals (e.g., programming language, framework, persistence)
- Encourage open-source like practice:
  - Good documentation from the get go
  - Searchable code/documentation and discussion forum
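As a rough illustration of standardizing the data format between services rather than their internals, the sketch below defines a small versioned JSON message. The JobEvent type, its fields, and the version number are hypothetical and not taken from the deck; any service, in any language, could produce or consume the same payload.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1  # hypothetical version number for this message format


@dataclass(frozen=True)
class JobEvent:
    """Hypothetical message exchanged between services."""

    schema_version: int
    job_id: str
    state: str


def encode(event: JobEvent) -> str:
    """Serialize an event to JSON with a stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


def decode(payload: str) -> JobEvent:
    """Parse a JSON payload, rejecting unknown schema versions."""
    data = json.loads(payload)
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    return JobEvent(
        schema_version=data["schema_version"],
        job_id=data["job_id"],
        state=data["state"],
    )


payload = encode(JobEvent(SCHEMA_VERSION, "train-42", "scheduled"))
print(payload)
print(decode(payload))
```

The explicit version field lets a consumer reject payloads it does not understand instead of misreading them, so each service can evolve its internals independently.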