MyRocks at Facebook and Roadmaps
Yoshinori Matsunobu
Production Engineer, Facebook
Nov 2017
Agenda
MySQL at Facebook
Production database migration from InnoDB to MyRocks
MyRocks Development Roadmaps
“Main MySQL Database (UDB)” at Facebook
Storing Social Graph
Massively Sharded
Low latency
Automated Operations
Pure Flash Storage
Constrained by space, not by CPU/IOPS
What is MyRocks
MySQL on top of RocksDB (RocksDB storage engine)
Taking both LSM advantages and MySQL features
  LSM advantage: Smaller space and lower write amplification
  MySQL features: SQL, Replication, Connectors and many tools
Open Source
[Diagram: MySQL Clients connect via SQL/Connector to MySQL (Parser, Optimizer, Replication, etc.), which runs on the InnoDB and RocksDB storage engines.]
http://myrocks.io/
MyRocks Initial Goal at Facebook
[Chart: resource usage against the machine limit. InnoDB in UDB: Space 90%, IO and CPU 15% and 20%. MyRocks in UDB: Space 45%, IO and CPU 15% and 21%, with the MyRocks values shown twice.]
Migration in Production
Continuous data consistency check between InnoDB and MyRocks
Shadow traffic tests
Deployed on slaves
Deployed on masters
Created first MyRocks instance without downtime
Picking one of the InnoDB slave instances, then starting logical dump and restore
Stopping one slave does not affect services
Verified data consistency across InnoDB and MyRocks
Master (InnoDB)
Slave1 (InnoDB)
Slave2 (InnoDB)
Slave3 (InnoDB)
Slave4 (MyRocks)
Stop & Dump & Load
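The consistency verification between the InnoDB and MyRocks copies could be sketched roughly as below. This is a minimal illustration, not the tooling used in production: rows are assumed to be already fetched into dictionaries keyed by primary key, and `row_digest` and `diff_tables` are hypothetical helper names.

```python
import hashlib

def row_digest(row):
    """Hash one row's column values in a stable, engine-independent way."""
    h = hashlib.sha256()
    for value in row:
        # repr() plus a separator keeps ("ab", "c") distinct from ("a", "bc").
        h.update(repr(value).encode("utf-8"))
        h.update(b"\x1f")
    return h.hexdigest()

def diff_tables(innodb_rows, myrocks_rows):
    """Return primary keys whose rows differ or exist on one side only.

    Both arguments map primary key -> row tuple (hypothetical input shape).
    """
    mismatched = []
    for pk in sorted(set(innodb_rows) | set(myrocks_rows)):
        a = innodb_rows.get(pk)
        b = myrocks_rows.get(pk)
        if a is None or b is None or row_digest(a) != row_digest(b):
            mismatched.append(pk)
    return mismatched

innodb = {1: (1, "alice"), 2: (2, "bob"), 3: (3, "carol")}
myrocks = {1: (1, "alice"), 2: (2, "bobby"), 4: (4, "dave")}
print(diff_tables(innodb, myrocks))  # [2, 3, 4]
```

Comparing digests rather than raw rows keeps the amount of data exchanged between the two instances small, which matters when the check runs continuously.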
Creating second MyRocks instance without downtime
Master (InnoDB)
Slave1 (InnoDB)
Slave2 (InnoDB)
Slave3 (MyRocks)
Slave4 (MyRocks)
myrocks_hotbackup (Online binary backup)
Promoted MyRocks as a master
Master (MyRocks)
Slave1 (InnoDB)
Slave2 (InnoDB)
Slave3 (InnoDB)
Slave4 (MyRocks)
Copied MyRocks everywhere
Master (MyRocks)
Slave1 (MyRocks)
Slave2 (MyRocks)
Slave3 (MyRocks)
Slave4 (MyRocks)
Our current production status
We COMPLETED InnoDB to MyRocks migration in UDB
We saved 50% space in UDB compared to compressed InnoDB
We started working on migrating other large database tiers
Important MyRocks/RocksDB Features
Mem Comparable Keys
  CHAR/VARCHAR with latin1_bin | utf8_bin columns can be compared by single memcmp
Fast Data Loading, deletes and replication
Dynamic Options
TTL
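An encoding is "mem comparable" when comparing the encoded bytes gives the same order as comparing the original values, so the engine needs only a memcmp. The sketch below illustrates the property with signed 32-bit integers rather than strings; `encode_int32` and `encode_key` are hypothetical names, and this is not claimed to be the exact MyRocks on-disk format.

```python
import struct

def encode_int32(value):
    """Encode a signed 32-bit integer so byte order matches numeric order."""
    # Adding 2**31 flips the sign bit, mapping -2**31..2**31-1 onto
    # 0..2**32-1; big-endian puts the most significant byte first.
    return struct.pack(">I", (value + 2**31) & 0xFFFFFFFF)

def encode_key(columns):
    """Concatenate fixed-width column encodings into one composite key."""
    return b"".join(encode_int32(c) for c in columns)

values = [5, -1, 0, -2**31, 2**31 - 1, 42]
# Python compares bytes objects lexicographically by unsigned byte value,
# which is the same ordering memcmp gives for equal-length buffers.
print(sorted(values, key=encode_int32) == sorted(values))  # True

rows = [(1, 10), (1, -3), (-7, 99), (0, 0)]
print(sorted(rows, key=encode_key) == sorted(rows))  # True
```

Because each column has a fixed width here, the composite key also sorts correctly column by column without any per-column comparison logic.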
Faster Data Loading
Normal Write Path in MyRocks/RocksDB
[Diagram: MemTable + WAL -> (Flush) -> Level 0 SST -> (Compaction) -> Level 1 SST -> ... -> (Compaction) -> Level max SST]
Faster Write Path
[Diagram: Write Requests -> Level max SST]
“SET SESSION rocksdb_bulk_load=1;” or “SET SESSION rocksdb_bulk_load_unsorted=1”
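A bulk-load session built around the statement above might look like the following sketch, which only generates SQL text. The `rocksdb_bulk_load` variable comes from the slide; the table name `t1`, its two integer columns, and the helper `bulk_load_script` are hypothetical. Rows are sorted first on the assumption that the sorted mode expects input in primary key order, which is what the separate `rocksdb_bulk_load_unsorted` variable suggests.

```python
def bulk_load_script(table, rows, batch_size=2):
    """Build the SQL text for one bulk-load session (hypothetical helper).

    rows: list of (primary_key, value) tuples with integer columns only,
    so no string quoting is needed in this sketch.
    """
    ordered = sorted(rows)  # assumption: sorted mode wants primary key order
    lines = ["SET SESSION rocksdb_bulk_load=1;"]
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        values = ", ".join(f"({pk}, {val})" for pk, val in batch)
        lines.append(f"INSERT INTO {table} VALUES {values};")
    # Switch the session variable back off once all rows are inserted.
    lines.append("SET SESSION rocksdb_bulk_load=0;")
    return "\n".join(lines)

script = bulk_load_script("t1", [(3, 30), (1, 10), (2, 20)])
print(script)
```

For the three sample rows this prints the `SET SESSION rocksdb_bulk_load=1;` line, two `INSERT` statements in primary key order, and the closing `SET SESSION rocksdb_bulk_load=0;`.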
Development Roadmaps
Matching read performance vs InnoDB
  https://smalldatum.blogspot.com
Supporting Mixed Engines
Better Replication
Supporting “Bigger Small Data”
Mixed Engines
Currently our production use case is either a “MyRocks only” or an “InnoDB only” instance
There are several internal/external use cases that want to use InnoDB and MyRocks within the same instance, though a single transaction does not span engines
Online Backup support and benchmarks are major concerns
Current plan is extending xtrabackup to integrate myrocks_hotbackup
Considering backporting gtid_pos_auto_engines
Better Replication
Removing engine log
  Both internal and external benchmarks (e.g. Amazon, Alibaba) show that QPS improves significantly with binlog disabled
  The real problem is having two logs, binlog and engine log, which requires 2PC and ordered commits
  One Log: use one log as the source of truth for commits, either binlog, a binlog-like service or RocksDB WAL
  We heavily rely on binlogs (for semisync, binlog consumers); TBD is how much perf we gain by no longer writing to WAL
Parallel replication apply
Batching
Skipping using transactions on slaves
Bigger Small Data (Bigger Instance Size)
Problem Statement: A shared-nothing database is not a general-purpose database
  MySQL Cluster, Spider, Vitess
  Good if you have specific purposes. Might have issues if people lack expertise in atomic transactions, joins and secondary keys
Suggestion: Now we have 256GB+ RAM and 10TB+ Flash on commodity servers. Why not put everything there?
  Bigger instances may help general-purpose small-mid applications
  They don’t have to worry about sharding. Atomic transactions, joins and secondary keys just work
  e.g. Amazon Aurora (supporting up to 60TB instance) and Alibaba PolarDB (~100TB instance)
Features required to support Bigger Instance
Parallel Query
  e.g. how to make mysqldump finish within 24 hours from 20TB table?
Parallel binary copy
  e.g. how quickly can we create a 60TB replica instance in a remote region?
Parallel DDL, Parallel Loading
Resumable DDL
  e.g. if the DDL is expected to take 10 days, what will happen if mysqld restarts after 8 days?
Better join algorithm
Much faster replication
Can handle 10x connection requests and queries
Good resource control
H/W perspective: Shared Storage and Elastic Computing Units
  Can scale read replicas from the same shared storage
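To put the 20TB-in-24-hours question in perspective, a back-of-envelope calculation helps, followed by one simple way to divide a dump across workers. This is a sketch under stated assumptions: decimal units (1 TB = 1,000,000 MB), an integer primary key with roughly uniform distribution, and the hypothetical helper names `required_mb_per_sec` and `split_key_range`.

```python
def required_mb_per_sec(size_tb, hours):
    """Sustained throughput (MB/s) needed to move size_tb within hours."""
    return size_tb * 1_000_000 / (hours * 3600)

def split_key_range(min_pk, max_pk, workers):
    """Split [min_pk, max_pk] into contiguous chunks, one per worker."""
    total = max_pk - min_pk + 1
    base, extra = divmod(total, workers)
    chunks, start = [], min_pk
    for i in range(workers):
        # The first `extra` workers take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than keys
        chunks.append((start, start + size - 1))
        start += size
    return chunks

print(round(required_mb_per_sec(20, 24), 1))  # 231.5
print(split_key_range(1, 10, 4))  # [(1, 3), (4, 6), (7, 8), (9, 10)]
```

A single dump stream would need to sustain roughly 231 MB/s for a full day, which is why splitting the key range across parallel workers is attractive.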
