AWS Certified Solutions Architect Pro – Study Notes Domain 5

November 26, 2016

Domain 5.0: Data Storage for a complex large-scale deployment (15% of exam)

5.1 Demonstrate ability to make architectural trade-off decisions involving storage options

5.2 Demonstrate ability to make architectural trade-off decisions involving database options

5.3 Demonstrate ability to implement the most appropriate data storage architecture

5.4 Determine use of synchronous versus asynchronous replication

  • Optimizing S3 – use parallelization for both PUTs and GETs
    • Parallelizing:
      • Divide your files into small parts & upload those parts simultaneously (multipart upload)
      • If one part fails, only that part needs to be restarted
      • Moves the bottleneck to the network itself, increasing aggregate throughput
      • 25–50 MB part sizes on high-bandwidth networks
      • 10 MB part sizes on mobile networks
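The part-splitting step above can be sketched as a small helper that divides an object into byte ranges of the recommended part size, each of which can then be uploaded on its own thread (the 120 MB object size is illustrative, not from the notes):

```python
def part_ranges(object_size, part_size):
    """Split an object of object_size bytes into inclusive (start, end)
    byte ranges of at most part_size bytes, for parallel multipart upload."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size - 1, object_size - 1)
        ranges.append((start, end))
        start = end + 1
    return ranges

# A 120 MB object split into 25 MB parts (the high-bandwidth guideline above):
MB = 1024 * 1024
parts = part_ranges(120 * MB, 25 * MB)
# Each part uploads independently; if one part fails, only it is retried.
print(len(parts))
```

The same math drives the real multipart upload APIs: every part except the last must be the full part size, and the last part picks up the remainder.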
    • Optimizing for GETs:
      • Use CloudFront
        • Multiple endpoints globally
        • Low latency
        • High xfer speeds available
        • Caches objects from S3
        • 2 flavors:
          • RTMP
          • Web
      • Use range-based GETs to get multithreaded performance (http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html)
        • Using the Range HTTP header in a GET request allows you to retrieve a specific range of bytes from an object stored in S3
        • Allows you to send multiple GETs at once
        • Compensates for unreliable network performance
        • Maximizes bandwidth throughput
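The range-based GET idea above comes down to issuing several GETs, each with its own `Range` header. A minimal sketch of building those header values (the 30 MB object and 10 MB chunk size are illustrative):

```python
def range_headers(object_size, chunk_size):
    """Build HTTP Range header values for fetching an object of
    object_size bytes in chunk_size pieces, one GET per chunk."""
    headers = []
    start = 0
    while start < object_size:
        end = min(start + chunk_size - 1, object_size - 1)
        headers.append("bytes=%d-%d" % (start, end))
        start = end + 1
    return headers

# A 30 MB object fetched in 10 MB ranges -> three GETs you can run in parallel,
# e.g. with boto3: s3.get_object(Bucket=..., Key=..., Range=headers[i])
MB = 1024 * 1024
print(range_headers(30 * MB, 10 * MB))
```

Each ranged GET is an independent request, so a slow or failed range can be retried on its own, which is what compensates for unreliable networks.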
    • S3 keys are stored lexicographically (dictionary order, A–Z)
      • The more random your key name prefixes within a particular bucket, the better performance you get from S3.
        • Really only matters for very large, high-request-rate buckets
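One common way to randomize key prefixes is to prepend a few characters of a hash of the natural key, so that sequential names (dates, counters) spread across the keyspace instead of sorting into one hot range. A sketch (the key format is an assumption, not from the notes):

```python
import hashlib

def randomized_key(natural_key):
    """Prefix an S3 key with a few hex chars of its MD5 hash so keys
    scatter across the lexicographic keyspace instead of clustering."""
    prefix = hashlib.md5(natural_key.encode()).hexdigest()[:4]
    return "%s-%s" % (prefix, natural_key)

# Sequential date-based keys all sort together; hashed prefixes scatter them:
print(randomized_key("2016/11/26/log-0001.gz"))
print(randomized_key("2016/11/26/log-0002.gz"))
```

The trade-off is that you lose the ability to list objects by their natural prefix, so this pattern pairs well with keeping an index of keys elsewhere (e.g. DynamoDB, as noted later).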
    • Once you turn on versioning, you can’t turn it off. You can only suspend it.
    • Securing S3:
      • Use bucket policies to restrict deletes
      • You can also use MFA Delete (which is exactly what it sounds like: you need your credentials plus a one-time code from an MFA device to delete anything)
      • Versioning does not protect you against deleting a bucket:
        • Backup your bucket to another separate S3 bucket owned by a different account
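The "restrict deletes with a bucket policy" point above can be sketched as a policy document that denies `s3:DeleteObject` unless the caller authenticated with MFA. The bucket name is a placeholder; this is one illustrative policy shape, not the only way to do it:

```python
import json

# Deny object deletes unless the request was MFA-authenticated.
# "example-bucket" is a hypothetical bucket name.
deny_unmfa_delete = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyDeleteWithoutMFA",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:DeleteObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {
            "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
        }
    }]
}

print(json.dumps(deny_unmfa_delete, indent=2))
# Apply with boto3:
# s3.put_bucket_policy(Bucket="example-bucket",
#                      Policy=json.dumps(deny_unmfa_delete))
```

Note this protects objects, not the bucket itself — which is why the notes recommend backing up to a separate bucket in a different account.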
  • Database Design Patterns (http://media.amazonwebservices.com/AWS_Storage_Options.pdf) ← read the anti-patterns for DBs
    • Multi-AZ vs Read Replicas
      • Multi-AZ
        • Used for high availability/failover (DR) only, not for scaling
        • Synchronous replication
      • Read Replica
        • Used for scaling out, not DR
        • Asynchronous replication
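The Multi-AZ vs. read-replica split above usually shows up in application code as routing: writes go to the primary endpoint, reads fan out across replica endpoints. A minimal sketch (the class and endpoint names are illustrative, not an AWS API):

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary endpoint and spread reads across
    read-replica endpoints round-robin. Endpoints are placeholders."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def endpoint(self, is_write):
        # Writes (and reads needing read-after-write consistency) must hit
        # the primary: replicas lag because their replication is async.
        return self.primary if is_write else next(self._replicas)

router = ReadWriteRouter("primary.example.rds.amazonaws.com",
                         ["replica-1.example.rds.amazonaws.com",
                          "replica-2.example.rds.amazonaws.com"])
print(router.endpoint(is_write=True))
print(router.endpoint(is_write=False))
```

The Multi-AZ standby never appears in this routing at all — it is invisible to the application and only takes over the primary's DNS name on failover.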
    • RDS Use Cases
      • Ideal for existing apps that rely on MySQL, Oracle, SQL Server, PostgreSQL, MariaDB & Aurora
      • Amazon RDS offers full compatibility & direct access to native DB engines. Most code, libs & tools designed to work with these DBs should work unmodified w/ Amazon RDS
      • Optimal for new apps with structured data that requires more sophisticated querying & joining than can be provided by Amazon’s NoSQL offering: DynamoDB
      • ACID = RDS
        • Atomicity – in a transaction with 2 or more discrete pieces of info, all data is committed or none is
        • Consistency – a transaction either creates a new valid state of data, or if any failure occurs, returns all data to state before transaction occurred
        • Isolation – a transaction in process but not committed remains isolated from other transactions
        • Durability – data is available in correct state even in the event of a failure & system restart
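The atomicity and consistency properties above can be seen with any relational engine; here sqlite3 (from the Python standard library) stands in for an RDS engine. An over-draw trips a CHECK constraint, and the whole transaction rolls back rather than committing half a transfer:

```python
import sqlite3

# Atomicity demo: a two-row transfer either fully commits or fully rolls back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: both UPDATEs commit, or neither does
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 150 "
                     "WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # CHECK constraint fired -> the whole transaction was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both rows unchanged: no partial transfer
```

This is exactly what DynamoDB (at the time of these notes) did not offer, which is why "joins and/or complex transactions" appears below in the when-not-to-use-DynamoDB list.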
      • When NOT to use RDS
        • Index & query focused data – use DynamoDB
        • Numerous Binary Large Objects – BLOBs (audio files, videos, images)
        • Automated scalability (RDS good for scaling UP, DynamoDB good for scaling OUT) – Use DynamoDB
        • Other database platforms (IBM DB2, Informix, Sybase) – Use EC2
        • If you need complete, OS level control of the DB server with full root admin – use EC2
    • DynamoDB Use Cases
      • Existing or new applications that need a flexible NoSQL DB with low read/write latencies
      • The ability to scale storage & throughput up & down w/out code changes or downtime
      • Common use cases:
        • Mobile apps
        • Gaming
        • Digital ads
        • Live voting
        • Audience interaction for live events
        • Sensor networks
        • Log ingestion
        • Access control for web based content
        • Metadata storage for S3 objects
        • E-comm shopping carts
        • Web session mgmt.
      • If you need to automatically scale your DB, think DynamoDB
      • Where NOT to use DynamoDB
        • Apps that need traditional relational DB
        • Joins and/or complex transactions
        • BLOB data – use S3, however use DynamoDB to keep track of metadata
        • Large data w/ low I/O rate – again use S3
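The last two bullets describe one pattern: the blob lives in S3 and an index-friendly item describing it lives in DynamoDB. A sketch of building such an item in the low-level DynamoDB attribute format — the table name, attribute names, and example values are all hypothetical:

```python
# "BLOB in S3, metadata in DynamoDB" pattern: the object goes to S3,
# and a small queryable item points at it. Names below are illustrative.
def media_item(bucket, key, size_bytes, content_type, uploaded_by):
    """Build a DynamoDB item (low-level attribute format) that
    describes and points at an S3 object."""
    return {
        "s3_uri": {"S": "s3://%s/%s" % (bucket, key)},
        "size_bytes": {"N": str(size_bytes)},
        "content_type": {"S": content_type},
        "uploaded_by": {"S": uploaded_by},
    }

item = media_item("media-bucket", "videos/intro.mp4",
                  104857600, "video/mp4", "alice")
# Write with boto3: dynamodb.put_item(TableName="media_metadata", Item=item)
print(item["s3_uri"]["S"])
```

Queries and index lookups then run against the cheap, low-latency DynamoDB item, and the large object is fetched from S3 only when actually needed.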