AWS Certified Solutions Architect Pro – Study Notes Domain 1

December 4, 2016

Domain 1.0: High Availability and Business Continuity (15% of exam)

1.1 Demonstrate ability to architect the appropriate level of availability based on stakeholder requirements

1.2 Demonstrate ability to implement DR for systems based on RPO and RTO

1.3 Determine appropriate use of multi-Availability Zones vs. multi-Region architectures

1.4 Demonstrate ability to implement self-healing capabilities

  • Recovery Point Objective (RPO) – The acceptable amount of data loss measured in time.
    • If your RPO is 1 hour and you have an event at 4pm, you must be able to restore data back to at least 3pm (i.e. at most 1 hour of data lost).
  • Recovery Time Objective (RTO) – The time it takes after a disruption to restore a business process to its service level.
    • If your RTO is 1 hour and you have an event @ 4pm, you need to be back up and running by 5pm.
  • Benefits of using AWS for DR:
    • Min hardware required for data replication
    • Pay as you use model
    • Scalability
    • Automate DR deployment (scripts, CFTs, etc…)
  • DRBD (Distributed Replicated Block Device) – https://aws.amazon.com/blogs/aws/redundant-disk/
  • AWS Services for DR (https://aws.amazon.com/disaster-recovery/)
    • Multiple regions around the globe
    • Storage
      • S3 – 11 9s of durability & cross-region replication
        • Suitable for mission-critical & primary data storage
        • Redundantly stored on multiple devices across multiple facilities w/in a region.
        • Can use cross-region replication to copy objects from one region to another (see the sketch below)
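        A minimal boto3 sketch of enabling cross-region replication on a bucket (bucket names, region, and the IAM role ARN are placeholders; versioning must already be on for both buckets):

          import boto3

          s3 = boto3.client('s3', region_name='us-east-1')  # source region is an assumption

          # Versioning is a prerequisite for cross-region replication (the DR bucket
          # is assumed to already have it enabled)
          s3.put_bucket_versioning(
              Bucket='my-source-bucket',  # hypothetical source bucket
              VersioningConfiguration={'Status': 'Enabled'})

          # Replicate every object to a bucket in another region for DR
          s3.put_bucket_replication(
              Bucket='my-source-bucket',
              ReplicationConfiguration={
                  'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',  # hypothetical role S3 assumes
                  'Rules': [{
                      'ID': 'dr-replication',
                      'Prefix': '',            # empty prefix = replicate the whole bucket
                      'Status': 'Enabled',
                      'Destination': {'Bucket': 'arn:aws:s3:::my-dr-bucket'},  # hypothetical DR bucket
                  }],
              })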
      • Glacier
        • 3 hours or longer to retrieve a file (i.e. not an option if your RTO is 15 min)
      • EBS
        • Can create point-in-time snapshots of data volumes
        • Use these snaps as the starting point for new EBS vols
        • Protected long-term because snapshots are stored in S3
        • Not automatic by default, must be scripted (see the sketch below)
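        A minimal boto3 sketch of the scripting mentioned above (volume ID, region, and AZ are placeholders):

          import boto3

          ec2 = boto3.client('ec2', region_name='us-east-1')  # region is an assumption

          # Point-in-time snapshot of a data volume (stored durably in S3 behind the scenes)
          snap = ec2.create_snapshot(
              VolumeId='vol-0123456789abcdef0',   # hypothetical volume
              Description='nightly DR snapshot')

          # Later, use the snapshot as the starting point of a new EBS volume
          ec2.create_volume(
              SnapshotId=snap['SnapshotId'],
              AvailabilityZone='us-east-1a')      # AZ is an assumption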
      • AWS Import/Export Snowball (https://aws.amazon.com/importexport/)
        • Import TO EBS, Glacier & S3
        • Can only export FROM S3
          • If you export from an S3 bucket with versioning turned on, only the latest version is exported
        • Use cases:
          • Cloud Migration
          • DR
          • Datacenter Decommission
          • Content Distribution
      • Direct Connect
      • AWS Storage Gateways
        • Can be deployed on prem (ESXi or Hyper-V) or as an EC2 instance
        • Can schedule snapshots
        • Can use with Direct Connect
        • Can implement bandwidth throttling (good for remote sites)
        • If multiple sites, need one in each location
        • Networking ports: 443 externally. 80 (activation only), 3260 (iSCSI), and UDP53 (DNS) internally.
        • Encrypted using SSL in transit and AES-256 @ rest.
        • Stores data as EBS snaps in S3
        • Gateway cached
          • iSCSI based block storage
          • local storage is frequently accessed data
          • infrequently accessed data stored in S3
          • if link to AWS goes down, you lose access to your data
          • each volume can go up to 32TB, 32 volumes supported (i.e. 1PB of data can be stored)
          • need storage for local cache and an upload buffer
          • can take point in time incremental snaps of volumes and store in S3 as an EBS snapshot
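          A minimal boto3 sketch of taking a point-in-time snapshot of a gateway volume (the volume ARN is a placeholder):

            import boto3

            sgw = boto3.client('storagegateway', region_name='us-east-1')  # region is an assumption

            # Incremental, point-in-time snapshot of a cached volume, stored as an EBS snapshot in S3
            sgw.create_snapshot(
                VolumeARN='arn:aws:storagegateway:us-east-1:123456789012:'
                          'gateway/sgw-12345678/volume/vol-0123456789abcdef0',  # hypothetical volume ARN
                SnapshotDescription='scheduled DR snapshot')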
        • Gateway stored
          • iSCSI based block storage
          • for when you need your entire data set locally
          • each volume can go up to 16TB, 32 volumes supported (i.e. 0.5PB of data can be stored)
          • can take point in time incremental snaps
          • snaps provide durable off-site backup to S3 as EBS snap
          • use snap of gateway stored volume as starting point of new EBS volume which you can attach to an EC2 instance
        • Gateway VTL/VTS
          • Get rid of your physical tape library infrastructure
          • Virtual Tape Library -> backed by S3
            • Instant retrieval
            • 1500 virtual tapes (150TB)
          • Virtual Tape Shelf -> backed by Glacier
            • Up to 24 hours to get a virtual tape back
            • Unlimited tapes
          • need storage for local cache and an upload buffer
          • Software supported:
            • NetBackup 7.x
            • Backup Exec 2012-15
            • MS System Center 2012 Data Protection Mgr
            • Veeam 7 & 8
            • Dell NetVault 10
    • Compute
      • EC2
      • EC2 VM Import Connector – Virtual Appliance that works with vCenter to convert your VMware VMs into EC2 instances in AWS.
    • Networking
      • Route53
      • ELB
      • VPC
      • Direct Connect
    • DBs
      • Supported HA for Databases:
        • Oracle – RAC & Data Guard
        • SQL Server – AlwaysOn Availability Groups & SQL mirroring
        • MySQL – asynchronous replication
        • RDS: snapshot data can be copied from one region to another
        • RDS: can have a read replica running in another region
          • Available for MySQL, MariaDB, PostgreSQL, and Amazon Aurora
        • Automatic failover (RDS Multi-AZ) in case of:
          • An Availability Zone outage
          • The primary DB instance fails
          • The DB instance’s server type is changed
          • The operating system of the DB instance is undergoing software patching
          • A manual failover of the DB instance was initiated using Reboot with failover
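          A minimal boto3 sketch of the "Reboot with failover" case (the instance identifier is a placeholder):

            import boto3

            rds = boto3.client('rds', region_name='us-east-1')  # region is an assumption

            # Reboot the primary and force a failover to the standby in the other AZ
            rds.reboot_db_instance(
                DBInstanceIdentifier='my-prod-db',  # hypothetical instance
                ForceFailover=True)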
        • RDS Multi-AZ failover (synchronous replication only):
          • MySQL, Oracle & PostgreSQL use synchronous physical replication to keep the standby up to date with the primary
          • SQL Server uses its native mirroring technology
          • Multi-AZ replication is ALWAYS synchronous
          • High Availability
          • Backups are taken from secondary (avoids I/O suspension)
          • Restores are taken from secondary (same reason)
          • Is NOT a scaling solution (use read replicas for scaling)
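          A minimal boto3 sketch of turning on Multi-AZ for an existing instance (the identifier is a placeholder; ApplyImmediately applies the change outside the maintenance window):

            import boto3

            rds = boto3.client('rds', region_name='us-east-1')  # region is an assumption

            # Convert a single-AZ instance to Multi-AZ with a synchronous standby
            rds.modify_db_instance(
                DBInstanceIdentifier='my-prod-db',  # hypothetical instance
                MultiAZ=True,
                ApplyImmediately=True)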
        • Read Replicas (asynchronous replication only):
          • Read heavy DB workloads (duh)
          • Serve reads while source DB is unavailable (maintenance, I/O suspension, etc..)
          • Business reporting
          • When creating a new read replica:
            • If Multi-AZ is not enabled, the snapshot is taken from the primary (~1 min I/O suspension)
            • If Multi-AZ is enabled, the snapshot is taken from the secondary DB
            • Read replicas themselves cannot be multi-AZ
          • When created, given a new end point DNS address
          • Can be promoted to its own standalone DB.
          • Can have up to 5 read replicas
          • MySQL ONLY can have:
            • read replicas in different regions
            • read replicas of read replicas (further increases replication lag)
          • Cannot snapshot or run automated backups of read replicas
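          A minimal boto3 sketch of creating a cross-region read replica (MySQL here) and later promoting it to a standalone DB (identifiers, account, and regions are placeholders; a cross-region source is referenced by ARN):

            import boto3

            # Run the calls in the *destination* region for a cross-region replica
            rds_west = boto3.client('rds', region_name='us-west-2')

            rds_west.create_db_instance_read_replica(
                DBInstanceIdentifier='my-prod-db-replica',  # hypothetical replica name
                SourceDBInstanceIdentifier='arn:aws:rds:us-east-1:123456789012:db:my-prod-db')

            # During DR, break replication and promote the replica to its own standalone DB
            rds_west.promote_read_replica(DBInstanceIdentifier='my-prod-db-replica')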
      • DynamoDB
        • Offers cross region replication via open source tool on Github: https://github.com/awslabs/dynamodb-cross-region-library
        • If the application does NOT require Atomicity, Consistency, Isolation, Durability (ACID) compliance, joins & SQL, then consider DynamoDB rather than RDS (more on this in Domain 5)
      • RedShift
        • Snapshot the data warehouse to S3 within the same region, or copy snapshots to another region (see the sketch below)
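        A minimal boto3 sketch of enabling automatic cross-region snapshot copy for a cluster (cluster name, regions, and retention are placeholders):

          import boto3

          redshift = boto3.client('redshift', region_name='us-east-1')  # source region is an assumption

          # Automatically copy the cluster's snapshots to another region for DR
          redshift.enable_snapshot_copy(
              ClusterIdentifier='my-dw-cluster',  # hypothetical cluster
              DestinationRegion='us-west-2',
              RetentionPeriod=7)                  # keep copied snapshots for 7 days (assumption)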
    • Orchestration
      • CloudFormation
      • ElasticBeanstalk
      • OpsWorks
  • DR Scenarios
    • Backup & Restore $
      • Cheapest & most manual
      • Longest RTO/RPO
      • Select appropriate tool to backup data to AWS
      • Ensure appropriate retention policy for data
      • Ensure security measures are in place (encryption & access policies)
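      A minimal boto3 sketch of the backup leg – pushing a backup file to S3 with server-side encryption (bucket, key, and file path are placeholders):

        import boto3

        s3 = boto3.client('s3', region_name='us-east-1')  # region is an assumption

        # Upload a backup artifact with SSE enabled (covers the encryption-at-rest requirement)
        s3.upload_file(
            Filename='/backups/db-2016-12-04.dump',   # hypothetical local backup file
            Bucket='my-backup-bucket',                # hypothetical bucket
            Key='db/db-2016-12-04.dump',
            ExtraArgs={'ServerSideEncryption': 'AES256'})

      Retention would then be handled with an S3 lifecycle policy (e.g. transition to Glacier, then expire).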
    • Pilot Light $$
      • Small, most critical core elements of your systems run in AWS. When you need to recover, you can quickly provision a full-scale production environment around that critical core
      • 2 options for provisioning from a network perspective:
        • Use pre-allocated Elastic IPs (& even MAC addresses w/ ENIs) & associate them with instances when invoking DR.
        • Use an ELB to distribute traffic to multiple instances, then update DNS to point to the EC2 instance or to the load balancer using a CNAME. This is the option nearly everyone uses.
      • 1. Setup EC2 instance to replicate or mirror data
      • 2. Have all supporting custom software available in AWS
      • 3. Create & maintain AMIs of key servers where fast recovery is required
      • 4. Regularly run/test/patch/update these servers
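      A minimal boto3 sketch of invoking DR around the pilot light – launching a maintained AMI and attaching a pre-allocated Elastic IP, i.e. the first networking option above (AMI ID, instance type, region, and allocation ID are placeholders):

        import boto3

        ec2 = boto3.client('ec2', region_name='us-east-1')  # region is an assumption

        # Launch an instance from the regularly patched AMI of a key server
        res = ec2.run_instances(
            ImageId='ami-0123456789abcdef0',  # hypothetical AMI
            InstanceType='m4.large',          # assumption
            MinCount=1, MaxCount=1)
        instance_id = res['Instances'][0]['InstanceId']

        # Wait until it is running, then associate the pre-allocated Elastic IP
        ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])
        ec2.associate_address(
            InstanceId=instance_id,
            AllocationId='eipalloc-0123456789abcdef0')  # hypothetical pre-allocated EIP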
    • Warm Standby $$$
      • Scaled down version of a fully functional environment.
      • Horizontal scaling is preferred over vertical scaling
      • 1. Setup EC2 instance to replicate or mirror data
      • 2. Create and maintain AMIs
      • 3. Run app using minimal footprint
      • 4. Patch/update these servers in line with prod
      • To recover – scale up/out your AWS footprint, change DNS (or use Route53 automated health checks) & consider Auto Scaling
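      A minimal boto3 sketch of the DNS cutover step – repointing a CNAME at the ELB in AWS (hosted zone ID, record name, and ELB DNS name are placeholders):

        import boto3

        r53 = boto3.client('route53')

        # Repoint the application record at the warm-standby ELB
        r53.change_resource_record_sets(
            HostedZoneId='Z1234567890ABC',  # hypothetical hosted zone
            ChangeBatch={'Changes': [{
                'Action': 'UPSERT',
                'ResourceRecordSet': {
                    'Name': 'app.example.com.',
                    'Type': 'CNAME',
                    'TTL': 60,  # short TTL so the cutover takes effect quickly
                    'ResourceRecords': [{'Value': 'my-elb-123456789.us-east-1.elb.amazonaws.com'}],
                }}]})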
    • Multi-Site (active-active) $$$$
      • Most expensive and most automated
      • Shortest RPO/RTO
      • Runs on-site AND in AWS as active-active
      • Use Route53 to route traffic to both sites either symmetrically or asymmetrically. In the event of a failure, change the DNS weighting to send all traffic to AWS.
      • Application-level logic may be needed to handle a site failure
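      A minimal boto3 sketch of shifting weighted DNS entirely to AWS after an on-site failure (zone ID, record name, and targets are placeholders; the two records share a name and differ by SetIdentifier):

        import boto3

        r53 = boto3.client('route53')

        def set_weight(set_id, target, weight):
            """UPSERT one weighted CNAME for the (hypothetical) app.example.com record."""
            r53.change_resource_record_sets(
                HostedZoneId='Z1234567890ABC',  # hypothetical hosted zone
                ChangeBatch={'Changes': [{
                    'Action': 'UPSERT',
                    'ResourceRecordSet': {
                        'Name': 'app.example.com.',
                        'Type': 'CNAME',
                        'SetIdentifier': set_id,
                        'Weight': weight,
                        'TTL': 60,
                        'ResourceRecords': [{'Value': target}],
                    }}]})

        # Normal operation might be 50/50; on failure of the on-site location send everything to AWS
        set_weight('onprem', 'app.onprem.example.com.', 0)
        set_weight('aws', 'my-elb-123456789.us-east-1.elb.amazonaws.com', 100)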
  • Automated Backups
    • Services that have automated backups:
      • RDS
        • Stored on S3
        • For the MySQL DB engine, only the InnoDB storage engine is supported
        • For MariaDB, only the XtraDB storage engine is supported
        • Deleting a DB instance deletes all automated backups (manual snaps are not deleted)
        • Default retention period is one day (values are 0 – 35 days)
        • Manual snap limits – 50 per region – does not apply to automated backups
        • Restore allows you to change the engine edition (e.g. SQL Server Standard to SQL Server Enterprise)
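        A minimal boto3 sketch of restoring from the automated backups to the latest restorable time (identifiers are placeholders; the restore always creates a new instance):

          import boto3

          rds = boto3.client('rds', region_name='us-east-1')  # region is an assumption

          # Point-in-time restore into a brand-new instance (the original keeps running)
          rds.restore_db_instance_to_point_in_time(
              SourceDBInstanceIdentifier='my-prod-db',          # hypothetical source
              TargetDBInstanceIdentifier='my-prod-db-restore',
              UseLatestRestorableTime=True)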
      • ElastiCache (Redis only, not Memcached)
      • Redshift
    • Services that do NOT have automated backups
      • EC2
        • Automate using the CLI or Python (see the sketch below)
        • Snapshots are stored in S3
        • Snaps are incremental; you are only charged for the incremental space
          • Each snap still contains everything needed to restore the volume (the base data)
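        The kind of script meant by "automate using the CLI or Python" – a minimal boto3 sketch that snapshots every EBS volume attached to one instance (instance ID and region are placeholders):

          import boto3

          ec2 = boto3.client('ec2', region_name='us-east-1')  # region is an assumption

          # Find all volumes attached to the instance we want to protect
          vols = ec2.describe_volumes(
              Filters=[{'Name': 'attachment.instance-id',
                        'Values': ['i-0123456789abcdef0']}])  # hypothetical instance

          # Snapshot each one; snapshots are incremental, so only changed blocks are stored and charged
          for vol in vols['Volumes']:
              ec2.create_snapshot(
                  VolumeId=vol['VolumeId'],
                  Description='automated DR snapshot of ' + vol['VolumeId'])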