AWS Certified Solutions Architect Pro – Study Notes Domain 8

November 29, 2016

Domain 8.0: Cloud Migration and Hybrid Architecture (10% of exam)

8.1 Plan and execute for applications migrations

8.2 Demonstrate ability to design hybrid cloud architectures

VMware Integration

  • AWS management portal for vCenter: https://aws.amazon.com/ec2/vcenter-portal/
  • Portal installs as a vCenter plug-in
  • Enables you to migrate VMware VMs to EC2 and manage AWS resources from within vCenter
  • Use cases:
    • Migrate VMs to EC2
    • Reach new geographies from vCenter
    • Self-Service AWS portal from within vCenter
    • Leverage VMware experience while learning AWS

Migrating to the cloud using Storage Gateway https://aws.amazon.com/storagegateway/details/

  • You can use Storage Gateway to migrate on-premises VMs to AWS
  • Snapshots must be consistent: take the VM offline before snapshotting, or use an OS/application tool to flush pending writes to disk (see the sketch below)
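As a rough illustration of the snapshot step, here is a minimal boto3 sketch that kicks off a snapshot of a gateway volume once the source VM has been quiesced. The region and volume ARN are placeholder assumptions, not values from these notes:

```
import boto3

# Placeholder region and volume ARN -- substitute your own gateway volume.
sgw = boto3.client("storagegateway", region_name="us-east-1")
volume_arn = (
    "arn:aws:storagegateway:us-east-1:111122223333:"
    "gateway/sgw-12345678/volume/vol-12345678"
)

# Quiesce the VM (or flush writes with an OS/app tool) BEFORE this call
# so the resulting snapshot is consistent.
response = sgw.create_snapshot(
    VolumeARN=volume_arn,
    SnapshotDescription="Pre-migration consistent snapshot",
)
print("Snapshot started:", response["SnapshotId"])
```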

Data Pipeline https://aws.amazon.com/datapipeline/ and http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

  • Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.
  • Can create, access & manage using:
    • AWS Management Console
    • CLI
    • SDKs
    • Query API
  • Supported compute services:
    • EC2
    • EMR
  • Supported services to store data:
    • DynamoDB
    • RDS
    • Redshift
  • Can be extended to on-premises hosts:
    • AWS supplies a Task Runner package that can be installed on your on-premises hosts. This package polls the Data Pipeline service for work to perform. When it’s time to run an activity, Data Pipeline will issue the appropriate command to the Task Runner.
  • With Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as:
    • S3
    • RDS
    • DynamoDB
    • Elastic MapReduce (EMR)
  • Pipeline – the resource that contains the definition of the dependent chain of data sources, destinations, and predefined or custom data processing activities required to execute your business logic (a minimal definition sketch follows this list)
    • Contains the datanodes, activities, preconditions & schedules
    • Can run on EC2 or EMR
    • Consists of:
      • Task Runner – a package that continuously polls the AWS Data Pipeline service for work to perform. Installed in one of two ways:
        • Installed automatically on resources that are launched and managed by the Data Pipeline service
        • Manually installed on a compute resource that you manage, such as a long-running EC2 instance or an on-premises server
      • Data node – the end destination for your data. A data node can reference a specific Amazon S3 path, for example. Data Pipeline supports an expression language that makes it easy to reference data that is generated on a regular basis
        • For example, you could specify that your Amazon S3 data format is s3://example-bucket/my-logs/logdata-#{scheduledStartTime('YYYY-MM-dd-HH')}.tgz
        • Other examples:
          • A DynamoDB table that contains data for HiveActivity or EmrActivity to use
          • A MySQL table and database query that represents data for a pipeline activity to use
          • A Redshift table that contains data for RedshiftCopyActivity to use
      • Activity – an action that AWS Data Pipeline initiates on your behalf as part of a pipeline. Example activities are EMR or Hive jobs, copies, SQL queries, or command-line scripts
        • Data pipeline provides pre-packaged activities like:
          • Move/copy data from one location to another
          • Run an EMR cluster
          • Run a Hive query
          • Copy data to/from Redshift tables
          • Run a custom UNIX/Linux shell command as an activity
          • Run a SQL query on a DB
        • Use ShellCommandActivity to specify custom activities
      • Precondition – a readiness check (consisting of conditional statements that must be true) that can be optionally associated with a data source or activity.
        • This can be useful if you are running an activity that is expensive to compute and should not run until specific criteria are met (e.g., does the data, table, S3 path, or S3 file exist?)
        • Can specify pre-packaged preconditions and custom preconditions
        • 2 types of preconditions:
          • System managed
          • User managed
      • Schedule – when your pipeline activities run and the frequency with which the service expects your data to be available
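To make the moving parts above concrete, here is a minimal boto3 sketch that creates and activates a pipeline wiring together a Schedule, an Ec2Resource to run on, an S3KeyExists precondition, and a ShellCommandActivity. The names, IAM roles, bucket, and schedule are placeholder assumptions, not values from these notes:

```
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

# Create an empty pipeline; uniqueId guards against duplicate creation.
pipeline_id = dp.create_pipeline(
    name="demo-pipeline", uniqueId="demo-pipeline-001"
)["pipelineId"]

pipeline_objects = [
    # Defaults inherited by every object: schedule, roles, log location.
    {"id": "Default", "name": "Default", "fields": [
        {"key": "scheduleType", "stringValue": "cron"},
        {"key": "schedule", "refValue": "DefaultSchedule"},
        {"key": "pipelineLogUri", "stringValue": "s3://example-bucket/logs/"},
        {"key": "role", "stringValue": "DataPipelineDefaultRole"},
        {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
    ]},
    # Schedule: how often the service expects work/data.
    {"id": "DefaultSchedule", "name": "EveryDay", "fields": [
        {"key": "type", "stringValue": "Schedule"},
        {"key": "period", "stringValue": "1 day"},
        {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
    ]},
    # Compute resource the activity runs on (launched and terminated for us).
    {"id": "MyEc2", "name": "MyEc2", "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        {"key": "terminateAfter", "stringValue": "1 Hour"},
    ]},
    # Precondition: only run once the input flag object exists in S3.
    {"id": "InputReady", "name": "InputReady", "fields": [
        {"key": "type", "stringValue": "S3KeyExists"},
        {"key": "s3Key", "stringValue": "s3://example-bucket/input/ready.flag"},
    ]},
    # Custom activity via ShellCommandActivity.
    {"id": "MyActivity", "name": "MyActivity", "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "command", "stringValue": "echo hello > /tmp/out.txt"},
        {"key": "runsOn", "refValue": "MyEc2"},
        {"key": "precondition", "refValue": "InputReady"},
    ]},
]

dp.put_pipeline_definition(pipelineId=pipeline_id,
                           pipelineObjects=pipeline_objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```

Once activated, the service launches the EC2 instance on schedule, and the Task Runner it installs on that instance polls for and executes the activity, exactly as described in the Task Runner notes above.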

Network Migrations

  • CIDR Reservations:
    • Biggest CIDR block you can have is /16
    • Smallest CIDR block you can have is /28
    • AWS reserves 5 IP addresses in each subnet; in a /24, for example (see the sketch below):
      • .0 = network address
      • .1 = VPC router
      • .2 = mapped to the Amazon-provided DNS
      • .3 = reserved for future use
      • .255 = broadcast is not supported in a VPC, so AWS reserves this address
  • VPN to Direct Connect Migrations
    • Most organizations start with a VPN and then move to Direct Connect as traffic increases
    • Once Direct Connect is installed, the VPN and Direct Connect connections should be configured in the same BGP community
    • Then configure BGP so that the VPN path has a higher cost than the Direct Connect path; Direct Connect becomes the preferred route and the VPN serves as failover
    • https://www.youtube.com/watch?v=SMvom9QjkPk
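A quick way to sanity-check the reserved-address arithmetic from the CIDR notes above, as a throwaway Python sketch (the subnet value is an arbitrary example):

```
import ipaddress

# An example /24 subnet inside a VPC CIDR range.
subnet = ipaddress.ip_network("10.0.1.0/24")

# AWS reserves the first four addresses (.0 network, .1 VPC router,
# .2 Amazon DNS, .3 future use) plus the last (.255, would-be broadcast).
reserved = [subnet[0], subnet[1], subnet[2], subnet[3], subnet[-1]]
usable = subnet.num_addresses - len(reserved)

print("Reserved:", [str(ip) for ip in reserved])
print("Usable host addresses:", usable)  # 256 - 5 = 251
```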

Ok, that’s it for the 8 domains necessary to sit the AWS Solutions Architect Professional exam! If you’ve sat the exam, or if I’ve made any mistakes, please let me know in the comments section of the appropriate domain.