AWS Certified Solutions Architect Pro – Study Notes Domain 7

By | December 4, 2016

Domain 7.0: Scalability and Elasticity (15% of exam)

7.1 Demonstrate the ability to design a loosely coupled system

7.2 Demonstrate ability to implement the most appropriate front-end scaling architecture

7.3 Demonstrate ability to implement the most appropriate middle-tier scaling architecture

7.4 Demonstrate ability to implement the most appropriate data storage scaling architecture

7.5 Determine trade-offs between vertical and horizontal scaling

CloudFront https://aws.amazon.com/cloudfront/faqs/

  • Can be used to deliver dynamic, static, streaming, and interactive content of a website using a global network of edge locations
  • Requests for content are automatically routed to nearest edge location for best possible performance
  • Is optimized to work with other AWS services such as S3, EC2, ELB & Route 53
  • Key Concepts:
    • 2 Distribution Types
      • Web Distributions
      • RTMP Distributions
    • Geo Restrictions (Geo Blocking)
      • Whitelist or Blacklist by country
      • Done by either API or console
      • Blacklisted viewer will see an HTTP 403 error
      • Can create custom error pages
    • Support for GET, HEAD, POST, PUT, PATCH, DELETE & OPTIONS
      • CloudFront doesn’t cache responses to POST, PUT, DELETE or PATCH requests – these requests are proxied back to the origin server
    • SSL configs – can use either HTTP or HTTPS with CloudFront. Can use either default CloudFront URL or a custom URL with your own certificate. If you go with custom URL:
      • Dedicated IP custom SSL:
        • Dedicated IP addresses serve your SSL content at each CloudFront edge location.
        • Expensive
        • $600 per certificate per month per endpoint
        • Supports older browsers
      • SNI (Server Name Indication) Custom SSL:
        • Relies on SNI extension of Transport Layer Security protocol
        • Allows multiple domains to serve SSL traffic over same IP address by including the hostname browsers are trying to connect to
        • Does not support older browsers
    • Wildcard CNAME supported
      • Up to 100 CNAME aliases to each distribution
    • Invalidation
      • If you delete a file from your origin, it will be deleted from edge locations when that file reaches its expiration time (as defined in the object's HTTP headers)
      • You can use the Invalidation API to remove an object from all CloudFront edge locations ahead of its expiration time
        • Use in the event of offensive or potentially harmful material
        • Call an invalidation request
        • You do get charged for it
    • Zone Apex Support
      • You can use CloudFront to deliver content from the root domain, or “zone apex” of your website.
      • For example, you can configure both http://www.example.com and http://example.com to point at the same CloudFront distribution, without the performance penalty or availability risk of managing a redirect service.
      • To use this feature, you create a Route 53 Alias record to map the root of your domain to your CloudFront distribution.
    • Edge caching – Dynamic Content Support
      • CloudFront supports delivery of dynamic content that is customized or personalized using HTTP cookies.
      • To use this feature, you specify whether you want Amazon CloudFront to forward some or all your cookies to your origin server.
      • CloudFront then considers the forwarded cookie values when identifying a unique object in its cache.
      • Get both the benefit of content that is personalized with a cookie and the performance benefits of CloudFront.
      • You can also optionally choose to log the cookie values in CloudFront access logs.
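
The cookie forwarding and method handling described above can be sketched as a toy model. This is only an illustration of the idea, not CloudFront's real implementation: the function names, the tuple-shaped cache key and the example cookie names are all assumptions made for this sketch.

```python
# Toy model only; names are hypothetical and this is not CloudFront's real code.

# Per the notes: responses to these methods are not cached, only proxied.
PROXIED_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


def is_proxied_to_origin(method):
    """Return True if the request always goes straight to the origin."""
    return method.upper() in PROXIED_METHODS


def cache_key(path, cookies, forwarded_cookies):
    """Build a cache key from the path plus only the forwarded cookies.

    Cookies that are not forwarded are ignored, so they cannot
    split one object into many cached variants.
    """
    relevant = sorted(
        (name, value)
        for name, value in cookies.items()
        if name in forwarded_cookies
    )
    return (path, tuple(relevant))


if __name__ == "__main__":
    forwarded = {"lang"}  # hypothetical: only the "lang" cookie is forwarded
    a = cache_key("/home", {"lang": "en", "session": "abc"}, forwarded)
    b = cache_key("/home", {"lang": "en", "session": "xyz"}, forwarded)
    c = cache_key("/home", {"lang": "de", "session": "abc"}, forwarded)
    print("same object for a and b:", a == b)
    print("same object for a and c:", a == c)
```

Forwarding only the cookies that actually change the response keeps the number of cached variants small, which is why the choice between "some" and "all" cookies matters for cache hit ratio.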

ElastiCache https://aws.amazon.com/elasticache/faqs/

  • Memcached vs Redis http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/SelectEngine.Uses.html
  • https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
  • Lazy Loading
    • App tries to get data from cache; if the data is not available, the cache returns null.
    • App gets data from DB, app then updates cache
    • only requested data is in cache
    • node failures don’t matter as request simply goes back to DB again
  • Write Through
    • Cache is updated when data is written to DB (each write is 2 steps, 1 to DB, 1 to cache)
    • ensures data is never stale
    • good for apps that don’t have a lot of writes
    • infrequently accessed data gets stored in cache (bad)
    • a node that is still spinning up can miss writes, leaving data missing from the cache (bad)
  • Use Memcached if the following apply to your situation:
    • Does not manage its own persistence (relies on DB to have the most recent data)
    • Can be run in a cluster of nodes
    • Can’t back up clusters – goes back to DB to repopulate
    • You need the simplest model possible.
    • You need to run large nodes with multiple cores or threads (Multithreaded performance).
    • You need the ability to scale out/in, adding and removing nodes as demand on your system increases and decreases (Horizontal scaling).
    • You need to partition your data across multiple shards.
    • Can populate with both Lazy Loading and Write Through
    • great solution for storing “session” state, making web servers stateless which allows for easy scaling
  • Use Redis 2.8.x or Redis 3.2 (non-clustered mode) if the following apply to your situation:
    • You need complex data types, such as strings, hashes, lists, sets, sorted sets, and bitmaps.
    • You need to sort or rank in-memory data-sets.
    • You need persistence of your key store.
    • You need to replicate your data from the primary to one or more read replicas for read intensive applications.
    • You need automatic failover if your primary node fails (Multi-AZ).
    • You need publish and subscribe (pub/sub) capabilities—to inform clients about events on the server.
    • You need backup and restore capabilities.
    • You need to support multiple databases.
  • Use Redis 3.2 (clustered mode) if you require all the functionality of Redis 2.8.x with the following differences:
    • You need to partition your data across 2 to 15 node groups. (Cluster mode only.)
    • You need geospatial indexing. (clustered mode or non-clustered mode)
    • You do not need to support multiple databases.
    • Redis (cluster mode enabled) has the following limitations:
      • No scale up to larger node types.
      • No changing the number of node groups (partitions).
      • No changing the number of replicas in a node group (partition).
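
The Lazy Loading and Write Through strategies from the notes above can be sketched in a few lines. This is a minimal sketch, assuming plain Python dicts stand in for the cache cluster and the database; the function names and the example key are hypothetical and not part of any ElastiCache API.

```python
# Plain dicts stand in for the cache cluster and the database (assumption).

def get_lazy(key, cache, db):
    """Lazy Loading: read from cache first, fall back to the DB on a miss."""
    value = cache.get(key)
    if value is None:              # cache miss: cache returned null
        value = db.get(key)        # app reads from the DB
        if value is not None:
            cache[key] = value     # app then updates the cache
    return value


def put_write_through(key, value, cache, db):
    """Write Through: every write is two steps, one to the DB, one to cache."""
    db[key] = value
    cache[key] = value


if __name__ == "__main__":
    db = {"user:1": "alice"}       # hypothetical record
    cache = {}

    print(get_lazy("user:1", cache, db))   # miss, loaded from the DB
    print("user:1" in cache)               # now cached

    cache.clear()                          # simulate a cache node failure
    print(get_lazy("user:1", cache, db))   # request simply goes back to the DB

    put_write_through("user:2", "bob", cache, db)
    print(cache["user:2"], db["user:2"])   # cache and DB stay in step
```

The two strategies are often combined: Write Through keeps cached values fresh, and Lazy Loading repopulates anything lost when a node is replaced.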

Kinesis Streams https://aws.amazon.com/kinesis/streams/faqs/

  • Enables you to build custom applications that process or analyze streaming data for specialized needs.
  • You can continuously add various types of data such as clickstreams, application logs, and social media to a Kinesis stream from hundreds of thousands of sources.
  • Within seconds, the data will be available for your Kinesis Applications to read and process from the stream.
  • Data in Kinesis is stored for 24 hours by default; retention can be increased to 7 days
  • Kinesis Streams is not persistent storage; use S3, Redshift, DynamoDB, EMR, etc. to store processed data long term
  • Synchronously replicates streaming data across 3 AZs
  • When would you use Kinesis?
    • Gaming – collect data such as player actions into the gaming platform to create an environment that reacts to real-time events
    • Real-time analytics
    • Application alerts
    • Log/Event Data collection
    • Mobile data capture
  • Key Concepts:
    • Data Producers (e.g. EC2 Instances, IoT Sensors, Clients, Mobile, Server)
      • Kinesis Streams API
        • PutRecord (single record)
        • PutRecords (multiple records)
      • Kinesis Producer Library (KPL)
        • simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis Stream
      • Kinesis Agent
        • Java app that you can install on Linux devices
    • Shards
      • Shard is the base throughput unit of an Amazon Kinesis stream
      • One shard provides a capacity of 1MB/sec data input and 2MB/sec data output.
        • One shard can support up to 1000 PUT records per second.
        • You will specify the number of shards needed when you create a stream.
        • For example, you can create a stream with two shards. This stream has a throughput of 2MB/sec data input and 4MB/sec data output, and allows up to 2000 PUT records per second.
      • Can dynamically add/remove shards from stream via resharding
    • Data Records – the unit of data stored in an Amazon Kinesis stream
      • Sequence number – a unique identifier for each record
        • Assigned by Kinesis Streams after you write to the stream with client.putRecord(s)
      • Partition Key – used to segregate and route records to different shards of a stream
        • Used to group data by shard within a stream
        • Stream service segregates data records belonging to a stream into multiple shards
        • Use partition keys associated w/ each data record to determine which shard a given data record belongs to
        • Specified by the app putting the data into a stream
      • Data (blob) – data your producer is adding to stream. Max size = 1MB
    • Data Consumers (e.g. Amazon Kinesis Streams Applications)
      • Typically EC2 instances that read from the Kinesis stream
      • Run analytics against the data & pass data onto persistent storage
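
The shard capacity figures and the partition key routing above can be worked through with a short sketch. The per-shard limits come from the notes (1MB/sec in, 2MB/sec out, 1000 PUT records per second). Kinesis maps partition keys to shards with an MD5 hash over 128-bit hash key ranges; the evenly split ranges and the function names below are assumptions made for illustration, since a real stream's ranges depend on how it was created and resharded.

```python
import hashlib
import math

# Per-shard limits as listed in the notes.
SHARD_IN_MB_PER_SEC = 1
SHARD_OUT_MB_PER_SEC = 2
SHARD_PUT_RECORDS_PER_SEC = 1000


def shards_needed(in_mb_per_sec, out_mb_per_sec, records_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(in_mb_per_sec / SHARD_IN_MB_PER_SEC),
        math.ceil(out_mb_per_sec / SHARD_OUT_MB_PER_SEC),
        math.ceil(records_per_sec / SHARD_PUT_RECORDS_PER_SEC),
    )


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index via its MD5 hash.

    Assumes the 128-bit hash key space is split evenly across shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")   # 0 .. 2**128 - 1
    range_size = 2**128 // shard_count
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    # The two-shard example from the notes: 2MB/sec in, 4MB/sec out, 2000 PUTs/sec.
    print(shards_needed(2, 4, 2000))
    # Records with the same partition key always land on the same shard.
    for key in ("player-1", "player-2", "player-1"):   # hypothetical keys
        print(key, "-> shard", shard_for_key(key, 2))
```

Because routing depends only on the partition key, a key with very heavy traffic can overload one shard while others sit idle, so producers usually pick keys with many distinct values.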

SNS Mobile Push https://aws.amazon.com/sns/faqs/

  • Subset of SNS
  • Push notifications can be sent to mobile devices and desktops using one of the following services:
    • Amazon Device Messaging (ADM)
    • Apple Push Notification Service (APNS)
    • Google Cloud Messaging (GCM)
    • Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
    • Microsoft Push Notification Service (MPNS) for Windows Phone 7+
    • Baidu Cloud Push for Android devices in China
  • Steps: