key: Folder1/hello.html value: the content of that file
Objects also contain metadata and an optional version ID
Read-after-write consistency for PUTs of new objects (immediate)
Eventual consistency for overwrite PUTs and DELETEs (changes may take some time to propagate)
Object keys are stored in lexicographic (alphabetical) order
For performance, use random key names (add a salt/hash prefix before the filename if it is timestamp-based)
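The salt-prefix idea can be sketched in Python (`salted_key` is a hypothetical helper, not an AWS API):

```python
import hashlib

def salted_key(key: str, prefix_len: int = 4) -> str:
    """Prefix a key with a short hex hash of itself, so timestamp-based
    names spread across the S3 key space instead of clustering."""
    salt = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{salt}-{key}"

# Timestamped log names all share a common prefix; salting distributes them:
for ts in ("2018-04-30-00-00.log", "2018-04-30-00-01.log"):
    print(salted_key(ts))
```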
Availability 99.99% (4 nines)
Durability 99.999999999% (11 nines)
Standard-IA (Infrequent Access)
Cheaper than S3 standard but retrieval fee is charged
99.9% (3 nines) availability
One Zone-IA, i.e. S3 One Zone-Infrequent Access (released April 2018)
Data is stored in a single Availability Zone only; no multi-AZ redundancy
20% cheaper than Standard-IA
Use case: Store reproducible, infrequently accessed data. Example: second or third backup copies for compliance sake.
Reduced Redundancy Storage
Availability 99.99% (4 nines)
Durability is also only 99.99% (far lower than Standard's 11 nines)
Glacier
Cheap archival storage, but a standard retrieval takes ~4 hours
Use "Bulk" retrieval for the cheapest cost (slower)
Use "Expedited" retrieval for fast retrievals (more expensive)
Lifecycle management: specify rules that transition objects across storage classes at a specified age and then finally expire (delete) them.
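As a sketch, a lifecycle configuration has roughly this shape (the dict matches what boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, prefix, and day counts are illustrative):

```python
# Transition "logs/" objects to cheaper classes as they age, then delete them.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",       # illustrative rule name
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},       # final deletion
    }]
}
```

Apply it with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle)`.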
Versioning: total data storage across all versions is billed
Once enabled, versioning cannot be disabled, only suspended (suspension applies to future updates). To turn versioning off completely you must delete and recreate the bucket, since existing objects keep their version IDs.
Deleting an object while versioning is on only adds a delete marker; delete the delete marker itself and the file you deleted comes back.
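Restoring a deleted object means removing its delete marker. A minimal sketch, assuming a response shaped like S3's ListObjectVersions output (`latest_delete_marker` is a hypothetical helper):

```python
def latest_delete_marker(versions_response: dict, key: str):
    """Return the VersionId of the current delete marker for `key`, if any."""
    for marker in versions_response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker.get("IsLatest"):
            return marker["VersionId"]
    return None

# Sample response: v1 is the real object, v2 is the delete marker on top of it.
sample = {
    "Versions": [{"Key": "hello.html", "VersionId": "v1", "IsLatest": False}],
    "DeleteMarkers": [{"Key": "hello.html", "VersionId": "v2", "IsLatest": True}],
}
print(latest_delete_marker(sample, "hello.html"))  # prints: v2
```

Deleting that specific version (DeleteObject with both Key and VersionId) removes the marker, and the object reappears.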
Access Control Lists
S3 ACLs are a legacy access control mechanism that predates IAM. However, if you already use S3 ACLs and find them sufficient, there is no need to change. As a general rule, AWS recommends using S3 bucket policies or IAM policies for access control.
An S3 ACL is a sub-resource that’s attached to every S3 bucket and object. It defines which AWS accounts or groups are granted access and the type of access. When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource.
Use IAM policies if:
You need to control access to AWS services other than S3. IAM policies will be easier to manage since you can centrally manage all of your permissions in IAM, instead of spreading them between IAM and S3.
You have numerous S3 buckets each with different permissions requirements. IAM policies will be easier to manage since you don’t have to define a large number of S3 bucket policies and can instead rely on fewer, more detailed IAM policies.
You prefer to keep access control policies in the IAM environment.
Use S3 bucket policies if:
You want a simple way to grant cross-account access to your S3 environment, without using IAM roles.
Your IAM policies bump up against the size limit (up to 2 KB for users, 5 KB for groups, and 10 KB for roles). S3 supports bucket policies of up to 20 KB.
You prefer to keep access control policies in the S3 environment.
Make it public
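Making a whole bucket public is typically done with a bucket policy. A sketch (the bucket name is illustrative):

```python
import json

bucket = "my-example-bucket"  # illustrative
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",                        # anyone
        "Action": "s3:GetObject",                # read-only
        "Resource": f"arn:aws:s3:::{bucket}/*",  # every object in the bucket
    }],
}
policy_json = json.dumps(policy)
```

Apply it with `s3.put_bucket_policy(Bucket=bucket, Policy=policy_json)`.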
S3 is AWS's object storage service in the cloud. It lets you store key/value pairs: within a bucket, the object key (e.g. Folder1/hello.html) is the key and the content of the object/file is the value.
S3 access is global but a bucket will need a region
Client side encryption
Server Side encryption
SSE-S3 using S3-managed keys
SSE-KMS using AWS KMS-managed keys
SSE-C using customer-provided keys
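The three SSE modes map to different request parameters on a PUT. A sketch using boto3-style parameter names (the KMS key alias and customer key are placeholders):

```python
sse_params = {
    # S3 creates and manages the key for you
    "SSE-S3":  {"ServerSideEncryption": "AES256"},
    # Key lives in AWS KMS; key usage is audited
    "SSE-KMS": {"ServerSideEncryption": "aws:kms",
                "SSEKMSKeyId": "alias/my-key"},           # illustrative alias
    # You supply (and must keep) the key on every request
    "SSE-C":   {"SSECustomerAlgorithm": "AES256",
                "SSECustomerKey": "<your 32-byte key>"},  # placeholder
}
```

Pass the chosen dict as extra keyword arguments to `put_object`.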
Control access to a bucket using bucket ACL or bucket policy
All buckets and objects are private by default
Two ways to stop people from accidentally deleting objects:
Enable versioning (a delete only adds a delete marker)
Enable MFA Delete
Cross region replication
You need to first turn on versioning
Then go to the Management tab and choose Cross-Region Replication
Create a rule to replicate all or some objects to a destination bucket
You can specify a different storage class for the replication target bucket
Only new objects (not the existing ones) are replicated
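A cross-region replication configuration looks roughly like this (the shape boto3's `put_bucket_replication` expects; the role ARN and bucket names are illustrative):

```python
replication = {
    "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # illustrative
    "Rules": [{
        "ID": "replicate-all",
        "Prefix": "",          # empty prefix = replicate all objects
        "Status": "Enabled",
        "Destination": {
            "Bucket": "arn:aws:s3:::my-destination-bucket",
            "StorageClass": "STANDARD_IA",  # optional: different class at target
        },
    }],
}
```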
S3 transfer acceleration
Lets you upload files to the nearest CloudFront edge location instead of directly to the S3 bucket, saving time/latency since the edge location is closer to you than the bucket's region
Static website hosting on S3
Create a bucket whose name exactly matches your domain name (e.g. example.com)
Go to static website hosting and enable
Grant public read access
URL will be http://your-bucket-name.s3-website-REGION.amazonaws.com where region can be us-east-1 etc.
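The hosting setup and resulting endpoint can be sketched as follows (bucket and region values are illustrative; the `website` dict is the shape `put_bucket_website` accepts):

```python
bucket, region = "example.com", "eu-west-2"  # illustrative

# Website configuration as passed to put_bucket_website:
website = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}

# The website endpoint S3 serves the site from:
url = f"http://{bucket}.s3-website-{region}.amazonaws.com"
print(url)  # http://example.com.s3-website-eu-west-2.amazonaws.com
```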
S3 is global but buckets reside in regions; since bucket names are globally unique, there is no need to include the region in a URL or ARN
Requester Pays Option: Can be used to pass on request/transfer costs to another AWS account
The bucket owner (or others, as permitted by an IAM policy) can arrange for notifications to be issued to Amazon Simple Queue Service (SQS) or Amazon Simple Notification Service (SNS) when a new object is added to the bucket or an existing object is overwritten. Notifications can also be delivered to AWS Lambda for processing by a Lambda function.
The following events are supported: s3:ObjectCreated:Put, s3:ObjectCreated:Post, s3:ObjectCreated:Copy, s3:ObjectCreated:CompleteMultipartUpload, s3:ObjectCreated:*, and s3:ReducedRedundancyObjectLost.
Each notification is delivered as a JSON object with the following fields: Region, Timestamp, Event Type (as listed above), Request, Actor, Principal ID, Source IP of the request, Request ID, Host ID, Notification Configuration Destination ID, Bucket Name, Bucket ARN, Bucket Owner Principal ID, Object Key, Object Size, Object ETag, Object Version ID (if versioning is enabled on the bucket).
Notifications are delivered to the target in well under a second.
Cost – There is no charge for this feature.
Regions – The bucket and the target must reside in the same AWS Region.
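A trimmed example of the JSON a notification delivers, and pulling out the interesting fields (the sample values are made up):

```python
import json

# Shape of an event record as delivered to SQS/SNS/Lambda (trimmed):
event_json = json.dumps({
    "Records": [{
        "awsRegion": "eu-west-2",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {"key": "Folder1/hello.html", "size": 1024},
        },
    }]
})

rec = json.loads(event_json)["Records"][0]
print(rec["eventName"], rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
```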
Optimizing S3 performance: if you consistently exceed 100+ PUT/LIST/DELETE requests per second or 300+ GET requests per second, you should optimize your S3 key naming.
For GET only performance use CloudFront
For PUT/DELETE performance, use a hexadecimal hash as the key prefix. This spreads keys across different S3 index partitions, which enhances performance.
S3 price – charged for storage, number of requests, and data transfer (tiered, so the more you use, the lower the per-GB rate)
Bucket names must be all lowercase (letters, numbers, and hyphens)
Individual objects inside the same bucket can have different storage classes, and you can turn on server-side encryption at the object level.
An example bucket link: https://s3-eu-west-2.amazonaws.com/myobj
URL for bucket with Static website hosting: http://mysite.s3-website-eu-west-2.amazonaws.com
S3 website endpoints do not support HTTPS directly; you can turn on SSL/HTTPS by putting CloudFront in front
Every non-anonymous request to S3 must contain authentication information to establish the identity of the principal making the request. In REST, this is done by first putting the headers in a canonical format, then signing the headers using your AWS Secret Access Key.
You can use pre-signed urls
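In practice you would use your SDK's presigner (e.g. `generate_presigned_url` in boto3). The sketch below hand-rolls SigV4 query-string presigning with only the standard library to show what a pre-signed URL actually contains (the credentials in the usage line are dummies):

```python
import datetime, hashlib, hmac
from urllib.parse import quote

def presign_get(bucket, key, region, access_key, secret_key, expires=3600):
    """Build a SigV4 pre-signed GET URL. Illustration only -- prefer the SDK."""
    now = datetime.datetime.utcnow()
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in sorted(params.items()))
    canonical = "\n".join(
        ["GET", "/" + quote(key), query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical.encode()).hexdigest()])
    # Derive the signing key: HMAC chain over date, region, service, terminator.
    sig_key = f"AWS4{secret_key}".encode()
    for part in (datestamp, region, "s3", "aws4_request"):
        sig_key = hmac.new(sig_key, part.encode(), hashlib.sha256).digest()
    signature = hmac.new(sig_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{quote(key)}?{query}&X-Amz-Signature={signature}"

url = presign_get("my-bucket", "Folder1/hello.html", "eu-west-2",
                  "AKIAEXAMPLE", "dummy-secret-key")
print(url)
```

Anyone holding this URL can GET the object until the expiry time, with no AWS credentials of their own.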
Amazon S3 Select is a capability introduced in April 2018.
It is designed to pull out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3.
In the past, most applications had to retrieve the entire object and then filter out only the required data for further analysis.
Now S3 Select enables applications to offload the heavy lifting of filtering and accessing data inside objects to the Amazon S3 service.
By reducing the volume of data that has to be loaded and processed by your applications, S3 Select can improve the performance of most applications that frequently access data from S3 by up to 400%.
You can use S3 Select from the AWS SDK for Java, AWS SDK for Python, and AWS CLI.
Use the SELECT (SelectObjectContent) API call as opposed to the GET call
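A sketch of the parameters SelectObjectContent takes, assuming a CSV object with a header row (`build_select_params` is a hypothetical helper; bucket, key, and query are illustrative):

```python
def build_select_params(bucket: str, key: str, sql: str) -> dict:
    """Parameter dict in the shape boto3's select_object_content expects."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

params = build_select_params(
    "my-bucket", "data.csv",  # illustrative names
    "SELECT s.name FROM S3Object s WHERE s.age > '30'",
)
```

Pass with `s3.select_object_content(**params)` and read the "Records" events from the returned "Payload" stream.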
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL expressions.
Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL expressions.
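A sketch of submitting an Athena query, in the shape StartQueryExecution expects (the database, table, and results bucket are illustrative):

```python
query_params = {
    "QueryString": "SELECT status, COUNT(*) AS hits FROM weblogs GROUP BY status",
    "QueryExecutionContext": {"Database": "mydb"},      # illustrative database
    # Athena writes query results to an S3 location you choose:
    "ResultConfiguration": {"OutputLocation": "s3://my-athena-results/"},
}
```

Submit with `athena.start_query_execution(**query_params)`, then poll `get_query_execution` and read the results from the output location (you pay per query, based on data scanned).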