AWS S3 Synchronization
Synchronize an S3 bucket and a filesystem directory using
raco s3-sync ‹src› ‹dest›
where either ‹src› or ‹dest› should start with s3:// to identify a bucket and item name or prefix, while the other is a path in the local filesystem to a file or directory. Naturally, a / within a bucket item’s name corresponds to a directory separator in the local filesystem, and a trailing / syntactically indicates a prefix (as opposed to a complete item name). A bucket item is ignored if its name ends in /. A bucket can contain an item whose name plus / is a prefix for other bucket items, in which case attempting to synchronize both from the bucket produces an error, since the name cannot be used for both a file and a directory.
For example, to upload the content of ‹src-dir› with a prefix ‹dest-path› within ‹bucket›, use
raco s3-sync ‹src-dir› s3://‹bucket›/‹dest-path›
To download the items with prefix ‹src-path› within ‹bucket› to ‹dest-dir›, use
raco s3-sync s3://‹bucket›/‹src-path› ‹dest-dir›
If ‹src› refers to a directory or prefix (either syntactically or as determined by consulting the filesystem or bucket), ‹dest› cannot refer to a file or bucket item. If ‹dest› refers to a directory or prefix while ‹src› refers to a file or item, the ‹src› file or item name is implicitly added to the ‹dest› directory or prefix.
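For example, given a local file notes.txt (a hypothetical name),
raco s3-sync notes.txt s3://‹bucket›/docs/
uploads the file as the bucket item docs/notes.txt, since the trailing / marks docs/ as a prefix.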
The following options (supply them after s3-sync and before ‹src›) are supported:
--dry-run — report actions that would be taken, but don’t upload, download, delete, or change redirection rules.

--jobs ‹n› or -j ‹n› — perform up to ‹n› downloads or uploads in parallel.

--shallow — when downloading, constrain downloads to existing directories at ‹dest› (i.e., no additional subdirectories); in both upload and download modes, extract the current bucket state in a directory-like way (which is useful if the bucket contains many more nested items than the local filesystem).

--delete — delete destination items that have no corresponding source item.

--acl ‹acl› — when uploading, use ‹acl› for the access control list; for example, use public-read to make items public.

--reduced — when uploading, specify reduced-redundancy storage.

--check-metadata — when uploading, check whether an existing item has the metadata that would be uploaded (including access control), and adjust the metadata if not.

--include ‹regexp› — consider only items whose name within the S3 bucket matches ‹regexp›, where ‹regexp› uses “Perl-compatible” syntax.

--exclude ‹regexp› — do not consider items whose name within the S3 bucket matches ‹regexp› (even if they match an inclusion pattern).

--gzip ‹regexp› — on upload or for checking download hashes (to avoid unnecessary downloads), compress files whose name within the S3 bucket matches ‹regexp›.

--gzip-min ‹bytes› — when combined with --gzip, compress only files that are at least ‹bytes› in size.

++upload-metadata ‹name› ‹value› — include ‹name› with ‹value› as metadata when uploading (without updating metadata for any file that is not uploaded). This flag can be specified multiple times to add multiple metadata entries.

--s3-hostname ‹hostname› — set the S3 hostname to ‹hostname› instead of s3.amazonaws.com.

--region ‹region› — set the S3 region to ‹region› (e.g., us-east-1) instead of issuing a query to locate the bucket’s region.

--error-links — report an error if a soft link is found; this is the default treatment of soft links.

--follow-links — follow soft links.

--redirect-links — treat soft links as redirection rules to be installed for ‹bucket› as a web site (upload only).

--redirects-links — treat soft links as individual redirections to be installed as metadata on a ‹bucket›’s item, while the item itself is made empty (upload only).

--ignore-links — ignore soft links.

--web — set defaults to public-read access, reduced redundancy, compression for ".html", ".css", ".js", and ".svg" files that are 1K or larger, and Cache-Control "max-age=0, no-cache" metadata.
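For example, to preview a web-style synchronization of ‹src-dir› to ‹bucket› without changing anything, the options can be combined as
raco s3-sync --web --dry-run ‹src-dir› s3://‹bucket›/
and then re-run without --dry-run to perform the reported uploads.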
1 S3 Synchronization API
(require s3-sync) | package: s3-sync |
The s3-sync library uses aws/s3, so use ensure-have-keys, s3-host, and s3-region before calling s3-sync.
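As a minimal sketch of that setup (the directory, bucket, prefix, and region here are placeholders, not defaults):

(require aws/keys aws/s3 s3-sync)

;; Load AWS credentials; errors if they cannot be found.
(ensure-have-keys)

;; Placeholder region; set this to the bucket's actual region.
(s3-region "us-east-1")

;; Upload the local directory "site" to "example-bucket"
;; under the prefix "docs/".
(s3-sync "site" "example-bucket" "docs/")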
procedure
(s3-sync local-path
         s3-bucket
         s3-path
         [#:upload? upload?
          #:jobs jobs
          #:shallow? shallow?
          #:check-metadata? check-metadata?
          #:dry-run? dry-run?
          #:delete? delete?
          #:include include-rx
          #:exclude exclude-rx
          #:make-call-with-input-file make-call-with-file-stream
          #:get-content-type get-content-type
          #:get-content-encoding get-content-encoding
          #:acl acl
          #:reduced-redundancy? reduced-redundancy?
          #:upload-metadata upload-metadata
          #:link-mode link-mode
          #:log log-info
          #:error raise-error])
 → void?
  local-path : path-string?
  s3-bucket : string?
  s3-path : (or/c #f string?)
  upload? : any/c = #t
  jobs : exact-positive-integer? = 1
  shallow? : any/c = #f
  check-metadata? : any/c = #f
  dry-run? : any/c = #f
  delete? : any/c = #f
  include-rx : (or/c #f regexp?) = #f
  exclude-rx : (or/c #f regexp?) = #f
  make-call-with-file-stream : (or/c #f (string? path? . -> . (or/c #f (path? (input-port? . -> . any) . -> . any)))) = #f
  get-content-type : (or/c #f (string? path? . -> . (or/c string? #f))) = #f
  get-content-encoding : (or/c #f (string? path? . -> . (or/c string? #f))) = #f
  acl : (or/c #f string?) = #f
  reduced-redundancy? : any/c = #f
  upload-metadata : (and/c (hash/c symbol? string?) immutable?) = #hash()
  link-mode : (or/c 'error 'follow 'redirect 'redirects 'ignore) = 'error
  log-info : (string? . -> . void?) = log-s3-sync-info
  raise-error : (symbol? string? any/c ... . -> . any) = error
Typically, local-path refers to a directory and s3-path refers to a prefix for bucket item names. If local-path refers to a file and upload? is true, then a single file is synchronized to s3-bucket at s3-path. In that case, if s3-path ends with a / or it is already used as a prefix for bucket items, then the file name of local-path is added to s3-path to form the uploaded item’s name; otherwise, s3-path names the uploaded item. If upload? is #f and s3-path is an item name (and not a prefix on other item names), then a single bucket item is downloaded to local-path; if local-path refers to a directory, then the portion of s3-path after the last / is used as the downloaded file name.
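A minimal sketch of the single-file cases (file, bucket, and prefix names are hypothetical):

;; Uploads "notes.txt" as the item "docs/notes.txt", because the
;; trailing / marks "docs/" as a prefix.
(s3-sync "notes.txt" "example-bucket" "docs/")

;; Downloads the item "docs/notes.txt" back to the local file.
(s3-sync "notes.txt" "example-bucket" "docs/notes.txt" #:upload? #f)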
If shallow? is true, then in download mode, bucket items are downloaded only when they correspond to directories that exist already in local-path (which is useful when local-path refers to a directory). In both download and upload modes, a true value of shallow? causes the state of s3-bucket to be queried in a directory-like way, exploring only relevant directories; that exploration can be faster than querying the full content of s3-bucket if it contains many more nested items (with the prefix s3-path) than files within local-path.
If check-metadata? is true, then in upload mode, bucket items are checked to ensure that the current metadata matches the metadata that would be uploaded, and the bucket item’s metadata is adjusted if not.
If dry-run? is true, then actions needed for synchronization are reported via log, but no uploads, downloads, deletions, or redirection-rule updates are performed.
If jobs is more than 1, then downloads and uploads proceed in background threads.
If delete? is true, then destination items that have no corresponding item at the source are deleted.
If include-rx is not #f, then it is matched against bucket paths (including s3-path in the path). Only items that match the regexp are considered for synchronization. If exclude-rx is not #f, then any item whose path matches is not considered for synchronization (even if it also matches a provided include-rx).
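For example, a sketch (bucket and directory names are hypothetical) that synchronizes only HTML files while skipping anything under a draft/ prefix:

(s3-sync "site" "example-bucket" #f
         #:include (pregexp "\\.html$")
         #:exclude (pregexp "^draft/"))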
If make-call-with-file-stream is not #f, it is called to get a function that acts like call-with-input-file to get the content of a file for upload or for hashing. The arguments to make-call-with-file-stream are the S3 name and the local file path. If make-call-with-file-stream or its result is #f, then call-with-input-file is used. See also make-gzip-handlers.
If get-content-type is not #f, it is called to get the Content-Type field for each file on upload. The arguments to get-content-type are the S3 name and the local file path. If get-content-type or its result is #f, then a default value is used based on the file extension (e.g., "text/css" for a "css" file).
The get-content-encoding argument is like get-content-type, but for the Content-Encoding field. If no encoding is provided for an item, a Content-Encoding field is omitted on upload. Note that the Content-Encoding field of an item can affect the way that it is downloaded from a bucket; for example, a bucket item whose encoding is "gzip" will be uncompressed on download, even though the item’s hash (which is used to avoid unnecessary downloads) is based on the encoded content.
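As an illustration, a hypothetical get-content-type hook (not part of the library) that forces a type for one extension and otherwise defers to the extension-based default:

(define (my-content-type s3-name local-path)
  ;; Return a Content-Type for ".webmanifest" files; returning #f
  ;; defers to the default based on the file extension.
  (and (regexp-match? #rx"[.]webmanifest$" s3-name)
       "application/manifest+json"))

(s3-sync "site" "example-bucket" #f
         #:get-content-type my-content-type)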
If acl is not #f, then it is used as the S3 access control list on upload. For example, supply "public-read" to make items public for reading. More specifically, if acl is not #f, then 'x-amz-acl is set to acl in upload-metadata.
If reduced-redundancy? is true, then items are uploaded to S3 with reduced-redundancy storage (which costs less, so it is suitable for files that are backed up elsewhere). More specifically, if reduced-redundancy? is true, then 'x-amz-storage-class is set to "REDUCED_REDUNDANCY" in upload-metadata.
The upload-metadata hash table provides metadata to include with any file upload (and only to files that are otherwise determined to need uploading).
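For example, a sketch (the metadata value is illustrative) that attaches a caching policy to every uploaded item:

(s3-sync "site" "example-bucket" #f
         #:upload-metadata (hash 'Cache-Control "max-age=3600"))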
The link-mode argument determines the treatment of soft links in local-path:
'error — report an error

'follow — follow soft links (i.e., treat each as a file or directory)

'redirect — treat each link as a redirection rule to be installed for s3-bucket as a web site on upload

'redirects — treat each link as a redirection rule to be installed as metadata on s3-bucket’s item, while the item itself is uploaded as empty

'ignore — ignore soft links
The log-info and raise-error arguments determine how progress is logged and errors are reported. The default log-info function logs the given string at the 'info level to a logger whose name is 's3-sync.
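For example, a sketch that sends progress reports to standard output and raises ordinary errors:

(s3-sync "site" "example-bucket" #f
         #:log (lambda (s) (displayln s))
         #:error (lambda (who fmt . args)
                   (apply error who fmt args)))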
Changed in version 1.2 of package s3-sync: Added 'redirects mode. Changed in version 1.3: Added the upload-metadata argument. Changed in version 1.4: Added support for a single file as local-path and a bucket item name as s3-path. Changed in version 1.5: Added the check-metadata? argument.
2 S3 gzip Support
(require s3-sync/gzip) | package: s3-sync |
procedure
(make-gzip-handlers pattern [#:min-size min-size])
 → (string? path? . -> . (or/c #f (path? (input-port? . -> . any) . -> . any)))
   (string? path? . -> . (or/c string? #f))
  pattern : (or/c regexp? string? bytes?)
  min-size : exact-nonnegative-integer? = 0
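Per the result contracts, the two returned functions are suitable as the #:make-call-with-input-file and #:get-content-encoding arguments to s3-sync. For example, a sketch (bucket name hypothetical) that gzips HTML, CSS, and JavaScript files of at least 1 KB during upload:

(require s3-sync s3-sync/gzip)

(define-values (call-with-gzip-stream gzip-content-encoding)
  (make-gzip-handlers #rx"[.](html|css|js)$" #:min-size 1024))

(s3-sync "site" "example-bucket" #f
         #:make-call-with-input-file call-with-gzip-stream
         #:get-content-encoding gzip-content-encoding)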
3 S3 Web Page Support
(require s3-sync/web) | package: s3-sync |
Added in version 1.3 of package s3-sync.
procedure
(s3-web-sync ...) → void?
Like s3-sync, but with the following defaults changed:

#:acl — defaults to web-acl

#:reduced-redundancy? — defaults to web-reduced-redundancy?

#:upload-metadata — defaults to web-upload-metadata

#:make-call-with-input-file — defaults to a gzip of files that match web-gzip-rx and web-gzip-min-size

#:get-content-encoding — defaults to a gzip of files that match web-gzip-rx and web-gzip-min-size
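A minimal sketch, assuming s3-web-sync takes the same positional arguments as s3-sync (bucket name hypothetical):

(require s3-sync/web)

;; Upload "site" with web-friendly defaults: public-read access,
;; reduced redundancy, and gzip compression per web-gzip-rx.
(s3-web-sync "site" "example-bucket" #f)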
4 S3 Web Page Configuration
(require s3-sync/web-config) | package: s3-sync |
Added in version 1.3 of package s3-sync.
value
web-acl : string?

value
web-reduced-redundancy? : boolean?

value
web-upload-metadata : (and/c (hash/c symbol? string?) immutable?)

value
web-gzip-rx : regexp?

value
web-gzip-min-size : exact-nonnegative-integer?

These bindings supply the defaults used by s3-web-sync and by raco s3-sync --web: public-read access, reduced redundancy, Cache-Control "max-age=0, no-cache" upload metadata, and gzip compression of ".html", ".css", ".js", and ".svg" files that are 1K or larger.