blobxfer YAML Configuration
blobxfer accepts YAML configuration files to drive the transfer. YAML
configuration files are specified with the --config option to any
blobxfer command.
For an in-depth explanation of each option or the associated default value, please see the CLI Usage documentation for the corresponding CLI option.
Schema
The blobxfer YAML schema consists of distinct "sections". The following
sub-sections will describe each. You can combine all sections into the
same YAML file if desired as blobxfer will only read the required sections
to execute the specified command.
You can view a complete sample YAML configuration here. Note that the sample is illustrative only and may not contain all possible options.
Configuration Sections
version
The version property specifies the version of the blobxfer YAML
configuration schema to use. This property is required.
version: 1
* version specifies the blobxfer YAML configuration schema to use. Currently the only valid value is 1.
azure_storage
The azure_storage section specifies Azure Storage credentials that will
be referenced for any transfer while processing the YAML file. This section
is required.
azure_storage:
  endpoint: core.windows.net
  accounts:
    mystorageaccount0: ABCDEF...
    mystorageaccount1: ?se...
* endpoint specifies which Azure Storage endpoint to connect to. Generally this can be omitted if using Public Azure regions.
* accounts is a dictionary of storage account names and either a storage account key or a shared access signature token. Note that if you are downloading a striped blob (Vectored IO), then all storage accounts across which the blob is striped must be populated in this list.
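The two required sections described so far can be checked with a few lines of code before a transfer is attempted. The following is a minimal sketch, not blobxfer's own validation: `check_required_sections` is a hypothetical helper, it operates on a configuration that has already been parsed into a Python dictionary, and it assumes that at least one account must be listed under accounts.

```python
def check_required_sections(config):
    """Return a list of problems found in a parsed configuration mapping.

    Hypothetical helper for illustration; blobxfer performs its own checks.
    """
    problems = []
    # The schema version is required and 1 is currently the only valid value.
    if config.get("version") != 1:
        problems.append("version must be 1")
    # azure_storage is required; assume at least one account must be listed.
    storage = config.get("azure_storage")
    if not isinstance(storage, dict) or not storage.get("accounts"):
        problems.append("azure_storage.accounts must list at least one account")
    return problems


example = {
    "version": 1,
    "azure_storage": {"accounts": {"mystorageaccount0": "ABCDEF..."}},
}
print(check_required_sections(example))
```

An empty list means both required sections are present. The endpoint key is deliberately not checked because it can be omitted.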
options
The options section specifies general options that may be applied across
all other sections in the YAML configuration.
options:
  log_file: /path/to/blobxfer.log
  enable_azure_storage_logger: false
  resume_file: /path/to/resumefile.db
  progress_bar: true
  quiet: false
  dry_run: false
  verbose: true
  timeout:
    connect: null
    read: null
    max_retries: null
  concurrency:
    md5_processes: 2
    crypto_processes: 2
    disk_threads: 16
    transfer_threads: 32
  proxy:
    host: myproxyhost:6000
    username: proxyuser
    password: abcd...
* log_file is the location of the log file to write to
* enable_azure_storage_logger controls the Azure Storage logger output
* resume_file is the location of the resume database to create
* progress_bar controls display of a progress bar output to the console
* quiet controls quiet mode
* dry_run will perform a dry run
* verbose controls if verbose logging is enabled
* timeout is a dictionary of timeout values in seconds
    * connect is the connect timeout to apply to a request
    * read is the read timeout to apply to a request
    * max_retries is the maximum number of retries for a request
* concurrency is a dictionary of concurrency limits
    * md5_processes is the number of MD5 offload processes to create for MD5 comparison checking
    * crypto_processes is the number of decryption offload processes to create
    * disk_threads is the number of threads for disk I/O
    * transfer_threads is the number of threads for network transfers
* proxy defines an HTTP proxy to use, if required to connect to the Azure Storage endpoint
    * host is the IP:Port of the HTTP proxy
    * username is the username login for the proxy, if required
    * password is the password for the username for the proxy, if required
download
The download section specifies download sources and destination. Note
that download refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the download property as you need.
When the download command with the YAML config is specified, the list
is iterated and all specified sources are downloaded.
download:
  - source:
      - mystorageaccount0: mycontainer
      - mystorageaccount1: someothercontainer/vpath
    destination: /path/to/store/downloads
    include:
      - "*.txt"
      - "*.bxslice-*"
    exclude:
      - "*.bak"
    options:
      check_file_md5: true
      chunk_size_bytes: 16777216
      delete_extraneous_destination: false
      delete_only: false
      max_single_object_concurrency: 8
      mode: auto
      overwrite: true
      recursive: true
      rename: false
      restore_file_properties:
        attributes: true
        lmt: true
      rsa_private_key: myprivatekey.pem
      rsa_private_key_passphrase: myoptionalpassword
      strip_components: 1
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
  - source:
    # next if needed...
* source is a list of storage account to remote path mappings
* destination is the local resource path
* include is a list of include patterns
* exclude is a list of exclude patterns
* options are download-specific options
    * check_file_md5 will integrity check downloaded files using the stored MD5
    * chunk_size_bytes is the maximum amount of data to download per request
    * delete_extraneous_destination will clean up any local files that are not found on the remote. Note that this interacts with include and exclude filters.
    * delete_only will only perform the local cleanup. If this is specified as true, then delete_extraneous_destination must be specified as true as well.
    * max_single_object_concurrency is the maximum number of concurrent transfers per object
    * mode is the operating mode
    * overwrite specifies clobber behavior
    * recursive specifies if remote paths should be recursively searched for entities to download
    * rename will rename a single entity source path to the destination
    * restore_file_properties restores the following file properties if enabled
        * attributes will restore POSIX file mode and ownership if stored on the entity metadata
        * lmt will restore the last modified time of the file
    * rsa_private_key is the RSA private key PEM file to use to decrypt encrypted blobs or files
    * rsa_private_key_passphrase is the RSA private key passphrase, if required
    * strip_components is the number of leading path components to strip from the remote path
    * skip_on are the skip-on options to use
        * filesize_match skips if the file sizes match
        * lmt_ge skips if the local file has a last modified time greater than or equal to that of the remote file
        * md5_match skips if the MD5 values match
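To make the include, exclude, and strip_components options more concrete, here is a rough sketch of how such filters are commonly applied. It is an illustration under stated assumptions rather than blobxfer's implementation: `is_selected` and `strip_leading_components` are hypothetical names, patterns are assumed to be shell-style globs matched against the whole path, and include patterns are assumed to be evaluated before exclude patterns.

```python
import fnmatch
from pathlib import PurePosixPath


def is_selected(path, include, exclude):
    """Keep a path if it matches an include pattern (when any are given)
    and matches no exclude pattern. The ordering is an assumption."""
    if include and not any(fnmatch.fnmatchcase(path, p) for p in include):
        return False
    return not any(fnmatch.fnmatchcase(path, p) for p in exclude)


def strip_leading_components(path, count):
    """Drop the first `count` leading components of a remote path."""
    parts = PurePosixPath(path).parts
    if len(parts) <= count:
        return ""  # nothing is left once every component has been stripped
    return str(PurePosixPath(*parts[count:]))


include = ["*.txt", "*.bxslice-*"]
exclude = ["*.bak"]
for remote in ["vpath/sub/notes.txt", "vpath/old.bak"]:
    if is_selected(remote, include, exclude):
        print(remote, "->", strip_leading_components(remote, 1))
```

With the sample patterns from the configuration above, the .txt path is kept and loses its first component, while the .bak path is filtered out.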
upload
The upload section specifies upload sources and destinations. Note
that upload refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the upload property as you need.
When the upload command with the YAML config is specified, the list
is iterated and all specified sources are uploaded.
upload:
  - source:
      - /path/to/hugefile1
      - /path/to/hugefile2
    destination:
      - mystorageaccount0: mycontainer/vdir
      - mystorageaccount1: someothercontainer/vdir2
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      access_tier: null
      chunk_size_bytes: 0
      delete_extraneous_destination: true
      delete_only: false
      one_shot_bytes: 33554432
      overwrite: true
      recursive: true
      rename: false
      rsa_public_key: mypublickey.pem
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
      stdin_as_page_blob_size: 0
      store_file_properties:
        attributes: true
        cache_control: 'max-age=3600'
        content_type: 'text/javascript; charset=utf-8'
        md5: true
      strip_components: 1
      vectored_io:
        stripe_chunk_size_bytes: 1000000
        distribution_mode: stripe
  - source:
    # next if needed...
* source is a list of local resource paths
* destination is a list of storage account to remote path mappings
* include is a list of include patterns
* exclude is a list of exclude patterns
* options are upload-specific options
    * mode is the operating mode
    * access_tier is the access tier to set for the object. If not set, the default access tier for the storage account is inferred.
    * chunk_size_bytes is the maximum amount of data to upload per request. This corresponds to the block size for block and append blobs, page size for page blobs, and the file chunk for files. Only block blobs can have a block size of up to 100MiB; all others have a maximum of 4MiB.
    * delete_extraneous_destination will clean up any remote files that are not found locally. Note that this interacts with include and exclude filters.
    * delete_only will only perform the remote cleanup. If this is specified as true, then delete_extraneous_destination must be specified as true as well.
    * one_shot_bytes is the size limit to upload block blobs in a single request.
    * overwrite specifies clobber behavior
    * recursive specifies if local paths should be recursively searched for files to upload
    * rename will rename a single entity destination path to a single source
    * rsa_public_key is the RSA public key PEM file to use to encrypt files
    * skip_on are the skip-on options to use
        * filesize_match skips if the file sizes match
        * lmt_ge skips if the remote file has a last modified time greater than or equal to that of the local file
        * md5_match skips if the MD5 values match
    * stdin_as_page_blob_size is the page blob size to preallocate if the amount of data to be streamed from stdin is known beforehand and the mode is page
    * store_file_properties stores the following file properties if enabled
        * attributes will store POSIX file mode and ownership
        * cache_control sets the CacheControl property
        * content_type sets the ContentType property
        * md5 will store the MD5 of the file
    * strip_components is the number of leading path components to strip from the local path
    * vectored_io are the Vectored IO options to apply to the upload
        * stripe_chunk_size_bytes is the stripe width for each chunk if the stripe distribution_mode is selected
        * distribution_mode is the Vectored IO mode to use, which can be one of:
            * disabled will disable Vectored IO
            * replica will replicate source files to target destinations on upload. Note that more than one destination should be specified.
            * stripe will stripe source files to target destinations on upload. If more than one destination is specified, striping occurs in round-robin order amongst the destinations listed.
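The round-robin behavior of the stripe distribution mode can be pictured with a short sketch. This only illustrates the idea of assigning fixed-width chunks to destinations in turn; `plan_stripes` is a hypothetical function, the destination strings are made-up labels, and the way blobxfer actually names and stores each slice is not covered here.

```python
def plan_stripes(file_size, stripe_chunk_size_bytes, destinations):
    """Assign consecutive chunks of a file to destinations in round-robin order.

    Returns a list of (offset, length, destination) tuples.
    """
    if stripe_chunk_size_bytes <= 0:
        raise ValueError("stripe_chunk_size_bytes must be positive")
    if not destinations:
        raise ValueError("at least one destination is required")
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The final chunk may be shorter than the stripe width.
        length = min(stripe_chunk_size_bytes, file_size - offset)
        plan.append((offset, length, destinations[index % len(destinations)]))
        offset += length
        index += 1
    return plan


# Hypothetical labels standing in for the two destinations in the sample.
destinations = ["mystorageaccount0", "mystorageaccount1"]
for stripe in plan_stripes(2_500_000, 1_000_000, destinations):
    print(stripe)
```

With a 2,500,000-byte file and the sample stripe width of 1,000,000 bytes, the first and third chunks go to the first destination and the second chunk goes to the second.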
synccopy
The synccopy section specifies synchronous copy sources and destinations.
Note that synccopy refers to a list of objects, thus you may specify as many
of these sub-configuration blocks on the synccopy property as you need.
When the synccopy command with the YAML config is specified, the list
is iterated and all specified sources are synchronously copied.
synccopy:
  - source:
      - mystorageaccount0: mycontainer
    destination:
      - mystorageaccount0: othercontainer
      - mystorageaccount1: mycontainer
    include:
      - "*.bin"
    exclude:
      - "*.tmp"
    options:
      mode: auto
      dest_mode: auto
      access_tier: null
      delete_extraneous_destination: true
      delete_only: false
      overwrite: true
      recursive: true
      rename: false
      server_side_copy: true
      skip_on:
        filesize_match: false
        lmt_ge: false
        md5_match: true
* source is a list of storage account to remote path mappings. All sources are copied to each destination specified. To use an arbitrary URL, specify the map as *: https://some.url/path.
* destination is a list of storage account to remote path mappings
* include is a list of include patterns
* exclude is a list of exclude patterns
* options are synccopy-specific options
    * mode is the source mode
    * dest_mode is the destination mode
    * access_tier is the access tier to set for the object. If not set, the default access tier for the storage account is inferred.
    * delete_extraneous_destination will clean up any files in remote destinations that are not found in the remote sources. Note that this interacts with include and exclude filters.
    * delete_only will only perform the remote cleanup. If this is specified as true, then delete_extraneous_destination must be specified as true as well.
    * overwrite specifies clobber behavior
    * recursive specifies if source remote paths should be recursively searched for files to copy
    * rename will rename a single remote source entity to the remote destination path
    * server_side_copy will perform the copy on Azure Storage servers. This option is enabled by default and destinations must be block blob. If destinations are not block blob, this option must be set to false.
    * skip_on are the skip-on options to use
        * filesize_match skips if the file sizes match
        * lmt_ge skips if the source file has a last modified time greater than or equal to that of the destination file
        * md5_match skips if the MD5 values match
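The download, upload, and synccopy options all share one constraint: delete_only may only be true when delete_extraneous_destination is also true. The check below is a small sketch of that rule applied to an already-parsed options dictionary. `check_delete_options` is a hypothetical helper, and treating a missing key as false is an assumption made for the example, not a documented default.

```python
def check_delete_options(options):
    """Raise ValueError if delete_only is set without
    delete_extraneous_destination. Missing keys are treated as false
    here, which is an assumption made for this sketch."""
    delete_only = bool(options.get("delete_only", False))
    delete_extraneous = bool(options.get("delete_extraneous_destination", False))
    if delete_only and not delete_extraneous:
        raise ValueError(
            "delete_only: true requires delete_extraneous_destination: true"
        )


# Accepted: cleanup only, with the required companion option enabled.
check_delete_options({"delete_only": True, "delete_extraneous_destination": True})

# Rejected: delete_only without delete_extraneous_destination.
try:
    check_delete_options({"delete_only": True})
except ValueError as exc:
    print("invalid options:", exc)
```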