Using the Raster Loader CLI
Most functions of the Raster Loader are accessible through the carto
command-line interface (CLI). To start the CLI, use the carto command in a
terminal.
Currently, Raster Loader allows you to upload a local raster file to a BigQuery, Snowflake, or Databricks table. You can also download and inspect a raster file from a BigQuery, Snowflake, or Databricks table.
Using the Raster Loader with BigQuery
Before you can upload a raster file, you need to have set up the following in BigQuery:
To use the bigquery utilities, use the carto bigquery command. This command has
several subcommands, which are described below.
Note
Accessing BigQuery with Raster Loader requires the GOOGLE_APPLICATION_CREDENTIALS
environment variable to be set to the path of a JSON file containing your BigQuery
credentials. See the GCP documentation for more information.
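Before running any carto bigquery command, it can help to confirm that the variable is set and points to a readable JSON file. The following is a minimal sketch of such a check; the helper name credentials_problem is hypothetical and not part of Raster Loader, and the check only verifies that the file parses as JSON, not that the credentials are valid.

```python
import json
import os
from pathlib import Path


def credentials_problem(environ=os.environ):
    """Return a description of the problem, or None if the setup looks usable."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    file = Path(path)
    if not file.is_file():
        return f"{path} does not exist or is not a file"
    try:
        # Only checks that the file is well-formed JSON, not that the key is valid.
        json.loads(file.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return f"{path} is not readable JSON: {exc}"
    return None


print(credentials_problem() or "credentials file looks usable")
```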
Using the Raster Loader with Snowflake
Before you can upload a raster file, you need to have set up the following in Snowflake:
A Snowflake account
A Snowflake database
A Snowflake schema
To use the snowflake utilities, use the carto snowflake command. This command has
several subcommands, which are described below.
Using the Raster Loader with Databricks
Before you can upload a raster file, you need to have set up the following in Databricks:
To use the databricks utilities, use the carto databricks command. This command has
several subcommands, which are described below.
Uploading a raster layer
To upload a raster file, use the carto [bigquery|snowflake|databricks] upload command.
The input raster must be a GoogleMapsCompatible raster. You can make your raster compatible
by converting it with the following GDAL command:
gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=NO -co RESAMPLING=NEAREST -co BLOCKSIZE=512 <input_raster>.tif <output_raster>.tif
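If you convert many rasters, the same conversion can be scripted. The sketch below only assembles the gdalwarp argument list shown above; the function name gdalwarp_command and the file names are hypothetical, and running the command assumes that gdalwarp is installed and on your PATH.

```python
import shlex
import subprocess

# Creation options copied from the gdalwarp command above.
CREATION_OPTIONS = {
    "TILING_SCHEME": "GoogleMapsCompatible",
    "COMPRESS": "DEFLATE",
    "OVERVIEWS": "IGNORE_EXISTING",
    "ADD_ALPHA": "NO",
    "RESAMPLING": "NEAREST",
    "BLOCKSIZE": "512",
}


def gdalwarp_command(input_raster, output_raster):
    """Build the argument list for the conversion, without running it."""
    cmd = ["gdalwarp", "-of", "COG"]
    for key, value in CREATION_OPTIONS.items():
        cmd += ["-co", f"{key}={value}"]
    return cmd + [input_raster, output_raster]


cmd = gdalwarp_command("input_raster.tif", "output_raster.tif")
print(shlex.join(cmd))
# Uncomment to run the conversion (requires GDAL to be installed):
# subprocess.run(cmd, check=True)
```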
You can optionally create a table in your provider beforehand and upload your data to it. If you do not specify a table name, Raster Loader generates one and creates the table for you.
At a minimum, the carto upload command requires a file_path to a local
raster file that can be read by GDAL and processed with rasterio. It also requires
the project (the GCP project name) and dataset (the BigQuery dataset name)
parameters in the case of BigQuery; the database and schema parameters in the
case of Snowflake; or the catalog and schema parameters in the case of Databricks.
There are also additional parameters, such as table (table
name) and overwrite (to overwrite existing data). For example:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--overwrite
This command uploads the TIFF file from /path/to/my/raster/file.tif to a BigQuery
project named my-gcp-project, a dataset named my-bigquery-dataset, and a table
named my-bigquery-table. If the table already contains data, this data will be
overwritten because the --overwrite flag is set.
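When the upload is driven from a script, the destination parameters listed above can be checked before the CLI is called. The following is a minimal sketch; upload_command is a hypothetical helper, not part of Raster Loader, and it covers only the destination parameters described in this section. For Snowflake and Databricks you would still need to append the authentication options to the returned list.

```python
import shlex

# Destination parameters per provider, as described in this section.
REQUIRED = {
    "bigquery": ("project", "dataset"),
    "snowflake": ("database", "schema"),
    "databricks": ("catalog", "schema"),
}


def upload_command(provider, file_path, table=None, overwrite=False, **destination):
    """Build a `carto <provider> upload` argument list, without running it."""
    if provider not in REQUIRED:
        raise ValueError(f"unknown provider: {provider}")
    missing = [name for name in REQUIRED[provider] if not destination.get(name)]
    if missing:
        raise ValueError(f"{provider} upload is missing: {', '.join(missing)}")
    cmd = ["carto", provider, "upload", "--file_path", file_path]
    for name in REQUIRED[provider]:
        cmd += [f"--{name}", destination[name]]
    if table:
        cmd += ["--table", table]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


cmd = upload_command(
    "bigquery",
    "/path/to/my/raster/file.tif",
    table="my-bigquery-table",
    overwrite=True,
    project="my-gcp-project",
    dataset="my-bigquery-dataset",
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once Raster Loader is installed.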
The same operation, performed with Snowflake, would be:
carto snowflake upload \
--file_path /path/to/my/raster/file.tif \
--database my-snowflake-database \
--schema my-snowflake-schema \
--table my-snowflake-table \
--account my-snowflake-account \
--username my-snowflake-user \
--password my-snowflake-password \
--overwrite
Authentication parameters are explicitly required in this case for Snowflake, since they are not set up in the environment.
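To avoid hard-coding the password in a script, one option is to read the authentication values from environment variables of your own choosing and turn them into the options shown above. This is a sketch under stated assumptions: the variable names SNOWFLAKE_ACCOUNT, SNOWFLAKE_USERNAME, and SNOWFLAKE_PASSWORD and the helper snowflake_auth_args are hypothetical, and Raster Loader does not read these variables itself.

```python
import os

# Hypothetical variable names; Raster Loader does not define or read them.
ENV_TO_FLAG = {
    "SNOWFLAKE_ACCOUNT": "--account",
    "SNOWFLAKE_USERNAME": "--username",
    "SNOWFLAKE_PASSWORD": "--password",
}


def snowflake_auth_args(environ=os.environ):
    """Translate the environment variables into CLI authentication options."""
    missing = [name for name in ENV_TO_FLAG if not environ.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    args = []
    for name, flag in ENV_TO_FLAG.items():
        args += [flag, environ[name]]
    return args


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "SNOWFLAKE_ACCOUNT": "my-snowflake-account",
    "SNOWFLAKE_USERNAME": "my-snowflake-user",
    "SNOWFLAKE_PASSWORD": "my-snowflake-password",
}
print(snowflake_auth_args(example_env))
```

This keeps the password out of the script and the shell history, although it still appears in the argument list of the running process.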
The same operation, performed with Databricks, would be:
carto databricks upload \
--file_path /path/to/my/raster/file.tif \
--catalog my-databricks-catalog \
--schema my-databricks-schema \
--table my-databricks-table \
--server-hostname my-databricks-server-hostname \
--cluster-id my-databricks-cluster-id \
--token my-databricks-token \
--overwrite
Authentication parameters are also explicitly required in the case of Databricks, since they are not set up in the environment.
If no band is specified, the first band of the raster will be uploaded. If the
--band flag is set, the specified band will be uploaded. For example, the following
command uploads the second band of the raster:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--band 2
Band names can be specified with the --band_name flag. For example, the following
command uploads the second band of the raster and stores it under the name red:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--band 2 \
--band_name red
If the raster contains multiple bands, you can upload multiple bands at once by specifying a list of bands. For example, the following command uploads the first and second bands of the raster:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--band 1 \
--band 2
Or, with band names:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--band 1 \
--band 2 \
--band_name red \
--band_name green
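Because band names are matched to bands by position, a script that assembles these options should check that both lists have the same length. The sketch below uses hypothetical helpers (band_arguments and default_column_names); the band_<band_num> default comes from the CLI details.

```python
def default_column_names(bands):
    """Column names used when no --band_name is given (band_<band_num>)."""
    return [f"band_{band}" for band in bands]


def band_arguments(bands, band_names=None):
    """Build repeated --band / --band_name options, paired by position."""
    if band_names is not None and len(band_names) != len(bands):
        raise ValueError(
            f"got {len(bands)} band(s) but {len(band_names)} band name(s)"
        )
    args = []
    for band in bands:
        args += ["--band", str(band)]
    for name in band_names or []:
        args += ["--band_name", name]
    return args


print(band_arguments([1, 2], ["red", "green"]))
print(default_column_names([1, 2]))
```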
You can enable compression of the band data using the --compress flag. This uses gzip compression, which can significantly reduce storage size. By default, it uses compression level 6, which provides a good balance between compression ratio and performance. You can adjust this using the --compression-level parameter (values from 1 to 9, where 1 is fastest but least compressed, and 9 gives maximum compression):
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--compress \
--compression-level 3
The same works for Snowflake:
carto snowflake upload \
--file_path /path/to/my/raster/file.tif \
--database my-snowflake-database \
--schema my-snowflake-schema \
--table my-snowflake-table \
--account my-snowflake-account \
--username my-snowflake-user \
--password my-snowflake-password \
--compress \
--compression-level 3
And for Databricks:
carto databricks upload \
--file_path /path/to/my/raster/file.tif \
--catalog my-databricks-catalog \
--schema my-databricks-schema \
--table my-databricks-table \
--server-hostname my-databricks-server-hostname \
--cluster-id my-databricks-cluster-id \
--token my-databricks-token \
--compress \
--compression-level 3
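To get a feel for how the compression level affects size, you can compress a sample buffer yourself. The sketch below uses Python's standard gzip module on synthetic data; it is an illustration of the level setting only, not Raster Loader's own code, and real band data will compress differently.

```python
import gzip

# Synthetic stand-in for one band block: 512 x 512 one-byte pixels
# filled with a repeating 0..255 pattern.
block = bytes(range(256)) * (512 * 512 // 256)

sizes = {}
for level in (1, 6, 9):
    compressed = gzip.compress(block, compresslevel=level)
    # Compression is lossless: decompressing returns the original bytes.
    assert gzip.decompress(compressed) == block
    sizes[level] = len(compressed)
    print(f"level {level}: {len(block)} -> {len(compressed)} bytes")
```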
See also
See the CLI details for a full list of options.
For large raster files, you can use the --chunk_size flag to specify the number of
rows to upload at once. This helps prevent BigQuery from raising an exception like the
following, which is caused by too many update operations on the destination table:
Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
The default chunk size is 10000 rows.
For example, the following command uploads the raster in chunks of 20000 rows:
carto bigquery upload \
--file_path /path/to/my/raster/file.tif \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table \
--chunk_size 20000
For large raster files in Databricks, you might get the following error:
Error uploading records: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array
This error occurs when the raster file is too large to be uploaded in one go with the
default chunk size. In this case, try reducing the number of rows uploaded at once
with the --chunk_size flag.
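The trade-off between the two cases above can be sketched with simple arithmetic: a larger chunk size means fewer, bigger uploads, and a smaller one means more, lighter uploads. The example assumes one upload operation per chunk and a hypothetical total of 45000 rows; both are illustrative assumptions, not figures from Raster Loader.

```python
import math


def upload_operations(total_rows, chunk_size=10000):
    """Number of chunks needed to upload total_rows, one operation per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return math.ceil(total_rows / chunk_size)


total_rows = 45000  # hypothetical size of the raster table
for chunk_size in (5000, 10000, 20000):
    operations = upload_operations(total_rows, chunk_size)
    print(f"chunk_size={chunk_size}: {operations} upload operation(s)")
```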
Inspecting a raster file
You can also use Raster Loader to retrieve information about a raster file stored in a BigQuery, Snowflake, or Databricks table. This can be useful to make sure a raster file was transferred correctly or to get information about a raster file’s metadata, for example.
To access a raster file in a BigQuery table, use the carto bigquery describe command.
At a minimum, this command requires a GCP project name, a BigQuery dataset name, and a BigQuery table name. For example:
carto bigquery describe \
--project my-gcp-project \
--dataset my-bigquery-dataset \
--table my-bigquery-table
The same operation, performed with Snowflake, would be:
carto snowflake describe \
--database my-snowflake-database \
--schema my-snowflake-schema \
--table my-snowflake-table \
--account my-snowflake-account \
--username my-snowflake-user \
--password my-snowflake-password
Authentication parameters are explicitly required in this case for Snowflake, since they are not set up in the environment.
The same operation, performed with Databricks, would be:
carto databricks describe \
--catalog my-databricks-catalog \
--schema my-databricks-schema \
--table my-databricks-table \
--server-hostname my-databricks-server-hostname \
--cluster-id my-databricks-cluster-id \
--token my-databricks-token
Authentication parameters are also explicitly required in the case of Databricks, since they are not set up in the environment.
See also
See the CLI details for a full list of options.
CLI details
The following is a detailed overview of all of the CLI’s subcommands and options:
carto
The carto command line interface.
carto [OPTIONS] COMMAND [ARGS]...
bigquery
Manage Google BigQuery resources.
carto bigquery [OPTIONS] COMMAND [ARGS]...
describe
Load and describe a table from BigQuery
carto bigquery describe [OPTIONS]
Options
- --project <project>
Required The name of the Google Cloud project.
- --dataset <dataset>
Required The name of the dataset.
- --table <table>
Required The name of the table.
- --limit <limit>
Limit number of rows returned
- --token <token>
An access token to authenticate with.
upload
Upload a raster file to Google BigQuery.
carto bigquery upload [OPTIONS]
Options
- --file_path <file_path>
The path to the raster file.
- --file_url <file_url>
The URL of the raster file.
- --project <project>
Required The name of the Google Cloud project.
- --token <token>
An access token to authenticate with.
- --dataset <dataset>
Required The name of the dataset.
- --table <table>
The name of the table.
- --billing_project <billing_project>
The name of the billing project. Defaults to the value of the --project parameter.
- --band <band>
Band(s) within the raster to upload. Repeat --band to specify multiple bands.
- --band_name <band_name>
Column name(s) used to store the band(s) (default: band_<band_num>). Repeat --band_name to specify multiple column names. The column names must be given in the same order as the --band list.
- --chunk_size <chunk_size>
The number of blocks to upload in each chunk.
- --compress
Compress band data using zlib.
- --overwrite
Overwrite existing data in the table if it already exists.
- --append
Append records into a table if it already exists.
- --cleanup-on-failure
Clean up resources if the upload fails. Useful for non-interactive scripts.
- --exact_stats
Compute exact statistics for the raster bands.
- --basic_stats
Compute basic stats and omit quantiles and most frequent values.
- --compression-level <compression_level>
Compression level (1-9, higher = better compression but slower)
- --band-valuelabels <band_valuelabels>
Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, ...}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.
databricks
Manage Databricks resources.
carto databricks [OPTIONS] COMMAND [ARGS]...
describe
Load and describe a table from Databricks
carto databricks describe [OPTIONS]
Options
- --server-hostname <server_hostname>
Required The Databricks workspace hostname.
- --token <token>
Required The Databricks access token.
- --cluster-id <cluster_id>
Required The Databricks cluster ID for Spark operations.
- --catalog <catalog>
Required The name of the catalog.
- --schema <schema>
Required The name of the schema.
- --table <table>
Required The name of the table.
- --limit <limit>
Limit number of rows returned
upload
Upload a raster file to Databricks.
carto databricks upload [OPTIONS]
Options
- --server-hostname <server_hostname>
Required The Databricks workspace hostname.
- --token <token>
Required The Databricks access token.
- --cluster-id <cluster_id>
Required The Databricks cluster ID for Spark operations.
- --file_path <file_path>
The path to the raster file.
- --file_url <file_url>
The URL of the raster file.
- --catalog <catalog>
Required The name of the catalog.
- --schema <schema>
Required The name of the schema.
- --table <table>
The name of the table.
- --band <band>
Band(s) within the raster to upload. Repeat --band to specify multiple bands.
- --band_name <band_name>
Column name(s) used to store the band(s) (default: band_<band_num>). Repeat --band_name to specify multiple column names. The column names must be given in the same order as the --band list.
- --chunk_size <chunk_size>
The number of blocks to upload in each chunk.
- --parallelism <parallelism>
Number of partitions when uploading each chunk.
- --overwrite
Overwrite existing data in the table if it already exists.
- --append
Append records into a table if it already exists.
- --cleanup-on-failure
Clean up resources if the upload fails. Useful for non-interactive scripts.
- --exact_stats
Compute exact statistics for the raster bands.
- --basic_stats
Compute basic stats and omit quantiles and most frequent values.
- --compress
Compress band data using zlib.
- --compression-level <compression_level>
Compression level (1-9, higher = better compression but slower)
- --band-valuelabels <band_valuelabels>
Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, ...}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.
info
Display system information.
carto info [OPTIONS]
snowflake
Manage Snowflake resources.
carto snowflake [OPTIONS] COMMAND [ARGS]...
describe
Load and describe a table from Snowflake
carto snowflake describe [OPTIONS]
Options
- --account <account>
Required The Snowflake account.
- --username <username>
The username.
- --password <password>
The password.
- --token <token>
An access token to authenticate with.
- --private-key-path <private_key_path>
The path to the private key file. (PEM format)
- --private-key-passphrase <private_key_passphrase>
The passphrase for the private key.
- --role <role>
The role to use for the file upload.
- --warehouse <warehouse>
Name of the default warehouse to use.
- --database <database>
Required The name of the database.
- --schema <schema>
Required The name of the schema.
- --table <table>
Required The name of the table.
- --limit <limit>
Limit number of rows returned
upload
Upload a raster file to Snowflake.
carto snowflake upload [OPTIONS]
Options
- --account <account>
Required The Snowflake account.
- --username <username>
The username.
- --password <password>
The password.
- --token <token>
An access token to authenticate with.
- --private-key-path <private_key_path>
The path to the private key file. (PEM format)
- --private-key-passphrase <private_key_passphrase>
The passphrase for the private key.
- --role <role>
The role to use for the file upload.
- --warehouse <warehouse>
Name of the default warehouse to use.
- --file_path <file_path>
The path to the raster file.
- --file_url <file_url>
The URL of the raster file.
- --database <database>
Required The name of the database.
- --schema <schema>
Required The name of the schema.
- --table <table>
The name of the table.
- --band <band>
Band(s) within the raster to upload. Repeat --band to specify multiple bands.
- --band_name <band_name>
Column name(s) used to store the band(s) (default: band_<band_num>). Repeat --band_name to specify multiple column names. The column names must be given in the same order as the --band list.
- --chunk_size <chunk_size>
The number of blocks to upload in each chunk.
- --overwrite
Overwrite existing data in the table if it already exists.
- --append
Append records into a table if it already exists.
- --cleanup-on-failure
Clean up resources if the upload fails. Useful for non-interactive scripts.
- --exact_stats
Compute exact statistics for the raster bands.
- --basic_stats
Compute basic stats and omit quantiles and most frequent values.
- --compress
Compress band data using zlib.
- --compression-level <compression_level>
Compression level (1-9, higher = better compression but slower)
- --band-valuelabels <band_valuelabels>
Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, ...}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.
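The --band-valuelabels option, available for all three providers, takes one JSON value (or None) per band. The JSON can be generated rather than typed by hand. In the sketch below, valuelabel_arguments is a hypothetical helper and the labels are made up; note that JSON object keys are always strings, so it is an assumption that the CLI accepts the quoted keys that json.dumps produces.

```python
import json


def valuelabel_arguments(labels_per_band):
    """Build one --band-valuelabels option per band, in band order.

    Each entry is a {value: label} mapping, or None to omit labels for that band.
    """
    args = []
    for labels in labels_per_band:
        value = "None" if labels is None else json.dumps(labels)
        args += ["--band-valuelabels", value]
    return args


# Hypothetical labels: the first band is categorical, the second has no labels.
print(valuelabel_arguments([{1: "water", 2: "forest"}, None]))
```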