Using the Raster Loader CLI

Most functions of the Raster Loader are accessible through the carto command-line interface (CLI). To start the CLI, use the carto command in a terminal.

Currently, Raster Loader allows you to upload a local raster file to a BigQuery, Snowflake, or Databricks table. You can also download and inspect a raster file from a BigQuery, Snowflake, or Databricks table.

Using the Raster Loader with BigQuery

Before you can upload a raster file, you need to have set up the following in BigQuery:

  1. A GCP project

  2. A BigQuery dataset

To use the bigquery utilities, use the carto bigquery command. This command has several subcommands, which are described below.

Note

Accessing BigQuery with Raster Loader requires the GOOGLE_APPLICATION_CREDENTIALS environment variable to be set to the path of a JSON file containing your BigQuery credentials. See the GCP documentation for more information.
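
As a minimal sketch, the variable can be exported in the current shell session before running any carto bigquery command. The file path is a placeholder; point it at your own JSON credentials file.

```shell
# Placeholder path: replace with the location of your own JSON credentials file.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"

# Warn early if the file is missing, since the CLI needs to read it.
if [ ! -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
  echo "Credentials file not found: $GOOGLE_APPLICATION_CREDENTIALS" >&2
fi
```

The export only lasts for the current session; add it to your shell profile if you want it to persist.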

Using the Raster Loader with Snowflake

Before you can upload a raster file, you need to have set up the following in Snowflake:

  1. A Snowflake account

  2. A Snowflake database

  3. A Snowflake schema

To use the snowflake utilities, use the carto snowflake command. This command has several subcommands, which are described below.

Using the Raster Loader with Databricks

Before you can upload a raster file, you need to have set up the following in Databricks:

  1. A Databricks server hostname

  2. A Databricks cluster id

  3. A Databricks token

To use the databricks utilities, use the carto databricks command. This command has several subcommands, which are described below.

Uploading a raster layer

To upload a raster file, use the carto [bigquery|snowflake|databricks] upload command.

The input raster must be a GoogleMapsCompatible raster. You can make your raster compatible by converting it with the following GDAL command:

gdalwarp -of COG -co TILING_SCHEME=GoogleMapsCompatible -co COMPRESS=DEFLATE -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=NO -co RESAMPLING=NEAREST -co BLOCKSIZE=512 <input_raster>.tif <output_raster>.tif
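
If you have several rasters to convert, the same gdalwarp call can be wrapped in a loop. The following is a sketch only: the rasters and rasters_cog directory names and the _cog suffix are assumptions made for illustration, and gdalwarp must be installed.

```shell
# Hypothetical directories; override IN_DIR and OUT_DIR to match your layout.
in_dir="${IN_DIR:-./rasters}"
out_dir="${OUT_DIR:-./rasters_cog}"
mkdir -p "$out_dir"

for src in "$in_dir"/*.tif; do
  # Skip the unexpanded pattern when the directory contains no .tif files.
  [ -e "$src" ] || continue
  dst="$out_dir/$(basename "$src" .tif)_cog.tif"
  # Same options as the single-file command above.
  gdalwarp -of COG \
    -co TILING_SCHEME=GoogleMapsCompatible \
    -co COMPRESS=DEFLATE \
    -co OVERVIEWS=IGNORE_EXISTING \
    -co ADD_ALPHA=NO \
    -co RESAMPLING=NEAREST \
    -co BLOCKSIZE=512 \
    "$src" "$dst"
done
```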

Optionally, you can create a table in your provider beforehand and upload your data to it. If you do not specify a table name, Raster Loader generates one automatically and creates the table for you.

At a minimum, the upload command requires a file_path to a local raster file that can be read by GDAL and processed with rasterio. It also requires the project (the GCP project name) and dataset (the BigQuery dataset name) parameters for BigQuery, the database and schema parameters for Snowflake, or the catalog and schema parameters for Databricks.
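
For instance, a minimal BigQuery upload that relies on the automatically generated table name could look like the sketch below. The project and dataset names are placeholders. The RUN variable is a dry-run convenience added for this example and is not part of the CLI: by default the command is only printed, and setting RUN to an empty string executes it.

```shell
# Dry-run helper (an assumption of this example): prints the command by default.
# Set RUN="" in the environment to execute it for real.
RUN="${RUN-echo}"

# Only the required parameters are given, so no --table is passed.
$RUN carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset
```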

There are also additional parameters, such as table (table name) and overwrite (to overwrite existing data). For example:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --overwrite

This command uploads the TIFF file from /path/to/my/raster/file.tif to a BigQuery project named my-gcp-project, a dataset named my-bigquery-dataset, and a table named my-bigquery-table. If the table already contains data, this data will be overwritten because the --overwrite flag is set.

The same operation, performed with Snowflake, would be:

carto snowflake upload \
  --file_path /path/to/my/raster/file.tif \
  --database my-snowflake-database \
  --schema my-snowflake-schema \
  --table my-snowflake-table \
  --account my-snowflake-account \
  --username my-snowflake-user \
  --password my-snowflake-password \
  --overwrite

Authentication parameters are explicitly required in this case for Snowflake, since they are not set up in the environment.

The same operation, performed with Databricks, would be:

carto databricks upload \
  --file_path /path/to/my/raster/file.tif \
  --catalog my-databricks-catalog \
  --schema my-databricks-schema \
  --table my-databricks-table \
  --server-hostname my-databricks-server-hostname \
  --cluster-id my-databricks-cluster-id \
  --token my-databricks-token \
  --overwrite

Authentication parameters are also explicitly required in the case of Databricks, since they are not set up in the environment.

If no band is specified, the first band of the raster will be uploaded. If the --band flag is set, the specified band will be uploaded. For example, the following command uploads the second band of the raster:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 2

The column names used to store the bands can be set with the --band_name flag. For example, the following command uploads the second band of the raster and stores it under the name red:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 2 \
  --band_name red

If the raster contains multiple bands, you can upload several of them at once by repeating the --band flag. For example, the following command uploads the first and second bands of the raster:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 1 \
  --band 2

Or, with band names:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --band 1 \
  --band 2 \
  --band_name red \
  --band_name green

You can enable compression of the band data using the --compress flag. This uses gzip compression, which can significantly reduce storage size. The default compression level is 6, which provides a good balance between compression ratio and performance. You can adjust it with the --compression-level parameter (values from 1 to 9, where 1 is fastest but least compressed, and 9 gives maximum compression):

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --compress \
  --compression-level 3

The same works for Snowflake:

carto snowflake upload \
  --file_path /path/to/my/raster/file.tif \
  --database my-snowflake-database \
  --schema my-snowflake-schema \
  --table my-snowflake-table \
  --account my-snowflake-account \
  --username my-snowflake-user \
  --password my-snowflake-password \
  --compress \
  --compression-level 3

And for Databricks:

carto databricks upload \
  --file_path /path/to/my/raster/file.tif \
  --catalog my-databricks-catalog \
  --schema my-databricks-schema \
  --table my-databricks-table \
  --server-hostname my-databricks-server-hostname \
  --cluster-id my-databricks-cluster-id \
  --token my-databricks-token \
  --compress \
  --compression-level 3
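
The effect of the compression level can be explored locally with the gzip command-line tool. This sketch only illustrates how gzip levels trade size for speed on made-up sample data; it does not reproduce how Raster Loader encodes band data, and actual savings depend on your raster.

```shell
# Create a sample file of repetitive data (made up for illustration).
sample="$(mktemp)"
i=0
while [ "$i" -lt 20000 ]; do
  echo "band value $((i % 97))"
  i=$((i + 1))
done > "$sample"

# Compress with the fastest and the strongest levels and compare sizes in bytes.
size_raw=$(wc -c < "$sample")
size_fast=$(gzip -1 -c "$sample" | wc -c)
size_best=$(gzip -9 -c "$sample" | wc -c)

echo "raw: $size_raw  level 1: $size_fast  level 9: $size_best"
rm -f "$sample"
```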

See also

See the CLI details for a full list of options.

For large raster files, you can use the --chunk_size flag to specify the number of rows to upload at once. This prevents BigQuery from raising an exception like the following, which is caused by too many operations on the destination table:

` Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors `

The default chunk size is 10000 rows.

For example, the following command uploads the raster in chunks of 20000 rows:

carto bigquery upload \
  --file_path /path/to/my/raster/file.tif \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --chunk_size 20000

For large raster files in Databricks, you might get the following error:

` Error uploading records: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array `

This error occurs when the raster file is too large to upload with the default chunk size. In this case, try reducing the number of rows uploaded at once with the --chunk_size flag.
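
For example, the following sketch uploads to Databricks with a smaller chunk size. The value 1000 is an arbitrary starting point below the 10000 default mentioned above, not a recommended setting, and all names are placeholders. RUN is a dry-run helper added for this example and is not part of the CLI: the command is printed by default and executed when RUN is set to an empty string.

```shell
# Dry-run helper (an assumption of this example): prints the command by default.
RUN="${RUN-echo}"

# Arbitrary value below the 10000 default; lower it further if the error persists.
chunk_size=1000

$RUN carto databricks upload \
  --file_path /path/to/my/raster/file.tif \
  --catalog my-databricks-catalog \
  --schema my-databricks-schema \
  --table my-databricks-table \
  --server-hostname my-databricks-server-hostname \
  --cluster-id my-databricks-cluster-id \
  --token my-databricks-token \
  --chunk_size "$chunk_size"
```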

Inspecting a raster file

You can also use Raster Loader to retrieve information about a raster file stored in a BigQuery, Snowflake, or Databricks table. This can be useful to make sure a raster file was transferred correctly or to get information about a raster file’s metadata, for example.

To access a raster file in a BigQuery table, use the carto bigquery describe command.

At a minimum, this command requires a GCP project name, a BigQuery dataset name, and a BigQuery table name. For example:

carto bigquery describe \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table

The same operation, performed with Snowflake, would be:

carto snowflake describe \
  --database my-snowflake-database \
  --schema my-snowflake-schema \
  --table my-snowflake-table \
  --account my-snowflake-account \
  --username my-snowflake-user \
  --password my-snowflake-password

Authentication parameters are explicitly required in this case for Snowflake, since they are not set up in the environment.

The same operation, performed with Databricks, would be:

carto databricks describe \
  --catalog my-databricks-catalog \
  --schema my-databricks-schema \
  --table my-databricks-table \
  --server-hostname my-databricks-server-hostname \
  --cluster-id my-databricks-cluster-id \
  --token my-databricks-token

Authentication parameters are also explicitly required in the case of Databricks, since they are not set up in the environment.
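
The describe subcommands also accept a --limit option, listed in the CLI details, which limits the number of rows returned. A sketch for BigQuery with placeholder names and an arbitrary limit of 5 follows. RUN is a dry-run helper added for this example and is not part of the CLI: the command is printed by default and executed when RUN is set to an empty string.

```shell
# Dry-run helper (an assumption of this example): prints the command by default.
RUN="${RUN-echo}"

# Arbitrary number of rows to return.
row_limit=5

$RUN carto bigquery describe \
  --project my-gcp-project \
  --dataset my-bigquery-dataset \
  --table my-bigquery-table \
  --limit "$row_limit"
```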

See also

See the CLI details for a full list of options.

CLI details

The following is a detailed overview of all of the CLI’s subcommands and options:

carto

The carto command line interface.

carto [OPTIONS] COMMAND [ARGS]...

bigquery

Manage Google BigQuery resources.

carto bigquery [OPTIONS] COMMAND [ARGS]...

describe

Load and describe a table from BigQuery

carto bigquery describe [OPTIONS]

Options

--project <project>

Required The name of the Google Cloud project.

--dataset <dataset>

Required The name of the dataset.

--table <table>

Required The name of the table.

--limit <limit>

Limit number of rows returned

--token <token>

An access token to authenticate with.

upload

Upload a raster file to Google BigQuery.

carto bigquery upload [OPTIONS]

Options

--file_path <file_path>

The path to the raster file.

--file_url <file_url>

The URL of the raster file.

--project <project>

Required The name of the Google Cloud project.

--token <token>

An access token to authenticate with.

--dataset <dataset>

Required The name of the dataset.

--table <table>

The name of the table.

--billing_project <billing_project>

The name of the billing project. Defaults to the value of the --project parameter.

--band <band>

Band(s) within the raster to upload. Repeat --band to specify multiple bands.

--band_name <band_name>

Column name(s) used to store each band (default: band_<band_num>). Repeat --band_name to specify column names for multiple bands. The column names must be given in the same order as the --band list.

--chunk_size <chunk_size>

The number of blocks to upload in each chunk.

--compress

Compress band data using zlib.

--overwrite

Overwrite existing data in the table if it already exists.

--append

Append records into a table if it already exists.

--cleanup-on-failure

Clean up resources if the upload fails. Useful for non-interactive scripts.

--exact_stats

Compute exact statistics for the raster bands.

--basic_stats

Compute basic stats and omit quantiles and most frequent values.

--compression-level <compression_level>

Compression level (1-9, higher = better compression but slower)

--band-valuelabels <band_valuelabels>

Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, …}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.

databricks

Manage Databricks resources.

carto databricks [OPTIONS] COMMAND [ARGS]...

describe

Load and describe a table from Databricks

carto databricks describe [OPTIONS]

Options

--server-hostname <server_hostname>

Required The Databricks workspace hostname.

--token <token>

Required The Databricks access token.

--cluster-id <cluster_id>

Required The Databricks cluster ID for Spark operations.

--catalog <catalog>

Required The name of the catalog.

--schema <schema>

Required The name of the schema.

--table <table>

Required The name of the table.

--limit <limit>

Limit number of rows returned

upload

Upload a raster file to Databricks.

carto databricks upload [OPTIONS]

Options

--server-hostname <server_hostname>

Required The Databricks workspace hostname.

--token <token>

Required The Databricks access token.

--cluster-id <cluster_id>

Required The Databricks cluster ID for Spark operations.

--file_path <file_path>

The path to the raster file.

--file_url <file_url>

The URL of the raster file.

--catalog <catalog>

Required The name of the catalog.

--schema <schema>

Required The name of the schema.

--table <table>

The name of the table.

--band <band>

Band(s) within the raster to upload. Repeat --band to specify multiple bands.

--band_name <band_name>

Column name(s) used to store each band (default: band_<band_num>). Repeat --band_name to specify column names for multiple bands. The column names must be given in the same order as the --band list.

--chunk_size <chunk_size>

The number of blocks to upload in each chunk.

--parallelism <parallelism>

Number of partitions when uploading each chunk.

--overwrite

Overwrite existing data in the table if it already exists.

--append

Append records into a table if it already exists.

--cleanup-on-failure

Clean up resources if the upload fails. Useful for non-interactive scripts.

--exact_stats

Compute exact statistics for the raster bands.

--basic_stats

Compute basic stats and omit quantiles and most frequent values.

--compress

Compress band data using zlib.

--compression-level <compression_level>

Compression level (1-9, higher = better compression but slower)

--band-valuelabels <band_valuelabels>

Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, …}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.

info

Display system information.

carto info [OPTIONS]

snowflake

Manage Snowflake resources.

carto snowflake [OPTIONS] COMMAND [ARGS]...

describe

Load and describe a table from Snowflake

carto snowflake describe [OPTIONS]

Options

--account <account>

Required The Snowflake account.

--username <username>

The username.

--password <password>

The password.

--token <token>

An access token to authenticate with.

--private-key-path <private_key_path>

The path to the private key file. (PEM format)

--private-key-passphrase <private_key_passphrase>

The passphrase for the private key.

--role <role>

The role to use for the file upload.

--warehouse <warehouse>

Name of the default warehouse to use.

--database <database>

Required The name of the database.

--schema <schema>

Required The name of the schema.

--table <table>

Required The name of the table.

--limit <limit>

Limit number of rows returned

upload

Upload a raster file to Snowflake.

carto snowflake upload [OPTIONS]

Options

--account <account>

Required The Snowflake account.

--username <username>

The username.

--password <password>

The password.

--token <token>

An access token to authenticate with.

--private-key-path <private_key_path>

The path to the private key file. (PEM format)

--private-key-passphrase <private_key_passphrase>

The passphrase for the private key.

--role <role>

The role to use for the file upload.

--warehouse <warehouse>

Name of the default warehouse to use.

--file_path <file_path>

The path to the raster file.

--file_url <file_url>

The URL of the raster file.

--database <database>

Required The name of the database.

--schema <schema>

Required The name of the schema.

--table <table>

The name of the table.

--band <band>

Band(s) within the raster to upload. Repeat --band to specify multiple bands.

--band_name <band_name>

Column name(s) used to store each band (default: band_<band_num>). Repeat --band_name to specify column names for multiple bands. The column names must be given in the same order as the --band list.

--chunk_size <chunk_size>

The number of blocks to upload in each chunk.

--overwrite

Overwrite existing data in the table if it already exists.

--append

Append records into a table if it already exists.

--cleanup-on-failure

Clean up resources if the upload fails. Useful for non-interactive scripts.

--exact_stats

Compute exact statistics for the raster bands.

--basic_stats

Compute basic stats and omit quantiles and most frequent values.

--compress

Compress band data using zlib.

--compression-level <compression_level>

Compression level (1-9, higher = better compression but slower)

--band-valuelabels <band_valuelabels>

Custom data for valuelabels in JSON format, or 'None', e.g. '{<value_1>: <label_1>, <value_2>: <label_2>, …}'. Repeat --band-valuelabels to specify data for multiple bands. They are considered in the order they appear in the file. Set any value to 'None' to omit valuelabels for that band.