The clustered column are configured the moment the table is configured.
Is possible to specify up to four top-level columns and should be specified in descending priority order.
If you already have the table created, it will need to be recreated, is the only limitation I can see at the moment, other than that, it should work without further modifications.
As I can see, one of the options to introduce clustering to the destination table would be to create it ready before inserting data into it. Do I get this one correctly?
Is it somehow also possible to define column clustering order as part of BigQuery Loaders mutator (sub-command create)?
Yes, is a good option to manualy create the table before inserting the data.
Regarding the change in the mutator sub command, I believe is not currently on the roadmap, but we are happy to review any PR that would add that feature.
Just for those stumbling up on this question at later point.
In scenario where you have already set up schema with partitioning, but would like to add clustering, the following can be considered.
It is possible to use DDL to create both BigQuery Partitions and Clusters.
In this Stackoverflow question an approach to derive created tables DDL from metadata can be seen.
SELECT
table_name, ddl
FROM
`bigquery-public-data`.census_bureau_usa.INFORMATION_SCHEMA.TABLES;
By this you can find the DDL for generating same table schema.
In order to add clustering, follow the Google Clouds documentation.
-- Set up a table with clustering.
CREATE TABLE myDataset.data (column1 INT64, column2 INT64)
PARTITION BY _PARTITIONDATE
CLUSTER BY column1, column2;
This definitely works well - but it should be noted that this will only cluster data after this action has been performed - any data that is already in that table will remain unclustered.