CSV
Description
Comma Separated Values format (RFC).
When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter character, which is ,
by default. The delimiter character is defined in the setting format_csv_delimiter. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first, the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
$ clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FORMAT CSV" < data.csv
*By default, the delimiter is ,
. See the format_csv_delimiter setting for more information.
When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to the delimiter character or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) types are all supported.
NULL
is formatted according to setting format_csv_null_representation (default value is \N
).
In input data, ENUM values can be represented as names or as ids. First, we try to match the input value to the ENUM name. If we fail and the input value is a number, we try to match this number to the ENUM id. If input data contains only ENUM ids, it's recommended to enable the setting input_format_csv_enum_as_number to optimize ENUM parsing.
Example Usage
Format Settings
- format_csv_delimiter - the character to be considered as a delimiter in CSV data. Default value -
,
. - format_csv_allow_single_quotes - allow strings in single quotes. Default value -
true
. - format_csv_allow_double_quotes - allow strings in double quotes. Default value -
true
. - format_csv_null_representation - custom NULL representation in CSV format. Default value -
\N
. - input_format_csv_empty_as_default - treat empty fields in CSV input as default values. Default value -
true
. For complex default expressions, input_format_defaults_for_omitted_fields must be enabled too. - input_format_csv_enum_as_number - treat inserted enum values in CSV formats as enum indices. Default value -
false
. - input_format_csv_use_best_effort_in_schema_inference - use some tweaks and heuristics to infer schema in CSV format. If disabled, all fields will be inferred as Strings. Default value -
true
. - input_format_csv_arrays_as_nested_csv - when reading Array from CSV, expect that its elements were serialized in nested CSV and then put into string. Default value -
false
. - output_format_csv_crlf_end_of_line - if it is set to true, end of line in CSV output format will be
\r\n
instead of\n
. Default value -false
. - input_format_csv_skip_first_lines - skip the specified number of lines at the beginning of data. Default value -
0
. - input_format_csv_detect_header - automatically detect header with names and types in CSV format. Default value -
true
. - input_format_csv_skip_trailing_empty_lines - skip trailing empty lines at the end of data. Default value -
false
. - input_format_csv_trim_whitespaces - trim spaces and tabs in non-quoted CSV strings. Default value -
true
. - input_format_csv_allow_whitespace_or_tab_as_delimiter - Allow to use whitespace or tab as field delimiter in CSV strings. Default value -
false
. - input_format_csv_allow_variable_number_of_columns - allow variable number of columns in CSV format, ignore extra columns and use default values on missing columns. Default value -
false
. - input_format_csv_use_default_on_bad_values - Allow to set default value to column when CSV field deserialization failed on bad value. Default value -
false
. - input_format_csv_try_infer_numbers_from_strings - Try to infer numbers from string fields while schema inference. Default value -
false
.