Handling other formats
Earlier examples of loading JSON data assume the use of JSONEachRow
(ndjson). We provide examples of loading JSON in other common formats below.
Array of JSON objects
One of the most popular forms of JSON data is having a list of JSON objects in a JSON array, like in this example:
Let’s create a table for this kind of data:
To import a list of JSON objects, we can use a JSONEachRow
format (inserting data from list.json file):
We have used a FROM INFILE clause to load data from the local file, and we can see the import was successful:
Handling NDJSON (line delimited JSON)
Many apps can log data in JSON format so that each log line is an individual JSON object, like in this file:
The same JSONEachRow
format is capable of working with such files:
JSON object keys
In some cases, the list of JSON objects can be encoded as object properties instead of array elements (see objects.json for example):
ClickHouse can load data from this kind of data using the JSONObjectEachRow
format:
Specifying parent object key values
Let’s say we also want to save values in parent object keys to the table. In this case, we can use the following option to define the name of the column we want key values to be saved to:
Now, we can check which data is going to be loaded from the original JSON file using file()
function:
Note how the id
column has been populated by key values correctly.
JSON Arrays
Sometimes, for the sake of saving space, JSON files are encoded in arrays instead of objects. In this case, we deal with a list of JSON arrays:
In this case, ClickHouse will load this data and attribute each value to the corresponding column based on its order in the array. We use JSONCompactEachRow
format for this:
Importing individual columns from JSON arrays
In some cases, data can be encoded column-wise instead of row-wise. In this case, a parent JSON object contains columns with values. Take a look at the following file:
ClickHouse uses the JSONColumns
format to parse data formatted like that:
A more compact format is also supported when dealing with an array of columns instead of an object using JSONCompactColumns
format:
Saving JSON objects instead of parsing
There are cases you might want to save JSON objects to a single String
(or JSON) column instead of parsing it. This can be useful when dealing with a list of JSON objects of different structures. Let's take this file where we have multiple different JSON objects inside a parent list:
We want to save original JSON objects into the following table:
Now we can load data from the file into this table using JSONAsString
format to keep JSON objects instead of parsing them:
And we can use JSON functions to query saved objects:
Note that JSONAsString
works perfectly fine in cases we have JSON object-per-line formatted files (usually used with JSONEachRow
format).
Schema for nested objects
In cases when we're dealing with nested JSON objects, we can additionally define schema and use complex types (Array
, Object Data Type
or Tuple
) to load data:
Accessing nested JSON objects
We can refer to nested JSON keys by enabling the following settings option:
This allows us to refer to nested JSON object keys using dot notation (remember to wrap those with backtick symbols to work):
This way we can flatten nested JSON objects or use some nested values to save them as separate columns.
Skipping unknown columns
By default, ClickHouse will ignore unknown columns when importing JSON data. Let’s try to import the original file into the table without the month
column:
We can still insert the original JSON data with 3 columns into this table:
ClickHouse will ignore unknown columns while importing. This can be disabled with the input_format_skip_unknown_fields settings option:
ClickHouse will throw exceptions in cases of inconsistent JSON and table columns structure.
BSON
ClickHouse allows exporting to and importing data from BSON encoded files. This format is used by some DBMSs, e.g. MongoDB database.
To import BSON data, we use the BSONEachRow format. Let’s import data from this BSON file:
We can also export to BSON files using the same format:
After that, we’ll have our data exported to the out.bson
file.