Anonymized Web Analytics Data
This dataset consists of two tables of anonymized web analytics data: hits (hits_v1) and visits (visits_v1).
The tables can be downloaded as compressed tsv.xz files. In addition to the sample used in this document, an extended (7.5 GB) version of the hits table containing 100 million rows is available as TSV at https://datasets.clickhouse.com/hits/tsv/hits_100m_obfuscated_v1.tsv.xz.
Download and ingest the data
Download the hits compressed TSV file:
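A minimal sketch of this step, assuming `curl` and `xz` are installed. The sample-file URL is an assumption formed by analogy with the 100-million-row URL above, and the names `HITS_URL` and `download_hits` are illustrative only.

```shell
# Assumed URL, by analogy with the hits_100m_obfuscated_v1 URL; adjust if it differs.
HITS_URL="https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz"

# Stream the archive through xz so the compressed file is never stored on disk.
# -f: fail on HTTP errors instead of emitting an error page; -L: follow redirects.
download_hits() {
    curl -fsSL "$HITS_URL" | xz --decompress --stdout > hits_v1.tsv
}

# download_hits    # uncomment to run; writes hits_v1.tsv to the current directory
```

For the extended table, the same pipeline should work with the hits_100m_obfuscated_v1.tsv.xz URL and a different output file name.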
Create the database and table
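The database can be created from the command line with `clickhouse-client`. In this sketch the database name `datasets` is an assumption; any valid name works as long as the later steps use the same one.

```shell
# "datasets" is an assumed database name, not one fixed by this page.
DB_NAME="datasets"

# IF NOT EXISTS makes the command safe to re-run.
create_database() {
    clickhouse-client --query "CREATE DATABASE IF NOT EXISTS ${DB_NAME}"
}

# create_database    # uncomment to run against a reachable ClickHouse server
```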
For hits_v1
Or for hits_100m_obfuscated
Import the hits data:
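One way to load the file is to feed it to `clickhouse-client` on standard input. This sketch assumes the table already exists as `datasets.hits_v1` (the database name is an assumption) and that `hits_v1.tsv` is in the current directory.

```shell
import_hits() {
    # FORMAT TSV tells the server to parse tab-separated rows from standard input.
    # max_insert_block_size is set below its default here to form smaller insert blocks;
    # the value is illustrative and can be tuned or omitted.
    clickhouse-client \
        --query "INSERT INTO datasets.hits_v1 FORMAT TSV" \
        --max_insert_block_size=100000 < hits_v1.tsv
}

# import_hits    # uncomment to run
```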
Verify the count of rows
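A possible check is to compare the table's row count with the number of lines in the TSV file, since TSV stores one row per line. This sketch assumes the table is `datasets.hits_v1`, that it started empty, and that its engine keeps every inserted row (as plain MergeTree does); `verify_hits_count` is an illustrative name.

```shell
verify_hits_count() {
    # Newlines inside TSV values are escaped, so lines in the file equal rows.
    hits_expected=$(wc -l < hits_v1.tsv | tr -d ' ')
    hits_actual=$(clickhouse-client --query "SELECT count() FROM datasets.hits_v1")
    if [ "$hits_expected" = "$hits_actual" ]; then
        echo "OK: $hits_actual rows"
    else
        echo "MISMATCH: file has $hits_expected lines, table has $hits_actual rows" >&2
        return 1
    fi
}

# verify_hits_count    # uncomment to run after the import
```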
Download the visits compressed TSV file:
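The visits file can be fetched the same way as the hits file. The URL below is an assumption that mirrors the layout of the hits URL given earlier on this page; `VISITS_URL` and `download_visits` are illustrative names.

```shell
# Assumed URL, mirroring the hits URL layout; adjust if it differs.
VISITS_URL="https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz"

# Decompress on the fly and write the plain TSV to the current directory.
download_visits() {
    curl -fsSL "$VISITS_URL" | xz --decompress --stdout > visits_v1.tsv
}

# download_visits    # uncomment to run
```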
Create the visits table
Import the visits data
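A sketch of the import, under the assumptions that the table was created as `datasets.visits_v1` and that `visits_v1.tsv` is in the current directory. The block-size setting is optional and its value is illustrative.

```shell
import_visits() {
    # Rows are read from standard input and parsed as tab-separated values.
    clickhouse-client \
        --query "INSERT INTO datasets.visits_v1 FORMAT TSV" \
        --max_insert_block_size=100000 < visits_v1.tsv
}

# import_visits    # uncomment to run
```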
Verify the count
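The row count of the visits table can be read with a single query. The table name `datasets.visits_v1` is an assumption carried over from the earlier steps. If the table uses an engine that collapses rows during merges, the count may be lower than the number of lines in the TSV file.

```shell
count_visits() {
    # Prints the number of rows currently stored in the table.
    clickhouse-client --query "SELECT count() FROM datasets.visits_v1"
}

# count_visits    # uncomment to run
```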
An example JOIN
The hits and visits datasets are used in the ClickHouse test routines, and this example is one of the queries from the test suite. The rest of the tests are referenced in the Next Steps section at the end of this page.
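The sketch below shows a join of this kind: it counts hits per day, totals visits per day, and joins the two on the date. It is a reconstruction, not a verbatim copy of the test file. It assumes the tables are `datasets.hits_v1` and `datasets.visits_v1` and that they contain the columns `EventDate`, `StartDate`, and `Sign`.

```shell
run_join_example() {
    # clickhouse-client reads the query from standard input when --query is absent.
    clickhouse-client <<'SQL'
SELECT
    EventDate,
    hits,
    visits
FROM
(
    SELECT
        EventDate,
        count() AS hits
    FROM datasets.hits_v1
    GROUP BY EventDate
) ANY LEFT JOIN
(
    SELECT
        StartDate AS EventDate,
        sum(Sign) AS visits
    FROM datasets.visits_v1
    GROUP BY EventDate
) USING EventDate
ORDER BY hits DESC
LIMIT 10
SETTINGS joined_subquery_requires_alias = 0
FORMAT PrettyCompact
SQL
}

# run_join_example    # uncomment to run once both tables are loaded
```

The `joined_subquery_requires_alias = 0` setting is there because the two subqueries are joined without aliases; giving each subquery an alias would be an alternative.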
Next Steps
A Practical Introduction to Sparse Primary Indexes in ClickHouse uses the hits dataset to discuss the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices.
Additional examples of queries to these tables can be found among the ClickHouse stateful tests.
The test suite uses a database named test, and the tables are named hits and visits. You can rename your database and tables, or edit the SQL from the test file.
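If renaming is preferred, one option is to create the `test` database and move both tables into it with `RENAME TABLE`. The source names `datasets.hits_v1` and `datasets.visits_v1` are assumptions; substitute whatever names were used when the tables were created.

```shell
rename_for_tests() {
    # Create the target database first, then move both tables in one statement.
    clickhouse-client --query "CREATE DATABASE IF NOT EXISTS test" &&
    clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits, datasets.visits_v1 TO test.visits"
}

# rename_for_tests    # uncomment to run
```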