dbt_synth_data
dbt_synth_data is a dbt package for creating synthetic data that I built as part of my work at Education Analytics (EA). The package’s features include:
- support for Snowflake, Postgres, SQLite, and DuckDB backends
- ability to generate various distributions including normal, exponential, binomial, and more
- ability to combine basic distributions by union or average to create more complex ones
- ability to generate many basic data types including boolean, numeric, string, and date
- ability to generate more complex data types including references to other tables, words, names, and addresses
- strong performance: on Snowflake, it can generate billions of rows and hundreds of GB of synthetic data
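To make the "union or average" idea concrete, here is a minimal conceptual sketch in plain Python. It is not the package's API (dbt_synth_data generates data in SQL through dbt); the function names `sample_union` and `sample_average` and the two example distributions are hypothetical, chosen only for illustration. A union picks one of the component distributions for each draw, while an average blends one draw from every component.

```python
import random
import statistics


def sample_union(samplers, weights, rng):
    """Draw from a union (mixture): pick one sampler by weight, then sample it."""
    sampler = rng.choices(samplers, weights=weights, k=1)[0]
    return sampler(rng)


def sample_average(samplers, weights, rng):
    """Draw the weighted average of one draw from every sampler."""
    total = sum(weights)
    return sum(w * s(rng) for s, w in zip(samplers, weights)) / total


# Hypothetical components: two normal distributions centered at 0 and 10.
def low(r):
    return r.gauss(0.0, 1.0)


def high(r):
    return r.gauss(10.0, 1.0)


rng = random.Random(42)  # fixed seed so runs are repeatable
union_draws = [sample_union([low, high], [1, 1], rng) for _ in range(10_000)]
average_draws = [sample_average([low, high], [1, 1], rng) for _ in range(10_000)]

union_mean = statistics.fmean(union_draws)
average_mean = statistics.fmean(average_draws)
union_stdev = statistics.pstdev(union_draws)
average_stdev = statistics.pstdev(average_draws)

print(f"union:   mean={union_mean:.2f} stdev={union_stdev:.2f}")
print(f"average: mean={average_mean:.2f} stdev={average_stdev:.2f}")
```

Both combinations are centered near 5, but the union keeps two separate peaks (so its spread is wide), while the average collapses into a single narrow peak.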
At EA, we use dbt_synth_data to create synthetic data in the Ed-Fi data standard, which can then be used for:
- testing user interfaces
- demoing applications to users without permission to access real data
- performance-tuning operational systems
- preparing training and other materials with realistic data
You can learn more about dbt_synth_data in this presentation.