earthmover

earthmover is a command-line data transformation tool I’ve built as part of my work at Education Analytics. It:

  • loads data from a variety of sources, including CSV and other types of files, relational database connections, and more
  • transforms the data according to instructions in a YAML file
  • renders a Jinja template (which can be any text-based data format, including JSON, XML, HTML, YAML, and more) for each row of transformed data, and saves the output to a file
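To make the three steps concrete, here is a rough sketch in plain Python. It is not earthmover's code or its configuration format: the column names, the `load`/`transform`/`render` helpers, and the sample data are all made up for illustration, and the standard library's `string.Template` stands in for Jinja so the example runs without extra installs.

```python
import csv
import io
from string import Template

# Hypothetical input: a small CSV held in memory instead of a real file.
RAW_CSV = """student_id,first_name,last_name
1001,ada,lovelace
1002,alan,turing
"""

# Hypothetical output template. string.Template is a stand-in for Jinja here.
ROW_TEMPLATE = Template(
    '{"id": "$student_id", "first_name": "$first_name", "last_name": "$last_name"}'
)


def load(text):
    """Step 1: load rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: apply a transformation (here, title-case the name columns)."""
    return [
        {
            **row,
            "first_name": row["first_name"].title(),
            "last_name": row["last_name"].title(),
        }
        for row in rows
    ]


def render(rows):
    """Step 3: render the template once per row, giving one output line each."""
    return [ROW_TEMPLATE.substitute(row) for row in rows]


output_lines = render(transform(load(RAW_CSV)))
for line in output_lines:
    print(line)
```

In earthmover itself, the transformation step is described in a YAML file rather than written as Python functions, and the template is a Jinja file, so the same template mechanism can produce JSON, XML, or any other text format.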

earthmover is similar in some ways to dbt, with one key difference: dbt issues SQL commands to a database backend, which does the actual execution, whereas earthmover is itself the transformation execution engine. earthmover is built using several Python libraries, including Dask and NetworkX.

At EA, we use earthmover to transform various types of flat files into JSON according to the Ed-Fi data standard, which we then send to Ed-Fi APIs using lightbeam. You can learn more about both tools in this presentation.