Edit on github

Initial Datacoves Repository Setup

Introduction

Setting up a new data project requires careful consideration of tools, configurations, and best practices. Datacoves simplifies this process by providing a standardized, yet customizable setup through the dbt-coves library. This article explains how to initialize and maintain your Datacoves repository.

Getting Started with dbt-coves Setup

The dbt-coves setup command generates a fully configured project environment tailored to your specific needs. This command creates a repository structure with all necessary components pre-configured according to data engineering best practices.

Initial Setup Process

dbt-coves comes pre-installed in Datacoves, you only have to run:

# Create a new Datacoves repository
dbt-coves setup

During the setup process, you'll be guided through a series of configuration questions that determine:

  • Which data warehouse to use (Snowflake, BigQuery, Redshift, Databricks)
  • Which components to include in your stack (dbt, Airflow, dlt)
  • Project naming conventions
  • Repository structure preferences
  • CI/CD pipeline configurations
  • Testing and documentation settings

Note

It is recommended that you commit the answers file in your repo for future updates (see below)

What Gets Created

The dbt-coves setup command generates a comprehensive project structure that includes:

  1. dbt configuration

    • Pre-configured dbt project with appropriate adapters
    • Custom macros tailored to your selected data warehouse
    • Template generators for consistent model creation
  2. Orchestration tools

    • Airflow DAG templates (if selected)
    • Pipeline configurations
  3. Data loading

    • dlt configurations for data ingestion (if selected)
  4. Quality control

    • SQLFluff and YAMLlint configurations
    • dbt test frameworks
    • CI/CD workflows for GitHub Actions or GitLab CI
  5. Documentation

    • README templates
    • Project structure documentation

Customizing Your Setup

The setup process is highly flexible, allowing you to:

  • Select only the components you need
  • Configure folder structures based on your preferences
  • Set up CI/CD pipelines appropriate for your workflow
  • Include specialized macros for your specific data warehouse

Updating Your Repository

As your project evolves or as Datacoves releases template improvements, you can update your existing repository:

# Update an existing Datacoves repository
dbt-coves setup --update

The update process: - Preserves your custom code and configurations - Updates template-managed files with the latest versions - Adds any new components you select (it will remove the components you selected at Setup time but didn't select at Update time) - Maintains backward compatibility where possible

Note

When running an update, you will be prompted for the services you want to setup / update, if you saved the answers file from when you first ran set, your original choices pre-selected. If you unselect one of these, that content will be deleted

Benefits for Data Teams

This approach to repository setup and maintenance offers several advantages:

  1. Reduced setup time from days to minutes
  2. Consistency across projects and teams
  3. Built-in best practices for data modeling and CI/CD
  4. Easy maintenance through template updates
  5. Standardized testing and quality control

Conclusion

The dbt-coves setup command streamlines the creation and maintenance of Datacoves repositories by providing a solid foundation that incorporates industry best practices. Whether you're starting a new data project or standardizing existing ones, this approach offers a scalable and maintainable solution for modern data stack implementation.

By leveraging this setup process, data teams can focus on delivering value through data transformations and insights rather than spending time on infrastructure configuration.