
We Created Data Engineering Skills for Claude Code


Data engineering looks like software development from a distance, but the work has a different center of gravity. You still write code, review changes, and ship to production, but the real risk often lives outside the diff: downstream lineage, metric correctness, warehouse cost, schema drift, PII exposure, and whether the business can still trust the data after a change lands.


Most data engineering work also does not begin and end with writing SQL. A small model change can require checking upstream sources, understanding downstream impact, updating dbt docs, adding tests, comparing outputs, reviewing query performance, and making sure sensitive data is handled correctly. We created data engineering skills for Claude Code to bring those workflows into the coding agent experience, giving data teams reusable playbooks for the work they already do every day.

You can also access these skills through Altimate Code, our open-source LLM harness for data engineering, instead of installing each one individually in Claude Code. Altimate Code includes the skills out of the box and is ready to use with Claude:

npm install -g altimate-code

Skills as Data Engineering Playbooks

A skill is a reusable workflow Claude Code can load when a task calls for a specific kind of expertise. For data engineering, that means more than a prompt template. Each skill gives Claude Code guidance on what context to gather, what checks to run, what tools to use, what risks to watch for, and what kind of answer or artifact to produce.

For example, a SQL review skill should not only say whether a query is syntactically valid. It should look for anti-patterns, safety issues, readability problems, performance risks, and whether the query is appropriate for the warehouse or dialect. A dbt testing skill should understand model structure, schema YAML, unit test patterns, and the kinds of edge cases that usually cause production data issues.

The goal is to make Claude Code less dependent on one giant perfect prompt from the user. Instead of asking a data engineer to spell out every step, the skill carries the workflow: build the model, document the columns, add the tests, inspect the lineage, compare the outputs, review the SQL, and summarize what changed.

The Data Engineering Workflows We Covered

We organized the skills around the workflows data teams repeat every week: building models, reviewing SQL, validating changes, debugging warehouse issues, governing sensitive data, and communicating results. Each skill is focused on a specific job, but the real value comes from how they work together across the lifecycle of a data change.

The current skill set covers six areas:

  • Building and maintaining dbt projects

  • Reviewing and improving SQL

  • Validating changes before they ship

  • Operating the warehouse

  • Governing data and team knowledge

  • Visualizing and explaining results


Building and Maintaining dbt Projects

The dbt skills cover the full lifecycle of a model: creating it, documenting it, testing it, understanding its downstream impact, and troubleshooting it when something breaks.

Creating and Changing Models

At its core, dbt is a tool for creating SQL models, so we had to have a skill that creates dbt models. It is called dbt-develop.

This skill teaches Claude Code to both write new dbt models and change existing ones. If you are a dbt user, you know that is not as simple as just writing SQL. You have to understand the project setup, the database dialect, the patterns across staging, intermediate, and mart models, and the lineage of data flowing through the project.
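For a flavor of what that looks like in practice, here is a minimal staging-model sketch in the usual dbt style. The source and column names are hypothetical:

-- models/staging/stg_orders.sql (hypothetical source and columns)
with source as (

    -- pull from the raw source declared in sources.yml
    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id         as order_id,
        customer_id,
        status     as order_status,
        created_at as ordered_at,
        amount     as order_amount
    from source

)

select * from renamed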

Documenting the Project

If you are anything like me, you’ve spent days or even weeks creating an intricate dbt project that takes messy, raw data and forms it into beautiful marts that you deliver to your stakeholders. You neglect to create good documentation for your work because, after all, you created it, so why would you need to write down what it does?

A few weeks later you receive a message from a colleague asking what a specific column does, and you have no idea what the answer is. Documentation might have helped out a bit here.

When I say documentation, I mean going past basic notes like “primary key” or “unique ID for customer.” Our skill, dbt-docs, writes documentation that answers questions about models and columns that you might not have thought to ask.

The skill guides Claude Code to document aggregations, primary and foreign keys, upstream tables, and more. Our hope is that this skill lets you stop answering questions about your models and columns and spend more time building.

https://youtu.be/PY63_Eu3Si4

Writing Tests for dbt Models

Before working in data, I was a teacher, so I have always loved a good test. Not the kind that exists just to make someone nervous, but the kind that makes expectations clear. A good test tells you what should happen, what matters, and where the gaps are.

That is exactly why tests matter in dbt. There is a special kind of confidence that comes from seeing a model run successfully. There is also a special kind of pain that comes from realizing later that the model ran successfully and still produced the wrong answer.

The dbt-test skill helps Claude Code add schema tests, unit tests, and data quality checks to dbt models. It can guide Claude Code through common checks like not_null, unique, relationships, and accepted_values, as well as custom generic or singular tests when a model needs something more specific.
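As a sketch of the singular-test side, a dbt singular test is just a SQL file that returns the rows violating an expectation. The model and columns here are hypothetical:

-- tests/assert_order_totals_non_negative.sql
-- A singular test fails if it returns any rows.
select
    order_id,
    order_total
from {{ ref('fct_orders') }}
where order_total < 0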

For more complex model logic, the dbt-unit-tests skill goes deeper. It helps generate dbt unit tests by analyzing the model SQL, identifying upstream refs and sources, creating mock inputs, and assembling YAML that tests things like CASE statements, joins, window functions, null handling, aggregations, and incremental behavior.

The point is not to test for the sake of testing. It is to protect the assumptions your model depends on, so the next change does not quietly break the meaning of the data.

https://youtu.be/O49qIMh2QPQ

Understanding and Fixing Impact

dbt changes have a habit of traveling. A small edit to a source model can ripple into downstream marts, dashboards, tests, and metric definitions. The dbt-analyze skill helps Claude Code reason about that blast radius before a change ships by inspecting dependency and lineage context, including column-level lineage where available.

The other side of impact is troubleshooting. When a dbt project fails to compile, a model errors at runtime, a test starts failing, or a dashboard suddenly looks wrong, the dbt-troubleshoot skill gives Claude Code a diagnostic workflow instead of leaving it to guess from the error message alone. It helps separate compilation issues, warehouse errors, test failures, incorrect data, and performance problems.

Together, these skills help Claude Code answer two questions that come up constantly in dbt work: “What could this change affect?” and “What is actually going wrong?”


Reviewing and Improving SQL

Even in a dbt-heavy world, SQL is still the language data teams use to express business logic. But good SQL is not just SQL that runs. It should be readable, safe, efficient, and appropriate for the warehouse it runs on. The SQL skills help Claude Code review, optimize, and translate queries across the systems data teams actually use.

Reviewing SQL Before It Ships

Every data team has seen a query that technically works but makes you nervous. Maybe it has a risky join, a missing filter, unclear business logic, a performance issue hiding in plain sight, or a pattern that will be painful for the next person to maintain.

The sql-review skill gives Claude Code a pre-merge review workflow for SQL. It guides Claude Code to look for syntax issues, anti-patterns, readability problems, performance risks, and safety concerns before the query lands in production.
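As a sketch of the kinds of findings that review surfaces, consider a query with a few classic issues. The tables and columns are made up:

-- Before: patterns a review would flag
select *                             -- select * hides intent and breaks if columns change
from orders o, customers c           -- implicit join syntax, one typo away from a cross join
where o.customer_id = c.id
  and o.created_at > '2024-01-01';   -- implicit cast from a string literal

-- After: explicit columns, explicit join, typed filter
select
    o.order_id,
    o.created_at,
    c.customer_name
from orders as o
inner join customers as c
    on o.customer_id = c.id
where o.created_at > date '2024-01-01';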

The goal is not just to ask, “Does this query run?” It is to ask, “Is this query understandable, reliable, and safe enough for the data system it is about to become part of?”

Optimizing Slow or Expensive Queries

Slow queries are rarely just an inconvenience in data engineering. They can block development, delay dashboards, frustrate stakeholders, and quietly drive up warehouse costs. A query that was fine on a small table can become a problem as data volume grows or as more teams start depending on it.

The query-optimize skill helps Claude Code inspect SQL with performance in mind. It can look for inefficient joins, unnecessary scans, filtering issues, aggregation patterns, and opportunities to rewrite the query in a way that is easier for the warehouse to execute.
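One common rewrite it can suggest, sketched here with hypothetical tables, is replacing a correlated subquery with a window function so the warehouse scans the table once instead of once per outer row:

-- Before: the subquery re-scans orders for every customer's rows
select o.*
from orders o
where o.created_at = (
    select max(created_at)
    from orders o2
    where o2.customer_id = o.customer_id
);

-- After: one scan, ranked with a window function
select order_id, customer_id, created_at
from (
    select
        o.*,
        row_number() over (
            partition by customer_id
            order by created_at desc
        ) as rn
    from orders o
) as ranked
where rn = 1;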

This is where Claude Code becomes useful beyond code generation. It can help reason through why a query is slow, what tradeoffs are available, and how to make the SQL more efficient without losing the business logic that made the query valuable in the first place.

Translating Across Dialects

If you’ve done much traveling, you know that speaking the same language does not always mean speaking it the same way. SQL is similar. Snowflake, BigQuery, Databricks, Postgres, Redshift, MySQL, SQL Server, and DuckDB all share the same broad language, but each warehouse has its own syntax, functions, conventions, and sharp edges.

A query written for BigQuery will not always translate cleanly into Snowflake. The same is true when moving models between warehouses, supporting multiple customer environments, or trying to standardize logic across a mixed data stack.

That is why we built the sql-translate skill. It helps Claude Code translate SQL from one dialect to another while preserving the intent of the query. Think of it as Duolingo for Claude Code, except instead of asking it to practice ordering coffee, you are asking it to keep your business logic intact across warehouses.
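A small taste of the gaps involved, using function differences I believe hold between these two dialects (worth verifying against your warehouse):

-- BigQuery
select
    date_trunc(order_date, month)               as order_month,
    timestamp_diff(shipped_at, ordered_at, day) as days_to_ship,
    safe_divide(revenue, order_count)           as revenue_per_order
from orders;

-- Snowflake equivalent
select
    date_trunc('month', order_date)         as order_month,
    datediff('day', ordered_at, shipped_at) as days_to_ship,
    revenue / nullif(order_count, 0)        as revenue_per_order
from orders;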

If your team is migrating warehouses or working across multiple SQL engines, this skill gives Claude Code a much better starting point for making those translations safely.


Validating Changes Before They Ship

In software engineering, tests often tell you whether a change broke expected behavior. In data engineering, validation can be harder to pin down. A model can run, a query can return rows, and a migration can apply successfully while the meaning of the data has still changed.

That is why this category matters so much. Data teams do not just need help making changes. They need help understanding whether those changes are safe to ship. The validation skills help Claude Code compare outputs, inspect lineage, and look for migration risks before a change reaches production.

Comparing Data Outputs

One of the most common questions in data work is also one of the most important: did this change alter the data?

Sometimes the answer should be yes. You fixed a bug, changed a business definition, or added new logic. But often, especially during a refactor or migration, the goal is to preserve behavior while changing the implementation. That is where the data-parity skill becomes useful.

The data-parity skill helps Claude Code compare two tables or query results and diagnose exactly how they differ. It can be used for migration validation, ETL regression checks, and query refactor verification. Instead of stopping at “these counts do not match,” the workflow pushes toward understanding where they differ, why they differ, and whether the difference is expected.
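A minimal version of that comparison is a symmetric diff, assuming the two relations share a schema. The table names here are hypothetical:

-- Zero rows returned means the two tables agree exactly.
-- Note: except deduplicates, so drift in duplicate counts needs a separate group-by check.
(
    select 'only_in_old' as side, * from analytics.fct_orders_old
    except
    select 'only_in_old' as side, * from analytics.fct_orders_new
)
union all
(
    select 'only_in_new' as side, * from analytics.fct_orders_new
    except
    select 'only_in_new' as side, * from analytics.fct_orders_old
);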

Think of it as asking Claude Code to check the receipt before you leave the store. The change may look fine at a glance, but you want to know whether the numbers actually add up.

Seeing How Lineage Changed

A data change is not only about the rows that come out the other side. It is also about how values move through the system. When the lineage changes, the meaning of a column can change with it.

The lineage-diff skill helps Claude Code compare column-level lineage between two versions of a SQL query or model. It can show which data flow edges were added, removed, or changed, giving the reviewer a clearer picture of how the transformation shifted.

This is especially useful when reviewing changes that look small in the SQL but may affect important downstream fields. A join changes. A source column is swapped. A derived field starts pulling from a different upstream path. The query may still run, but the story of the data has changed.
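For example, a change like the following is tiny in the diff but rewires the lineage of net_revenue, adding a new edge from discount. The columns are hypothetical:

-- Before: net_revenue derives only from amount
select
    order_id,
    amount as net_revenue
from {{ ref('stg_orders') }}

-- After: net_revenue now also depends on discount
select
    order_id,
    amount - coalesce(discount, 0) as net_revenue
from {{ ref('stg_orders') }}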

The goal is to make those invisible changes visible before they surprise someone downstream.

Catching Migration Risk

Schema changes are another place where “it ran successfully” is not enough. A migration can apply cleanly and still introduce data loss, break assumptions, or create problems for downstream consumers.

The schema-migration skill helps Claude Code analyze DDL changes before they are applied. It looks for risks like type narrowing, dropped columns, missing defaults, removed constraints, and other breaking column changes that can quietly turn into production issues.
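A sketch of the kind of Postgres-flavored DDL that looks routine but carries exactly those risks (the table and columns are hypothetical):

-- Type narrowing: values longer than 50 characters are truncated or rejected
alter table customers alter column email type varchar(50);

-- Dropped column: silently breaks every downstream query that selects it
alter table customers drop column phone_number;

-- New not-null column without a default: fails on a non-empty table,
-- and forces every job that inserts into customers to change
alter table customers add column region varchar(100) not null;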

This matters because schema migrations often feel mechanical until they are not. Renaming a column, changing a type, or tightening a constraint might look straightforward in code, but those changes can affect ingestion jobs, BI tools, reverse ETL flows, contracts, and every team that has built something on top of the data.

The skill gives Claude Code a migration review mindset: not just “Can this statement execute?” but “What could this break if it does?”


Operating the Warehouse

Once data work is running in production, a different set of questions starts to matter. What is expensive? What is slow? What changed? Which workloads are driving spend? Which queries or models need attention? The warehouse is not just where data lives. It is also where performance and cost decisions show up.

The warehouse operations skills help Claude Code reason about those questions as part of the engineering workflow. Instead of treating cost and performance as separate admin tasks, these skills bring them closer to the code, queries, and models that created them.

Understanding Cost

Warehouse spend can be difficult to reason about because the cost is usually spread across queries, users, jobs, warehouses, and workloads. A dashboard may feel slow, a bill may jump, or a team may notice that a routine transformation suddenly got more expensive, but finding the cause is not always obvious.

The cost-report skill helps Claude Code analyze Snowflake query costs and identify optimization opportunities. It can help look at expensive queries, warehouse usage, credit consumption, unused resources, and patterns that may point to waste or inefficient workloads.
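A natural starting point for that analysis is Snowflake’s ACCOUNT_USAGE views. Here is a minimal sketch; the view and columns are standard Snowflake objects, and the 30-day window is an arbitrary choice:

-- Credits consumed per warehouse over the last 30 days
select
    warehouse_name,
    sum(credits_used) as credits_30d
from snowflake.account_usage.warehouse_metering_history
where start_time >= dateadd('day', -30, current_timestamp())
group by warehouse_name
order by credits_30d desc;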

This gives Claude Code a FinOps lens. Not in the “please enjoy this spreadsheet of guilt” sense, but in the useful sense: where is the money going, what changed, and what can we do about it?

Diagnosing Operational Issues

Cost is only one part of operating a warehouse. Data teams also have to deal with slow queries, failing jobs, runtime errors, and models that behave differently as volume grows.

This is where several skills work together. The query-optimize skill helps Claude Code reason about slow or inefficient SQL. The dbt-troubleshoot skill gives it a workflow for compilation failures, runtime database errors, failing tests, incorrect data, and performance issues in dbt projects.

The goal is to make Claude Code useful when something is not healthy in production. It can help move from symptoms to causes: from “this dashboard is slow,” “this model failed,” or “our warehouse spend jumped” toward a clearer explanation of what is happening and what to try next.


Governance, Privacy, and Team Knowledge

Data engineering also carries responsibilities that do not fit neatly into “write the model” or “make the query faster.” Teams need to know where sensitive data lives, whether a query exposes it, and whether new work follows the standards the team has already agreed on. The governance and training skills help Claude Code operate with more awareness of privacy, compliance, and team-specific conventions.

Auditing Sensitive Data

Sensitive data has a way of showing up where you least expect it. An email field gets added to a downstream model. A phone number moves into an analytics table. An IP address appears in a query result that was supposed to be safe to share. Unlike pie, PII is much better when it is not casually passed around.

The pii-audit skill helps Claude Code classify schema columns for personally identifiable information and sensitive data, including direct identifiers like SSNs, emails, phone numbers, names, addresses, and credit card numbers, as well as quasi-identifiers like dates of birth, zip codes, IP addresses, and device IDs.

It can also check whether a query or dbt model exposes PII, distinguish between sensitive fields used internally and sensitive fields returned in the output, and help generate a PII inventory for compliance workflows like GDPR, CCPA, and HIPAA.
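A first pass at that inventory can be as simple as a name-based scan of the information schema. This is a heuristic sketch with illustrative patterns; name matching alone misses renamed or derived PII:

-- Flag columns whose names suggest PII; a starting point, not a verdict
select
    table_schema,
    table_name,
    column_name
from information_schema.columns
where lower(column_name) like '%email%'
   or lower(column_name) like '%phone%'
   or lower(column_name) like '%ssn%'
   or lower(column_name) like '%birth%'
   or lower(column_name) like '%zip%'
   or lower(column_name) like '%ip_address%'
order by table_schema, table_name;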

The point is not to turn Claude Code into a compliance department. It is to give it enough privacy awareness to pause at the right moments, surface risk, and help data teams avoid accidentally spreading sensitive data through models, reports, or ad hoc queries.

https://youtu.be/KtWwjVIhlGI

Teaching Claude Code Your Team’s Standards

The teacher in me shows up again in the teach, train, and training-status skills. A lot of what makes someone effective on a data team is not just knowing SQL or dbt. It is learning the patterns, preferences, definitions, and little bits of context that make work fit the team around it.

Every data team has conventions that are obvious to the people who have been there long enough and invisible to everyone else. How staging models should be named. What belongs in marts. Which patterns are encouraged. Which shortcuts should be avoided. Which business definitions have sharp edges. Most of that knowledge lives in scattered docs, review comments, Slack threads, and the brains of the people who have answered the same question ten times.

The teach skill lets you show Claude Code an example file from your codebase and extract reusable patterns from it. The train skill helps Claude Code learn team standards from a document, style guide, or review checklist. The training-status skill shows what it has learned so far.

These skills help Claude Code move closer to the way your team actually works. The goal is not just technically valid output. It is output that reflects your standards, your naming conventions, your modeling patterns, and the context your team has built over time.


Visualizing and Explaining Results

Data work does not end when the query returns rows. At some point, someone needs to understand what the data is saying. That might mean a chart for a trend, a dashboard for a team, a KPI view for leadership, or a more interactive way to explore a dataset.

The data-viz skill helps Claude Code turn data into visual interfaces: charts, dashboards, KPI cards, analytics views, and reporting experiences. It guides Claude Code toward modern component libraries like Recharts, Tremor, Nivo, D3, Victory, visx, and shadcn/ui, depending on the project.

This matters because the last mile of data work is often communication. A model can be correct, tested, documented, and optimized, but if people cannot understand the result, the work is not finished. The data-viz skill helps Claude Code move from “here is the data” to “here is what the data means.”


Claude Code is already a powerful place to work with code. These skills make it more useful for the specific work data teams do every day: building models, writing tests, reviewing SQL, validating changes, checking lineage, auditing PII, understanding cost, translating dialects, and explaining results.

The goal is not to replace data engineers. It is to give them a better teammate inside the workflows they already know. Data engineering requires code, context, caution, and communication. These skills give Claude Code more of that context, so it can help with the work around the code, not just the code itself.

If you want to use the skills without installing them one by one in Claude Code, they are included in Altimate Code, our open-source LLM harness for data engineering. Altimate Code comes with these skills installed and is ready to work with Claude:

npm install -g altimate-code