How Much Do Dynamic Tables Cost? The answer will surprise you!

In my previous article on Dynamic Tables, I explained their purpose: to simplify building transformation pipelines for either batch or real-time data feeds in Snowflake. In this article, I'll explain what they cost.

Bored already? Read on to see how to reduce the annual cost of running three Dynamic Tables from $6.3m to just $26,280.

Dynamic Tables Cost Components

It may not be obvious, but Dynamic Tables have several cost components:

  • Cloud Compute: Dynamic Tables are automatically refreshed on a schedule determined by the TARGET_LAG, for example, once per minute or once per hour (see the example after this list).

  • Data Storage: Like any table, Dynamic Tables store data, and this will be billed at around $23 per terabyte per month.

  • Warehouse Cost: Every time the Dynamic Table is refreshed, it (currently) needs a Virtual Warehouse to execute the SQL, and this is billed on a per-second basis with a minimum of 60 seconds.
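
For context, the two biggest levers (the TARGET_LAG and the refresh warehouse) are both declared when the Dynamic Table is created. Here's a minimal sketch; the table, warehouse and column names are hypothetical:

create or replace dynamic table SALES_CLEAN         -- hypothetical table name
   target_lag = '1 minute'                          -- drives the refresh frequency (Cloud Compute + Warehouse cost)
   warehouse  = DT_XSMALL_WAREHOUSE                 -- the virtual warehouse billed for every refresh
as
select sale_id,
       store_id,
       iff(sales_type not in ('ONLINE', 'IN_STORE'), null, sales_type) as sales_type
from   SALES_RAW;                                   -- hypothetical source table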

Let's consider each of these elements and its potential impact upon the overall cost of ownership.

Cloud Compute Cost

This is by far the least expensive of the three cost elements and results from the automatic scheduling executed in the Cloud Services layer. Determined by the TARGET_LAG, the DT "wakes up" and checks if there are any changes in the upstream table(s). This is likely to cost thousandths of a dollar each time, and for 99% of customers it is effectively free, because Cloud Services compute is only billed once it exceeds 10% of your daily warehouse usage.
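
If you want to verify this on your own account, the SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view breaks out Cloud Services credits from warehouse credits. A rough sketch; check the column names against the documentation for your edition:

select usage_date,
       sum(credits_used_compute)        as warehouse_credits,
       sum(credits_used_cloud_services) as cloud_services_credits
from   snowflake.account_usage.metering_daily_history
where  usage_date >= dateadd(day, -30, current_date)
group by usage_date
order by usage_date;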

In short, don't worry about this.

Dynamic Table Storage Cost

While you need to store data in Snowflake (and therefore you'll be billed a pass-through cost of around $23 per terabyte per month), it may not be immediately obvious that Dynamic Tables can add to the storage cost. Consider the diagram below.

In the above diagram, tables T1 and T2 are standard tables, whereas T3, T4 and T5 are Dynamic Tables. If we assume T1 and T2 each hold 1TB of data, and the transformation to T3 and T4 simply performs a basic data quality correction (for example, setting the SALES_TYPE column to NULL when its values are incorrect), then tables T3 and T4 will each hold around 1TB as well.

If the results of T3 and T4 are then combined using a JOIN operation into a single table T5 - ready for querying, this adds another 2TB to storage.

In total, we store 6TB.

However, if we were not using Dynamic Tables, tables T1 and T2 could be deleted each time the new entries were processed. Likewise, once the process to join the tables and produce T5 was complete, tables T3 and T4 could be deleted.

This means a solution without Dynamic Tables could deliver the same functionality in 2TB of storage compared to 6TB.
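
As a side note, you can see how much each table (including each Dynamic Table) actually holds via the SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS view. A rough sketch; verify the column names against the documentation for your account:

select table_catalog,
       table_schema,
       table_name,
       round(active_bytes      / power(1024, 4), 3) as active_tb,
       round(time_travel_bytes / power(1024, 4), 3) as time_travel_tb,
       round(failsafe_bytes    / power(1024, 4), 3) as failsafe_tb
from   snowflake.account_usage.table_storage_metrics
where  active_bytes > 0
order by active_bytes desc
limit  20;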

Data storage is a relatively minor cost on most Snowflake systems, but it's a cost worth understanding.

Dynamic Tables Warehouse Cost

This is almost certainly the greatest cost associated with Dynamic Tables, although unlike the data storage example above, there's no additional cost when compared to a transformation pipeline without DTs.

Effectively, each time a Dynamic Table triggers a refresh (provided there's changed data in the upstream tables), Snowflake executes the associated SQL on a Virtual Warehouse. However, even this has a potential gotcha. Consider again the diagram below.

The diagram above illustrates the same transformation pipeline, with two source tables and three Dynamic Tables. Notice that two of the tables are marked as Downstream. This means their refresh rate is determined by the downstream table that consumes them - in this case, a table refreshed every minute.
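
In SQL terms, the two Downstream tables are declared with TARGET_LAG = DOWNSTREAM, so they inherit their refresh cadence from the table that consumes them. A minimal sketch using the table names from the diagram; the queries, join key and warehouse name are hypothetical:

create or replace dynamic table T3
   target_lag = downstream             -- refresh only when a downstream consumer needs fresh data
   warehouse  = DT_WAREHOUSE           -- hypothetical warehouse name
as
select * from SALES;                   -- hypothetical cleansing query over the Sales table (T1)

create or replace dynamic table T5
   target_lag = '1 minute'             -- this one-minute lag is pushed up the chain to T3 and T4
   warehouse  = DT_WAREHOUSE
as
select *
from   T3
join   T4 using (store_id);            -- hypothetical join key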

Let's assume new data is automatically inserted into the Sales table using Snowpipe, which runs on a 24x7 basis. The Stores data, however, is refreshed once per day.

Can you see what might happen to the cost profile of this simple transformation pipeline?

As new sales are automatically loaded into the Sales table (T1), Dynamic Tables T3 and T5 are automatically refreshed based upon the one minute TARGET_LAG.

If all the Dynamic Tables were initially deployed on an XSMALL warehouse, this incredibly simple transformation pipeline would cost up to $1,576,800 per year. Worse still, if the warehouse were subsequently increased in size to a MEDIUM, this would rise to a shocking $6.3m a year for a single warehouse.

Around four years ago, I made a consultancy visit to a customer in Munich, Germany, who had deployed a similar pipeline, also running 24x7, and was shocked by the high cost. The customer justified the 24x7 operation because the system tracked global internet-based sales and could therefore receive sales around the clock.

After some time at the whiteboard mapping out the (equally simple) pipeline, I understood the problem and asked the customer: "So, where are your Data Analysts based?"

I'm sure they thought it was a stupid question, as their reply was like talking to a five-year-old. "Here in Munich, John."

In response, I asked whether the Data Analysts were looking at the data at 4am, and if not, why we were processing data at 4am and paying for a virtual warehouse to wake up every minute, leading to a huge bill.

Finally, I also questioned why they were continuously processing data every minute. What action would the analysts take that needed the data to be as little as one minute stale?

In conclusion, the cost of the same transformation pipeline, even on a MEDIUM size warehouse, could be cut by 50% simply by suspending the Dynamic Table refresh from 8pm to 8am using a Snowflake Task.
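
Here's a sketch of that overnight suspension using two scheduled Tasks. The task names, warehouse, time zone and cron times are assumptions; ALTER DYNAMIC TABLE ... SUSPEND and RESUME are the commands doing the work:

create or replace task SUSPEND_DT_REFRESH              -- hypothetical task name
   warehouse = TASK_WAREHOUSE                          -- hypothetical warehouse
   schedule  = 'USING CRON 0 20 * * * Europe/Berlin'   -- 8pm local time
as
   alter dynamic table T5 suspend;    -- with TARGET_LAG = DOWNSTREAM, T3 and T4 should also stop refreshing

create or replace task RESUME_DT_REFRESH
   warehouse = TASK_WAREHOUSE
   schedule  = 'USING CRON 0 8 * * * Europe/Berlin'    -- 8am local time
as
   alter dynamic table T5 resume;

alter task SUSPEND_DT_REFRESH resume;                  -- tasks are created suspended; enable them
alter task RESUME_DT_REFRESH resume;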

We could further reduce the cost by a factor of four by executing the following SQL statements.

alter dynamic table T3 
   set warehouse = DT_XSMALL_WAREHOUSE;
alter dynamic table T4
   set warehouse = DT_XSMALL_WAREHOUSE;

These two simple SQL statements move the refresh operation to an XSMALL warehouse, which costs a quarter as much as a MEDIUM. Of course, we'd need to check that the refresh performance was still acceptable, but assuming an INCREMENTAL refresh method, this should complete quickly.
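
One way to sanity-check that is to review recent refresh durations using the INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY table function. A rough sketch; verify the available columns and arguments against the current documentation:

select name,
       refresh_action,                                            -- e.g. INCREMENTAL or FULL
       state,
       datediff(second, refresh_start_time, refresh_end_time) as refresh_seconds
from   table(information_schema.dynamic_table_refresh_history())
order by refresh_start_time desc
limit  20;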

Finally, we should question whether the Data Analysts really need the data refreshed every minute. Executing the following SQL would reduce costs by a further factor of 60.

alter dynamic table T5
   set target_lag = '1 Hour';

"Never give a user what they ask for, give them what they need" - Dr. David Pearson.

Conclusion

While the cost of Cloud Services compute and even storage for Dynamic Tables is likely to be negligible, we need to carefully consider the cost of running a virtual warehouse.

With just three SQL statements, we reduced the maximum cost of this simple transformation pipeline from $6.3m per year down to $26,280.

To summarize what we learned:

  • Cloud Compute and Storage costs can (in most cases) be ignored. They shouldn't add significant cost to a pipeline deployed using a Dynamic Table.

  • Warehouse Costs, however, can be remarkably high and lead to an eye-watering annual bill if deployed without thought.

  • Snowflake is not expensive! Like any tool, if used incorrectly it can lead to a poor outcome. Applying some common sense goes a long way.

  • TARGET_LAG: This is critical in determining the frequency with which a Dynamic Table's query is executed. If the data changes frequently, it can lead to significant cost. Carefully choose a TARGET_LAG based on how fresh you need the data. Don't just set it to once per minute because you can.

  • Dynamically Change the lag: You can set up a simple schedule to suspend Dynamic Table refreshes or adjust the TARGET_LAG during off-peak or holiday periods. Suspending the refresh overnight will immediately reduce the costs by up to 50%.


DataPilot from Altimate.ai can proactively monitor and provide you with detailed insights on how and where time can be saved. If you, too, want to leverage this feature and save development time without increasing the cost, why not get in touch?
