Redshift vs Snowflake

Redshift vs Snowflake code

Why Are Our Results Different Than Previous Benchmarks?

Gigaom's cloud data warehouse performance benchmark

In April 2019, Gigaom ran a version of the TPC-DS queries on BigQuery, Redshift, Snowflake and Azure SQL Data Warehouse (Azure Synapse). This benchmark was sponsored by Microsoft. They used 30x more data (30 TB vs 1 TB scale). They configured different-sized clusters for different systems, and observed much slower runtimes than we did. It's strange that they observed such slow performance, given that their clusters were 5–10x larger and their data was 30x larger than ours.

In October 2016, Amazon ran a version of the TPC-DS queries on both BigQuery and Redshift. Amazon reported that Redshift was 6x faster and that BigQuery execution times were typically greater than one minute. Their benchmark differed from ours in several key ways.

The warehouses in this benchmark offer different user experiences. On the "self-hosted" end of the spectrum is Presto, where the user is responsible for provisioning servers and detailed configuration of the Presto cluster. Presto is open-source, unlike the other commercial systems in this benchmark, which is important to some users. Pre-RA3 Redshift is somewhat more fully managed, but still requires the user to configure individual compute clusters with a fixed amount of memory, compute and storage. Redshift RA3 brings Redshift closer to the user experience of Snowflake by separating compute from storage. Snowflake is a nearly serverless experience: the user only configures the size and number of compute clusters. Every compute cluster sees the same data, and compute clusters can be created and removed in seconds. BigQuery on demand is a pure serverless model, where the user submits queries one at a time and pays per query. BigQuery flat-rate is similar to Snowflake, except there is no concept of a compute cluster, just a configurable number of "compute slots."

Snowflake has several pricing tiers associated with different features; our calculations are based on the cheapest tier, "Standard." If you expect to use "Enterprise" or "Business Critical" for your workload, your cost will be 1.5x or 2x higher.

BigQuery's on-demand mode can be much more expensive, or much cheaper, than flat-rate mode, depending on the nature of your workload. A "steady" workload that utilizes your compute capacity 24/7 will be much cheaper in flat-rate mode. A "spiky" workload that contains periodic large queries interspersed with long periods of idleness or lower utilization will be much cheaper in on-demand mode.
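The on-demand versus flat-rate trade-off and the Snowflake tier multipliers described above can be sketched as a small cost model. This is only an illustration under stated assumptions: the prices in the usage example are hypothetical placeholders rather than vendor rates, the function names are invented for this sketch, and mapping 1.5x to "Enterprise" and 2x to "Business Critical" is an assumed reading of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices; substitute your vendor's current rates."""
    on_demand_per_tb: float     # dollars per TB scanned (assumed unit)
    flat_rate_per_month: float  # dollars per month for reserved capacity


def on_demand_cost(tb_scanned_per_month: float, pricing: Pricing) -> float:
    """Monthly cost when every query is billed by the data it scans."""
    return tb_scanned_per_month * pricing.on_demand_per_tb


def break_even_tb(pricing: Pricing) -> float:
    """TB scanned per month at which both modes cost the same."""
    return pricing.flat_rate_per_month / pricing.on_demand_per_tb


def cheaper_mode(tb_scanned_per_month: float, pricing: Pricing) -> str:
    """Return the cheaper mode for a given monthly scan volume."""
    if on_demand_cost(tb_scanned_per_month, pricing) < pricing.flat_rate_per_month:
        return "on-demand"
    return "flat-rate"


# Multipliers relative to the "Standard" tier. Assigning 1.5x to Enterprise
# and 2x to Business Critical is an assumption made for this sketch.
SNOWFLAKE_TIER_MULTIPLIER = {
    "Standard": 1.0,
    "Enterprise": 1.5,
    "Business Critical": 2.0,
}


def snowflake_cost(standard_cost: float, tier: str) -> float:
    """Scale a cost computed at the Standard tier to another tier."""
    return standard_cost * SNOWFLAKE_TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    # Placeholder numbers, chosen only to make the arithmetic easy to follow.
    pricing = Pricing(on_demand_per_tb=5.0, flat_rate_per_month=10_000.0)
    print("break-even TB/month:", break_even_tb(pricing))
    print("spiky workload, 100 TB/month:", cheaper_mode(100, pricing))
    print("steady workload, 5000 TB/month:", cheaper_mode(5000, pricing))
    print("Enterprise tier cost for $100 of Standard usage:",
          snowflake_cost(100.0, "Enterprise"))
```

A workload that scans less than the break-even volume in a month is the "spiky" case and favors on-demand; one that scans more is the "steady" case and favors flat-rate.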
We generated the TPC-DS data set at 1TB scale. TPC-DS has 24 tables in a snowflake schema; the tables represent web, catalog and store sales of an imaginary retailer. The largest fact table had 4 billion rows.

The TPC-DS queries are complex: they have lots of joins, aggregations and subqueries. We ran each query only once, to prevent the warehouse from caching previous results.

We set up each warehouse in a small and large configuration for the 100GB and 1TB scales.
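The practice of timing each query exactly once can be sketched with a tiny harness. This is a minimal illustration, not the benchmark's actual code: it uses Python's built-in `sqlite3` as a stand-in for a real warehouse, and the two-table schema, the data, and the names `build_toy_warehouse`, `run_once` and `run_benchmark` are all hypothetical. The sample query combines a join, an aggregation and a subquery, the features the text names.

```python
import sqlite3
import time


def build_toy_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory database: one fact table, one dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (item_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE store_sales (item_id INTEGER, quantity INTEGER, price REAL);
        INSERT INTO item VALUES (1, 'books'), (2, 'music'), (3, 'books');
        INSERT INTO store_sales VALUES
            (1, 2, 10.0), (2, 1, 20.0), (3, 4, 5.0), (1, 1, 10.0);
        """
    )
    return conn


# Hypothetical query with a join, an aggregation and a subquery.
QUERIES = {
    "revenue_by_category": """
        SELECT i.category, SUM(s.quantity * s.price) AS revenue
        FROM store_sales AS s
        JOIN item AS i ON i.item_id = s.item_id
        WHERE s.price <= (SELECT AVG(price) FROM store_sales)
        GROUP BY i.category
        ORDER BY i.category
    """,
}


def run_once(conn: sqlite3.Connection, sql: str):
    """Execute a query a single time and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


def run_benchmark(conn: sqlite3.Connection, queries: dict) -> dict:
    """Time every query exactly once, so no run can reuse an earlier result."""
    return {name: run_once(conn, sql)[1] for name, sql in queries.items()}


if __name__ == "__main__":
    conn = build_toy_warehouse()
    for name, seconds in run_benchmark(conn, QUERIES).items():
        print(f"{name}: {seconds:.6f} s")
```

Against a real warehouse the same loop would run over a client connection for that system, and any result caching the system offers would need to be checked separately.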


Over the last two years, the major cloud data warehouses have been in a near-tie for performance. Redshift and BigQuery have both evolved their user experience to be more similar to Snowflake. The market is converging around two key principles: separation of compute and storage, and flat-rate pricing that can "spike" to handle intermittent workloads.

Fivetran is a data pipeline that syncs data from apps, databases and file stores into our customers’ data warehouses. The question we get asked most often is, “What data warehouse should I choose?” In order to better answer this question, we’ve performed a benchmark comparing the speed and cost of four of the most popular data warehouses.

Benchmarks are all about making choices: What kind of data will I use? How much? What kind of queries? How you make these choices matters a lot: change the shape of your data or the structure of your queries and the fastest warehouse can become the slowest. We’ve tried to make these choices in a way that represents a typical Fivetran user, so that the results will be useful to the kind of company that uses Fivetran.

A typical Fivetran user might sync Salesforce, JIRA, Marketo, Adwords and their production Oracle database into a data warehouse. These data sources aren’t that large: a typical source will contain tens to hundreds of gigabytes. They are complex: they contain hundreds of tables in a normalized schema, and our customers write complex SQL queries to summarize this data.

The source code for this benchmark is also available.