8 Cloud Data Warehousing Solutions


Cloud Data Warehousing Solutions 2015

Cloud data warehousing solutions offer several potential advantages, the most important being resource elasticity. The demands made on a data warehouse vary enormously, and a scalable cloud environment is an excellent answer to that variability. All the big suppliers are active in this market, but some innovative solutions come from new players such as BitYota and Snowflake.

Amazon Redshift is a fully managed, petabyte-scale data warehouse that delivers fast query performance through columnar storage. It works with standard PostgreSQL JDBC and ODBC drivers, so familiar SQL clients can be used, and data load speed scales linearly with cluster size. Redshift lets users automate most of the common administrative tasks involved in provisioning, configuring and monitoring a cloud data warehouse. Backups to Amazon S3 are continuous, incremental and automatic, and enabling cross-region disaster recovery takes just a few clicks. Users can encrypt data at rest and in transit using hardware-accelerated AES-256 and SSL, isolate clusters with Amazon VPC, and even manage keys with hardware security modules (HSMs). All API calls, connection attempts, queries and changes to the cluster are logged and auditable.
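To see why columnar storage helps analytic queries, picture a small table stored in two ways. The Python sketch below is a simplified illustration under stated assumptions, not Redshift's actual storage engine. The table, its column names and the "values read" counters are all hypothetical. To aggregate a single column, a row layout has to read every field of every row. A column layout reads only the values of that one column.

```python
# Hypothetical three-column sales table, used only for illustration.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
    {"order_id": 4, "region": "APAC", "amount": 210.25},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column (a column layout)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_row_store(rows, column):
    """Sum one column from a row layout; every field of every row is read."""
    total = 0.0
    values_read = 0
    for row in rows:
        # A row store fetches whole rows, so all fields are touched.
        values_read += len(row)
        total += row[column]
    return total, values_read


def sum_column_store(columns, column):
    """Sum one column from a column layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


cols = to_columns(ROWS)
print(sum_row_store(ROWS, "amount"))     # (435.75, 12)
print(sum_column_store(cols, "amount"))  # (435.75, 4)
```

Both layouts return the same total. The row layout reads 12 values to get it, while the column layout reads 4. Production column stores typically also compress each column, which widens the gap further. The sketch shows only the basic I/O difference.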

BitYota's Data Warehouse as a Service (DWS) can be hosted on several cloud providers, including Microsoft Azure and AWS. It is fully managed: BitYota monitors, manages, provisions and scales the data platform as needed. Support for standard ANSI SQL lets users query their data in a familiar language. For complex analysis, including joins, aggregations and window functions, users have access to SQL OLAP operators as well as script-based user-defined functions (UDFs). They can also connect their favorite BI tools (such as Tableau, SQL Workbench, Looker and Power BI) to BitYota DWS through standard ODBC/JDBC APIs to analyze data and visualize the results. BitYota DWS is architected to keep semi-structured data in its native format during loading, so it does not require a transformation that loses data fidelity. The user interface lets users visualize and explore hierarchies and complex relationships in their data (for example, nested arrays within JSON documents).
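A nested JSON document shows the fidelity point most clearly. In the Python sketch below, one loader forces each record into fixed scalar columns, and in doing so it silently drops the nested array. Keeping the document in its native form preserves that array for path-based queries. The sample record, the `flatten_to_columns` and `get_path` helpers and the dotted-path syntax are hypothetical illustrations, not BitYota's API.

```python
import json

# Hypothetical raw event as it might arrive from an application log.
RAW = ('{"user": "u42", "country": "DE", '
       '"orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}')

# Types that fit into a single relational column in this sketch.
SCALARS = (str, int, float, bool, type(None))


def flatten_to_columns(doc, schema):
    """Force a document into a fixed schema of scalar columns.

    Non-scalar values (such as nested arrays) are replaced with None,
    which is exactly the loss of fidelity a native-format load avoids.
    """
    row = {}
    for column in schema:
        value = doc.get(column)
        row[column] = value if isinstance(value, SCALARS) else None
    return row


def get_path(doc, path):
    """Look up a dotted path such as 'orders.1.sku' in a native JSON document."""
    node = doc
    for part in path.split("."):
        # Numeric parts index into arrays; other parts index into objects.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


doc = json.loads(RAW)
row = flatten_to_columns(doc, ["user", "country", "orders"])
print(row)                             # {'user': 'u42', 'country': 'DE', 'orders': None}
print(get_path(doc, "orders.1.sku"))   # B7
print(sum(o["qty"] for o in doc["orders"]))  # 3
```

The flattened row has no record of which items the user ordered. The native document can still answer questions about the nested orders.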

IBM's dashDB is a fully managed cloud data warehouse that distinguishes itself through in-memory processing and in-database analytics. The in-memory processing uses IBM's BLU Acceleration technology, which keeps the data being processed in memory. dashDB comes with Netezza analytics (for example linear regression, decision trees and k-means clustering) as well as R for predictive analytics. So it is more than just a database, and other tools, such as IBM Cognos for reporting, are available if needed. dashDB can pull in both structured and unstructured data (from IBM's Cloudant JSON data store, for example), and many other data sources are supported. The platform is deployed on IBM's SoftLayer cloud infrastructure with multiple layers of security, plus encryption if needed. Users can get their feet wet with the free service, which is limited to 1 GB of storage, and several enterprise subscription levels are available.

Microsoft Modern Data Warehouse offers the option to deploy directly to the cloud with the elastic scalability of Windows Azure. SQL Server Enterprise for data warehousing can be installed and hosted on Windows Azure Virtual Machines from an image that applies best practices from the Fast Track reference architecture to tune SQL Server for data warehousing in Windows Azure. Users can provision a highly tuned data warehouse image within minutes, without knowing Azure storage configurations or needing expertise in optimizing SQL Server for data warehousing workloads. This is an ideal solution for customers who want to deploy a data warehouse quickly without managing hardware infrastructure. Users can also deploy non-relational Hadoop data in the cloud using the HDInsight Service on Windows Azure. HDInsight provides a Hadoop solution that can seamlessly process data of all types through Microsoft's modern data platform, offering a simple, easily managed, enterprise-ready Hadoop service in the cloud.

A cloud deployment of SAP Business Warehouse brings the fast in-memory capabilities of SAP HANA to analytics and other workloads, together with simplified administration and warehouse management.

Snowflake has built a new data warehouse from the ground up as a software service. Because Snowflake can directly load not just structured data but also semi-structured data in formats including JSON and Avro, users don't need to spend time transforming and converting data to make it ready to load. Nor do they need to worry about the traditional problem of finding an idle window in which to load: Snowflake's architecture lets users load data whenever they need to, without any performance impact on other users and workloads. The environment is fully elastic, scaling storage and compute up and down as needed, and, very importantly, it can accommodate any number of users without the usual contention and performance degradation.

Teradata provides its data warehouse, the Aster discovery platform and Hadoop on hosted infrastructure. Its Data Warehouse as a Service is available in the US and includes core database functionality, workload management, and Teradata tools and utilities. Discovery as a Service (the Aster Discovery Platform) includes the Aster MR (MapReduce) Analytical Foundation, Aster MR Analytical Premium (Path/Pattern, Statistics, Relationship) and Aster Graph Analytics. Each service includes:

  • Cloud Foundation: for cloud infrastructure functions such as hardware and software monitoring and maintenance, backup and recovery, and resource provisioning.
  • Enhanced Services: optional services for implementing advanced database features (such as temporal and SQL-H) and additional network bandwidth options.
  • Consulting Services: optional services for advising on analytic best practices, migrating from existing databases, database administration, data quality, implementing applications (Teradata and 3rd party) and providing analytic access to BI applications. Teradata Managed Services, which provide operational and development support, are also a critical component of the consulting service capabilities.

Vertica on Demand from HP runs the fast Vertica column-store database on elastic infrastructure. Integration is a strength: users can seamlessly connect to Apache Hadoop, R and a range of ETL and BI tools. Both standard and advanced SQL are supported, and Vertica Flex Zone allows exploration of semi-structured data. Advanced analytics tools enable sentiment analysis, geospatial analysis and predictive analytics.