CDP Public Cloud Preview Features

The information in these pages is released as part of a preview for the features described. Access to preview features is provided upon request to customers for trial and evaluation. The components are provided ‘as is’ without warranty or support. Further, Cloudera assumes no liability for the use of preview components, which should be used by customers at their own risk. Please contact your Cloudera account team to have a preview feature enabled in your CDP account.

Data Engineering

Using JVM debugger with Apache Spark jobs
published: 2022-11-23; modified: 2022-11-23
Learn how to remotely connect a JVM debugger to the driver or executors of Spark jobs.
Using Custom Spark Runtime Docker Images via API/CLI
published: 2022-09-06; modified: 2022-07-31
Learn how to run Spark jobs using custom Spark runtime Docker images via API/CLI.
CDE In-place Upgrades
published: 2022-07-20; modified: 2022-12-06
Cloudera Data Engineering (CDE) supports upgrades from two versions prior to the latest available version.

Data Hub

Schedule-based Autoscaling for Data Hub Clusters Using Impala
published: 2023-11-09; modified: 2023-11-07
Schedule-based autoscaling for Data Hub clusters using Impala scales the number of nodes in an executor host group up or down according to a schedule that you define.
SQL AI Assistant in Data Hub
published: 2024-02-29; modified: 2024-02-29
Learn how to set up and use the Hue SQL AI Assistant in Data Hub. Generate, optimize, edit, and fix SQL statements based on natural language input prompts.

Data Warehouse

Using Hive Data Connectors to support External Data Sources
published: 2023-11-20; modified: 2023-11-20
Learn how you can achieve SQL query federation by using Hive data connectors to map databases present in external data sources to a local Hive Metastore (HMS).
SQL AI Assistant in CDW
published: 2023-11-20; modified: 2023-11-28
Learn how to set up and use the Hue SQL AI Assistant in CDW. Generate, optimize, edit, and fix SQL statements based on natural language input prompts.
Deploying Hue at Environment Level
published: 2023-11-20; modified: 2023-11-28
Learn about the advantages and upgrade limitations of deploying Hue at the environment level, and review FAQs that explain the feature in more detail.
Reserving nodes for auto-scaling
published: 2022-07-26; modified: 2022-07-26
To speed up Virtual Warehouse startup and autoscaling, you can keep a number of compute instances on standby. You configure extra buffer nodes that stand by, ready to join a new compute cluster or an autoscaled cluster.

Governance

Integrating CDP Data Catalog with AWS Glue Data Catalog
published: 2021-08-09; modified: 2021-12-08
When you use AWS Glue with Data Catalog, you can see a complete snapshot view of the metadata, along with other visible attributes that can support your data governance capabilities.
Navigating to tables and databases in Hue using Data Catalog
published: 2021-08-07; modified: 2021-08-07
The integration between Data Catalog and Cloudera Data Warehouse (CDW) service provides a direct web link to the Hue instance from the Data Catalog web UI, making it easy to navigate across services.
Support for CDP Private Cloud Base clusters in Data Catalog
published: 2022-02-24; modified: 2022-04-06
Data Catalog now supports discovering and profiling assets that reside in CDP Private Cloud Base clusters.
Supporting High Availability for Profiler services
published: 2021-08-07; modified: 2021-08-07
High Availability (HA) is now supported for the Data Catalog profiler services.
Transitioning Profiler Manager Service into SDX
published: 2022-02-24; modified: 2022-02-24
The Profiler Manager Service has moved to the SDX infrastructure.

Machine Learning

Model Registry
published: 2023-01-31; modified: 2023-05-02
Cloudera Machine Learning now features Model Registry, which stores and manages machine learning models and their associated metadata.
Private Cluster Support
published: 2022-01-06; modified: 2023-07-17
Private Clusters provide a simple way to create a secure cluster, where the API server and the workloads themselves only rely on private IP addresses that are not accessible from the internet.
CMK Encryption on AWS
published: 2021-08-10; modified: 2022-08-10
Cloudera Machine Learning on AWS can now use a Customer Master Key (CMK) to encrypt data.
Retry Workspace Installation
published: 2023-04-26; modified: 2023-04-26
When workspace provisioning encounters a problem, you can restart the provisioning process from the point where it stopped.

Management Console

Horizontal scaling for the Data Lake
published: 2024-03-22; modified: 2024-03-22
An enterprise Data Lake can be scaled horizontally, meaning that you can add instances to dedicated host groups for some services.
Disk Vertical Scaling — Disk Type Change and Resizing in AWS
published: 2023-12-12; modified: 2023-12-12
The standard magnetic storage disks attached to Data Lake and Data Hub clusters can be changed or resized without downtime.
Rolling Data Lake Upgrades
published: 2023-07-11; modified: 2023-07-11
A Data Lake rolling upgrade allows you to upgrade the Data Lake Runtime and OS without stopping attached Data Hubs or Data Services.
GCS Fine-Grained Access Control
published: 2023-09-23; modified: 2023-12-13
Register a GCP environment with Ranger Authorization Service (RAZ) enabled to allow Google Cloud Storage (GCS) users to use fine-grained access policies and audit capabilities available in Apache Ranger.
Cluster Orchestrator Component Password Rotation
published: 2023-03-02; modified: 2023-03-02
If required, you can use the CDP CLI to manually rotate the cluster orchestrator component password.
Disabling S3Guard in an Existing CDP Environment
published: 2022-10-05; modified: 2022-10-05
You may need to disable S3Guard when upgrading your Data Lakes or Data Hubs. Use the Beta CDP CLI to disable S3Guard in an existing CDP environment.
Azure VM Encryption at Host
published: 2022-06-06; modified: 2022-06-06
You can optionally enable encryption at host for Data Lake, FreeIPA, and Data Hubs. Currently, you need to enable it individually for each virtual machine (VM) in the Azure portal.
New UI for adding a CDP Private Cloud Base cluster
published: 2022-03-29; modified: 2022-03-29
Register a CDP Private Cloud Base cluster as a classic cluster using Cloudera Manager and Knox endpoints so that you can use this cluster in Replication Manager and Data Catalog services.

Replication Manager

Snapshot Policies in Replication Manager
published: 2022-02-25; modified: 2022-02-25
You can create HDFS and HBase snapshot policies in Replication Manager to take snapshots of snapshottable HDFS directories and HBase tables at regular intervals. An HDFS directory is snapshottable when it, or one of its parent directories, has been enabled for snapshots in Cloudera Manager.