Training Data Description

This page describes Step 2 in the dAIve web app.

Overview

This step displays two side-by-side visualization panels that help you understand the training dataset before designing a model. No configuration is needed — the panels render automatically based on the selected inputs and outputs from Step 1.

Data Distribution

The left panel shows the Training Data Distribution visualization.

Use this panel to inspect:

outliers and unusual values
class balance (for classification targets)
weak or low-variance columns

Questions this panel helps answer:

are some values clearly outside the expected operating range?
is the target distribution usable for the planned task?
do some columns look effectively constant?
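This page does not say which statistics the distribution panel computes. If you want to run similar checks on your training file outside the app, here is a minimal Python sketch that answers the three questions above. It is not dAIve's implementation. The function names, the 1.5 x IQR outlier rule, the min_std threshold, and the sample columns are all hypothetical.

```python
import statistics
from collections import Counter

def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

def low_variance_columns(columns, min_std=1e-9):
    """Return names of numeric columns whose population std dev is below min_std."""
    return [name for name, values in columns.items()
            if statistics.pstdev(values) < min_std]

def class_balance(labels):
    """Return each class label's share of the rows, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}

# Hypothetical example data: one list per feature column, plus a class target.
columns = {
    "temperature": [20.1, 20.4, 19.8, 20.0, 20.3, 85.0],
    "sensor_id": [7, 7, 7, 7, 7, 7],
    "pressure": [1.01, 1.02, 0.99, 1.00, 1.03, 1.01],
}
labels = ["ok", "ok", "ok", "ok", "ok", "fault"]

print(iqr_outliers(columns["temperature"]))   # [(5, 85.0)]
print(low_variance_columns(columns))          # ['sensor_id']
print(class_balance(labels))
```

The 1.5 x IQR fence is a common convention (Tukey's fences), not a dAIve setting. Tune k and min_std to your data's operating range before treating a flagged value or column as a real problem.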

Data Correlation

The right panel shows the Training Data Correlation visualization (heatmap).

Use this panel to inspect:

relationships between features
relationships to the selected outputs
multicollinearity patterns

Questions this panel helps answer:

which inputs appear most related to the selected outputs?
which inputs are strongly redundant with each other?
do the relationships look plausible before training starts?
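A correlation heatmap usually colors a matrix of pairwise correlation coefficients. This page does not say which coefficient dAIve uses, so the Python sketch below assumes Pearson's r. It computes the matrix, flags strongly redundant input pairs, and ranks inputs by how strongly they relate to an output. The function names, the 0.9 redundancy threshold, and the sample columns are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Correlation for every column pair: the values a heatmap would color."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}

def redundant_pairs(inputs, threshold=0.9):
    """Input pairs whose |r| meets the (assumed) threshold: multicollinearity candidates."""
    pairs = []
    for a, b in combinations(inputs, 2):
        r = pearson(inputs[a], inputs[b])
        if not math.isnan(r) and abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

def rank_by_output(inputs, output):
    """Inputs sorted by |r| with the output, strongest first; nan scores sort last."""
    scores = [(name, pearson(values, output)) for name, values in inputs.items()]
    return sorted(scores, key=lambda s: -1.0 if math.isnan(s[1]) else abs(s[1]),
                  reverse=True)

# Hypothetical inputs and one selected output.
inputs = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],  # nearly 2 * x1, so redundant with it
    "x3": [5, 3, 4, 1, 2],
}
y = [1.1, 2.0, 2.9, 4.2, 5.0]

print([(a, b) for a, b, _ in redundant_pairs(inputs)])  # [('x1', 'x2')]
print([name for name, _ in rank_by_output(inputs, y)])  # ['x1', 'x2', 'x3']
```

Pearson's r only measures linear relationships. A low value does not prove that an input is irrelevant. For relationships that are monotonic but non-linear, a rank correlation such as Spearman's is a common alternative.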

When to revisit this step

Come back here when:

the training file has changed
the selected inputs or outputs have changed
a model behaves unexpectedly and you want to re-check data quality

This step is diagnostic only:

it helps you understand the dataset
it does not by itself change the model or training configuration
