
Training Data Description

This page describes Step 2 in the dAIve web app.

Overview

This step displays two side-by-side visualization panels that help you understand the training dataset before you design a model. No configuration is needed: the panels render automatically from the inputs and outputs you selected in Step 1.

Data Distribution

The left panel shows the Training Data Distribution visualization.

Use this panel to inspect:

  • outliers and unusual values
  • class balance (for classification targets)
  • weak or low-variance columns

Questions this panel helps answer:

  • are some values clearly outside the expected operating range?
  • is the target distribution usable for the planned task?
  • do some columns look effectively constant?
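If you want to cross-check what the Distribution panel shows, you can run similar checks on a copy of the training file outside dAIve. The sketch below is an independent illustration, not how dAIve computes its panels. It assumes the columns are already loaded as Python lists, flags outliers with Tukey's fences (1.5 * IQR beyond the quartiles), reports class fractions, and treats a column as effectively constant when its standard deviation falls below an arbitrary threshold. The function names and thresholds are hypothetical.

```python
from collections import Counter
from statistics import pstdev, quantiles  # quantiles needs Python 3.8+


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def class_balance(labels):
    """Fraction of rows belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def low_variance_columns(columns, threshold=1e-8):
    """Names of numeric columns whose population standard deviation is below threshold."""
    return [name for name, vals in columns.items() if pstdev(vals) < threshold]


if __name__ == "__main__":
    print(iqr_outliers([10, 11, 12, 11, 10, 12, 11, 95]))  # [95]
    print(class_balance(["ok", "ok", "ok", "fault"]))  # {'ok': 0.75, 'fault': 0.25}
    print(low_variance_columns({"temp": [20.1, 20.5, 19.8], "flag": [1, 1, 1]}))  # ['flag']
```

An outlier flagged this way is only a candidate: a value outside the fences may still be a legitimate reading at the edge of the operating range.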

Data Correlation

The right panel shows the Training Data Correlation visualization (heatmap).

Use this panel to inspect:

  • relationships between inputs (features)
  • relationships to the selected outputs
  • multicollinearity patterns

Questions this panel helps answer:

  • which inputs appear most related to the selected outputs?
  • which inputs are strongly redundant with each other?
  • do the relationships look plausible before training starts?
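A similar offline check can be sketched for the correlation panel. This assumes the heatmap shows Pearson correlation, which this page does not state; dAIve may use a different coefficient. The helper names and the 0.9 redundancy threshold are hypothetical choices for illustration.

```python
from itertools import combinations
from math import isnan, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sx * sy)


def rank_inputs_by_output(inputs, output):
    """(name, r) pairs sorted by |r| with the output, strongest first; constant inputs are skipped."""
    scores = [(name, pearson(vals, output)) for name, vals in inputs.items()]
    scores = [(name, r) for name, r in scores if not isnan(r)]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def redundant_pairs(inputs, threshold=0.9):
    """Input pairs whose |r| meets the threshold (candidates for multicollinearity)."""
    pairs = []
    for (a, xa), (b, xb) in combinations(inputs.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:  # NaN compares False, so constant columns never match
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    inputs = {"speed": [1, 2, 3, 4, 5], "speed_x2": [2, 4, 6, 8, 10], "noise": [3, 1, 4, 1, 5]}
    output = [2.1, 3.9, 6.2, 8.0, 9.9]
    print(rank_inputs_by_output(inputs, output))
    print(redundant_pairs(inputs))  # speed and speed_x2 are perfectly correlated
```

Pearson's r captures only linear relationships, so a low value does not rule out a strong nonlinear dependence; a rank-based coefficient such as Spearman's is a common alternative when relationships are monotonic but curved.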

When to revisit this step

Come back here when:

  • the training file has changed
  • the selected inputs or outputs have changed
  • a model behaves unexpectedly and data quality should be re-checked

This step is diagnostic only:

  • it helps you understand the dataset
  • it does not by itself change the model or training configuration

dAIve customer documentation for web app and desktop app