Training Data Description
This page describes Step 2 in the dAIve web app.
Overview
This step displays two side-by-side visualization panels that help you understand the training dataset before you design a model. No configuration is needed: the panels render automatically from the inputs and outputs selected in Step 1.
Data Distribution
The left panel shows the Training Data Distribution visualization.
Use this panel to inspect:
- outliers and unusual values
- class balance (for classification targets)
- low-variance or near-constant columns that carry little information
Questions this panel helps answer:
- are some values clearly outside the expected operating range?
- is the target distribution usable for the planned task?
- do some columns look effectively constant?
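If you want to confirm numerically what the panel suggests, the three checks above can be sketched outside the app. This is a minimal, standard-library sketch and not how dAIve computes its visualization. The column names, the sample values, the Tukey IQR rule with `k=1.5`, and the `min_variance` threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import pvariance, quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def class_shares(labels):
    """Return the fraction of rows that belongs to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def near_constant_columns(columns, min_variance=1e-8):
    """Return names of numeric columns whose population variance is below the threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) < min_variance]


# Hypothetical example data; the column names are placeholders.
data = {
    "temperature": [20.1, 20.4, 19.8, 20.0, 20.3, 20.2, 19.9, 95.0],
    "pressure": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
labels = ["ok", "ok", "ok", "ok", "fault", "ok", "ok", "ok"]

outliers = iqr_outliers(data["temperature"])   # values far outside the bulk
shares = class_shares(labels)                  # class balance of the target
constant = near_constant_columns(data)         # effectively constant columns
```

A heavily skewed `shares` result or a non-empty `constant` list is a cue to look at the corresponding column in the panel again before moving on.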
Data Correlation
The right panel shows the Training Data Correlation visualization as a heatmap.
Use this panel to inspect:
- relationships between features
- relationships between features and the selected outputs
- multicollinearity patterns
Questions this panel helps answer:
- which inputs appear most related to the selected outputs?
- which inputs are strongly redundant with each other?
- do the relationships look plausible before training starts?
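The questions above can also be checked by hand on a small sample. The following is a minimal sketch that assumes Pearson correlation; the page does not state which correlation measure the heatmap uses, so treat that as an assumption. The column names, the sample values, and the `0.95` redundancy threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for input pairs with |r| at or above the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:  # comparisons with nan are False, so constant columns are skipped
            pairs.append((a, b, r))
    return pairs


# Hypothetical example data; the column names are placeholders.
inputs = {
    "flow_rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "flow_rate_copy": [2.1, 4.0, 6.2, 7.9, 10.0],
    "ambient_temp": [15.0, 22.0, 18.0, 25.0, 16.0],
}
output = [1.2, 1.9, 3.2, 3.8, 5.1]

# Inputs ordered from most to least related to the output (by absolute correlation).
ranked = sorted(inputs, key=lambda name: abs(pearson(inputs[name], output)), reverse=True)

# Input pairs that are strongly redundant with each other.
pairs = redundant_pairs(inputs)
```

Pearson correlation only captures linear relationships, so a weak value here does not rule out a nonlinear dependency that may still be visible in the data.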
When to revisit this step
Come back here when:
- the training file changed
- inputs or outputs changed
- a model behaves unexpectedly and data quality should be re-checked
This step is diagnostic only:
- it helps you understand the dataset
- it does not by itself change the model or training configuration
