Training Data Assessment

This page describes Step 5 in the desktop app.

Overview

This step provides two side-by-side analysis panels that help you assess the training data and model after a successful training run. Both analyses retrain the model under controlled conditions to produce diagnostic results.

Data Size Analysis

The left panel answers the question: do I need more data?

dAIve retrains the model on progressively smaller subsets of the training data and plots how performance changes as the data volume increases.

Interpretation:

rising curve: more data could help
plateau: more data is unlikely to help much
early sharp drop: the model learns basic patterns quickly
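
The idea behind this analysis can be sketched outside the app as a small learning-curve experiment. The code below is only an illustration under stated assumptions, not dAIve's implementation: the synthetic data, the single-input least-squares model, the subset fractions, and all function names (make_data, fit_line, data_size_curve) are hypothetical stand-ins. It reports test error, so an improving model shows up as a falling value; which metric the panel itself plots is not assumed here.

```python
import random


def make_data(n, seed):
    """Synthetic data: y = 3x + 1 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single input; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared prediction error of the fitted line."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def data_size_curve(train, test, fractions=(0.05, 0.125, 0.25, 0.5, 1.0)):
    """Retrain on subsets of increasing size; return [(subset_size, test_error), ...]."""
    train_x, train_y = train
    test_x, test_y = test
    curve = []
    for fraction in fractions:
        # Keep at least two points so a line can be fitted.
        size = max(2, round(fraction * len(train_x)))
        model = fit_line(train_x[:size], train_y[:size])
        curve.append((size, mean_squared_error(model, test_x, test_y)))
    return curve


curve = data_size_curve(make_data(200, seed=0), make_data(100, seed=1))
for size, error in curve:
    print(f"{size:4d} samples -> test MSE {error:.3f}")
```

If the error keeps falling between the largest subsets, the model is probably still data-limited. If the last few points are nearly flat, the curve has reached the plateau described above.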

Use this analysis when:

you are unsure whether the model is data-limited
you need to decide between collecting more data and redesigning the model
stakeholders ask whether more samples are worth the effort

Input Dropout Analysis

The right panel removes one feature at a time and measures the impact on performance.

Use it to:

find important features
remove weak features
explain model behavior
find noisy or problematic columns
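
The one-feature-at-a-time procedure can be sketched as follows. This is a minimal illustration, not the app's code: the k-nearest-neighbour regressor is a stand-in chosen only because it needs no external libraries, the data is synthetic, and every name (make_rows, evaluate, input_dropout) is hypothetical. Each feature's impact is the change in test error when that column is removed from both training and test data.

```python
import random


def make_rows(n, seed):
    """Synthetic rows: x0 matters a lot, x1 a little, x2 is pure noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [5.0 * row[0] + 0.5 * row[1] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


def knn_predict(train_X, train_y, row, k=5):
    """Average the targets of the k training rows closest to `row`."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], row)),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def evaluate(train_X, train_y, test_X, test_y):
    """Mean squared error of the stand-in model on the test rows."""
    errors = [
        (knn_predict(train_X, train_y, row) - target) ** 2
        for row, target in zip(test_X, test_y)
    ]
    return sum(errors) / len(errors)


def drop_column(X, index):
    """Return a copy of X without the column at `index`."""
    return [row[:index] + row[index + 1:] for row in X]


def input_dropout(names, train_X, train_y, test_X, test_y):
    """Return (baseline_error, {feature_name: error_without_it - baseline})."""
    baseline = evaluate(train_X, train_y, test_X, test_y)
    impact = {}
    for index, name in enumerate(names):
        reduced_error = evaluate(
            drop_column(train_X, index), train_y,
            drop_column(test_X, index), test_y,
        )
        impact[name] = reduced_error - baseline
    return baseline, impact


names = ["x0", "x1", "x2"]
train_X, train_y = make_rows(150, seed=0)
test_X, test_y = make_rows(50, seed=1)
baseline, impact = input_dropout(names, train_X, train_y, test_X, test_y)
for name in sorted(impact, key=impact.get, reverse=True):
    print(f"{name}: error change {impact[name]:+.3f}")
```

A large positive change marks an important feature. A change near zero, or a negative one, marks a candidate for removal.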

Good follow-up actions:

remove clearly unhelpful inputs
return to Step 1 to simplify the feature set
retrain after any important feature selection change

Availability

This step is available only after Step 4 has completed successfully.

Retraining invalidates the assessment so that the analysis always matches the current model.

Compute credits

Both analyses consume compute credits. The credit cost depends on the model type:

NN models (FNN/RNN): 2.0 credits per analysis run
RF/XGB models: 1.0 credits per analysis run

Credits are multiplied by the run count. In batch mode, each trained model counts as a separate run. For optimizer runs, each trial counts. For cross-validation, each fold/repeat counts.
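
The credit arithmetic above can be worked through with a small helper. This is a sketch for estimating cost, not an API of the app: the function name analysis_credits and the lookup table are hypothetical, and the cross-validation example assumes that the run count is folds multiplied by repeats, which is one reading of "each fold/repeat counts".

```python
# Per-run credit cost by model type, as listed above.
CREDITS_PER_RUN = {"FNN": 2.0, "RNN": 2.0, "RF": 1.0, "XGB": 1.0}


def analysis_credits(model_type, run_count=1):
    """Estimated credits for one analysis: per-run cost times run count."""
    if run_count < 1:
        raise ValueError("run_count must be at least 1")
    return CREDITS_PER_RUN[model_type] * run_count


# Single RF model: 1.0 credits.
print(analysis_credits("RF"))
# FNN optimizer run with 20 trials: 2.0 * 20 = 40.0 credits.
print(analysis_credits("FNN", run_count=20))
# XGB with 5 folds and 3 repeats (assumed 15 runs): 1.0 * 15 = 15.0 credits.
print(analysis_credits("XGB", run_count=5 * 3))
```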
