Training Data Assessment
This page describes Step 5 in the desktop app.
Overview
This step provides two side-by-side analysis panels that help you assess the training data and model after a successful training run. Both analyses retrain the model under controlled conditions to produce diagnostic results.
Data Size Analysis
The left panel answers the question: do I need more data?
dAIve retrains the model on progressively smaller subsets of the training data and shows how performance changes as data volume increases.
Interpretation:
- curve still rising at the full data size: more data could help
- plateau: more data is unlikely to help much
- early sharp drop in error: the model learns basic patterns from little data
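The idea behind this analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not dAIve's implementation: `data_size_curve`, `interpret`, the `train_and_score` callable, and the `plateau_tol` threshold are all hypothetical names, and the score is assumed to be "higher is better".

```python
import random


def data_size_curve(rows, fractions, train_and_score, seed=0):
    """Retrain on nested random subsets; return (subset_size, score) pairs.

    train_and_score is a caller-supplied (hypothetical) callable that
    trains on the given rows and returns a validation score.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps subsets reproducible
    curve = []
    for frac in sorted(fractions):
        n = max(1, int(len(shuffled) * frac))
        curve.append((n, train_and_score(shuffled[:n])))
    return curve


def interpret(curve, plateau_tol=0.01):
    """Label the curve by the gain over its last step (assumed threshold)."""
    if len(curve) < 2:
        raise ValueError("need at least two points on the curve")
    last_gain = curve[-1][1] - curve[-2][1]
    return "rising" if last_gain > plateau_tol else "plateau"


# Toy stand-ins for real training: the score depends only on subset size.
def still_improving(subset):
    return min(0.9, len(subset) / 1000)


def saturated(subset):
    return min(0.9, len(subset) / 100)


rows = list(range(1000))
rising_curve = data_size_curve(rows, [0.25, 0.5, 1.0], still_improving)
flat_curve = data_size_curve(rows, [0.25, 0.5, 1.0], saturated)
```

With these toy scorers, `interpret(rising_curve)` returns `"rising"` and `interpret(flat_curve)` returns `"plateau"`. A real threshold would depend on the metric and on how noisy repeated training runs are.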
Use this analysis when:
- you are unsure whether the model is data-limited
- you need to decide between collecting more data and redesigning the model
- stakeholders ask whether more samples are worth the effort
Input Dropout Analysis
The right panel retrains the model with one input feature removed at a time and measures the impact on performance.
Use it to:
- find important features
- remove weak features
- explain model behavior
- find noisy or problematic columns
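A minimal sketch of this kind of leave-one-feature-out loop, assuming a caller-supplied `train_and_score` callable that returns a "higher is better" score. The function name, the callable, and the toy feature names are hypothetical and do not describe dAIve's internals.

```python
def input_dropout(features, train_and_score):
    """Rank features by how much the score drops when each one is removed."""
    baseline = train_and_score(list(features))
    impacts = {}
    for feat in features:
        reduced = [f for f in features if f != feat]
        # Positive impact: removing the feature hurt performance.
        impacts[feat] = baseline - train_and_score(reduced)
    return sorted(impacts.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for real training: each feature adds a fixed amount to the score.
# The contributions below are made-up illustration values.
CONTRIBUTION = {"temperature": 0.30, "pressure": 0.05, "row_id": -0.02}


def toy_score(feature_subset):
    return 0.5 + sum(CONTRIBUTION[f] for f in feature_subset)


ranking = input_dropout(list(CONTRIBUTION), toy_score)
```

In this toy setup `temperature` ranks first, and `row_id` ends up with a negative impact, meaning the score improves without it. Features with near-zero or negative impact are the natural candidates for removal.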
Good follow-up actions:
- remove clearly unhelpful inputs
- return to Step 1 to simplify the feature set
- retrain after any significant change to the feature selection
Availability
This step is only available after Step 4 has completed successfully.
Retraining the model invalidates the existing assessment, so the analysis always matches the current model.
Compute credits
Both analyses consume compute credits. The credit cost depends on the model type:
- NN models (FNN/RNN): 2.0 credits per analysis run
- RF/XGB models: 1.0 credits per analysis run
The per-run cost is multiplied by the run count. In batch mode, each trained model counts as a separate run. For optimizer runs, each trial counts. For cross-validation, each fold or repeat counts.
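As a worked example, the cost of one analysis can be estimated from the rates above. The helper below is a hypothetical illustration, not part of the app. It takes the run count as an input because how nested setups (for example folds combined with repeats) add up to a run count is not specified here.

```python
# Per-run rates as listed above.
CREDITS_PER_RUN = {"FNN": 2.0, "RNN": 2.0, "RF": 1.0, "XGB": 1.0}


def analysis_credits(model_type, run_count=1):
    """Estimate credits for one analysis: per-run rate times run count."""
    if run_count < 1:
        raise ValueError("run_count must be at least 1")
    return CREDITS_PER_RUN[model_type] * run_count


fnn_batch = analysis_credits("FNN", run_count=5)  # 5 runs at 2.0 credits each
xgb_trials = analysis_credits("XGB", run_count=3)  # 3 runs at 1.0 credit each
```

Here `fnn_batch` is 10.0 credits and `xgb_trials` is 3.0 credits. Running both panels for the same model would incur this cost once per analysis.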
