Model Design

This page describes Step 3 in the desktop app.

Desktop-specific note:

desktop use is not tied to profile plans, Stripe billing, or workspace access checks
all features (Batch Mode, Advanced Model Design, Optimizer) are available without plan restrictions

Structure of the page

The page is built from a sidebar for model selection and a main panel for configuration:

Single Model or Batch Mode (sidebar)
model-specific hyperparameters (main panel)
Validation & Reliability (below the hyperparameters)

Recommended order:

choose single model or batch mode
choose the model family
decide between Basic and Advanced
configure validation and data split
enable the optimizer only after you have a usable baseline

Single Model vs Batch Mode

Single Model

Use this mode for one concrete configuration.

Available model families:

Random Forest (abbreviated RF on small screens)
XGBoost
Forward Neural Network (abbreviated FNN on small screens)
Recurrent Neural Network (abbreviated RNN on small screens) — only when Time Series is active

Batch Mode

Use this mode to compare multiple configurations in one run.

Plan restriction

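Where plan restrictions apply, Batch Mode is only available on the Pro plan: on the Basic plan, the button shows a lock overlay and clicking it prompts an upgrade. In the desktop app, Batch Mode is available without plan restrictions.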

In batch mode, you build a set of model configurations as cards:

select a model type from the dropdown (RF, XGBoost, FNN, or RNN when time series is active)
select a task type (Classification, Regression, or Mixed model)
configure the hyperparameters for that configuration
add the configuration to the set
repeat to compare multiple configurations side by side

Batch mode is useful when:

you want to compare model families directly
you want several hyperparameter variants in one training campaign
you need a more systematic shortlist before choosing one final model
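
One way to picture the set of cards is as a list of small records, one per configuration. The sketch below is illustrative only: the `BatchCard` class, its fields, and the `add_card` helper are hypothetical names, not dAIve's internal format.

```python
from dataclasses import dataclass, field

# Model and task types as listed in the batch mode dropdowns.
MODEL_TYPES = {"RF", "XGBoost", "FNN", "RNN"}
TASK_TYPES = {"Classification", "Regression", "Mixed"}


@dataclass
class BatchCard:
    """Hypothetical record for one configuration card."""
    model_type: str
    task_type: str
    hyperparameters: dict = field(default_factory=dict)


def add_card(cards, card, time_series_active=False):
    """Append a card after checking it against the documented choices."""
    if card.model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {card.model_type}")
    if card.task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {card.task_type}")
    if card.model_type == "RNN" and not time_series_active:
        raise ValueError("RNN is only offered when time series is active")
    cards.append(card)
    return cards


# Two configurations to compare side by side in one run.
cards = []
add_card(cards, BatchCard("RF", "Regression", {"n_estimators": 200}))
add_card(cards, BatchCard("XGBoost", "Regression",
                          {"n_estimators": 200, "learning_rate": 0.1}))
```

The `n_estimators` value of 200 is an arbitrary example, not a documented default.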

Model families

Random Forest

Basic controls:

Number of Trees (n_estimators)

Advanced controls add:

Max Depth (default: 10)

XGBoost

Basic controls:

Number of Trees (n_estimators)

Advanced controls add:

Learning Rate (default: 0.1)
Max Depth (default: 6, optional Unlimited)
Subsample (default: 1.0)

Forward Neural Network

Basic controls:

Layers (1–5)
Neurons (1, 2, 3, 4, 5, 8, 16, 32, 64, 128, 256, 512)

Advanced controls add:

Learning Rate (default: 0.01)
Epochs (default: 300)
Batch Size (default: 32)
Optimizer
- Classification: fixed to adam
- Regression / Mixed: adam, adam weight decay, or stochastic gradient descent
Numeric Loss
- shown for Regression and Mixed
- options: mean squared error, mean absolute error, mean absolute percentage error, huber
Categorical Loss
- shown for Classification and Mixed
- Classification: fixed to categorical crossentropy
- Mixed: fixed to sparse categorical crossentropy
Dropout Rate (default: 0)
Normalization (default: on)
Batch Normalization (default: off)
per-layer architecture editing

When you switch the optimizer to stochastic gradient descent, dAIve sets the learning-rate field to 0.001 as a starting point. The adam weight decay option is executed as an AdamW-style optimizer in the backend.
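
The optimizer and loss rules above depend on the task type, which can be easier to read as a lookup. The following is a minimal sketch of those rules as described on this page; the function names `allowed_choices` and `learning_rate_after_switch` are hypothetical and do not come from dAIve.

```python
# Numeric loss options listed for Regression and Mixed tasks.
NUMERIC_LOSSES = [
    "mean squared error",
    "mean absolute error",
    "mean absolute percentage error",
    "huber",
]
ALL_OPTIMIZERS = ["adam", "adam weight decay", "stochastic gradient descent"]


def allowed_choices(task_type):
    """Return the optimizer and loss options documented for a task type."""
    if task_type == "Classification":
        return {
            "optimizers": ["adam"],  # fixed
            "numeric_loss": [],      # not shown
            "categorical_loss": ["categorical crossentropy"],  # fixed
        }
    if task_type == "Regression":
        return {
            "optimizers": ALL_OPTIMIZERS,
            "numeric_loss": NUMERIC_LOSSES,
            "categorical_loss": [],  # not shown
        }
    if task_type == "Mixed":
        return {
            "optimizers": ALL_OPTIMIZERS,
            "numeric_loss": NUMERIC_LOSSES,
            "categorical_loss": ["sparse categorical crossentropy"],  # fixed
        }
    raise ValueError(f"unknown task type: {task_type}")


def learning_rate_after_switch(new_optimizer, current_rate):
    """Mirror the documented reset to 0.001 when SGD is selected."""
    if new_optimizer == "stochastic gradient descent":
        return 0.001
    return current_rate
```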

Recurrent Neural Network

Basic controls:

RNN Type (default: LSTM; options: LSTM, GRU, SimpleRNN)
Layers (1–5)
Neurons
Sequence Length (default: 10)
Bidirectional (default: off)
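
This page does not spell out how Sequence Length is applied. The sketch below assumes the usual meaning for recurrent models: each training sample is a window of that many consecutive time steps, paired with the value that follows it. The `make_windows` helper is hypothetical and only illustrates that assumption.

```python
def make_windows(series, sequence_length=10):
    """Turn a 1-D series into (window, next_value) pairs.

    Assumption: a window holds `sequence_length` consecutive steps and
    the target is the step immediately after the window.
    """
    pairs = []
    for start in range(len(series) - sequence_length):
        window = series[start:start + sequence_length]
        target = series[start + sequence_length]
        pairs.append((window, target))
    return pairs


# 15 time steps with the default length of 10 give 5 samples.
series = list(range(15))
pairs = make_windows(series, sequence_length=10)
```

Under this assumption, a longer sequence gives the network more history per sample but leaves fewer samples from the same series.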

Advanced controls add:

Learning Rate (default: 0.01)
Epochs (default: 300)
Batch Size (default: 32)
Optimizer
- Classification: fixed to adam
- Regression / Mixed: adam, adam weight decay, or stochastic gradient descent
Numeric Loss
- shown for Regression and Mixed
- options: mean squared error, mean absolute error, mean absolute percentage error, huber
Categorical Loss
- shown for Classification and Mixed
- Classification: fixed to categorical crossentropy
- Mixed: fixed to sparse categorical crossentropy
Dropout Rate (default: 0)
Recurrent Dropout Rate (default: 0)
Normalization (default: on)
Batch Normalization (default: off)
per-layer architecture editing

Choosing a starting model

start with Random Forest or XGBoost for general tabular work
move to Forward Neural Network when you need more flexible nonlinear modeling
use Recurrent Neural Network only when time order is central to the task

Basic vs Advanced

Basic

fewer controls
best for first baselines
faster setup

Advanced

full parameter control
custom cross-validation controls (folds, repeats)
deeper neural network settings (per-layer architecture, dropout, normalization)

Plan restriction

Where plan restrictions apply, Advanced Model Design is only available on the Pro plan. In the desktop app, it is available without plan restrictions.

Practical recommendation:

use Basic for the first baseline
switch to Advanced only when you already know what needs improvement

Optimizer

The Optimizer toggle enables automated hyperparameter optimization powered by Optuna.

Plan restriction

Where plan restrictions apply, the Optimizer (Optuna) is only available on the Pro plan. In the desktop app, it is available without plan restrictions.

Basic optimizer mode

Basic optimizer mode displays a set of preset cards. Each preset defines a trial count and a search configuration. Select a preset to apply it.

Advanced optimizer mode

Exposes full study settings:

Trials — number of optimization trials (1–1000)
Timeout (seconds) — optional time limit
Sampler — Tree-Structured Parzen (default), Random, or Grid
Pruner — Median or Disabled
Random Seed — optional reproducibility seed
Optimize Metric — the metric Optuna optimizes against
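
The study settings above can be summarized as a small record with range checks. This is a sketch under assumptions: `StudySettings` and `validate` are hypothetical names, the metric name `rmse` in the usage is a placeholder, and the rule that a timeout must be positive is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

SAMPLERS = {"TPE", "Random", "Grid"}
PRUNERS = {"Median", "Disabled"}


@dataclass
class StudySettings:
    """Hypothetical container for the advanced optimizer fields."""
    trials: int
    pruner: str
    optimize_metric: str
    sampler: str = "TPE"                     # documented default
    timeout_seconds: Optional[float] = None  # optional time limit
    random_seed: Optional[int] = None        # optional reproducibility seed


def validate(settings):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if not 1 <= settings.trials <= 1000:
        problems.append("trials must be between 1 and 1000")
    if settings.sampler not in SAMPLERS:
        problems.append(f"unknown sampler: {settings.sampler}")
    if settings.pruner not in PRUNERS:
        problems.append(f"unknown pruner: {settings.pruner}")
    if not settings.optimize_metric:
        problems.append("an optimize metric is required")
    # Assumption: a timeout, when given, has to be a positive number.
    if settings.timeout_seconds is not None and settings.timeout_seconds <= 0:
        problems.append("timeout must be positive when set")
    return problems


ok = validate(StudySettings(trials=50, pruner="Median", optimize_metric="rmse"))
bad = validate(StudySettings(trials=0, pruner="Median", optimize_metric="rmse"))
```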

Advanced mode exposes sampler-specific search controls:

TPE and Random
- use min / max / step ranges for scalar parameters
- for FNN/RNN, use Network Structure Search with min/max layers and neurons
Grid
- accepts only Explicit Values for scalar parameters (no min / max / step ranges)
- values are entered as semicolon-separated lists such as 0.001; 0.01; 0.05
- decimal values use . as the decimal separator
- for FNN/RNN, grid search uses Network Structure Choices instead of min/max architecture ranges
- each architecture is entered on its own line, for example:

[8, 8]
[16, 16]
[32, 16, 8]
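
To make the two grid input formats concrete, here is a minimal parsing sketch. It only illustrates the formats described above; `parse_explicit_values` and `parse_architectures` are hypothetical helpers, not dAIve's parser.

```python
import ast


def parse_explicit_values(text):
    """Parse a semicolon-separated list such as '0.001; 0.01; 0.05'.

    float() expects '.' as the decimal separator, so '0,01' is rejected.
    """
    return [float(part) for part in text.split(";") if part.strip()]


def parse_architectures(text):
    """Parse one architecture per line, e.g. '[32, 16, 8]'."""
    architectures = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between entries
        layers = ast.literal_eval(line)
        valid = isinstance(layers, list) and all(
            isinstance(n, int) and n > 0 for n in layers
        )
        if not valid:
            raise ValueError(f"not a list of positive integers: {line}")
        architectures.append(layers)
    return architectures


values = parse_explicit_values("0.001; 0.01; 0.05")
architectures = parse_architectures("[8, 8]\n[16, 16]\n[32, 16, 8]")
# A full grid over both inputs would have this many combinations.
grid_size = len(values) * len(architectures)
```

Because a grid search tries every combination, each extra value or architecture multiplies the number of trials.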

For RNN models, an additional Sequence Options section lets you choose which RNN types to include in the search (LSTM, GRU, SimpleRNN).

If the current Optuna configuration is incomplete or invalid, dAIve shows a warning box until the missing values are fixed.

Use the optimizer after:

the dataset and target setup are stable
you already ran at least one manual baseline
you know which model family is worth tuning further

Validation & Reliability

Fast

simple holdout validation
best for quick iteration

Use Fast when:

you are still exploring the setup
runtime matters more than robustness
you want the quickest feedback loop

When the optimizer is enabled in fast mode, Optuna runs in holdout mode without cross-validation.

Robust

cross-validation based evaluation
in Basic mode the defaults are 3 folds and 1 repeat
in Advanced mode folds (2–20) and repeats (1–10) can be edited

When the optimizer is active in robust mode, a CV Scope selector appears:

Tuning + Reporting (recommended) — cross-validation for both optimization and final reporting
Tuning only (faster) — skips separate cross-validation reporting to reduce training runs

For time series, dAIve automatically switches to TimeSeriesSplit without shuffling. An amber note confirms this when time series is active.
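
The idea behind a time-ordered split is that every fold trains on earlier rows and evaluates on later ones. The pure-Python sketch below follows the expanding-window scheme of scikit-learn's `TimeSeriesSplit`; it is an illustration of the concept, and `time_series_folds` is a hypothetical helper, not dAIve's code.

```python
def time_series_folds(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) without shuffling.

    Each test block directly follows its training block, and the
    training block grows from fold to fold.
    """
    test_size = n_samples // (n_folds + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = n_samples - (n_folds - fold) * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


folds = list(time_series_folds(n_samples=12, n_folds=3))
for train, test in folds:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Shuffling would let the model train on rows that come after the ones it is evaluated on, which is why it is disabled for time series.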

Use Robust when:

the dataset is small enough that variance between splits matters
you are comparing candidates seriously
you need more confidence before exporting a final model

Execution preview

Below the validation settings, an execution preview box summarizes exactly what will happen:

Tuning — shows the optimizer strategy and how many training runs per trial
Reporting — shows whether separate CV reporting runs will happen and how many
Final training — always 1× on the full training pool

This preview updates live as you change settings, so you can see the total training effort before starting.
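
As a rough guide to what the preview adds up, the sketch below estimates the total number of training runs. The formula is an assumption, not taken from dAIve: it treats each trial in robust mode as folds × repeats runs, each trial in fast mode as one run, and reporting as one more round of folds × repeats. The `training_effort` function is hypothetical.

```python
def training_effort(trials, folds=3, repeats=1, robust=True, report_cv=True):
    """Estimate training runs for tuning, reporting, and final training."""
    runs_per_trial = folds * repeats if robust else 1
    tuning = trials * runs_per_trial
    # "Tuning only" scope (report_cv=False) skips the separate CV reporting.
    reporting = folds * repeats if (robust and report_cv) else 0
    final = 1  # final training always runs once
    return {
        "tuning": tuning,
        "reporting": reporting,
        "final": final,
        "total": tuning + reporting + final,
    }


full = training_effort(trials=20)                          # Tuning + Reporting
tuning_only = training_effort(trials=20, report_cv=False)  # Tuning only
fast = training_effort(trials=20, robust=False)            # holdout mode
```

With 20 trials and the Basic defaults of 3 folds and 1 repeat, this estimate gives 64 runs for Tuning + Reporting, 61 for Tuning only, and 21 in fast mode.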

Data Split Configuration

Automatic Split

Lets the user set:

Validation %
Test %
Train % (calculated) — derived as 100 minus validation and test
Random Seed

The percentages must sum to 100.
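
The relationship between the three percentages and the seed can be sketched as follows. This assumes a seeded random shuffle of row indices, which this page does not confirm; `automatic_split` is a hypothetical helper.

```python
import random


def automatic_split(n_rows, validation_pct, test_pct, seed=None):
    """Split row indices into train, validation, and test sets."""
    if validation_pct < 0 or test_pct < 0:
        raise ValueError("percentages must not be negative")
    train_pct = 100 - validation_pct - test_pct  # calculated, not entered
    if train_pct <= 0:
        raise ValueError("validation and test must leave room for training")

    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed, same split

    n_val = round(n_rows * validation_pct / 100)
    n_test = round(n_rows * test_pct / 100)
    validation = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train_pct, train, validation, test


train_pct, train, validation, test = automatic_split(100, 15, 15, seed=42)
print(train_pct, len(train), len(validation), len(test))
```

Fixing the seed is what makes a split repeatable across runs, so results from different model configurations stay comparable.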

Warnings appear when:

validation or test is set to 0% (some metrics will be unavailable)
some values in the validation or test sets fall outside the training data range (extrapolation risk)
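
The extrapolation warning can be understood as a per-column range check. The sketch below shows one way such a check could work for numeric columns; `out_of_range_columns` and the sample column names are hypothetical, and dAIve's actual check may differ.

```python
def out_of_range_columns(train_rows, other_rows):
    """Return columns where other_rows leave the training min/max range.

    Rows are dicts mapping column name to a numeric value.
    """
    flagged = []
    for column in train_rows[0]:
        train_values = [row[column] for row in train_rows]
        low, high = min(train_values), max(train_values)
        if any(not low <= row[column] <= high for row in other_rows):
            flagged.append(column)
    return flagged


train_rows = [
    {"temperature": 10.0, "pressure": 1.0},
    {"temperature": 30.0, "pressure": 2.0},
]
validation_rows = [
    {"temperature": 35.0, "pressure": 1.5},  # temperature above training max
]
flagged = out_of_range_columns(train_rows, validation_rows)
```

A flagged column means the model is asked to predict outside the range it was trained on, so metrics on those rows deserve extra caution.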

Automatic split is best when:

you do not already have dedicated validation and test sets
you want dAIve to manage the split consistently inside one dataset

Manual Upload

Lets the user upload:

Validation Data (.csv)
Test Data (.csv)

Manual upload is best when:

you already have fixed holdout datasets
the split must match an external evaluation standard
you want exact control over which records are used where
