Normality & Levene module
This module evaluates whether your data meet the assumptions of normality and homogeneity of variances (homoscedasticity) that underlie parametric tests such as One-way ANOVA. It applies Shapiro-Wilk, Lilliefors (Kolmogorov-Smirnov), and Levene's test per group, generates Q-Q plots, and provides an overall interpretive summary with a recommendation.
Accepted file format
Upload an .xlsx file where:
- One column is a categorical factor (treatment, group, medium...) → set as Factor.
- One or more columns are numeric response variables → set as Response.
- Other columns can be set to Ignore.
Sidebar options
Step 1 — Upload data
Click Browse .xlsx to load an Excel file. The file is read by readxl::read_excel(); only the first sheet is imported. Character/factor columns are auto-classified as Factor and numeric columns as Response.
Step 2 — Classify variables
Each column is assigned one of three roles via a dropdown:
Factor (grouping):
The categorical column that defines the groups
(e.g. treatment, medium, genotype). Exactly one factor must be selected.
Response variable:
A numeric column with the measurements to test.
Ignore:
Columns excluded from the analysis.
Step 3 — Analysis setup
Grouping factor:
Select which Factor column defines the groups.
Response variable:
Select which Response column to test for normality.
Significance level (α):
The threshold for declaring significance
(default: 0.05). Applied uniformly to Shapiro-Wilk, Lilliefors, and Levene's test.
If p < α, the null hypothesis is rejected.
Tests included
Shapiro-Wilk (1965):
One of the most powerful tests of normality for small-to-moderate samples (3 ≤ n ≤ 5000). Tests the null hypothesis that the sample was drawn from a normal distribution. Applied independently to each group. Implemented via stats::shapiro.test().
Lilliefors / KS (1967):
A variant of the Kolmogorov-Smirnov test that does not require the population mean and variance to be specified a priori (parameters are estimated from the data). Useful as a complement to Shapiro-Wilk, especially for larger n. Implemented via nortest::lillie.test().
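The two per-group checks can be sketched in a few lines of R; the data frame and column names below are hypothetical, not the app's internal code:

```r
# Per-group Shapiro-Wilk and Lilliefors tests (hypothetical example data)
library(nortest)

set.seed(1)
df <- data.frame(
  treatment = rep(c("Control", "T1", "T2"), each = 12),
  length    = c(rnorm(12, 5), rnorm(12, 6), rnorm(12, 5.5))
)

per_group <- lapply(split(df$length, df$treatment), function(x) {
  sw <- shapiro.test(x)          # Shapiro-Wilk (stats)
  lf <- nortest::lillie.test(x)  # Lilliefors / KS (nortest)
  data.frame(n = length(x),
             W = unname(sw$statistic), p_shapiro = sw$p.value,
             D = unname(lf$statistic), p_lillie  = lf$p.value)
})
do.call(rbind, per_group)  # one row per group
```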
Q-Q Plots:
Normal quantile-quantile plots are generated per group using ggplot2::stat_qq() and stat_qq_line(). Points falling along the diagonal indicate normality; systematic deviations suggest non-normal distributions.
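A faceted Q-Q plot in the style used by the module can be built like this (the groups and values are made up for illustration):

```r
# Faceted Q-Q plot, one panel per group (hypothetical example data)
library(ggplot2)

set.seed(2)
df <- data.frame(group = rep(c("A", "B"), each = 30),
                 value = c(rnorm(30), rexp(30)))  # B is deliberately non-normal

p <- ggplot(df, aes(sample = value)) +
  stat_qq() +        # sample quantiles vs. theoretical normal quantiles
  stat_qq_line() +   # reference diagonal
  facet_wrap(~ group)
p
```

Group A's points should hug the line; group B's should curve away from it at the upper tail.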
Levene's test — median-based (Brown-Forsythe, 1974):
Tests the null hypothesis that all group variances are equal (homoscedasticity). The median-based version (Brown-Forsythe) is more robust to non-normality than the original mean-based formulation. Implemented via car::leveneTest(center = 'median'). This is the recommended check before running ANOVA.
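The homoscedasticity check amounts to a single call; the data below are hypothetical, with one group given a deliberately larger spread:

```r
# Brown-Forsythe (median-centred) Levene's test (hypothetical example data)
library(car)

set.seed(3)
df <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  value = c(rnorm(15, sd = 1), rnorm(15, sd = 1), rnorm(15, sd = 3))
)

# p < alpha would indicate unequal variances (heteroscedasticity)
car::leveneTest(value ~ group, data = df, center = "median")
```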
Summary: The Summary tab provides an overall interpretation based on the results of all three tests. It counts the proportion of groups passing/failing Shapiro-Wilk and Lilliefors, evaluates Levene's test, and issues a recommendation (e.g. proceed with parametric ANOVA vs. consider a non-parametric alternative).
Output tabs
Shapiro-Wilk:
Table with W statistic, p-value, and significance verdict per group.
Lilliefors (K-S):
Table with D statistic, p-value, and significance verdict per group.
Q-Q Plots:
Faceted Q-Q plot for visual inspection of normality per group.
Levene's Test:
F-statistic, p-value, and significance verdict, plus a variance-per-group table.
Summary:
Integrated verdict and recommendation.
Downloads
Download Results (.xlsx): An Excel workbook with sheets for Analysis_Info, Shapiro-Wilk, Lilliefors, Levene, and Variance_Groups.
R packages used
stats (shapiro.test), nortest (lillie.test), car (leveneTest), ggplot2 (Q-Q plots), readxl (data import), writexl (Excel export), DT (interactive tables).
How to cite in your manuscript
Shapiro-Wilk:
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611.
Lilliefors:
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399–402.
Levene / Brown-Forsythe:
Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364–367.
Correlations module
This module computes pairwise correlation coefficients between two or more numeric variables, tests their statistical significance, and provides visual summaries (heatmap and scatter plot). Both parametric (Pearson) and non-parametric (Spearman) methods are available.
Accepted file format
Upload an .xlsx file where:
- Two or more columns are numeric variables → set as Numeric.
- Other columns can be set to Ignore.
Sidebar options
Step 1 — Upload data
Click Browse .xlsx to load an Excel file. Numeric columns are auto-classified as Numeric; text/factor columns as Ignore.
Step 2 — Classify variables
Each column is assigned one of two roles:
Numeric variable:
Included in the pairwise correlation matrix.
At least two numeric variables are required.
Ignore:
Excluded from the analysis.
Step 3 — Analysis setup
Correlation method:
• Pearson — Measures linear association between two continuous variables. Assumes bivariate normality. Best when the relationship is approximately linear.
• Spearman — A rank-based non-parametric measure of monotonic association. Robust to non-normality and outliers. Recommended when normality cannot be assumed.
Handle missing values:
• Complete observations only — Only rows with no missing values across all selected variables are used. All pairs share the same sample size.
• Pairwise complete — Each pair uses all rows where both variables are non-missing. Maximises data usage, but pairs may have different sample sizes.
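Base R's cor() illustrates the difference between the two strategies on a toy matrix with scattered NAs (the values here are made up):

```r
# Two missing-value strategies, illustrated with base R (toy data)
m <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                y = c(2, NA, 3, 5, 6, 8),
                z = c(1, 3, 4, NA, 7, 9))

cor(m, use = "complete.obs")           # drops every row with an NA, for all pairs
cor(m, use = "pairwise.complete.obs")  # each pair keeps its own non-missing rows
```

With "complete.obs" only the three fully observed rows survive, so every pair shares n = 3; with "pairwise.complete.obs" each pair uses its own four complete rows.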
Significance level (α):
The threshold for declaring a correlation statistically significant (default: 0.05). Each pair is tested individually with cor.test(). Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.10, ns otherwise.
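A single pairwise test looks like this (hypothetical vectors; the app repeats the call for every variable pair):

```r
# One pairwise correlation test (hypothetical vectors)
set.seed(4)
x <- rnorm(40)
y <- 0.6 * x + rnorm(40, sd = 0.5)

ct <- cor.test(x, y, method = "pearson")  # method = "spearman" for ranks
c(r = unname(ct$estimate), p = ct$p.value, n = length(x))
```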
Output tabs
Correlation Table:
A long-format table showing all pairwise combinations
with columns for Variable_1, Variable_2, n, r (or ρ), p-value, confidence interval
(Pearson only), and significance code. Rows with p < α are highlighted.
Heatmap:
A colour-coded matrix of correlation coefficients (red = negative,
green = positive). Significant pairs are annotated with their r value and significance code.
Scatter Plot:
A bivariate scatter plot with regression line for any
selected pair. The controls above the plot allow choosing the X and Y variables.
The annotation shows r, p-value, and n.
Downloads
Download Results (.xlsx):
An Excel workbook with sheets for Analysis_Info and Correlations (full pairwise table).
Download Heatmap (.pptx):
An editable PowerPoint slide containing the correlation heatmap as a vector graphic (rvg::dml).
R packages used
stats (cor.test), ggplot2 (heatmap, scatter plot), readxl (data import), writexl (Excel export), officer + rvg (PowerPoint export), DT (interactive tables).
How to cite in your manuscript
Pearson:
Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.
Spearman:
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101.
One-way ANOVA module
This module performs a One-way Analysis of Variance (ANOVA) to test whether the means of a numeric response variable differ significantly across the levels of a single categorical factor. If the ANOVA is significant, a post-hoc test (Tukey HSD or Bonferroni) identifies which specific group pairs differ. Results are presented as a publication-ready bar plot with compact letter display (CLD), an ANOVA summary table, descriptive statistics, and a pairwise comparison table.
Accepted file format
Upload an .xlsx file where:
- One (or more) column(s) are categorical factors (treatment, medium, genotype...) → set as Factor.
- The remaining columns are numeric response variables (measurements) → set as Response.
- Columns you do not need can be set to Ignore.
Sidebar options
Step 1 — Upload data
Click Browse .xlsx to load an Excel file. Character/factor columns are auto-classified as Factor and numeric columns as Response.
Step 2 — Classify variables
Each column is assigned one of three roles:
Factor (grouping):
The categorical column defining groups.
Response variable:
A numeric column with measurements.
Ignore:
Excluded from the analysis.
Step 3 — Analysis setup
Grouping factor:
Which Factor column defines the groups to compare.
Response variable:
Which numeric column contains the measurements.
Y-axis label:
Custom label for the bar plot Y-axis. Auto-filled with the
response variable name; editable to any text (e.g. 'Shoot length (cm)').
Significance level (α):
Threshold for ANOVA and post-hoc significance
(default: 0.05). Affects the CLD grouping letters and the Tukey/Bonferroni significance flag.
Post-hoc test:
• Tukey HSD — Honestly Significant Difference. Controls the family-wise error rate for all pairwise comparisons. Computed via agricolae::HSD.test(). This is the standard post-hoc choice after ANOVA.
• Bonferroni — Adjusts α by the number of comparisons. More conservative than Tukey HSD; appropriate when multiple comparisons must be controlled strictly.
Color palette:
• AgroBio (green) — The institutional green palette.
• Viridis — A perceptually uniform, colourblind-friendly palette.
• Colorblind safe — The Okabe-Ito palette, optimised for deuteranopia and protanopia.
Bar plot order:
• Alphabetical (A → Z / Z → A) — Groups sorted by name.
• Mean (high → low / low → high) — Groups sorted by their mean response value.
• Custom — A text area appears where you define the exact order (one group per line).
What is One-way ANOVA?
One-way Analysis of Variance tests the null hypothesis that all group means are equal (H₀: μ₁ = μ₂ = … = μₖ) against the alternative that at least one differs. The F-statistic is the ratio of between-group variance to within-group variance. If p < α, at least one pair of groups differs significantly. ANOVA is computed via stats::aov().
The bar plot displays group means with error bars (± standard error) and compact letter display (CLD). Groups sharing the same letter are not significantly different. CLD letters are derived from the selected post-hoc test.
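The ANOVA-plus-CLD workflow can be sketched as follows (treatment labels, group means, and sample sizes are hypothetical):

```r
# One-way ANOVA followed by Tukey HSD with CLD letters (hypothetical data)
library(agricolae)

set.seed(5)
df <- data.frame(
  treatment = factor(rep(c("T1", "T2", "T3"), each = 10)),
  response  = c(rnorm(10, 5), rnorm(10, 7), rnorm(10, 7.2))
)

fit <- aov(response ~ treatment, data = df)
summary(fit)  # Df, Sum Sq, Mean Sq, F value, Pr(>F)

hsd <- agricolae::HSD.test(fit, "treatment", alpha = 0.05)
hsd$groups    # group means with compact letter display (CLD)
```

Groups sharing a letter in hsd$groups are not significantly different at the chosen α.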
Output tabs
Bar Plot:
Publication-ready bar chart with group means, SE error bars,
significance letters (CLD), and customisable colours/order.
ANOVA Results:
The ANOVA summary table (Df, Sum Sq, Mean Sq, F, p)
plus descriptive statistics per group (n, mean, SD, SE, CV%).
Post-hoc Pairwise:
Full pairwise comparison table from the selected
post-hoc test (difference, CI, adjusted p-value, significance flag).
Downloads
Download Plot (.pptx):
A PowerPoint slide with the bar plot as a vector graphic (rvg::dml), fully editable in PowerPoint.
Download Results (.xlsx):
An Excel workbook with sheets for ANOVA_Table, Descriptive_Stats, Post-hoc pairwise comparisons, CLD letters, and Analysis_Info.
R packages used
stats (aov), agricolae (HSD.test for Tukey HSD and CLD), ggplot2 (bar plot), scales (colour palettes), readxl (data import), writexl (Excel export), officer + rvg (PowerPoint export), DT (interactive tables).
How to cite in your manuscript
For One-way ANOVA:
Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.
For Tukey HSD:
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114.
For agricolae (CLD implementation):
de Mendiburu, F. (2023). agricolae: Statistical Procedures for Agricultural Research. R package.
For Bonferroni correction:
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52–64.
Conditional Inference Tree module
This module fits a Conditional Inference Tree (CTree) to identify the predictor variables and split thresholds that best partition the data with respect to a target response variable. CTree avoids variable selection bias by embedding permutation-based hypothesis testing directly into the tree-building procedure.
Accepted file format
Upload an .xlsx file where:
- One (or more) column(s) are categorical factors (treatment, medium, genotype...) → set as Factor or Predictor.
- The remaining columns are numeric response variables (measurements) → set as Target or Predictor.
- Columns you do not need can be set to Ignore.
Sidebar options
Step 1 — Upload data
Click Browse .xlsx to load an Excel file. Numeric columns are auto-classified as Predictor; text/factor columns also as Predictor (CTree handles both numeric and categorical predictors natively).
Step 2 — Classify variables
Each column is assigned one of three roles:
Target (response):
The variable to model (numeric or categorical).
Exactly one target must be selected.
Predictor:
Input variables used to split the tree. Multiple predictors
can (and should) be selected. Numeric predictors with ≤ 10 unique values are
automatically converted to factors.
Ignore:
Excluded from the analysis.
Step 3 — Analysis setup
Target variable:
The response variable to model.
Predictors (select multiple):
Multi-select list of input variables.
All predictors are selected by default; deselect any you wish to exclude.
Significance level (α):
The threshold for splitting (default: 0.05).
At each node, the algorithm tests independence between each predictor and the response
using permutation tests. A split is only performed if the smallest adjusted p-value
is below α. Lower values produce simpler trees.
Max tree depth:
Maximum number of levels from root to terminal node
(default: 4). Depth 3–4 is recommended for biological interpretability.
Depth ≥ 6 may produce overly specific splits.
Min observations per node:
Minimum number of observations required in
a terminal node (default: 10, minimum: 3). Prevents the tree from making splits
based on very few data points. Increase for noisy datasets.
Node annotation (continuous target):
Controls the statistics displayed inside terminal-node boxplots:
• None — Standard boxplot only.
• Median — Annotates each node with Md = value.
• Mean ± SD — Annotates with x̅ = value ± SD.
• Median + Mean — Shows both Md and x̅ side by side.
What is CTree?
The Conditional Inference Tree (CTree) is a non-parametric decision tree algorithm based on the conditional inference framework proposed by Hothorn et al. (2006). Unlike classical recursive partitioning methods such as CART or CHAID, CTree avoids variable selection bias by embedding statistical hypothesis testing directly into the tree-building procedure. At each node, the algorithm tests the null hypothesis of independence between each predictor and the response variable using permutation-based significance tests, and only proceeds with a split if the association is statistically significant after correction for multiple comparisons. The splitting variable is selected as the one with the smallest corrected p-value, and the optimal split point is determined by maximising the test statistic. This approach ensures that variable selection and split criteria are not influenced by the number of categories or the scale of measurement of the predictors.
In AgroBioSTAT, CTree is implemented via the ctree() function from the partykit R package.
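A minimal call with the module's default controls looks like this; the predictors, target, and effect sizes are invented for illustration:

```r
# Conditional inference tree with the module's default controls (hypothetical data)
library(partykit)

set.seed(6)
df <- data.frame(
  dose   = runif(120, 0, 10),
  medium = factor(sample(c("MS", "B5"), 120, replace = TRUE))
)
df$shoots <- ifelse(df$dose > 5, 8, 4) + (df$medium == "MS") + rnorm(120)

fit <- ctree(shoots ~ dose + medium, data = df,
             control = ctree_control(alpha = 0.05,    # split significance level
                                     maxdepth = 4,    # max tree depth
                                     minbucket = 10)) # min obs per terminal node
plot(fit)   # inner nodes: variable + p-value; terminal nodes: boxplots
width(fit)  # number of terminal nodes
```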
Output tabs
Tree:
The full conditional inference tree plot. Inner nodes show the
splitting variable, the test p-value, and the split threshold. Terminal nodes show
boxplots (continuous target) or bar plots (categorical target), optionally annotated
with summary statistics. Full-screen mode is available.
Node Statistics:
A table with one row per terminal node, showing
N, Mean, SD, Min, Max (continuous targets) or class distribution (categorical targets).
Model Info:
A summary table with the target variable, predictors used,
number of observations, terminal/internal node counts, and parameter settings.
Downloads
Download Results (.xlsx):
An Excel workbook with sheets for Model_Info, Node_Statistics, and Variable_Importance (split frequency count).
Download Tree (.pptx):
An editable PowerPoint slide with the CTree plot as a vector graphic (rvg::dml).
R packages used
partykit (ctree, ctree_control, plotting), ggplot2 (auxiliary plots), readxl (data import), writexl (Excel export), officer + rvg (PowerPoint export), DT (interactive tables).
How to cite CTree in your manuscript
For the algorithm:
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674. https://doi.org/10.1198/106186006X133933
For the R implementation:
Hothorn, T., & Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
Decision Rules (ANN) module
This module trains an Artificial Neural Network (ANN) and then extracts human-readable IF–THEN rules via a surrogate decision tree. It combines the predictive power of ANNs with the interpretability of rule-based models. Additionally, it computes variable importance using connection-weight algorithms and provides nested cross-validation for robust model evaluation.
Accepted file format
Upload an .xlsx file where:
- One or more columns are Predictors (inputs, numeric or categorical).
- One column is the Target (response variable, numeric).
- Other columns can be set to Ignore.
Note: Categorical predictors are automatically one-hot encoded before training. The Target must always be numeric.
Sidebar options
Step 1 — Upload data
Click Browse .xlsx to load an Excel file. Numeric columns are auto-classified as Predictor (numeric); text/factor columns as Predictor (categorical).
Step 2 — Classify variables
Each column is assigned one of the available roles:
Target (response):
The numeric variable to predict. Exactly one must be selected.
Predictor (numeric):
A continuous input variable.
Predictor (categorical):
A categorical input variable. Automatically one-hot encoded via model.matrix() (full-rank encoding) before scaling and training.
Ignore:
Excluded from the analysis.
Note: text/factor columns can only be assigned as Predictor (categorical)
or Ignore; numeric columns can be Target, Predictor (numeric), or Ignore.
Step 3 — Analysis setup
Target variable:
The numeric response to model.
Predictors (select multiple):
Multi-select list of input variables
(both numeric and categorical). All predictors are selected by default.
Hidden layers
Defines the topology of the ANN. Enter comma-separated integers, e.g. 8,5,3 for three hidden layers with 8, 5, and 3 neurons. A single number (e.g. 5) creates one hidden layer. The default is 8,5,3. Simpler datasets may work with 5,3 or even 3; more complex datasets may benefit from wider layers. Avoid very deep architectures with small datasets (n < 100).
Activation function
The non-linear function applied at each hidden neuron.
• Logistic (sigmoid) — outputs in [0, 1]. The classical default for most applications.
• Tanh — outputs in [−1, 1]. Often converges faster than logistic for standardised data.
Learning algorithm
The optimisation algorithm used for training.
• Rprop (Resilient Propagation) — the recommended default. Adapts the learning rate per weight, solving convergence issues common with standard backpropagation.
• SCG (Scaled Conjugate Gradient) — a second-order method that can be faster for medium-sized datasets.
• Backprop + Momentum — classic backpropagation with a momentum term. May require tuning the learning rate.
• Standard Backprop — basic gradient descent without momentum. Generally slower and less robust than Rprop or SCG.
Max iterations (maxit)
Maximum number of training iterations. The algorithm may converge before reaching this limit. The default (1000) is appropriate for most plant tissue culture datasets with Rprop. If the model underfits, increase to 2000–5000. Very high values increase computation time.
Training repetitions
Number of independent ANN training runs, each starting from different random initial weights. AgroBioSTAT automatically selects the replicate with the lowest error. More repetitions increase the probability of finding a good solution, at the cost of longer computation time. The default (3) is adequate for most cases; increase to 5–10 for complex datasets.
Surrogate tree max. depth
Controls the maximum depth of the surrogate decision tree. A deeper tree generates more rules with more conditions (conjunctions); a shallower tree produces fewer, simpler rules. Depth 2–4 is recommended for biological interpretability; depth ≥ 5 tends to generate overly specific rules. Default: 2.
Surrogate tree complexity (cp)
The complexity parameter for rpart. A split is only performed if it improves the overall fit by at least cp (relative to the root-node error). Lower values allow more splits (a more complex tree with more rules); higher values produce simpler trees. Default: 0.01. Range: 0.001–0.1.
Validation (nested)
AgroBioSTAT uses a nested validation strategy: a Train/Test split (external validation) combined with internal cross-validation within the training set.
Training set (%):
Slider to set the proportion of data used for training (default: 70%). The remaining observations form the held-out test set. A donut chart shows the split visually. Range: 50–90%.
Internal CV method:
• k-fold CV — Splits the training set into k folds; trains on k−1 folds and validates on the held-out fold, rotating through all k. The default k = 5 is suitable for most datasets; increase to 10 for smaller datasets.
• LOOCV (Leave-One-Out CV) — k = n(train). Each observation is used as validation once. Computationally expensive for large datasets; consider k-fold if n > 50.
Number of folds (k):
Only visible when k-fold CV is selected. Default: 5. Range: 3–20.
Random seed:
Ensures reproducibility of the Train/Test split, fold assignment, and ANN weight initialisation. Default: 42.
Decision Rules from Artificial Neural Networks
Artificial Neural Networks (ANNs) are powerful non-linear models that can capture complex relationships between inputs and outputs. However, they are typically considered black-box models: it is difficult to understand why they make a particular prediction.
This module uses a Surrogate Modelling approach to extract interpretable IF–THEN rules from a trained ANN:
- An ANN is trained on the user data (RSNNS::mlp). Inputs are automatically standardised (z-score) and categorical predictors are one-hot encoded.
- The ANN generates predicted values for the entire dataset.
- A regression tree (rpart) is trained using the original inputs to predict the ANN predictions — not the raw experimental observations.
- The paths from root to terminal node in this surrogate tree define interpretable IF–THEN rules that approximate the behaviour of the neural network.
- Variable importance is computed from connection weights using the Olden (signed) and Garson (relative) algorithms from NeuralNetTools.
All thresholds in the rules are reported in original experimental units (the inverse z-score transformation is applied automatically). The surrogate tree will never perfectly replicate the ANN, but it provides a human-readable approximation of the most important decision boundaries learned by the network.
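The ANN-to-rules pipeline can be sketched end-to-end as below; the dataset, column names, and sizes are invented for illustration and this is not the app's exact code:

```r
# ANN -> surrogate-tree sketch (hypothetical data; not the app's exact code)
library(RSNNS)
library(rpart)

set.seed(7)
X <- data.frame(auxin   = runif(100, 0, 5),
                sucrose = runif(100, 10, 60))
y <- 2 * X$auxin + 0.1 * X$sucrose + rnorm(100)

Xs <- scale(X)   # z-score the inputs, as the module does
ys <- scale(y)

net  <- mlp(Xs, ys, size = c(8, 5, 3), maxit = 1000, learnFunc = "Rprop")
yhat <- as.numeric(predict(net, Xs))   # ANN predictions (scaled units)

# Surrogate: a regression tree fitted to the ANN's predictions, not to y
sur <- rpart(pred ~ auxin + sucrose,
             data    = cbind(X, pred = yhat),
             control = rpart.control(maxdepth = 2, cp = 0.01))
print(sur)  # each root-to-leaf path is one IF-THEN rule
```

The app additionally maps the scaled thresholds back to original units and feeds the trained network to NeuralNetTools for Olden/Garson importance.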
Output tabs
ANN Summary:
Displays the model topology (input → hidden → output),
the number of connection weights, the final training error (SSE), and overall fit
metrics (R², RMSE) on the full training set.
Surrogate Tree:
The regression tree trained to approximate the ANN
predictions. Shows splitting variables, thresholds (in original units), and terminal
node predicted values. Full-screen mode is available.
Rules:
Extracted IF–THEN rules from the surrogate tree. Each rule
shows the conditions (variable, operator, threshold), the predicted value (mean of the
ANN predictions in that node), and the number of observations covered. Rules can be
filtered and sorted. A legend at the bottom shows the level meanings for categorical
predictors (mapping one-hot encoded columns back to original factor levels).
Fit:
Scatter plot of ANN-predicted vs. observed values with R² and RMSE
annotations. The 1:1 line is shown for reference.
Variable Importance:
Bar plot of connection-weight-based variable importance.
Two algorithms are available: Olden (signed, the default, suitable for multi-layer networks)
and Garson (relative, 0–100%, only for single hidden layer). The table below shows
exact numeric values. Use the dropdown above the plot to switch between Olden and Garson.
Validation:
Summary of the nested validation: R² and RMSE for the
external test set (Train/Test split) and for the internal CV (mean across folds).
Two scatter plots show predictions vs. observed for both validation layers.
A per-fold metrics table shows R² and RMSE for each individual fold.
Model Info:
Complete parameter summary: target, predictors, topology,
activation, learning algorithm, iterations, repetitions, surrogate tree depth/cp,
number of rules extracted, surrogate fidelity R², validation method, and seed.
Downloads
Download Results (.xlsx): An Excel workbook with sheets for: Rules, Raw_Rules, Predicted_vs_Obs, Model_Info, Variable_Importance, Validation (nested metrics), Ext_Test_Preds, Int_CV_Preds, and Fold_Metrics.
R packages used
RSNNS (mlp — ANN training), NeuralNetTools (olden, garson — variable importance), rpart (surrogate regression tree), ggplot2 (fit plots, variable importance bar plots), gridExtra (multi-panel layouts), readxl (data import), writexl (Excel export), DT (interactive tables).
How to cite in your manuscript
For RSNNS (ANN engine):
Bergmeir, C., & Benítez, J. M. (2012). Neural Networks in R Using the Stuttgart Neural Network Simulator: RSNNS. Journal of Statistical Software, 46(7), 1–26.
For NeuralNetTools (variable importance):
Beck, M. W. (2018). NeuralNetTools: Visualization and Analysis Tools for Neural Networks. Journal of Statistical Software, 85(11), 1–20.
For Olden variable importance:
Olden, J. D., Joy, M. K., & Death, R. G. (2004). An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling, 178(3–4), 389–397.
For rpart (surrogate tree):
Therneau, T. M., & Atkinson, E. J. (2019). An introduction to recursive partitioning using the RPART routines. Mayo Foundation.
For the surrogate modelling approach:
Craven, M. W., & Shavlik, J. W. (1996). Extracting tree-structured representations of trained networks. Advances in Neural Information Processing Systems, 8, 24–30.
AgroBioSTAT
AgroBioSTAT is a data analysis tool developed by the AgroBioTech for Health group at the University of Vigo.
What does it do?
It allows researchers to perform comprehensive and rigorous data analysis without any programming knowledge. Specifically, it:
- Assesses Data Quality: Checks whether the data meet the necessary requirements (normality and homoscedasticity) before applying any statistical test.
- Compares Groups: Performs ANOVA and post-hoc tests to detect significant differences, visually indicating which groups differ from one another using compact letter display (CLD).
- Identifies Key Drivers: Uses multivariate tools like Conditional Inference Trees to find which variables best explain a given outcome, even when relationships are complex.
- Simplifies Artificial Intelligence: Trains AI models capable of making predictions and translates them into simple IF–THEN rules so that results remain transparent and understandable.
How does it work?
AgroBioSTAT integrates several advanced statistical packages from the R ecosystem — such as RSNNS, NeuralNetTools, agricolae, partykit, and ggplot2 — into a single intuitive interface. The user simply loads their data and obtains ready-to-interpret results, without writing code or managing any of the technical processes (such as one-hot encoding or z-score transformations) running in the background.
AgroBioSTAT — AgroBioTech for Health Group, University of Vigo · 2.0.0 — 14 March 2026
AgroBioSTAT — A Comprehensive Interface for Life Sciences