Introduction to maxRgain

An integer-programming method of maximizing genetic gains in polyclonal selection

Sónia Surgy, Jorge Cadima, Elsa Gonçalves

Introduction

Polyclonal selection is a breeding strategy developed for ancient grapevine varieties, but of more widespread interest. It seeks to select a group of different genotypes that collectively meet predefined desirable criteria for quantitative traits. Polyclonal selection, as opposed to the selection of a single genotype, helps to preserve genetic diversity and provides greater environmental stability, while still securing high genetic gains.

This approach relies on the predictors of the genetic effects obtained from the fitting of a mixed model. Rather than focusing on individual genotypes, polyclonal selection considers the group as the unit of selection, with the goal of maximizing the overall predicted genetic gain across multiple traits of interest. Identifying such a superior group is an optimization problem, in which the objective is to select a subset of genotypes that jointly maximize the expected genetic value, subject to constraints which satisfy agronomic and enological interests.

Most existing methods for genetic selection based on several traits, such as selection indices, are designed to maximize the genetic gain of selection of a single genotype. When the objective is to select a group of genotypes, rather than a single genotype, the conventional approach has been to simply choose the top-ranking genotypes. However, this strategy does not ensure that the selected group, as a whole, satisfies predefined breeding objectives when multiple traits are considered simultaneously.

This package implements a novel Integer Programming-based method developed by Surgy et al. (2025), designed specifically for group-based selection under multiple trait criteria. While the methodology is demonstrated using data from grapevine breeding trials, it is broadly applicable across different species and breeding programs, especially in scenarios where group performance is more relevant than individual merit alone.

For a complete description of the methodology, please refer to:

Surgy, S., Cadima, J. & Gonçalves, E. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122 (2025). https://doi.org/10.1007/s00122-025-04885-0

Each function in the package includes an example that reproduces scenarios described in the original publication, thereby facilitating both practical understanding and a direct application of the method. This document provides a more detailed explanation of the outputs generated by the various functions and offers guidance on how to interpret them effectively.

Data

The package includes a real dataset - Gouveio - to ensure the reproducibility of the examples and to allow users to explore additional scenarios. This dataset contains Empirical Best Linear Unbiased Predictors (EBLUPs) of genotypic effects, obtained from the fitting of a linear mixed model to phenotypic data from a selection field trial involving 150 genotypes of the grapevine variety Gouveio, established according to a resolvable row-column design.

head(Gouveio)
#>   Clone           yd         pa         ta         ph          bw
#> 1 GV001 -0.005005668 0.23031296 -0.3811156 0.12110451 -0.02831195
#> 2 GV002 -0.486152063 0.41496698  0.0265791 0.02831156  0.12689991
#> 3 GV003 -0.402470892 0.03527182 -0.2086296 0.07848777 -0.08730846
#> 4 GV004 -0.316439937 0.15302600 -0.3562443 0.11582836  0.03532205
#> 5 GV005  0.341634591 0.42183408 -0.1215062 0.05332889  0.03196529
#> 6 GV006  0.575303862 0.62634443 -0.1553684 0.08762044  0.04216359

The traits evaluated were:

The selection criteria are defined by the breeder. In this example, they are: to increase yield, potential alcohol, and total acidity, while decreasing pH and berry weight.

Common arguments

Below is a table describing the common arguments used across the functions in this package:

Argument Description Used in functions
traits A vector with the names of the traits to be optimized, i.e., those included in the objective function All
ref Name of the reference column (e.g., genotype ID). Defaults to the first column. All
clmin Integer, indicating the minimum group size (default is 1). All
clmax Integer, indicating the maximum group size. If omitted, equal to clmin All
dmg Can be of one of two types. In the standard form, it will be a dataframe with three columns defining constraints: trait names, constraint relations (">=", "<="), and right-hand side values. Alternatively, it can be given the text value dmg = "base" which sets the right-hand side of the constraints of all traits to zero. polyclonal()
constraints Named numeric vector with traits to which constraints apply. If omitted, all except ref are used. rmaxa()
meanvec Named numeric vector of the means for each trait. If omitted, data are assumed to be normalized (divided) by the mean. All
criteria Named numeric vector with the criterion for each trait: 1 to increase, -1 to decrease. If omitted, all traits are assumed to be increasing. All
data A dataframe with the predictors of genotypic effects for the several traits. Rows are genotypes; columns are traits. All

Important notes

Functions

polyclonal()

This is the main function of this package.

polyclonal(traits, ref = NULL, clmin = 2, clmax, dmg = NULL, 
           meanvec = NULL, criteria = NULL, data)

In the example for the polyclonal() function, the objective is to maximize the genetic gain of groups with sizes ranging from 7 to 20 genotypes, while enforcing desired minimum gains for the five traits (expressed as percentages of the overall mean of each trait in the field trial): 20% for yield (yd), 3% for potential alcohol (pa), 3% for total acidity (ta), 1% for pH, and 2% for berry weight (bw).

# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)

# Vector with the traits to be optimized
mytraits <- c("yd", "pa",  "ta", "ph", "bw")

# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)

# Name of the reference column with the genotype labels
myref = "Clone"

If meanvec is omitted, it is assumed that all trait values have already been standardized by dividing by their respective overall means. If criteria is omitted, the default assumption is that an increase is desired for all traits. If ref is omitted, the first column in the dataset is assumed to contain the genotype labels.

In the polyclonal() function the constraints are indicated by argument dmg (desired minimum gains). The standard form of this argument is a data frame with three columns: the first specifying the traits to be constrained; the second with the constraint relation; the second with the relation of the constraint (“>=” or “<=”); and the third with the right-hand side values of the constraints. Column names are arbitrary, but their order must be preserved.

# dataframe with the desired minimum gains
mydmg <- data.frame(
  lhs = c("yd", "pa", "ta", "ph", "bw"),
  rel = c(">=", ">=", ">=", ">=", ">="),
  rhs = c(20, 3, 3, 1, 2)
  )

Note that the positive values of 1 and 2 for pH (ph) and berry weight (bw), which have selection criteria -1, mean a decrease of at least 1% in pH and a decrease of at least 3% in berry weight.

mydmg
#>   lhs rel rhs
#> 1  yd  >=  20
#> 2  pa  >=   3
#> 3  ta  >=   3
#> 4  ph  >=   1
#> 5  bw  >=   2

Now, the polyclonal function is called to perform the selection based on the specified constraints:

# Using polyclonal() function
with_dmg <- polyclonal(
  traits = mytraits,
  clmin = 7,
  clmax = 20,
  dmg = mydmg,
  meanvec = mymeanvec,
  criteria = mycriteria,
  data = Gouveio
  )
#> No possible solution was found for some of the requested group sizes.

Groups ranging from 7 to 20 genotypes were requested, as these cover the most appropriate group sizes for practical applications of the polyclonal methodology for grapevine selection. However, other group sizes can also be specified. In fact, it is possible to request a single group size by assigning the same value to both clmin and clmax or by omitting clmax. Analysing the output reveals how the imposed constraints influence the selection results.

# Results
with_dmg
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd       pa       ta       ph       bw
#>          10 21.35475 3.009712 3.321584 1.062815 2.314695
#>           8 20.97573 3.029873 4.004230 1.002937 2.130498
#>           7 24.20511 3.057488 3.278892 1.010023 2.255069
#> 
#> 
#> Selected genotypes (per group size)
#> 
#> $selected
#>     10     8     7
#>  GV053 GV060 GV038
#>  GV065 GV065 GV081
#>  GV080 GV088 GV088
#>  GV081 GV089 GV089
#>  GV088 GV095 GV127
#>  GV089 GV097 GV144
#>  GV095 GV144 GV146
#>  GV097 GV146      
#>  GV144            
#>  GV146

The output displays the genetic gain achieved for each trait across the different group sizes. All values are expressed as percentages of the trait mean. For example, in the 10-genotype group, the gains obtained were approximately 21.4%, 3.0%, 3.3%, 1.1% and 2.3% for yield (yd), potential alcohol (pa), total acidity (ta), pH (ph), and berry weight (bw), respectively. Positive values indicate gains in accordance with the selection criteria.

In this case, although group sizes between 7 and 20 genotypes were requested, the object gain only provides results for groups of 7, 8, and 10 genotypes. This indicates that the desired minimum gains could not be achieved for other group sizes within the specified range.

Examining the lists of genotypes within each group reveals that genotype groups of different sizes are not necessarily nested. For example, genotype GV060 is not included in the group of 7 genotypes, appears in the group of 8 genotypes, and is then excluded from the group of 10 genotypes.

The summary() method can be used with polyclonal() object to retrieve a summary of the initial conditions and results.

# Summary results
summary(with_dmg)
#> Summary of Selection Results
#> -----------------------------------
#> 
#> Number of groups selected: 3 
#> Group sizes: 10, 8, 7 
#> 
#> Overview
#>    Crit   Mean DMG   MaxGain MaxGroup
#> yd    1  3.517  20 24.205112        7
#> pa    1 12.760   3  3.057488        7
#> ta    1  4.495   3  4.004230        8
#> ph   -1  3.927   1  1.062815       10
#> bw   -1  1.653   2  2.314695       10

The summary() indicates how many groups were achieved for those conditions and the related dimensions. In the table, the first column gives the trait names, followed by the means, the criteria and the desired minimum gain (DMG). The Last two columns, “MaxGain” and “MaxGroup” show the maximum gain achieved for each trait and the group size where that gain occurs, respectively.

The so-called base situation, as described in the original publication of the method (Surgy et al., 2025), consists in maximizing the predicted genetic gain of the target traits while ensuring that no trait decreases. To implement this, the right-hand side of all constraints is set to zero.

To simplify the computation of this scenario, the argument dmg can be set to "base". In that case, all traits present in the dataset are included as constraints, each required to be greater than or equal to zero, except for the reference trait specified in ref. If ref is omitted, the first column of the dataset is assumed to correspond to the reference trait.

It is important to note that when using the “base” mode, the user must provide values for both meanvec and criteria for all traits in the dataset. If the user wishes to exclude any traits from the constraint set, the full specification of dmg must be provided instead.

The following example illustrates the application of the base situation. Groups of 7 to 12 clones are selected under these conditions. The same values for traits, meanvec, and criteria are used throughout.


# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)

# Vector with the traits to be optimized
mytraits <- c("yd", "pa","ta", "ph", "bw")

# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)


base_sit <- polyclonal(
  traits = mytraits,
  clmin = 7,
  clmax = 12,
  dmg = "base",
  meanvec = mymeanvec,
  criteria = mycriteria,
  data = Gouveio
  )

Unlike what happens with a selection index, where increasing the group size typically involves the sequential inclusion of genotypes according to a fixed ranking, the method implemented here allows each group size to be independently optimized. As a result, groups of different sizes are not necessarily nested.


base_sit
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd          pa       ta       ph       bw
#>          12 37.93654 0.087693734 3.145778 1.755908 3.933390
#>          11 39.23907 0.029947255 3.244531 1.886373 3.506967
#>          10 38.30817 0.148095205 3.356216 1.864944 4.434100
#>           9 40.16918 0.004116393 3.103095 1.857812 3.837374
#>           8 39.12182 0.148572474 3.225021 1.827456 5.037591
#>           7 40.82980 0.069781809 2.568590 1.646327 5.396038
#> 
#> 
#> Selected genotypes (per group size)
#> 
#> $selected
#>     12    11    10     9     8     7
#>  GV079 GV080 GV080 GV080 GV080 GV080
#>  GV080 GV088 GV088 GV088 GV088 GV088
#>  GV088 GV089 GV089 GV089 GV089 GV095
#>  GV089 GV093 GV095 GV093 GV095 GV127
#>  GV093 GV095 GV127 GV095 GV127 GV132
#>  GV094 GV127 GV132 GV127 GV137 GV144
#>  GV095 GV132 GV137 GV137 GV144 GV145
#>  GV127 GV137 GV144 GV144 GV145      
#>  GV137 GV144 GV145 GV145            
#>  GV144 GV145 GV146                  
#>  GV145 GV146                        
#>  GV146

For example, consider the groups of 7 and 8 genotypes. In the 8-genotype group, not just one but two genotypes differ from the groups of 7: genotypes GV089 and GV137. On the other hand, genotype GV132, which is included in the group of 7 genotypes, is not present in the group of 8. This illustrates how the inclusion of constraints can lead to changes in group composition, rather than simply adding the next best-ranked individual.

In the base situation, summary() show all DMG constraints as 0.


summary(base_sit)
#> Summary of Selection Results
#> -----------------------------------
#> 
#> Number of groups selected: 6 
#> Group sizes: 12, 11, 10, 9, 8, 7 
#> 
#> Overview
#>    Crit   Mean DMG    MaxGain MaxGroup
#> yd    1  3.517   0 40.8297959        7
#> pa    1 12.760   0  0.1485725        8
#> ta    1  4.495   0  3.3562158       10
#> ph   -1  3.927   0  1.8863728       11
#> bw   -1  1.653   0  5.3960377        7

rmaxp()

rmaxp(traits, ref = NULL, clmin = 2 , clmax, meanvec = NULL, criteria = NULL, data)

The package includes a function rmaxp(), which computes the maximum possible gain for each trait. The maximum possible gain corresponds to the value obtained by selecting the top genotypes for each trait individually, without applying any constraints. The function allows the user to request the gain for one or multiple traits simultaneously. The output displays one column per trait; however, the values in different columns are not linked to the same solution and should be interpreted independently.

In the following example, it is asked the maximum possible gain for yd and pa for group sizes from 9 to 20 genotypes.

# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760)

# Vector with the traits to be optimized
mytraits <- c("yd", "pa")

The vector specifying the selection criteria can be omitted, as the default value of 1 is used for both traits. With this function, no constraints are applied.

# Using rmaxp()
max_pos_gain <- rmaxp(
  traits = mytraits,
  clmin = 9, 
  clmax = 20,
  meanvec = mymeanvec,
  data = Gouveio
  )

# Results
max_pos_gain
#> Maximum possible gains for each trait correspond to independently selected groups;
#> thus, gains are not directly comparable and should be interpreted separately.
#> See 'selected' for group details
#> 
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain
#>  Group.Size       yd       pa
#>          20 39.45423 5.915127
#>          19 40.21696 5.968099
#>          18 40.89337 6.024443
#>          17 41.62513 6.078201
#>          16 42.41884 6.135137
#>          15 43.13445 6.195106
#>          14 43.90097 6.256910
#>          13 44.58007 6.305906
#>          12 45.30719 6.359933
#>          11 45.96563 6.419047
#>          10 46.63716 6.483659
#>           9 47.43910 6.559512
#> 
#> 
#> Selected genotypes (per group)
#> 
#> $selected
#>     yd    pa Entry
#>  GV084 GV050    9 
#>  GV088 GV063    9 
#>  GV092 GV065    9 
#>  GV093 GV069    9 
#>  GV095 GV074    9 
#>  GV132 GV085    9 
#>  GV138 GV127    9 
#>  GV143 GV135    9 
#>  GV145 GV141    9 
#>  GV137 GV144    10
#>  GV079 GV133    11
#>  GV070 GV125    12
#>  GV080 GV142    13
#>  GV089 GV055    14
#>  GV091 GV053    15
#>  GV094 GV082    16
#>  GV090 GV068    17
#>  GV078 GV103    18
#>  GV040 GV073    19
#>  GV131 GV006    20

As in previous outputs, one column per trait is shown. However, in this case, the gain reported in each column refers to an independent solution. For example, the group of 20 genotypes achieving a yield gain of 39.5% is not composed of the same genotypes as the group achieving a potential alcohol gain of 5.9%. This can be verified by examining the output stored in the selected object.

Since no constraints are applied in this scenario, the output represents a ranking of genotypes for each trait. Increasing the group size simply adds the next genotype in the ranking. Thus, each column corresponds to the top genotypes for a given trait. The final column, Entry, indicates the smallest group size at which each genotype is first included. It means that, for example, for yield, the group size of 9 is composed by genotypes from GV084 to GV145. Genotype GV137 appears for the first time in yield when the group size reaches 10 genotypes. When the group size increases to 11, all the previous 10 genotypes remain in the group and genotype GV079 is added, and so on.

No customized summary() is provided, given that the conditions do not vary and no constraints are applied.

rmaxa()

rmaxa(traits, ref = NULL, clmin = 2, clmax, constraints = NULL, meanvec = NULL, criteria = NULL, data)

The package includes a function rmaxa(), which computes the maximum admissible gains. The maximum admissible gain corresponds to the value obtained by maximizing one trait while avoiding losses in all others, enforced through constraints—predefined in this case as >= 0. As with rmaxp(), the user may request gains for one or multiple traits simultaneously. The output displays one column per trait; however, the values in different columns correspond to independent solutions and should be interpreted separately. This distinguishes the output of the gain object from this function from that of the polyclonal() function with dmg = "base", where the sum of the values related to all traits are optimized.

# Named numeric vector with the trait means
mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653)

# Vector with the traits to be optimized
mytraits <- c("yd", "pa")

# Named numeric vector with the selection criteria
mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)

In this function, if the traits to which the constraints apply are not specified, all the traits are assumed to be greater or equal than zero. If this vector is omitted, all columns in the dataset are considered as constrained traits, except the one indicated in ref or the first column if ref is omitted. The criteria as well as the meanvec arguments must indicate the criteria for all the traits involved in both maximization and the constraints.

This example requests groups with 12 to 20 genotypes.

# Using rmaxa()
max_adm_gain <- rmaxa(
  traits = mytraits,
  clmin = 12,
  clmax = 20,
  meanvec = mymeanvec,
  data = Gouveio
  )

# Results
max_adm_gain
#> Maximum admissible gains for each trait correspond to independently selected groups;
#> thus, gains are not directly comparable and should be interpreted separately.
#> See 'selected' for group details
#> 
#> Predicted genetic gains as a % of the overall mean
#> 
#> $gain 
#>  Group.Size       yd       pa
#>          20 26.21528 5.659511
#>          19 26.63661 5.696647
#>          18 27.05655 5.755160
#>          17 27.44606 5.802291
#>          16 27.74894 5.833466
#>          15 28.17461 5.862092
#>          14 28.50718 5.905133
#>          13 28.85517 5.937964
#>          12 29.71520 6.011691
#> 
#> 
#> $ selected_yd 
#>     20    19    18    17    16    15    14    13    12
#>  GV005 GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV006
#>  GV006 GV007 GV007 GV025 GV040 GV040 GV040 GV025 GV040
#>  GV007 GV039 GV040 GV040 GV049 GV049 GV049 GV040 GV049
#>  GV025 GV040 GV049 GV049 GV067 GV067 GV067 GV049 GV067
#>  GV040 GV049 GV067 GV067 GV075 GV088 GV079 GV067 GV079
#>  GV049 GV067 GV079 GV079 GV088 GV090 GV088 GV088 GV088
#>  GV067 GV079 GV088 GV088 GV090 GV093 GV090 GV090 GV093
#>  GV079 GV080 GV090 GV090 GV093 GV095 GV093 GV093 GV095
#>  GV080 GV088 GV093 GV093 GV095 GV112 GV095 GV095 GV127
#>  GV088 GV090 GV095 GV095 GV112 GV127 GV112 GV112 GV138
#>  GV090 GV093 GV112 GV112 GV127 GV132 GV127 GV127 GV140
#>  GV093 GV095 GV127 GV127 GV138 GV138 GV132 GV140 GV148
#>  GV095 GV112 GV138 GV140 GV140 GV140 GV140 GV145      
#>  GV112 GV127 GV140 GV144 GV145 GV146 GV148            
#>  GV127 GV138 GV144 GV145 GV146 GV148                  
#>  GV138 GV140 GV145 GV146 GV148                        
#>  GV140 GV145 GV146 GV148                              
#>  GV145 GV146 GV148                                    
#>  GV146 GV148                                          
#>  GV148                                                
#> 
#> $ selected_pa 
#>     20    19    18    17    16    15    14    13    12
#>  GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV006 GV050
#>  GV050 GV050 GV050 GV050 GV050 GV050 GV050 GV053 GV053
#>  GV053 GV053 GV053 GV053 GV053 GV053 GV053 GV065 GV065
#>  GV060 GV060 GV060 GV060 GV065 GV065 GV065 GV069 GV069
#>  GV063 GV063 GV065 GV065 GV069 GV069 GV069 GV081 GV081
#>  GV065 GV065 GV069 GV069 GV073 GV081 GV074 GV085 GV125
#>  GV069 GV069 GV073 GV074 GV074 GV085 GV081 GV125 GV127
#>  GV073 GV074 GV074 GV081 GV081 GV097 GV097 GV127 GV133
#>  GV074 GV081 GV081 GV085 GV097 GV125 GV127 GV133 GV135
#>  GV081 GV085 GV085 GV097 GV125 GV127 GV133 GV135 GV141
#>  GV085 GV097 GV097 GV125 GV127 GV133 GV135 GV141 GV144
#>  GV097 GV125 GV125 GV127 GV133 GV135 GV141 GV144 GV146
#>  GV125 GV127 GV127 GV133 GV135 GV141 GV144 GV146      
#>  GV127 GV133 GV133 GV135 GV141 GV144 GV146            
#>  GV133 GV135 GV135 GV141 GV144 GV146                  
#>  GV135 GV140 GV141 GV144 GV146                        
#>  GV140 GV141 GV144 GV146                              
#>  GV141 GV144 GV146                                    
#>  GV144 GV146                                          
#>  GV146

As in the rmaxp() gain output, one column per trait is shown; however, the gain reported in each column corresponds to an independent solution. For example, the group of 20 genotypes achieving a gain of 26.2% for yd is not composed of the same genotypes as the group achieving a gain of 5.7% for pa. Furthermore, the gains obtained with rmaxa() for any trait are always less than or equal to those obtained with rmaxp(), since rmaxa() incorporates constraints during optimization.

Regarding the selected object, due to the constraints applied in this function, the genotypes composing one group differ from those in other groups. Consequently, there is one selected object per trait, named as selected_<trait>.

Conclusion

This package offers a ready-to-use implementation of the integer programming method for maximizing the predicted genetic gains in polyclonal selection developed by Surgy et al. (2025). It allowing breeders to efficiently and consistently perform group-based multi-trait selection, which supports a balanced genetic gain across multiple traits in breeding programs. A dataset and examples are provided to assist in understanding and reproducing results.

References

Surgy, S., Cadima, J. & Gonçalves, E., 2025. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122. https://doi.org/10.1007/s00122-025-04885-0