3  Your First Simulation

3.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Create a complete breeding program simulation from scratch
  • Phenotype and genotype individuals
  • Perform genomic prediction
  • Select superior individuals
  • Generate offspring from selected parents
  • Analyze genetic gain and inbreeding

3.2 The Complete Workflow

Let’s simulate a realistic breeding program with:

  • 100 founder animals
  • Genomic selection based on GBLUP
  • 10 generations of selection
  • Analysis of genetic gain and diversity

3.3 Step 1: Load MoBPS

Remember you need to install MoBPS first, visit chapter 1 to see how to install MoBPS on your OS.

# Load packages
library(tidyverse)
library(MoBPS)

# Optional: Set seed for reproducibility
set.seed(12345)

3.4 Step 2: Create Founder Population

# Create a founder population
population <- creating.diploid(
  nsnp   = 10000,              # 10,000 SNP markers
  nindi  = 100,                # 100 individuals (50 male, 50 female)
  chr.nr = 5,                  # 5 chromosomes
  chromosome.length = 2,       # 2 Morgan each (10M total)
  n.additive = 100,            # 100 additive QTLs
  n.dominant = 20,             # 20 dominant QTLs
  share.genotyped = 1,         # 100% genotyped
  var.target = 1,              # Genetic variance = 1
  name.cohort = "Founders"     # Name this cohort
)

What did we just create?

  • A population with realistic genomic structure (5 chromosomes)
  • 100 animals with 10,000 SNPs each
  • A complex trait with both additive and dominance effects
  • All individuals are genotyped (ready for genomic selection)

3.5 Step 3: Generate Phenotypes

# Phenotype all individuals in generation 1
population <- breeding.diploid(
  population,
  phenotyping     = "all",     # Phenotype everyone
  phenotyping.gen = 1,         # From generation 1
  heritability    = 0.5        # h² = 0.5 (moderate heritability)
)

What happened?

  • Phenotypes were simulated for all 100 founders
  • Residual variance was automatically calculated to achieve h² = 0.5
  • Phenotypes = True Breeding Value + Environmental Effect

3.6 Step 4: Genomic Prediction

# Perform genomic prediction (GBLUP)
population <- breeding.diploid(
  population,
  bve     = TRUE,                 # Calculate breeding value estimates
  bve.gen = 1                     # Use generation 1 as training set
)

What happened?

  • A genomic relationship matrix (G) was calculated
  • Breeding values were estimated using the internal GBLUP
  • Each individual now has an estimated breeding value (EBV)
NoteInternal BVE Method

The bve = TRUE uses MoBPS’s internal GBLUP assuming known heritability. For more realistic scenarios, use external software like rrBLUP, BGLR, or BLUPF90 (covered in Chapter 13).

3.7 Step 5: Selection of Parents

# Select top individuals as parents
population <- breeding.diploid(
  population,
  selection.size = c(10, 10),         # 10 males, 10 females
  selection.criteria  = "bve",        # Select on estimated BV
  selection.m.cohorts = "Founders_M",   # Males from Founders
  selection.f.cohorts = "Founders_F",   # Females from Founders
  breeding.size = c(50, 50),          # Generate 50M + 50F offspring
  name.cohort = "Generation_2"        # Name the offspring
)

What happened?

  • The top 10 males (by EBV) were selected as sires
  • The top 10 females (by EBV) were selected as dams
  • 100 offspring (50 male, 50 female) were created via random mating
  • Selection intensity: 20% of males, 20% of females

NOTE: If you use “Founders” instead of “Founders_M” it will calculate the wrong selection intensity

3.8 Step 6: Repeat for Multiple Generations

NOTE: selection.gen does not exist so separate into males and females in the last step!

# Simulate 10 generations
for (gen in 2:10) {

  # 1. Phenotype the current generation
  population <- breeding.diploid(
    population,
    phenotyping     = "all",
    phenotyping.gen = gen,
    heritability    = 0.5
  )

  # 2. Update genomic predictions (GEBV) using all data
  population <- breeding.diploid(
    population,
    bve     = TRUE,
    bve.gen = 1:gen                   # Cumulative training set
  )

  # 3. Select and 4. Mate to produce new offspring
  population <- breeding.diploid(
    population,
    selection.size     = c(10, 10),    # 10 males and 10 females
    selection.criteria = "bve",        # use EBV to select top animals
    selection.m.gen    = gen,          # Select Males from current generation
    selection.f.gen    = gen,          # Select Females from current generation
    breeding.size      = c(50, 50),    # create 50 new males and 50 new females
    name.cohort        = paste0("Generation_", gen+1) # name the next generation
  )
}

What’s happening in the loop?

  1. Each generation is fully phenotyped
  2. GEBVs are updated with cumulative data (generations 1 to current)
  3. Top 10 males and 10 females are selected
  4. 100 offspring are produced

3.9 Step 7: Analyze Results

3.9.1 Genetic Gain

# Plot breeding value trends
bv.development(population, gen = 1:11)

This creates a plot showing:

  • Mean breeding value by generation
  • Genetic gain over time
  • Standard deviation within generation

3.9.2 Extract True Breeding Value (TBV) Trend

The key is to extract generation and cohort information for each individual and combine it with breeding values:

# Method 1: Loop through each generation separately
data_all_gens <- tibble()

# loop over all 11 generations to extract info
for (gen in 1:11) {
  
  # Extract breeding values for this generation
  # get.bv() returns a p x n matrix (p = number of traits, n = number of animals)
  bv <- get.bv(population, gen = gen)

  # Extract cohort names for each individual
  cohorts <- get.cohorts.individual(population, gen = gen, use.id = TRUE)

  # Combine gen + ID + cohort + TBV (bv) into a data frame
  data_gen <- tibble(
    Generation    = gen,                   # Generation
    Ind_ID        = names(cohorts),        # Individual ID
    Cohort        = as.character(cohorts), # convert cohorts vector to character
    TBV           = as.numeric(bv)        # convert TBV matrix to vector
  )

  # stack new dataframe on top of last generation's data
  data_all_gens <- bind_rows(data_all_gens, data_gen)
}

# remove temporary loop objects
rm(bv); rm(cohorts); rm(data_gen)

# View structure of returned data frame (all data from gens 1:11)
print(data_all_gens)

We can aggregate by Generation to find means and variances

# Use dplyr (if installed) to aggregate by generation
data_summary <- data_all_gens %>%
  group_by(Generation) %>%
  summarise(
    TBV_Mean = mean(TBV),
    TBV_SD   = sd(TBV),
    n        = n()
  )

# print results
print(data_summary)

Now we can properly visualize and analyze the data. Below is a plot of the true genetic gain for this replicate.

# Plot mean BV by generation
ggplot(data_summary, aes(x=Generation, y=TBV_Mean)) +
  geom_line(color="dodgerblue3") + 
  geom_point(color="dodgerblue3") +
  labs(
    title = "Mean Breeding Value by Generation",
    x     = "Generation",
    y     = "Mean True Breeding Value (TBV)"
  )

We can calculate the total (true) genetic gain for this simulation with the following:

# Calculate genetic gain from generation 1 to 11
genetic_gain <- data_summary$TBV_Mean[11] - data_summary$TBV_Mean[1]
cat("Total genetic gain:", round(genetic_gain, 3), "\n")

Now we can plot a histogram of all TBVs from all generations:

# Histogram of all breeding values
ggplot(data_all_gens, aes(x=TBV)) +
  geom_histogram(fill="dodgerblue3", color="white") +
  labs(
    title = "Distribution of TBVs from All Generations",
    x = "True Breeding Value (TBV)"
  )

3.9.3 Inbreeding Analysis

We can use helper functions to extract the inbreeding coefficients:

# Calculate inbreeding for generation 11 (A matrix I think - 'IBD' in help)
# this returns a named numeric vector (IDs available)
inbreeding_gen11 <- inbreeding.emp(population, gen = 11)

# calculate the mean for this group (gen 11 here)
mean(inbreeding_gen11)

NOTE: inbreeding.emp() returns a named numeric vector (IDs) and represents the IBD inbreeding (F) coefficients

If you need the IDs of these animals, use the names() function

names(inbreeding_gen11)

We can extract the F (inbreeding) value for each generation using the following:

# Track inbreeding across generations
inbreeding_trend <- numeric(11)
for (gen in 1:11) {
  inbreeding_trend[gen] <- mean(inbreeding.emp(population, gen = gen))
}

# create a data frame with all results
data_inbreeding <- tibble(
  Generation = 1:11,
  FVal = inbreeding_trend
)

Plot the mean F value trend over time with the following:

# plot with ggplot2
data_inbreeding %>%
ggplot(., aes(x=Generation, y=FVal)) +
  geom_line(color="orange2") +
  geom_point(color="orange2") +
  labs(
    title = "Mean Inbreeding Over Time",
    x     = "Generation",
    y     = "Mean Inbreeding Coefficient (F)"
  )

3.9.4 Kinship Between Individuals

We can extract the relationship among animals with kinship.emp()

# Calculate kinship matrix for generation 11
kinship_matrix <- kinship.emp(population, gen = 11)

# print first 5 animals
kinship_matrix[1:5, 1:5]

Calculate the average

# Average kinship (off-diagonal elements)
n <- nrow(kinship_matrix)
avg_kinship <- sum(kinship_matrix - diag(diag(kinship_matrix))) / (n * (n - 1))
cat("Average kinship in gen 11:", round(avg_kinship, 4), "\n")

3.9.5 Population Structure (PCA)

# PCA of generations 1, 5, and 11
get.pca(population, gen = c(1, 5, 11))

This creates a PCA plot showing how the population has diverged from founders.

3.10 Understanding the Results

3.10.1 Expected Outcomes

After 10 generations of selection with these parameters, you should see:

  • Genetic gain: ~10 units
  • Inbreeding: ~50% (due to small effective population size)
  • Selection response: Linear increase over generation (mostly)
  • Prediction accuracy: Improving with cumulative training data (and decrease in Ne)

3.10.2 Why These Results?

  1. Strong selection (10% selected) = high genetic gain
  2. Small Ne (10 males) = rapid inbreeding accumulation
  3. Moderate h² (0.5) = good response to selection
  4. Finite genome = some exhaustion of genetic variance

3.11 Variations to Try

Modify the simulation to explore different scenarios:

3.11.1 Higher Selection Intensity

# More extreme selection
selection.size = c(5, 5)  # Top 5 of each sex (vs 10)

3.11.2 Larger Population

# Reduce inbreeding
nindi = 500                  # 500 founders (250 each)
selection.size = c(50, 50)   # But still select top 20%
breeding.size = c(250, 250)  # Offspring numbers produce (250 of each sex)

3.11.3 Multiple Traits

# Add a second trait
n.additive = c(100, 80)           # Trait 1 & 2
trait.cor = matrix(c(1.0, -0.3,      # Negative correlation
                    -0.3,  1.0), nrow = 2)

3.11.4 Different Mating Strategies

# Avoid inbreeding
avoid.mating.halfsib = TRUE
avoid.mating.fullsib = TRUE

3.12 Common Issues and Solutions

Warning“Not enough males/females”

Problem: selection.size too large for available individuals.

Solution: Check population size:

get.size(population)  # outputs matrix of (number of generations) x 2 (Sex or Males/Females)

Ensure selection.size doesn’t exceed population size.

Warning“No phenotypes available”

Problem: Forgot to phenotype before BVE.

Solution: Often you want to phenotype before prediction, but not always:

population <- breeding.diploid(population,
                                phenotyping = "all",
                                heritability = 0.5)
TipSpeeding Up Simulations

For large-scale simulations:

  • Reduce nsnp (1000 SNPs often sufficient)
  • Use miraculix package for fast matrix operations
  • Set copy.breeding.values = FALSE to reduce memory

3.13 Summary

In this chapter, you:

  • ✅ Created a complete breeding program simulation
  • ✅ Implemented genomic selection with GBLUP
  • ✅ Tracked genetic gain over 10 generations
  • ✅ Analyzed inbreeding and population structure
  • ✅ Learned the core MoBPS workflow

3.14 What’s Next?

Now that you can run a basic simulation, let’s dive deeper into each component:

  • Part 2: Learn how to create more realistic founder populations with LD structure, import real data, and design complex trait architectures
  • Chapter 4: Advanced population creation
  • Chapter 5: Sophisticated trait modeling

Continue to Chapter 4: Creating Populations!