# Load packages
library(tidyverse)
library(MoBPS)
# Optional: Set seed for reproducibility
set.seed(12345)3 Your First Simulation
3.1 Learning Objectives
By the end of this chapter, you will be able to:
- Create a complete breeding program simulation from scratch
- Phenotype and genotype individuals
- Perform genomic prediction
- Select superior individuals
- Generate offspring from selected parents
- Analyze genetic gain and inbreeding
3.2 The Complete Workflow
Let’s simulate a realistic breeding program with:
- 100 founder animals
- Genomic selection based on GBLUP
- 10 generations of selection
- Analysis of genetic gain and diversity
3.3 Step 1: Load MoBPS
Remember you need to install MoBPS first, visit chapter 1 to see how to install MoBPS on your OS.
3.4 Step 2: Create Founder Population
# Create a founder population
population <- creating.diploid(
nsnp = 10000, # 10,000 SNP markers
nindi = 100, # 100 individuals (50 male, 50 female)
chr.nr = 5, # 5 chromosomes
chromosome.length = 2, # 2 Morgan each (10M total)
n.additive = 100, # 100 additive QTLs
n.dominant = 20, # 20 dominant QTLs
share.genotyped = 1, # 100% genotyped
var.target = 1, # Genetic variance = 1
name.cohort = "Founders" # Name this cohort
)What did we just create?
- A population with realistic genomic structure (5 chromosomes)
- 100 animals with 10,000 SNPs each
- A complex trait with both additive and dominance effects
- All individuals are genotyped (ready for genomic selection)
3.5 Step 3: Generate Phenotypes
# Phenotype all individuals in generation 1
population <- breeding.diploid(
population,
phenotyping = "all", # Phenotype everyone
phenotyping.gen = 1, # From generation 1
heritability = 0.5 # h² = 0.5 (moderate heritability)
)What happened?
- Phenotypes were simulated for all 100 founders
- Residual variance was automatically calculated to achieve h² = 0.5
- Phenotypes = True Breeding Value + Environmental Effect
3.6 Step 4: Genomic Prediction
# Perform genomic prediction (GBLUP)
population <- breeding.diploid(
population,
bve = TRUE, # Calculate breeding value estimates
bve.gen = 1 # Use generation 1 as training set
)What happened?
- A genomic relationship matrix (G) was calculated
- Breeding values were estimated using the internal GBLUP
- Each individual now has an estimated breeding value (EBV)
The bve = TRUE uses MoBPS’s internal GBLUP assuming known heritability. For more realistic scenarios, use external software like rrBLUP, BGLR, or BLUPF90 (covered in Chapter 13).
3.7 Step 5: Selection of Parents
# Select top individuals as parents
population <- breeding.diploid(
population,
selection.size = c(10, 10), # 10 males, 10 females
selection.criteria = "bve", # Select on estimated BV
selection.m.cohorts = "Founders_M", # Males from Founders
selection.f.cohorts = "Founders_F", # Females from Founders
breeding.size = c(50, 50), # Generate 50M + 50F offspring
name.cohort = "Generation_2" # Name the offspring
)What happened?
- The top 10 males (by EBV) were selected as sires
- The top 10 females (by EBV) were selected as dams
- 100 offspring (50 male, 50 female) were created via random mating
- Selection intensity: 20% of males, 20% of females
NOTE: If you use “Founders” instead of “Founders_M” it will calculate the wrong selection intensity
3.8 Step 6: Repeat for Multiple Generations
NOTE:
selection.gendoes not exist so separate into males and females in the last step!
# Simulate 10 generations
for (gen in 2:10) {
# 1. Phenotype the current generation
population <- breeding.diploid(
population,
phenotyping = "all",
phenotyping.gen = gen,
heritability = 0.5
)
# 2. Update genomic predictions (GEBV) using all data
population <- breeding.diploid(
population,
bve = TRUE,
bve.gen = 1:gen # Cumulative training set
)
# 3. Select and 4. Mate to produce new offspring
population <- breeding.diploid(
population,
selection.size = c(10, 10), # 10 males and 10 females
selection.criteria = "bve", # use EBV to select top animals
selection.m.gen = gen, # Select Males from current generation
selection.f.gen = gen, # Select Females from current generation
breeding.size = c(50, 50), # create 50 new males and 50 new females
name.cohort = paste0("Generation_", gen+1) # name the next generation
)
}What’s happening in the loop?
- Each generation is fully phenotyped
- GEBVs are updated with cumulative data (generations 1 to current)
- Top 10 males and 10 females are selected
- 100 offspring are produced
3.9 Step 7: Analyze Results
3.9.1 Genetic Gain
# Plot breeding value trends
bv.development(population, gen = 1:11)This creates a plot showing:
- Mean breeding value by generation
- Genetic gain over time
- Standard deviation within generation
3.9.2 Extract True Breeding Value (TBV) Trend
The key is to extract generation and cohort information for each individual and combine it with breeding values:
# Method 1: Loop through each generation separately
data_all_gens <- tibble()
# loop over all 11 generations to extract info
for (gen in 1:11) {
# Extract breeding values for this generation
# get.bv() returns a p x n matrix (p = number of traits, n = number of animals)
bv <- get.bv(population, gen = gen)
# Extract cohort names for each individual
cohorts <- get.cohorts.individual(population, gen = gen, use.id = TRUE)
# Combine gen + ID + cohort + TBV (bv) into a data frame
data_gen <- tibble(
Generation = gen, # Generation
Ind_ID = names(cohorts), # Individual ID
Cohort = as.character(cohorts), # convert cohorts vector to character
TBV = as.numeric(bv) # convert TBV matrix to vector
)
# stack new dataframe on top of last generation's data
data_all_gens <- bind_rows(data_all_gens, data_gen)
}
# remove temporary loop objects
rm(bv); rm(cohorts); rm(data_gen)
# View structure of returned data frame (all data from gens 1:11)
print(data_all_gens)We can aggregate by Generation to find means and variances
# Use dplyr (if installed) to aggregate by generation
data_summary <- data_all_gens %>%
group_by(Generation) %>%
summarise(
TBV_Mean = mean(TBV),
TBV_SD = sd(TBV),
n = n()
)
# print results
print(data_summary)Now we can properly visualize and analyze the data. Below is a plot of the true genetic gain for this replicate.
# Plot mean BV by generation
ggplot(data_summary, aes(x=Generation, y=TBV_Mean)) +
geom_line(color="dodgerblue3") +
geom_point(color="dodgerblue3") +
labs(
title = "Mean Breeding Value by Generation",
x = "Generation",
y = "Mean True Breeding Value (TBV)"
)We can calculate the total (true) genetic gain for this simulation with the following:
# Calculate genetic gain from generation 1 to 11
genetic_gain <- data_summary$TBV_Mean[11] - data_summary$TBV_Mean[1]
cat("Total genetic gain:", round(genetic_gain, 3), "\n")Now we can plot a histogram of all TBVs from all generations:
# Histogram of all breeding values
ggplot(data_all_gens, aes(x=TBV)) +
geom_histogram(fill="dodgerblue3", color="white") +
labs(
title = "Distribution of TBVs from All Generations",
x = "True Breeding Value (TBV)"
)3.9.3 Inbreeding Analysis
We can use helper functions to extract the inbreeding coefficients:
# Calculate inbreeding for generation 11 (A matrix I think - 'IBD' in help)
# this returns a named numeric vector (IDs available)
inbreeding_gen11 <- inbreeding.emp(population, gen = 11)
# calculate the mean for this group (gen 11 here)
mean(inbreeding_gen11)NOTE: inbreeding.emp() returns a named numeric vector (IDs) and represents the IBD inbreeding (F) coefficients
If you need the IDs of these animals, use the names() function
names(inbreeding_gen11)We can extract the F (inbreeding) value for each generation using the following:
# Track inbreeding across generations
inbreeding_trend <- numeric(11)
for (gen in 1:11) {
inbreeding_trend[gen] <- mean(inbreeding.emp(population, gen = gen))
}
# create a data frame with all results
data_inbreeding <- tibble(
Generation = 1:11,
FVal = inbreeding_trend
)Plot the mean F value trend over time with the following:
# plot with ggplot2
data_inbreeding %>%
ggplot(., aes(x=Generation, y=FVal)) +
geom_line(color="orange2") +
geom_point(color="orange2") +
labs(
title = "Mean Inbreeding Over Time",
x = "Generation",
y = "Mean Inbreeding Coefficient (F)"
)3.9.4 Kinship Between Individuals
We can extract the relationship among animals with kinship.emp()
# Calculate kinship matrix for generation 11
kinship_matrix <- kinship.emp(population, gen = 11)
# print first 5 animals
kinship_matrix[1:5, 1:5]Calculate the average
# Average kinship (off-diagonal elements)
n <- nrow(kinship_matrix)
avg_kinship <- sum(kinship_matrix - diag(diag(kinship_matrix))) / (n * (n - 1))
cat("Average kinship in gen 11:", round(avg_kinship, 4), "\n")3.9.5 Population Structure (PCA)
# PCA of generations 1, 5, and 11
get.pca(population, gen = c(1, 5, 11))This creates a PCA plot showing how the population has diverged from founders.
3.10 Understanding the Results
3.10.1 Expected Outcomes
After 10 generations of selection with these parameters, you should see:
- Genetic gain: ~10 units
- Inbreeding: ~50% (due to small effective population size)
- Selection response: Linear increase over generation (mostly)
- Prediction accuracy: Improving with cumulative training data (and decrease in Ne)
3.10.2 Why These Results?
- Strong selection (10% selected) = high genetic gain
- Small Ne (10 males) = rapid inbreeding accumulation
- Moderate h² (0.5) = good response to selection
- Finite genome = some exhaustion of genetic variance
3.11 Variations to Try
Modify the simulation to explore different scenarios:
3.11.1 Higher Selection Intensity
# More extreme selection
selection.size = c(5, 5) # Top 5 of each sex (vs 10)3.11.2 Larger Population
# Reduce inbreeding
nindi = 500 # 500 founders (250 each)
selection.size = c(50, 50) # But still select top 20%
breeding.size = c(250, 250) # Offspring numbers produce (250 of each sex)3.11.3 Multiple Traits
# Add a second trait
n.additive = c(100, 80) # Trait 1 & 2
trait.cor = matrix(c(1.0, -0.3, # Negative correlation
-0.3, 1.0), nrow = 2)3.11.4 Different Mating Strategies
# Avoid inbreeding
avoid.mating.halfsib = TRUE
avoid.mating.fullsib = TRUE3.12 Common Issues and Solutions
Problem: selection.size too large for available individuals.
Solution: Check population size:
get.size(population) # outputs matrix of (number of generations) x 2 (Sex or Males/Females)Ensure selection.size doesn’t exceed population size.
Problem: Forgot to phenotype before BVE.
Solution: Often you want to phenotype before prediction, but not always:
population <- breeding.diploid(population,
phenotyping = "all",
heritability = 0.5)For large-scale simulations:
- Reduce
nsnp(1000 SNPs often sufficient) - Use
miraculixpackage for fast matrix operations - Set
copy.breeding.values = FALSEto reduce memory
3.13 Summary
In this chapter, you:
- ✅ Created a complete breeding program simulation
- ✅ Implemented genomic selection with GBLUP
- ✅ Tracked genetic gain over 10 generations
- ✅ Analyzed inbreeding and population structure
- ✅ Learned the core MoBPS workflow
3.14 What’s Next?
Now that you can run a basic simulation, let’s dive deeper into each component:
- Part 2: Learn how to create more realistic founder populations with LD structure, import real data, and design complex trait architectures
- Chapter 4: Advanced population creation
- Chapter 5: Sophisticated trait modeling
Continue to Chapter 4: Creating Populations!