4 Creating Founder Populations
4.1 Learning Objectives
By the end of this chapter, you will be able to:
- Generate simulated genomic data with different patterns
- Import real genomic data (VCF, PLINK formats)
- Set up genetic maps and chromosome structures
- Control allele frequencies and genetic diversity
- Create multiple breeds/populations
4.2 Overview of creating.diploid()
The creating.diploid() function is your starting point for every MoBPS simulation. It creates the founder population with:
- Genomic data (markers/haplotypes)
- Genetic map (chromosomes, positions)
- Trait architecture (QTLs, effects)
- Population structure (cohorts, sex, pools)
4.3 Generating Simulated Data
4.3.1 Basic Random Population
The simplest approach: generate random genotypes.
# Create a simple founder population
population <- creating.diploid(
nsnp = 5000, # 5,000 SNP markers
nindi = 200, # 200 individuals
n.additive = 100 # 100 QTLs
)
Default behavior:
- Single chromosome (5 Morgan length)
- Random genotypes with allele frequency ~ Uniform(0, 1)
- 50% male, 50% female
- All individuals diploid and unrelated
4.3.2 Controlling Allele Frequencies
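Allele frequencies strongly affect genetic architecture: a QTL with additive effect a and allele frequency p contributes 2p(1-p)a² to the additive genetic variance, so a rare variant contributes far less than a common one with the same effect. The frequency distribution is set with beta_shape1 and beta_shape2: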
# Uniform allele frequencies (default)
pop_uniform <- creating.diploid(
nsnp = 1000,
nindi = 100,
beta_shape1 = 1, # Beta(1,1) = Uniform(0,1)
beta_shape2 = 1
)
# Common variants (higher MAF)
pop_common <- creating.diploid(
nsnp = 1000,
nindi = 100,
beta_shape1 = 2, # Beta(2,2) concentrates near 0.5
beta_shape2 = 2
)
# Rare variants (low MAF)
pop_rare <- creating.diploid(
nsnp = 1000,
nindi = 100,
beta_shape1 = 0.5, # Beta(0.5,0.5) concentrates near 0 and 1
beta_shape2 = 0.5
)
Real populations often have many rare variants and fewer common variants. Use beta_shape1 = 0.3, beta_shape2 = 1.5 to approximate this.
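To see what these shape parameters imply before running a simulation, the distributions can be sampled directly in base R. The following is a minimal sketch that does not call MoBPS; it assumes only that allele frequencies are drawn from Beta(shape1, shape2), and the names `shapes` and `maf_summary` are illustrative.

```r
set.seed(42)      # reproducible draws
n_snp <- 10000

# Shape pairs discussed in this section (labels are illustrative)
shapes <- list(
  uniform = c(1, 1),
  common  = c(2, 2),
  u_shape = c(0.5, 0.5),
  skewed  = c(0.3, 1.5)
)

maf_summary <- sapply(shapes, function(s) {
  p   <- rbeta(n_snp, s[1], s[2])   # allele frequency per SNP
  maf <- pmin(p, 1 - p)             # minor allele frequency
  c(mean_maf = mean(maf), share_maf_below_0.05 = mean(maf < 0.05))
})

round(maf_summary, 3)
```

Comparing the columns shows how Beta(2,2) raises the average minor allele frequency relative to the uniform case, while the other two settings push more SNPs below a MAF of 0.05.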
4.3.3 Initialization Modes
Control the starting genotypes with the dataset parameter:
# Mode 1: All zeros (000.../000...)
pop_zero <- creating.diploid(nsnp = 1000, nindi = 100,
dataset = "all0")
# Mode 2: Fully heterozygous (000.../111...)
pop_het <- creating.diploid(nsnp = 1000, nindi = 100,
dataset = "allhetero")
# Mode 3: Random (default, X₁X₂X₃.../X₄X₅X₆...)
pop_random <- creating.diploid(nsnp = 1000, nindi = 100,
dataset = "random")
# Mode 4: Homozygous random (X₁X₂X₃.../X₁X₂X₃... same haplotypes)
pop_homo <- creating.diploid(nsnp = 1000, nindi = 100,
dataset = "homorandom")
When to use each mode:
- "random": General purpose, unrelated founders
- "homorandom": Create inbred lines or DH lines
- "allhetero": Maximum heterozygosity (F1s)
- "all0": Specific starting conditions or testing
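The four patterns can be pictured as small haplotype matrices. The sketch below builds them in base R and measures the heterozygosity each one produces. It illustrates the patterns only and is not how MoBPS generates them internally; `make_haplotypes` and `heterozygosity` are hypothetical helpers, and the random modes assume an allele frequency of 0.5 for simplicity.

```r
set.seed(1)
nsnp  <- 8
nindi <- 3

# SNPs in rows, two adjacent haplotype columns per individual
make_haplotypes <- function(mode, nsnp, nindi) {
  switch(mode,
    all0       = matrix(0L, nsnp, 2 * nindi),
    allhetero  = matrix(rep(c(0L, 1L), each = nsnp), nsnp, 2 * nindi),
    random     = matrix(rbinom(nsnp * 2 * nindi, 1, 0.5), nsnp, 2 * nindi),
    homorandom = {
      h <- matrix(rbinom(nsnp * nindi, 1, 0.5), nsnp, nindi)
      h[, rep(seq_len(nindi), each = 2), drop = FALSE]  # duplicate each haplotype
    },
    stop("unknown mode: ", mode)
  )
}

# Share of SNP-by-individual cells where the two haplotypes differ
heterozygosity <- function(h) {
  first  <- h[, seq(1, ncol(h), by = 2), drop = FALSE]
  second <- h[, seq(2, ncol(h), by = 2), drop = FALSE]
  mean(first != second)
}

modes <- c("all0", "allhetero", "random", "homorandom")
het <- sapply(modes, function(m) heterozygosity(make_haplotypes(m, nsnp, nindi)))
het
```

By construction the "all0" and "homorandom" patterns are fully homozygous and "allhetero" is heterozygous at every marker; only "random" falls in between.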
4.4 Chromosome Structure
4.4.1 Single vs. Multiple Chromosomes
# Single chromosome (default)
pop_single <- creating.diploid(
nsnp = 5000,
chromosome.length = 5 # 5 Morgan
)
# Multiple chromosomes
pop_multi <- creating.diploid(
nsnp = 10000,
chr.nr = c(2000, 2000, 3000, 3000), # SNPs per chromosome
chromosome.length = c(2, 2, 1.5, 1.5) # Length in Morgan
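)
Consequences of chromosome structure:
- More chromosomes = more recombination = faster LD decay (markers on different chromosomes segregate independently)
- Longer chromosomes (in Morgan) = more crossovers per meiosis = weaker linkage between markers at the same relative positions
- Realistic structure improves simulation accuracy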
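One way to make the link between chromosome length and linkage concrete is Haldane's mapping function, r = 0.5 (1 - exp(-2d)), which gives the recombination fraction r for a map distance d in Morgan when crossovers occur without interference. The sketch below is base R only; the no-interference model is an assumption made for illustration, and the variable names are not MoBPS parameters.

```r
# Haldane's mapping function: map distance (Morgan) -> recombination fraction
haldane <- function(d) 0.5 * (1 - exp(-2 * d))

nsnp          <- 1000
chrom_lengths <- c(1, 2, 5)                 # chromosome lengths in Morgan
spacing       <- chrom_lengths / (nsnp - 1) # distance between adjacent, evenly spaced markers

r_adjacent <- haldane(spacing)        # recombination between neighbouring markers
r_ends     <- haldane(chrom_lengths)  # recombination between the two end markers

data.frame(
  length_M   = chrom_lengths,
  r_adjacent = signif(r_adjacent, 3),
  r_ends     = round(r_ends, 4)
)
```

For a fixed number of markers, a longer map increases the recombination fraction between neighbours, and markers at opposite ends of a long chromosome recombine almost as freely (r close to 0.5) as markers on different chromosomes.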
4.4.2 Using Template Species Maps
MoBPS includes common species maps:
# Cattle map
pop_cattle <- creating.diploid(
nsnp = 50000,
nindi = 100,
template.chip = "cattle" # 29 autosomes
)
# Other available templates
template.chip = "pig" # Sus scrofa
template.chip = "chicken" # Gallus gallus
template.chip = "sheep" # Ovis aries
template.chip = "maize" # Zea mays
These provide realistic:
- Number of chromosomes
- Relative chromosome lengths (in Morgan)
- Approximate recombination rates
Templates provide chromosome structure only, not:
- Real marker positions (SNPs are evenly spaced)
- Real allele frequencies
- Real LD patterns
For these, import real data (see next section).
4.4.3 Custom Genetic Maps
Provide a full genetic map:
# Create a custom map
# Columns: chr, snp_name, position_Morgan, position_bp, allele_freq
my_map <- data.frame(
chr = c(1, 1, 1, 2, 2, 2),
snp = c("rs001", "rs002", "rs003", "rs004", "rs005", "rs006"),
pos_M = c(0.0, 0.5, 1.0, 0.0, 0.3, 0.6),
pos_bp = c(1000, 50000000, 100000000, 1000, 30000000, 60000000),
freq = c(0.2, 0.5, 0.8, 0.1, 0.45, 0.7)
)
# Use the map
population <- creating.diploid(
map = my_map,
nindi = 100,
n.additive = 50
)
4.5 Importing Real Data
4.5.1 From VCF Files
VCF (Variant Call Format) is standard for genomic data:
# Import VCF file
population <- creating.diploid(
vcf = "path/to/genotypes.vcf", # Or .vcf.gz
n.additive = 100,
vcf.maxsnp = 10000, # Optional: limit SNPs
vcf.maxindi = 500 # Optional: limit individuals
)
What gets imported:
- Phased or unphased genotypes
- Chromosome numbers
- Base pair positions
- Marker names (rsIDs)
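In a VCF, the GT field separates the two alleles with "|" when the genotype is phased and with "/" when it is unphased. The base R sketch below splits a few example GT strings into two allele columns. It only illustrates the encoding and is not the MoBPS import routine; the `gt` values are made up for the example.

```r
# Example GT strings as they appear in a VCF sample column (made-up values)
gt <- c("0|0", "0|1", "1|0", "1/1")

# "|" marks a phased genotype, "/" an unphased one
phased <- grepl("|", gt, fixed = TRUE)

# Split each string into its two alleles
alleles <- do.call(rbind, strsplit(gt, "[|/]"))
storage.mode(alleles) <- "integer"
colnames(alleles) <- c("allele1", "allele2")

data.frame(gt, phased, alleles, dosage = rowSums(alleles))
```

For phased calls the order of the two alleles is meaningful ("0|1" and "1|0" are different haplotype configurations), whereas for unphased calls only the dosage is informative.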
4.5.2 From PLINK Files
PLINK (.ped/.map or .bed/.bim/.fam) is also widely used:
# Import .ped and .map files
ped_data <- read.pedmap("path/to/data.ped", "path/to/data.map")
# Create population from imported data
population <- creating.diploid(
dataset = ped_data$dataset, # Genotype matrix
map = ped_data$map, # Genetic map
n.additive = 100
)
4.5.3 Converting Genotypes to Haplotypes
Imported genotype data must be converted to the haplotype format MoBPS expects:
# If you have a genotype matrix (individuals × SNPs coded 0/1/2)
# Convert to haplotype format for MoBPS
# Example: genotype matrix
geno_matrix <- matrix(sample(0:2, 1000, replace = TRUE),
nrow = 10, ncol = 100) # 10 indi, 100 SNPs
# Convert: each individual becomes 2 haplotype columns (SNPs in rows)
# Genotype 0 -> 0/0, 1 -> 0/1, 2 -> 1/1
# Unphased genotypes carry no phase information, so for heterozygotes
# the alternative allele is placed on the second haplotype by convention
geno_t <- t(geno_matrix) # SNPs x individuals
haplo_matrix <- matrix(0, nrow = nrow(geno_t), ncol = 2 * ncol(geno_t))
first_hap <- seq(1, ncol(haplo_matrix), by = 2) # columns 1, 3, 5, ...
haplo_matrix[, first_hap] <- as.numeric(geno_t == 2) # first haplotype: 1 only for genotype 2
haplo_matrix[, first_hap + 1] <- as.numeric(geno_t >= 1) # second haplotype: 1 for genotypes 1 and 2
# Use in MoBPS
population <- creating.diploid(
dataset = haplo_matrix,
n.additive = 20
)
4.6 Sex Ratio Control
4.6.1 Controlling Sex Proportions
# Equal sex ratio (default)
pop_equal <- creating.diploid(nsnp = 1000, nindi = 100,
sex.quota = 0.5) # 50% female
# More females
pop_female <- creating.diploid(nsnp = 1000, nindi = 100,
sex.quota = 0.7) # 70% female
# Specify exactly
pop_exact <- creating.diploid(
nsnp = 1000,
nindi = 100,
sex.s = c(rep(1, 30), rep(2, 70)) # 30 males, 70 females
)
4.6.2 One-Sex Mode
For plants or situations where sex doesn’t matter:
# All individuals in same group
population <- creating.diploid(
nsnp = 1000,
nindi = 200,
one.sex.mode = TRUE # Deactivate two-sex system
)
4.7 Creating Multiple Breeds/Populations
4.7.1 Sequential Addition
Create distinct founder populations:
# Create breed 1
population <- creating.diploid(
nsnp = 5000,
nindi = 100,
n.additive = 100,
founder.pool = 1, # Mark as pool 1
name.cohort = "Breed_A"
)
# Add breed 2 (different allele frequencies)
population <- creating.diploid(
population = population, # Add to existing population
nsnp = 5000,
nindi = 100,
n.additive = 100,
founder.pool = 2, # Mark as pool 2
name.cohort = "Breed_B",
freq = "diff" # Different frequencies
)
Uses for founder pools:
- Model crossbreeding programs
- Track breed composition (admixture)
- Assign breed-specific QTL effects
- Study heterosis
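Tracking breed composition rests on simple arithmetic: the expected composition of an offspring is the average of its parents' compositions. The base R sketch below works through an F1, a backcross, and an F2. It is a pedigree-based expectation only, not a MoBPS function, and `offspring_composition` is a hypothetical helper.

```r
# Expected breed composition of an offspring = mean of the parental compositions
offspring_composition <- function(sire, dam) (sire + dam) / 2

breed_a <- c(A = 1, B = 0)   # purebred founder from pool 1
breed_b <- c(A = 0, B = 1)   # purebred founder from pool 2

f1        <- offspring_composition(breed_a, breed_b)  # first cross
backcross <- offspring_composition(f1, breed_a)       # F1 mated back to breed A
f2        <- offspring_composition(f1, f1)            # F1 x F1

rbind(f1, backcross, f2)
```

These are expectations; the realized share of each founder pool in an individual's genome varies around them because of recombination and segregation.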
4.7.2 Adding Chromosomes
Add further chromosomes to an existing population:
# Start with chromosome 1
pop <- creating.diploid(nsnp = 1000, nindi = 100)
# Add chromosome 2
pop <- creating.diploid(
population = pop,
nsnp = 1500,
add.chromosome = TRUE # Add, don't replace
)
4.8 Marker Positions
4.8.1 Even Spacing (Default)
# Markers evenly distributed
pop <- creating.diploid(
nsnp = 1000,
chromosome.length = 5 # Spread evenly over 5M
)
Best for: Fast computation, when exact positions don’t matter.
4.8.2 From Base Pairs
Convert physical positions to Morgan:
# Provide base pair positions
bp_positions <- seq(1, 100000000, length.out = 5000) # 100 Mb
pop <- creating.diploid(
nsnp = 5000,
bp = bp_positions,
bpcm.conversion = 1000000 # 1 Mb = 1 cM (typical for mammals)
)
Common conversion rates:
- Mammals: 1,000,000 bp/cM (= 100,000,000 bp/Morgan)
- Chicken: 300,000 bp/cM (= 30,000,000 bp/Morgan)
- Varies by species and chromosome!
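The conversion itself is a division: base pairs divided by bp per cM gives centimorgan, and dividing by 100 gives Morgan. The sketch below applies the two rates listed above in base R. It shows the arithmetic only, under the assumption of a constant rate along the chromosome; `bp_to_morgan` is a hypothetical helper, not a MoBPS function.

```r
# Convert physical positions (bp) to genetic positions (Morgan)
# assuming a constant recombination rate along the chromosome
bp_to_morgan <- function(bp, bp_per_cM) bp / bp_per_cM / 100

bp_positions <- seq(1, 100000000, length.out = 5000)  # 100 Mb chromosome

pos_mammal  <- bp_to_morgan(bp_positions, bp_per_cM = 1e6)  # 1 Mb per cM
pos_chicken <- bp_to_morgan(bp_positions, bp_per_cM = 3e5)  # 0.3 Mb per cM

c(mammal = max(pos_mammal), chicken = max(pos_chicken))  # chromosome length in Morgan
```

The same 100 Mb of sequence corresponds to a genetic map more than three times longer at the chicken rate, which is why the conversion factor should match the species being simulated.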
4.8.3 Directly in Morgan
# Provide positions in Morgan directly
positions_M <- c(0.0, 0.01, 0.05, 0.1, 0.15, ...) # Custom positions
pop <- creating.diploid(
nsnp = 1000,
snp.position = positions_M,
position.scaling = FALSE # Don't rescale
)
4.9 Advanced Options
4.9.1 Genotyping Arrays
Simulate partial genotyping (chip data):
# Not all SNPs genotyped
pop <- creating.diploid(
nsnp = 50000, # 50K total SNPs
nindi = 1000,
genotyped.s = rep(c(1, 0, 0, 0), 12500), # Every 4th SNP genotyped
share.genotyped = 0.8 # 80% of individuals genotyped
)
Uses:
- Model cost of genotyping
- Test effects of marker density
- Simulate imputation scenarios
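Before passing a SNP mask to the simulation, it helps to check that it has the intended length and density. The base R sketch below builds a one-in-four mask and draws which individuals are genotyped. It assumes, for illustration only, that the genotyped share acts as an independent per-individual probability; `genotyped_snps` and `genotyped_indi` are illustrative names.

```r
set.seed(7)
nsnp  <- 50000
nindi <- 1000

# 1 = SNP is on the chip, 0 = not genotyped; one SNP in every block of four
genotyped_snps <- rep(c(1, 0, 0, 0), length.out = nsnp)

# Assumption for this sketch: each individual is genotyped with probability 0.8
genotyped_indi <- rbinom(nindi, 1, 0.8) == 1

chip_size <- sum(genotyped_snps)               # markers on the chip
n_calls   <- chip_size * sum(genotyped_indi)   # total genotype calls produced

c(chip_size = chip_size, share_snps = mean(genotyped_snps),
  share_indi = mean(genotyped_indi), n_calls = n_calls)
```

The total number of genotype calls is a simple proxy for genotyping cost when comparing chip densities or genotyping shares.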
4.9.2 Size Scaling
When founders are related (real data), scale effective size:
pop <- creating.diploid(
vcf = "real_data.vcf",
size.scaling = 0.7 # Effective size is 70% of actual
)
This affects:
- Calculations of expected inbreeding
- Expected relationships
- Effective population size estimates
4.10 Practical Examples
4.10.1 Example 1: Cattle Population
# Realistic cattle breeding population
cattle <- creating.diploid(
nsnp = 50000,
nindi = 500,
template.chip = "cattle",
n.additive = 100,
n.dominant = 20,
beta_shape1 = 0.5, # Some rare variants
beta_shape2 = 1.2,
share.genotyped = 0.8, # 80% genotyped
sex.quota = 0.7, # More females (dairy)
var.target = 100,
name.cohort = "HolsteinFounders"
)
4.10.2 Example 2: Maize Inbred Lines
# Fully homozygous inbred lines
maize <- creating.diploid(
nsnp = 10000,
nindi = 20,
template.chip = "maize",
dataset = "homorandom", # Homozygous
one.sex.mode = TRUE, # No sexes
n.additive = 200,
name.cohort = "InbredLines"
)
4.10.3 Example 3: Crossbreeding Setup
# Breed A
cross_pop <- creating.diploid(
nsnp = 10000, nindi = 100, n.additive = 100,
founder.pool = 1, name.cohort = "BreedA",
beta_shape1 = 2, beta_shape2 = 2 # Common variants
)
# Breed B (different frequencies)
cross_pop <- creating.diploid(
population = cross_pop,
nsnp = 10000, nindi = 100, n.additive = 100,
founder.pool = 2, name.cohort = "BreedB",
beta_shape1 = 0.8, beta_shape2 = 2, # Rare variants
freq = "diff" # Independent frequencies from Breed A
)
4.11 Summary
Key concepts from this chapter:
- ✅ creating.diploid() initializes founder populations
- ✅ Control genomic structure (chromosomes, positions, allele frequencies)
- ✅ Import real data from VCF/PLINK or simulate data
- ✅ Use species templates for realistic chromosome structure
- ✅ Create multiple breeds with founder pools
- ✅ Control sex ratios and population composition
4.12 What’s Next?
Now that you can create populations, let’s design the trait architecture - the genetic basis of the traits you want to select on.
Continue to Chapter 5: Trait Architecture!