4  Creating Founder Populations

4.1 Learning Objectives

By the end of this chapter, you will be able to:

  • Generate simulated genomic data with different patterns
  • Import real genomic data (VCF, PLINK formats)
  • Set up genetic maps and chromosome structures
  • Control allele frequencies and genetic diversity
  • Create multiple breeds/populations

4.2 Overview of creating.diploid()

The creating.diploid() function is your starting point for every MoBPS simulation. It creates the founder population with:

  • Genomic data (markers/haplotypes)
  • Genetic map (chromosomes, positions)
  • Trait architecture (QTLs, effects)
  • Population structure (cohorts, sex, pools)

4.3 Generating Simulated Data

4.3.1 Basic Random Population

The simplest approach: generate random genotypes.

# Create a simple founder population
population <- creating.diploid(
  nsnp = 5000,          # 5,000 SNP markers
  nindi = 200,          # 200 individuals
  n.additive = 100      # 100 QTLs
)

Default behavior:

  • Single chromosome (5 Morgan length)
  • Random genotypes with allele frequency ~ Uniform(0, 1)
  • 50% male, 50% female
  • All individuals diploid and unrelated

4.3.2 Controlling Allele Frequencies

Allele frequencies dramatically affect genetic architecture:

# Uniform allele frequencies (default)
pop_uniform <- creating.diploid(
  nsnp = 1000,
  nindi = 100,
  beta.shape1 = 1,      # Beta(1,1) = Uniform(0,1)
  beta.shape2 = 1
)

# Common variants (higher MAF)
pop_common <- creating.diploid(
  nsnp = 1000,
  nindi = 100,
  beta.shape1 = 2,      # Beta(2,2) concentrates near 0.5
  beta.shape2 = 2
)

# Rare variants (low MAF)
pop_rare <- creating.diploid(
  nsnp = 1000,
  nindi = 100,
  beta.shape1 = 0.5,    # Beta(0.5,0.5) concentrates near 0 and 1
  beta.shape2 = 0.5
)

[Figure: Allele frequency distributions from Beta(α, β)]

Tip: Realistic Allele Frequencies

Real populations often have many rare variants and fewer common variants. Use beta.shape1 = 0.3, beta.shape2 = 1.5 to approximate this.
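To see what these shape parameters imply before running a simulation, the following base-R sketch draws allele frequencies from each Beta distribution with rbeta() and summarizes the minor allele frequency (MAF). It does not call MoBPS; the number of draws, the seed, the labels, and the 0.05 rarity threshold are arbitrary choices made for illustration.

```r
# Illustrative only: draw allele frequencies from several Beta distributions
# and compare the resulting minor allele frequency (MAF) spectra.
set.seed(42)                      # arbitrary seed for reproducibility
n_draws <- 100000                 # arbitrary number of simulated markers

shapes <- list(
  uniform   = c(1, 1),
  common    = c(2, 2),
  u_shaped  = c(0.5, 0.5),
  realistic = c(0.3, 1.5)
)

maf_summary <- t(sapply(shapes, function(s) {
  p   <- rbeta(n_draws, shape1 = s[1], shape2 = s[2])
  maf <- pmin(p, 1 - p)           # fold frequencies to the minor allele
  c(mean_maf = mean(maf), share_rare = mean(maf < 0.05))
}))

print(round(maf_summary, 3))
```

In this sketch the Beta(2, 2) draws should show the highest mean MAF, while the Beta(0.3, 1.5) draws should show the largest share of markers with MAF below 0.05.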

4.3.3 Initialization Modes

Control the starting genotypes with the dataset parameter:

# Mode 1: All zeros (000.../000...)
pop_zero <- creating.diploid(nsnp = 1000, nindi = 100,
                              dataset = "all0")

# Mode 2: Fully heterozygous (000.../111...)
pop_het <- creating.diploid(nsnp = 1000, nindi = 100,
                             dataset = "allhetero")

# Mode 3: Random (default, X₁X₂X₃.../X₄X₅X₆...)
pop_random <- creating.diploid(nsnp = 1000, nindi = 100,
                                dataset = "random")

# Mode 4: Homozygous random (X₁X₂X₃.../X₁X₂X₃... same haplotypes)
pop_homo <- creating.diploid(nsnp = 1000, nindi = 100,
                              dataset = "homorandom")

When to use each mode:

  • "random" - General purpose, unrelated founders
  • "homorandom" - Create inbred lines or DH lines
  • "allhetero" - Maximum heterozygosity (F1s)
  • "all0" - Specific starting conditions or testing
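The four patterns can be mimicked in base R to make the difference in heterozygosity concrete. This is only a sketch of the patterns described above, not MoBPS's internal code; make_haplotypes() is a hypothetical helper, and a single allele frequency of 0.5 is assumed for the random modes.

```r
# Illustrative only: mimic the four starting patterns for one individual.
# make_haplotypes() is a hypothetical helper, not part of MoBPS.
make_haplotypes <- function(mode, nsnp, freq = 0.5) {
  draw <- function() rbinom(nsnp, size = 1, prob = freq)
  switch(mode,
    all0       = cbind(rep(0, nsnp), rep(0, nsnp)),
    allhetero  = cbind(rep(0, nsnp), rep(1, nsnp)),
    random     = cbind(draw(), draw()),          # two independent haplotypes
    homorandom = { h <- draw(); cbind(h, h) },   # one haplotype, duplicated
    stop("unknown mode: ", mode)
  )
}

set.seed(1)                       # arbitrary seed
modes <- c("all0", "allhetero", "random", "homorandom")
heterozygosity <- sapply(modes, function(m) {
  hap <- make_haplotypes(m, nsnp = 1000)
  mean(hap[, 1] != hap[, 2])      # share of heterozygous loci
})
print(heterozygosity)
```

Only the "random" pattern yields intermediate heterozygosity; "all0" and "homorandom" are fully homozygous and "allhetero" is heterozygous at every locus.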

4.4 Chromosome Structure

4.4.1 Single vs. Multiple Chromosomes

# Single chromosome (default)
pop_single <- creating.diploid(
  nsnp = 5000,
  chromosome.length = 5  # 5 Morgan
)

# Multiple chromosomes
pop_multi <- creating.diploid(
  nsnp = 10000,
  chr.nr = c(2000, 2000, 3000, 3000),  # SNPs per chromosome
  chromosome.length = c(2, 2, 1.5, 1.5) # Length in Morgan
)

Consequences of chromosome structure:

  • More chromosomes = more recombination = faster LD decay
  • Longer chromosomes (in Morgan) = more crossovers per meiosis = weaker linkage between distant markers
  • Realistic structure improves simulation accuracy
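The link between map distance and recombination can be made concrete with Haldane's mapping function. The sketch below assumes crossovers occur without interference; that assumption is made here for illustration and is not taken from the MoBPS documentation. The function name haldane_r() is hypothetical.

```r
# Haldane's mapping function: recombination fraction between two loci
# separated by d Morgan, assuming no crossover interference.
haldane_r <- function(d_morgan) 0.5 * (1 - exp(-2 * d_morgan))

distances <- c(0.01, 0.1, 0.5, 1, 2)      # map distances in Morgan
r <- haldane_r(distances)
print(data.frame(distance_M = distances, recomb_fraction = round(r, 3)))
```

The recombination fraction rises with distance and approaches 0.5, the value for loci on different chromosomes, so markers far apart on a long chromosome behave almost as if unlinked.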

4.4.2 Using Template Species Maps

MoBPS includes common species maps:

# Cattle map
pop_cattle <- creating.diploid(
  nsnp = 50000,
  nindi = 100,
  template.chip = "cattle"   # 29 autosomes
)

# Other available templates
template.chip = "pig"       # Sus scrofa
template.chip = "chicken"   # Gallus gallus
template.chip = "sheep"     # Ovis aries
template.chip = "maize"     # Zea mays

These provide realistic:

  • Number of chromosomes
  • Relative chromosome lengths (in Morgan)
  • Approximate recombination rates

Note: What Templates DON’T Include

Templates provide chromosome structure only, not:

  • Real marker positions (SNPs are evenly spaced)
  • Real allele frequencies
  • Real LD patterns

For these, import real data (see next section).

4.4.3 Custom Genetic Maps

Provide a full genetic map:

# Create a custom map
# Columns: chr, snp_name, position_Morgan, position_bp, allele_freq
my_map <- data.frame(
  chr = c(1, 1, 1, 2, 2, 2),
  snp = c("rs001", "rs002", "rs003", "rs004", "rs005", "rs006"),
  pos_M = c(0.0, 0.5, 1.0, 0.0, 0.3, 0.6),
  pos_bp = c(1000, 50000000, 100000000, 1000, 30000000, 60000000),
  freq = c(0.2, 0.5, 0.8, 0.1, 0.45, 0.7)
)

# Use the map
population <- creating.diploid(
  map = my_map,
  nindi = 100,
  n.additive = 50
)
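Before handing a custom map to a simulation, it can help to run a few sanity checks on it. The sketch below uses base R only; check_map() is a hypothetical helper, not part of MoBPS, and it assumes the column order given in the comment above (chromosome first, Morgan position third, allele frequency fifth).

```r
# check_map() is a hypothetical helper, not part of MoBPS.
check_map <- function(map) {
  stopifnot(ncol(map) >= 3)
  chr   <- map[[1]]
  pos_M <- map[[3]]
  # Morgan positions must not decrease within a chromosome
  sorted <- tapply(pos_M, chr, function(p) !is.unsorted(p))
  # Allele frequencies (fifth column, if present) must lie in [0, 1]
  freq_ok <- if (ncol(map) >= 5) all(map[[5]] >= 0 & map[[5]] <= 1) else TRUE
  list(snps_per_chr = table(chr),
       positions_sorted = all(sorted),
       freq_ok = freq_ok)
}

example_map <- data.frame(
  chr    = c(1, 1, 1, 2, 2, 2),
  snp    = paste0("rs00", 1:6),
  pos_M  = c(0.0, 0.5, 1.0, 0.0, 0.3, 0.6),
  pos_bp = c(1000, 5e7, 1e8, 1000, 3e7, 6e7),
  freq   = c(0.2, 0.5, 0.8, 0.1, 0.45, 0.7)
)
result <- check_map(example_map)
print(result)
```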

4.5 Importing Real Data

4.5.1 From VCF Files

VCF (Variant Call Format) is the standard format for genomic variant data:

# Import VCF file
population <- creating.diploid(
  vcf = "path/to/genotypes.vcf",  # Or .vcf.gz
  n.additive = 100,
  vcf.maxsnp = 10000,             # Optional: limit SNPs
  vcf.maxindi = 500               # Optional: limit individuals
)

What gets imported:

  • Phased or unphased genotypes
  • Chromosome numbers
  • Base pair positions
  • Marker names (rsIDs)
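In a VCF file, phased genotypes are written with "|" (for example 0|1) and unphased genotypes with "/" (for example 0/1). The base-R sketch below shows how such calls map onto two haplotype alleles. It is not how MoBPS reads VCF files; split_gt() is a hypothetical helper that assumes biallelic, non-missing calls.

```r
# Illustrative only: split VCF-style GT strings into two haplotype alleles.
# split_gt() is a hypothetical helper, not part of MoBPS.
split_gt <- function(gt) {
  alleles <- strsplit(gt, "[|/]")             # split on either separator
  hap <- t(sapply(alleles, as.integer))       # one row per call
  colnames(hap) <- c("hap1", "hap2")
  data.frame(gt = gt, hap, phased = grepl("|", gt, fixed = TRUE))
}

gt_calls <- c("0|0", "0|1", "1|0", "1/1", "0/1")
parsed <- split_gt(gt_calls)
print(parsed)
```

For unphased calls the order of the two alleles carries no information, so any haplotypes built from them have an arbitrary phase at heterozygous sites.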

4.5.3 Converting Genotypes to Haplotypes

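A genotype matrix coded 0/1/2 must be converted to haplotypes before it is passed to creating.diploid(). Unphased genotypes carry no phase information, so the assignment of alleles to the two haplotypes at heterozygous loci is arbitrary: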

# If you have a genotype matrix (individuals × SNPs coded 0/1/2)
# Convert to haplotype format for MoBPS

# Example: genotype matrix
geno_matrix <- matrix(sample(0:2, 1000, replace = TRUE),
                      nrow = 10, ncol = 100)  # 10 indi, 100 SNPs

# Convert: each individual becomes 2 haplotypes
haplo_matrix <- matrix(0, nrow = ncol(geno_matrix), ncol = 2 * nrow(geno_matrix))

for (i in 1:nrow(geno_matrix)) {
  for (j in 1:ncol(geno_matrix)) {
    if (geno_matrix[i,j] == 0) {
      haplo_matrix[j, 2*i-1] <- 0
      haplo_matrix[j, 2*i] <- 0
    } else if (geno_matrix[i,j] == 1) {
      haplo_matrix[j, 2*i-1] <- 0
      haplo_matrix[j, 2*i] <- 1
    } else {  # == 2
      haplo_matrix[j, 2*i-1] <- 1
      haplo_matrix[j, 2*i] <- 1
    }
  }
}

# Use in MoBPS
population <- creating.diploid(
  dataset = haplo_matrix,
  n.additive = 20
)
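The same conversion can be written without explicit loops. The sketch below is a vectorized equivalent of the loop above, with a round-trip check that the haplotype pairs sum back to the original genotypes. geno_to_haplo() is a hypothetical helper, not part of MoBPS, and it keeps the same arbitrary phase (heterozygotes become 0 on the first haplotype and 1 on the second).

```r
# geno_to_haplo() is a hypothetical helper, not part of MoBPS.
# Input:  individuals x SNPs matrix coded 0/1/2
# Output: SNPs x (2 * individuals) matrix coded 0/1
geno_to_haplo <- function(geno) {
  stopifnot(all(geno %in% 0:2))
  g <- t(geno)                                   # SNPs in rows
  haplo <- matrix(0L, nrow = nrow(g), ncol = 2 * ncol(g))
  odd  <- seq(1, ncol(haplo), by = 2)            # first haplotype of each individual
  even <- odd + 1                                # second haplotype of each individual
  haplo[, odd]  <- as.integer(g == 2)            # allele 1 only for genotype 2
  haplo[, even] <- as.integer(g >= 1)            # allele 1 for genotypes 1 and 2
  haplo
}

set.seed(7)                                      # arbitrary seed
geno <- matrix(sample(0:2, 1000, replace = TRUE), nrow = 10, ncol = 100)
haplo <- geno_to_haplo(geno)

# Round-trip check: summing each haplotype pair recovers the genotypes
recovered <- t(haplo[, seq(1, 20, by = 2)] + haplo[, seq(2, 20, by = 2)])
stopifnot(all(recovered == geno))
print(dim(haplo))
```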

4.6 Sex Ratio Control

4.6.1 Controlling Sex Proportions

# Equal sex ratio (default)
pop_equal <- creating.diploid(nsnp = 1000, nindi = 100,
                               sex.quota = 0.5)  # 50% female

# More females
pop_female <- creating.diploid(nsnp = 1000, nindi = 100,
                                sex.quota = 0.7)  # 70% female

# Specify exactly
pop_exact <- creating.diploid(
  nsnp = 1000,
  nindi = 100,
  sex.s = c(rep(1, 30), rep(2, 70))  # 30 males, 70 females
)

4.6.2 One-Sex Mode

For plants or situations where sex doesn’t matter:

# All individuals in same group
population <- creating.diploid(
  nsnp = 1000,
  nindi = 200,
  one.sex.mode = TRUE  # Deactivate two-sex system
)

4.7 Creating Multiple Breeds/Populations

4.7.1 Sequential Addition

Create distinct founder populations:

# Create breed 1
population <- creating.diploid(
  nsnp = 5000,
  nindi = 100,
  n.additive = 100,
  founder.pool = 1,           # Mark as pool 1
  name.cohort = "Breed_A"
)

# Add breed 2 (different allele frequencies)
population <- creating.diploid(
  population = population,    # Add to existing population
  nsnp = 5000,
  nindi = 100,
  n.additive = 100,
  founder.pool = 2,           # Mark as pool 2
  name.cohort = "Breed_B",
  freq = "diff"               # Different frequencies
)

Uses for founder pools:

  • Model crossbreeding programs
  • Track breed composition (admixture)
  • Assign breed-specific QTL effects
  • Study heterosis

4.7.2 Adding Chromosomes

Add further chromosomes to an existing population:

# Start with chromosome 1
pop <- creating.diploid(nsnp = 1000, nindi = 100)

# Add chromosome 2
pop <- creating.diploid(
  population = pop,
  nsnp = 1500,
  add.chromosome = TRUE  # Add, don't replace
)

4.8 Marker Positions

4.8.1 Even Spacing (Default)

# Markers evenly distributed
pop <- creating.diploid(
  nsnp = 1000,
  chromosome.length = 5  # Spread evenly over 5M
)

Best for: Fast computation, when exact positions don’t matter.

4.8.2 From Base Pairs

Convert physical positions to Morgan:

# Provide base pair positions
bp_positions <- seq(1, 100000000, length.out = 5000)  # 100 Mb

pop <- creating.diploid(
  nsnp = 5000,
  bp = bp_positions,
  bpcm.conversion = 1000000   # 1 Mb = 1 cM (typical for mammals)
)

Common conversion rates:

  • Mammals: 1,000,000 bp/cM (= 100,000,000 bp/Morgan)
  • Chicken: 300,000 bp/cM (= 30,000,000 bp/Morgan)
  • Varies by species and chromosome!
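The arithmetic behind these rates is a division by the bp/cM rate followed by a division by 100 (cM per Morgan). A minimal sketch, using a hypothetical helper bp_to_morgan() that is not part of MoBPS and assuming a uniform rate along the chromosome:

```r
# bp_to_morgan() is a hypothetical helper, not part of MoBPS.
# It assumes one constant bp/cM rate along the whole chromosome.
bp_to_morgan <- function(bp, bp_per_cM) {
  bp / bp_per_cM / 100            # bp -> cM -> Morgan
}

chrom_bp  <- 100e6                # a 100 Mb chromosome
mammal_M  <- bp_to_morgan(chrom_bp, bp_per_cM = 1e6)
chicken_M <- bp_to_morgan(chrom_bp, bp_per_cM = 3e5)
print(c(mammal = mammal_M, chicken = chicken_M))
```

With these rates the same 100 Mb corresponds to 1 Morgan in a typical mammal and to about 3.3 Morgan in chicken, so the choice of conversion rate directly changes how much recombination is simulated.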

4.8.3 Directly in Morgan

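# Provide positions in Morgan directly (one value per SNP, in increasing order)
positions_M <- c(0.0, 0.01, 0.05, 0.1, 0.15)  # Custom positions; extend to cover every SNP

pop <- creating.diploid(
  nsnp = length(positions_M),  # Must match the number of positions
  snp.position = positions_M,
  position.scaling = FALSE  # Don't rescale
)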

4.9 Advanced Options

4.9.1 Genotyping Arrays

Simulate partial genotyping (chip data):

# Not all SNPs genotyped
pop <- creating.diploid(
  nsnp = 50000,          # 50K total SNPs
  nindi = 1000,
  genotyped.s = rep(c(1, 0, 0, 0), 12500),  # Every 4th SNP genotyped
  share.genotyped = 0.8  # 80% of individuals genotyped
)

Uses:

  • Model cost of genotyping
  • Test effects of marker density
  • Simulate imputation scenarios

4.9.2 Size Scaling

When founders are related (as is common with real data), scale the effective size:

pop <- creating.diploid(
  vcf = "real_data.vcf",
  size.scaling = 0.7  # Effective size is 70% of actual
)

This affects:

  • Calculations of expected inbreeding
  • Expected relationships
  • Effective population size estimates

4.10 Practical Examples

4.10.1 Example 1: Cattle Population

# Realistic cattle breeding population
cattle <- creating.diploid(
  nsnp = 50000,
  nindi = 500,
  template.chip = "cattle",
  n.additive = 100,
  n.dominant = 20,
  beta.shape1 = 0.5,      # Some rare variants
  beta.shape2 = 1.2,
  share.genotyped = 0.8,   # 80% genotyped
  sex.quota = 0.7,         # More females (dairy)
  var.target = 100,
  name.cohort = "HolsteinFounders"
)

4.10.2 Example 2: Maize Inbred Lines

# Fully homozygous inbred lines
maize <- creating.diploid(
  nsnp = 10000,
  nindi = 20,
  template.chip = "maize",
  dataset = "homorandom",  # Homozygous
  one.sex.mode = TRUE,      # No sexes
  n.additive = 200,
  name.cohort = "InbredLines"
)

4.10.3 Example 3: Crossbreeding Setup

# Breed A
cross_pop <- creating.diploid(
  nsnp = 10000, nindi = 100, n.additive = 100,
  founder.pool = 1, name.cohort = "BreedA",
  beta.shape1 = 2, beta.shape2 = 2  # Common variants
)

# Breed B (different frequencies)
cross_pop <- creating.diploid(
  population = cross_pop,
  nsnp = 10000, nindi = 100, n.additive = 100,
  founder.pool = 2, name.cohort = "BreedB",
  beta.shape1 = 0.8, beta.shape2 = 2,  # Skewed toward rare variants
  freq = "diff"  # Independent frequencies from Breed A
)

4.11 Summary

Key concepts from this chapter:

  • creating.diploid() initializes founder populations
  • ✅ Control genomic structure (chromosomes, positions, allele frequencies)
  • ✅ Import real data from VCF/PLINK or simulate data
  • ✅ Use species templates for realistic chromosome structure
  • ✅ Create multiple breeds with founder pools
  • ✅ Control sex ratios and population composition

4.12 What’s Next?

Now that you can create populations, let’s design the trait architecture: the genetic basis of the traits you want to select on.

Continue to Chapter 5: Trait Architecture!