# Connect to SQLite database (example)
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), "breeding_data.db")
# Simple query
query <- "SELECT animal_id, birth_weight FROM phenotypes WHERE birth_weight > 40"
result <- dbGetQuery(con, query)
head(result)
# Join query
query2 <- "
SELECT a.animal_id, a.sex, p.weaning_weight
FROM animals a
JOIN phenotypes p ON a.animal_id = p.animal_id
WHERE p.weaning_weight IS NOT NULL
"
result2 <- dbGetQuery(con, query2)
head(result2)
# Disconnect
dbDisconnect(con)14 Practical Animal Breeding and Data Skills
Learning Objectives
By the end of this chapter, you will be able to:
- Understand basics of SQL for querying breeding databases
- Explain what APIs are and how they’re used to access breeding data
- Describe animal identification systems used in livestock
- Distinguish phenotypic culling from genetic selection
- Identify career opportunities and skills needed in animal breeding
14.1 Introduction
[Content to be developed: Theory is essential, but successful animal breeders also need practical skills: data management, database queries, animal identification, and software tools.]
14.2 Database Basics
[Content to be developed: Animal breeding generates massive amounts of data. Understanding how data is stored and accessed is critical.]
14.2.1 Relational Databases
[Content to be developed:]
- Data organized into tables (e.g., animals, phenotypes, pedigrees)
- Tables linked by keys (e.g., animal ID)
- Efficient storage and retrieval of large datasets
14.2.2 Common Tables in Breeding Databases
[Content to be developed:]
- Animals table: animal_id, birth_date, sex, breed
- Pedigree table: animal_id, sire_id, dam_id
- Phenotypes table: animal_id, trait, value, date, contemporary_group
- Genotypes table: animal_id, snp_id, genotype
14.3 SQL Basics
[Content to be developed: Structured Query Language for querying databases.]
14.3.1 What is SQL?
[Content to be developed: Language for managing and querying relational databases. Universal standard used across industries.]
14.3.2 Basic SQL Commands
[Content to be developed: Focus on SELECT, WHERE, JOIN - the essentials for extracting data.]
SELECT Statement
[Content to be developed:]
SELECT animal_id, birth_weight
FROM phenotypes
WHERE birth_weight > 40;Retrieves animal_id and birth_weight for animals with birth weight greater than 40 kg.
Filtering with WHERE
[Content to be developed:]
SELECT *
FROM animals
WHERE sex = 'F' AND birth_date > '2023-01-01';Retrieves all female animals born after January 1, 2023.
Joining Tables
[Content to be developed:]
SELECT animals.animal_id, animals.birth_date, phenotypes.weaning_weight
FROM animals
JOIN phenotypes ON animals.animal_id = phenotypes.animal_id
WHERE phenotypes.weaning_weight IS NOT NULL;Combines data from animals and phenotypes tables.
14.3.3 SQL in Practice
[Content to be developed: Used to extract data for genetic evaluation, generate reports, QC data.]
14.4 Using SQL in R
[Content to be developed: R interfaces with databases.]
14.5 APIs (Application Programming Interfaces)
[Content to be developed: Accessing data and services programmatically.]
14.5.1 What is an API?
[Content to be developed:]
- Interface for software to communicate with other software
- Allows querying databases or services over the internet
- Common in modern breeding systems (genomic databases, genetic evaluation services)
14.5.2 Examples of APIs in Animal Breeding
[Content to be developed:]
- CDCB (dairy): Query genomic evaluations for bulls
- Breed associations: Access EPDs and pedigree data
- Genomic companies: Submit samples, retrieve genomic predictions
14.5.3 Using APIs in R
[Content to be developed:]
# Example: Query an API using httr package
library(httr)
library(jsonlite)
# Example API endpoint (hypothetical)
url <- "https://api.breedingservice.com/v1/ebv"
params <- list(animal_id = 12345, trait = "milk_yield")
# Make GET request
response <- GET(url, query = params)
# Parse JSON response
if (status_code(response) == 200) {
data <- content(response, "text") %>% fromJSON()
print(data)
} else {
cat("Error:", status_code(response), "\n")
}14.6 Data Management
[Content to be developed: Ensuring data quality and integrity.]
14.6.1 Data Entry and Validation
[Content to be developed:]
- Automated vs. manual data entry
- Validation rules (range checks, consistency checks)
- Error detection and correction
14.6.2 Data Cleaning and Quality Control
[Content to be developed:]
- Identify outliers (phenotypes > 3-4 SD from mean)
- Check for missing data
- Detect duplicate records
- Verify pedigree consistency
14.6.3 Example: Data QC in R
[Content to be developed:]
# Example: Detect outliers in weaning weight
library(tidyverse)
# Load data
data <- read_csv("weaning_weights.csv")
# Calculate mean and SD
mean_ww <- mean(data$weaning_weight, na.rm = TRUE)
sd_ww <- sd(data$weaning_weight, na.rm = TRUE)
# Flag outliers (> 3 SD from mean)
data <- data %>%
mutate(outlier = abs(weaning_weight - mean_ww) > 3 * sd_ww)
# Print outliers
outliers <- data %>% filter(outlier)
cat("Number of outliers:", nrow(outliers), "\n")
print(outliers)
# Visualize
ggplot(data, aes(x = weaning_weight, fill = outlier)) +
geom_histogram(bins = 30) +
labs(title = "Weaning Weight Distribution with Outliers",
x = "Weaning Weight (kg)", y = "Frequency") +
theme_minimal()14.7 Animal Identification Systems
[Content to be developed: How to uniquely identify animals.]
14.7.1 Visual ID
[Content to be developed:]
- Ear tags (numbered)
- Tattoos (permanent)
- Brands (hot or freeze)
- Challenges: Tags can be lost, numbers misread
14.7.2 Electronic ID (RFID)
[Content to be developed:]
- RFID ear tags or implants
- Scanned automatically (readers at feeders, scales, gates)
- More reliable than visual tags
- Cost: Higher initial investment
14.7.3 DNA-Based Parentage Verification
[Content to be developed:]
- SNP genotyping used to verify sire and dam
- Corrects pedigree errors
- Especially useful when natural mating is used (multiple sires)
14.7.4 Record Linkage
[Content to be developed: Matching records across databases (phenotypes, pedigrees, genotypes) using unique IDs.]
14.8 Phenotypic Culling vs. Genetic Selection
[Content to be developed: Two different management decisions.]
14.8.1 Phenotypic Culling
[Content to be developed:]
- Remove animals that fail to meet minimum standards
- Examples:
- Cows that don’t breed after multiple services
- Pigs with leg problems
- Animals with severe health issues
- Not genetic selection - based on individual performance, not breeding value
14.8.2 Genetic Selection
[Content to be developed:]
- Choose parents based on estimated breeding values (EBVs)
- Goal: Improve genetic merit of next generation
- Considers information from relatives, genomics
14.8.3 Both Are Important
[Content to be developed:]
- Phenotypic culling ensures only functional animals remain
- Genetic selection drives long-term improvement
- Do NOT confuse the two
14.9 Working with Breeding Companies
[Content to be developed: Career paths and roles.]
14.9.1 Common Roles
[Content to be developed:]
- Geneticist / Breeding Manager: Design breeding programs, analyze data, make selection decisions
- Data Manager: Maintain databases, QC data, generate reports
- Reproductive Technician: AI, embryo transfer, estrus detection
- Field Staff / Technical Service: Work with producers, collect data, troubleshoot
- Bioinformatician: Process genomic data, develop prediction models
14.9.2 Data Flow in a Breeding Program
[Content to be developed:]
- Field: Phenotypes collected (growth, reproduction, health)
- Data entry: Data entered into database, QC checks
- Genetic evaluation: Run BLUP or genomic prediction
- Selection decisions: Rank animals by EBV/GEBV, select parents
- Mating decisions: Allocate matings (OCS, MateSel)
- Feedback: Monitor genetic trends, adjust selection strategy
14.9.3 Software Used in Industry
[Content to be developed:]
- BLUPF90: Free software for genetic evaluation (widely used)
- ASReml: Commercial software for mixed models
- MateSel: Mating optimization
- AlphaSimR: Stochastic simulation (research and teaching)
- Custom software: Many companies develop proprietary tools
14.10 Professional Skills
[Content to be developed: Beyond technical knowledge.]
14.10.1 Communication
[Content to be developed:]
- Explain genetics to producers (non-technical audience)
- Write clear reports and documentation
- Present results to management or clients
14.10.2 Collaboration
[Content to be developed:]
- Work with nutrition, health, and management teams
- Coordinate across nucleus, multiplier, and commercial levels
- Partner with university researchers
14.10.3 Ethics and Integrity
[Content to be developed:]
- Data integrity: Accurate, unbiased data collection
- Animal welfare: Select for health and fitness, not just production
- Transparency: Honest reporting of genetic trends and limitations
14.11 Career Paths in Animal Breeding
[Content to be developed: Opportunities in industry, academia, government.]
14.11.1 Industry
[Content to be developed:]
- Genetics companies (PIC, Genus, Cobb, ABS, etc.)
- Breed associations (Angus, Holstein, etc.)
- AI studs and reproductive services
14.11.2 Academia
[Content to be developed:]
- University faculty (research and teaching)
- Extension specialists (applied research, outreach)
14.11.3 Government
[Content to be developed:]
- USDA (genetic evaluation, research)
- Regulatory agencies (genetic testing standards)
14.11.4 Consulting
[Content to be developed:]
- Independent consultant for breeding programs
- Software development for breeding tools
14.12 Summary
[Content to be developed.]
14.12.1 Key Points
- SQL is essential for querying breeding databases
- APIs enable programmatic access to data and services
- Data management and QC are foundational to genetic improvement
- Animal identification systems (visual, RFID, DNA) ensure accurate record keeping
- Phenotypic culling and genetic selection serve different purposes
- Career opportunities exist in industry, academia, and government
- Communication, collaboration, and ethics are critical professional skills
14.13 Practice Problems
[Problems to be developed]
Write a SQL query to retrieve all animals born in 2024 with weaning weights greater than 250 kg.
Explain the difference between phenotypic culling and genetic selection. Provide an example of each in dairy cattle.
Why is DNA-based parentage verification important in breeding programs? What problems does it solve?
14.14 Further Reading
[References to be added]
- SQL tutorials (W3Schools, Khan Academy)
- R database interfaces (DBI, RSQLite)
- Career resources (Animal Breeding and Genetics jobs, professional societies)