Free keywords:
-
Abstract:
Plant pathogens have a major impact on both natural and agricultural ecosystems, inducing widespread disease, reduced fitness, and mortality. Genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins are the major class of disease resistance (R) genes in plants; they encode receptors that directly or indirectly detect the molecular signals of pathogens and activate defense responses. NLR genes are among the most variable in plant genomes, exhibiting tremendous diversity in sequence and structure. This structural diversity makes NLRs difficult to study, but long-read sequencing allows complex gene clusters to be sequenced directly. Here, we assembled the genomes of 18 diverse lines of Arabidopsis thaliana using PacBio HiFi sequencing technology. We performed comprehensive genome annotation integrating full-length transcript data generated with Iso-Seq, pan-TEome (transposable element) annotation, CG methylation, segmental duplications, and recombination to investigate the processes that lead to the birth, death, and maintenance of NLR diversity across the species. We found that TEs play a major role in generating structural diversity and that pseudogenization is a major force in moderating the genomic load of active NLRs. We also uncovered hidden NLR diversity generated through isoform variation. Our findings provide a better understanding of the different strategies plants use to compete in the defensive arms race against pathogens.