Copy number variation in small nucleolar RNAs regulates personality behavior

Animals show behavioral traits that can collectively be called personality. We focus here on the role of the Prader-Willi Syndrom gene region in regulating personality behavior. It includes two clusters of tandem repeats coding for small nucleolar RNAs, SNORD115 and SNORD116. SNORD115 is known to regulate splicing of the serotonin receptor Ht2cr and SNORD116 is predicted to interact with the transcript of the chromatin regulator Ankrd11. We show that both snoRNA clusters display major copy number variation within and between populations, as well as in an inbred mouse strain and that this affects the expression of their specific target genes. Using a set of behavioral scores related to personality in populations of two species of wild mice, guinea pigs and humans, we find a strong correlation between the snoRNA copy number and these scores. Our results suggest that the SNORD clusters are major regulators of personality and correlated traits. Impact statement Behavioral variation among individuals is regulated by a tandemly repetitive genetic region that can generate new length variants in every generation.


Introduction
The study of consistent individual differences in behaviour, termed "animal personality", has flourished over the last decades, because it has been recognised as a major contributor to differences in survival and fitness among individuals (Reale, 2007;Reale et al., 2010;Wolf and Weissing, 2012).
A major goal is to understand the origin of individual variation in behaviour and the mechanisms generating this variation, especially since it is known to occur also in genetically isogenic strains (Bierbach et al., 2017;Freund et al., 2013;Lewejohann et al., 2011;Wolf and Weissing, 2012).
Personality can be defined as behaviors which are consistent over time and across context (Reale, 2007;Wilson, 1998).Several studies have shown that personality traits have a heritable component and hence, that genetics should play an important role (Dochtermann et al., 2015;Sanchez-Roige et al., 2018;van Oers and Mueller, 2010).However, despite considerable efforts, the genetic pathways that influence personality are largely unknown and there is so far no molecular mechanism that could explain the maintenance of such variation within populations.
A locus that has been implicated in behavioral traits in humans is the Prader-Willi syndrome (PWS) region.PWS is a neurodevelopmental disorder which leads to several abnormalities in cognitive behaviors such as social communication, speech, anxiety, intellectual ability and decision making, but also to metabolic syndromes and craniofacial shape changes (Cassidy et al., 2012).The region is subject to imprinting, i.e. parentally biased expression of genes (Nicholls et al., 1998).The protein coding genes expressed in this region include Ube3a and Snprn, of which Ube3a is expressed from the maternally provided chromosome and Snprn from the paternally provided one.This arrangement has specifically evolved in mammals and is generally conserved among them, including humans (Zhang et al., 2014).
In mice we have identified the PWS region as one of two imprinted regions that evolve particularly fast between natural populations and that could be involved in behavioral differences between them (Lorenc et al., 2014).Laboratory mice mutant for genes or gene regions from the PWS affect various behaviors including anxiety, activity, ultrasonic vocalization, social interaction and metabolism (Cavaille, 2017;Jiang et al., 2010;Nakatani et al., 2009;Qi et al., 2016).
The PWS regions expresses also non-coding RNAs, especially two small nucleolar RNA (SNORD) gene families which are organized in large, tandemly repeated clusters known as SNORD115 and SNORD116 (Cavaille, 2017).The expression of both SNORD115 and SNORD116 is brain-specific and restricted to the alleles on the paternal chromosome.
SNORDs are part of a large group of small, metabolically stable snoRNAs which regulate posttranscriptional modification of their target genes (Kiss, 2001).snoRNAs bind via a complementary antisense region to their target RNAs.The SNORD115 antisense element exhibits complementarity to the alternatively spliced exon V of the serotonin receptor Ht2cr which is part of the serotonin regulatory pathway.It has been shown in cell-culture experiments that there is a positive correlation between SNORD115 expression and exon part Vb usage (Kishore and Stamm, 2006).Activation of the receptor by serotonin inhibits dopamine and norepinephrine release in certain areas of the brain (Alex et al., 2005).This regulates mood, anxiety, feeding, and motoneuron functions (see (Stamm et al., 2017) for a recent review).SNORD116 shows a complementary sequence to Ankrd11 exon X, allowing to suggest a direct regulation (Bazeley et al., 2008).ANKRD11 is a ankyrin repeat domaincontaining protein that acts as a transcriptional co-factor with multiple possible target genes (Gallagher et al., 2015).But this interaction has only computationally been predicted and has not yet been experimentally confirmed.
Here we ask whether natural copy number variation in SNORD115 and SNORD116 could influence behavioral traits in mice and other mammals.We used standardized tests that are mostly connected to anxiety profiles, as well as comparable tests for wild guinea pigs and questionnaire-based tests for humans.We find that there is indeed a strong correlation between copy numbers of the respective SNORD genes and behavioral measures.Intriguingly, this is found not only for wildtype strains, but also for the common laboratory inbred strain C57BL/6J.Using transcriptomic analyses, we show that the predicted regulation of Ankrd11 can indeed be observed and that the network of genes affected by the copy number variation of SNORD116 can explain the behavioral and osteogenic phenotypes.
These observations confirm a direct causative link.Since our data suggest further that new alleles with different copy numbers are generated at an exceptionally high rate, we conclude that this variation could be the basis for the long sought molecular mechanism for the high variance of personality traits in families and populations.

Results
Given the possible molecular links to the regulation of behavioral responses, we first analyzed copy number variation within the SNORD115 and SNORD116 clusters in house mice (M.m. domesticus).
Both snoRNAs are part of a larger transcript, from which they are processed (Cavaille, 2017).The repeat unit length for SNORD115 is 1.97kb, for SNORD116 2.54kb.In the mouse reference genome (Waterston et al., 2002), SNORD115 is annotated with 143 copies and SNORD116 with 70 copies, but with an annotation gap.To type copy number differences in this range, we used droplet digital PCR, which we had previously shown to allow accurate copy number determination across a broad range of CNVs in mice (Pezer et al., 2015).We tested individuals derived from two populations of M. m. domesticus that were originally caught in the wild in Germany (CB) and France (MC) and were kept under outbreeding conditions (Harr et al., 2016).We find an average of 202 SNORD115 copies in CB animals and 170 in MC.A similar difference exists for SNORD116 copies, with 229 in CB and 188 in MC (Figure 1).Although there is a large variation within the populations, these differences between the populations are significant (t-test; p=0.03 for SNORD115 and p=0.015 for SNORD116; all data normally distributed, Shapiro-Wilk normality test p>0.21).Intriguingly, we observe also a strong co-variation of copy number between the two SNORD clusters in both populations, with highly significant correlation coefficients (Figure 1).
Figure 1: Structure of the PWS region and copy numbers of snoRNAs.The scheme depicts the genes in the cluster as arrows.All are paternally expressed, only Ube3a is maternally expressed.Note that the transcript structures are actually more complex and not yet fully resolved (Cavaille, 2017;Lorenc et al., 2014).The tandem clusters coding for the SNORD115 and SNORD116 snoRNAs are depicted by smaller block arrows (not drawn to size).The table below provides the measures and ranges of copy numbers of the clusters in the different species analyzed in the present study.The covariation regression coefficients (R 2 ) between the numbers in the two clusters are listed to the right.The full data are provided in Table S1.
The measured copy number variation shows a highly significant regression with the expression of the respective SNORDs in the brain (Figure 2A,B), suggesting a direct relationship between copy number and expression.Note that given that this locus is subject to imprinting, only the paternal copies would be expressed, implying that the correlation should be below 1, even when a perfect relationship exists between copy number and expression.
Figure 2: Regression between SNORD115 (blue) and SNORD116 (red) copy numbers and expression.A) and B) Regression with the respective RNAs.C) and D) Regression between snoRNA expression and the splice-site regulated target RNAs.All values were determined by ddPCR in brain RNA preparations from eight outbred animals each of the CB and MC populations.These animals constitute a subset of the 45 animals used for overall copy number variation and were chosen to reflect the spread of the variation.The data were combined, since tests for each population on their own yield similar correlations.The full data are provided in Table S2.
Further, we asked whether the target RNA expressions correlate with their respective SNORD expression.The target of SNORD115 is the alternatively spliced exon V of Htr2cr (Kishore and Stamm, 2006) and the predicted target of SNORD116 is exon X of Ankrd11 (Bazeley et al., 2008).
When assaying for the expression of these exons, we find indeed a high correlation with the respective SNORD expression levels and exon expression levels (Figure 2C,D).As a control, we assayed also two transcript variants for each locus that should not be affected by alternative splicing.For these, we find no significant regression with copy numbers of the respective SNORDs (Table S2).

Behavioral tests
To assess whether the SNORD gene copy number variation correlates also with the behavior of the mice, we have employed an aggregate score consisting of three separate standard tests focusing on anxiety.Anxiety is a consistent personality trait in mice (Freund et al., 2013;Lewejohann et al., 2011).Details on the tests and the derivation of the overall score are provided in the Methods section.
In short, the development of the score consisted of separate tests (Open Field Test, Dark/Light Box and Elevated Plus Maze).Single test scores were tested for repeatability over the course of the experiment using intra-class correlation coefficients.Those found to be repeatable were clustered and these clusters were subjected to a principle component analysis.The first principle component was then used as a combined single score for anxiety-related behavior.
We find a highly significant regression between these scores and SNORD copy numbers, with coefficients between 0.42-0.52(Figure 3A-D).The results were analyzed separately for the different populations, since overall copy numbers and overall behavioral scores differ between them.Still, both populations show the same pattern.In each population, animals with higher copy numbers have a higher relative anxiety score.E) for the wood mouse Apodemus uralensis (N=32), F) and G) wild guinea pig Cavia aperea (N=19).Note that the behavioral tests for cavies differ from the tests for the other species.Higher scores represent higher anxiety, i.e. the X-axis is reversed for better comparability.Full data in Table S1.

Inheritance pattern
Given the high variability of SNORD gene copy numbers in the mouse populations, one can ask whether they are subject to frequent changes, such as unequal cross-over events during meiosis.Given the total length of the repeat regions, this can not be directly tested, since the haplotypes can not be distinguished by conventional techniques.However, we approached this question by comparing the inheritance of copy numbers in nine families of the MC population.We find that the offspring shows a large variation, exceeding the spread of numbers measured for the parents in 7 out of the 9 families (Figure 4).Most importantly, if the alleles in the offspring would be only combinations of the parental alleles, one should expect at most four allele classes in the offspring.But families with more than four offspring (families 1, 7, 8 and 9) show additional allele classes for each offspring, suggesting that new length alleles have been generated.

Figure 4: Inheritance patterns of SNORD gene copy numbers (CN) in families.
Nine families of the MC population were tested and the copy numbers were determined for each individual (blue diamonds).The parental copy numbers are marked with an arrow.Note that in 7 out of the 9 families the spread of copy numbers exceeds that of the parents.The correlations with the behavioral tests for these animals are provided in Table S3.
Hence, the data show that at least some new haplotypes are generated in every generation.The experiment shows also that new large variation is generated in each generation, in line with the general observation that personality traits tend to differ between parents and offspring, as well as among the offspring.To confirm that this is also the case here, we subjected all the offspring to the behavioral tests described above.We find indeed a large variation, as well as the expected significant regression with copy numbers (SNORD115: R 2 =0.5, P << 0.001; SNORD116: R 2 = 0.41, P = 0.003) (Table S3).
Further, we asked whether the variability could be so high that somatic variation would become evident.We have therefore scored copy numbers of DNA isolated from different organs of a single animal, but we did not find significant differences in this case (Table S3).While this does not rule out the possibility of very high somatic variation such that every cell might be different, there is at least no obvious pattern that distinguishes organs, including the brain.

Inbred strain variation
It has been shown previously that personality measures can differ even between individuals in an inbred strain (Freund et al., 2013;Jakovcevski et al., 2008;Lewejohann et al., 2011).Hence, we asked whether inbred mice would also show copy number variability for SNORD115 and SNORD116.We used the C57BL/6J mouse strain for this purpose, which is known to harbor extremely little variation in the form of SNP polymorphisms (Zurita et al., 2011).We genotyped 60 individuals of both genders (30 female and 30 male).We find that there is indeed a large variation of copy numbers in these inbred mice, comparable to the spread seen in the outbred cohorts (Table 1).20 mice of each sex were then subjected to the behavioral tests.We found no significant difference between males and females in these tests, but we could confirm the strong correlation with copy numbers (Table S1).

Transcriptome analysis
To investigate transcriptome changes in response to copy number variation of the SNORD genes, we choose ten C57BL/6J individuals representing the spread of copy numbers from low to high and sequenced RNA from their brains.First, we sought to confirm that the target genes would show the expected correlations and the corresponding non-target transcripts of these genes would not show this.
The first analysis focused on Ht2cr as target of SNORD115.We find a high positive correlation between SNORD115 copy number and total read count from the target exon in Ht2cr-201.As expected, there was no significant correlation between SNORD115 copy number and Ht2cr-204 and Ht2cr-206 as non-target transcripts.Similar results are observed for the SNORD116 target Ankrd11.
The target exon of Ankrd11 shows significant correlation to SNORD116 copy number, while there was no significant correlation between SNORD116 copy number and Ankrd11-204 and Ankrd11-207 as non-target transcripts (Table S4).Hence, the RNAseq results confirm the previous observations from the ddPCR experiments (see Table S2).
ANKRD11 is a chromatin regulator which influences the expression of downstream genes by binding chromatin modifying enzymes like histone deacetylases (Neilsen et al., 2008;Zhang et al., 2004).An Ankrd11 knockout study has suggested that several hundred genes are regulated by Ankrd11 (Gallagher et al., 2015).We used a list of 635 of these target genes for a correlation analysis with the Ankrd11-202 transcript abundance.We find that more than 100 of these 635 genes showed a correlation (positive or negative) to Ankrd11-202 expression level.Around 70 percent of them are down-regulated and 30 percent are up-regulated by different Ankrd11-202 levels, with a subunit of the GABA A receptor (Gabrg1) showing the strongest correlation (Table S4).
Gabrg1 is known to play a crucial role in anxiety regulation in both humans and mice (Nuss, 2015;Tasan et al., 2011).To gain further insights into the function of the Ankrd11 downstream genes, gene ontology (GO) and KEGG pathway enrichment analysis were performed using the DAVID online tools.This revealed five different functional categories: (I): metabolic pathway (II): proliferation and differentiation (III): anxiety (IV): intellectual ability formation and (V): osteogenesis (Figure 5).and this affects a whole range of target genes.Among them is the GABA receptor Gabrg1, which is also known to be involved in anxiety responses.Three other groups of genes regulated by Ankrd11 have effects on behavioral phenotypes as well, one group has general effects on cell differentiation and development and one group on osteogenesis.The figure is drawn based on the results provided in Table S4.
In humans, splice site variant mutations skipping exons IX, X and XI of Ankrd11 cause the KGB syndrome (Low et al., 2017;Sirmaci et al., 2011).This is a rare genetic disorder characterized by intellectual disability, autism spectrum disorder, and craniofacial abnormalities (Ka and Kim, 2018).
Hence, this provides further support that the splice-site regulation of Ankrd11 through SNORD116 has a direct functional relevance for behavioral traits.

SNORD116 copy number and craniofacial features
Given that one of the categories of genes affected by Ankrd11 expression level relates to osteogenesis (see above) and a previous study of a particular mutation in the Ankrd11 gene has shown craniofacial abnormalities (Barbaric et al., 2008), we tested whether SNORD116 copy number variation would also have an effect on craniofacial features.We used a 3D landmarking approach to quantify differences in skull shapes based on CT scans, following the procedures described in (Pallares et al., 2014;Pallares et al., 2016).A principle component analysis was then used to assess the results.We find that PC1, which explains 35% of the variance, correlates indeed highly significantly with SNORD116 copy number (Figure 6).This suggests that not only behavioral, but also craniofacial traits are influenced by copy number variation at SNORD116.S5.

Other rodents
Given that the arrangement of the PWS region with the two clusters of snoRNAs is conserved throughout mammals (Sato, 2017), we were interested to assess in how far one can trace the correlation between copy number variation and personality traits also in other species.
We have first analyzed a second mouse species, the wood mouse Apodemus uralensis, which is separated from Mus m. domesticus since about 10 million years and which has different ecological adaptations.Still, given its general similarity to house mice, we applied the same phenotyping scheme.From the currently available genomic data, we could only retrieve sufficient information for the SNORD115 cluster, but this has even higher copy numbers than found in Mus musculus (Figure 1).We find indeed a significant regression with SNORD115 copy numbers and behavioral scores in the wood mice as well (Figure 3E).
As a second, even more distant rodent, we used a cavy species, Cavia aparea, the wild congener of domesticated guinea pigs.For this species we used two behavioral tests that can be considered to reflect anxiety behavior (Guenther and Trillmich, 2015).The current genome sequence of a close relative, Cavia aperea f. porcellus, provides only the annotation for SNORD116 repeats, hence we focused our analysis on this cluster.We find a much smaller range of copy numbers than in mouse (Figure 1), but still a significant regression with the anxiety scores from the two tests (Figure 3F, G).

Humans
To test a possible correlation between SNORD copy numbers and personality traits in humans, we tested a subset of a cohort of 541 healthy individuals that had taken part in a study based on the Tridimensional Personality Questionnaire (TPQ).From these individuals, we chose the top 48 each for low and high anxiety scores and typed them for SNORD copy numbers.SNORD115 in humans is represented by a single class only, while the SNORD116 family is split in three subclasses, for which we designed different ddPCR assays.We find significant differences of copy numbers between the two groups (Table 1) (Table S1).Particularly significant are SNORD115 and SNORD116_2.The latter is the variant that is predicted to bind to Ankrd11 exon X, while the possible target genes for the other two SNORD116 variants are not yet clear.These latter ones show generally only little copy number variation (Table 1).However, we note that the direction of the correlation is different between rodents and humans.In humans, the relatively higher anxiety group has the smaller number of copies, while it is the other way around in the three tested rodent species (see above).

Discussion
Our data establish a link between behavioral scores and copy number variation at the SNORD115 and SNORD116 loci within the PWS region across four species of mammals.This includes also an isogenic mouse strain, which is not expected to carry many other genetic variants apart of the copy number variants (Zurita et al., 2011).Further, our data show that new copy number alleles are generated every generation, which provides an explanation for the high variances observed between parents and offspring, as well as the low general heritability.Importantly, this observation implies also that an artificial manipulation of copy numbers by engineering mouse strains would not be possible, since strains with defined number could not be established.On the other hand, it is known that whole deletions of parts of the PWS region including the snoRNAs lead to complex phenotypes, including behavior, both in mice and humans.Further, the interaction of the snoRNAs with their target genes via sequence complementarity provides a mechanistic explanation for the observed regulatory effects.This, together with the consistency of the correlation across four species establishes a causative link between snoRNA copy number and behavioral traits.Hence, our results establish a new paradigm in the regulation of behavioral variance, namely the involvement of highly mutable regulatory snoRNA clusters.We note that current association mapping protocols would not be able to establish this, since they rely on a stable association with neighboring SNPs, which is not given in the light of the fast generation of new copy number alleles.

Origin of variation
The high level of copy number variation in the SNORD115 and SNORD116 clusters can be ascribed to their tandem repeat organization, which should facilitate unequal crossover and thus length variations of the cluster.Both the family studies, as well as the observation of many alleles in the otherwise isogenic strain C57BL/6J suggest an exceptional high rate of generation of new repeat number variants.C57BL/6J mice were originally generated from a single pair of mice in 1921 and have been kept as a well identifiable strain with very little SNP variation (Zurita et al., 2011).Still, we find at least 25 copy number variants for each SNORD cluster.While the exact number of generations that have passed since the founding of the strain is difficult to ascertain, it would not be more than a few hundred.This suggests at minimum the generation of one new variant every few generations.Our family studies suggest that this may even occur every generation, given that we find more than four length variants among the offspring.Imprinted regions are known to include hotspots of meiotic recombination (Paigen and Petkov, 2010) and this has also been shown for the PWS region in humans (Robinson and Lalande, 1995).However, such an extreme rate of change has not been described so far.
Another very unusual finding is the covariation of repeat numbers between the SNORD115 and SNORD116 clusters.The repeat units of these clusters do not show any sequence similarity and they are separated by a non-repetitive region.We are not aware of any mechanism that could explain this covariation, but it is consistently found in mice and humans, for which we could obtain the respective data.Given the overlapping regulation of the same behavioral pathways by the two clusters, it makes sense that they should show this covariation, but how this is achieved will require further studies.

Target pathways
Our transcript analysis data in response to SNORD115 copy number variation confirm the known interaction and regulation of the alternatively spliced exon Vb of the serotonin receptor Ht2cr through this snoRNA (Kishore and Stamm, 2006).We suggest that the corresponding predicted interaction of SNORD116 with Ankrd11 may also be a direct interaction, based on our data.This can be inferred from the specificity of the effect on the predicted interacting exon X, in comparison to the transcripts of the locus that do not contain this exon and which do not show a dependence on copy number variation.
There are a number of mouse knockout lines that delete more or less large parts of the PWS region.
However, only one knockout line exists that has removed specifically one SNORD gene cluster, namely SNORD116.The mice from this line show an early-onset postnatal growth deficiency, are deficient in motor learning and show increased anxiety (Ding et al., 2008).A further analysis of a full deletion variant of this line has revealed further effects on feeding related pathways, as well as bone mineral density (Qi et al., 2016).These findings are in line with a disturbance of the multiple pathways that are regulated by Ankrd11 as a direct target gene of SNORD116.
Interestingly, we find that not only behavior, but also craniofacial shape is under the control of the SNORD116 gene cluster copy number variation.For humans it has been suggested that there is indeed a link between personality and facial characteristics (Kramer and Ward, 2010;Squier and Mew, 1981), but a possible causality of these observations was left open.Our data suggest such causality for mice.But given that mice are essentially nocturnal animals that communicate mostly via scents and ultrasonic vocalization, it is unclear whether they would even recognize different craniofacial shapes among their conspecifics.However, there could be a general osteogenic effect on the whole bone system that could be of relevance for being combined with the behavioral tendency.
For example, bold animals might profit from stronger bone structures, in case they get more involved in fights.However, this connection will need further study, especially since the SNORD116 expression is confined to the brain.Qi et al. (Qi et al., 2016) have suggested that a general osteogenic effect may be mediated via the metabolic pathways that are regulated by Ankrd11.

Implications for the genetics of personality traits
Our findings can resolve a long standing controversy about the genetics and plasticity of personality traits.Our data suggest that these traits are indeed genetically specified, but through a hypervariable locus that would escape conventional genetic mapping approaches.Hence, it is not necessary to invoke developmental or secondary epigenetic effects in explaining variance in personality traits, given that there is a mechanistic pathway of modulation of target RNA expression through different concentrations of snoRNAs.
The fact that we find significant regressions between personality scores and SNORD copy numbers also in other rodents and even in humans, implies that the system is conserved throughout mammals.
In humans, we observed this even within a psychiatrically healthy control group, where we find that extremes of personality traits for anxiety and activity are associated with significant differences in copy numbers.The effects are somewhat weaker than for the rodents, but humans are evidently also more subjected to environmental influences than the animals that were bred under controlled conditions.Most intriguingly, however, the effect in humans is in the opposite direction, i.e. more anxious individuals harbor lower copy numbers.This would suggest that the copy number variation acts only as a general regulator, but the actual behavioral consequences are modified by downstream pathways.The existence of such modifiers is also suggested in the comparison between the two mouse populations in our study.MC mice have on average lower copy numbers than CB mice, but they are overall more anxious when compared at a population level (Linnenbrink et al. unpublished observations).Also the fact that a complete deletion of the SNORD116 cluster in inbred mice leads to very anxious animals (Ding et al., 2008) suggests that modifiers must play a role as well.We suggest that it should be possible to identify such modifiers in association studies when one controls for the SNORD115 and SNORD116 copy numbers as a covariate.Because of the variation induced by these snoRNAs, they may have been hidden in the background so far.In fact, it may generally become necessary for behavioral studies to include the SNORD115 and SNORD116 copy numbers as covariates, to reveal new patterns independent of this variation.

Evolutionary implications
One can raise the question of how and why such a hypervariable system could have evolved and how it is maintained.This question has in fact been posed since a long time and there have been a number of attempts to propose evolutionary models that could explain the large variability in behavioral traits.
These include drift effects in a neutral context (Tooby and Cosmides, 1990), mutation-selection balance (Zhang and Hill, 2005) or balancing selection processes (Dingemanse and Wolf, 2010;Penke and Jokela, 2016).But none of these models has taken the possibility into account that a hypervariable locus could control this variation.Evidently, we need to caution that that the specific mechanism that is revealed by our study is only applicable to mammals, while the question of variability of personality traits is an issue across all animal taxa.Still the system found in mammals could provide a guidance towards finding comparable systems also in other taxa.

Mouse strains
Wildtype mice used in this study were offspring of mice that originated from wild populations sampled in the Massif Central region of France (MC) and the Cologne/Bonn region of Germany (CB) in 2004 and 2005 and then held under outbreeding conditions at the Max-Planck-Institute for Evolutionary Biology in Plön (see supplementary files in (Harr et al., 2016) for details).The C57BL/6J inbred strain was purchased at the age of 3 weeks from Jackson Laboratory.
Mice were usually kept in type III cages (Bioscape, Germany) and were weaned at the age of 3 weeks.Males were housed together with brothers or in individual cages.Females were housed in sister groups to a maximum of 5 mice per cage.Enrichment, including wood wool, toilet paper, egg cartons and a spinning wheel (Plexx, Netherland), was provided in each cage.Mice were fed standard diet 1324 (Altromin, Germany) and provided water ad libitum.Housing prior to experiments was at 20-24°C, 50-65% humidity and on a 12:12 light-dark schedule with lights on at 7 am.

DNA and RNA extraction
All dissections were done by following standardized protocols and personal instructions.Prepared tissues were immediately frozen and kept at −70 ° until DNA/RNA preparation.
DNA extraction was performed according to a standard salt extraction protocol.Briefly, samples were lysed by using HOM buffer (80 mM EDTA, 100 mM Tris and 0.5 % SDS) with Proteinase K (0.2 mg/mL) for 16 hours in Thermomixer (Eppendorf, Germany) at 55°C.500 μL sodium chloride (4.5 M) was added to each sample and was incubated on ice for 10 minutes.Then chloroform was added, mixed and spun for 10 minutes at 10,000 rpm.The upper aqueous phase was separated, mixed well with Isopropanol (0.7 volume) and spun for 10 minutes at 13,000 rpm.The pellet was washed with Ethanol (70 %), air dried and dissolved in TE-buffer (10 mM Tris, 0.1 mM EDTA).DNA concentration was measured on the Nano Drop 3300 Fluorospectrometer using Quant-iT dsDNA BR Assay kit (Invitrogen) reagent.
RNA extraction was done by using Trizol reagent.1mL Trizol per 40mg tissue was added to each sample.Then the samples were lysed by Tissue lyser II (QIAGEN, Germany) at 30 Hertz for 5 minutes.Homogenized samples were incubated at room temperature for 5 minutes.200μL chloroform (per 1 mL TRIzol) was added to each sample, shook vigorously by hand 15 seconds, followed by 3 minutes incubation at room temperature and spun at 12,000g for 15 minutes at 4°C.The aqueous phase was transferred to a new tube and 0.5 volumes Isopropanol was added, incubated at room temperature for 10 minutes and spun at 12,000g at 4°C.The supernatant was removed and the pellet was washed with 75% EtOH (made with RNAse-free water).Samples were mixed by hand several times and then spun at 7,500g for 5 minutes at 4°C.The supernatant was removed and the pellet dried shortly at room temperature, dissolved in 200μl RNAse free water and stored at -20°C for overnight.An equal volume of LiCL (5M) was added to the crude RNA extract, mixed by hand and incubated for one hour at -20°C.Samples were spun at 16,000g for 30 minutes.The supernatant was removed; samples were washed twice with EtOH 70% and spun at 10,000 at 4°C.The pellet was dried at room temperature, dissolved in RNAse free water and kept for long-term storage at -70°C.
The quality of the RNA samples were measured with Bio-Analyzer chips and samples with RIN values below 7.5 were discarded.cDNA was synthesized using the MMLV High Performance Reverse Transcriptase kit according to the instructions of the supplier (epicenter, an Illumina company).

Small RNA extraction and cDNA synthesis
Total RNA which is enriched in small RNA was extracted by using the mirVana miRNA Isolation Kit.The quality of the RNA was measured with BioAnalyzer chips and samples with RIN values below 8 were discarded.Illumina® TruSeq® Small RNA Library Prep kit was used for small RNA cDNA synthesis to be used for the ddPCR. 1 μg of total RNA which was enriched in small RNA was used as input.

RNAseq analysis
Poly-A + RNA was used for cDNA synthesis and Illumina library preparation by using the Truseq stranded RNA HT kit.The libraries passing quality control were subjected to sequencing on an Illumina NextSeq 500 sequencing system.Raw sequence reads were quality trimmed using Trimmomatic (Bolger et al., 2014).The quality trimming was performed base wise, removing bases below quality score of 20 (Q20), and keeping reads whose average quality was of at least Q60.Reads were mapped to the mouse mm10 reference genome (Waterston et al., 2002) by using Hisat2 (Kim et al., 2015).Htseq was used for counting reads overlapping with a specific feature (gene) (Anders et al., 2015).Differential expression analysis was performed with the DESeq2 package (Love et al., 2014) in R environment.To perform the alternatively spliced isoforms analysis, BAM files from Hisat2 were used as input for SAMtools (Li et al., 2009) by using option -c for total read from each exon and option -q 60 for total read of each sample.GO and KEGG pathway enrichment analyses were performed using DAVID online tools (Version 6.8, https://david-d.ncifcrf.gov/),with the classification stringency set to "medium" P value of <0.05

Droplet digital PCR
Digital PCR is a method enabling absolute quantification of DNA targets without the need to construct a calibration curve as used commonly in qPCR (Zhao et al., 2016).It requires a reference to calculate gene copy number and to normalize gene expression level.For the latter we used β-catenin for mRNAs and SNORD66 for standardizing SNORD RNAs.SNORD 66 is a single copy gene located in an intron of the eukaryotic translation initiation factor 4 and was therefore used also as reference gene for copy number calculations.
Primers were designed with 50-70 bp amplicon length, GC content <50% and with very low potential for primer dimer structure.For tandem copy separation, sample viscosity reduction and improved template accessibility, DNA digestion was done with EcoRI for SNORD115 and BamHI for SNORD116.These enzymes cut only once in the repeated units.5 units of restriction enzymes were added to each sample.Then the samples were kept 20 minutes at room temperature for complete digestion.

Gene name
Droplets were generated using a Droplet Generator (DG) with an 8-channel DG8 cartridge and cartridge holder with 70 μL of DG oil/well, 20 μL of fluorescent PCR reaction mixture and a DG8 gasket.The prepared droplets were transferred to corresponding wells of a 96-well PCR plate (Eppendorf, Germany).
The PCR plate was subsequently heat-sealed with pierceable foil using a PX1™ PCR plate sealer (Bio-Rad) and then amplified in a LifeEco thermal cycler (Bioer, China).The thermocycling protocol was: initial denaturation at 95°C for 5 min, then 40 cycles of denaturation at 95°C for 30 s, annealing at 60°C for 45 s and, finally, incubation first at 4°C for 5 min and then at 90°C for 5 min.After cycling, the 96-well plate was fixed into a plate holder and placed into the Droplet Reader.Droplets of each sample were analyzed sequentially and fluorescent signals of each droplet were measured individually by a detector.Copy numbers were calculated according to the procedures suggested by BioRad.

Behavioral Tests
Behavioral tests were performed on the M. m. domesticus mice starting at an age of 24 weeks, on C57BL/6J inbred strain at the age of 16 weeks and on the A. uralensis mice at the age of 18 weeks.The testing included an Elevated Plus Maze, an Open Field and a Dark/Light Box test, all adapted towards the use of wild mice to avoid escape.Test setups were cleaned with 30% ethanol between individual tests.
The Elevated Plus Maze consisted of four arms, each 50 cm long and a 10x10 cm neutral area in the middle.Two of the arms were made of clear Plexiglas, indicating the unsafe zone and two were made of grey PVC, indicating a safe zone.The floor was made of white PVC.Mice were placed in the center and the behavior of each mouse was monitored for 5 min (Holmes et al., 2000).During this experiment, the time spent in the dark and light arms were measured, as well as the speed and distance travelled.
For the Open Field test, mice were placed in a 60x60 cm arena with 60 cm high grey PVC walls and they were allowed to explore it for 5 min (Reale, 2007;Wilson et al., 1994;Yuen et al., 2015).The speed of the mouse, the distance travelled and time spent within 10 centimeters off the wall vs in the central area were measured.
For the Dark/Light Box, the focal mouse was placed in a test apparatus containing a small dark shelter with two exits.During the first five minutes, the time until the mouse pokes its nose out of the shelter and the first time the tail is visible, was recorded.At five minutes, a set of keys was dropped next to the test apparatus from 136 cm height, and the second part of the experiment began.The time it took for the mouse to first look out and when the entire mouse was visible were measured.If mice did not come out at all, the time was set to be 600 seconds.This test was adapted from tests in inbred mice and common voles (Herde and Eccard, 2013;Young and Johnson, 1991).
The behavioral tests were filmed using a TSE camera (TSE system, Germany).To score the videos from each test, all the videos were transferred to Videomot2 system (TSE system, Germany).Mice were detected by the software in 3 points (head/center/tail base tracking) and then the software automatically generates the numerical data of the time that each mouse spent at zones of interest.
Since personality is defined as consistent behavioral traits, we needed only those measurements which were consistent over the course of the experiment.Hence, each behavioral test was repeated every 4 weeks for three times.
Statistical analysis were carried out using R 3.3.3and R 3.3.2 .As the repeatability of a behavior is a key component for the identification of personality trait, all single measurements assessed in the behavioral tests were subjected to repeatability analysis.Repeatability was calculated using "rptR" package (Nakagawa and Schielzeth, 2010).Normally distributed data were calculated using rpt (anova based) and for count data rpt.poisGLMM was used.ID was used as a random effect in these models.
To determine whether individual behavioral measurements are correlated, a Spearman correlation matrix was generated.P-values were corrected using the Holm method.Behaviors were clustered using the protocol from (Herde and Eccard, 2013).An hierarchical cluster function was used from the R package "cluster", specifically "agnes", to determine the relationship between the measurements.All measurements were clustered using Manhattan clustering with complete linkage (Gyuris et al., 2011;Herde and Eccard, 2013;Tremmel and Muller, 2013).Further details on the validation of the statistics are provided in the thesis of Rebecca Krebs (Krebs 2018, University of Kiel).
The behavioral tests on the wild cavies (Cavia aperea) were conducted as described in (Guenther and Trillmich, 2015).In short, to measure struggle docility, the animal was turned on its back and held in the hand of an observer.For 30 s, the time the animal struggled to actively escape that situation was recorded.To measure open field anxiety, the distance moved (cm) when individuals were exposed to an open field for 20 min, was scored.The first 10 min, a shelter was present in the arena under which animals could hide.For the second 10 min, this shelter was lifted out of the arena.

Craniofacial shape analysis
The morphometric analysis was performed as described previously (Pallares et al., 2015).Briefly, mouse heads were scanned using a computer tomograph (micro-CT-vivaCT 40; Scanco, Bruettisellen, Switzerland) at a resolution of 48 cross-sections per millimeter.Using the TINA landmarking tool (Schunke et al., 2012), 36 three-dimensional landmarks were positioned in the skull.The raw 3D landmark coordinates obtained in TINA tool were exported to MorphoJ (Klingenberg, 2011) for further morphometric analyses.
The symmetric component of the skull was obtained following (Klingenberg et al., 2002).In short, a mirror image of the landmark configuration of each individual was generated, and a full GPA was performed with the original and mirror configurations.Again, the resulting con-figurations were averaged to obtain the symmetric component of shape variation.The new landmark coordinates generated by the GPA are called "Procrustes coordinates".
Shape features were computed in MorphoJ principal components (PCs) from the n x 3k covariance matrix of Procrustes coordinates, where n is the number of samples and k is the number of landmarks; 3k represents the number of Procrustes coordinates.PC loadings computed in this analysis are defined as morphological score in this study.
Human cohort and evaluation of personality traits.
541 healthy controls were randomly selected from the Munich registry of residents and interviewed for the presence of DSM-IV anxiety, affective, somatoform, eating, alcohol dependence, drug abuse or dependence, disorders using a modified version of the Munich Composite International Diagnostic Interview (Wittchen and Pfister, 1997) at the MPIP.Only individuals negative for the above-named disorders were included in the context of a large study on the genetics of major depression using EDTA blood as the source for DNA (see (Heck et al., 2009) for more detail).Five hundred and fortyone individuals also completed the Tridimensional Personality Questionnaire (TPQ), a validated personality measure (Weyers et al., 1995).To match the behavioral assessments in mice, single items of the TPQ reflecting anxiety and activity were used to select individuals on the extremes of this distribution.Individuals can score from 0 to 6 on the activity and anxiety scale, respectively and their sum was used to identify the extreme groups, while matching for age and sex.The high anxiety and activity group (N = 48) had a mean additive score of 7.35 (SD 1.04).The low anxiety and activity group (N = 48) had a mean additive score of 0.73 (SD 0.49).The mean age of the high anxiety group was 42.3 year and 41.6 years for the low anxiety group with 33.3% and 62.5% females, respectively.

Figure 3 :
Figure 3: Regression between snoRNA gene copy numbers and behavioral scores.A), C), E) for SNORD115, B), D), F) and G) for SNORD116.A) and B) for the M. m. domesticus CB population (N=22), C) and D) for the M. m. domesticus MC population (N=23), E) for the wood mouseApodemus uralensis (N=32), F) and G) wild guinea pig Cavia aperea (N=19).Note that the behavioral tests for cavies differ from the tests for the other species.Higher scores represent higher anxiety, i.e. the X-axis is reversed for better comparability.Full data in TableS1.

Figure 5 :
Figure5: Overview and depiction of genes and processes that are influenced by SNORD115/116 copy number variations (CNV) at the transcriptional level.SNORD115 CNV targets directly the expression of the serotonin receptor Ht2cr and would this is known to be involved in anxiety responses.SNORD116 CNV, on the other hand, has as direct target the chromatin regulator Ankrd11 and this affects a whole range of target genes.Among them is the GABA receptor Gabrg1, which is also known to be involved in anxiety responses.Three other groups of genes regulated by Ankrd11 have effects on behavioral phenotypes as well, one group has general effects on cell differentiation and development and one group on osteogenesis.The figure is drawn based on the results provided in TableS4.

Figure 6 :
Figure 6: Skull shape changes in response to SNORD116 copy number variation.Left: depiction of landmarks used for morphometric analysis, right: plot of SNORD116 copy numbers (x-axis) versus PC1 values of the principal component analysis.Landmark descriptions an plot data are provided in TableS5.

Table 1 : SNORD copy numbers (mean) for humans scoring low or high on anxiety and activity
Bio-Rad, Hercules, CA, USA) was used in this study according to the manufacturer's instructions.Briefly, fluorescent PCR reactions for each sample were prepared in a 23 μL volume containing 12μL 2 X EvaGreen supermixes, 200nM of each forward and reverse primers, 1ng DNA or cDNA and water.