Background: We investigated the features of the genomic rearrangements in a cohort of 50 male individuals with proteolipid protein 1 (PLP1) copy number gain events who were ascertained with Pelizaeus-Merzbacher disease (PMD; MIM: 312080). We then compared our new data to previous structural variant mutagenesis studies involving the Xq22 region of the human genome. The aggregate data from 159 sequenced join-points (discontinuous sequences in the reference genome that are joined during the rearrangement process) were studied. Analysis of these data from 150 individuals enabled the spectrum and relative distribution of the underlying genomic mutational signatures to be delineated. Methods: Genomic rearrangements in PMD individuals with PLP1 copy number gain events were investigated by high-density customized array or clinical chromosomal microarray analysis and breakpoint junction sequence analysis. Results: High-density customized array showed that the majority of cases (33/50; ~ 66%) present with single duplications, although complex genomic rearrangements (CGRs) are also frequent (17/50; ~ 34%). Breakpoint mapping to nucleotide resolution revealed further previously unknown structural and sequence complexities, even in single duplications. Meta-analysis of all studied rearrangements that occur at the PLP1 locus showed that single duplications were found in ~ 54% of individuals and that, among all CGR cases, triplication flanked by duplications is the most frequent CGR array CGH pattern observed. Importantly, in ~ 32% of join-points, there is evidence for a mutational signature of microhomeology (highly similar yet imperfect sequence matches). Conclusions: These data reveal a high frequency of CGRs at the PLP1 locus and support the assertion that replication-based mechanisms are prominent contributors to the formation of CGRs at Xq22. 
We propose that microhomeology can facilitate template switching, by stabilizing strand annealing of the primer using W-C base complementarity, and is a mutational signature for replicative repair.
Keywords: PMD; Genomic rearrangements; Genome instability; Duplication; LCR; RBM; HR; BIR; MMBIR; Microhomeology
Grace M. Hobson and James R. Lupski contributed equally to this work.
Architectural features of the human genome, such as low copy repeats (LCRs) or segmental duplications (SegDup), are associated with genome instability and large-scale genomic changes.
At the PLP1 locus, nucleotide substitutions and copy number gain events are associated with PMD.
Mutagenesis mechanisms that underlie structural variation in nonrecurrent rearrangements include non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), break-induced replication (BIR), and Fork Stalling and Template Switching (FoSTeS)/microhomology-mediated break-induced replication (MMBIR).
Key to the delineation of structural variant mutagenesis mechanisms has been the determination of copy number states at a given locus that deviate from a control diploid genome and the delineation of breakpoint junctions. Breakpoint junctions are the end-products of recombination between substrate pairs in which the individual substrate sequences map to two different positions on the haploid reference genome (Fig. 1a). Breakpoint junctions seen on array comparative genomic hybridization (aCGH) are signified by a transition state from normal copy number to gain or loss of genomic segments. At the nucleotide sequence level, the breakpoint junction may reveal specific "signature sequences" that can include microhomology, blunt-end fusion of DNA substrate sequences, or the relatively newly recognized microhomeology (Fig. 1a). Microhomology refers to sequence identity (usually 2–9 bp) found at the recombinant junction and represented in both sequences of the substrate pair, but reduced from 2 copies to 1 copy at the junction. It has been proposed that microhomology facilitates template switching (TS) and is consistent with non-homologous recombination because the extent of homology is far below the minimal efficient processing segment for homologous recombination (HR).
Graph: Fig. 1 Genomic rearrangements with different levels of complexity. At the array-resolution level, genomic rearrangements with the PLP1 gain can be apparently simple as a a single duplication or b a CGR. In aCGH figures, transitions of copy number alterations from copy neutral regions (black dots) to copy number gains (red dots) are demonstrated by gray vertical dashed arrows (breakpoints). At the nucleotide sequence level as shown in a, in the simplest case scenario, a single duplication has a breakpoint junction with only one join-point (a—left), a product of one TS by NHEJ (for blunt end), or microhomology and/or microhomeology-mediated rearrangement. Or, a breakpoint junction can contain several join-points (a—right). Such breakpoint junctions are products of iterative TS by different rearrangement mechanisms such as NHEJ or MMBIR. Bases indicated in red are in both the proximal and distal reference sequences. Rectangle with diagonal lines indicates a region of imperfect match between proximal and distal reference sequences. In addition to the iterative TS that lead to the appearance of complex breakpoints, iterative TS can result in copy number transitions of large genomic segments and formation of more complex genomic structures. b As a representative of such complex genomic structures, a schematic figure of a CGR with DUP-TRP/INV-DUP pattern resulted from two TSs creating breakpoint junctions Jct1 and Jct2, as shown. The horizontal bar below the aCGH depicts the rearrangement product. Duplications are represented in red and triplication in blue; yellow arrows represent inverted low copy repeats that mediate the TS in Jct1. Positions of the genomic segments are denoted as a, b, and c, duplicated segments as a′, b′, and c′, and the triplicated segment as b″. 
The TS between low copy repeats forming Jct1 switched the direction of replication, resulting in an inversion of the TRP segment, and the second TS forming Jct2 switched the direction of replication again, resulting in directly oriented DUP segments. The Y-axis on the aCGH plots represents the log2 ratios expected in a male when using a gender-matched control, given that PLP1 maps to chromosome X. Jct: junction; JP: join-point
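As a rough illustration of how copy number states relate to the log2 ratios on the aCGH plots, the sketch below assumes a single-copy X-linked locus in a male sample compared against a male control, so a duplication is expected near log2(2/1) = 1 and a triplication near log2(3/1) ≈ 1.58. The function names, the nearest-state rule, and the probe values are hypothetical and are not taken from the study; real aCGH analysis relies on dedicated segmentation software.

```python
import math


def expected_log2_ratio(sample_copies: int, control_copies: int = 1) -> float:
    """Expected log2(sample/control) for an X-linked locus in a male vs. male control."""
    if sample_copies <= 0:
        return float("-inf")  # complete loss of the only copy
    return math.log2(sample_copies / control_copies)


def nearest_state(log2_ratio: float, max_copies: int = 4) -> int:
    """Assign a probe to the copy number whose expected ratio is closest (toy rule)."""
    return min(
        range(1, max_copies + 1),
        key=lambda n: abs(expected_log2_ratio(n) - log2_ratio),
    )


def call_transitions(states):
    """Indices i where the copy-number state differs between probe i-1 and probe i."""
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


# Hypothetical probe values along a DUP-TRP-DUP pattern (not real data).
probe_log2 = [0.02, -0.05, 0.97, 1.03, 1.61, 1.55, 0.98, 0.01]
states = [nearest_state(v) for v in probe_log2]
breakpoints = call_transitions(states)
```

In this toy series, each index in `breakpoints` marks a transition in copy number state, which is what the gray dashed arrows in the aCGH panels indicate.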
Iterative TS can result in complexities at breakpoint junctions with multiple join-points (Fig. 1a) wherein discontinuous sequences in the haploid reference are apparently "stitched" together in a template-driven directional way (i.e., priming strand versus target annealing strand).
Due to the relative rarity of PMD and the limited genomic resolution of clinical testing, the frequency of each particular type of CGR and the mutational signature(s) accompanying mutagenesis remain elusive. Investigating the complexities of genomic architecture and rearrangements at the PLP1 locus provides insights into the underlying mechanisms of genomic rearrangements in PMD. In addition, understanding architectural features of the genome potentially rendering susceptibility to genomic instability may help to predict loci with inherent genome instability.
A total of 50 male individuals with PMD were identified with an increased PLP1 gene copy number. Before performing customized high-resolution aCGH, most cases had been tested by either Affymetrix whole-genome microarray or NimbleGen X chromosome array, and all cases had been tested by multiplex quantitative PCR throughout duplicated regions as described.
To fine map the genomic rearrangements to genome-level resolution, we used a custom-designed, high-density oligonucleotide array from Agilent. The array comprises approximately 44,000 interrogating oligonucleotides spanning chrX: 98,028,855-113,513,744 (NCBI build 37/hg19) with an average genome resolution of 386 bp between probes (chrX: 97,915,511-113,400,000 in NCBI build 36/hg18 was converted to GRCh37/hg19 using UCSC Genome Browser; https://genome.ucsc.edu/cgi-bin/hgLiftOver). The experimental procedures were performed according to the manufacturer's protocol (Agilent Oligonucleotide Array-Based CGH for Genomic DNA Analysis, Version 7.2, Agilent Technologies) with some modifications as described [[
A whole-genome Cytogenetics 2.7M array (Affymetrix) was performed at the Coriell Institute Sequencing and Microarray Center to determine copy number changes on chromosome Yq of individual BAB8921. The array had an average marker spacing of 1086 bases between probes. The NCBI build 36/hg18 coordinates were converted to GRCh37/hg19 by using the Lift Genome Annotations tool at https://genome.ucsc.edu/cgi-bin/hgLiftOver.
Rearrangements in individual BAB8934 exceeded the coverage of our custom-designed high-density aCGH. A custom-designed oligoarray, BCM V11.2, was performed for this individual as described.
Sample BAB8959 was genotyped using an Illumina Infinium CoreExome-24 version 1.3 genome-wide single nucleotide polymorphism (SNP) array at the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine in Houston, TX. Of the 240,000 SNPs present on the array, 60 were located within the duplication of this sample, and the genotype of each was individually assessed.
A lymphoblastoid cell line was cultured from patient BAB8921 according to standard protocols. Metaphase chromosomes and interphase nuclei were prepared from the cell line and FISH was performed as described using a cosmid DNA probe containing the PLP1 gene (cU125A1) and an X-centromeric probe.
Genomic positions of putative breakpoint junctions for CNVs were identified using the coordinates of interrogating oligonucleotides mapped to the upstream and downstream ends of each CNV. For both array-based single duplications as well as CGRs, outward primers were designed inside the duplication and close to predicted breakpoints. PCR was performed assuming the duplicated sequences are in a tandem orientation for single duplications or using a combination of outward primers (designed inside duplications) for CGRs. For deletions, inward primers were designed outside of the deleted regions. Breakpoint junctions were obtained by long-range PCR using TaKaRa LA Taq according to the manufacturer's protocol (TaKaRa Bio Company, Cat.No.RR002). The experimental procedures were performed as described [[
We aligned the breakpoint junction sequence with the proximal and distal ends of each breakpoint using the reference genome. Shared 100% nucleotide identity between the 5′ and 3′ reference strands at the join-point was considered microhomology.
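The definition above can be sketched in code. This is a minimal illustration, not the pipeline used in the study: it assumes the junction is modeled as `ref5[:a] + ref3[b:]`, and it counts how many bases around the join are identical in both reference sequences, so the join could be placed anywhere within that stretch. The function names, sequences, and category labels are hypothetical; the categories only mirror the groupings used for join-points in this article (blunt, 1 bp match, microhomology of 2 bp or more).

```python
def microhomology_length(ref5: str, a: int, ref3: str, b: int) -> int:
    """Bases of perfect identity shared by both references at a join-point.

    The junction is modeled as ref5[:a] + ref3[b:]. Identity is extended
    leftward (before a / b) and rightward (from a / b) until the first mismatch.
    """
    left = 0
    while a - 1 - left >= 0 and b - 1 - left >= 0 and ref5[a - 1 - left] == ref3[b - 1 - left]:
        left += 1
    right = 0
    while a + right < len(ref5) and b + right < len(ref3) and ref5[a + right] == ref3[b + right]:
        right += 1
    return left + right


def classify_join_point(shared_bases: int) -> str:
    """Toy classification based only on the number of shared bases."""
    if shared_bases == 0:
        return "blunt"
    if shared_bases == 1:
        return "1 bp match"
    return "microhomology"


# Hypothetical reference sequences sharing "GGTA" at the join (not real data).
ref5 = "ACGTTGCAGGTACCATT"   # 5' (priming) side reference
ref3 = "TTCAGTCCGGTATGACG"   # 3' (target annealing) side reference
junction = ref5[:12] + ref3[12:]
shared = microhomology_length(ref5, 12, ref3, 12)
```

Because the shared bases are present once in the junction but in both references, placing the join at any position within the shared stretch yields the same junction sequence and the same count.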
We analyzed the similarity of the DNA sequences surrounding breakpoints using the R programming language.
We performed custom-designed aCGH to better understand the full spectrum of copy number alterations at the PLP1 locus. Results showed that rearrangement products were nonrecurrent (Fig. 2). Single duplications varying from ~ 122 kb to ~ 4.5 Mb were seen in 66% of cases (33/50) (Additional file 1: Figures S1-S4 and Table 1, and Additional file 2: Table S1). The smallest region of overlap (122 kb), which included genes GLRA4, TMEM31 (embedded within GLRA4), and PLP1, is represented by the duplication in individual BAB8968 (Additional file 1: Figure S1–6). The largest duplication was found in individual BAB8954 and spanned ~ 4.5 Mb including 62 genes (ChrX: 99,762,680-104,246,638, GRCh37/hg19) (Additional file 1: Figure S1–4).
Graph: Fig. 2 An overview of genomic rearrangements as seen on aCGH in 50 individuals with PMD. Genomic rearrangements at Xq22 vary in size and genomic positions. The largest duplication (~ 4.5 Mb) is found in individual BAB8954. Three individuals show additional duplications distant from the duplicated PLP1 locus (individuals BAB8920, BAB8923, and BAB8934). The black numbers refer to genomic coordinates on chromosome X. The left column lists the 50 subjects studied. Slash lines indicate a break in numbering for genomic coordinates. The location of PLP1 is indicated by a black vertical broken line
Table 1 Genomic rearrangement pattern at the PLP1 locus in this study
Rearrangement product pattern    Frequency
Single duplication               66% (33/50)
DUP-NML-DUP                      18% (9/50)
DUP-TRP-DUP                      6% (3/50)
Other CGR                        10% (5/50)
We detected CGRs in 17 individuals (34%) (Table 1 and Additional file 2: Table S2). Nine had an aCGH pattern of interspersed duplications separated by a copy neutral region (CNR), a pattern previously described as DUP-NML-DUP (Fig. 3a).
Graph: Fig. 3 CGRs detected by aCGH at the PLP1 locus. a Two duplications separated by CNRs were detected on aCGH in 9 individuals with PMD. The distance between the two duplications differs among these individuals, ranging from 16 to 7863 kb. In the schematic figure below each array, duplications are depicted in red and CNRs in gray. Three cases (BAB8940, BAB8955, and BAB8960) could be single duplications on the H2 inversion haplotype or could be two duplications with one TS involving reversal of the direction of replication between IRs LCRA1a and LCRA1b (Additional file 1: Figure S9); three (BAB8923, BAB8928, and BAB8965) have directly oriented DUP-NML-DUP structures (Additional file 1: Figures S6–1, S6–2 and S6–3); one has two tandem head to tail duplications (BAB8962; Additional file 1: Figure S6–4); and two (BAB8920, BAB8934) have DUP-NML-INV/DUP structures (Additional file 1: Figure S7). b A DUP-TRP-DUP pattern of rearrangement was detected on aCGH in three individuals with PMD (Additional file 1: Figure S10). Breakpoint junction analyses indicated that one of these individuals (BAB8964) probably has the previously reported DUP-TRP/INV-DUP pattern of rearrangement with inversion mediated by a TS between inverted repeats LCRA1a and LCRA1b. Based on aCGH data, BAB8970 probably has the same structure, although breakpoint junctions were not resolved (Additional file 1: Figures S10–1 and S10–2). Breakpoint junction analysis indicates that BAB8939 also carries a DUP-TRP/INV-DUP, but the inversion was not mediated by LCRA1a and LCRA1b (Additional file 1: Figure S10–3). Duplications are indicated in red, triplications in blue, and LCR blocks (LCRA1a and LCRA1b) in yellow. c Additional CGR patterns at the PLP1 locus were identified on aCGH. DUP-NML-DUP-NML-DUP rearrangement pattern in which duplications are separated by short CNRs (BAB8924, BAB8936, and BAB8959). 
In BAB8924, based on the sequenced breakpoint junction, this case may have two tandem head to tail duplications on the H2 haplotype that has an inversion within LCRA1a and LCRA1b (Additional file 1: Figure S12–1a) or may have three duplications with one TS between LCRA1a and LCRA1b resulting in an inversion (not shown). We were not able to resolve any breakpoint junctions in BAB8936 (Additional file 1: Figure S12–1b). Breakpoint junction sequencing in BAB8959 showed that the CGR based on aCGH may not have occurred during the same cell division (Additional file 1: Figures S12–2). One individual, BAB8931, exhibited DUP-NML-DEL pattern of rearrangement with a ~ 283-kb duplication (breakpoint junction in LCRA1a) followed by ~ 106 kb of CNR and then a ~ 16-kb deletion (breakpoint junction in LCRA1b). The most complex rearrangement in this study was observed in individual BAB8937 with a DUP-QUAD-TRP rearrangement pattern. In this case, duplication is followed by a quadruplication and then a triplication. The possible mechanism for such rearrangements is shown in Additional file 1: Figure S11. Duplications are indicated in red, CNRs in gray, deletion in green, triplication in blue, quadruplication in orange, and LCR blocks in yellow in the horizontal bar under each array
In this cohort, 28 samples (56% of all individuals) have breakpoints that map to a 186-kb genomic interval distal to PLP1 that contains both direct and inverted LCRs (Additional file 1: Figure S5).
We were able to resolve the breakpoint junctions at nucleotide-level resolution in 27 of the 33 individuals with a single duplication based on aCGH (one breakpoint junction per case with one or more join-points). In 26 out of 27, the breakpoint junction indicated that the rearrangement product was in a head-to-tail orientation (Additional file 2: Table S2, Additional file 1: Figures S1-S3). Most were single join-points with microhomology or microhomeology, and a few had insertion of one or more bases. The breakpoint junction in BAB8949 was an 861-bp insertion that originated from two flanking regions of the proximal (centromeric) end of the duplication, likely resulted from three TS, i.e., FoSTeS X3, one of which was AluY/AluY-mediated (Additional file 1: Figure S2) [[
Breakpoint junction analysis of four of the nine individuals with a DUP-NML-DUP pattern on aCGH (Fig. 3a) revealed that they had two directly oriented duplications with a CNR, i.e., a genomic interval with normal copy located between the duplicated segments (Additional file 1: Figure S6). BAB8923, BAB8928, and BAB8965 each had one breakpoint junction formed by a TS between the distal end of one duplicated segment and the proximal end of another, resulting in the CNR between the two duplications (Additional file 1: Figures S6–1, S6–2, and S6–3, respectively). The second TS was between the distal end of the distal duplication and the proximal end of the proximal duplication, resulting in the duplication of both segments in direct orientation. In the fourth individual with a DUP-NML-DUP pattern, BAB8962, TSs between the proximal and distal ends of each duplication created two separate duplications (Additional file 1: Figure S6–4). Junction sequencing in individual BAB8923 revealed that the first TS (Jct1) was mediated by directly oriented Alus with 90% identity (Additional file 1: Figure S6–1). In Jct2, we found a 3-bp insertion that could be the result of a replication error. In individual BAB8928, both junctions had microhomologies (Additional file 1: Figure S6–2). Junction sequencing of BAB8965 revealed a 38-bp insertion at Jct1 and a 182-bp insertion at Jct2 templated from four different discontinuous genomic segments resulting from six iterative TS events as evidenced by distinguishable join-points (Additional file 1: Figure S6–3). The breakpoint junction sequencing of BAB8962 revealed an insertion of 170 bp templated from two genomic regions, one of which is located in the region of the second duplication, suggesting the possibility that both duplications may have occurred during the replication event of one cell division (Additional file 1: Figure S6–4).
In the remaining five individuals with DUP-NML-DUP aCGH patterns, breakpoint junction analysis indicated that an inversion had occurred. Individuals BAB8920 (Additional file 1: Figure S7–1) and BAB8934 (Additional file 1: Figure S7–2) had a DUP-NML-INV/DUP structure. The TS at one breakpoint junction occurred between the distal ends of the two duplicated segments and the TS at the other was between the proximal ends, giving rise to an inverted duplicated segment (Additional file 1: Figure S7). There are three potential rearrangement structures that satisfy the two breakpoint junction sequences found in these individuals (Additional file 1: Figure S8). In addition to the rearrangement structure in which a distal duplicated segment was inverted between two directly oriented copies of the proximal duplicated segments (Additional file 1: Figure S8a), the proximal duplicated segment could be inverted between two directly oriented copies of the distal duplicated segments (Additional file 1: Figure S8b), or both proximal and distal duplicated segments and the CNR between them could be inverted (Additional file 1: Figure S8c). Distinguishing among these rearrangement structures for each individual with DUP-NML-INV/DUP would require additional studies [[
In three of the five individuals whose breakpoint junction indicated inversion, BAB8940, BAB8955, and BAB8960, the distal duplication maps within IRs LCRA1a to LCRA1b (Additional file 1: Figure S9). At least two structural haplotypes at this locus exist in the human population, the H1 allele with ~ 58% frequency and the H2 inverted allele with ~ 42% frequency (resulting from a recombination event between LCRA1a and LCRA1b). If the LCRA1a/LCRA1b region on the arrays of individuals BAB8940, BAB8955, and BAB8960 is inverted to represent the H2 haplotype, the CNVs are seen to be single duplications, so the aCGH pattern of DUP-NML-DUP may be due to displaying the data of an individual with the H2 inversion haplotype on an array designed using the H1 haploid reference genome (Additional file 1: Figure S9) [[
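The effect of displaying an H2 rearrangement on an H1-based array can be illustrated with a simplified coordinate model. The sketch below assumes half-open intervals in arbitrary units, a single inverted region between two boundaries, and a contiguous duplication described in H2 coordinates; the function names and all coordinates are hypothetical, and the lengths of the LCRs themselves are ignored.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals (input must be sorted)."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def h2_interval_to_h1(start, end, inv_start, inv_end):
    """Project a contiguous H2 interval onto H1 reference coordinates.

    [inv_start, inv_end) is the region inverted in H2 relative to H1.
    Positions inside it are mirrored; positions outside are unchanged.
    """
    pieces = []
    if start < inv_start:                      # part proximal to the inversion
        pieces.append((start, min(end, inv_start)))
    s, e = max(start, inv_start), min(end, inv_end)
    if s < e:                                  # part inside the inversion (mirrored)
        pieces.append((inv_start + inv_end - e, inv_start + inv_end - s))
    if end > inv_end:                          # part distal to the inversion
        pieces.append((max(start, inv_end), end))
    return merge(sorted(pieces))


# Hypothetical units: inversion spans 100-200; the H2 duplication spans 60-130.
on_h1_array = h2_interval_to_h1(60, 130, inv_start=100, inv_end=200)
# A duplication covering the whole inverted region stays contiguous on H1.
spanning = h2_interval_to_h1(60, 250, inv_start=100, inv_end=200)
```

In this toy case a single H2 duplication that ends inside the inverted region appears on the H1 array as two gained segments separated by a copy neutral interval, which is the DUP-NML-DUP appearance discussed above.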
Interestingly, directly oriented Alus mediated the DUP-NML-DUP pattern of rearrangement (Additional file 1: Figure S6–1), while oppositely oriented LINEs or Alus mediated the DUP-NML-INV/DUP rearrangement pattern (Additional file 1: Figure S7). Further, in individuals BAB8920, BAB8923, and BAB8934 with relatively large CNR ranging from 3084 to 7863 kb between duplications, Alu-Alu- or LINE-LINE-mediated rearrangements are involved in facilitating the long-distance TS events, resulting in a chimeric LINE or Alu element at one breakpoint junction (Additional file 1: Figures S6–1 and S7) [[
In this study, we report three individuals with DUP-TRP-DUP on aCGH (Fig. 3b and Additional file 1: Figure S10). We previously reported that individuals with this aCGH pattern at the MECP2 and PLP1 loci had an inversion, and we proposed a mechanism of TS between IRs for formation of the DUP-TRP/INV-DUP structure.
In the rearrangement of the third individual with a DUP-TRP-DUP structure, BAB8939, the triplication did not border the LCRs and was in a different region from that in the other two patients with the DUP-TRP-DUP structure in this report and in previously published individuals with triplication (Additional file 1: Figure S10–3).
The most complex rearrangement in this study was observed in individual BAB8937, who carries a duplication followed by a quadruplication and a triplication (Additional file 1: Figure S11). Previously, breakpoint junction analysis in another individual with this pattern of rearrangement revealed three breakpoint junctions, of which two (Jct1 and Jct2) were identical and the third was likely due to a TS between the proximal end of the quadruplicated genomic interval and the distal end of the duplication.
Our high-resolution aCGH platform could detect altered CNRs as small as 2 kb represented by 9 to 11 interrogating probes, allowing us to detect a complex DUP-NML-DUP-NML-DUP pattern in three individuals, BAB8924, BAB8936, and BAB8959 (Fig. 3c and Additional file 1: Figure S12). In individual BAB8924, a ~ 987-kb duplication, a small CNR of ~ 5 kb, and a larger CNR of ~ 72 kb were observed (Fig. 3c). In individual BAB8936, two small CNRs of ~ 3 kb and ~ 6 kb (Fig. 3c), and for individual BAB8959 a small CNR of ~ 2 kb and a relatively large CNR of ~ 30 kb were detected within CGRs (Fig. 3c).
In individual BAB8924, the 72-kb CNR maps within IRs LCRA1a to LCRA1b (Additional file 1: Figure S12–1a), like CNRs in DUP-NML-DUP individuals BAB8940, BAB8955, and BAB8960 (Additional file 1: Figure S9). As in those individuals, the resolved breakpoint junction indicated inversion, and the rearrangement in BAB8924 may have occurred on the H2 haplotype (Additional file 1: Figure S12–1a).
Individual BAB8959 had breakpoint junctions for two deletions and a duplication (Additional file 1: Figure S12–2). Jct1, the duplication breakpoint junction, was indicative of a tandem head-to-tail duplication encompassing the duplicated region on aCGH, and the other two, Jct2 and Jct3, were indicative of deletions in one copy of the duplicated region. We checked the Database of Genomic Variants (DGV) to determine whether a CNV polymorphism could explain either of the CNRs. There are three CNVs in the DGV that colocalize with the ~ 30-kb deletion in Jct3 of our patient, one of which, esv2672539, has the same bases deleted as our patient (Additional file 1: Figure S12–2). This deletion was seen in 26 DNAs from 1092 human genomes (population frequency of 2.4%).
Interestingly, individual BAB8931 exhibited a DUP-NML-DEL pattern of rearrangement on aCGH that consists of an ~ 283-kb duplication with its distal breakpoint mapped to the proximal end of LCRA1a, followed by ~ 106 kb of CNR and then an interstitial ~ 16-kb deletion whose proximal breakpoint maps to the distal end of LCRA1b (Additional file 1: Figure S13). The rearrangement could be a result of two independent TSs, in which the first TS, leading to a gain at the PLP1 locus, is facilitated by NAHR between LCRA1a and LCRA1b and reverses the direction of replication, and the second TS creates the deletion and resolves the direction of replication (Additional file 1: Figure S13). Alternatively, the presence of such a deletion in the ancestral chromosome that underwent an intrachromosomal duplication event may explain the generation of such apparent copy number complexities (Additional file 1: Figure S13). We were not able to resolve breakpoint junctions in BAB8931, and we were not able to further test the second hypothesis, as neither parental nor grandparental samples were available for molecular studies.
Microhomology refers to short stretches (2–9 bp) of nucleotide identity between the two substrate reference sequences at breakpoint junctions of genomic rearrangements that facilitate TS and represents one mutational signature of replicative repair including FoSTeS/MMBIR.
Graph: Fig. 4 Representative similarity plots (heat maps) between reference sequences surrounding CNV breakpoint junctions containing a only microhomology (≥ 2 bp of nucleotide similarity, flanked by solid vertical lines), b both microhomeology and microhomology, and c only microhomeology. We present here an example for each type of the observed junctional sequences using a heat map (top) and the sequence alignment at the nucleotide level (bottom). Reference sequences were aligned using the Needleman-Wunsch algorithm, as described in the "Methods" section. The 5′ reference sequence is indicated in blue and the 3′ reference sequence in green. In the upper panel of each heat map plot, the 5′ reference sequence is plotted as a rectangle on the top and the 3′ reference sequence on the bottom. The heat map shading indicates the sequence similarity level of a 20-bp moving window: orange, high similarity; blue, low similarity; white, gap. Schematic figures in b and c indicate microhomeology-mediated priming strand (blue) invasion to the target annealing strand (green). Microhomology is shown in red. d An aggregate plot showing the change of similarity levels between reference sequences with increasing distance from the breakpoint junctions. We compared such patterns among four junction categories: blunt junctions (red), junctions containing a microhomology only (green), and the priming sides (blue) and target annealing sides (purple) of junctions containing a microhomeology
Table 2 Sequence characteristics of join-points in the breakpoint junctions from this study and meta-analysis of aggregate data¹
Frequency is given as ~%, count/sum.
Product of rearrangement join-point    This study       Aggregate data¹
Join-points with 1 bp match            5.3% (3/57)      6.3% (10/159)
Microhomology ≥ 2 bp                   26.3% (15/57)    22% (35/159)
Microhomeology²                        33.3% (19/57)    32.1% (51/159)
Alu-Alu                                5.3% (3/57)      7.5% (12/159)
LINE-LINE                              1.75% (1/57)     1.9% (3/159)
Blunt                                  3.5% (2/57)      5.7% (9/159)
Insertion³                             22.8% (13/57)    23.9% (38/159)
Others⁴                                1.75% (1/57)     0.6% (1/159)
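The percentages in this table follow directly from the join-point counts. The short sketch below recomputes them as a consistency check; the dictionary and function names are hypothetical, and the "Alu-Alu" label for the row with counts 3/57 and 12/159 is inferred from the accompanying text rather than read from the table itself.

```python
# Join-point counts as listed in the table (this study, aggregate data).
THIS_STUDY = {
    "1 bp match": 3,
    "Microhomology": 15,
    "Microhomeology": 19,
    "Alu-Alu": 3,        # row label inferred from the text (assumption)
    "LINE-LINE": 1,
    "Blunt": 2,
    "Insertion": 13,
    "Others": 1,
}
AGGREGATE = {
    "1 bp match": 10,
    "Microhomology": 35,
    "Microhomeology": 51,
    "Alu-Alu": 12,       # row label inferred from the text (assumption)
    "LINE-LINE": 3,
    "Blunt": 9,
    "Insertion": 38,
    "Others": 1,
}


def frequencies(counts):
    """Percentage of join-points in each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


this_study_pct = frequencies(THIS_STUDY)
aggregate_pct = frequencies(AGGREGATE)
```

The category counts sum to 57 and 159 join-points, respectively, matching the denominators reported in the table.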
We also found chimeric LINE-LINE or Alu/Alu elements, potentially resulting from TS, in ~ 7% (4/57) of join-points from both single duplications and CGRs (Additional file 2: Table S5). Small insertions (1–8 bp) contributing to breakpoint junction complexity were observed in 11/57 join-points and large insertions of unknown origin in 2/57 (Additional file 2: Table S5). Join-points with a one base pair match or a blunt end were less frequently observed (5/57), while one join-point (1/57) was the result of NAHR mediated by a pair of paralogous repeats identified in the self-chain track of the UCSC browser (Additional file 2: Table S5).
We next computationally examined the nucleotide similarity between two substrate reference sequences surrounding each breakpoint junction with microhomology (2 bp or more, 100% match) and/or microhomeology. For this study, we obtained 300 bp of reference sequence with the join-point in the middle for each side of each join-point. Since we noticed that some of the join-points with microhomeology also had microhomology (see "Methods"), the join-points were grouped into three categories: microhomology only, both microhomology and microhomeology, and microhomeology only. One example for each characteristic group is shown in Fig. 4; the computational output for all junctions from this study is summarized in Additional file 1: Figure S14. For each event, 300 bases were examined for sequence similarity between the proximal and distal references such that the reference sequence derived from 150-base extensions of the proximal reference on either side of a join-point was used as the base for alignment on the top plots while that from the distal reference was used as the base for alignment on the bottom plots. The heat map shading indicates the sequence similarity level of a 20-bp moving window, in which orange indicates high similarity, blue indicates low similarity, and white represents gaps in the alignment.
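The similarity analysis described here can be approximated with a short sketch: a global (Needleman-Wunsch) alignment of the two reference sequences followed by the fraction of identical columns in a 20-bp moving window. The study performed this analysis in R; the Python version below is only an illustration, and the scoring values (match +1, mismatch −1, gap −1), the function names, and the test sequences are assumptions that are not taken from the article.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment of s and t; returns the two gapped strings."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner, preferring the diagonal move.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if s[i - 1] == t[j - 1] else mismatch
        ):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def window_identity(a, b, window=20):
    """Fraction of identical, non-gap columns in each moving window of the alignment."""
    hits = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    if len(hits) < window:
        return [sum(hits) / len(hits)] if hits else []
    return [sum(hits[k:k + window]) / window for k in range(len(hits) - window + 1)]


# Hypothetical 24-bp sequences differing by one substitution (not real data).
proximal = "ACGT" * 6
distal = proximal[:10] + "T" + proximal[11:]
aln_p, aln_d = needleman_wunsch(proximal, distal)
profile = window_identity(aln_p, aln_d)
```

Windows whose identity is high but below 100% correspond to the imperfect matches that this article terms microhomeology; the cutoff applied to such windows is a separate choice and is not encoded in this sketch.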
The join-points are mostly in a local region of higher similarity (i.e., more orange) in comparison to its surrounding region (more blue and sometimes containing gaps), indicating that the sequence similarity is not limited to the breakpoint junction and suggesting that TS events might frequently occur in association with such microhomeology blocks in the genome (Additional file 1: Figure S14). We found that in the join-points with both microhomeology and microhomology, in most cases the microhomology locates to one end of the microhomeology or to overlapping microhomologies, one on either end of the microhomeology, supporting the donor-acceptor hypothesis, wherein microhomology facilitates W-C base pair complementarity and strand annealing to prime DNA replication during TSs (e.g., BAB8967 in Fig. 4b, Additional file 1: Figure S14) [[
In aggregate, 159 join-points from 124 unrelated patients with PMD are available for breakpoint junction data meta-analysis at this PLP1 locus; 61 individuals, i.e., almost half, had a CGR with more than one CNV and showed evidence that multiple copy number variant states were generated in the same structural-variation event, potentially due to iterative TS.
We re-analyzed breakpoint junction data from previous studies using additional computational analyses described in the "Methods" section; results (including the current cohort) revealed that microhomology is present in ~ 22% (35/159) of join-points, whereas 19/159 (~ 12%) of join-points have ≤ 1 bp match (including join-points with blunt ends) (Table 2). Microhomeology was observed in 51/159 (~ 32%) of reported join-points (Table 2, Additional file 2: Tables S4 and S6). Heat map similarity analyses between the reference sequences surrounding each breakpoint junction with microhomology (2 bp or more, 100% match) and/or microhomeology (≥ 70% similar) from other studies [[
Based on junction sequencing results, ~ 9% of breakpoints coincided with LCRs/SegDups; PMD-LCRs were observed at ~ 7% of breakpoints, including LCRA1a (~ 1%), LCRA1b (~ 0.6%), LCRC (~ 3%), LCRD (~ 1%), LCR2 (~ 1%), and LCR3 (0.3%), while SegDups were observed at ~ 2% of breakpoints (Additional file 2: Table S3C). Additionally, ~ 2% of join-points mapped within a haploid reference genome "self-chain" region signifying an IR (Additional file 2: Table S3-C). Altogether, ~ 11% of sequenced PLP1 breakpoints coincide with paralogous repeats. Nevertheless, this number may be an underestimate considering the high similarity of LCRs, in particular LCRA1a and LCRA1b, and the experimental limitation of obtaining sequence of the breakpoint junctions that coincide with them. Based on aCGH results, 37 breakpoints mapped to, and were likely mediated by, LCRA1a/LCRA1b (Additional file 2: Table S3-D).
Although LINE elements were present at 19% of join-points, LINE-LINE-mediated rearrangements (forming chimeric LINEs) are responsible for only ~ 2% (3/159) of join-points, while evidence for Alu-Alu-mediated rearrangement (forming chimeric Alus) was found at ~ 8% (12/159) of join-points; the structure of different Alu family members can be conceptually considered as an ~ 300-bp tract of microhomeology [[
PMD is a rare X-linked disorder of the CNS with an estimated incidence of 1.9 per 100,000 male live births in the USA [[
Non-random grouping of the distal breakpoints into the LCR cluster was observed in 28/50 (56%) of individuals (Additional file 1: Figure S5), implicating a role for repeated sequences in genomic instability and generation of nonrecurrent genomic rearrangements, potentially by facilitating TS [[
In our study, LINEs were present in ~ 19% of breakpoints at the PLP1 locus, but only one chimeric LINE was identified (BAB8920). In a recent study, 17,005 directly oriented LINE pairs (> 4 kb in length and > 95% similar) separated by less than 10 Mb were identified, putting ~ 82.8% of the human genome at risk of LINE-LINE-mediated rearrangement [[
Our results provide further evidence supporting the contention that RBMs play the predominant role in the generation of nonrecurrent structural variants. A collapsed DNA replication fork can result in a seDSB that, upon further processing, exposes a 3′ single-stranded DNA. The exposed single strand can then be utilized to prime synthesis on a template strand using either homology, as provided by repetitive elements (e.g., Alu and LINE elements), or microhomology at sites lacking long stretches of homology, to reestablish a productive and processive replication fork (MMBIR) [[
In our study, breakpoint junction complexities such as genomic insertions ranging from 1 to 959 bp were observed in several breakpoint junctions, including samples with array-based single duplications (Additional file 1: Figures S1-S4). These findings, in addition to the rearrangements being copy number gain events, are consistent with a replicative repair process in which the polymerase acts with reduced processivity and hence undergoes one (small insertion) or multiple TSs before forming a highly processive migrating replisome; establishment of this processive replisome perhaps signifies a switch to utilization of a different DNA polymerase. Therefore, both small (< 20 bp) and large insertions can result from multiple fork collapses and iterative strand invasions (Additional file 1: Figures S2 and S1–4 for individuals BAB8949 and BAB8950, respectively). Alternatively, small templated insertions can result from replication errors (Additional file 1: Figures S1–2 and S1–6, BAB8933 and BAB8966), and small non-templated insertions can potentially arise from MMEJ or NHEJ (random insertions; Additional file 1: Figures S1–3 to S1–6, BAB8946, BAB8951, BAB8963, and BAB8969).
Among 17 individuals with CGRs identified in this study, nine individuals showed interspersed duplications (Fig. 3a, and Additional file 1: Figures S6, S7 and S9). Three of these rearrangements could be either single duplications that occurred on the H2 haplotype or two duplications with one of two TSs involving reversal of the direction of replication between IRs LCRA1a and LCRA1b. Four rearrangements had directly oriented DUP-NML-DUP structures and two had DUP-NML-INV/DUP structures. We note a relatively large size interval for regions between duplications in individuals BAB8920, BAB8923, and BAB8934. Interestingly, one out of two breakpoint junctions in all three individuals appeared to be either LINE/LINE or Alu/Alu mediated. Highly identical SINE or LINE pairs at breakpoints may mediate the underlying replicative mechanism by stimulating long-distance TS [[
A rearrangement pattern consistent with DUP-TRP/INV-DUP was found in two individuals and suspected in a third (Fig. 3b and Additional file 1: Figure S10). This pattern of CGR was initially described at the MECP2 locus in which unrelated individuals with complex duplication/triplication alterations indicated shared genomic architectural features [[
A very rare CGR involving a quadruplicated genomic segment distal to PLP1 was observed in individual BAB8937 (DUP-QUAD-TRP) (Fig. 3c and Additional file 1: Figure S11). A CGR with the same pattern, but with a quadruplicated segment proximal to PLP1, has been previously reported [[
In this cohort, we found three individuals with more than two duplications separated by CNRs (BAB8924, BAB8936, and BAB8959, Fig. 3c and Additional file 1: Figure S12). There are two possible explanations for the appearance of such CNVs. These CNRs can be deletion products in hotspot regions of the human genome. Genomic rearrangement with interchromosomal TS during oogenesis can potentially explain the presence of such genomic rearrangements in some cases, although a SNP array performed on BAB8959 did not support this hypothesis (Additional file 1: Figure S12–2). However, we could not exclude the presence of a copy number neutral absence of heterozygosity (AOH) region involving the CNV in BAB8959. Another possibility is the coincidence of three independent genomic rearrangement events including two deletions and one intrachromosomal duplication during gametogenesis or early embryogenesis. For BAB8936, we do not know if the two small CNRs are inherited or related to the formation of the CGR (Additional file 1: Figure S12–1b). However, based on the genomic position of the CNRs in UCSC Genome Browser (GRCh37/hg19), it is unlikely that they are due to rearrangements mediated by repeats or repetitive elements.
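The array-based pattern labels used in this section (e.g., DUP-NML-DUP, DUP-TRP-DUP, DUP-NML-DUP-NML-DUP) can be read as run-collapsed copy number states along the chromosome. The sketch below illustrates that reading only; it is not the authors' calling procedure. The function name `pattern_label` and the copy-number-to-label mapping are assumptions made for illustration, with one copy treated as the normal state because the individuals are male and the locus is X-linked. Orientation information (e.g., INV) cannot be derived from copy number alone and is not represented.

```python
from itertools import groupby

# Assumed mapping for a hemizygous X chromosome: one copy is the normal state.
LABELS = {0: "DEL", 1: "NML", 2: "DUP", 3: "TRP", 4: "QUAD"}


def pattern_label(copy_numbers):
    """Collapse ordered per-segment copy numbers into a pattern string.

    Consecutive segments with the same copy number are merged, and
    normal-copy flanks on either side of the rearrangement are trimmed.
    """
    collapsed = [cn for cn, _ in groupby(copy_numbers)]
    while collapsed and collapsed[0] == 1:
        collapsed.pop(0)
    while collapsed and collapsed[-1] == 1:
        collapsed.pop()
    return "-".join(LABELS[cn] for cn in collapsed)


if __name__ == "__main__":
    # Three duplications separated by copy neutral regions (CNRs).
    print(pattern_label([1, 2, 2, 1, 2, 1, 2, 1]))
    # Triplication flanked by duplications.
    print(pattern_label([1, 2, 3, 2, 1]))
```

Under this reading, a CNR appears as an internal NML segment, which is why CNRs that are themselves deletion products or inherited variants would change the label without changing the underlying duplication event.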
We found multiple breakpoint junction sequences showing microhomeology. The aggregate results of breakpoint junctions and surrounding genomic sequence suggest that not only is higher similarity at the junctions, represented by either microhomology or microhomeology, facilitative, but higher sequence complementarity of the surrounding regions could also contribute to TS during the DNA replicative repair process. To gain insight into the frequencies and distribution of RBM mutational signatures at different rearrangement join-points, we performed a meta-analysis of all published breakpoint sequences from genomic rearrangements with PLP1 gain events in association with PMD. We combined our data with six other studies, all but one of which used the same genomic assay: oligonucleotide array-based CGH (Fig. 5) [[
Fig. 5 An overview of genomic rearrangements with gain at the PLP1 locus. a Genomic rearrangements in the present cohort with 50 PMD individuals (Table 1). b Meta-analysis of combined results from six previously published studies (Additional file 2: Table S3a). Genomic rearrangements involving triplications are the most frequent CGRs at the PLP1 locus.
This study extends our knowledge about the distribution of genomic rearrangements with copy number gains at the PLP1 locus, their underlying molecular mechanisms, and potential mutational signatures accompanying structural variant mutagenesis. Importantly, CGRs occur in ~ 45% of all rearrangements involving this locus. We provide evidence for the role of microhomeology in genomic rearrangements at the PLP1 locus, perhaps facilitating TS, and thus, it may be considered a mutational signature of MMBIR. This strongly supports the role of FoSTeS/MMBIR, as microhomology/microhomeology-mediated TS, as the driving mechanism leading to the generation of nonrecurrent rearrangements at the PLP1 locus.
This work was funded in part by the US National Human Genome Research Institute (NHGRI)/National Heart Lung and Blood Institute (NHLBI) grant UM1HG006542 to the Baylor-Hopkins Center for Mendelian Genomics (BHCMG), National Institute of Neurological Disorders and Stroke (NINDS) Grants R01 NS058529 and R35 NS105078, and National Institute of General Medical Sciences (NIGMS) grant GM106373. The work was also supported by National Institute of Neurological Disorders and Stroke R01 NS058978 and National Institute of General Medical Sciences P30 GM114736. PS was supported by Ministry of Health of the Czech Republic AZV16-30206A and DRO 00064203. We acknowledge the PMD Foundation for their support. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or other granting agencies.
The authors would like to thank the individuals and their families who contributed to this study. We would also like to thank the following people who provided technical assistance: Linda Banser, Danielle Stubbolo, Kristi Clark, Serhat Ozdemir, Victoria Snell, Kaitlin McLean, Jon Bachman, Megan Ross, Tom Alberico, Selena Driscoll, Elisabet Eppes, and Glenn Simon. KJW would like to acknowledge the late Professor Sue Malcolm for her support at the Institute of Child Health, London, UK, and her contribution to the understanding of PMD.
CMBC, GMH, and JRL conceived and designed the experiments; VB, KS, KJW, XS, SG, CMG, CRB, and HH performed the experiments; VB, XS, HH, CMG, CMBC, JRL, GMH, KS, and KJW analyzed the data; XS performed the bioinformatics analysis; JRL, GMH, KJW, and PS contributed reagents/materials/analysis tools; VB, CMBC, JRL, and GMH wrote the paper; GMH, JRL, CMBC, XS, HH, and CMG revised the manuscript. All contributing coauthors read and approved the final draft.
The aCGH data have been deposited in NCBI's Gene Expression Omnibus [[
Ethics approval for work in this paper was obtained from the Institutional Review Board at Nemours/Alfred I. duPont Hospital for Children, the Institutional Review Board for research involving human individuals at Baylor College of Medicine, and Great Ormond Street Hospital for Children NHS Trust and Institute of Child Health Research Ethics Committee. Ethics approval covered molecular experiments on patient tissues to investigate the genetic basis of the patient disease. Patient clinical information is not presented in the paper. The research conformed to the principles of the Helsinki Declaration. Written informed consent was obtained for all the patient samples used in this study.
Consent for publication: Not applicable.
J.R.L. has stock ownership in 23andMe, is a paid consultant for Regeneron Pharmaceuticals, and is a co-inventor on multiple US and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis (CMA) and clinical exome sequencing offered in the Baylor Genetics Laboratory (BMGL:
Additional file 1: Figure S1. (S1-1 to S1-6). aCGH and breakpoint junction sequencing results for 30 of the 33 PMD individuals with single duplications at the PLP1 locus. Figure S2. Breakpoint junction sequencing in subject BAB8949 with a single duplication revealed insertions with multiple join-points at the breakpoint junction. Figure S3. Replication errors at the breakpoint junction and/or flanking regions in BAB8929. Figure S4. The aCGH result for BAB8921 showed a 666 Kb single duplication at the PLP1 locus. Figure S5. The distal breakpoint junction points of genomic rearrangements in 28 PMD subjects are grouped within the LCR distal of PLP1. Figure S6. (S6-1 to S6-4). Breakpoint junction analysis indicates that three patients have a directly oriented DUP-NML-DUP pattern of rearrangement. Figure S7. (S7-1 and S7-2). Breakpoint junction analysis indicates that two patients have a DUP–NML–INV/DUP pattern of rearrangement. Figure S8. Three possible rearrangements for the generation of DUP–NML–INV/DUP structures satisfy the breakpoint junctions that we obtained on patients BAB8920 and BAB8934. Figure S9. Three individuals with a DUP-NML-DUP pattern on aCGH (BAB8940, BAB8955, and BAB8960) have the distal duplication and copy neutral region between the two duplications mapping within IRs LCRA1a to LCRA1b. Figure S10. (S10-1 to S10-3). CGRs with DUP-TRP-DUP pattern of rearrangement on aCGH. Figure S11. The most complex rearrangement in this study, DUP-TRP-QUAD, was observed in individual BAB8937. Figure S12. (S12-1 and S12-2). Samples with DUP-NML-DUP-NML-DUP pattern of rearrangement (based on aCGH). Figure S13. One individual, BAB8931, exhibited DUP-NML-DEL pattern of rearrangement. Figure S14. The sequence similarity comparison of reference sequences surrounding join-points. Figure S15. Similarity comparisons of reference sequences surrounding join-points were done after re-analyzing of break-point junction sequences by a retrospective study.
Additional file 2: Table S1. Samples with single duplications at the PLP1 locus. Table S2. A summary of genomic rearrangements, coordinates and breakpoint junctions in the cohort of 50 PMD patients. Table S3. Original data from 7 studies on genomic rearrangements at the PLP1 locus. Table S4. Microhomeologous sequences at the join-points found in this study. Table S5. Other features at the join-points found in this study. Table S6. Microhomeologous sequences at the join-points found by re-analyzing breakpoint sequences from previous studies.
• aCGH
- Array comparative genomic hybridization
• BIR
- Break-induced replication
• CGRs
- Complex genomic rearrangements
• CMA
- Chromosomal microarray analysis
• CNR
- Copy neutral region
• DGV
- Database of genomic variants
• FISH
- Fluorescence in situ hybridization
• FoSTeS
- Fork Stalling and Template Switching
• HR
- Homologous recombination
• IR
- Inverted repeat
• LCR
- Low copy repeat
• LINE
- Long interspersed nuclear elements
• MMBIR
- Microhomology-mediated break-induced replication
• MMEJ
- Microhomology-mediated end joining
• NAHR
- Non-allelic homologous recombination
• NHEJ
- Non-homologous end joining
• PLP1
- Proteolipid protein 1
• PMD
- Pelizaeus-Merzbacher disease
• RBMs
- Replication-based mechanisms
• SegDup
- Segmental duplication
• seDSB
- Single-ended, double-stranded DNA break
• SNP
- Single nucleotide polymorphism
• SNV
- Single nucleotide variants.
Supplementary information accompanies this paper at 10.1186/s13073-019-0676-0.
By Vahid Bahrambeigi; Xiaofei Song; Karen Sperle; Christine R. Beck; Hadia Hijazi; Christopher M. Grochowski; Shen Gu; Pavel Seeman; Karen J. Woodward; Claudia M. B. Carvalho; Grace M. Hobson and James R. Lupski