Tetranucleotide Profile of Herpesvirus DNA

Introduction. Herpesvirus DNAs (about 90% of the complete genomic sequences of the order Herpesvirales available in GenBank) contain one of two tetranucleotides, CTAG or TCGA, at a minimum concentration. The “underrepresentation” of CTAG was previously observed only in the DNA of some bacteria and phages. The aim of the study was to further analyze the formal characteristics of herpesvirus DNA and to compare them with the density of virus/host DNA microhomology and with the genomic macrostructure of herpesviruses. Materials and methods. Twenty strains and isolates of each of five types of human herpesviruses (HHV1, HHV2, HHV3, HHV4, HHV5), 10 strains of HHV8, 5 strains of HHV6A, 4 strains of HHV6B and 3 strains of HHV7 were analyzed. GenBank tools were used to determine tetranucleotide frequencies, and human DNA fragments matched in size to herpesvirus DNA were used for comparison. Results. A minimum CTAG concentration is mainly characteristic of two- and single-segment genomes with direct or inverted terminal repeats (classes A, D, E), whereas a minimum TCGA density is mainly characteristic of DNA that is significantly less structured (classes B, C, F). Ordered by increasing CTAG density, human herpesviruses form a sequence close to the one obtained by ordering them by increasing density of 20-nt homology with human DNA, which also correlates with the macrostructure of DNA. A parallel between this minimization and the DNA structure of herpesviruses or their subfamily, as well as the context of the “minimal” CpG (that is, TCGA), has not been noted in the literature. Although herpesvirus DNA is quite large (125–295 Kb), some of these DNAs (for example, those of HHV4, HHV5 and HHV7) show noticeable deviations from the second DNA parity rule, which can thus serve as a component of the molecular signature. The Discussion suggests possible hypotheses for the origin of some of the observed phenomena.


Introduction
Herpesviruses (HV) of the family Herpesviridae, including human herpesviruses (HHV), are divided into three subfamilies: alpha-HV, beta-HV and gamma-HV [1]. HV can also be classified by the macrostructure of their DNA (Fig. 1). This classification does not coincide strictly with the division into subfamilies and, according to generally accepted views, comprises 6 classes, A to F [2].
Alpha-HHV (HHV1, HHV2 and HHV3), as well as the beta-HHV HHV5, contain two-segment DNA; each segment is bounded by mutually inverted monomeric terminal repeats, TR1 (DNA classes D and E). Class A (HHV6A, HHV6B and HHV7) is an unsegmented, unique linear sequence bounded by direct monomeric terminal repeats, each containing two "islands" of telomere-like hexanucleotides. Gamma-HHV contain DNA of classes B and C, which have a unique sequence bounded by direct, tandemly organized (non-monomeric) short repeats, TR2, whose number is not fixed (up to 45 in the case of HHV8).
Class F DNAs are less structured or not structured at all. Herpesvirus DNAs with a more exotic macrostructure exist (e.g., in scutaviruses), but they are few. The data obtained in this work allowed us to combine classes A, D and E into one group (DNA segments bounded by monomeric terminal repeats), and classes B, C and F into another (a single DNA segment bounded by an unfixed number of tandemly organized short terminal repeats). Earlier, we noticed that the DNA molecules of a herpesvirus and its host contain short (20-29 nt) mutually identical sequences, microhomologies, whose concentration is not random and, we believe, is explained by long (on an evolutionary scale) and close intergenomic relations between the partners [3]. Later, we found that such microhomologies have a characteristic distribution in the herpesvirus genome: they are concentrated mainly in its terminal (direct or inverted) repeats, especially in those regions of the TR that contain no genes [4,5]. Most interestingly, when HHV species are ordered by increasing density of virus/host genomic microhomology, the resulting sequence is consistent with the DNA macrostructure: the density is lower in two-segment DNA and higher in single- or non-segmented DNA. We hypothesized that segments of [...]
As an approach to the analysis, we compared nucleotide frequencies in DNA molecules and applied the second Chargaff parity rule, CPR2 [6], which becomes evident in DNA longer than 100,000 nt [7,8]. CPR2 is formulated in the same way as the first rule (CPR1), but refers to only one DNA strand. It holds approximately, with an accuracy that increases as the analyzed strand lengthens. It applies not only to mono- but also to oligonucleotides of up to 10-15 nt, with accuracy decreasing as the analyzed oligonucleotide lengthens [7,9]. In metagenomics, tetranucleotide analysis is often used to form molecular signatures [10].
The frequency of tetranucleotides in herpesvirus genomes corresponds quite reliably to CPR2 and characterizes DNA in more detail than mono-, di- and trinucleotides [7,11]. The symmetries of the tetranucleotides of herpesvirus genomes have, in principle, been described previously [12], but that work only confirmed their correspondence to CPR2. Our approach reveals other unusual properties of these genomes.
The aim of the study is to continue the analysis of the formal characteristics of herpesvirus DNA, as well as their comparison with the density of the virus/host DNA microhomology and with the genomic macrostructure of herpes viruses.

Materials and Methods
We analyzed ~90% of the nucleotide sequences of full-length viral DNA molecules of each genus of all three families of vertebrate and invertebrate herpesviruses available in GenBank. After analyzing 20 strains and isolates of each of five types of human herpesviruses (HHV1, HHV2, HHV3, HHV4, HHV5), 10 strains of HHV8, all 5 strains of HHV6A, all 4 strains of HHV6B and all 3 strains of HHV7, we were convinced that the intraspecific results (especially when expressed in percent) are practically identical; therefore, the tables present data only for the DNA of the reference strain of each type of herpesvirus.
For comparison, we used human DNA with a total length of 1.5 Mb (5 fragments of 300,000 nt each).

Results
We analyzed the tetranucleotide composition of the fully sequenced DNA of almost all herpesviruses of the order Herpesvirales available in GenBank. The DNA type, that is, GC (predominance of G+C) or AT (predominance of A+T) in one strand, is not very informative in this regard; it divides all HHVs into two groups:
• type AT: HHV3 (alpha) and HHV6A, 6B, 7 (beta);
• type GC: HHV1, 2 (alpha), 5 (beta), and 4, 8 (gamma).
However, dinucleotide analysis illustrates CPR2 well [13], according to which A≈T, C≈G, C+T≈A+G and C+A≈T+G for one DNA strand. This is determined by the size of herpesvirus DNA, from 125 to 295 Kb.
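The single-strand check described above can be sketched in a few lines of Python. This is only an illustration, not the GenBank tooling used in the study: the function names, the relative-gap measure and the toy sequence are assumptions introduced here.

```python
from collections import Counter

def strand_composition(seq: str) -> dict:
    """Count A, C, G and T on a single strand (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    return {base: counts.get(base, 0) for base in "ACGT"}

def cpr2_gaps(seq: str) -> dict:
    """Relative single-strand imbalances |A-T|/(A+T) and |C-G|/(C+G).

    Values near 0 mean the strand follows CPR2 closely. The DNA type is
    "GC" when G+C outnumbers A+T on the strand, otherwise "AT".
    """
    n = strand_composition(seq)
    at = n["A"] + n["T"]
    cg = n["C"] + n["G"]
    return {
        "AT_gap": abs(n["A"] - n["T"]) / at if at else 0.0,
        "CG_gap": abs(n["C"] - n["G"]) / cg if cg else 0.0,
        "type": "GC" if cg > at else "AT",
    }

# Toy sequence (hypothetical, far too short for CPR2 to hold reliably).
print(cpr2_gaps("ATGCGGCATTAGCGCCA"))
```

On real genomes of 125 to 295 Kb both gaps would be expected to be small; on a short toy string they are dominated by noise.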
The total number of tetranucleotides is 256 (4^4). To avoid the influence of DNA type on the results, which was shown previously [13,14], we analyzed only tetranucleotides containing all four different bases (4TNs). In HHV1 DNA (type GC), the least frequent ("underrepresented") tetranucleotide is precisely of this kind: CTAG occurs 91 times in the whole genome, instead of the ~600 times expected if all tetramers were equally represented. In HHV6A DNA (type AT), the CTAG count (303) is also close to the smallest among all tetramers and is the smallest among the 4TNs. Only four tetramers, GGGC (245), ACCG (287), GGCC (288) and GGCT (296), are less frequent, in accordance with the type of DNA.
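The comparison of an observed tetranucleotide count with the count expected under equal representation can be illustrated as follows. This is a minimal sketch that assumes overlapping windows on a single strand; the function names are hypothetical, and the genome length of 152,000 nt is an assumed round figure used only to reproduce the order of magnitude (~600) quoted in the text.

```python
from collections import Counter

def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers on one strand, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def expected_uniform(seq_len: int, k: int = 4) -> float:
    """Expected count of any one k-mer if all 4**k k-mers were equally frequent."""
    return (seq_len - k + 1) / 4 ** k

def representation(seq: str, kmer: str) -> float:
    """Observed/expected ratio; values well below 1 indicate underrepresentation."""
    return kmer_counts(seq, len(kmer))[kmer] / expected_uniform(len(seq), len(kmer))

# Assumed genome length (not taken from the text); 91 is the CTAG count reported for HHV1.
approx_len = 152_000
print(round(expected_uniform(approx_len)))
print(91 / expected_uniform(approx_len))
```

Under these assumptions the observed/expected ratio for CTAG in HHV1 comes out far below 1, which is what the text calls "underrepresentation".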
Of the 256 tetranucleotides, only 24 consist of all four nucleotides (P4 = 4! = 24). These 24, in turn, fall into two groups: 8 of them (octet A) do not change on inversion (that is, on taking the reverse complement), for example CTAG|CTAG; the remaining 16 (two octets B) form pairs, B1 and B2, of mutually inverted non-identical tetranucleotides, for example CTGA|TCAG. The tables and figures for octets A and B are shown separately. For correct comparison, the data are presented as a percentage of the sum of the tetranucleotide frequencies within octet A or octets B, respectively. Table 1 compares the data for all 24 tetranucleotides in the reference strains of all known types of human herpesvirus. Table 1 shows that CTAG is "underrepresented" in the genomes of all HHVs except HHV7. In HHV4 DNA, the TCGA tetramer is even more underrepresented (as in the human genome).
As the "underrepresentation" of CTAG decreases, the HHV genomes form a sequence resembling the one obtained by ordering them by increasing number of virus/host DNA microhomologies (Fig. 2): the greatest "underrepresentation" of CTAG is characteristic of two-segment DNA, and the smallest of single-segment DNA.
At the same time, the DNA of each HHV contains noticeably "overrepresented" tetranucleotides, which are also characteristic of genomes with a particular macrostructure: ACGT for two-segment DNA (classes D, E), TGCA for single-segment DNA (class A, roseoloviruses) and CATG for single-segment DNA (classes B, C). However, since the DNA of non-human herpesviruses is very poorly represented across host species in GenBank, we did not analyze the "maximum" tetranucleotides further.
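The split of the 24 four-base tetranucleotides into octet A and the B1/B2 pairs can be reproduced with a short enumeration. The sketch assumes that "inversion" means taking the reverse complement, which is the reading under which exactly 8 of the 24 are unchanged; the function names and the percentage helper are illustrative, not the authors' code.

```python
from itertools import permutations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]

def split_octets():
    """Return (all 24 4TNs, octet A, list of reverse-complement pairs for octets B)."""
    four_tns = sorted("".join(p) for p in permutations("ACGT"))
    # Octet A: tetranucleotides identical to their own reverse complement.
    octet_a = [t for t in four_tns if reverse_complement(t) == t]
    # Octets B: each remaining tetranucleotide paired with its reverse complement.
    pairs = sorted({
        tuple(sorted((t, reverse_complement(t))))
        for t in four_tns
        if reverse_complement(t) != t
    })
    return four_tns, octet_a, pairs

def percent_within(counts: dict, group) -> dict:
    """Express counts as a percentage of the total count within one group."""
    total = sum(counts.get(t, 0) for t in group)
    return {t: (100 * counts.get(t, 0) / total if total else 0.0) for t in group}

four_tns, octet_a, pairs = split_octets()
print(len(four_tns), octet_a, len(pairs))
```

Both CTAG and TCGA fall into octet A, while a tetranucleotide such as TCAG is paired with its reverse complement CTGA in octets B.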
The columns of numbers for the DNA of each virus are tetranucleotide DNA profiles; in the case of HHV4, 8 and 7, they show a certain similarity to the profile of human DNA. In some cases (HHV4, 5, 7), the octet B tetranucleotides presented in pairs (B1 and B2) show characteristic deviations from CPR2, which are probably associated with an insufficient DNA length of these viruses (HHV4, 5) or with an insufficient number of strains in GenBank, which limits the reliability of the relevant data. The positive side of deviations from CPR2 is that they can be used as components of the molecular signatures of these viruses.
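One way to quantify a B1/B2 deviation from CPR2 is a normalized difference between the count of a tetranucleotide and the count of its reverse complement on the same strand. The measure below is an assumption made for illustration; the text does not state which statistic was used, and the function names and toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer: str) -> str:
    """Reverse complement of a DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]

def count_overlapping(seq: str, kmer: str) -> int:
    """Number of (possibly overlapping) occurrences of kmer on one strand."""
    seq = seq.upper()
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)

def cpr2_deviation(seq: str, kmer: str) -> float:
    """(n1 - n2) / (n1 + n2) for a k-mer and its reverse complement.

    0 means perfect single-strand parity; +1 or -1 means only one member
    of the pair occurs. Returns 0.0 when neither occurs.
    """
    n1 = count_overlapping(seq, kmer)
    n2 = count_overlapping(seq, reverse_complement(kmer))
    total = n1 + n2
    return (n1 - n2) / total if total else 0.0

# Toy strand: CTGA occurs twice, its reverse complement TCAG once.
print(cpr2_deviation("CTGACTGATCAG", "CTGA"))
```

Applied to each of the 8 B1/B2 pairs, such a value would give a compact per-genome vector that could serve as one component of a molecular signature.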
It is noteworthy that the difference between the maximum and minimum values is much larger, that is, more obvious, in octet A than in octets B. Where the minimum tetranucleotide density in octets B is lower than in octet A, the "underrepresentation" is directly related to the type of DNA; that is, the tetranucleotides have the format [TA/AT|GC/CG], where the left and right dinucleotide pairs can be swapped and "/" means "or". There are 8 such tetranucleotides, but for their use, together with other tetramers, as molecular signatures, their involvement in forming the DNA type does not matter. Table 2 summarizes the data on the minimum (underrepresented) tetranucleotides of HHV DNA. Next, we carried out a tetranucleotide analysis of the completely sequenced DNAs of almost all other viruses of the order Herpesvirales in NCBI GenBank. The data obtained are summarized in Table 3. Table 3 shows that all herpesviruses fall into two groups according to two main underrepresented tetranucleotides, CTAG or TCGA. The difference between the two groups parallels their genomic macrostructure. Minimum CTAG (CTAGmin) is characteristic of the structured DNA classes A, D, E, with large monomeric terminal repeats (TR1); TCGAmin is characteristic of the less strictly structured DNA classes B, C, F, with a non-fixed number of tandem terminal repeats (TR2).
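The grouping by the least frequent octet A tetranucleotide can be sketched as a simple lookup. The labels CTAGmin and TCGAmin follow the text; the function names, the alphabetical tie-breaking rule and the toy counts are assumptions made for illustration and are not data from the tables.

```python
from collections import Counter

# The 8 four-base tetranucleotides that equal their own reverse complement.
OCTET_A = ("ACGT", "AGCT", "CATG", "CTAG", "GATC", "GTAC", "TCGA", "TGCA")

def octet_a_counts(seq: str) -> dict:
    """Overlapping single-strand counts of the octet A tetranucleotides."""
    seq = seq.upper()
    windows = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return {t: windows.get(t, 0) for t in OCTET_A}

def least_represented(counts: dict) -> str:
    """Octet A tetranucleotide with the lowest count (ties broken alphabetically)."""
    return min(OCTET_A, key=lambda t: (counts.get(t, 0), t))

def group_label(counts: dict) -> str:
    """Map the minimal tetranucleotide to the group label used in the text."""
    return {"CTAG": "CTAGmin", "TCGA": "TCGAmin"}.get(least_represented(counts), "other")

# Hypothetical toy counts, not values from Table 1 or Table 3.
toy = dict.fromkeys(OCTET_A, 10)
toy["CTAG"] = 2
print(group_label(toy))
```

Whether a genome labeled this way also belongs to classes A, D, E or B, C, F would still have to be checked against its annotated terminal repeats.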

Discussion
The "underrepresentation" of the CTAG tetranucleotide (CTAGmin) in the genomes of Escherichia and Salmonella, as well as some phages, has been known for quite some time [15] and continues to be studied [16]. Here, for the first time, we systematically note this feature for one of two large groups of herpesviruses, together with its parallel to their genomic structure. The larger group (ADE) is characterized by one or two segments bounded by monomeric terminal repeats (TR1), direct or mutually inverted. The smaller group (BC[F]) has a single-segment genome bounded by an undetermined number of tandemly organized direct terminal repeats (TR2) and is characterized by the "underrepresentation" of another tetranucleotide, TCGA. The more general nature of the CTAG|ADE and TCGA|BC[F] parallels among animal viruses is suggested by the "underrepresentation" of CTAG beyond the Herpesvirales, in particular in the DNA of African swine fever virus (Asfarviridae) and Shope rabbit fibroma virus (Poxviridae), which are structured similarly to the ADE herpesviruses. At the same time, the genomes of smallpox and vaccinia viruses (Poxviridae) are not structured in this way, and CTAG is not "underrepresented" in them. These observations call for a substantial expansion of such studies to other taxonomic groups of animal viruses.
An analysis of a series of strains (mostly up to 20) of the same HHV type showed almost complete identity of the results, which, to a first approximation, allows us to consider the results quite reliable.
The thermodynamic model of RNA shows that the tetramer CTAG (CUAG) disrupts the optimal structure of the stem-loops of the molecule, which control gene expression, by increasing their free energy. The authors of this hypothesis [15] suggested that the common ancestor of Salmonella and Escherichia had a significantly higher CTAG density, but that evolutionary degeneration led to the replacement of CTAG in its descendants with tetranucleotides that are neutral in this respect, and that this trend continues. In a number of genes and intergenic regions of Escherichia and Salmonella, this degeneration led to the evolutionary replacement of CTAG primarily with CTGG (and, to a lesser extent, with ATAG, CTTT and CTTG).
In this regard, it is most appropriate to compare the phylogenetically related (i.e., belonging to the same genus) human roseoloviruses HHV6 and HHV7. In the DNA of both viruses, compared with other herpesviruses, the frequencies of CTAG and CTGG differ the most. The comparison shows that, whereas in HHV7 (NC_001716) the CTAG:CTGG frequency ratio is 530:301, in HHV6A (NC_001664) it is the opposite, 303:391, although the DNA sizes of the two viruses are close. If the observation of Le Tang et al. [16] also applies to herpesviruses, then HHV7 is evidently closer to the evolutionary predecessor of both roseoloviruses than HHV6, in which many CTAG were replaced by CTGG. At the same time, HHV6 acquired the ability to integrate its genome into the host genome, which, as a rule, is not a prerequisite for closer relations with the host DNA. This is evidenced by the similarity between the tetranucleotide profile of HHV7 (not HHV6) and that of human DNA (Fig. 3), by the higher level of virus/host DNA microhomology in HHV7 than in HHV6, and by the lower level of such microhomology in mardiviruses, which have pronounced telomeric islands in the terminal repeats of their DNA segments.
In Figure 3, some features of the analyzed 4TN profiles are additionally indicated. In accordance with the second DNA parity rule, the similarity between B1 and B2 is much greater in human DNA than in viral DNA, since the human DNA fragments we analyzed are 300 Kb long and the HHV genomes are much shorter. In HHV7, the differences between B1 and B2 [...] (see below). The fact that GenBank holds complete (or almost complete) DNA sequences of only three HHV7 strains allows statistical methods to be used to validate the data presented here only with great reservations. For this reason, we did not use these methods, noting only that, as of today, the observation appears to hold. Figure 4 compares the 4TN DNA profiles of another pair of viruses, HHV1 and HHV4. In the case of HHV1, a low CTAG content allows the virus to cause an acute productive infection and to accumulate in the cells at the portal of entry (fibroblasts), and then to pass into neurons, where it remains for life, in particular owing to the inhibitory effect of host epigenetic mechanisms, one of which is methylation of viral DNA. The concentration of CpG dinucleotides in the HHV1 genome exceeds the average value (Table 1).
Low levels of CTAG can play a role in exacerbating latent infections. In the case of HHV4, the primary lytic infection is not characterized by a high level of viral synthesis, and after the transition to the chronic phase it is also regulated by epigenetic tools, including methylation of C in CpG [17,19]. At the same time, the obvious proximity of the 4TN profiles of the HHV4 genome and the host genome indicates a similar response to these tools. The same can be said about HHV8 and the epigenetic regulation of its genes [20,21]. Of the many epigenetic mechanisms that modify the expression of viral and host genes, we consider here only DNA methylation, more precisely cytosine methylation in CpG, since this dimer is part of a 4TN (TCGA), which allows it to be compared with another tetramer, CTAG, in the aspect proposed here.
The hypothesis that the low density of the CTAG tetramer results from its evolutionary degeneration does not explain the obvious limitations of its use and does not address at all the reasons for the low density of another tetramer of octet A, TCGA, in the DNA of members of the same order. About 40% of CpG, the central pair of this tetramer, is located in the promoter regions of mammals [21,22], and its density in complete sequences of vertebrate genomes is much lower than might be expected [23,24]. This underrepresentation is a consequence of the high mutation frequency of methylated CpG sites in the genomes of hosts and their viruses, especially those that interact closely with host DNA.
[Figure legend. Octet A: human DNA is shown in gray, viral DNA in black. Octet B: human B1 is shown in light gray, human B2 in dark gray, viral B1 in bold black, viral B2 in thin black.]
The reasons for the lowered CpG content have been discussed repeatedly [25]; however, the issue is not the low density of CpG as such, but rather the context of this pair, namely TCGA, since this tetramer is present in herpesvirus DNA at a much lower concentration than ACGT.
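The point about the context of CpG can be made concrete by tallying the tetranucleotide around every CpG on one strand, so that TCGA and ACGT can be compared directly. This is a minimal sketch; the function names and the toy sequence are hypothetical, and it does not reproduce the values of Table 1.

```python
from collections import Counter

def cpg_contexts(seq: str) -> Counter:
    """Count each NCGN tetranucleotide, i.e. every CpG with its two flanking bases."""
    seq = seq.upper()
    contexts = Counter()
    # i is the position of C in a CpG; one flanking base is needed on each side.
    for i in range(1, len(seq) - 2):
        if seq[i:i + 2] == "CG":
            four = seq[i - 1:i + 3]
            if set(four) <= set("ACGT"):
                contexts[four] += 1
    return contexts

def tcga_to_acgt(seq: str):
    """Ratio of TCGA to ACGT counts, or None if ACGT does not occur."""
    ctx = cpg_contexts(seq)
    return ctx["TCGA"] / ctx["ACGT"] if ctx["ACGT"] else None

# Toy strand with one TCGA and two ACGT contexts.
toy = "TCGAACGTACGT"
print(cpg_contexts(toy))
print(tcga_to_acgt(toy))
```

A ratio well below 1 on a real genome would correspond to the situation described above, in which the same CpG dimer is rare in the TCGA context and frequent in the ACGT context.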
Data from Le Tang et al. [16] show that a minimum content of CTAG (and TCGA) is not limited to herpesvirus DNA. We analyzed the tetranucleotide profile of large DNAs, including their terminal repeats, of some other viruses. CTAG was found to be "minimal" in African swine fever viruses (Asfarviridae family) and Shope fibroma virus (Poxviridae family), but not in smallpox and vaccinia viruses (also in the Poxviridae family), whose DNA does not have terminal repeats. This means that, when constructing phylogenetic trees, it is necessary to take into account not only changes in genes and proteins, but also the evolution of the DNA molecule, including the characteristics discussed here.
To a first approximation, to analyze the density of potentially methylated cytosine in herpesvirus genomes, it suffices to estimate the CpG:GpC ratio, which does not depend on the genome type (AT or GC). This estimate is shown in Table 2, where HHV DNAs with CpG>GpC are marked by darkened cells. In this case, the results presented here would concern only the concentrations of CTAG and CpG and their ratio in herpesvirus DNA, which may affect the level of viral synthesis. In its most general (non-strict) form, this ratio has a mirror character: the lowest concentration of CTAG is accompanied by the highest concentration of CpG (Table 1), at least within the class groups ADE and BC[F]. Nevertheless, the CTAG/CpG ratio impoverishes our results, which indicate a difference in the tetranucleotide profile of herpesvirus DNA and refine this ratio to CTAG/TCGA. In other words, a component of this ratio is the TCGA/ACGT ratio, which is clearly expressed across the classes DE/A/BC[F] (Table 1). In turn, this means that the context that determines the functional value of the CpG dimer must be taken into account. Perhaps this context extends beyond the tetramer. For reliable conclusions, however, the research must be extended beyond herpesviruses, and GenBank must be substantially enriched with new complete viral DNA sequences. In any case, the results presented here indicate that the biological meaning of the macrostructure of herpesvirus DNA is much deeper than is commonly believed.
References