Antigenic and genetic characterization of Streptococcus pneumoniae strains isolated from patients with invasive and non-invasive pneumococcal infections by using high-throughput sequencing

The objective of this study was to characterize and compare the antigenic and genetic characteristics of Streptococcus pneumoniae strains isolated from patients with invasive and non-invasive pneumococcal infections (PIs) using high-throughput sequencing data. Materials and methods. A total of 158 S. pneumoniae strains were studied. All of them were isolated during different stages of the PEHASus multicenter study performed in 2015–2020. The data analysis drew on information about the whole-genome sequences of 46 strains isolated during the above study. Real-time PCR methods and high-throughput sequencing (the Illumina platform) were used for identification of serotypes. The SeroBA and PneumoCaT software and the PubMLST.org website resources were used for data processing. Results and discussion. The serotypes of all the studied strains were identified. Analysis of the whole-genome sequences with the 2 programs revealed a number of discrepancies among serotypes in serogroup 6 and one discordant result. The PCR methods successfully characterized serotypes in 87% and 69% of the pathogens of invasive and non-invasive PIs, respectively. The serotypes contained in PCV13 accounted for 59% and 37%, while PPV23 serotypes accounted for 78% and 53% of the strains isolated from patients with invasive and non-invasive PIs, respectively. The data analysis identified neither a dominant sequence type (a total of 81 sequence types were identified) nor clonal complexes, except for serotype 3 strains, which is consistent with previous studies suggesting the absence of a well-represented clonal structure of S. pneumoniae associated with pneumococcal meningitis in Russia. Conclusion. The obtained data made it possible to identify the distribution of the circulating serotypes and the genetic characteristics of the strains isolated from PI patients; they are instrumental for assessing the effectiveness of the existing polyvalent vaccines and provide information for improving PCR-based serotyping methods.


Introduction
Streptococcus pneumoniae is one of the most common human pathogens and can cause infections at various sites. The clinical spectrum of pneumococcal infections (PIs) ranges from invasive to non-invasive infections [1]. In invasive infections, the pathogen is isolated from normally sterile sites, such as blood or cerebrospinal fluid. In non-invasive infections, including non-bacteremic pneumonia, the pathogen can be isolated from the lower respiratory tract. Microbial cultures from patients with invasive and non-invasive PIs are hereinafter referred to as "invasive strains" (ISs) and "non-invasive strains" (NISs), respectively.
Microbiological monitoring is a critical component of PI epidemiological surveillance, with special attention given to the antigenic and genetic characteristics of PI pathogens and to data on antibiotic sensitivity. Analysis of a pathogen's antigenic properties is used to identify serotypes and, consequently, to assess the effectiveness of the existing polyvalent vaccines. In Russia, the commonly used vaccines are the 13-valent pneumococcal conjugate vaccine (PCV13, Prevenar 13) and the 23-valent pneumococcal polysaccharide vaccine (PPV23, Pneumovax 23). Identifying the serotypes of PI pathogens is essential for planning preventive immunization measures and for assessing their effectiveness in people involved in the epidemic process.
Antibiotic sensitivity is generally assessed with standard microbiological methods following the EUCAST recommendations [2,3], because in most cases the variety of resistance mechanisms [1] prevents the use of molecular methods. By contrast, the efficiency of molecular methods for identifying serogroups and serotypes (the key elements in the antigenic characterization of S. pneumoniae) has been demonstrated in multiple Russian and foreign studies [4–7]. Identifying the genetic characteristics of S. pneumoniae strains by multilocus sequence typing (MLST) or other approaches based on the analysis of whole-genome data provides efficient tools for describing the clonal structure of microorganisms involved in the epidemic process, for evaluating the recombination potential of a bacterial population, and for analyzing the evolutionary processes leading to the emergence of new, potentially virulent or antibiotic-resistant strains [1, 8–10].
Antigenic and genetic characteristics can be identified with molecular biological methods such as real-time PCR (RT-PCR) and sequencing. High-throughput sequencing gives a comprehensive insight into microorganisms through the analysis of their whole-genome sequences: the sequences of cps locus genes are used to identify serotypes, and the primary sequences of core genome loci, whose analysis has the highest discriminative power, are used to identify genetic relationships among strains and the clonal structure of a bacterial population.
Despite its multiple advantages for serotype identification, RT-PCR has certain limitations, as pathogens continuously adapt to survive under the pressure of herd immunity, including the immunity acquired through preventive vaccination with polyvalent vaccines. At least 100 serotypes of S. pneumoniae have been identified to date, most of them associated with invasive PIs [1,11]. This underlines the importance of studying strains that are not typeable with standard serological or PCR-based methods and the urgency of improving the existing laboratory techniques, for example, by using additional serotype-specific targets.
The resulting information about the epidemiological features of circulating pathogens is required for the development of vaccination programs and for monitoring their effectiveness.
Thus, the objective of this study was to characterize S. pneumoniae strains isolated from patients with invasive and non-invasive PIs by using high-throughput sequencing and to analyze and compare their antigenic and genetic properties in the context of microbiological monitoring.

Materials and methods
The study was performed on 22 invasive (isolated from blood or cerebrospinal fluid) and 90 non-invasive (isolated from sputum samples of patients with community-acquired pneumococcal pneumonia) strains of S. pneumoniae. All the strains were isolated during different stages of the PEHASus multicenter study performed in 2015-2020 [2]. Most of the strains characterized in this study, as well as the ISs also isolated during the PEHASus studies and described earlier [5], were isolated in 2019 (n = 58) and 2020 (n = 47).
The transportation and storage, microbiological methods, species-level identification methods, DNA isolation techniques, whole-genome sequencing and genome assembly procedures were described in previous works [2,5]. All the strains were additionally studied by using RT-PCR methods for identification of 16 serotypes [4] and specially designed techniques for identification of serotypes 12F, 15BC, 22FA, and 8. The RT-PCR reaction mixtures contained a set of 4 serotype-specific oligonucleotides corresponding to the groups shown in Table 1. Serotype identification based on the whole-genome sequencing data was performed with the SeroBA [6] and PneumoCaT [7] programs.
The nucleotide sequences were deposited, the sequencing results (including the assignment of alleles and sequence types) were processed, and the MLST data were analyzed with the BURST and Genome Comparator tools on PubMLST.org [9]. At the end of the study, the database contained information about more than 37 thousand genomes of S. pneumoniae, including 288 whole-genome sequences of Russian isolates, most of which were characterized in earlier works [5,10]. As we had previously studied a sample of 46 ISs isolated during the PEHASus study, we used the pooled sample of ISs (n = 68), which included the previously described strains, to compare the results of antigenic and genetic characterization of ISs and NISs [5].

Results
The whole-genome nucleotide sequences of the studied strains, the data on serotypes and antibiotic sensitivity (for most of the strains), and the information about sources of strains were deposited in the PubMLST database under the following accession numbers: 73010, 73011, 73013-73015, 73017-73033 (ISs) and 142542, 142543, 142546-142569, 142572-142574, 142578, 142579, 142581, 142583-142604, 142606-142610, 142612-142625, 142627-142643 (NISs). Allelic profiles and the respective sequence types were identified for all of the strains, some of them being described for the first time.
The analysis of whole-genome data with two programs [6,7] made it possible to assign the studied strains to serotypes and serogroups. In NISs, the two programs disagreed on the serotype within serogroup 6 (only serotypes B or C, A or B, and D or C) in 5 strains, serogroup 15 (B or C) in 3 strains, and serogroup 35 (A or C) in 1 strain; one discordant result was obtained for the isolate with id 142633, for which the serotype was identified ambiguously (35A or 42). No discordance was found between serotype identification in silico and with RT-PCR methods.
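The comparison of serotype calls from two programs can be illustrated with a minimal sketch. It assumes that each program's output has already been reduced to one serotype label per strain; the function names, strain identifiers and example calls are hypothetical and do not reproduce the actual SeroBA or PneumoCaT output formats.

```python
import re


def serogroup(serotype: str) -> str:
    """Return the leading numeric part of a serotype label, e.g. '6B' -> '6'."""
    match = re.match(r"\d+", serotype)
    return match.group(0) if match else serotype


def classify_calls(call_a: str, call_b: str) -> str:
    """Compare two in silico serotype calls made for the same strain."""
    if call_a == call_b:
        return "concordant"
    if serogroup(call_a) == serogroup(call_b):
        # Same serogroup, different serotype (e.g. 6B vs 6C).
        return "discrepancy within serogroup"
    # Different serogroups (e.g. 35A vs 42).
    return "discordant"


# Hypothetical calls: (program A, program B) for each strain identifier.
calls = {
    "strain_a": ("3", "3"),
    "strain_b": ("6B", "6C"),
    "strain_c": ("15B", "15C"),
    "strain_d": ("35A", "42"),
}

summary = {strain: classify_calls(a, b) for strain, (a, b) in calls.items()}
for strain, status in summary.items():
    print(strain, status)
```

Separating serogroup-level agreement from full discordance mirrors the distinction drawn in the text between discrepancies within serogroups 6, 15 and 35 and the single 35A or 42 result.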

Antigenic characterization
ISs and NISs had 28 and 33 serotype variants, respectively (42 distinct serotypes in total). The most frequent serotypes (each found in more than 5% of strains) were 3 (18%), 19F (9%) and 23F (7%) in ISs, and 3 (11%), 19F (10%), 15C (8%), 11A (8%), 23F (7%) and 23A (6%) in NISs. Table 1 shows the frequencies of the serotypes by group for both sets of strains. The serotype groups correspond to the serotype-specific targets detectable by RT-PCR methods and to the capsular antigens used in the polyvalent vaccines PCV13 and PPV23. As seen from Table 1, strains with a serotype detectable by the RT-PCR method for 16 serotypes [4], which covers all PCV13 serotypes, account for 67% of ISs and 53% of NISs. These percentages are almost identical to the percentage of the serotypes (65%) obtained during the studies of pathogens causing pneumococcal meningitis and circulating in Moscow during the reference period (2016-2019) [12] and are lower than the percentage (79%) for pathogens circulating earlier (2007-2010) [4]. The decrease in the proportion of serotypes present in PCV13 can be explained both by the changing antigenic diversity of S. pneumoniae due to vaccination and by the territorial diversity of the pathogens included in this study and the PEHASus study [2]. Adding the serotype-specific targets 12F, 15BC, 22FA, and 8 increased the proportion of identified serotypes to 87% and 69%, demonstrating a significant difference between the samples. The RT-PCR method designed in 2014 and its algorithm [4] were not well suited to the ISs and NISs in the studied sample, which were collected in several Russian regions and circulated primarily in 2019-2020. The best option for RT-PCR-based identification of serotypes should include the additional serotype-specific targets 12F, 15BC, 22FA, and 8, while the detection of serotypes 2, 5, 7AF, and 19A can be omitted.
The proportion of serotypes present in vaccines PCV13 and PPV23 differs between the studied strain samples: for both vaccines, it is significantly higher in the IS sample.
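The proportion of strains covered by a vaccine can be computed as in the sketch below. The serotype sets are the standard compositions of the 13-valent conjugate and 23-valent polysaccharide vaccines. The sample list is hypothetical, and matching by exact serotype label is an assumption: the study's Table 1 groups some serotypes (for example, 15BC), so its percentages may be derived differently.

```python
# Serotypes included in the 13-valent conjugate vaccine.
PCV13 = {"1", "3", "4", "5", "6A", "6B", "7F", "9V", "14",
         "18C", "19A", "19F", "23F"}

# Serotypes included in the 23-valent polysaccharide vaccine.
PPV23 = {"1", "2", "3", "4", "5", "6B", "7F", "8", "9N", "9V", "10A",
         "11A", "12F", "14", "15B", "17F", "18C", "19A", "19F", "20",
         "22F", "23F", "33F"}


def vaccine_coverage(serotypes, vaccine):
    """Return the percentage of strains whose serotype is in the vaccine set."""
    if not serotypes:
        raise ValueError("serotype list is empty")
    covered = sum(1 for s in serotypes if s in vaccine)
    return 100.0 * covered / len(serotypes)


# Hypothetical sample: one serotype label per strain.
sample = ["3", "19F", "23F", "15C", "11A", "23A", "3", "6C"]

print(f"PCV13: {vaccine_coverage(sample, PCV13):.1f}%")
print(f"PPV23: {vaccine_coverage(sample, PPV23):.1f}%")
```

Running the same function separately on the IS and NIS serotype lists would give the per-group proportions that the text compares.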

Genetic characterization
During this and earlier studies [5], a total of 81 sequence types were identified, the most frequent be  Table 2.
Although the detected sequence types differ in their composition and number, the two groups did not differ in Simpson's diversity index, which was calculated as described in [13] and reached 98.3% and 98.5% for ISs and NISs, respectively. On the one hand, the high values of the diversity index demonstrate the absence of a well-represented clonal structure of the studied pathogens; on the other hand, the BURST-based MLST analysis identifies two groups of genetically related strains with core sequence types ST311 and ST505 (if strains with minimum 2 discrepancies in the allelic profile are grouped together). Group ccST-311 includes  Table 2). Group ccST-505 corresponds to clonal group CC180 identified in the study [3] and representing one of Russia's most common clonal complexes of S. pneumoniae, which were isolated in 1980-2017.
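A diversity index of this kind can be computed from the list of sequence types, one entry per strain. The sketch below assumes the widely used unbiased form of Simpson's index, D = 1 - Σ n_i(n_i - 1) / (N(N - 1)); the text does not state which variant reference [13] describes, and the example data are hypothetical.

```python
from collections import Counter


def simpson_diversity(types):
    """Simpson's index of diversity: 1 - sum(n_i*(n_i-1)) / (N*(N-1)).

    `types` holds one type label (e.g. a sequence type) per strain.
    """
    counts = Counter(types)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("at least two strains are required")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (total * (total - 1))


# Hypothetical sequence types for four strains.
example = ["ST1", "ST1", "ST2", "ST3"]
print(f"{100 * simpson_diversity(example):.1f}%")
```

The index approaches 100% when almost every strain has its own sequence type, which is why values near 98% indicate the absence of a dominant clone.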
The comparison of the detected sequence types with those of 81 Russian strains of S. pneumoniae (39 sequence types), which were isolated from meningitis patients in 2011-2015 and characterized in a subsequent study [10], revealed 12 matches, with no difference in the number of matches between the IS and NIS samples. The core genome MLST (cgMLST)-based analysis, which includes the calculation of a genetic distance over 1367 loci, makes it possible to distinguish individual groups of strains of certain serotypes (for the most common serotypes: 3, 19F, 11A, 15BC, and 9V), while some groups contain strains belonging to several serogroups (for example, 10A and 6C). The comparison of the findings of the two studies and the genetic characterization confirm the previous assumptions regarding the absence of a well-represented clonal structure of S. pneumoniae associated with invasive PIs in Russia [5].
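A genetic distance over core genome loci is commonly expressed as the number of loci at which two allelic profiles differ. The sketch below illustrates that idea under stated assumptions: profiles are equal-length lists of allele numbers, loci missing in either strain (marked `None`) are skipped, and the five-locus example is hypothetical. It does not reproduce the tooling or the 1367-locus scheme used in the study.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, skipping loci missing in either profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


# Hypothetical allelic profiles (allele number per locus, None = missing).
profiles = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [1, 2, 3, 4, 6],
    "s3": [7, 8, 3, None, 5],
}

# Pairwise distances between all strains.
distances = {
    (x, y): allelic_distance(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}
for pair, d in distances.items():
    print(pair, d)
```

Strains separated by small distances would fall into the same group, while a group mixing serogroups (as noted for 10A and 6C) would show up as low distances between strains with different serotype labels.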
Whole-genome sequencing combined with the molecular monitoring of PI pathogens provides timely and reliable data on changes in the structure of circulating S. pneumoniae serotypes, improving the planning of preventive immunization measures and the assessment of their effectiveness. Genetic characterization based on whole-genome data is a powerful tool for the intraspecific classification of pathogens, which is required for the extended microbiological monitoring that forms part of PI epidemiological control. The accumulation and analysis of whole-genome data will enhance the insight into the basic genetic variations associated with the ability of certain representatives of S. pneumoniae to cause invasive PIs.