Although the common goal is to identify potential pathogens, the studies can roughly be divided into three categories: (1) investigations of outbreaks of unknown etiology, (2) investigations of well-known disorders presumed to be of multifactorial etiology, and (3) metagenomic studies of reservoir species and vectors. Examples of the first category include the identification of a novel Orthobunyavirus affecting cattle (described in more detail below), an astrovirus in the brain of farmed minks suffering from encephalomyelitis [ ], and a novel picornavirus as a candidate etiologic agent for turkey viral hepatitis [ ], among others.

The second category encompasses investigations aimed at finding contributing infectious agents to complex diseases, such as colony collapse disorder of honey bees [ , ] and postweaning multisystemic wasting syndrome in pigs [ ]. Studies in the third category have been performed on diverse animal species suspected to be important reservoirs, such as bats [ , ], African bush pigs [ ], and red fox [ ], as well as typical vector organisms, such as ticks [ ].

Although it is an important first step, the identification and genetic characterization of candidate pathogens are not enough to establish causal relationships or understand how they may be associated with disease. It is therefore necessary to use a synergistic approach combining molecular diagnostic tools, such as NGS-based metagenomics and follow-up PCR-based assays targeting detected pathogen sequences, with more conventional diagnostic methods, including isolation and characterization.

This is crucially important in situations where metagenomic data indicate the potential presence of multiple pathogens. The assembled data from such a multidisciplinary pathology, epidemiology, metagenomic data, PCR prevalence studies, isolation, characterization, etc.

The synergetic and parallel use of molecular and classical methods not only results in detection of infectious agents and development of targeted diagnostic tests but also has the potential to make isolates or strains available shortly after the occurrence of outbreaks. The availability of isolates or strains is of special importance to allow the design of effective vaccines or antimicrobial drugs.

Metagenomics, using technology, allowed the identification of a novel virus, subsequently named Schmallenberg virus (SBV), in an epidemiological cluster of diseased cattle in Germany [ ]. These viral sequences were used to rapidly design targeted molecular tests, which confirmed a clear association between the presence of the virus and affected animals [ ].

The molecular tests were also helpful in targeting samples for isolation of the virus, which ultimately led to the development of a prototype vaccine currently under evaluation [ ]. Metagenomic NGS workflows also have potential use in the quality control of biological products [ ] and vaccines [ – ] and provide a powerful approach for the identification and characterization of unexpected or highly divergent pathogen variants [ , 85 ] that may remain undetected using targeted diagnostic tests.

Nordentoft and colleagues [ ] used NGS metagenomics to study the influence of livestock management parameters and infection with Salmonella enteritidis on the microbial community in the chicken intestinal tract. Another study [ ] documented the effect of Campylobacter jejuni infection on the chicken fecal microbiome.

The application of metagenomic techniques in poultry production could lead to the development of novel alternatives to antibiotic growth promoters and better understanding of the colonization of food production animals by foodborne pathogens such as Salmonella enterica and Campylobacter spp. Other studies investigated the host response to pathogen infection. Glass and colleagues [ ] used NGS transcriptomics to document bovine resistance and tolerance traits to parasitic infection.

Their characterization contributes to better understanding the complex biology of pathogens. Wang and coworkers [ ] characterized microRNA sequences from Orientobilharzia turkestanicum, a fluke with zoonotic potential infecting sheep, and identified key target miRNAs for parasite energy metabolism, transcription initiation factors, signal transduction, and growth factor receptors. Virus-encoded microRNAs (vmiRNA) regulating viral or cellular transcripts can be targeted for virus discovery [ , ].

NGS has been applied to investigate whether infection can modulate miRNA biogenesis and has also been used to identify miRNAs that influence pathogen replication, tropism, and pathogenic potential [ – ]. These molecules have demonstrated immense potential as a source of antiviral therapeutics effective against a number of viruses (adenovirus, rabies, Venezuelan equine encephalitis, porcine reproductive and respiratory syndrome virus) [ – ] or for the design of live-attenuated virus vaccines based on miRNA-mediated gene silencing [ , , ].

Next-generation sequencing technologies have the potential to revolutionize our understanding of the complex dimensions of animal infectious disease and infection biology Fig. The application of high-throughput biotechnology platforms in these fields and their typical low-cost per information content has increased the resolution with which these processes can now be studied.

We now have high-resolution tools that provide veterinary diagnostic laboratories with the ability to undertake swift and flexible responses to emerging infectious diseases and unexpected pathogen variants. Moreover, these tools provide an increased resolution for the characterization of pathogens and are important assets for improving our understanding of them. Pathogen evolution, adaptation, and virulence determinants can now be studied on a scale that allows within-host and between-host dissection of genetic variability.

Moreover, high-throughput tools open new perspectives to study the complex interaction between pathogen, host, and microbiome with very high resolution and to deepen our understanding of the key biological processes leading to protective immunity. Not only will our increased understanding of pathogens and their interaction with livestock impact on future disease prevention, control, and management strategies, but the technologies may themselves become part of the intervention strategies, providing high-resolution data for molecular epidemiology to rapidly trace the origin and spread of outbreaks, for molecular typing, and for predicting and optimizing the outcome of targeted treatment with antibiotics, antivirals, and anthelmintics.

The ready availability of high-resolution genomic and transcriptomic data will impact upon the targeted development of novel vaccines and drugs [ , ], while NGS has the potential to become a powerful tool for the control of vaccines and other biological products. As with any new technology, challenges remain. In the case of NGS, these include the requirement for expertise in both the laboratory and in the analysis of huge datasets and the current need for high investment in laboratory and data analysis hardware.

As the technology is ever evolving towards lower cost, user-friendliness, and accessibility for smaller research and diagnostic labs, efforts are needed to make the data analysis more accessible to nonexpert users. This includes proper modeling of the sources of error introduction, solutions for public data storage, development of user-friendly but high standard analysis pipelines for routine applications, etc.

Both the industry and the NGS user community can play a role in this evolution. Similarly, recent improvements in protein and peptide separation efficiencies and highly accurate mass spectrometry have promoted the identification and quantification of proteins in a given sample [ ].

Directly targeting peptide and protein content in a sample, proteomic approaches provide important additional information taking known issues, such as the quantitative discrepancy between mRNA transcript levels and final protein levels and posttranslational modification, into account [ ]. Novel proteomic approaches have been applied to animal infectious disease research, including the study of E.

This section contains excellent contributions exploring the application of high-throughput technologies to animal infectious diseases, including functional genomics of tick vectors infected with eukaryotic parasites, metagenomic approaches to detect bee viral pathogens, proteomics of vector-host-pathogen interactions, and NGS applications exploring parasites and intervention strategies.

There was no correlation between the number of HACS reads recovered and the number of variants detected. The original LPAI nucleotide sequence dominated the total read percentage at each passage level, with the lowest frequency of Overall, there appeared to be a general increase in the proportion of variant HACS sequences after P6, but there was no increase in the frequency of longer insertions. Notably, non-homologous recombination in the HACS was evident in two instances: in P6 an insertion of 13 nts encoding the peptide TGTGV (not in frame) originated from an unknown source, and in P17 a nt insert in-frame with and immediately adjacent to the HACS was found.

The additional arginine residue resulted not from a duplication event, but rather from the deletion of A and the insertion of a G nucleotide, but this sequence has not yet been reported in nature. The same motif was detected in P17 but was not in P A second multi-basic motif, viz. Similar to the H5N2 virus, there was no obvious correlation between the number of HACS reads and the amount of variation detected, but unlike the H5N2 virus there was no apparent increase in the proportion of variants compared to the original LPAI sequence in the later H7N1 virus passages; the original LPAI sequence remained in the vast majority at above Here, the G to R substitution was caused by a G to A mutation in the nucleotide sequence, and this motif was also detected in P Even though H7 multi-basic cleavage site sequences were not detected in P11 or P17 (possibly due to insufficient sequencing depth in the specific region), it is likely that they were present according to the consistently low MDTs (Fig 1).

Deep sequencing technologies have revolutionised studies on viral evolution, and here the emergence of highly pathogenic H5N2 and H7N1 avian influenza viruses from low pathogenic precursors was followed over the course of seventeen serial passages in embryonated chicken eggs. The H5N2 and H7N1 precursor strains were isolated from commercial ostriches but were not associated with any HPAI outbreaks, and deep sequencing of the stocks used for the experiments confirmed that only LPAIVs were present in the sub-populations.

The progression of pathogenicity in ovo was markedly different for the two viral strains; the H5N2 virus remained avirulent longer than the H7N1 strain, since three passages of H5N2 compared to just one of H7N1 virus were required before the embryos started to die. The HACS is the key virulence determinant but it is not sufficient for expression of full virulence [ 42 , 43 ]; therefore, we monitored the emergence of molecular markers in the various proteins encoded by the consensus sequences.

Overall, the results for these H5N2 and H7N1 strains are consistent with other studies that compared the switch from LPAI to HPAI, whereby between 7 and 68 amino acids are substituted with changes occurring in the HA gene, but also often in the polymerase genes [ 37 ]. Deep sequencing enables the study of minority variants at the HACS which otherwise may not be identifiable at the consensus level. Ion Torrent sequencing is known to have a high error rate in base calls in long homopolymer regions [ 45 ], but extended homopolymeric regions such as those caused by RNA polymerase slippage were absent from our data.
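A check for extended homopolymers of the kind mentioned above can be sketched as follows. This is a minimal illustration and not the analysis performed in the study; the function name and the example sequence are assumptions introduced here.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Return (base, length) of the longest single-base run, or ('', 0) if empty."""
    best_base, best_len = "", 0
    # groupby yields consecutive runs of identical characters
    for base, group in groupby(seq.upper()):
        run_len = sum(1 for _ in group)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


# Hypothetical read fragment containing a run of six adenines
example = "ACGAAAAAAGT"
result = longest_homopolymer(example)
```

Regions whose longest run exceeds a chosen length could then be flagged for closer inspection; the appropriate threshold depends on the platform and is left open here.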

Even though the intervening passages P8 to P10 were retrospectively sequenced to pinpoint the emergence of the H5N2 multi-basic HACS in the population, the depth of coverage obtained was too poor for analysis, probably due to degradation of the RNA during prolonged storage. Neither of the aforementioned H5 multi-basic motifs has been reported in nature yet [ 40 , 43 ]. Thus, as summarized in Fig 2, the increasing virulence of the H5N2 and H7N1 viruses resulted from an accumulation of HA multi-basic cleavage sites together with other amino acid substitutions in the HA and in some of the other proteins.

It was previously hypothesized that an accumulation of multiple basic amino acids at the HACS is the final step required to transform a LPAIV into a HPAIV when the remainder of the viral genome supports a highly pathogenic phenotype [ 46 ], which is supported by the results of this study. Red genome segments indicate one or more amino acid substitutions in the consensus protein sequence with key amino acid substitutions listed below the figures.

Non-homologous recombination is a rare event in RNA viruses [ 47 ], but cases of insertions into the HACS have been reported for H7 strains, with the earliest reports stemming from viruses passaged under experimental conditions. Twelve passages of an H7N3 virus [ 48 ] and five passages of an H7N7 virus [ 49 ] in chicken embryo cells without the addition of trypsin, led to the insertion in the HACS of a 54 nt insert derived from the 28S rRNA gene or nt — of the NP, respectively, with both viruses demonstrating an increased pathogenicity in chickens.

The virus had an increased intravenous pathogenicity index in chickens and the authors speculated that the nt insertion was possibly derived from turkey major histocompatibility complex B locus RNA [ 51 ]. Non-homologous recombination was not detected in the H7N1 virus used in the present study at any passage; however, in the H5N2 virus a 13 nt insert of unknown origin was detected at P6. Most multi-basic cleavage sites in H5 and H7 viruses in nature contain stretches of between 5 and 8 basic amino acids, and a higher number of basic amino acids correlates with increased pathogenicity [ 37 , 43 ].
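The number of basic residues in a cleavage-site motif can be tallied directly from its amino acid sequence. The sketch below counts arginine (R) and lysine (K) and reports the longest uninterrupted basic stretch; the two motifs are hypothetical strings chosen for illustration and are not sequences reported in this study.

```python
import re

BASIC_RESIDUES = set("RK")  # arginine and lysine


def count_basic(motif):
    """Total number of R/K residues in the motif."""
    return sum(1 for aa in motif.upper() if aa in BASIC_RESIDUES)


def longest_basic_run(motif):
    """Length of the longest uninterrupted stretch of R/K residues."""
    runs = re.findall(r"[RK]+", motif.upper())
    return max((len(run) for run in runs), default=0)


# Hypothetical motifs, for illustration only
mono_basic_like = "PQRETRGLF"
multi_basic_like = "PQRRKKRGLF"

summary = {
    motif: (count_basic(motif), longest_basic_run(motif))
    for motif in (mono_basic_like, multi_basic_like)
}
```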

Furthermore, there seems to be a selection bias towards longer multibasic insertions, where mid-length pHACS are rapidly replaced by extended forms [ 9 , 54 ]. We did not observe any progressive extension of the HACS caused by polymerase slippage as the passages progressed, but seventeen passages may have been insufficient to generate these from a native LPAI progenitor.

This study has provided further insight into how HPAI viruses emerge from low pathogenic precursors, but it also demonstrated the pathogenic potential of H5N2 and H7N1 strains that have not yet been implicated in HPAI outbreaks.

Emergence of highly pathogenic H5N2 and H7N1 influenza A viruses from low pathogenic precursors by serial passage in ovo.

Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please consider carefully the comments of reviewer 1, in particular those related to the statistical analysis (it seems that some technical terms were probably misused) and to the organization of the Results and Discussion sections. Please submit your revised manuscript by Sep 19 PM.

If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at gro. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.

We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere. T Laleye University of Pretoria, Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data.

If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository.

Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository such as Figshare or Dryad and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data. Please upload a copy of Figures 2 and 3, to which you refer in your text on pages 17 and If the figure is no longer to be included as part of the submission please remove all reference to it within the text.

The manuscript is well written and the study design is rigorous. The work performed is original in the way the substitutions at the cleavage site were studied. Extracting the Next-Generation Sequencing reads that specifically cover the cleavage site in the hemagglutinin is a nice approach that probably required a great deal of analytical work. Line the canonical LPAI cleavage motif could be given here to illustrate the positions of the basic amino acids. Lines Are these lines useful in the introduction?

The aim of the paper is not to explain why day embryos are more susceptible to HPAI selection. Line Rephrase. Line This change in the protocol is not discussed or explained anywhere in the manuscript. Is it linked to Lines ? Line Table 1 is giving nucleotide sequences, so it is confusing to see only analyses on amino acids mentioned. Was any analysis at the nucleotide level performed? Any interesting synonymous substitutions in addition to the non-synonymous, for example in the non-coding regions?

This could perhaps be mentioned in the discussion? Table 2: Maybe transform into figures that might be better for a clearer visualization of the number of positive out of tested and MDT? Generally speaking, there are a lot of discussion elements in the results.

Do the authors consider the description of identified mutations by other studies or in samples elsewhere as results or discussion? Line P11? There is no P10 in table 3. In addition, was any sequencing of intermediate passages between 7 and 11 performed for these mutations? Was IN found as a difference with the inoculum? Can HP be considered as a replacement of IN. For H7N1, same comment as for H5N2: elements of discussion are given with the results.

It should be clearer how this data was obtained as it is not presented in Table 4. It should be clearer in the text and table 4. Table S3 and similar: indication of frameshift with amino-acid consequences in the cleavage site. But what about full length HA: truncated forms in addition to changes in cleavage site?

Does it mean that for the other variants, this is not the case? Data of this cleavage site analysis should be synthesized in a clearer manner as it is not easy to follow the proportions of the variants over passages. These proportions should be given in the text. This is what makes a variant potentially relevant. See also comments over the Discussion, as this needs to be discussed in light of the Ion Torrent error rate.

Lines deep sequencing on original sample or of the stock used for the experiments reported in this manuscript (passage 3 in d old embryonated eggs)? For the identified mutations: have they been tested alone or in combination using reverse genetics to study their impact? This could be mentioned and discussed. The results of the cleavage site analysis should be discussed in light of the error rate of Ion Torrent.

This should be discussed. The authors should try to find a way to summarize the findings in a figure. They talk about correlation between substitutions and pathogenicity, but Figure 1 comes too late and does not present the specific mutations that the authors suggest as markers of pathogenicity.

Conclusions might need to be slightly amended based on analysis of proportions and taking into account the error rate of the technique.

Reviewer 2: Thank you for allowing me to review the paper by Abolnik and colleagues entitled "Emergence of highly pathogenic H5N2 and H7N1 influenza A viruses from low pathogenic precursors by serial passage in ovo".

The authors present data on naturally occurring H5N2 and H7N1 LPAI influenza viruses from ostriches, which they serially passaged in eggs to force the emergence of mutations that correlated with higher pathogenicity. A few comments:

Maybe include a brief comment or explanation. Could it be protocol related? The text needs some formatting: for example, line numbering stops on page 13 and starts again in the discussion section; it is also confusing to find the figure 1 legend in the middle of the discussion section page Overall the paper gives clear results and interesting observations without overstating their findings.

No change made.

Table 1 details the various modified sequence tags (MSTs) we used to retrieve the HA0-spanning regions (subsequently translated to amino acids), and the canonical amino acid sequence is provided for reference. No, we did not analyze the synonymous substitutions due to the sheer volume of the data we generated. No changes made. Table 2 was converted to a figure (Figure 1) as suggested. Table and figure numbers were adjusted accordingly throughout the manuscript.
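Tag-based retrieval of this kind can be sketched in a few lines: find the region of each read lying between an upstream and a downstream tag, translate it, and count the resulting peptide motifs. This is a minimal illustration under stated assumptions, not the pipeline used for the manuscript; the tag sequences `GGATCC` and `AAGCTT`, the example reads, and all function names are hypothetical placeholders. The codon table is the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate in frame 1; trailing partial codons are ignored, unknown codons give 'X'."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def extract_between(read, up_tag, down_tag):
    """Return the sequence between the two tags, or None if either tag is missing."""
    start = read.find(up_tag)
    if start == -1:
        return None
    start += len(up_tag)
    end = read.find(down_tag, start)
    return None if end == -1 else read[start:end]


def motif_counts(reads, up_tag, down_tag):
    """Count translated motifs across all reads that contain both tags."""
    counts = Counter()
    for read in reads:
        region = extract_between(read.upper(), up_tag, down_tag)
        if region:
            counts[translate(region)] += 1
    return counts


# Hypothetical tags and reads, for illustration only
UP, DOWN = "GGATCC", "AAGCTT"
read_a = "AC" + UP + "CCACAAAGAGAAACAAGAGGATTATTT" + DOWN + "GT"
read_b = "AC" + UP + "CCACAAAGAAGAAAAAAGAGAGGATTATTT" + DOWN + "GT"
counts = motif_counts([read_a, read_a, read_b, "NNNN"], UP, DOWN)
```

Dividing each count by the total number of tag-spanning reads gives the proportion of each motif at a given passage.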

Yes, we did consider it, and after various drafts of the manuscript we decided that the present format was the clearest and most concise for readers. The typing error has been addressed; P10 was corrected to P11 in Table 3 (renamed as Table 2). HP is also a typing error; it was corrected to HP in the table and text. It might be possible that HP is a compensation for IN, but functional studies would be required to verify this. As above, the discussion points in the Results section are pertinent to the specific mutations we observed, and to remove them here and incorporate them into the Discussion would mean the reader is constantly paging back and forth in reference to the tables.

The Discussion would then be extremely long and verbose, whereas the Results section would contain only the tables. We would like to retain the manuscript in its present format as this makes it the easiest for the reader to follow. The only substitution DN observed at consensus level in the M1 is the one presented in Table 4 (now Table 3); no change made. The sequences presented here cover only the cleavage site of the HA and not the entire HA protein.

Typically, the HACS starts with a proline-encoding codon and terminates in a phenylalanine-encoding codon. The latter is a widely used term. This manuscript represents the analysis of a massive amount of data, and a lot of thought was given to the best way to make it concise and interesting to a reader while avoiding the pitfalls of over-interpretation.

The proportions of the variants between passages, including those in the HACS, will be directly related to the depth of coverage we obtained for each segment (Tables S1 and S2), as well as to the depth of coverage of the reads in specific regions of the segment, which tends to be highly variable (data not shown). In some cases the coverage was very low, and this would certainly cause problems in assigning relevance to the proportions of variants.

Therefore, to avoid over-interpreting the data, we focused on the cumulative effects of the mutations, and only when a variant emerged in the consensus genome was it flagged as being potentially relevant. We have made all the raw sequence data publicly available should follow-up studies be of interest to anyone.

This refers to mutations in viral proteins other than the HA. No study has been carried out to determine the impact of the novel mutations observed. However, this could be carried out in the future, as recommended in parts of the discussion. A high Phred cutoff score (20) was applied to filter reads prior to analysis, as per the Materials and Methods section.
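For reference, a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so a cutoff of 20 means at most a 1% chance of error per base. The sketch below is illustrative only: the response does not say whether the cutoff was applied per base or per read, so filtering on the mean read quality, the Phred+33 encoding, and the function names are all assumptions.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, cutoff=20):
    """Keep (sequence, quality) pairs whose mean quality meets the cutoff.

    Assumption: the cutoff is applied to the mean quality of each read.
    """
    return [(seq, qual) for seq, qual in reads if mean_phred(qual) >= cutoff]


# Toy reads: 'I' encodes Q40, '!' encodes Q0, '5' encodes Q20.
reads = [("ACGT", "IIII"), ("ACGT", "!!!!"), ("ACGT", "5555")]
kept = filter_reads(reads)
```

With these toy reads, the Q40 and Q20 reads pass and the Q0 read is dropped.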

We also stated in the manuscript that the variants we detected were probably an underrepresentation of what was present in the population. The reference list was updated with Besser et al. The figure renamed as Fig. Thank you for allowing me to review the paper by Abolnik and colleague entitled "Emergence of highly pathogenic H5N2 and H7N1 influenza A viruses from low pathogenic precursors by serial passage in ovo".

Line numbers were included through the length of the original manuscript uploaded, some formatting might have been lost due to different versions of Microsoft office. This is now corrected.

Reviewer 1: The new Figure 1 is still in fact a table. If a true figure is not made, then go back to a true table that is formatted according to the journal requirements. Table 3: I am still confused by AT.

It is indeed presented in table 3, but only at P15 and P But the text still states that it was detected from P8. So why is AT also indicated for P11 in table 3?

PLoS One. Published online Oct 8. Camille Lebarbenchon, Editor. Competing Interests: The authors have declared no competing interests.

Received May 22; Accepted Sep This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Genes encoded within each segment are in italics; nt: nucleotides.

S3 Table: Variants detected at the hemagglutinin cleavage site HA 0 of H5N2 low pathogenic avian influenza viruses passaged in day old embryonated chicken eggs. S4 Table: Variants detected at the hemagglutinin cleavage site HA 0 of H7N1 low pathogenic avian influenza viruses passaged in day old embryonated chicken eggs.

Attachment: Submitted filename: Response to Reviewers.

Abstract

Highly pathogenic (HPAI) strains emerge from their low pathogenic (LPAI) precursors and cause severe disease in poultry, with enormous economic losses and zoonotic potential.

Introduction

Wild aquatic birds are the natural reservoirs of all avian influenza virus (IAV) subtypes, which are designated by the combination of hemagglutinin (HA; H1-H16) and neuraminidase (NA; N1-N9) glycoprotein antigens on the virion [ 1 ].

Table 1. Modified sequence tags used to retrieve reads spanning the hemagglutinin cleavage site. Fig 1. Emergence of potential virulence markers in the consensus sequences of H5N2 and H7N1 viruses during passages. The total number of sequencing reads generated for the H5N2 virus varied from 3.9 million for P4 to 16.7 million for P11, with average read lengths of between 61 and bp (S1 Table).


Fig 1 shows the results of the Granger causality test. First, compared with the ratio of sellers and the number of sellers, the number of buyers is much less significant in predicting the change of stock price. Second, Granger causality between the change of stock price and the supply and demand of stock is generally less significant for manipulated stocks than for non-manipulated stocks.

Finally, we can see that the significance of Granger causality varies remarkably from stock to stock, indicating that the supply and demand of stock is not a robust indicator for stock price prediction. For clarity, we separately show the results of manipulated stocks and non-manipulated stocks. Stocks are ranked according to the statistical significance.

From this perspective, our data possess a unique advantage, i. With trader identifiers, we can investigate the trading behavior of the same trader across trading days. Based on cross-day trading behavior, we propose an index to characterize the market confidence of investors. Specifically, for a given trading day t with traders, we check whether these traders were also active on the previous trading day.

We define a market confidence index in Eq (1). The market confidence index characterizes the fraction of traders who are active in two successive trading days. Generally speaking, traders who prefer swing trading make profits by trading frequently. Thus, the number of these traders is potentially correlated with the change of stock price. We now validate whether the proposed market confidence index is effective at predicting the change of stock price.
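As a rough illustration of the index described here, the sketch below computes, for each trading day, the fraction of that day's traders who were also active on the previous day. The exact form of Eq (1) is not available in this text, so the choice of denominator (traders active on day t), the set-based input, and the function names are assumptions.

```python
def market_confidence(traders_today, traders_prev):
    """Fraction of today's traders who were also active on the previous day.

    Assumption: the denominator is the number of traders active today.
    """
    today = set(traders_today)
    if not today:
        return 0.0
    return len(today & set(traders_prev)) / len(today)


def confidence_series(daily_traders):
    """Index for every day except the first, which has no previous day."""
    return [
        market_confidence(daily_traders[t], daily_traders[t - 1])
        for t in range(1, len(daily_traders))
    ]


# Toy example with hypothetical trader identifiers.
days = [{"a", "b", "c"}, {"b", "c", "d", "e"}, {"e"}]
series = confidence_series(days)  # [0.5, 1.0]
```

On the second day two of four traders were already active the day before, giving 0.5; on the third day the single trader was active on both days, giving 1.0.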

We predict the change of stock price by deploying a three-layer feed-forward neural network. To isolate the predictive power of the market confidence index, we consider two groups of inputs to our neural network: (1) the change of stock price in the n trading days before trading day t and the ratio of sellers in these days; (2) the above two inputs together with the market confidence index. These two groups of inputs are formally written as Eqs (2) and (3).
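The two input groups can be sketched as plain feature vectors. This is only an illustration: the formal definitions in Eqs (2) and (3) are not available in this text, so the ordering of features, the use of the same n-day window for the market confidence index, and the name `build_inputs` are assumptions.

```python
def build_inputs(price_change, seller_ratio, confidence, t, n):
    """Return (group1, group2) feature vectors for predicting day t.

    group1: price changes and seller ratios over the n days before t.
    group2: group1 plus the market confidence index over the same days
            (assumed window; the source does not spell this out here).
    """
    if t < n:
        raise ValueError("need at least n days of history before day t")
    window = range(t - n, t)
    group1 = [price_change[i] for i in window] + [seller_ratio[i] for i in window]
    group2 = group1 + [confidence[i] for i in window]
    return group1, group2


# Toy daily series (hypothetical values).
price_change = [0.01, -0.02, 0.03, 0.00]
seller_ratio = [0.40, 0.55, 0.45, 0.50]
confidence = [0.30, 0.35, 0.32, 0.31]
g1, g2 = build_inputs(price_change, seller_ratio, confidence, t=3, n=2)
```

Comparing a model trained on `g1` with one trained on `g2` then isolates the contribution of the market confidence index.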

We use two metrics, i. Meanwhile, the MAPE is less than 0. However, when only the market confidence index is used for stock price prediction, the prediction performance is not remarkable. To offer some intuition about the prediction performance, we use one example to show the predicted and the real change of stock price (Fig 3). We can see that the market confidence index is more stable than the ratio of sellers and captures the long-term trend of stock price, partly explaining why including market confidence helps predict the change of stock price.

We also apply the method to the manipulated stock data set. For manipulated stocks, the prediction accuracy is lower. One possible reason is that the stock price is manipulated by colluding traders and becomes less predictable from the supply-demand relationship. Manipulation detection is a research topic with high relevance to stock market analysis.

This topic is outside the scope of this paper. Here we show the difference between manipulated and non-manipulated stocks by analyzing different types of trading relationship. In the previous section, we saw that the proposed method for stock price prediction performs differently on manipulated stocks and non-manipulated stocks.

To clarify what matters in the proposed prediction method, we classify active traders i. The first letter denotes sell or buy in the first day and the second letter denotes sell or buy in the second day. In this way, we analyze the correlation between these four trading patterns and stock price.
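A minimal sketch of this four-way classification follows. It assumes each trader has a single side per day, encoded as "S" (sell) or "B" (buy); how traders who both buy and sell on the same day are handled is not stated here, and the function name is hypothetical.

```python
from collections import Counter


def classify_patterns(side_day1, side_day2):
    """Count traders active on both days by their two-day trading pattern.

    Each argument maps a trader identifier to "S" or "B" for that day.
    Traders active on only one of the two days are ignored.
    """
    counts = Counter({"S-S": 0, "S-B": 0, "B-S": 0, "B-B": 0})
    for trader, first in side_day1.items():
        second = side_day2.get(trader)
        if second is not None:
            counts[f"{first}-{second}"] += 1
    return dict(counts)


# Toy example with hypothetical trader identifiers.
day1 = {"a": "B", "b": "S", "c": "B"}
day2 = {"a": "B", "b": "B", "d": "S"}
patterns = classify_patterns(day1, day2)
# {"S-S": 0, "S-B": 1, "B-S": 0, "B-B": 1}
```

Running this for every pair of successive trading days yields one daily count series per pattern, which can then be compared with the price series.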

Fig 4 illustrates the change of the daily price with the number of active traders in each category. The price is presented in the upper panel and the four kinds of trading patterns in two successive days are shown in the bottom panel. Active traders provide critical indicators for understanding trading behavior. As an illustration, we now show that the distribution of active traders over the four categories can differentiate manipulated stocks from non-manipulated stocks.

Fig 5 illustrates the correlation coefficient between the number of traders in each kind of trading pattern and the change of stock price. Remarkable differences are observed in the trading pattern B-B. Compared with non-manipulated stocks, manipulated stocks behave differently. This has two implications. If some people buy in the first day and still buy in the second day, the stock price falls.
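The correlation analysis can be sketched as follows. The text does not say which correlation coefficient is used, so the Pearson coefficient here is an assumption, as are the toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts of B-B traders and daily price changes.
bb_counts = [120, 150, 90, 200, 170]
price_changes = [0.01, -0.01, 0.02, -0.03, -0.02]
r = pearson(bb_counts, price_changes)  # negative for this toy data
```

A negative value for the B-B pattern, as in this toy data, would match the direction of the relationship reported for non-manipulated stocks.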

For manipulated stocks, this phenomenon diminishes. This means that there is more short-term investment in manipulated stocks. This phenomenon is attributed to malpractices involving a group of traders making large and frequent trades. Manipulators trade frequently to artificially increase the price and volume of a stock in order to attract other investors to buy the stock. Stocks are ranked in terms of correlation coefficient.

We investigated the dynamic behavior of traders in stock markets. Our study is based on transaction data, i. This kind of data provides us an effective way to grasp the trading relationship among investors and provides us a potential way to learn the trading behavior of investors.

Based on transaction data, we consider the supply-demand relationship for each stock. We study whether trading behavior could predict the change of stock price. Strong Granger causality is found between stock price and the index of market confidence, i. We further deployed a feed forward neural network to predict stock price, with the input being historical stock price, trading activity, and market confidence. Results showed that the inclusion of market confidence could significantly improve the prediction accuracy of stock price.

We find a negative correlation between trading pattern buy-buy and the change of stock price, and sell-sell is the most relevant pattern to the change of stock price. For the manipulated stock, buy-sell is most relevant to the change of stock price.

This phenomenon means that manipulators affect stock price through frequent short-term trading of shares. The data used in this paper are transaction data of stocks listed on Shanghai Stock Exchange and Shenzhen Stock Exchange in These data are also used in our previous studies [ 19 ].

Transaction data record all executed orders. In total, the data consist of 50 stocks with 12,, transaction entries, involving 3,, unique trader accounts. Each entry records the date and time of transaction, a unique transaction identifier, the buyer, the seller, the volume and the price.

Among these 50 stocks, eight had been manipulated by some investors via trade-based manipulation, as revealed by the China Securities Regulatory Commission (CSRC). In addition, among the eight manipulated stocks, the manipulation period of four stocks persists through the whole year of For the other four manipulated stocks, the manipulation period covered by our data is from Jan.

Following the method used in our previous works [ 19 ], we use the Granger causality test to verify whether the proposed market confidence index is promising for stock price prediction. The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another. According to the Granger causality test, if a signal X Granger-causes a signal Y, then the past values of X should contain information that helps predict Y better than the information contained in past values of Y alone.
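The idea can be sketched as a comparison of two least-squares fits: one predicting Y from its own past, and one that also uses the past of X. The example below is a simplified illustration with a single lag and synthetic data; the lag order, the helper names, and the omission of the p-value (which would need the F distribution) are all assumptions, not the procedure used in the paper.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (y - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, y in zip(design, target)
    )


def granger_f(x, y):
    """F statistic for the null 'x does not Granger-cause y' (one lag)."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = rss(restricted, target), rss(full, target)
    dof = len(target) - 3  # observations minus parameters in the full model
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic data in which y follows x with a one-day delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.9 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_xy = granger_f(x, y)  # large: past x helps predict y
f_yx = granger_f(y, x)  # small: past y does not help predict x
```

A large F statistic means that adding the past of X reduces the prediction error for Y far more than chance would explain, which is evidence against the null hypothesis of no Granger causality.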

In this paper, we use the Granger causality test to judge whether trading activity and market confidence are useful for forecasting the change of stock price. The change of stock price at day t is defined by Eq (4), where $p_t$ is the opening price at day t.

Trading activity is characterized by the ratio of sellers among the traders on each trading day. For each trading day, market confidence is characterized by the fraction of traders who also sold or bought stocks on the previous trading day. The null hypothesis for our Granger causality test is that trading activity or market confidence does not Granger-cause the change of stock price.

The Granger causality analysis is conducted on all 50 stocks in our data. Before the Granger causality test, we use the Augmented Dickey-Fuller (ADF) test on our time series data and find that the non-stationarity hypothesis is rejected at the significance level of 0. The Granger causality test can provide some insight into which trading activity has potential for predicting stock price. However, the Granger causality test is based on a linear regression model and thus cannot uncover factors that are non-linearly predictive of stock price.

To address this problem, we develop a three-layer feed-forward neural network model, which is a non-linear model and can fully exploit the potential predictive power of its input. To train the neural network, we divide all the data into two equal-sized parts: the training set and the test set. For the stocks in the training set, the future stock price is used to train the neural network.
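To make the model concrete, here is a minimal pure-Python sketch of a three-layer feed-forward network (input layer, one hidden layer, one output) trained on half of a toy data set and evaluated on the other half. The hidden-layer size, tanh activation, learning rate, plain stochastic gradient descent, and the synthetic target are all assumptions made for illustration; the text does not specify these details.

```python
import math
import random


class FeedForwardNet:
    """Input layer -> tanh hidden layer -> single linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr=0.05):
        """One stochastic gradient descent step on the squared error."""
        h = self._hidden(x)
        err = sum(w * v for w, v in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            grad_h = err * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err


def mse(net, data):
    """Mean squared error of the network on (input, target) pairs."""
    return sum((net.predict(x) - y) ** 2 for x, y in data) / len(data)


# Synthetic stand-in data: the target is a simple function of two inputs.
rng = random.Random(1)
points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
data = [(p, 0.5 * p[0] - 0.3 * p[1]) for p in points]
train, test = data[:20], data[20:]  # two equal-sized parts

net = FeedForwardNet(n_in=2, n_hidden=4)
error_before = mse(net, test)
for _ in range(200):
    for x, y in train:
        net.train_step(x, y)
error_after = mse(net, test)
```

In the setting described here, the input vector would hold the historical price changes, seller ratios, and (for the second input group) the market confidence index, and the target would be the change of stock price.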

For the stocks in the test set, only the past time series of stock price, trading activity, and market confidence are known. To assess the role of trading activity and market confidence, we compare the performance of neural networks with two different sets of inputs: (1) the time series of stock price and the time series of trading activity; (2) the time series of stock price, the time series of trading activity, and the time series of market confidence.

The output of the neural networks is the change of stock price. The effectiveness of the prediction method is measured in terms of the Mean Absolute Percentage Error (MAPE) and the accuracy at predicting the rise or fall of stock price. MAPE is a measure to evaluate the accuracy of the predicted time series relative to the real time series. Denoting with $A_t$ the real change of stock price and $F_t$ the predicted change of stock price, MAPE is defined as:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \qquad (7)$$

where $n$ is the number of predicted trading days. For accuracy, we just evaluate whether the predicted trend i.
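Both metrics are simple to compute. The sketch below uses the standard MAPE definition (expressed as a fraction rather than a percentage) and treats direction accuracy as the share of days on which the predicted and real changes have the same sign; the handling of zero changes and the function names are assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction.

    Undefined when any actual value is zero (raises ZeroDivisionError).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs((a - f) / a) for a, f in zip(actual, predicted)) / len(actual)


def direction_accuracy(actual, predicted):
    """Share of days where the predicted rise or fall matches the real one.

    Assumption: a change counts as a rise only if it is strictly positive.
    """
    hits = sum((a > 0) == (f > 0) for a, f in zip(actual, predicted))
    return hits / len(actual)


# Toy example with hypothetical daily price changes.
actual = [0.02, -0.01, 0.04]
predicted = [0.01, -0.02, -0.04]
error = mape(actual, predicted)
accuracy = direction_accuracy(actual, predicted)
```

Here the relative errors are 0.5, 1.0, and 2.0, and the predicted direction is right on two of the three days.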

Analyzed the data: XQS. Abstract Stock price prediction is an important and challenging problem in stock market analysis. Introduction With the increasing availability of huge databases for financial systems, financial study becomes a hot research topic. Results Supply and demand of stock Price is determined by supply and demand. Fig 1. Statistical significance of bivariate Granger causality correlation between the change of stock price and (a) the ratio of sellers, (b) the number of sellers, and (c) the number of buyers.


