Genome sequencing and assembly

A 3 kb paired-end library was generated and sequenced at the Functional Genomics Center Zurich on a Roche Genome Sequencer FLX+ platform. A total of 872,570 high-quality filtered reads comprising 188,465,376 bases were obtained, resulting in 31.8-fold average sequencing coverage. The reads were assembled de novo using Newbler 2.5.3. This resulted in 150 contigs combined into one 6 Mb super-scaffold and three smaller scaffolds of 5.29 kb, 2.84 kb and 2.74 kb. The largest of the minor scaffolds constituted a ribosomal RNA operon; the other two showed sequence similarity to non-ribosomal peptide synthase modules. A portion of the intra-scaffold gaps was closed by Sanger sequencing of PCR products, decreasing the total number of contigs to 41, with a contig N50 of 329.4 kb and a longest contig of 766.5 kb. Note that the GenBank record contains 42 contigs because one contig was split into two parts so that the assembly starts with the dnaA gene. While closing gaps, it became possible to determine the positions of all ribosomal operons by sequence overlap and thus to incorporate the largest of the minor scaffolds. However, it was not possible to map the remaining two minor scaffolds precisely. These must be located within two distinct remaining large gaps, but because they were not relevant to the project they were excluded from the assembly.

Genome annotation

Initial open reading frame (ORF), tRNA, and rRNA prediction and functional annotation were performed using the RAST (Rapid Annotation using Subsystem Technology) server [50].
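The coverage and contig N50 figures reported for the assembly follow from simple arithmetic. The sketch below is only an illustration of that arithmetic, not the procedure used by Newbler. It assumes that the coverage was computed against the estimated genome size of 5,930,035 bp. The function names and the toy contig lengths are hypothetical, since the individual contig lengths are not given in the text.

```python
def average_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Read total and genome size as reported in the text.
coverage = average_coverage(188_465_376, 5_930_035)
print(f"{coverage:.1f}x")  # 31.8x

# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [800, 600, 400, 200, 100]
print(n50(toy_contigs))  # 600
```

For the toy list, the two largest contigs together hold 1,400 of 2,100 bases, which is the first point at which half of the total is reached, so the N50 is the length of the second contig.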

For comparison, the genome was also annotated using Prokka [51], which utilizes Prodigal [52] for ORF prediction (the RAST server utilizes a modified version of Glimmer [53]). Start codons of all predicted ORFs were further verified manually, using the position of potential ribosomal binding sites and BLASTP [54] alignments with homologous ORFs from other P. syringae strains as a reference. Functional annotations were also refined for every ORF using BLASTP searches against the non-redundant protein sequence database (nr) and the NCBI Conserved-Domain search engine [55]. Functional category assignment and signal peptide prediction were performed using the Integrated Microbial Genomes/Expert Review (IMG/ER) system [56].
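The text does not say how the two gene predictions were compared. One plausible way to shortlist ORFs for manual start-codon review is to pair predictions that share a contig, strand and stop coordinate but disagree on the start. The sketch below assumes both annotations have already been parsed into simple records. The `Orf` type, the function name and all coordinates are hypothetical and are not taken from the B64 annotation.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    contig: str
    strand: str  # "+" or "-"
    start: int   # coordinate of the first base of the start codon
    stop: int    # coordinate of the last base of the stop codon


def conflicting_starts(set_a: list[Orf], set_b: list[Orf]) -> list[tuple[Orf, Orf]]:
    """Pair ORFs that share contig, strand and stop but differ in start."""
    by_stop = {(o.contig, o.strand, o.stop): o for o in set_a}
    conflicts = []
    for orf in set_b:
        match = by_stop.get((orf.contig, orf.strand, orf.stop))
        if match is not None and match.start != orf.start:
            conflicts.append((match, orf))
    return conflicts


# Hypothetical coordinates, for illustration only.
predictions_a = [Orf("c1", "+", 100, 400), Orf("c1", "-", 900, 500)]
predictions_b = [Orf("c1", "+", 130, 400), Orf("c1", "-", 900, 500)]

for a, b in conflicting_starts(predictions_a, predictions_b):
    print(f"{a.contig} {a.strand} stop={a.stop}: start {a.start} vs {b.start}")
```

Keying on the stop coordinate reflects the fact that two predictors working in the same reading frame usually agree on where an ORF ends and differ only in which upstream start codon they choose. ORFs found by only one tool are ignored by this sketch and would need separate handling.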

Genome properties

The genome of strain B64 is estimated to comprise 5,930,035 base pairs with an average GC content of 58.55% (Table 3 and Figure 2), which is similar to that observed in other P. syringae strains [12,13,53]. Of the 5,021 predicted genes, 4,947 were protein-coding genes, 4 were ribosomal RNA operons, and 61 were tRNA genes; 78 were identified as pseudogenes. The majority of the protein-coding genes (83.
