Amplification Levels & Copy Number from Solexa

We can calculate levels of amplification as well as plasmid ploidy in a straightforward fashion from Solexa data. Consider our analysis of TT25790, which contains Elisabeth’s array EK568, fully characterized at both join points.

When we look at the “read density” map for reads crossing plasmid F’128(FC40) in this strain, we see that the frequency of reads increases abruptly at reference position ~132098 and returns to baseline abruptly at ~158297. If we go to the raw read data, we find 504973 reads in this interval of 26200 bp, for an average read density of 504973/26200 = 19.3 reads/nucleotide.

We can calculate in a similar manner the read density for the remaining, unamplified region of the plasmid, a circle of size 231427 bp. The total reads for the entire plasmid are 1569700, and for the unamplified region, 1569700-504973 = 1064727. Thus, the average unamplified read density will be 1064727/(231427-26200) = 5.2 reads/nucleotide.

Thus, the EK568 array is amplified with respect to the remainder of the plasmid by a factor of 19.3/5.2 = 3.7. This seems unusually low, given the fact that the samples were grown in minimal lactose medium. Even though the strain is rec, we expect this number to differ from preparation to preparation, as rec-independent recombination by mechanisms such as annealing, snap-back extension, and strand switching appears to be fairly frequent in F plasmid derivatives, perhaps the consequence of continuous rolling circle generation of long single-stranded DNA ends.
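
The arithmetic so far can be checked with a short Python sketch. The function and variable names are illustrative only, not part of any analysis pipeline; the numbers are those quoted above, and the interval length assumes inclusive coordinates (which reproduces the 26200 bp figure).

```python
# Read density = reads mapped to an interval / interval length in bp.
# Numbers are those quoted in the text; names are illustrative only.

def read_density(reads: int, length_bp: int) -> float:
    """Average number of reads per nucleotide over an interval."""
    return reads / length_bp

PLASMID_BP = 231_427                       # size of the plasmid circle
PLASMID_READS = 1_569_700                  # all reads on the plasmid
ARRAY_START, ARRAY_END = 132_098, 158_297  # approximate array endpoints
ARRAY_READS = 504_973                      # reads inside the array interval

# Assumption: coordinates are inclusive, so the interval is 26200 bp long.
array_bp = ARRAY_END - ARRAY_START + 1

amplified = read_density(ARRAY_READS, array_bp)
unamplified = read_density(PLASMID_READS - ARRAY_READS, PLASMID_BP - array_bp)
amplification = amplified / unamplified

print(f"amplified region:   {amplified:.1f} reads/nt")
print(f"unamplified region: {unamplified:.1f} reads/nt")
print(f"amplification:      {amplification:.1f}-fold")
```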

Returning to the raw data, we can ask what the read density across the 4068 bp lacIZ fusion gene itself is. We find 88003 reads across the gene, for a density of 88003/4068 = 21.6, which yields an amplification level of 21.6/5.2 = 4.2, still lower than expected.

Now, let’s look at the copy number of the plasmid itself. The chromosome contains 4857432 bp, across which we gathered 15716836 reads for an average density of 15716836/4857432 = 3.2 reads/nucleotide. We know that the unamplified region of the plasmid has a density of 5.2. Therefore, the ploidy of this plasmid with respect to the chromosome is 5.2/3.2 = 1.6, a trifle smaller than our working estimate of 2. Bear in mind, though, that this sample came from an overnight culture in stationary phase. Under these conditions, we expect the copy number of F to be at its lowest.
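
As a rough check, the same ratio can be computed in Python without rounding the intermediate densities. The variable names are illustrative; the numbers are the ones quoted in this post.

```python
# Plasmid ploidy = density over the unamplified plasmid / density over the chromosome.
# Numbers are those quoted in the text; names are illustrative only.

CHROMOSOME_BP = 4_857_432
CHROMOSOME_READS = 15_716_836

PLASMID_UNAMPLIFIED_READS = 1_569_700 - 504_973  # plasmid reads outside the array
PLASMID_UNAMPLIFIED_BP = 231_427 - 26_200        # plasmid length outside the array

chromosome_density = CHROMOSOME_READS / CHROMOSOME_BP
plasmid_density = PLASMID_UNAMPLIFIED_READS / PLASMID_UNAMPLIFIED_BP
ploidy = plasmid_density / chromosome_density

print(f"chromosome: {chromosome_density:.1f} reads/nt")
print(f"plasmid:    {plasmid_density:.1f} reads/nt")
print(f"ploidy:     {ploidy:.1f} plasmid copies per chromosome")
```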

We can use the read densities of the lacIZ gene and the chromosome to determine the copy number of the fusion per chromosome, equal to 21.6/3.2 = 6.8. If the activity of the mutant gene is 2% of wild-type and we assume strict additivity of gene expression, we calculate about 2% × 6.8 = 13.6% of wild-type activity.
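
A minimal Python sketch of this estimate follows, with illustrative names and the read counts quoted in this post. It carries unrounded densities through the calculation, so it lands slightly below the hand calculation (about 6.7 copies and 13.4% activity), which rounds each density first. The 2% figure and strict additivity are the assumptions stated above.

```python
# Copies of the lacIZ fusion per chromosome, and the expected enzyme activity
# if each mutant copy contributes 2% of wild-type activity (strict additivity).
# Numbers are those quoted in the text; names are illustrative only.

LACIZ_BP = 4_068
LACIZ_READS = 88_003
CHROMOSOME_BP = 4_857_432
CHROMOSOME_READS = 15_716_836
MUTANT_ACTIVITY = 0.02  # fraction of wild-type activity per mutant copy

laciz_density = LACIZ_READS / LACIZ_BP
chromosome_density = CHROMOSOME_READS / CHROMOSOME_BP

copies_per_chromosome = laciz_density / chromosome_density
expected_activity = MUTANT_ACTIVITY * copies_per_chromosome

print(f"lacIZ copies per chromosome: {copies_per_chromosome:.1f}")
print(f"expected activity: {expected_activity:.1%} of wild-type")
```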

This may be enough to allow significant growth, but why wasn’t a greater growth rate selected by simple continued amplification? The answer may simply be that rec-independent amplification is slow compared to the relatively brief time needed to grow to stationary phase with an already appreciable amount of lac activity.

Finally, why is the amplification level of lacIZ greater than the average amplification level of the array itself? The answer lies in the word “average”. Remember that this is a TID array in which the elements can be of different sizes. If many of the elements containing lacIZ are smaller than average, the actual level of the gene would be correspondingly greater, as observed.
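
The point about averages can be illustrated with a toy model in Python. Every element size, position, and copy count below is hypothetical, chosen only to show the effect; none of it is measured from TT25790.

```python
# Toy model: per-base copy number across an array whose elements differ in size.
# All element sizes, positions, and copy counts are hypothetical.

ARRAY_BP = 26_200
GENE_START, GENE_END = 10_000, 14_068  # hypothetical position of a 4068-bp gene

# (start, end, copies), half-open intervals within the array span
elements = [
    (0, 26_200, 2),      # two copies of a full-length element
    (8_000, 18_000, 3),  # three copies of a shorter element that still spans the gene
]

copies = [0] * ARRAY_BP
for start, end, n in elements:
    for pos in range(start, end):
        copies[pos] += n

array_mean = sum(copies) / ARRAY_BP
gene_mean = sum(copies[GENE_START:GENE_END]) / (GENE_END - GENE_START)

print(f"mean copy number over the whole array: {array_mean:.2f}")
print(f"mean copy number over the gene:        {gene_mean:.2f}")
```

In this toy case the gene sits at 5 copies while the array as a whole averages a little over 3, because the shorter elements add copies of the gene without adding copies of the flanking sequence.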

Many thanks to Yong Lu for helping me collate this data!

cfu/pfu calculation worksheet

While doing an experiment recently, I found that I was doing a lot of guesswork to figure out how much I should dilute resuspended colonies to get nicely spaced single colonies for viable counts. Since I was doing the simple calculations a lot, I decided to put together a worksheet to do it for me.

I also added a panel that does the calculation in the other direction: from the number of colonies on a plate back to the number of cells in the original culture.
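
For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the same two calculations. It is not a transcription of the worksheet: the function names, parameters, and example numbers are my own assumptions about what such a calculation needs.

```python
# Viable-count arithmetic in both directions.
# Function names, parameters, and example values are illustrative assumptions.

def cfu_per_ml(colonies: int, dilution_factor: float, volume_plated_ml: float) -> float:
    """Back-calculate the titer of the original culture from a plate count.

    dilution_factor is the total fold dilution (1e6 for a 10^-6 dilution).
    """
    return colonies * dilution_factor / volume_plated_ml

def dilution_for_target(titer_per_ml: float, target_colonies: int,
                        volume_plated_ml: float) -> float:
    """Total fold dilution needed for the plated volume to give ~target_colonies."""
    return titer_per_ml * volume_plated_ml / target_colonies

# Example: 150 colonies from 0.1 ml of a 10^-6 dilution.
titer = cfu_per_ml(colonies=150, dilution_factor=1e6, volume_plated_ml=0.1)
print(f"titer: {titer:.2e} cfu/ml")

# Going the other way: what dilution gives ~150 colonies from 0.1 ml?
fold = dilution_for_target(titer_per_ml=titer, target_colonies=150, volume_plated_ml=0.1)
print(f"dilute {fold:.1e}-fold")
```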

Trivial? Yes.

Useful? Hopefully.

Accurate? Probably.

cfu_worksheet_ss

cfu_worksheet

First Solexa Data In!

(For those of you who may have forgotten, Solexa sequencing is a rapid, highly automated method of generating millions of short sequences at random across a DNA sample, often an entire genome.)

I have just received the first set of Solexa data from our collaboration with Fritz Roth and his colleagues, Yong Lu & Joe Mellor. The image below shows “read depth” (the number of reads that cross a given point) in the neighborhood of lacZ for strains TT24815 and TT25790. We expect this measurement to increase in proportion to the degree of amplification. Coverage over non-amplified areas of the plasmid and chromosome exceeded 50-fold for both strains.
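
To make “read depth” concrete, here is a small Python sketch that computes per-base depth from read start positions. It is only an illustration, not the method used to produce the image: it assumes fixed-length reads, 0-based coordinates, and a linear reference (reads wrapping around the origin of a circular plasmid are not handled), and the example reads are made up.

```python
from itertools import accumulate

def read_depth(read_starts, read_length, reference_length):
    """Per-base depth on a linear reference from 0-based read start positions.

    Uses a difference array: +1 where a read begins, -1 just past its end,
    then a running sum gives the number of reads covering each base.
    """
    diff = [0] * (reference_length + 1)
    for start in read_starts:
        end = min(start + read_length, reference_length)  # clip at reference end
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff[:-1]))

# Made-up example: four 4-bp reads on a 10-bp reference.
depth = read_depth(read_starts=[0, 2, 2, 5], read_length=4, reference_length=10)
print(depth)
```

An amplified region would show up in such a profile as a plateau sitting above the baseline depth of the flanking sequence.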

Small red arrows show my guess at the amplification endpoints. The TT24815 array stretches from approximately 138256 to 166250 (~28 kb), and the TT25790 array from 131600 to 159300 (~28 kb), where the coordinates refer to our standard F’128 sequence counting clockwise from the first nucleotide of IS3A.

Strain TT25790 contains Elisabeth’s known inverted duplication array (EK568), for which we have sequenced a single join point (134075->134087 recombined into 132108<-132098). Small blue arrows show these two tracts. In our simple models of inverted duplication formation, join 1 forms from their recombination, either directly (“Flying Wallendas”) or by asymmetric deletions of a larger toxic structure (“Slytherin”). Furthermore, all Solexa data in the array should begin at the leftmost blue arrow, gratifyingly close to my guessed endpoint. Join point 2 will be defined by a sequence near the right-hand red arrow and its inverted complement at a position yet to be found in the amplified region. I shall go hunting!
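
The hunt for an inverted complement can be sketched in Python as an exact-match search. This is only a simplified illustration, not the procedure I will actually use: the function names are mine, the sequences are made up, and a real search would have to tolerate sequencing errors and very short junction homologies.

```python
# Find where the reverse complement of a probe sequence occurs in a reference.
# Names and sequences are illustrative; exact matching only.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_matches(reference: str, probe: str) -> list:
    """Return 0-based positions where the reverse complement of probe occurs."""
    target = reverse_complement(probe)
    hits = []
    pos = reference.find(target)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(target, pos + 1)  # allow overlapping matches
    return hits

# Made-up example: the probe occurs at position 2 and, inverted, at position 13.
reference = "AAGACCGATTTTTATCGGTCAA"
probe = "GACCGAT"
print(reverse_complement(probe))
print(find_inverted_matches(reference, probe))
```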

Strain TT24815 was chosen for its recalcitrant nature — we were never able to find any join points, but assumed for this reason that it was a likely candidate for an amplified inverted duplication, as crossover sites in these entities are truly difficult to locate and sequence. We were hoping to get two new bits of previously unknown information out of it. Once again, half of each of the join points should be defined by small sequence inversions in the neighborhood of the red arrows, assuming that this is a truly simple array of elements representing one kind of inverted duplication. More hunting!

If you use your browser’s zoom feature, you can inspect the image with better resolution. You can also download a detailed PDF file.

readdepth-3-20092

Yally Pally to Doug

Doug gets a triple Yally Pally for figuring out the function of the Mysterious Apparatus which has travelled with the lab for years. It is <fanfare of trumpets>… a TLC plate spreader!

The Mysterious Apparatus

Why is natural selection hard to beat and when do you need to beat it?

[This is a stub entry I’m making for John under his name. He should re-edit it with his own words. — Eric]

Here’s a brief review I just wrote with Dan.

Why is natural selection hard to beat and when do you need to beat it?
John R. Roth and Dan I. Andersson

Bacterial genetics defeats natural selection: it uses positive selection to detect large-phenotype mutants without influencing their frequency. Metazoans maintain organism integrity by defeating natural selection on somatic cell growth. Bacterial genetics relies on selection strong enough to prevent growth of both the parent and common slightly-improved mutants. When selective stringency is reduced, frequent small-effect mutations allow growth and initiate a cascade of successive improvements. This rapid response rests on the unexpectedly high formation rate of small-effect mutations (particularly duplications and amplifications). Duplications form at a rate 10^4 times that of null mutations. The high frequency of small-effect mutations reflects features of replication, repair and coding that minimize the costs of mutation.
The striking effect of small-effect mutations is seen in a system designed by John Cairns to test the effect of growth limitation on mutation rate. A leaky E. coli mutant (lac) is plated on lactose medium. Revertant (Lac+) colonies appear over 6 days above a lawn of (10^8) non-growing parent cells. These colonies have been attributed to stress-induced mutagenesis of the non-growing parent. This conclusion ignores natural selection, assuming that only large-effect mutants appear, as is true for lab genetic selections. However, selection is not stringent in the Cairns system: small increases in lac enzymes allow growth. Common cells with a lac duplication (and 2x the mutant enzyme level) initiate slow-growing colonies, in which selection drives a multi-step adaptation process: higher amplification, reversion to lac+ and loss of mutant lac alleles. The high yield of revertant colonies under selection does not reflect mutagenesis, but rather the high spontaneous rate of gene duplication (10^-5), amplification (10^-2/step) and the selective addition of mutation targets (more cells with more mutant lac copies/cell).
Metazoan somatic cells may escape natural selection by the same mechanism. Metazoans reduce the basal level of unexpressed genes 1000-fold (compared to bacteria) by their epigenetic modification of DNA and histones, making it impossible for small-effect mutations to provide growth.

The origin of mutants under selection: Interactions of mutation, growth and selection

[This is a stub entry I’m making for John under his name. He should re-edit it with his own words. — Eric]

Here’s the abstract to a new article.

The origin of mutants under selection: Interactions of mutation, growth and selection

Dan I Andersson, Diarmaid Hughes and John R Roth

In microbial genetics, positive selection detects rare cells with an altered growth phenotype (mutants or recombinants). The frequency of mutants signals the rate of mutant formation; an increased frequency suggests a higher mutation rate. Increases in mutant frequency are never attributed to growth under selection. The converse is true in natural populations, where changes in phenotype frequency reflect selection, genetic drift or founder effects, but never changes in mutation rate. The apparent conflict is resolved because restrictive rules allow laboratory selection to detect mutants without influencing their frequency. With these rules, mutant frequency can reliably reflect mutation rates. When the rules are not followed, selection rather than mutation rate dictates mutant frequency, as in natural populations. In several laboratory genetic systems, non-growing stressed populations show an increase in mutant frequency that has been attributed to stress-induced mutagenesis (adaptive mutation). Since the mutant frequency is used to infer mutation rate (standard lab practice), the rules must be obeyed. A breakdown of the rules in these systems may have allowed selection to cause frequency increases that were attributed to mutagenesis. These systems have sparked interest in interactions between mutation and selection. This has led to a better understanding of how mutants arise, and how very frequent, small-effect mutations, such as duplications and amplifications, can contribute to mutant appearance by increasing gene dosage and mutational target size.

Burnt by Prior Cassettes

For the second time, I have been burnt by the presence of a previously inserted cassette in a background I was using for linear transformation. In both cases, it was a pro::spec swap. I noticed an abnormally high number of camR transformants the next day. I printed the colonies to spectinomycin medium, and discovered that most were sensitive! What I had succeeded in doing was swapping the pre-existing cassette for the new one. Very few of the transformants were the ones I had intended.

In parallel experiments, I found that the number of transformants increased over the course of a couple of days. Our standard UNI cassette (with the exception of tetAR) is a drug resistance gene embedded in an otherwise constant context, originally derived from the chloramphenicol resistance locus of pACYC184. This means that sequence blocks of about 200 bp at each end of the cassettes are always the same. Apparently, these regions are being used as recombination sites during the transformation.

I surmise that lambda red is not causing these insertions, but rather the host recombination system, because of the kinetics of their appearance. We incubate our transformations overnight at 42°C, specifically to eliminate pKD46; this should confine all transformation to the first few minutes after electroporation. In addition, lambda red contains no mechanism for resecting the 3′ end of the transforming DNA back through the 40 bp intended homology block to make available the core of the cassette for strand invasion. This could only be done by the host after the red system had decayed or been diluted by growth, freeing the cell from red repression of the host system.

If I had used tetAR, I would never have noticed this effect. The only homology blocks available are the UNI ends, too small to be used by the host recombination system, but still requiring host resection to be exposed, which could only happen after elimination of red.

Effects of L-Arabinose on Growth

I have traditionally used 20 mM L-Arabinose to induce lambda red on pKD46. I have also noticed that r-m+ strains grew exceptionally slowly during linear transformation. I decided to trace the cause of this defect by comparing their growth with that of wild type, plus or minus ampicillin (used to hold the plasmid) and with varying concentrations of L-Arabinose. Results: 20 mM L-Arabinose inhibits growth of the r-m+ strain and reduces the yield of both strains. Ampicillin exacerbates the growth inhibition of the r-m+ strain, but not of the wild type. Experiment and data.