Gene TILLING in Xenopus tropicalis – Mitchell Sogin, Hilary Morrison, Joseph Vineis
Written by Hilary Morrison
We described initial work on Development of a TILLING Resource for the Xenopus Research Community (R21 HD065713) in our 2011 report. We have since completed this pilot study, going beyond our original goals because of the greater sequencing capacity available through our Illumina instruments. We analyzed all exon targets for 291 high-impact genes from 100 animals in a multiplex design. We also sequenced the exon targets of 30 samples individually, including the parental animals of mutagenized offspring. We achieved an average coverage of 1000X for 90% of genes and discovered 52 potential missense mutations in 50 genes across an analysis of 130 animals. Null mutations accounted for 13% of the total number of missense mutations. Using the mutations in all animals, we calculated a hit rate of ~2 per million positions. Some animals in the study exhibited as many as five hits while others had none, indicating substantial variation in the efficacy of the mutagenesis technique.
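As a rough illustration of the hit-rate arithmetic in the paragraph above, the following sketch expresses a mutation count as hits per million screened positions. The function name and the per-animal target size of 200,000 positions are assumptions chosen only to make the arithmetic easy to follow; the report does not state the screened target size.

```python
def hits_per_million(n_mutations, positions_per_animal, n_animals):
    """Return induced mutations per million screened positions."""
    total_positions = positions_per_animal * n_animals
    if total_positions <= 0:
        raise ValueError("total screened positions must be positive")
    return n_mutations / total_positions * 1_000_000


# 52 candidate mutations and 130 animals come from the text;
# 200_000 positions per animal is a hypothetical target size.
rate = hits_per_million(52, 200_000, 130)
```

Under these assumed inputs the result is on the order of the ~2 per million reported above; a different target size would scale the rate proportionally.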
Our sequence analysis used commercially available software (CLC Workbench), publicly available software (the Bowtie mapper and SAMtools) and custom Python scripts. Trimmed data were mapped to full-length annotated sequences for each of the 291 target genes, derived from build 7.1 of the Xenopus tropicalis genome, or from build 4.2 when annotations for build 7.1 were unavailable. An average of 67% ± 8% of total reads passed quality filtering and were successfully mapped to the reference Xenopus tropicalis genome, build 7.1. A single mutated allele in a pool of ten otherwise homozygous diploid animals (20 alleles) predicts a ratio of 19 wild-type sequences for every mutant, an expected mutant frequency of 5%. We nevertheless used a variant frequency threshold of 1% to allow detection of variants whose frequency might be lower than expected because of potential pooling errors. For example, the frequency of the known pax8 mutant allele was 2%. If the trimming or mapping steps had removed additional reads covering this position, the mutation would have been missed. Using these criteria, the initial pooling phase of the experiment detected an average of 17,226 variants per pool.
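The pooling arithmetic and the 1% threshold described above can be sketched as follows. This is a minimal illustration, not the custom scripts used in the study; the function names and the read counts in the example are hypothetical.

```python
def expected_mutant_fraction(n_animals, ploidy=2, mutant_copies=1):
    """Expected fraction of mutant alleles in an equal-contribution pool."""
    return mutant_copies / (n_animals * ploidy)


def passes_threshold(alt_reads, depth, min_fraction=0.01):
    """True if the variant read fraction meets the detection threshold."""
    if depth <= 0:
        return False
    return alt_reads / depth >= min_fraction


# One heterozygous animal in a pool of ten diploid animals:
# 1 mutant allele out of 20, i.e. 19 wild-type copies per mutant copy.
expected = expected_mutant_fraction(10)

# Hypothetical read counts at 1000X depth: a variant seen at 2% (as for
# the known pax8 allele) passes a 1% threshold but would fail a 5% one.
seen_at_one_percent = passes_threshold(20, 1000, min_fraction=0.01)
seen_at_five_percent = passes_threshold(20, 1000, min_fraction=0.05)
```

The example shows why a threshold set at the theoretical 5% would have missed an allele that was underrepresented in its pool.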
Because the variant detection process reveals both polymorphisms (SNPs) and induced mutations, we used filtering steps to distinguish between the two. The initial filter used variant information generated from parental animals and siblings of a single mating event. This information eliminated variants that likely represented polymorphisms so that they were not counted when calculating a mutagenesis hit rate. The mean number of potential mutations fell to 1,000 per pool upon subtraction of the SNPs in the parental animals. The process also required that potential mutations from any single animal occur only at the intersection of exactly two pools. After filtering parental SNPs, we identified the variants detected in two pools and thus unique to a single animal. This procedure identified a mean of 5,000 potential mutations per animal. A final subtraction of SNPs found in multiple animals produced a set of potential mutant alleles. Detection of an expected mutation in pax8 confirmed the potential power of this approach. After subtracting parental and sibling variants from each animal, we found 45 remaining variants in protein-coding regions.
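The filtering logic described above can be sketched with set operations. This is a simplified illustration under stated assumptions, not the study's actual scripts: variants are represented as hypothetical (gene, position, ref, alt) tuples, each animal is assumed to belong to exactly two pools, and all names and data are invented for the example.

```python
from collections import defaultdict


def candidate_mutations(pool_variants, animal_pools, known_snps):
    """Assign variants to animals after removing known polymorphisms.

    pool_variants: dict of pool id -> set of variants called in that pool
    animal_pools:  dict of animal id -> pair of pool ids containing it
    known_snps:    set of parental and sibling variants to subtract
    """
    # Step 1: subtract parental/sibling SNPs from every pool.
    filtered = {pool: calls - known_snps for pool, calls in pool_variants.items()}

    # Step 2: record which pools each remaining variant appears in.
    pools_by_variant = defaultdict(set)
    for pool, calls in filtered.items():
        for variant in calls:
            pools_by_variant[variant].add(pool)

    # Step 3: keep variants seen in exactly two pools whose intersection
    # corresponds to a single animal.
    animal_by_pair = {frozenset(p): a for a, p in animal_pools.items()}
    per_animal = defaultdict(set)
    for variant, pools in pools_by_variant.items():
        if len(pools) == 2:
            animal = animal_by_pair.get(frozenset(pools))
            if animal is not None:
                per_animal[animal].add(variant)
    return dict(per_animal)


# Hypothetical 2 x 2 pooling grid and variant calls.
snp = ("geneA", 100, "A", "G")       # parental polymorphism, in every pool
induced = ("geneB", 250, "C", "T")   # present only in pools R1 and C1
shared = ("geneC", 75, "G", "A")     # present in three pools
pools = {
    "R1": {snp, induced, shared},
    "R2": {snp},
    "C1": {snp, induced, shared},
    "C2": {snp, shared},
}
animals = {"a1": ("R1", "C1"), "a2": ("R1", "C2"),
           "a3": ("R2", "C1"), "a4": ("R2", "C2")}
result = candidate_mutations(pools, animals, known_snps={snp})
```

In this toy example only the variant found in exactly two pools is retained and assigned to animal `a1`; the parental SNP and the variant present in three pools are discarded.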
We confirmed a number of potential mutations in specific animals by capillary sequencing. Oligonucleotides flanking the site of variation generated amplicons for bidirectional sequencing. We inspected the electropherograms for the mutant nucleotide. Our analysis indicated that 80% of the variants detected in the exon capture TILLING approach were real and correctly assigned to the source animal. De novo mutations included a null mutation in ift88 and 31 missense mutations in several genes, including pax8, smad9, tbx15 and foxe3.