CRISPR screen for host factors required for influenza A virus replication

Please refer to the manuscript for full details: doi.org/10.1038/s41467-019-13965-x

Raw data

Raw data for the CRISPR screen with Bo Li, John Doench and Nir Hacohen can be downloaded here.

Experimental design

Samples in screen 1 are labelled as follows:

  • C1 - uninfected Day 9 post-lenti, no flu

  • C2 - infected 16h not sorted

  • P3 - extreme high HA - top 1.5%

  • P4 - middle of distribution of HA

  • P5 - low HA (1% further into distribution than P6)

  • P6 - very low HA (bottom 1.5% of HA)

Samples in screen 2 are labelled as follows:

  • C1 - uninfected Day 9 post-lenti, no flu

  • C2 - infected 16h not sorted

  • P3 - high HA - top 15%

  • P4 - middle of distribution of HA

  • P5 - low HA (2.5% further into distribution than P6)

  • P6 - very low HA (bottom 2.5% of HA), i.e. comparable to the combined P5+P6 sample in screen 1.

The primary comparison (the average of $P_5$ and $P_6$ vs $P_4$ for screen 1; $P_6$ vs $P_4$ for screen 2) was chosen a priori.
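The sketch below illustrates how the primary comparison could be set up for each screen. It assumes that per-guide abundances are already on a $log_2$ scale and that "vs" means a simple difference (a $log_2$ fold-change); the manuscript describes the actual statistics. The function name `primary_comparison` and the input layout are hypothetical.

```python
def primary_comparison(abundance, screen):
    """Per-guide difference between the low-HA sample(s) and P4.

    abundance: dict mapping sample label ("P4", "P5", "P6") to a dict of
               guide -> log2 abundance (hypothetical layout).
    screen:    1 or 2.
    Assumption: the comparison is a difference on the log2 scale.
    """
    if screen == 1:
        # Screen 1: average the P5 and P6 sorted fractions.
        low = {
            guide: (abundance["P5"][guide] + abundance["P6"][guide]) / 2
            for guide in abundance["P6"]
        }
    elif screen == 2:
        # Screen 2: P6 alone covers the bottom 2.5% of HA.
        low = abundance["P6"]
    else:
        raise ValueError("screen must be 1 or 2")
    return {guide: low[guide] - abundance["P4"][guide] for guide in low}


# Toy values for a single guide, for illustration only.
toy = {"P4": {"sgA": 3.0}, "P5": {"sgA": 4.0}, "P6": {"sgA": 6.0}}
screen1_result = primary_comparison(toy, screen=1)
screen2_result = primary_comparison(toy, screen=2)
```

A positive value means the guide is more abundant in the low-HA fraction than in the middle of the distribution.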

Data were prepared from raw reads using the following procedure:

  1. Use FASTQ QC on initial read file, convert to FASTA

  2. Exclude the “unexpected” files, which contain reads that don’t match guide RNAs. These are usually sequencing errors.

  3. Count the sequences occurring in each barcode (i.e. each well on the PCR plate for library preparation has a unique barcode, but the same sample was split between many wells because we get so much DNA).

  4. Quality control: compare read counts against the expected guide sequences for AVANA4.

  5. Normalise the read counts across wells. Differences between wells (for the same original sample) are random, so we don’t want to amplify them. Normalise to $\log_2$ reads per million: \(\begin{aligned} \log_2\left(\frac{\text{sgRNA count} \times 10^{6}}{\text{total reads in well}} + 1\right) \end{aligned}\) [All subsequent values are $\log_2$ reads per million.]

  6. Combine the different wells from the same sample by averaging their $\log_2$ reads per million.
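Steps 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the paper. The function names and the dict-based input are hypothetical, and treating a guide that is absent from a well as zero reads is an assumption.

```python
import math


def log2_rpm(counts):
    """Step 5: convert raw sgRNA counts for one well to log2 reads per million.

    counts: dict of guide -> raw read count for a single well.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("well contains no reads")
    return {
        guide: math.log2(count * 1_000_000 / total + 1)
        for guide, count in counts.items()
    }


def combine_wells(wells):
    """Step 6: average log2 reads per million over wells of the same sample.

    wells: list of dicts (guide -> raw count), one per well.
    Assumption: a guide missing from a well has zero reads there,
    which gives log2(0 + 1) = 0 for that well.
    """
    normalised = [log2_rpm(well) for well in wells]
    guides = set().union(*(well.keys() for well in normalised))
    return {
        guide: sum(well.get(guide, 0.0) for well in normalised) / len(normalised)
        for guide in guides
    }


# Toy example: one sample split across two wells.
wells = [{"sgA": 1, "sgB": 3}, {"sgA": 2, "sgB": 2}]
combined = combine_wells(wells)
```

Each well is normalised by its own total before averaging, so a well with more reads does not dominate the combined value.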