CRISPR screen for host factors required for influenza A virus replication

Please refer to the manuscript for full details: doi.org/10.1038/s41467-019-13965-x

Raw data

Raw data for the CRISPR screen with Bo Li, John Doench and Nir Hacohen can be downloaded here.

Samples in screen 1 are labelled as follows:

Samples in screen 2 are labelled as follows:

C1 - uninfected Day 9 post-lenti, no flu
C2 - infected 16h not sorted
P3 - high HA - top 15%
P4 - middle of distribution of HA
P5 - low HA (2.5% further into distribution than P6)
P6 - very low HA (bottom 2.5% of HA), i.e. comparable to the combined P5+P6 sample in screen 1

The primary comparison (average $P_{5,6}$ vs $P_4$ for screen 1; $P_6$ vs $P_4$ for screen 2) was chosen a priori.

Run FASTQ QC on the initial read file, then convert to FASTA.
Exclude the “unexpected” files, which contain reads that don’t match any guide RNA; these are usually sequencing errors.
Count the sequences occurring in each barcode (i.e. each well on the PCR plate for library preparation has a unique barcode, but the same sample was split between many wells because we get so much DNA).
Quality control: compare read counts against the expected guide sequences for AVANA4.
Normalise the read counts across wells, because differences between wells (for the same original sample) are random, so we don’t want to amplify these. Normalise to $\log_2$ reads per million: $\log_2\left(\frac{\text{sgRNA count} \times 1000000}{\text{total reads in well}} + 1\right)$ [All subsequent values are $\log_2$ reads per million]
Combine different wells by averaging the $\log_2$ reads per million.
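
The counting, normalisation and averaging steps above can be sketched in Python as follows. This is an illustrative sketch and not the pipeline used for the manuscript. The function names and the toy guide sequences are hypothetical. It also assumes that reads are matched to guides exactly, and that "total reads in well" means the reads assigned to expected guides unless a different total is passed in.

```python
import math


def count_guides(reads, expected_guides):
    """Count exact matches to expected guides in one well; tally the rest as unexpected."""
    counts = dict.fromkeys(expected_guides, 0)
    unexpected = 0
    for read in reads:
        if read in counts:
            counts[read] += 1
        else:
            unexpected += 1
    return counts, unexpected


def log2_rpm(counts, total_reads=None):
    """Return log2(count * 1e6 / total reads in well + 1) for every guide in one well."""
    if total_reads is None:
        # Assumption: total reads in well = reads assigned to expected guides.
        total_reads = sum(counts.values())
    return {g: math.log2(c * 1_000_000 / total_reads + 1) for g, c in counts.items()}


def average_wells(wells):
    """Average the log2 reads per million of each guide across wells of one sample."""
    return {g: sum(w[g] for w in wells) / len(wells) for g in wells[0]}


# Toy example: two wells from the same sample, with hypothetical guide sequences.
guides = ["ACGT", "TTGA"]
well_1, unexpected_1 = count_guides(["ACGT", "ACGT", "TTGA", "NNNN"], guides)
well_2, unexpected_2 = count_guides(["ACGT", "TTGA", "TTGA", "TTGA"], guides)
sample = average_wells([log2_rpm(well_1), log2_rpm(well_2)])
```

Each well is normalised to its own total before averaging, so a well that happened to receive more reads does not dominate the combined value for the sample. The +1 inside the logarithm keeps guides with zero reads at a value of 0 rather than undefined.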