CRISPR screen for host factors required for influenza A virus replication
Please refer to the manuscript for full details: doi.org/10.1038/s41467-019-13965-x
Raw data
Raw data for the CRISPR screen with Bo Li, John Doench and Nir Hacohen can be downloaded here.
Experimental design
Samples in screen 1 are labelled as follows:
- C1 - uninfected Day 9 post-lenti, no flu
- C2 - infected 16h not sorted
- P3 - extreme high HA - top 1.5%
- P4 - middle of distribution of HA
- P5 - low HA (1% further into distribution than P6)
- P6 - very low HA (bottom 1.5% of HA)
Samples in screen 2 are labelled as follows:
- C1 - uninfected Day 9 post-lenti, no flu
- C2 - infected 16h not sorted
- P3 - high HA - top 15%
- P4 - middle of distribution of HA
- P5 - low HA (2.5% further into distribution than P6)
- P6 - very low HA (bottom 2.5% of HA), i.e. comparable to the combined P5+P6 sample in screen 1

The primary comparison (average $P_{5,6}$ vs $P_4$ for screen 1, $P_6$ vs $P_4$ for screen 2) was chosen a priori.
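The primary comparison can be illustrated with a short Python sketch. The text does not say how the comparison is scored, so taking the per-guide difference of $log_2$ abundances (a $log_2$ fold change) is an assumption here, as are the function name `primary_contrast`, the input layout and the toy values.

```python
def primary_contrast(screen, samples):
    """Per-guide low-HA vs P4 difference (hypothetical helper).

    samples maps a sample label ("P4", "P5", "P6") to a dict of
    {guide: log2 abundance}. Using a simple difference of log2 values
    is an assumption, not a method stated in the text.
    """
    p4 = samples["P4"]
    if screen == 1:
        # Screen 1: average of the P5 and P6 gates
        low = {g: (samples["P5"][g] + samples["P6"][g]) / 2 for g in p4}
    elif screen == 2:
        # Screen 2: P6 alone (bottom 2.5% of HA)
        low = samples["P6"]
    else:
        raise ValueError("screen must be 1 or 2")
    return {g: low[g] - p4[g] for g in p4}


# Toy values for illustration only, not real screen data
screen1 = {"P4": {"sgA": 5.0}, "P5": {"sgA": 7.0}, "P6": {"sgA": 9.0}}
screen2 = {"P4": {"sgA": 5.0}, "P6": {"sgA": 6.5}}

contrast1 = primary_contrast(1, screen1)
contrast2 = primary_contrast(2, screen2)
```

A positive value means the guide is more abundant in the low-HA gate than in the middle of the distribution.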
Data were prepared from raw reads using the following procedure:
- Run FASTQ QC on the initial read file, then convert to FASTA
- Exclude the “unexpected” files, which contain reads that don’t match guide RNAs; these are usually sequencing errors
- Count sequences occurring in each barcode (i.e. each well on the PCR plate for library preparation has a unique barcode, but the same sample was split between many wells because we get so much DNA)
- Quality control: compare read counts against expected guide sequences for AVANA4
- Normalise the read counts across wells, because differences between wells (for the same original sample) are random and we don’t want to amplify them. Normalise to $\log_2$ reads per million: \(\log_2\left(\frac{\text{sgRNA count} \times 10^6}{\text{total reads in well}} + 1\right)\) [All subsequent values are $\log_2$ reads per million]
- Combine different wells by averaging the $\log_2$ reads per million
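The last two steps (per-well normalisation, then averaging across wells) can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used: the function names, the dictionary-based input format and the toy counts are all assumptions, and a guide missing from a well is treated as a count of zero.

```python
import math


def log2_rpm(counts):
    """Convert raw sgRNA counts for one well to log2(reads per million + 1)."""
    total = sum(counts.values())  # total reads in the well (assumed > 0)
    return {g: math.log2(c * 1_000_000 / total + 1) for g, c in counts.items()}


def combine_wells(wells):
    """Average log2 reads per million across wells of the same sample."""
    guides = set().union(*wells)  # every guide seen in any well
    # Assumption: a guide absent from a well has a count of zero there
    per_well = [log2_rpm({g: w.get(g, 0) for g in guides}) for w in wells]
    return {g: sum(p[g] for p in per_well) / len(per_well) for g in guides}


# Hypothetical toy counts: two wells from one sample, sequenced at
# different depths but with the same guide proportions
wells = [
    {"sgA": 1, "sgB": 3},
    {"sgA": 10, "sgB": 30},
]
sample = combine_wells(wells)
```

Because each well is scaled by its own total, the two toy wells give identical normalised values despite a tenfold difference in depth, which is the point of normalising before averaging.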