rnai-query¶
rnai-query
builds on the previously parsed Matlab files (see
rnai-parse) and uses them for quickly subsetting the complete dataset
of single-cells using an SQLite.
For that, you first need to create a database that indexes the plates` meta information. Having the database set up, you can query for different plates and compile different data-sets.
The following sections will explain how rnai-query
and its subcommands
are used. So far the following subcommands are available:
rnai-query insert
for inserting meta information to a database,rnai-query query
for querying the database and writing plates to a file,rnai-query compose
for creating of data sets,rnai-query select
for selecting single variables from the database.
The steps have to be taken in succession (or at least insert has to be the first command to be executed), so make sure to read it all.
Inserting meta information¶
Before being able query the database, we need to insert the parsed meta files. We can to that by calling:
rnai-query insert /i/am/a/file/called/tix.db
/i/am/a/path/to/parsed/data
where /i/am/a/path/to/parsed/data
points to the folder where the *meta.tsv
and *data.tsv
files lie (the result from rnai-parse).
This creates an SQLite database called tix.db
which we will use for
querying the data and creating datasets.
Creating data-sets¶
Having the database set up, we can query it and create custom single-cell data-sets by filtering on meta information. As a motivating example consider these two scripts:
rnai-query compose --sample 10 /i/am/a/file/called/tix.db OUTFILE
rnai-query compose --plate dz05-1e --gene pik3ca
/i/am/a/file/called/tix.db
OUTFILE
The first query would return 10 single cells randomly sampled from each well from all plates and write it to the file OUTFILE. The second query would only look at plate dz05-1e and gene pik3ca and write the single cells that fit the criteria to OUTFILE.
The next sections walk you through using rnai-query compose
.
Command line arguments¶
Say we would want to filter the database on some critera and only write the
single-cell features that fit these conditions. Using rnai-query compose
you
can choose which plates/gene/sirnas/etc. to choose from, by setting the
respective command line arguments:
--normalize | The normalization methods to use, e.g. like ‘zscore’ or a comma-separated string of normalisations such as ‘bscore,loess,zscore’. Defaults to ‘zscore’. If you do not want to normalize you need to explicitely set to ‘none’. |
--from-file | You can provide an optional tsv file that has been created using rnai-query query such that only on these files will be searched. The filters you provide, like –study or –pathogen, still need to be given. |
--study | The study to query for, e.g. like ‘infectx’, or a comma-separated string of libraries, such as ‘infectx,infectx_published’. |
--pathogen | The pathogen to query for, e.g. like ‘adeno’, or a comma-separated string of pathogens, such as ‘adeno, bartonella’. |
--library | The library to query for, e.g. like ‘d’, or a comma-separated string of libraries, such as ‘d,q’. |
--plate | The plate to query for, e.g. like ‘dz03-1k’, or a comma-separated string of plates, such as ‘dz03-1k,dz04-1k’. |
--design | The design to query for, e.g. like ‘p’. |
--replicate | The replicate to query for, e.g. like ‘1’, or a comma-separated string of replicates, such as ‘1,4’. |
--gene | The gene to query for, e.g. like ‘pik3ca’, or a comma-separated string of genes, such as ‘pik3ca,pik4ca’. |
--sirna | The sirna to query for, e.g. like ‘s12312’, or a comma-separated string of sirnas, such as ‘s12312,s123112’. |
--well | The well to query for, e.g. like ‘a01’, or a comma-separated string of wells, such as ‘a01,l05’. |
--featureclass | The featureclass to query for, e.g. like ‘cells’ or a or a comma-separated string of cells, such as ‘cells,perinuclei,nuclei’. |
--sample | The amount of single cells that are sampled per well,like ‘100’. If unset defaults to all cells. |
--debug | Dont write the files, but only print debug information. |
--help | Print a help message. |
If any argument is not specified it is internally set to None, the whole database will be searched and no filters applied.
Examples¶
Here, we show some examples how you can query. In these examples we use a
SQLite database called database.db
.
Sample 100 cells from every well for every plate and write standardized data to OUTFILE.
rnai-query compose --sample 100
database.db OUTFILE
Filter by pathogens shigella and bartonella and write standardized data to OUTFILE.
rnai-query compose --pathogen shigella,bartonella
database.db OUTFILE
Filter by pathogens Shigella and Bartonella and gene pik3ca and write standardized data to OUTFILE.
rnai-query compose --pathogen shigella,bartonella
--gene pik3ca
--normalize zscore
database.db OUTFILE
Filter by pathogens Shigella and Bartonella and gene pik3ca and only write debug info.
rnai-query compose --pathogen shigella,bartonella
--gene pik3ca
--debug
database.db OUTFILE
Filter by gene nfkb1, pathogen Shigella, study infectx, pooled designs, sample 1000 cells per well and write un-normalized data to OUTFILE.
rnai-query compose --gene nfkb1
--pathogen shigella
--study infectx
--design p
--sample 1000
--normalize none
database.db OUTFILE
Filter by gene pik3ca and mock, feature classes cells and perinuclei, pathogens Shigella and Bartonella, library Dharmacon with a pooled siRNA design, sample 100 cells from each well and write standardized data to OUTFILE.
rnai-query compose --featureclass cells,perinuclei
--gene pik3ca,mock
--library d
--design p
--pathogen shigella,bartonella
--sample 100
database.db OUTFILE
Filter from a pre-made list of plates and the same filters as before.
rnai-query compose --from-file file.tsv
--gene pik3ca,mock
--library d
--design p
--pathogen shigella,bartonella
--sample 100
database.db OUTFILE
Querying for plates¶
If you are only interested in getting the plates the fullfil some criteria and writing them to a file rnai-query query does the job.
rnai-query query --gene pik3ca,mock
--library d
--design p
--pathogen shigella,bartonella
--sample 100
database.db OUTFILE
The file you are getting can then be used input for –from-file for rnai-query compose. Sometiems this is required because the queries we want to submit to the data base are so big that it crashes. The arguments are quite the same as above.
Selecting single variables from the database¶
Sometimes we might want to select single features from the database without writing them to a file, for instance
- if we want to see which genes are available for a pathogen,
- to see which libraries are available for a pathogen,
- to see which plates carry which genes,
- …
We can use rnai-query select
for this kind of question. For example, if
we are interested in finding which genes are available on plate dz05-1e, we
would call
rnai-query select --plate dz05-1e database.db gene
rnai-query select
takes the same filters as rnai-query compose
, except
sample, normalize and debug, so check section Command line arguments.
Examples¶
Here, we show some examples how you can select variables. In these examples we
use a SQLite database called database.db
.
Select which genes are available for pathogens shigella and bartonella.
rnai-query select --pathogen shigella,bartonella
database.db gene
Select which libraries are available for pathogens shigella and bartonella and gene pik3ca.
rnai-query select --pathogen shigella,bartonella
--gene pik3ca
database.db library
Select pathogens for which pik3ca and mock, feature classes cells and perinuclei, library Dharmacon with a pooled siRNA design are available.
rnai-query select --featureclass cells,perinuclei
--gene pik3ca,mock
--library d
--design p
database.db pathogen