Getting Started¶
Installation¶
Install using pip:
pip install resgen-python
Logging in¶
For now, we recommend storing your username and password in environment variables and using them to create a ResgenConnection
:
import os
import resgen as rg
rgc = rg.connect(
os.getenv('RESGEN_USERNAME'),
os.getenv('RESGEN_PASSWORD')
)
If the parameters to rg.connect
are omitted, it will automatically try to load the username and password from the environment variables RESGEN_USERNAME
and RESGEN_PASSSWORD
. It’s often easiest to place these in a .env
file. This can
then be loaded using python-dotenv:
from dotenv import load_dotenv
import os.path as op
load_dotenv(op.expanduser('~/.resgen/credentials'))
import resgen as rg
rgc = rg.connect()
Projects¶
Once logged in, all activity needs to take place within a project. This is managed by a ResgenProject
object.
Finding an existing project¶
If a project already exists, you can look it up using the find_project
function:
project = rgc.find_project('My project', gruser='Group or username')
The gruser
parameter stands for group or user. It is used to look up projects in
namespace that does not belong to the person logged in to the ResgenConnection object.
Creating a new project¶
The find_or_create_project
function checks to see if a project exists for this user and returns it if it does or creates it if it doesn’t.
project = rgc.find_or_create_project('My project')
Projects can also be associated with groups. To retrieve a project from a group, pass in the group name as a parameter:
project = rgc.find_or_create_project('My project', group='Group Name')
Projects are private by default. To create a public project, pass in the private=False parameter:
project = rgc.find_or_create_project("My project", private=False)
Datasets¶
Adding Datasets¶
Use sync_dataset
to upload data to a project. This function will check if a dataset with this filename exists in the project and uploads the local file if it doesn’t. If a dataset with an equivalent filename exists in the project, this command will simply return its uuid.
project.sync_dataset(
'AdnpKO.1000.mcool',datatype="matrix",
sync_remote=False, filetype="cooler", assembly="mm10"
)
If the passed in dataset is a url and sync_remote
is set to True
, then it will first be
downloaded and then added to the project. This may take some
time during which the dataset will appear to be there but
actually be incomplete. If sync_remote
is set to false, the dataset will be added as a
remote dataset.
Updating metadata¶
Metadata can be passed in piecewise and only the fields that are included will be updated:
import resgen
rgc = resgen.connect()
rgc.update_dataset('daTaSetUuiD',
{
"name": "newname",
"description": "newdescription",
"tags": [
{"name": "some:tag"},
{"name": "another:tag"}
]})
Finding Data¶
To find data, search for it using a ResgenConnection (find operations are not project specific). It’s often useful to place them into a dictionary for future use:
datasets = dict([
(d.name, d) for d in rgc.find_datasets("search_string",project=project, limit=20)
])
In the following examples, we assume that the first result is the one we’re looking for. In practice, this should be verified.
Downloading Data¶
Once you’ve found a dataset, you can retrieve a download link:
datasets[0].download_link()
import urllib.request
urllib.request.urlretrieve(ret['url'], '/tmp/my.file')
Finding chromsizes¶
chromsizes = rgc.find_datasets(
datatype='chromsizes', assembly='mm9'
)[0]
Using genomic coordinates¶
Using the chromsizes
dataset found in the previous section, we can create
a ChromosomeInfo
object to convert genomic locations to absolute positions
assuming all the chromosomes are concatenated.
>> chrominfo = rgc.get_chrominfo(chromsizes)
>> chrominfo.to_abs('chr8', 8.67e6)
1149815680.0
We can also use a genomic range and (optionally) pad it.
>> chrominfo.to_abs_range('chr1', 0, 100, padding=0.1)
[-10.0, 110.0]
This will come in handy when we make interactive figures centered on a particular region.
Finding gene annotations¶
gene_annotations = rgc.find_datasets(
datatype='gene-annotations', assembly='mm9'
)[0]
Using gene annotation coordinates¶
>> gene = rgc.get_gene(gene_annotations, 'CXCR3')
>> chrominfo.to_gene_range(gene, padding=0.1)
[2951868790.8, 2951871913.2]
Viewing Data¶
To view a dataset, we typically need the dataset itself (see Managing Data above) as well as a location. Locations in genomic data typically consist of a chromosome and a position. Because HiGlass shows a concatenated version of chromosomes, we need to convert genomic (chromosome, position) to “absolute” coordinates using a chromsizes file.
Creating interactive figures¶
Datasets can be interactively viewed using the higlass-python package. An example can be seen below:
import higlass
from higlass.client import View
initialXDomain = [
chrominfo.to_abs('chr8', 8.67e6),
chrominfo.to_abs('chr8', 14.85e6)
]
view1 = View([
ds_dict['AdnpKO.1000.mcool'].hg_track(height=300),
], initialXDomain=initialXDomain, x=0, width=6)
view2 = View([
ds_dict['WT.1000.mcool'].hg_track(height=300),
], initialXDomain=initialXDomain, x=6, width=6)
display, server, viewconf = higlass.display([view1, view2])
display
Saving Figures¶
Interactive figures can be saved to a project using a higlass-python
- generated viewconf. Note that the figure will be re-rendered and may not look exactly like the one generated by the HiGlass Jupyter widget. For finer control over figure quality, use the resgen web interface.
project.sync_viewconf(viewconf, "Figure 1D")
To export the figure as SVG or PNG, use the config menu in one of the higlass view headers.
Saving a notebook¶
If running in a Jupyter notebook, it can be helpful to sync the notebook itself with the resgen project. This can be done using some cell magic. First some javascript:
%%javascript
var nb = IPython.notebook;
var kernel = IPython.notebook.kernel;
var command = "NOTEBOOK_FULL_PATH = '" + nb.notebook_path + "'";
kernel.execute(command);
Followed by a Python sync:
import os
import os.path as op
project.sync_dataset(op.join(os.getcwd(), NOTEBOOK_FULL_PATH), force_update=True)