
# Protein-Protein Database

All files pertaining to the protein-protein database live within this directory.

## The basics

#### Schema

All network data is stored within the `protein_protein_interactions` schema on our Postgres database. The schema is defined in the file `schema.sql`, located at the top level of this directory. It defines the tables within the `protein_protein_interactions` schema.

Usage:

To load into a local database:

```
psql -f schema.sql postgresql://localhost/postgres
```

To load into the production database:

```
psql -f schema.sql <address to database>
```

### Scripts

All scripts live within the subdirectory `scripts`, located at the top level of this directory.

Any source files required to run the scripts live within the subdirectory `source-files`, also located at the top level of this directory. Because source files may be large, you must create this directory yourself and add any source files you need there.

All results generated by the scripts live in the subdirectory `script-results`, located at the top level of this directory. Currently, every script that generates output files creates this directory if it does not already exist. When adding a new script that generates output files, best practice is to create the `script-results` directory and any needed subdirectories if they do not exist, to prevent errors in freshly cloned repositories.

The scripts directory contains the following files:

- `generate_protein_network.py`
- `remove_duplicates.sh`
- `loader.py`

#### Data Generator

This script (`generate_protein_network.py`) retrieves the genes, protein information, and physical interactions between these genes from YeastMine, then writes this data into the CSV files used to load the database. Please make sure you have enough time (around 1.5 to 2 hours) to run this script.
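As a minimal sketch of the output-directory convention described under Scripts, a Python script could create its output directory before writing any files. The helper name `ensure_output_dir` is hypothetical and is not taken from the existing scripts; it assumes the caller passes the path of the top-level database directory.

```python
from pathlib import Path


def ensure_output_dir(base_dir: str, *subdirs: str) -> Path:
    """Create <base_dir>/script-results/<subdirs...> if missing and return it.

    Hypothetical helper: base_dir is assumed to be the top-level database
    directory (the parent of scripts/ and script-results/).
    """
    out_dir = Path(base_dir, "script-results", *subdirs)
    # parents=True also creates script-results itself when it is missing;
    # exist_ok=True makes repeated runs harmless.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

For example, calling `ensure_output_dir("..", "processed-loader-files")` from inside `scripts` would create `script-results/processed-loader-files` one level up (if it is not already there) and return its path.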
The files `gene.csv`, `physical_interaction.csv`, and `protein.csv` will be generated in the `processed-loader-files` subdirectory of the `script-results` directory.

Usage:

```
python3 generate_protein_network.py
```

Once you have finished generating the loader files, you need to remove duplicate entries from the physical interactions file. The bash script `remove_duplicates.sh` does this for you. The resulting file, `no_dupe.csv`, will be generated in the same `processed-loader-files` subdirectory of `script-results`.

If your machine doesn't support bash shell scripts, you will have to write a new script that removes duplicate lines from a file and writes the results to a new file. Sorry!

Usage:

```
chmod u+x remove_duplicates.sh
./remove_duplicates.sh
```

#### Database Loader

This script (`loader.py`) loads the generated loader files into the database. It generates SQL statements directly from the files produced by the data generator in order to populate the relational database with their data.

Note: You may get an error saying that there was a duplicate protein. If so, manually check which protein was being inserted twice, then confirm the correct protein-gene pairing on the SGD website (or YeastMine). Currently this occurs with the protein `Aad6p`. To fix it, open your `protein.csv` file and make sure that `Aad6p` is paired with the gene `YFL056C` and `Aad16p` is paired with the gene `YFL057C`. If any other issues arise, you must confirm them manually on the SGD website. Sorry!

Usage:

To load into a local database:

```
python3 loader.py | psql postgresql://localhost/postgres
```

To load into the production database:

```
python3 loader.py | psql <path to database>
```
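If bash is not available, the duplicate-removal step from the Data Generator section could be done with a short Python script instead. This is a sketch under stated assumptions, not a port of `remove_duplicates.sh`: it treats two lines as duplicates only when they match exactly (ignoring line endings), keeps the first occurrence, and preserves the original line order. The function name `remove_duplicate_lines` is hypothetical.

```python
def remove_duplicate_lines(src: str, dst: str) -> int:
    """Copy src to dst, keeping only the first occurrence of each line.

    Returns the number of duplicate lines dropped. Order is preserved, so a
    header on the first line stays first. Lines are compared exactly, after
    stripping the trailing newline.
    """
    seen = set()
    dropped = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            key = line.rstrip("\r\n")  # ignore \n vs \r\n differences
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            fout.write(key + "\n")
    return dropped
```

Run from inside `script-results/processed-loader-files`, a call such as `remove_duplicate_lines("physical_interaction.csv", "no_dupe.csv")` would write the result under the same file name that `remove_duplicates.sh` produces. Rows that differ only in spacing or quoting are not treated as duplicates by this sketch.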
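The Database Loader's approach, generating SQL text and piping it into `psql`, can be illustrated with a small sketch. This is not the code in `loader.py`; the function names are hypothetical, and any table or column names you pass in should come from `schema.sql`. Each CSV field is emitted as a single-quoted SQL literal with embedded single quotes doubled, and empty fields become `NULL`.

```python
import csv
import sys
from typing import Iterable, Iterator, Optional, Sequence, TextIO


def sql_literal(value: str) -> str:
    """Render one CSV field as a SQL literal: empty -> NULL, else quoted text."""
    if value == "":
        return "NULL"
    # Standard SQL escapes a single quote inside a string literal by doubling it.
    return "'" + value.replace("'", "''") + "'"


def insert_statements(table: str, columns: Sequence[str],
                      rows: Iterable[Sequence[str]]) -> Iterator[str]:
    """Yield one INSERT statement per row (table/columns must be trusted)."""
    column_list = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(field) for field in row)
        yield f"INSERT INTO {table} ({column_list}) VALUES ({values});"


def csv_to_sql(path: str, table: str, columns: Sequence[str],
               has_header: bool = True, out: Optional[TextIO] = None) -> None:
    """Print INSERT statements for every row of a CSV file (stdout by default)."""
    target = sys.stdout if out is None else out
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for statement in insert_statements(table, columns, reader):
            print(statement, file=target)
```

Writing to standard output is what lets a command of the form `python3 loader.py | psql ...` work: `psql` reads the statements from its standard input. Table and column names are inserted verbatim rather than escaped, so in this sketch they must come from trusted code, never from the CSV data.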
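When the loader reports a duplicate protein, a quick check like the following could list which values occur more than once in `protein.csv`, along with the line numbers where they appear. It assumes, without verification, that the file is comma-delimited with a header row and that the protein name is in the first column; the function name and those defaults are hypothetical, so adjust `key_column`, `has_header`, and `delimiter` to match the real file.

```python
import csv
from collections import defaultdict
from typing import Dict, List


def find_duplicates(path: str, key_column: int = 0, has_header: bool = True,
                    delimiter: str = ",") -> Dict[str, List[int]]:
    """Return {value: [line numbers]} for values of key_column seen on 2+ rows.

    Line numbers are 1-based and count the header line, so they match what a
    text editor shows (assuming no field contains an embedded newline).
    """
    lines_by_value: Dict[str, List[int]] = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        if has_header:
            next(reader, None)
        for row in reader:
            if len(row) > key_column:  # skip blank or short rows
                lines_by_value[row[key_column]].append(reader.line_num)
    return {value: lines for value, lines in lines_by_value.items() if len(lines) > 1}
```

This only locates the repeated entries; which gene each protein should be paired with still has to be confirmed by hand on the SGD website or YeastMine.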