Development Installation

WARNING: THIS DOC GOES OUT OF DATE QUICKLY; THE INFORMATION BELOW MAY NOT BE CURRENT.

Analytics Automated (A_A) is a lightweight framework for automating long-running, distributed computation, principally focused on executing Data Science tasks.

Today it is trivially easy for Scientists, Researchers, Data Scientists and Analysts to build statistical and predictive models. More often than not, these models are never turned into useful and usable services; frequently they become reports on work which is never actioned. In short, organisations often have trouble operationalising the models and insights which emerge from complex statistical research and data science.

Analytics Automated is targeted at streamlining the process of turning your predictive software into usable and maintainable services.

With A_A, Researchers and Data Scientists can build models in the modelling tool of their choice and then, with trivial configuration, Analytics Automated will turn these models into an easy-to-use API for integration into websites and other tools.

The other principal benefit of this system is reduced technology lock-in. Statistical modelling and Data Science expertise is now spread across a wide range of technologies (Hadoop, SAS, R and more) and such technological proliferation shows no sign of slowing down. Picking a single modelling technology greatly reduces the pool of possible employees for your organisation, and backing the “wrong horse” means that any later change can be very costly in terms of time, staffing and money.

A_A is agnostic to the modelling software and technologies you choose to build your group around.

How it works

This is the briefest of overviews of how the system functions once in place. Users send data as a REST POST request to a pre-configured analysis or prediction task and, after some asynchronous processing, they can come back and GET their results. It’s as simple as that, and you are free to build this into any system you have or to build the UI of your choice.
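
As a sketch of what this looks like in practice (the endpoint and field names here are purely illustrative, not the fixed API; the real paths depend on how your tasks are configured), an interaction might look like:

    # POST some input data to a hypothetical pre-configured task
    > curl -X POST -F 'input_data=@input.csv' http://localhost:8000/submission/
    # ...asynchronous processing happens...
    # later, GET the results back using the ID returned by the POST
    > curl http://localhost:8000/submission/[SOME_ID]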

Requirements

A_A has a number of requirements in order to run. You will need:

  • python3
  • postgres
  • django
  • celery
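
If you are unsure what is already available on your machine, you can check with, for example:

    > python3 --version
    > psql --version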

Setup of Analytics Automated

These are notes for our group members who may be less familiar with setting up python development environments.

Setup for a Mac which you control

  1. Install latest python3.x

  2. Install git

  3. Install Redis

    brew install redis

  4. Install postgres for your system. On Mac OS X it can be installed with Homebrew:

    brew install postgres
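
Both Redis and postgres need to be running before you go much further. With Homebrew installs you can typically start them as background services (the exact service names may differ slightly between Homebrew versions):

    > brew services start redis
    > brew services start postgresql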
    
  5. Install virtualenv and virtualenvwrapper:

    pip install virtualenv
    pip install virtualenvwrapper
    
  6. Set up your .bashrc or .bash_profile to point virtualenvwrapper at the correct python 3. I added this to my .bash_profile:

PATH="/Library/Frameworks/Python.framework/Versions/3.4/bin:${PATH}"
export PATH

VIRTUALENVWRAPPER_PYTHON='/Library/Frameworks/Python.framework/Versions/3.4/bin/python3'
export VIRTUALENVWRAPPER_PYTHON

source virtualenvwrapper.sh

  7. Then run the following to start virtualenvwrapper and create an env:

    > source virtualenvwrapper.sh
    > mkvirtualenv analytics_automated
    > workon analytics_automated (FYI, disconnect with deactivate)
    
  8. Install these libraries into this env:

    > pip install setuptools
    > pip install distribute
    
  9. Once postgres is configured, add a postgres user for analytics automated:

CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
CREATE DATABASE analytics_automated_db;
GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
ALTER USER a_a_user CREATEDB;
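
You can sanity-check the new role by logging in to the database with it, for example:

    > psql -h localhost -d analytics_automated_db -U a_a_user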

  10. On Mac you probably have to link some psql bits (mind the versions):

    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libssl.1.0.0.dylib /usr/lib
    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libcrypto.1.0.0.dylib /usr/lib
    > sudo mv /usr/lib/libpq.5.dylib /usr/lib/libpq.5.dylib.old
    > sudo ln -s /Library/PostgreSQL/9.4/lib/libpq.5.dylib /usr/lib
    
  11. Check out analytics_automated from github:

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git
    
  12. Install Celery:

    > pip install celery
    
  13. Install the AnalyticsAutomated requirements from the relevant project requirements file (probably requirements/dev.txt):

    > pip install -r requirements/dev.txt
    
  14. Add some configuration files which are omitted from github:

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json
    
  15. Add a blank JSON object to base_secrets.json. In a more advanced setup you can use this object to store system-specific and non-public config, such as BUGSNAG keys:

{
}

  16. Add the dev database credentials and secret key to dev_secrets.json as follows:

{
  "USER": "a_a_user",
  "PASSWORD": "thisisthedevelopmentpasswordguys",
  "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
}
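
One way to generate a suitably long random string for SECRET_KEY (this is just one option; any good source of randomness will do) is:

    > python -c "import random, string; print(''.join(random.SystemRandom().choice(string.ascii_letters + string.digits) for _ in range(50)))"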

  17. Run the migrations (don’t forget --settings=analytics_automated_project.settings.dev) and create an admin user for the project:

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    
  18. Start the server by defining the settings you are using:

    > python manage.py runserver --settings=analytics_automated_project.settings.dev
    
  19. Test the code, also defining the settings you are using:

    > python manage.py test --settings=analytics_automated_project.settings.dev analytics_automated
    

Setup for a Linux machine on our network

  1. Set yourself up so you’re using bash rather than csh; this will make virtualenv much easier to deal with.

  2. Get your own python3, somewhere local rather than on the network:

    > /opt/Python/Python-3.4.1/bin/virtualenv [SOME_PATH]
    
  3. Add [SOME_PATH]/bin to your PATH in your .bashrc
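
For example, the line in your .bashrc might look like this (using whatever path you chose above):

    export PATH=[SOME_PATH]/bin:$PATH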

  4. Install virtualenv and virtualenvwrapper:

    > pip install virtualenv
    > pip install virtualenvwrapper
    
  5. Set up your .bashrc or .bash_profile to point virtualenvwrapper at the correct python 3. I added all this to my .bash_profile:

export WORKON_HOME=/scratch0/NOT_BACKED_UP/dbuchan/virtualenvs
export PROJECT_HOME=$HOME/Code
VIRTUALENVWRAPPER_PYTHON='/scratch0/NOT_BACKED_UP/dbuchan/python3/bin/python3'
export VIRTUALENVWRAPPER_PYTHON

source virtualenvwrapper.sh

  6. Install these libraries into this env:

    > pip install setuptools
    > pip install distribute
    > pip install celery
    
  7. Initialise postgres (you can add the path to the PGDATA env var); this should create a superuser with your user name:

    > initdb -D [SOME_PATH]
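
On the PGDATA env var mentioned above: if you export it (in your .bashrc, say), initdb, postgres and pg_ctl will all pick the path up and you can omit the -D flag from the commands in these steps:

    > export PGDATA=[SOME_PATH]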
    
  8. Start postgres. You may additionally need to make /var/run/postgres writeable by all to run this:

    > postgres -D [SOME_PATH] >logfile 2>&1 &
    

or:

    > pg_ctl start -l /scratch0/NOT_BACKED_UP/dbuchan/postgres/logfile -D /scratch0/NOT_BACKED_UP/dbuchan/postgres/

You can now log in with:

    > psql -h localhost -d postgres

  9. Once configured, add a postgres user for analytics automated:

CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
CREATE DATABASE analytics_automated_db;
GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
ALTER USER a_a_user CREATEDB;

  10. Install Redis:

    > yum install redis
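
Make sure redis is actually running before the Celery workers are started later on; depending on your system, one of the following should work:

    > sudo systemctl start redis
    > redis-server --daemonize yes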

  11. Check out analytics_automated from git:

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git
    
  12. Install Celery:

    > pip install celery
    
  13. Install the requirements from the relevant project requirements file (probably requirements/dev.txt):

    > pip install -r requirements/dev.txt
    
  14. Add some configuration files which are omitted from github:

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json
    
  15. Add the BUGSNAG key to base_secrets.json as follows:

{
  "BUGSNAG": "YOUR KEY HERE"
}

  16. Add the dev database credentials and secret key to dev_secrets.json as follows:

{
  "USER": "a_a_user",
  "PASSWORD": "thisisthedevelopmentpasswordguys",
  "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
}

  17. Run the migrations (don’t forget --settings=analytics_automated_project.settings.dev) and create an admin user for the project:

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    
  18. Start the server by defining the settings you are using:

    > python manage.py runserver --settings=analytics_automated_project.settings.dev
    
  19. Get Celery going. You probably want to read something about celery and django (e.g. http://michal.karzynski.pl/blog/2014/05/18/setting-up-an-asynchronous-task-queue-for-django-using-celery-redis/). For dev purposes we can start the workers with:

    > export PYTHONPATH=~/Code/analytics_automated/analytics_automated:$PYTHONPATH
    > celery --app=analytics_automated_project.celery:app worker --loglevel=INFO -Q localhost,celery
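
To check the workers have come up and are reachable over the broker, you can ping them (a standard celery subcommand, nothing A_A-specific):

    > celery --app=analytics_automated_project.celery:app inspect ping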
    
  20. Consider also pip installing flower wherever your redis install is.
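
Flower is a web-based monitor for celery. A minimal way to run it against this project (assuming the same app path as the worker command above) would be:

    > pip install flower
    > celery --app=analytics_automated_project.celery:app flower

By default the dashboard is served on http://localhost:5555.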

  21. Test the code, also defining the settings you are using:

    > python manage.py test --settings=analytics_automated_project.settings.dev analytics_automated