Development Installation

WARNING: THIS DOC GOES OUT OF DATE QUICKLY; THE INFORMATION BELOW MAY NOT BE CURRENT.

Analytics Automated (A_A) is a lightweight framework for automating long-running, distributed computation, principally focused on executing Data Science tasks.

Today it is trivially easy for Scientists, Researchers, Data Scientists and Analysts to build statistical and predictive models. More often than not, these models are never turned into useful and usable services; frequently they become reports on work which is never actioned. In short, organisations often have trouble operationalising the models and insights which emerge from complex statistical research and data science.

Analytics Automated is targeted at streamlining the process of turning your predictive software into usable and maintainable services.

With A_A, Researchers and Data Scientists can build models in the modelling tool of their choice and then, with trivial configuration, Analytics Automated will turn these models into an easy-to-use API for integration into websites and other tools.

The other principal benefit of this system is reduced technology lock-in. Statistical modelling and Data Science expertise is now spread across a wide range of technologies (Hadoop, SAS, R and more) and such technological proliferation shows no sign of slowing down. Picking a single modelling technology greatly reduces the pool of possible employees for your organisation, and backing the “wrong horse” means that any later change can be very costly in terms of time, staffing and money.

A_A is agnostic to the modelling software and technologies you choose to build your group around.

How it works

This is the briefest of overviews of how the system functions once in place. Users send data as a REST POST request to a pre-configured analysis or prediction task and, after some asynchronous processing, they can come back and GET their results. It’s as simple as that, and you are free to build this into any system you have or to build the UI of your choice.
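
As a sketch of what this looks like in practice (the endpoint and field names here are purely illustrative, not the fixed API; the real paths depend on how your tasks are configured), an interaction might look like:

    # POST some input data to a hypothetical pre-configured task
    > curl -X POST -F 'input_data=@input.csv' http://localhost:8000/submission/
    # ...asynchronous processing happens...
    # later, GET the results back using the ID returned by the POST
    > curl http://localhost:8000/submission/[SOME_ID]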

Requirements

A_A has a number of requirements in order to run. You will need:

  • python3
  • postgres
  • django
  • celery
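
If you are unsure what is already available on your machine, you can check with, for example:

    > python3 --version
    > psql --version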

Setup of Analytics Automated

These are notes for our group members who may be less familiar with setting up python development environments.

Setup for a Mac which you control

  1. Install latest python3.x

  2. Install git

  3. Install Redis

    brew install redis

  4. Install postgres for your system. On Mac OS X it can be installed with Homebrew:

    brew install postgres
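
Both Redis and postgres need to be running before you go much further. With Homebrew installs you can typically start them as background services (the exact service names may differ slightly between Homebrew versions):

    > brew services start redis
    > brew services start postgresql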
    
  5. Install virtualenv and virtualenvwrapper:

    pip install virtualenv
    pip install virtualenvwrapper
    
  6. Set up your .bashrc or .bash_profile to point virtualenvwrapper at the correct python 3. I added this to my .bash_profile:

PATH="/Library/Frameworks/Python.framework/Versions/3.4/bin:${PATH}"
export PATH

VIRTUALENVWRAPPER_PYTHON='/Library/Frameworks/Python.framework/Versions/3.4/bin/python3'
export VIRTUALENVWRAPPER_PYTHON

source virtualenvwrapper.sh

  7. Then run the following to start virtualenvwrapper and create an env:

    > source virtualenvwrapper.sh
    > mkvirtualenv analytics_automated
    > workon analytics_automated (FYI, disconnect with deactivate)
    
  8. Install these libraries into this env:

    > pip install setuptools
    > pip install distribute
    
  9. Once postgres is configured, add a postgres user for analytics automated:

CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
CREATE DATABASE analytics_automated_db;
GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
ALTER USER a_a_user CREATEDB;
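
You can sanity-check the new role by logging in to the database with it, for example:

    > psql -h localhost -d analytics_automated_db -U a_a_user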

  10. On Mac you probably have to link some psql bits (mind the versions):

    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libssl.1.0.0.dylib /usr/lib
    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libcrypto.1.0.0.dylib /usr/lib
    > sudo mv /usr/lib/libpq.5.dylib /usr/lib/libpq.5.dylib.old
    > sudo ln -s /Library/PostgreSQL/9.4/lib/libpq.5.dylib /usr/lib
    
  11. Check out analytics_automated from github:

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git
    
  12. Install Celery:

    > pip install celery
    
  13. Install the AnalyticsAutomated requirements from the relevant project requirements file (probably requirements/dev.txt):

    > pip install -r requirements/dev.txt
    
  14. Add some configuration files which are omitted from github:

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json
    
  15. Add a blank JSON object to base_secrets.json. In a more advanced setup you can use this object to store system-specific and non-public config, such as BUGSNAG keys:

{
}

  16. Add the dev database credentials and secret key to dev_secrets.json as follows:

{
  "USER": "a_a_user",
  "PASSWORD": "thisisthedevelopmentpasswordguys",
  "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
}
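
One way to generate a suitably long random string for SECRET_KEY (this is just one option; any good source of randomness will do) is:

    > python -c "import random, string; print(''.join(random.SystemRandom().choice(string.ascii_letters + string.digits) for _ in range(50)))"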

  17. Run the migrations (don’t forget --settings=analytics_automated_project.settings.dev) and create an admin user for the project:

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    
  18. Start the server by defining the settings you are using:

    > python manage.py runserver --settings=analytics_automated_project.settings.dev
    
  19. Test the code, also defining the settings you are using:

    > python manage.py test --settings=analytics_automated_project.settings.dev analytics_automated
    

Setup for a Linux machine on our network

  1. Set yourself up so you’re using bash rather than csh; this will make virtualenv much easier to deal with.

  2. Get your own python3, somewhere local rather than on the network:

    > /opt/Python/Python-3.4.1/bin/virtualenv [SOME_PATH]
    
  3. Add [SOME_PATH]/bin to your PATH in your .bashrc
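
For example, the line in your .bashrc might look like this (using whatever path you chose above):

    export PATH=[SOME_PATH]/bin:$PATH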

  4. Install virtualenv and virtualenvwrapper:

    > pip install virtualenv
    > pip install virtualenvwrapper
    
  5. Set up your .bashrc or .bash_profile to point virtualenvwrapper at the correct python 3. I added all this to my .bash_profile:

export WORKON_HOME=/scratch0/NOT_BACKED_UP/dbuchan/virtualenvs
export PROJECT_HOME=$HOME/Code
VIRTUALENVWRAPPER_PYTHON='/scratch0/NOT_BACKED_UP/dbuchan/python3/bin/python3'
export VIRTUALENVWRAPPER_PYTHON

source virtualenvwrapper.sh

  6. Install these libraries into this env:

    > pip install setuptools
    > pip install distribute
    > pip install celery
    
  7. Initialise postgres (you can add the path to the PGDATA env var); this should create a superuser with your user name:

    > initdb -D [SOME_PATH]
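
On the PGDATA env var mentioned above: if you export it (in your .bashrc, say), initdb, postgres and pg_ctl will all pick the path up and you can omit the -D flag from the commands in these steps:

    > export PGDATA=[SOME_PATH]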
    
  8. Start postgres. You may additionally need to make /var/run/postgres writeable by all to run this:

    > postgres -D [SOME_PATH] >logfile 2>&1 &
    

or:

    > pg_ctl start -l /scratch0/NOT_BACKED_UP/dbuchan/postgres/logfile -D /scratch0/NOT_BACKED_UP/dbuchan/postgres/

You can now log in with:

    > psql -h localhost -d postgres

  9. Once configured, add a postgres user for analytics automated:

CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
CREATE DATABASE analytics_automated_db;
GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
ALTER USER a_a_user CREATEDB;

  10. Install Redis:

    > yum install redis
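
Make sure redis is actually running before the Celery workers are started later on; depending on your system, one of the following should work:

    > sudo systemctl start redis
    > redis-server --daemonize yes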

  11. Check out analytics_automated from git:

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git
    
  12. Install Celery:

    > pip install celery
    
  13. Install the requirements from the relevant project requirements file (probably requirements/dev.txt):

    > pip install -r requirements/dev.txt
    
  14. Add some configuration files which are omitted from github:

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json
    
  15. Add the BUGSNAG key to base_secrets.json as follows:

{
  "BUGSNAG": "YOUR KEY HERE"
}

  16. Add the dev database credentials and secret key to dev_secrets.json as follows:

{
  "USER": "a_a_user",
  "PASSWORD": "thisisthedevelopmentpasswordguys",
  "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
}

  17. Run the migrations (don’t forget --settings=analytics_automated_project.settings.dev) and create an admin user for the project:

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    
  18. Start the server by defining the settings you are using:

    > python manage.py runserver --settings=analytics_automated_project.settings.dev
    
  19. Get Celery going. You probably want to read something about celery and django (e.g. http://michal.karzynski.pl/blog/2014/05/18/setting-up-an-asynchronous-task-queue-for-django-using-celery-redis/). For dev purposes we can start the workers with:

    > export PYTHONPATH=~/Code/analytics_automated/analytics_automated:$PYTHONPATH
    > celery --app=analytics_automated_project.celery:app worker --loglevel=INFO -Q localhost,celery
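
To check the workers have come up and are reachable over the broker, you can ping them (a standard celery subcommand, nothing A_A-specific):

    > celery --app=analytics_automated_project.celery:app inspect ping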
    
  20. Consider also pip installing flower wherever your redis install is.
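
Flower is a web-based monitor for celery. A minimal way to run it against this project (assuming the same app path as the worker command above) would be:

    > pip install flower
    > celery --app=analytics_automated_project.celery:app flower

By default the dashboard is served on http://localhost:5555.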

  21. Test the code, also defining the settings you are using:

    > python manage.py test --settings=analytics_automated_project.settings.dev analytics_automated