Development Installation
========================

**WARNING: THIS DOC GOES OUT OF DATE QUICKLY; INFO MAY NOT BE CURRENT**

Analytics Automated (A_A) is a lightweight framework for automating long
running distributed computation, principally focused on executing Data
Science tasks. Today it is trivially easy for Scientists, Researchers, Data
Scientists and Analysts to build statistical and predictive models. More
often than not these don't get turned into useful and usable services;
frequently they end up as reports on work which never gets actioned. In
short, organisations often have trouble operationalising the models and
insights which emerge from complex statistical research and data science.

Analytics Automated is targeted at streamlining the process of turning your
predictive software into usable and maintainable services. With A_A,
researchers and data scientists can build models in the modelling tool of
their choice and then, with trivial configuration, Analytics Automated will
turn these models into an easy-to-use API for integration into websites and
other tools.

The other principal benefit of this system is to reduce technology lock-in.
Statistical modelling and Data Science expertise is now spread across a wide
range of technologies (Hadoop, SAS, R and more) and such technological
proliferation shows no sign of slowing down. Picking a single modelling
technology greatly reduces the pool of possible employees for your
organisation, and backing the "wrong horse" means that if you have to change
it can be very costly in terms of time, staffing and money. A_A is agnostic
to the modelling software and technologies you choose to build your group
around.

How it works
------------

This is the briefest of overviews of how the system, once in place, will
function. Users send data as a REST POST request to a pre-configured
analysis or prediction task and, after some asynchronous processing, they
can come back and GET their results. It's as simple as that, and you are
free to build this into any system you have or build the UI of your choice.
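To give a feel for that request/response cycle, here is a minimal sketch
using the Python ``requests`` library. The endpoint paths, field names, job
name and state labels here are hypothetical placeholders, not the real A_A
API; consult the API documentation of your running instance for the actual
routes.

.. code-block:: python

    import time
    import requests

    BASE = "http://127.0.0.1:8000"  # the dev server started in the steps below

    # Hypothetical endpoint and field names; check your instance's API docs.
    with open("input.csv", "rb") as fh:
        reply = requests.post(BASE + "/submission/",
                              data={"job": "my_prediction_task"},
                              files={"input_data": fh})
    job_id = reply.json()["UUID"]  # assumed field name for the job identifier

    # Results are computed asynchronously, so poll until the job completes
    while True:
        result = requests.get(BASE + "/submission/" + job_id).json()
        if result.get("state") in ("SUCCESS", "FAILURE"):  # assumed labels
            break
        time.sleep(5)

    print(result)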
Requirements
------------

A_A has a number of requirements in order to run. You will need:

* python3
* postgres
* redis
* django
* celery

Setup of analytics automated
----------------------------

Notes for our group members who may be less than familiar with setting up
python development environments.

Setup for a Mac which you control
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Install the latest python3.x

2. Install git

3. Install Redis::

    brew install redis

4. Install postgres for your system; on Mac OS X::

    brew install postgres

5. Install virtualenv and virtualenvwrapper::

    pip install virtualenv
    pip install virtualenvwrapper

6. Set up bashrc or bash_profile to point virtualenvwrapper at the correct
   python 3. I added this to my .bash_profile

    .. code-block:: bash

        PATH="/Library/Frameworks/Python.framework/Versions/3.4/bin:${PATH}"
        export PATH
        VIRTUALENVWRAPPER_PYTHON='/Library/Frameworks/Python.framework/Versions/3.4/bin/python3'
        export VIRTUALENVWRAPPER_PYTHON
        source virtualenvwrapper.sh

7. Then run the following to start virtualenvwrapper and create an env
   (FYI: disconnect from the env with ``deactivate``)::

    > source virtualenvwrapper.sh
    > mkvirtualenv analytics_automated
    > workon analytics_automated

8. Install these libraries to this env::

    > pip install setuptools
    > pip install distribute

9. Once configured, add a postgres user for analytics automated

    .. code-block:: sql

        CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
        CREATE DATABASE analytics_automated_db;
        GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
        ALTER USER a_a_user CREATEDB;

10. On Mac you probably have to link some psql bits (mind the version)::

    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libssl.1.0.0.dylib /usr/lib
    > sudo ln -s /usr/local/Cellar/openssl/1.0.2a-1/lib/libcrypto.1.0.0.dylib /usr/lib
    > sudo mv /usr/lib/libpq.5.dylib /usr/lib/libpq.5.dylib.old
    > sudo ln -s /Library/PostgreSQL/9.4/lib/libpq.5.dylib /usr/lib

11. Check out analytics_automated from github::

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git

12. Install Celery::

    > pip install celery

13. Install the AnalyticsAutomated requirements from the relevant project
    requirements file (probably requirements/dev.txt)::

    > pip install -r requirements/dev.txt

14. Add some configuration bits which are omitted from github::

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json

15. Add a blank json object to base_secrets.json. In a more advanced set up
    you can use this object to store system specific and non-public config
    such as BUGSNAG keys

    .. code-block:: json

        {
        }

16. Add the dev database credentials and secret key to dev_secrets.json as
    per the following (the sketch after this section shows one way to
    generate the random string)

    .. code-block:: json

        {
            "USER": "a_a_user",
            "PASSWORD": "thisisthedevelopmentpasswordguys",
            "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
        }

17. Run the migrations (don't forget
    --settings=analytics_automated_project.settings.dev) and create an
    admin user for the project::

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    > python manage.py createsuperuser --settings=analytics_automated_project.settings.dev

18. Start the server, defining the settings you are using::

    > python manage.py runserver --settings=analytics_automated_project.settings.dev

19. Test the code, also defining the settings you are using::

    > python manage.py test --settings=analytics_automated_project.settings.dev analytics_automated
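The SECRET_KEY above just needs to be a long random string. A minimal sketch
for generating one using only the Python standard library (the character set
is similar to the one Django's own startproject template draws from):

.. code-block:: python

    import random
    import string

    # SystemRandom draws from the OS entropy pool rather than a seeded PRNG
    chars = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"
    secret_key = "".join(random.SystemRandom().choice(chars) for _ in range(50))
    print(secret_key)

Paste the printed value into the "SECRET_KEY" field of dev_secrets.json.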
Setup for a linux machine on our network
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Set yourself up so you're using bash rather than csh; this will make
   virtualenv much easier to deal with

2. Get your own python3, somewhere local rather than on the network::

    > /opt/Python/Python-3.4.1/bin/virtualenv [SOME_PATH]

3. Add [SOME_PATH]/bin to your PATH in your .bashrc

4. Install virtualenv and virtualenvwrapper::

    > pip install virtualenv
    > pip install virtualenvwrapper

5. Set up bashrc or bash_profile to point virtualenvwrapper at the correct
   python 3. I added all this to my .bash_profile

    .. code-block:: bash

        export WORKON_HOME=/scratch0/NOT_BACKED_UP/dbuchan/virtualenvs
        export PROJECT_HOME=$HOME/Code
        VIRTUALENVWRAPPER_PYTHON='/scratch0/NOT_BACKED_UP/dbuchan/python3/bin/python3'
        export VIRTUALENVWRAPPER_PYTHON
        source virtualenvwrapper.sh

6. Install these libraries to this env::

    > pip install setuptools
    > pip install distribute
    > pip install celery

7. Initialise postgres (you can add the path to the PGDATA env var); this
   should add a superuser with your user name::

    > initdb -D [SOME_PATH]

8. Start postgres. You may additionally need to get /var/run/postgres made
   writeable by all to run this::

    > postgres -D [SOME_PATH] >logfile 2>&1 &

   or::

    > pg_ctl start -l /scratch0/NOT_BACKED_UP/dbuchan/postgres/logfile -D /scratch0/NOT_BACKED_UP/dbuchan/postgres/

   You can now log in with::

    > psql -h localhost -d postgres

9. Once configured, add a postgres user for analytics automated

    .. code-block:: sql

        CREATE ROLE a_a_user WITH LOGIN PASSWORD 'thisisthedevelopmentpasswordguys';
        CREATE DATABASE analytics_automated_db;
        GRANT ALL PRIVILEGES ON DATABASE analytics_automated_db TO a_a_user;
        ALTER USER a_a_user CREATEDB;

10. Install Redis::

    > yum install redis

11. Check out analytics_automated from git::

    > git clone https://github.com/AnalyticsAutomated/analytics_automated.git

12. Install Celery::

    > pip install celery

13. Install the requirements from the relevant project requirements file
    (probably requirements/dev.txt)::

    > pip install -r requirements/dev.txt

14. Add some configuration bits which are omitted from github::

    > cd analytics_automated_project/settings/
    > touch base_secrets.json
    > touch dev_secrets.json

15. Add the BUGSNAG key to base_secrets.json as per

    .. code-block:: json

        {
            "BUGSNAG": "YOUR KEY HERE"
        }

16. Add the dev database credentials and secret key to dev_secrets.json as per

    .. code-block:: json

        {
            "USER": "a_a_user",
            "PASSWORD": "thisisthedevelopmentpasswordguys",
            "SECRET_KEY": "SOME ABSURDLY LONG RANDOM STRING"
        }

17. Run the migrations (don't forget
    --settings=analytics_automated_project.settings.dev) and create an
    admin user for the project::

    > python manage.py migrate --settings=analytics_automated_project.settings.dev
    > python manage.py createsuperuser --settings=analytics_automated_project.settings.dev

18. Start the server, defining the settings you are using::

    > python manage.py runserver --settings=analytics_automated_project.settings.dev

19. Get Celery going. You probably want to read something about celery and
    django first:
    http://michal.karzynski.pl/blog/2014/05/18/setting-up-an-asynchronous-task-queue-for-django-using-celery-redis/
    For dev purposes we can start the workers with::

    > export PYTHONPATH=~/Code/analytics_automated/analytics_automated:$PYTHONPATH
    > celery --app=analytics_automated_project.celery:app worker --loglevel=INFO -Q localhost,celery

20. Consider also pip installing flower wherever your redis install is; it
    gives you a web UI for monitoring the Celery workers

21. Test the code, also defining the settings you are using (the smoke test
    sketch below is a quick way to check the workers are reachable)::

    > python manage.py test --settings=analytics_automated_project.settings.dev
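Once a worker is running, here is a minimal sketch for checking that Celery
can reach it over the Redis broker. It assumes the PYTHONPATH from step 19
and the dev settings module used throughout this doc; the app module path
comes from the worker command above.

.. code-block:: python

    import os

    # Settings must be importable before the Celery app is loaded
    os.environ.setdefault("DJANGO_SETTINGS_MODULE",
                          "analytics_automated_project.settings.dev")

    from analytics_automated_project.celery import app

    # ping() broadcasts to all workers; a healthy worker replies with "pong"
    replies = app.control.ping(timeout=2.0)
    print(replies or "No workers responded; check the broker and worker logs")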