Documentation for Refinery

Welcome to the documentation for Refinery.

Refinery was a project supported by the Knight Foundation's Prototype Fund to build an NLP web application that makes complicated NLP tools available through an easy-to-use web interface. For the analysis of large document corpora, Refinery provides a simple drag-and-drop interface along with interactive visualizations that offer intuitive insights into your data.

How is it built?

Refinery is deployed locally using Vagrant (tested on v1.8.1) and VirtualBox (tested on v5.0). It can process large document corpora because it builds on BNPy, a Bayesian nonparametric toolbox built around recent advances in scalable inference for these kinds of models. Refinery can be considered a general tool for quickly discovering a set of topics, which can then be used to isolate relevant documents and extract insights from them.
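To make the idea of using topics to isolate relevant documents concrete, here is a minimal sketch in plain Python. It is not Refinery's code and does not use BNPy's API; the document ids, the topic proportions, and the `documents_for_topic` helper are all made up for illustration. It assumes a topic model has already assigned each document a weight per topic.

```python
# Made-up topic proportions for three documents (illustration only).
# Each list holds one weight per topic, and the weights sum to 1.
doc_topics = {
    "doc_a": [0.7, 0.2, 0.1],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.6, 0.1, 0.3],
}


def documents_for_topic(doc_topics, topic, min_weight=0.5):
    """Return ids of documents whose weight on `topic` is at least
    `min_weight`, ordered from heaviest to lightest."""
    hits = [
        (weights[topic], doc_id)
        for doc_id, weights in doc_topics.items()
        if weights[topic] >= min_weight
    ]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)]


print(documents_for_topic(doc_topics, topic=0))
```

Lowering `min_weight` widens the selection to documents that only touch on the topic, while raising it keeps only those dominated by it.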

Running Refinery

Refinery is a browser-driven web application built primarily in Python. It was designed so that installation is as simple as possible. Refinery requires three packages: Git, VirtualBox, and Vagrant. Git is needed to clone the repository containing the source code. VirtualBox and Vagrant allow Refinery to run inside a virtual machine that is accessible through your browser. Vagrant also deploys a Puppet manifest, which automates the installation of the many software modules Refinery needs. To modify the installation process, edit the Vagrantfile in the root directory, which contains the settings that guide it. To install Refinery, run the following from the command line:

git clone https://github.com/daeilkim/refinery.git
cd refinery
vagrant up
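Before running the commands above, it can help to confirm that the three prerequisites are installed. The following is a small sketch, not part of Refinery: the `find_missing_tools` helper is hypothetical, and it assumes the tools are exposed on your PATH as `git`, `vagrant`, and `VBoxManage` (VirtualBox's command-line tool, which some installations do not add to PATH).

```python
import shutil

# Executable names assumed to correspond to each prerequisite.
REQUIRED_TOOLS = {
    "Git": "git",
    "Vagrant": "vagrant",
    "VirtualBox": "VBoxManage",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of prerequisites whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("Git, Vagrant, and VirtualBox were all found on PATH.")
```

A tool reported as missing may still be installed but absent from PATH, so treat the result as a hint rather than a definitive check.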

Further information

For machine learning, Refinery uses the BNPy package (BNPy Git Repository). The backend uses a PostgreSQL database to store document corpora and Redis for the pub/sub messaging that delivers real-time updates. The web application is built on Flask, Gunicorn, Celery, and Nginx.
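The pub/sub pattern mentioned above can be illustrated without a running Redis server. This is a toy in-process sketch, not Refinery's implementation and not the Redis client API: the `Broker` class, the `analysis-progress` channel name, and the message fields are all hypothetical. It only shows the shape of the idea, where a background worker publishes progress and a web-facing subscriber receives it.

```python
from collections import defaultdict


class Broker:
    """Toy in-process stand-in for a pub/sub server such as Redis."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel name -> callbacks

    def subscribe(self, channel, callback):
        """Register a callback to be called for every message on a channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver a message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received = []

# A web-facing handler subscribes to progress updates (hypothetical channel).
broker.subscribe("analysis-progress", received.append)

# A background worker publishes updates as a long-running job advances.
for percent in (25, 50, 100):
    broker.publish("analysis-progress", {"job": "demo", "percent": percent})

print(received[-1])
```

The publisher never needs to know who is listening, which is what lets a worker process and a web process stay decoupled.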

Troubleshooting Refinery

Most installation issues result from outdated Vagrant links to Ubuntu images. The software was last verified to work on 3/5/2016. Please contact @daeilkim if you run into any installation issues.
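When an outdated image link is the suspected cause, a first step is to see which box the Vagrantfile points at. The sketch below pulls out the standard Vagrant settings `config.vm.box` and `config.vm.box_url` from Vagrantfile text. The `box_settings` helper is hypothetical, and the sample content is invented for illustration; it is not copied from Refinery's Vagrantfile.

```python
import re

# Matches uncommented lines such as:  config.vm.box_url = "https://..."
BOX_SETTING = re.compile(
    r"""^\s*\w+\.vm\.(box_url|box)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def box_settings(vagrantfile_text):
    """Return a dict of the 'box' and 'box_url' settings found in the text."""
    return {key: value for key, value in BOX_SETTING.findall(vagrantfile_text)}


# Invented sample content, for illustration only.
SAMPLE = '''
Vagrant.configure("2") do |config|
  config.vm.box = "example/box-name"
  config.vm.box_url = "https://example.invalid/box-name.box"
end
'''

print(box_settings(SAMPLE))
```

To inspect a real checkout, read the file and pass its contents in, for example `box_settings(open("Vagrantfile").read())`, then check whether the reported URL still resolves.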

Authors and License

Refinery is an open source project released under the MIT License. Copyright (C) Daeil Kim (@daeilkim)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.