Using AppLinks to link to other applications

AppLinks allow you to link applications to each other. Linking two applications allows you to access one application's properties from within the other.

In this tutorial, we create an application that provides different datasets. The user can select one of these datasets and get some information about it. This example can be used as a template for your own applications.

Dataset Analyzer

We have a Cloudgene application with the following folder structure:

dataset-analyzer
├── cloudgene.yaml
├── analyze-dataset.groovy
├── datasets
│   ├── dataset1.csv
│   └── dataset2.csv
└── README.md

The datasets folder contains the different datasets, and the user can select one of them. For this purpose, we create the following cloudgene.yaml file, which contains one input parameter of type list:

name: dataset-analyzer
version: 1.0
workflow:
  steps:
    - name: Analyze Dataset
      type: groovy
      script: analyze-dataset.groovy
  inputs:
    - id: dataset
      description: Dataset
      type: list
      values:
        dataset1: Dataset 1
        dataset2: Dataset 2

To simulate our analysis pipeline, we create a small Groovy script called analyze-dataset.groovy that prints the content of the selected dataset:

import genepi.hadoop.common.WorkflowContext

def run(WorkflowContext context) {

  def dataset = context.get("dataset");
  def directory = context.getWorkingDirectory();

  def content = '';
  switch(dataset){
    case 'dataset1':
      content = new File(directory + '/datasets/dataset1.csv').text;
      break;
    case 'dataset2':
      content = new File(directory + '/datasets/dataset2.csv').text;
      break;
  }

  // analyze dataset. for demonstration: print content.
  context.ok(content);

  return true;
}

Next, we install our application via the command line and start a Cloudgene server:

cloudgene install dataset-analyzer dataset-analyzer/cloudgene.yaml
cloudgene server

If we open http://localhost:8082 in our browser, we can submit a new job for this application.

Depending on the selected dataset, we get different outputs.

The Cloudgene application works as expected. However, this implementation has several shortcomings:

  • each time we want to add a new dataset, we have to adapt the Groovy script and change the values in cloudgene.yaml
  • this leads to different cloudgene.yaml files for different setups on different servers
  • different cloudgene.yaml files are hard to deploy, update, and manage (especially in fully automated cloud deployments)

A more generic solution is to create one application for each dataset. This separates our application logic from our datasets. AppLinks let us link them together, and the cloudgene.yaml file of our application stays the same no matter how many new datasets we create.

Dataset Application

As a first step, we create an application for Dataset1 with the following folder structure:

dataset1
├── cloudgene.yaml
├── dataset1.csv
└── README.md

Next, we create the following cloudgene.yaml file for our dataset:

name: Dataset 1
version: 1.0
category: datasets
properties:
  filename: ${local_app_folder}/dataset1.csv

Our cloudgene.yaml file contains no workflow steps; instead, we define properties. These properties can be used to share information from the Dataset application with the Dataset Analyzer application. In this case, we define a property that contains the filename of our dataset. Moreover, we use the local_app_folder variable to get the correct directory and to avoid hard-coded paths.

The category property can be used to annotate our different Dataset applications. We give all datasets the same category so that we can filter them later when we create an AppLinks list.

Finally, we can create an application for Dataset2 in the same way.
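
A minimal sketch of that step from the command line, assuming a POSIX shell. The name "Dataset 2" and the file name dataset2.csv simply mirror the Dataset1 example and are assumptions; the data file itself has to be copied in separately.

```shell
# Scaffold a second dataset application next to dataset1.
mkdir -p dataset2

# The heredoc delimiter is quoted ('EOF') so that the shell leaves
# ${local_app_folder} untouched; it is meant for Cloudgene, not the shell.
cat > dataset2/cloudgene.yaml <<'EOF'
name: Dataset 2
version: 1.0
category: datasets
properties:
  filename: ${local_app_folder}/dataset2.csv
EOF

# Copy your own data file into place (the source path is a placeholder):
# cp /path/to/dataset2.csv dataset2/dataset2.csv
```

Only the name and the filename property differ from Dataset1; the category stays the same so that both applications can be filtered together.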

Dataset Analyzer Application

Now, we can remove all datasets from the Dataset Analyzer application:

dataset-analyzer
├── cloudgene.yaml
├── analyze-dataset.groovy
└── README.md

In the cloudgene.yaml file, we replace the input type with app_list and add a custom filter to display only applications that fall into the category datasets:

name: dataset-analyzer
version: 1.0
workflow:
  steps:
    - name: Analyze Dataset
      type: groovy
      script: analyze-dataset.groovy
  inputs:
    - id: dataset
      description: Dataset
      type: app_list
      category: datasets

We adapt our Groovy script to read the file from the location we shared through properties. With context.getData(param_id) we get access to all properties of the selected application:

import genepi.hadoop.common.WorkflowContext

def run(WorkflowContext context) {

  def datasetProperties = context.getData("dataset");
  def filename = datasetProperties.get('filename');
  def content = new File(filename).text;

  // analyze dataset. for demonstration: print content.
  context.ok(content);

  return true;
}

We stop Cloudgene and restart it. When we try to submit a new job, we see that the list of datasets is empty, because no application in the category datasets has been installed yet.

Therefore, we stop Cloudgene again and install dataset1:

cloudgene install dataset1 dataset1/cloudgene.yaml
cloudgene server

Now, when we open the submit dialog of Dataset-Analyzer, we see Dataset1, and when we start the job, we get the expected output.
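
Each further dataset application is installed with the same command, so the step can be wrapped in a small loop. The helper below is a hypothetical convenience function (install_datasets is not part of Cloudgene); it relies only on the cloudgene install form used above and assumes that each folder name doubles as the application id.

```shell
# Install every given dataset folder that contains a cloudgene.yaml.
# Usage: install_datasets <runner> <folder>...
# <runner> is the command to invoke, which makes a dry run possible.
install_datasets() {
  runner="$1"
  shift
  for dir in "$@"; do
    if [ -f "$dir/cloudgene.yaml" ]; then
      # $runner is left unquoted on purpose so "echo cloudgene" splits into two words.
      $runner install "$(basename "$dir")" "$dir/cloudgene.yaml"
    else
      echo "skipping $dir: no cloudgene.yaml found" >&2
    fi
  done
}

# Real run (requires the cloudgene CLI on the PATH):
#   install_datasets cloudgene dataset1 dataset2
# Dry run that only prints the commands:
#   install_datasets "echo cloudgene" dataset1 dataset2
```

As in the steps above, the server is stopped before installing and started again afterwards with cloudgene server.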

Advantages

  • we don't need to change our application when we want to add a new dataset
  • we have a simple and straightforward way to create new datasets
  • it's very easy to share datasets and to install them (independently of the application)
  • the admin has full control of permissions: access to specific datasets can be managed by user groups