Introduction
Overview
Teaching: 15 min
Exercises: 0 minQuestions
What is Tapis?
What is the Streams API?
What is Abaco?
Objectives
Learn background about the Tapis’ Streams API and its organization.
Streaming Data Collection for Sensor Networks
Tapis
Tapis is a National Science Foundation (NSF) funded web-based API framework for securely managing computational workloads across infrastructures and institutions. The newest iteration, Tapis V3, has several new capabilities, including a multi-site security kernel, streaming data APIs, and high-level support for containerized applications. Tapis provides several APIs for data management and processing. We will be working with the streams API for handling stream style data typical of a sensor network.
Streams API
Tapis V3 introduces a Streams API for representing physical sensor networks and streaming and handling reported data. The API provides data-driven event support for real-time data coming through the API.
The API is structured into a hierarchy of organizational objects describing the sensor network being represented.
Projects
This is the broadest organizational structure available in the Streams API. This provides information describing the top level project data collection falls under.
Property | Description | Required |
---|---|---|
project_name | Project name | true |
description | Project description | false |
owner | string: Project owner | true |
pi | string: Project principle investigator (project lead) | true |
funding_resource | string: Funding source of the project | false |
project_url | string: URL for project online resources | false |
active | boolean: Current status of the project | false |
metadata | Object: Additional metadata for the project | false |
Sites
Sites are registered to projects and represent the physical location of the data collection instrumentation.
Property | Description | Required |
---|---|---|
project_id | string: ID of the project this site is associated with | true |
site_name | string: Project Site name | true |
site_id | string: Site ID | false |
description | string: Site description | true |
latitude | number: Site latitude | true |
longitude | number: Site longitude | true |
elevation | number: Site elevation | true |
metadata | Object: Additional metadata for the site | false |
Instruments
Instruments represent the physical instruments/sensors available at a given site. This is the hardware performing the data collection being monitored.
Property | Description | Required |
---|---|---|
project_id | string: ID of the project this instrument is associated with | true |
site_id | string: ID of the site this instrument is located at | true |
inst_name | string: Instrument name | false |
inst_id | string: Instrument ID | false |
inst_description | string: Instrument description | false |
topic_category_id | string: Instrument category | false |
tags | string[]: Array of instrument tags | false |
metadata | Object: Additional metadata for the isntrument | false |
Variables
Instruments will typically record properties of the environment they are placed in and report out these values. For example, a climatological sensor station may record levels of rainfall, air temperature, and relative humidity for its location. These would each be stored as a variable. The variable objects provide metadata for each of these measured properties.
Property | Description | Required |
---|---|---|
project_id | string: ID of the project this variable is associated with | true |
site_id | string: ID of the site this variable’s instrument is located at | true |
inst_id | string: ID of the instrument this variable is recorded by | true |
var_id | string: Variable ID | false |
var_name | string: Variable name | false |
units | string: Units variable is measured in | false |
measured_property_id | string: ID of the property measured by this variable type | false |
shortname | string: Short name for the variable | false |
metadata | Object: Additional metadata for the isntrument | false |
Measurements
The final structure in the Streams API is the actual measurments recorded by the instruments. Measurement objects contain all the recorded measurements indexed by variable and the datetime of their recording.
Measurement Creation
Property | Description | Required |
---|---|---|
inst_id | string: ID of the instrument this variable is recorded by | true |
var_id | Measurement[]: Measurment definitions | true |
Measurments Definition Object
{
datetime: string
[variable_id: string]: number
}
Measurements Return Object
{
instrument: Instrument
site: Site
measurments_in_file: number
[variable_id: string]: {
[datetime: string]: number
}
}
Abaco Containers
Tapis provides functions-as-a-service (FaaS) through Abaco, which is based on the actor model of concurrent computation and Docker. Users define computational primitives called actors with a Docker image, and Abaco assigns each actor a unique URL over which it can receive messages. Users send the actor a message by making an HTTP POST request to the URL. In response to an actor receiving a message, Abaco launches a container from the associated image, injecting the message into the container. Typically, the container execution is asynchronous from the message request, though Abaco does provide an endpoint for sending a message to an actor and blocking until the execution completes, providing synchronous execution semantics. Abaco maintains a queue of messages for each actor, and is capable of launching containers in parallel for a given actor when the actor is registered as stateless. The functions run with an authenticated context that allows them to make requests to other Tapis APIs to perform actions such as data transfers or job submissions.
Here is an actor we will use today for generating a plot of some of our mock-sensor data and uploading it to Jetstream Virtual Machine https://github.com/scleveland/workshop_actor
Streams Event Handling
The streams API includes the ability to register and kick off Abaco containers when a measurement hits a defined threshold, sending the measurement definition to the container. Messages sent to the container are stored in an environment variable called MSG. This will allow users to automatically receive notifications or start processing tasks for measurements in real time.
Key Points
Tapis is an API framework for managing computational workloads.
The Tapis streams API provides organizational structures for representing sensor networks and streaming real time data.
Abaco provides function-as-a-service functionality to Tapis by allowing containerized workflows to be kicked off via messages sent by a client.